Evifa-Portal

1

Online Resource

Scaling Python with Dask : from data science to machine learning (2023)

Karau, Holden [Author] ; Kimmins, Mika [Author]

Sebastopol, CA : O'Reilly Media

Details

ISBN: 9781098119843 , 1098119843 , 9781098119836 , 1098119835

Language: English

Pages: 1 online resource

Edition: First edition.

Parallel Title: Also published as

DDC: 005.13/3

Keywords: Python (Computer program language) ; Cloud computing

Abstract: Modern systems contain multi-core CPUs and GPUs that have the potential for parallel computing. But many scientific Python tools were not designed to leverage this parallelism. With this short but thorough resource, data scientists and Python programmers will learn how the Dask open source library for parallel computing provides APIs that make it easy to parallelize PyData libraries including NumPy, pandas, and scikit-learn. Authors Holden Karau and Mika Kimmins show you how to use Dask computations in local systems and then scale to the cloud for heavier workloads. This practical book explains why Dask is popular among industry experts and academics and is used by organizations that include Walmart, Capital One, Harvard Medical School, and NASA. With this book, you'll learn: what Dask is, where you can use it, and how it compares with other tools; how to use Dask for batch data parallel processing; key distributed system concepts for working with Dask; methods for using Dask with higher-level APIs and building blocks; how to work with integrated libraries such as scikit-learn, pandas, and PyTorch; and how to use Dask with GPUs.

Note: Includes bibliographical references and index. - Description based on online resource; title from digital title page (viewed on August 18, 2023)

URL: license required

MPI Ethno. Forsch.

2

Online Resource

Kubeflow for Machine Learning (2020)

Karau, Holden [Author] ; Grant, Trevor [Author] ; Filonenko, Ilan [Author] ; [et al.]

[Place of publication not identified] : O'Reilly Media, Inc. | Boston, MA : Safari

Details

Language: English

Pages: 1 online resource (130 pages)

Edition: 1st edition

Keywords: Electronic books ; local

Abstract: If you're training a machine learning model but aren't sure how to put it into production, this book will get you there. Kubeflow provides a collection of cloud native tools for different stages of a model's lifecycle, from data exploration, feature preparation, and model training to model serving. This guide helps data scientists build production-grade machine learning implementations with Kubeflow and shows data engineers how to make models scalable and reliable. Using examples throughout the book, authors Holden Karau, Trevor Grant, Ilan Filonenko, Richard Liu, and Boris Lublinsky explain how to use Kubeflow to train and serve your machine learning models on top of Kubernetes in the cloud or in a development environment on-premises. Understand Kubeflow's design, core components, and the problems it solves. Learn how to set up Kubeflow on a cloud provider or on an in-house cluster. Train models using Kubeflow with popular tools including scikit-learn, TensorFlow, and Apache Spark. Learn how to add custom stages such as serving and prediction. Keep your model up-to-date with Kubeflow Pipelines. Understand how to validate machine learning pipelines.

Note: Online resource; Title from title page (viewed November 25, 2020) , Mode of access: World Wide Web.

URL: license required

MPI Ethno. Forsch.

3

Online Resource

Learning PySpark : build data-intensive applications locally and deploy at scale using the combined powers of Python and Spark 2.0 (2017)

Drabas, Tomasz [Author] ; Lee, Denny [Contributor] ; Karau, Holden [Contributor]

Birmingham, UK : Packt Publishing

Details

ISBN: 9781786466259 , 1786466252

Language: English

Pages: 1 online resource (1 volume) , illustrations, maps

Keywords: Application software ; Development ; Python (Computer program language) ; SPARK (Computer program language) ; Electronic books ; local

Abstract: Build data-intensive applications locally and deploy at scale using the combined powers of Python and Spark 2.0. About This Book: Learn why and how you can efficiently use Python to process data and build machine learning models in Apache Spark 2.0. Develop and deploy efficient, scalable real-time Spark solutions. Take your understanding of using Spark with Python to the next level with this jump start guide. Who This Book Is For: If you are a Python developer who wants to learn about the Apache Spark 2.0 ecosystem, this book is for you. A firm understanding of Python is expected to get the best out of the book. Familiarity with Spark would be useful, but is not mandatory. What You Will Learn: Learn about Apache Spark and the Spark 2.0 architecture. Build and interact with Spark DataFrames using Spark SQL. Learn how to solve graph and deep learning problems using GraphFrames and TensorFrames respectively. Read, transform, and understand data and use it to train machine learning models. Build machine learning models with MLlib and ML. Learn how to submit your applications programmatically using spark-submit. Deploy locally built applications to a cluster. In Detail: Apache Spark is an open source framework for efficient cluster computing with a strong interface for data parallelism and fault tolerance. This book will show you how to leverage the power of Python and put it to use in the Spark ecosystem. You will start by getting a firm understanding of the Spark 2.0 architecture and how to set up a Python environment for Spark. You will get familiar with the modules available in PySpark. You will learn how to abstract data with RDDs and DataFrames and understand the streaming capabilities of PySpark. Also, you will get a thorough overview of machine learning capabilities of PySpark using ML and MLlib, graph processing using GraphFrames, and polyglot persistence using Blaze. Finally, you will learn how to deploy your applications to the cloud using the spark-submit command. By the end of this book, you will have established a firm understanding of the Spark Python API and how it can be used to build data-intensive applications. Style and approach: This book takes a very comprehensive, step-by-step approach so you understand how the Spark ecosystem can be used with Python to develop efficient, scalable solutions. Every chapter is standalone and written in a very easy-to-understand manner, with a focus on both the hows and the whys of each concept. Downloading the...

Note: Includes index. - Description based on online resource; title from title page (viewed March 17, 2017)

URL: https://learning.oreilly.com/library/view/-/9781786463708/?ar

MPI Ethno. Forsch.

4

Online Resource

Learning Spark : lightning fast data analysis (2015)

Karau, Holden [Author]

Sebastopol, CA : O'Reilly Media

Details

Language: English

Pages: 1 online resource (1 volume) , illustrations

Edition: First edition.

Keywords: Apache Spark ; Big data ; Machine learning ; Electronic books ; local

Abstract: Data in all domains is getting bigger. How can you work with it efficiently? Recently updated for Spark 1.3, this book introduces Apache Spark, the open source cluster computing system that makes data analytics fast to write and fast to run. With Spark, you can tackle big datasets quickly through simple APIs in Python, Java, and Scala. This edition includes new information on Spark SQL, Spark Streaming, setup, and Maven coordinates.

Note: Includes index. - Description based on online resource; title from cover page (Safari, viewed February 10, 2015)

URL: https://learning.oreilly.com/library/view/-/9781449359034/?ar

MPI Ethno. Forsch.

5

Online Resource

Debugging Spark : how to spot and squash common Spark bugs (2018)

Karau, Holden [Author]

[Place of publication not identified] : O'Reilly Media

Details

Language: English

Pages: 1 online resource (1 streaming video file (2 hr., 26 min., 8 sec.)) , digital, sound, color

Keywords: Spark (Electronic resource : Apache Software Foundation) ; Debugging in computer science ; Computer programs ; Testing ; Electronic videos ; local

Abstract: "Apache Spark is an extremely powerful general purpose distributed system that also happens to be extremely difficult to debug. This video, designed for intermediate-level Spark developers and data scientists, looks at some of the most common (and baffling) ways Spark can explode (e.g., out of memory exceptions, unbalanced partitioning, strange serialization errors, debugging errors inside your own code, etc.) and then provides a set of remedies for keeping those blow-ups under control. You'll pick up techniques for improving your own logging (and reducing your dependence on Spark's verbose logs); learn how to deal with fuzzy data; discover how to connect and use a debugger in a distributed environment; and gain the ability to know which Spark error messages are actually relevant."--Resource description page.

Note: Title from title screen (viewed January 30, 2019)

URL: license required

MPI Ethno. Forsch.

6

Online Resource

Spark學習手冊 (2016)

Karau, Holden [Author] ; Konwinski, Andy [Author] ; We, Patrick [Author]

[Place of publication not identified] : GoTop Information, Inc. | Boston, MA : Safari

Details

ISBN: 9789864760466

Language: English , Chinese

Pages: 1 online resource (288 pages)

Edition: 1st edition

Keywords: Electronic books ; local

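Abstract: Data in every domain is getting bigger. How can you process it efficiently? This book introduces Apache Spark, an open source cluster computing system that makes data analysis fast to write and fast to run. With Spark, you can process large datasets quickly through simple APIs in Python, Java, or Scala. Written by the developers of Spark, this book lets data scientists and engineers put its material to work right away. Readers will learn how to express a parallel job in just a few lines of code. The book covers applications ranging from basic batch jobs to stream processing and machine learning. Quickly get to know Spark's capabilities, such as distributed datasets, in-memory caching, and the interactive shell. Use Spark's powerful built-in libraries, including Spark SQL, Spark Streaming, and MLlib. Use a single Spark programming framework in place of a mix of tools such as Hive, Hadoop, Mahout, and Storm. Learn to build interactive, batch, and streaming applications with Spark. Connect to many data sources, including HDFS, Hive, JSON, and S3. Learn advanced topics such as data partitioning and shared variables. "This book is number one on my list of recommended guides to big data processing applications." --Ben Lorica, Chief Data Scientist, O'Reilly Media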

Note: Online resource; Title from title page (viewed September 1, 2016) , Mode of access: World Wide Web.

URL: license required

MPI Ethno. Forsch.

7

Online Resource

初めてのSpark (2015)

Karau, Holden [Author] ; Konwinski, Andy [Author] ; Wendell, Patrick [Author] ; [et al.]

[Place of publication not identified] : O'Reilly Japan, Inc. | Boston, MA : Safari

Details

ISBN: 9784873117348

Language: English , Japanese

Pages: 1 online resource (312 pages)

Edition: 1st edition

Keywords: Electronic books ; local

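Abstract: A comprehensive introduction to Spark, covering everything from the basics (an overview of Spark, programming with RDDs, and working with key/value pairs) to advanced Spark programming and full-scale use on a cluster. The Japanese edition also incorporates the enhancements in versions 1.3 and 1.4 and includes "Changes since publication of the original edition" by 土橋昌, "Supplement to the main text on Spark SQL" by 猿田浩輔, and "A comparative evaluation of the Spark and MapReduce machine learning libraries" by 堀越保徳 and 濱口智大. The entire book was technically reviewed by the engineering team at Cloudera K.K.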

Note: Online resource; Title from title page (viewed August 21, 2015) , Mode of access: World Wide Web.

URL: license required

MPI Ethno. Forsch.

8

Online Resource

Scaling Python with Ray (2023)

Karau, Holden [Author] ; Lublinsky, Boris [Author]

[Place of publication not identified] : O'Reilly Media, Inc. | Boston, MA : Safari

Details

Language: English

Pages: 1 online resource (38 pages)

Edition: 1st edition

Keywords: Electronic books

Abstract: Serverless computing enables developers to concentrate solely on their applications rather than worry about where they've been deployed. With the Ray general-purpose serverless implementation in Python, programmers and data scientists can hide servers, implement stateful applications, support direct communication between tasks, and access hardware accelerators. In this book, authors Holden Karau and Boris Lublinsky show you how to scale existing Python applications and pipelines, allowing you to stay in the Python ecosystem while avoiding single points of failure and manual scheduling. If your data processing has grown beyond what a single computer can handle, this book is for you. Written by experienced software architecture practitioners, Scaling Python with Ray is ideal for software architects and developers eager to explore successful case studies and learn more about decision and measurement effectiveness. This book covers distributed processing (the pure Python implementation of serverless) and shows you how to: implement stateful applications with Ray actors; build workflow management in Ray; use Ray as a unified platform for batch and streaming; implement advanced data processing with Ray; apply microservices with the Ray platform; and implement reliable Ray applications.

Note: Online resource; Title from title page (viewed June 25, 2023) , Mode of access: World Wide Web.

URL: license required

MPI Ethno. Forsch.

9

Online Resource

High performance Spark : best practices for scaling and optimizing Apache Spark (2017)

Karau, Holden [Author] ; Warren, Rachel [Contributor]

Sebastopol, CA : O'Reilly Media

Details

Language: English

Pages: 1 online resource (1 volume) , illustrations

Edition: First edition.

Keywords: Spark (Electronic resource : Apache Software Foundation) ; Big data ; Data mining ; Computer programs ; Electronic books ; local

Abstract: Apache Spark is amazing when everything clicks. But if you haven't seen the performance improvements you expected, or still don't feel confident enough to use Spark in production, this practical book is for you. Authors Holden Karau and Rachel Warren demonstrate performance optimizations to help your Spark queries run faster and handle larger data sizes, while using fewer resources. Ideal for software engineers, data engineers, developers, and system administrators working with large-scale data applications, this book describes techniques that can reduce data infrastructure costs and developer hours. Not only will you gain a more comprehensive understanding of Spark, you'll also learn how to make it sing. With this book, you'll explore: how Spark SQL's new interfaces improve performance over SQL's RDD data structure; the choice between data joins in Core Spark and Spark SQL; techniques for getting the most out of standard RDD transformations; how to work around performance issues in Spark's key/value pair paradigm; writing high-performance Spark code without Scala or the JVM; how to test for functionality and performance when applying suggested improvements; using Spark MLlib and Spark ML machine learning libraries; and Spark's Streaming components and external community packages.

Note: Includes index. - Description based on online resource; title from title page (Safari, viewed June 12, 2017)

URL: https://learning.oreilly.com/library/view/-/9781491943199/?ar

MPI Ethno. Forsch.

10

Online Resource

Gao xing neng Spark = High performance Spark (2022)

Karau, Holden [Author] ; Warren, Rachel [Author] ; Xia, Rui [Translator]

Beijing : Zhongguo dian li chu ban she = China Electric Power Press Ltd.

Details

Edition (original script): 第一版.

Title: 高性能 Spark = High performance Spark

Publisher: 北京 : 中国电力出版社 = China Electric Power Press Ltd.

ISBN: 9787519863531 , 7519863530

Language: Chinese

Pages: 1 online resource (371 pages) , illustrations

Edition: First edition (Di yi ban).

Uniform Title: High performance Spark

DDC: 006.3/12

Keywords: Spark (Electronic resource : Apache Software Foundation) ; Big data ; Data mining ; Computer programs ; Données volumineuses ; Exploration de données (Informatique) ; Logiciels

Abstract: Detailed summary in vernacular field.

Note: Published under authorization from O'Reilly Media, Inc.

URL: license required

MPI Ethno. Forsch.