Evifa-Portal

1

Online Resource

Apache Spark graph processing : build, process, and analyze large-scale graphs with Spark (2015)

Ramamonjison, Rindra [Author] ; Lee, Denny [Contributor]

Birmingham, UK : Packt Publishing

Details

ISBN: 9781784398958 , 1784398950

Language: English

Pages: 1 online resource (1 volume) , illustrations.

Series Statement: Community experience distilled

Keywords: Spark (Electronic resource : Apache Software Foundation) ; Graphic methods ; Computer programs ; Electronic data processing ; Electronic books ; Electronic books ; local

Abstract: Build, process and analyze large-scale graph data effectively with Spark.
About This Book: Find solutions for every stage of data processing, from loading and transforming graph data to...; improve the scalability of your graphs with a variety of real-world applications with complete Scala code; a concise guide to processing large-scale networks with Apache Spark.
Who This Book Is For: This book is for data scientists and big data developers who want to learn to process and analyze graph datasets at scale. Basic programming experience with Scala and basic knowledge of Spark are assumed.
What You Will Learn: Write, build and deploy Spark applications with the Scala Build Tool; build and analyze large-scale network datasets; analyze and transform graphs using RDD and graph-specific operations; implement new custom graph operations tailored to specific needs; develop iterative and efficient graph algorithms using message aggregation and the Pregel abstraction; extract subgraphs and use them to discover common clusters; analyze graph data and solve various data science problems using real-world datasets.
In Detail: Apache Spark is the next standard open-source cluster-computing engine for processing big data. Many practical computing problems concern large graphs, like the Web graph and various social networks. The scale of these graphs - in some cases billions of vertices, trillions of edges - poses challenges to their efficient processing. The Apache Spark GraphX API combines the advantages of both data-parallel and graph-parallel systems by efficiently expressing graph computation within the Spark data-parallel framework. This book teaches graph programming in Apache Spark and explains the entire process of graph data analysis. You will journey through the creation of graphs, their uses, and their exploration and analysis, and finally cover the conversion of graph elements into graph structures.
This book begins with an introduction to the Spark system, its libraries and the Scala Build Tool. Using a hands-on approach, this book will quickly teach you how to install and leverage Spark interactively on the command line and in a standalone Scala program. Then, it presents all the methods for building Spark graphs using illustrative network datasets. Next, it will walk you through the process of exploring, visualizing and analyzing different network characteristics. This book will also teach you how to transform...

Note: Includes bibliographical references and index. - Description based on online resource; title from cover page (Safari, viewed September 28, 2015)

URL: https://learning.oreilly.com/library/view/-/9781784391805/?ar

Library	Location	Call Number	Volume/Issue/Year	Availability

Online Resource

MPI Ethno. Forsch.

2

Online Resource

PySpark cookbook : over 60 recipes for implementing big data processing and analytics using Apache Spark and Python (2018)

Lee, Denny [Author] ; Drabas, Tomasz [Contributor]

Birmingham, UK : Packt Publishing

Details

ISBN: 9781788834254 , 1788834259

Language: English

Pages: 1 online resource (1 volume) , illustrations

Keywords: Application software ; Development ; Python (Computer program language) ; SPARK (Computer program language) ; Electronic books ; Electronic books ; local

Abstract: Combine the power of Apache Spark and Python to build effective big data applications.
About This Book: Perform effective data processing, machine learning, and analytics using PySpark; overcome challenges in developing and deploying Spark solutions using Python; explore recipes for efficiently combining Python and Apache Spark to process data.
Who This Book Is For: The PySpark Cookbook is for you if you are a Python developer looking for hands-on recipes for using the Apache Spark 2.x ecosystem in the best possible way. A thorough understanding of Python (and some familiarity with Spark) will help you get the best out of the book.
What You Will Learn: Configure a local instance of PySpark in a virtual environment; install and configure Jupyter in local and multi-node environments; create DataFrames from JSON and a dictionary using pyspark.sql; explore regression and clustering models available in the ML module; use DataFrames to transform data used for modeling; connect to PubNub and perform aggregations on streams.
In Detail: Apache Spark is an open source framework for efficient cluster computing with a strong interface for data parallelism and fault tolerance. The PySpark Cookbook presents effective and time-saving recipes for leveraging the power of Python and putting it to use in the Spark ecosystem. You'll start by learning the Apache Spark architecture and how to set up a Python environment for Spark. You'll then get familiar with the modules available in PySpark and start using them effortlessly. In addition to this, you'll discover how to abstract data with RDDs and DataFrames, and understand the streaming capabilities of PySpark. You'll then move on to using ML and MLlib in order to solve any problems related to the machine learning capabilities of PySpark and use GraphFrames to solve graph-processing problems. Finally, you will explore how to deploy your applications to the cloud using the spark-submit command. By the end of this book, you will be able to use the Python API for Apache Spark to solve any problems associated with building data-intensive applications.
Style and approach: This book is a rich collection of recipes that will come in handy when you are working with PySpark. Addressing your common and not-so-common pain points, this is a book that you must have on the shelf.

Note: Description based on online resource; title from title page (Safari, viewed July 30, 2018)

URL: https://learning.oreilly.com/library/view/-/9781788835367/?ar

Library	Location	Call Number	Volume/Issue/Year	Availability

Online Resource

MPI Ethno. Forsch.

3

Online Resource

Introduction to Apache Spark 2.0 : a primer on Spark 2.0 fundamentals and architecture (2017)

Lee, Denny [Author]

[Place of publication not identified] : O'Reilly Media

Details

Language: English

Pages: 1 online resource (1 streaming video file (53 min., 27 sec.)) , digital, sound, color

Keywords: Spark (Electronic resource : Apache Software Foundation) ; Machine learning ; Data mining ; Electronic videos ; local

Abstract: "This video series highlights what's new in Apache 2.0 and reviews its core concepts. The course starts with a high-level overview of Spark's components and then dives into Spark 2.0's three main themes: simplicity, speed, and intelligence. The simplicity section describes how Spark 2.0 unifies the Spark APIs and Spark session, and how Spark 2.0 simplifies machine learning via ML pipelines. The speed section illustrates how Spark 2.0 improves Spark performance with the push toward whole-stage code generation. And the intelligence section provides a quick primer on Spark Streaming and an introduction to the concepts of Structured Streaming. The course is designed for data scientists and data engineers with some basic experience using machine learning tools such as Python scikit-learn."--Resource description page.

Note: Title from title screen (viewed July 26, 2017). - Date of publication from resource description page

URL: license required

Library	Location	Call Number	Volume/Issue/Year	Availability

Online Resource

MPI Ethno. Forsch.

4

Online Resource

Learning PySpark : build data-intensive applications locally and deploy at scale using the combined powers of Python and Spark 2.0 (2017)

Drabas, Tomasz [Author] ; Lee, Denny [Contributor] ; Karau, Holden [Contributor]

Birmingham, UK : Packt Publishing

Details

ISBN: 9781786466259 , 1786466252

Language: English

Pages: 1 online resource (1 volume) , illustrations, maps

Keywords: Application software ; Development ; Python (Computer program language) ; SPARK (Computer program language) ; Electronic books ; Electronic books ; local

Abstract: Build data-intensive applications locally and deploy at scale using the combined powers of Python and Spark 2.0.
About This Book: Learn why and how you can efficiently use Python to process data and build machine learning models in Apache Spark 2.0; develop and deploy efficient, scalable real-time Spark solutions; take your understanding of using Spark with Python to the next level with this jump start guide.
Who This Book Is For: If you are a Python developer who wants to learn about the Apache Spark 2.0 ecosystem, this book is for you. A firm understanding of Python is expected to get the best out of the book. Familiarity with Spark would be useful, but is not mandatory.
What You Will Learn: Learn about Apache Spark and the Spark 2.0 architecture; build and interact with Spark DataFrames using Spark SQL; learn how to solve graph and deep learning problems using GraphFrames and TensorFrames respectively; read, transform, and understand data and use it to train machine learning models; build machine learning models with MLlib and ML; learn how to submit your applications programmatically using spark-submit; deploy locally built applications to a cluster.
In Detail: Apache Spark is an open source framework for efficient cluster computing with a strong interface for data parallelism and fault tolerance. This book will show you how to leverage the power of Python and put it to use in the Spark ecosystem. You will start by getting a firm understanding of the Spark 2.0 architecture and how to set up a Python environment for Spark. You will get familiar with the modules available in PySpark. You will learn how to abstract data with RDDs and DataFrames and understand the streaming capabilities of PySpark. Also, you will get a thorough overview of machine learning capabilities of PySpark using ML and MLlib, graph processing using GraphFrames, and polyglot persistence using Blaze. Finally, you will learn how to deploy your applications to the cloud using the spark-submit command. By the end of this book, you will have established a firm understanding of the Spark Python API and how it can be used to build data-intensive applications.
Style and approach: This book takes a very comprehensive, step-by-step approach so you understand how the Spark ecosystem can be used with Python to develop efficient, scalable solutions. Every chapter is standalone and written in a very easy-to-understand manner, with a focus on both the hows and the whys of each concept. Downloading the...

Note: Includes index. - Description based on online resource; title from title page (viewed March 17, 2017)

URL: https://learning.oreilly.com/library/view/-/9781786463708/?ar

Library	Location	Call Number	Volume/Issue/Year	Availability

Online Resource

MPI Ethno. Forsch.

5

Online Resource

Learning Spark, 2nd Edition (2020)

Damji, Jules [Author] ; Lee, Denny [Contributor] ; Wenig, Brooke [Contributor] ; [et al.]

[Place of publication not identified] : O'Reilly Media, Inc. | Boston, MA : Safari

Details

Language: English

Pages: 1 online resource (300 pages)

Edition: 2nd edition

Keywords: Electronic books ; local

Abstract: Data is getting bigger, arriving faster, and coming in varied formats—and it all needs to be processed at scale for analytics or machine learning. How can you process such varied data workloads efficiently? Enter Apache Spark. Updated to emphasize new features in Spark 2.x, this second edition shows data engineers and scientists why structure and unification in Spark matter. Specifically, this book explains how to perform simple and complex data analytics and employ machine-learning algorithms. Through discourse, code snippets, and notebooks, you’ll be able to: learn Python, SQL, Scala, or Java high-level APIs: DataFrames and Datasets; peek under the hood of the Spark SQL engine to understand Spark transformations and performance; inspect, tune, and debug your Spark operations with Spark configurations and Spark UI; connect to data sources: JSON, Parquet, CSV, Avro, ORC, Hive, S3, or Kafka; perform analytics on batch and streaming data using Structured Streaming; build reliable data pipelines with open source Delta Lake and Spark; develop machine learning pipelines with MLlib and productionize models using MLflow; use open source Pandas framework Koalas and Spark for data transformation and feature engineering.

Note: Online resource; Title from title page (viewed June 25, 2020)

URL: license required

Library	Location	Call Number	Volume/Issue/Year	Availability

Online Resource

MPI Ethno. Forsch.

6

Online Resource

Professional Microsoft® PowerPivot for Excel® and SharePoint® (2010)

Harinath, Sivakumar [Author] ; Pihlgren, Ron [Author] ; Lee, Denny [Author]

[Place of publication not identified] : Wrox | Boston, MA : Safari

Details

Language: English

Pages: 1 online resource (558 pages)

Edition: 1st edition

Keywords: Electronic books ; local

Abstract: With PowerPivot, Microsoft brings the power of Microsoft's business intelligence tools to Excel and SharePoint users. Self-service business intelligence today augments traditional BI methods, allowing faster response time and greater flexibility. If you're a business decision-maker who uses Microsoft Office or an IT professional responsible for deploying and managing your organization's business intelligence systems, this guide will help you make the most of PowerPivot. Professional Microsoft PowerPivot for Excel and SharePoint describes all aspects of PowerPivot and shows you how to use each of its major features. By the time you are finished with this book, you will be well on your way to becoming a PowerPivot expert. This book is for people who want to learn about PowerPivot from end to end. You should have some rudimentary knowledge of databases and data analysis. Familiarity with Microsoft Excel and Microsoft SharePoint is helpful, since PowerPivot builds on those two products. This book covers the first version of PowerPivot, which ships with SQL Server 2008 R2 and enhances Microsoft Office 2010. It provides an overview of PowerPivot and a detailed look at its two components: PowerPivot for Excel and PowerPivot for SharePoint. It explains the technologies that make up these two components, and gives some insight into why these components were implemented the way they were. Through an extended example, it shows how to build a PowerPivot application from end to end. The companion Web site includes all the sample applications and reports discussed.
What This Book Covers: After discussing self-service BI and the motivation for creating PowerPivot, the book presents a quick, end-to-end tutorial showing how to create and publish a simple PowerPivot application. It then drills into the features of PowerPivot for Excel in detail and, in the process, builds a more complex PowerPivot application based on a real-world case study. Finally, it discusses the server side of PowerPivot (PowerPivot for SharePoint) and provides detailed information about its installation and maintenance. Chapter 1, "Self-Service Business Intelligence and Microsoft PowerPivot," begins Part I of the book. This chapter describes self-service BI and introduces PowerPivot, Microsoft's first self-service BI tool. It provides a high-level look at the two components that make up PowerPivot - PowerPivot for Excel and PowerPivot for SharePoint. Chapter 2, "A First Look at PowerPivot, ...

Note: Online resource; Title from title page (viewed June 15, 2010) , Mode of access: World Wide Web.

URL: license required

Library	Location	Call Number	Volume/Issue/Year	Availability

Online Resource

MPI Ethno. Forsch.

7

Online Resource

Spark kuai su da shu ju fen xi : (di 2 ban) = Learning Spark : second edition (2021)

Damji, Jules S. [Author] ; Wenig, Brooke [Author] ; Das, Tathagata [Author] ; [et al.]

Beijing : Ren min you dian chu ban she

Details

Original-script edition: 第1版.

Title: Spark快速大数据分析 : (第2版) = Learning Spark : second edition

Publisher: O'Reilly Media

ISBN: 9787115576019 , 7115576017

Language: Chinese

Pages: 1 online resource , illustrations.

Edition: Di 1 ban.

Series Statement: Tu ling cheng xu she ji cong shu

Uniform Title: Learning Spark

DDC: 006.3/12

Keywords: Spark (Electronic resource : Apache Software Foundation) ; Big data ; Data mining Computer programs ; Machine learning ; Electronic books

Abstract: Detailed summary in vernacular field.

URL: license required

Library	Location	Call Number	Volume/Issue/Year	Availability

Online Resource

MPI Ethno. Forsch.

8

Online Resource

Nyūmon PySpark : Python to Jupyter de katsuyōsuru Spark2 ekoshisutemu (2017)

Drabas, Tomasz [Author] ; Lee, Denny Guang-Yeu [Author] ; Tamagawa, Ryūji [Translator]

Tōkyō-to Shinjuku-ku : Orairī Japan

Details

Original-script edition: 初版.

Title: 入門PySpark : PythonとJupyterで活用するSpark2エコシステム

Publisher: オライリー・ジャパン

ISBN: 9784873118185 , 4873118182

Language: Japanese

Pages: 1 online resource (328 pages)

Edition: Shohan.

Uniform Title: Learning PySpark

DDC: 005.1

Keywords: Application software Development ; Python (Computer program language) ; SPARK (Computer program language) ; Application software ; Development ; Python (Computer program language) ; SPARK (Computer program language)

Note: Includes bibliographical references (page 295) and index. - In Japanese.

URL: license required

Library	Location	Call Number	Volume/Issue/Year	Availability

Online Resource

MPI Ethno. Forsch.