Evifa-Portal

hits 1 - 3 | 3 hits

Online Resource

Apache Spark quick start guide : quickly learn the art of writing efficient big data applications with Apache Spark (2019)

Mehrotra, Shrey [Author] ; Grade, Akash [Contributor]

Birmingham, UK : Packt Publishing

Details

ISBN: 9781789342666 , 178934266X

Language: English

Pages: 1 online resource (1 volume) , illustrations

Keywords: Spark (Electronic resource : Apache Software Foundation) ; Electronic data processing ; Distributed processing ; Management ; Big data ; Machine learning ; Electronic books

Abstract: A practical guide to solving complex data processing challenges by applying the best optimization techniques in Apache Spark. Key Features: Learn about the core concepts and the latest developments in Apache Spark; master writing efficient big data applications with Spark's built-in modules for SQL, Streaming, Machine Learning and Graph analysis; get introduced to a variety of optimizations based on real-world experience. Book Description: Apache Spark is a flexible framework that allows processing of batch and real-time data. Its unified engine has made it quite popular for big data use cases. This book will help you get started with Apache Spark 2.0 and write big data applications for a variety of use cases. It will also introduce you to Apache Spark, one of the most popular big data processing frameworks. Although this book is intended to help you get started with Apache Spark, it also focuses on explaining the core concepts. This practical guide provides a quick start to the Spark 2.0 architecture and its components. It teaches you how to set up Spark on your local machine. As we move ahead, you will be introduced to resilient distributed datasets (RDDs) and DataFrame APIs, and their corresponding transformations and actions. Then, we move on to the life cycle of a Spark application and learn about the techniques used to debug slow-running applications. You will also go through Spark's built-in modules for SQL, streaming, machine learning, and graph analysis. Finally, the book will lay out the best practices and optimization techniques that are key for writing efficient Spark applications. By the end of this book, you will have a sound fundamental understanding of the Apache Spark framework and you will be able to write and optimize Spark applications.
What you will learn: Learn core concepts such as RDDs, DataFrames, transformations, and more; set up a Spark development environment; choose the right APIs for your applications; understand Spark's architecture and the execution flow of a Spark application; explore built-in modules for SQL, streaming, ML, and graph analysis; optimize your Spark job for better performance. Who this book is for: If you are a big data enthusiast and love processing huge amounts of data, this book is for you. If you are a data engineer looking for the best optimization techniques for your Spark applications, you will find this book helpful. This book also helps data scientists who want to implement their mach...

Note: Includes bibliographical references. - Description based on online resource; title from title page (Safari, viewed March 25, 2019)

URL: https://learning.oreilly.com/library/view/-/9781789349108/?ar

Library	Location	Call Number	Volume/Issue/Year	Availability

Online Resource

MPI Ethno. Forsch.

Online Resource

Learning YARN : moving beyond MapReduce--learn resource management and big data processing using YARN (2015)

Arora, Akhil [Author] ; Mehrotra, Shrey [Contributor]

Birmingham, UK : Packt Publishing

Details

ISBN: 9781784394585 , 1784394580

Language: English

Pages: 1 online resource (1 volume) , illustrations

Series Statement: Community experience distilled

Keywords: Apache Hadoop ; Electronic data processing ; Distributed processing ; Electronic books

Abstract: Moving beyond MapReduce: learn resource management and big data processing using YARN. About This Book: Deep dive into YARN components, schedulers, life cycle management and security architecture; create your own Hadoop-YARN applications and integrate big data technologies with YARN; follow a step-by-step guide to provision, manage, and monitor Hadoop-YARN clusters with ease. Who This Book Is For: This book is intended for those who want to understand what YARN is and how to use it efficiently for the resource management of large clusters. For cluster administrators, this book gives a detailed explanation of provisioning and managing YARN clusters. If you are a Java developer or an open source contributor, this book will help you drill down into the YARN architecture, write your own YARN applications and understand the application execution phases. This book will also help big data engineers explore YARN integration with real-time analytics technologies such as Spark and Storm. What You Will Learn: Explore YARN features and offerings; manage big data clusters efficiently using the YARN framework; create single-node as well as multi-node Hadoop-YARN clusters on Linux machines; understand YARN components and their administration; gain insights into application execution flow over a YARN cluster; write your own distributed application and execute it over a YARN cluster; work with schedulers and queues for efficient scheduling of applications; integrate big data projects like Spark and Storm with YARN. In Detail: Today, enterprises generate huge volumes of data. To provide effective services and make smarter, more intelligent decisions from these volumes of data, enterprises use big data analytics. In recent years, Hadoop has been used for massive data storage and efficient distributed processing of data.
The Yet Another Resource Negotiator (YARN) framework solves the resource management design problems faced by the Hadoop 1.x framework by providing a more scalable, efficient, flexible, and highly available resource management framework for distributed data processing. This book starts with an overview of the YARN features and explains how YARN provides a business solution for growing big data needs. You will learn to provision and manage single-node as well as multi-node Hadoop-YARN clusters in the simplest way. You will walk through YARN administration, life cycle management, application execution, REST APIs, schedulers, security framework and so o...

Note: Includes index. - Description based on online resource; title from cover (Safari, viewed September 20, 2015)

URL: https://learning.oreilly.com/library/view/-/9781784393960/?ar

Library	Location	Call Number	Volume/Issue/Year	Availability

Online Resource

MPI Ethno. Forsch.

Online Resource

Apache Hive cookbook : easy, hands-on recipes to help you understand Hive and its integration with frameworks that are used widely in today's big data world (2016)

Bansal, Hanish [Author] ; Mehrotra, Shrey [Contributor] ; Chauhan, Saurabh [Contributor]

Birmingham, UK : Packt Publishing

Details

ISBN: 9781782161097 , 1782161090

Language: English

Pages: 1 online resource (1 volume) , illustrations

Series Statement: Quick answers to common problems

Keywords: Apache (Computer file : Apache Group) ; Apache Hadoop ; Database management ; Electronic books

Abstract: Easy, hands-on recipes to help you understand Hive and its integration with frameworks that are used widely in today's big data world. About This Book: Grasp a complete reference of different Hive topics; get to know the latest recipes in development in Hive, including CRUD operations; understand Hive internals and the integration of Hive with different frameworks used today. Who This Book Is For: The book is intended for those who want to start with Hive or who have a basic understanding of the Hive framework. Prior knowledge of basic SQL commands is also required. What You Will Learn: Learn the different features and offerings of the latest Hive; understand the working and structure of the Hive internals; get insight into the latest developments in the Hive framework; grasp the concepts of the Hive data model; master key concepts like partitions, buckets and statistics; know how to integrate Hive with other frameworks such as Spark, Accumulo, and others. In Detail: Hive was developed by Facebook and later open-sourced through the Apache community. Hive provides an SQL-like interface to run queries on big data frameworks. Its SQL-like syntax, called HiveQL, includes SQL capabilities such as analytical functions, which are the need of the hour in today's big data world. This book provides easy installation steps with the different types of metastores supported by Hive. It has simple, easy-to-learn recipes for configuring Hive clients and services. You will also learn different Hive optimizations, including partitioning and bucketing. The book also explains the source code of the latest Hive version. HiveQL is also used by other frameworks, including Spark; towards the end, you will cover the integration of Hive with these frameworks. Style and approach: Starting with the basics and covering the core concepts with practical usage, this book is a complete guide to learning and exploring Hive's offerings.

Note: Includes index. - Description based on online resource; title from cover (viewed May 10, 2016)

URL: https://learning.oreilly.com/library/view/-/9781782161080/?ar

Library	Location	Call Number	Volume/Issue/Year	Availability

Online Resource

MPI Ethno. Forsch.