Evifa-Portal


hits 1 - 4 | 4 hits


Online Resource

Disrupting data discovery (2019)

Grover, Mark [Author] ; Feng, Tao [Contributor]

[Place of publication not identified] : O'Reilly Media, Inc. | Boston, MA : Safari


Details

Language: English

Pages: 1 online resource (1 video file, approximately 42 min.)

Edition: 1st edition

Keywords: Electronic videos ; local

Abstract: Before any analysis can begin, a data scientist needs to discover the right data sources to analyze, understand them, and determine whether they can be trusted. Unfortunately, data discovery is still very inefficient. Countless hours are lost trying to find the right data to use (the most common method remains asking a coworker). Gaining trust in data requires running numerous queries (max timestamp, counts per day, count distincts, etc.) that waste time and add unnecessary load on the databases. There's no clear way to find the people who can answer questions about a table. Worst of all, analyses are often redone and models rebuilt because previous work isn't discoverable. Lyft has reduced the time it takes to discover data tenfold by building its own data portal, Amundsen. Amundsen is built on three key pillars: an augmented data graph, an intuitive user experience, and centralized metadata. Under the hood, Amundsen uses a graph database to store relationships between various data assets (tables, dashboards, protobuf events, etc.). What's unique to Amundsen is that it treats people as a first-class data asset: there's a graph node for each person in the organization, connected to other nodes such as tables and dashboards. In addition, Amundsen runs PageRank over data from access logs to power search ranking, similar to how Google ranks web pages. Finally, Amundsen gathers metadata from various sources (Hive, Presto, Airflow, etc.) and exposes it in one central place. Where best to store all this metadata is still a work in progress. Mark Grover and Tao Feng (Lyft) demo Amundsen and lead a deep dive into its architecture, covering how it leverages centralized metadata, PageRank, and a comprehensive data graph to achieve its goal. They also explore the future roadmap, unsolved problems, and the project's collaboration model.
This session was recorded at the 2019 O'Reilly Strata Data Conference in San Francisco.

Note: Online resource; Title from title screen (viewed October 31, 2019)

URL: license required


Library	Location	Call Number	Volume/Issue/Year	Availability

MPI Ethno. Forsch.	Online Resource

Online Resource

Learning path : understanding tool integration for big data architecture (2016)

Grover, Mark [Contributor] ; Morrow, Rich [Contributor] ; Yahalom, David [Contributor] ; [et al.]

[Place of publication not identified] : O'Reilly Media


Details

ISBN: 9781491978641 , 1491978643

Language: English

Pages: 1 online resource (1 streaming video file (21 hr., 44 min., 26 sec.)) , digital, sound, color

Keywords: Internet videos ; Streaming video ; Apache Hadoop ; Spark (Electronic resource : Apache Software Foundation) ; Big data ; Electronic data processing ; Distributed processing ; Electronic videos ; local

Abstract: "Implementing solutions with Apache Hadoop requires understanding not just Hadoop, but a broad range of related projects within the Hadoop ecosystem. In this Learning Path, you'll discover how to integrate Hadoop components to implement big data solutions through a variety of end-to-end use case studies addressing clickstream analytics, time series problems, and data transfer between Hadoop and relational databases. You'll master techniques for integrating Spark with other key components of the Hadoop ecosystem, including HDFS, YARN, and MapReduce, and gain direct experience integrating Spark and Hadoop through a series of sixteen hands-on labs. In closing, you'll get a glimpse into the latest trends in data infrastructure, in a changing financial industry, through a series of talks from Strata + Hadoop World on next-gen finance."--Resource description page.

Note: Title and publication information from resource description page (Safari, viewed January 10, 2017)

URL: license required


Library	Location	Call Number	Volume/Issue/Year	Availability

MPI Ethno. Forsch.	Online Resource

Online Resource

Architectural considerations for Hadoop applications : using clickstream analytics as an end-to-end example (2015)

Grover, Mark [Author] ; Malaska, Ted [Contributor] ; Seidman, Jonathan [Contributor] ; [et al.]

[Place of publication not identified] : O'Reilly Media


Details

ISBN: 9781491923313

Language: English

Pages: 1 online resource (1 streaming video file (2 hr., 31 min., 32 sec.)) , digital, sound, color.

Keywords: Apache Hadoop ; Electronic data processing ; Distributed processing ; File organization (Computer science) ; Data mining ; Electronic videos ; local

Abstract: "Implementing solutions with Apache Hadoop requires understanding not just Hadoop, but a broad range of related projects in the Hadoop ecosystem such as Hive, Pig, Oozie, Sqoop, and Flume. The good news is that there's an abundance of materials (books, web sites, conferences, etc.) for gaining a deep understanding of Hadoop and these related projects. The bad news is there's still a scarcity of information on how to integrate these components to implement complete solutions. In this video we'll walk through an end-to-end case study of a clickstream analytics engine to provide a concrete example of how to architect and implement a complete solution with Hadoop."--Resource description page.

Note: Title from title screen (viewed May 21, 2015)

URL: license required


Library	Location	Call Number	Volume/Issue/Year	Availability

MPI Ethno. Forsch.	Online Resource

Online Resource

Hadoop application architectures : designing real world big data applications (2015)

Grover, Mark [Author]

Sebastopol, CA : O'Reilly


Details

Language: English

Pages: 1 online resource (1 volume) , illustrations

Edition: First edition.

Keywords: Apache Hadoop ; Computer architecture ; Big data ; Electronic books ; local

Abstract: Get expert guidance on architecting end-to-end data management solutions with Apache Hadoop. While many sources explain how to use various components in the Hadoop ecosystem, this practical book takes you through architectural considerations necessary to tie those components together into a complete tailored application, based on your particular use case.

Note: Includes index. - Description based on online resource; title from cover (Safari, viewed July 17, 2015)

URL: https://learning.oreilly.com/library/view/-/9781491910313/?ar


Library	Location	Call Number	Volume/Issue/Year	Availability

MPI Ethno. Forsch.	Online Resource
