Language:
English
Pages:
1 online resource (1 video file, approximately 58 min.)
Edition:
1st edition
DDC:
004.67/82
Keywords:
Amazon Web Services (Firm); Cloud computing; Electronic videos; Instructional films; Internet videos; Nonfiction films; Webcast
Abstract:
Traditional data warehouses have long driven business intelligence, but because they are built on relational databases and structured data formats, they struggle with the challenges of today's data-driven world, especially the pace of data growth. Enter data lakes, which are optimized for unstructured and semi-structured data, scale easily to petabytes, and integrate with a wide range of tools to help businesses get the most out of their data. A few important properties of data lakes are worth understanding: support for unstructured and semi-structured data; scalability to petabytes and beyond; a SQL-like interface for interacting with the stored data; the ability to connect various analytics tools as seamlessly as possible; and the combination of decoupled storage and analytics layers.
As data volumes have grown to new scales, the demands of businesses have become more ambitious: users now expect faster query times, better scalability, easier management, and so on. Earlier big data tools such as Hadoop, Hive, and HDFS have made way for newer technology platforms. Data and software professionals are now moving toward a disaggregated architecture, with the storage and analytics layers loosely coupled via REST APIs. This makes each layer much more independent (in terms of scaling and management) and allows choosing the right tool for each job. For example, in this disaggregated model, users can run Spark for batch analytics workloads and Presto for SQL-heavy workloads, with both engines reading from the same backend storage platform. This approach is rapidly becoming the standard. Commonly used storage platforms include object stores such as AWS S3, Azure Blob Storage, Google Cloud Storage (GCS), Ceph, and MinIO, among others. Analytics platforms range from simple Python- and R-based notebooks to TensorFlow, Spark, Presto, Splunk, Vertica, and others.
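The "SQL-like interface over semi-structured data" idea described above can be sketched in a few lines. This is not material from the video: it is a minimal, self-contained illustration using Python's standard-library sqlite3 module (with its built-in JSON functions) as a stand-in for a lake-side SQL engine such as Presto, and an in-memory table of JSON documents as a stand-in for objects in a store like S3. The record fields (`user`, `action`, `ms`) are invented for the example.

```python
import json
import sqlite3

# Semi-structured records, as they might land in a data lake as JSON objects.
events = [
    {"user": "alice", "action": "login", "ms": 12},
    {"user": "bob", "action": "query", "ms": 340},
    {"user": "alice", "action": "query", "ms": 95},
]

# In a disaggregated architecture, a SQL engine (e.g. Presto) would read these
# documents from object storage (e.g. AWS S3). Here an in-memory SQLite table
# of raw JSON strings stands in for that storage layer.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (doc TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?)",
    [(json.dumps(e),) for e in events],
)

# SQLite's json_extract gives a SQL-like view over the schemaless documents:
# aggregate average latency per user without ever defining a fixed schema.
rows = conn.execute(
    "SELECT json_extract(doc, '$.user') AS user, "
    "       AVG(json_extract(doc, '$.ms')) AS avg_ms "
    "FROM events GROUP BY user ORDER BY user"
).fetchall()
print(rows)  # [('alice', 53.5), ('bob', 340.0)]
```

Because the SQL layer only interprets the documents at query time, the same stored data could just as well be scanned by a batch engine, which is the decoupling the abstract describes.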
This case study explains how California State University applied DataOps techniques in their existing environments to create reusable, scalable, and extensible data architectures. We will discuss strategies for building a data lakehouse architecture to rapidly scale and deploy use cases in your current environments. This case study is for you if... You want to learn how to apply scalable data architecture techniques in a data processing environment. You're a data architect/system architect,...
Note:
Online resource; Title from title screen (viewed November 2, 2021), Mode of access: World Wide Web.