EDUCATIONAL SERVICES DATASHEET

DANA-262: Analyzing with Cloudera Data Warehouse


Cloudera Data Warehouse powered by Apache Hive and Apache Impala

About This Training

Course Type: Instructor-led training course
Level: Intermediate
Duration: 4 days
Platform: CDP Public Cloud
Topics Covered:
• Apache Hive
• Apache Impala

Course Overview

This four-day Analyzing with Data Warehouse course will teach you to apply traditional data analytics and business intelligence skills to big data. This course presents the tools data professionals need to access, manipulate, transform, and analyze complex data sets using SQL and familiar scripting languages.

What Skills You Will Gain

Through instructor-led discussion and interactive, hands-on exercises, participants will navigate the ecosystem, learning how to:

• Use Apache Hive and Apache Impala to access data through queries
• Identify distinctions between Hive and Impala, such as differences in syntax, data formats, and supported features
• Write and execute queries that use functions, aggregate functions, and subqueries
• Use joins and unions to combine datasets
• Create, modify, and delete tables, views, and databases
• Load data into tables and store query results
• Select file formats and develop partitioning schemes for better performance
• Use analytic and windowing functions to gain insight into their data
• Store and query complex or nested data structures
• Process and analyze semi-structured and unstructured data
• Optimize and extend the capabilities of Hive and Impala
• Determine whether Hive, Impala, an RDBMS, or a mix of these is the best choice for a given task
• Utilize the benefits of CDP Public Cloud Data Warehouse

Who Should Take This Course?

This course is designed for data analysts, business intelligence specialists, developers, system architects, and database administrators. Some knowledge of SQL is assumed, as is basic Linux command-line familiarity.

Training Outline

Foundations for Big Data Analytics
• Big Data Analytics Overview
• Data Storage: HDFS
• Distributed Data Processing: YARN, MapReduce, and Spark
• Data Processing and Analysis: Hive and Impala
• Database Integration: Sqoop
• Other Data Tools
• Exercise Scenario Explanation

Introduction to Apache Hive and Impala
• What Is Hive?
• What Is Impala?
• Why Use Hive and Impala?
• Schema and Data Storage
• Comparing Hive and Impala to Traditional Databases
• Use Cases

Querying with Apache Hive and Impala
• Databases and Tables
• Basic Hive and Impala Query Language Syntax
• Data Types
• Using Hue to Execute Queries
• Using Beeline (Hive's Shell)
• Using the Impala Shell

Common Operators and Built-In Functions
• Operators
• Scalar Functions
• Aggregate Functions

Data Management
• Data Storage
• Creating Databases and Tables
• Loading Data
• Altering Databases and Tables
• Simplifying Queries with Views
• Storing Query Results

Data Storage and Performance
• Partitioning Tables
• Loading Data into Partitioned Tables
• When to Use Partitioning
• Choosing a File Format
• Using Avro and Parquet File Formats

Working with Multiple Datasets
• UNION and Joins
• Handling NULL Values in Joins
• Advanced Joins

Analytic Functions and Windowing
• Using Analytic Functions
• Other Analytic Functions
• Sliding Windows

Complex Data
• Complex Data with Hive
• Complex Data with Impala

Analyzing Text
• Using Regular Expressions with Hive and Impala
• Processing Text Data with SerDes in Hive
• Sentiment Analysis and n-grams in Hive

Apache Hive Optimization
• Understanding Query Performance
• Cost-Based Optimization and Statistics
• Bucketing
• ORC File Optimizations

Apache Impala Optimization
• How Impala Executes Queries
• Improving Impala Performance

Extending Hive and Impala
• User-Defined Functions
• Parameterized Queries

Cloudera, Inc. 5470 Great America Parkway, Santa Clara, CA 95054 cloudera.com
© 2023 Cloudera, Inc. All rights reserved. Cloudera and the Cloudera logo are trademarks or registered trademarks of
Cloudera Inc. in the USA and other countries. All other trademarks are the property of their respective companies.
Information is subject to change without notice.

Choosing the Best Tool for the Job
• Comparing Hive, Impala, and Relational Databases
• Which to Choose?

CDP Public Cloud Data Warehouse
• Data Warehouse Overview
• Auto-Scaling
• Managing Virtual Warehouses
• Querying Data Using CLI and Third-Party Integration

Appendix: Apache Kudu
• What Is Kudu?
• Kudu Tables
• Using Impala with Kudu
