Duration
4 Days
24 CPD hours
This course is intended for
This course is designed for data analysts, business intelligence specialists, developers, system architects, and database administrators.
Overview
Skills gained in this training include:
The features that Pig, Hive, and Impala offer for data acquisition, storage, and analysis
The fundamentals of Apache Hadoop and data ETL (extract, transform, load), ingestion, and processing with Hadoop
How Pig, Hive, and Impala improve productivity for typical analysis tasks
Joining diverse datasets to gain valuable business insight
Performing real-time, complex queries on datasets
Cloudera University's four-day data analyst training course, focusing on Apache Pig, Apache Hive, and Cloudera Impala, will teach you to apply traditional data analytics and business intelligence skills to big data.
Hadoop Fundamentals
The Motivation for Hadoop
Hadoop Overview
Data Storage: HDFS
Distributed Data Processing: YARN, MapReduce, and Spark
Data Processing and Analysis: Pig, Hive, and Impala
Data Integration: Sqoop
Other Hadoop Data Tools
Exercise Scenarios Explanation
Introduction to Pig
What Is Pig?
Pig's Features
Pig Use Cases
Interacting with Pig
Basic Data Analysis with Pig
Pig Latin Syntax
Loading Data
Simple Data Types
Field Definitions
Data Output
Viewing the Schema
Filtering and Sorting Data
Commonly Used Functions
Processing Complex Data with Pig
Storage Formats
Complex/Nested Data Types
Grouping
Built-In Functions for Complex Data
Iterating Grouped Data
Multi-Dataset Operations with Pig
Techniques for Combining Data Sets
Joining Data Sets in Pig
Set Operations
Splitting Data Sets
Pig Troubleshooting & Optimization
Troubleshooting Pig
Logging
Using Hadoop's Web UI
Data Sampling and Debugging
Performance Overview
Understanding the Execution Plan
Tips for Improving the Performance of Your Pig Jobs
Introduction to Hive & Impala
What Is Hive?
What Is Impala?
Schema and Data Storage
Comparing Hive to Traditional Databases
Hive Use Cases
Querying with Hive & Impala
Databases and Tables
Basic Hive and Impala Query Language Syntax
Data Types
Differences Between Hive and Impala Query Syntax
Using Hue to Execute Queries
Using the Impala Shell
Data Management
Data Storage
Creating Databases and Tables
Loading Data
Altering Databases and Tables
Simplifying Queries with Views
Storing Query Results
Data Storage & Performance
Partitioning Tables
Choosing a File Format
Managing Metadata
Controlling Access to Data
Relational Data Analysis with Hive & Impala
Joining Datasets
Common Built-In Functions
Aggregation and Windowing
Working with Impala
How Impala Executes Queries
Extending Impala with User-Defined Functions
Improving Impala Performance
Analyzing Text and Complex Data with Hive
Complex Values in Hive
Using Regular Expressions in Hive
Sentiment Analysis and N-Grams
Conclusion
Hive Optimization
Understanding Query Performance
Controlling the Job Execution Plan
Bucketing
Indexing Data
Extending Hive
SerDes
Data Transformation with Custom Scripts
User-Defined Functions
Parameterized Queries
Choosing the Best Tool for the Job
Comparing MapReduce, Pig, Hive, Impala, and Relational Databases
Which to Choose?