Booking options
£70.99
On-Demand course
5 hours 38 minutes
All levels
A complete course on Sqoop, Flume, and Hive: Ideal for achieving CCA175 and Hortonworks Spark Certification
In this course, you will start by learning about the Hadoop Distributed File System (HDFS) and the most common Hadoop commands required to work with HDFS. Next, you'll be introduced to Sqoop Import, which will help you gain insights into the lifecycle of the Sqoop command and how to use the import command to migrate data from MySQL to HDFS, and from MySQL to Hive. In addition to this, you will get up to speed with Sqoop Export for migrating data effectively, along with using Apache Flume to ingest data. As you progress, you will delve into Apache Hive, external and managed tables, and working with different file formats, including Parquet and Avro. Toward the concluding section, you will focus on Spark DataFrames and Spark SQL. By the end of this course, you will have gained comprehensive insights into big data ingestion and analytics with Flume, Sqoop, Hive, and Spark. All code and supporting files are available at https://github.com/PacktPublishing/Master-Big-Data-Ingestion-and-Analytics-with-Flume-Sqoop-Hive-and-Spark
Explore the Hadoop Distributed File System (HDFS) and commands
Get to grips with the lifecycle of the Sqoop command
Use the Sqoop Import command to migrate data from MySQL to HDFS and Hive
Understand split-by and boundary queries
Use the incremental mode to migrate data from MySQL to HDFS
Employ Sqoop Export to migrate data from HDFS to MySQL
Discover Spark DataFrames and gain insights into working with different file formats and compression
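To give a flavour of the Spark DataFrame material, here is a minimal Scala sketch of reading a CSV file and writing it back out as Snappy-compressed Parquet. The file paths and the local master setting are placeholders for illustration and are not taken from the course files.

    import org.apache.spark.sql.SparkSession

    object FormatDemo {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("format-demo")
          .master("local[*]")            // local run, for illustration only
          .getOrCreate()

        // Read a CSV file with a header row, inferring column types.
        val customers = spark.read
          .option("header", "true")
          .option("inferSchema", "true")
          .csv("customers.csv")          // placeholder input path

        // Write the same data as Parquet, compressed with Snappy.
        customers.write
          .option("compression", "snappy")
          .parquet("customers_parquet")  // placeholder output path

        spark.stop()
      }
    }

The same spark.read / DataFrame.write interface covers the JSON and Parquet cases discussed in the course; Avro support usually comes from the external spark-avro package.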
This course is for anyone who wants to learn Sqoop and Flume or those looking to achieve CCA and HDP certification.
A complete course packed with step-by-step instructions, working examples, and helpful advice. This course is systematically divided into small sections that will help you understand each part individually and learn at your own pace.
Learn Sqoop, Flume, and Hive and successfully achieve CCA175 and Hortonworks Spark Certification
Understand the Hadoop Distributed File System (HDFS), along with exploring Hadoop commands to work effectively with HDFS
Navdeep Kaur - Technical Trainer
Navdeep Kaur is a big data professional with 11 years of industry experience across different technologies and domains. She has a keen interest in providing training in new technologies. She holds the CCA175 Hadoop and Spark Developer certification and the AWS Solutions Architect certification. She loves guiding people and helping them achieve new goals.
1. Hadoop Introduction
1. HDFS and Hadoop Commands
2. Sqoop Import
1. Sqoop Introduction
2. Managing Target Directories
3. Working with Different File Formats
4. Working with Different Compressions
5. Conditional Imports
6. Split-by and Boundary Queries
7. Field Delimiters
8. Incremental Appends
9. Sqoop Hive Import
10. Sqoop List Tables/Databases
11. Sqoop Import Practice 1
12. Sqoop Import Practice 2
13. Sqoop Import Practice 3
3. Sqoop Export
1. Export from HDFS to MySQL
2. Export from Hive to MySQL
4. Apache Flume
1. Flume Introduction & Architecture
2. Exec Source and Logger Sink
3. Moving Data from Twitter to HDFS
4. Moving Data from NetCat to HDFS
5. Flume Interceptors
6. Flume Interceptor Example
7. Flume Multi-Agent Flow
8. Flume Consolidation
5. Apache Hive
1. Hive Introduction
2. Hive Database
3. Hive Managed Tables
4. Hive External Tables
5. Hive Inserts
6. Hive Analytics
7. Working with Parquet
8. Compressing Parquet
9. Working with Fixed File Format
10. Alter Command
11. Hive String Functions
12. Hive Date Functions
13. Hive Partitioning
14. Hive Bucketing
6. Spark Introduction
1. Spark Introduction
2. Resilient Distributed Datasets
3. Cluster Overview
4. Directed Acyclic Graph (DAG) & Stages
7. Spark Transformations & Actions
1. Map/FlatMap Transformation
2. Filter/Intersection
3. Union/Distinct Transformation
4. GroupByKey / Group People Based on Birthday Months
5. ReduceByKey / Total Number of Students in Each Subject
6. SortByKey / Sort Students Based on Their Roll Number
7. MapPartition / MapPartitionWithIndex
8. Change Number of Partitions
9. Join / Join Email Addresses Based on Customer Name
10. Spark Actions
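As a rough illustration of the transformations and actions listed in this section, the following Scala sketch applies groupByKey, reduceByKey, and sortByKey to small invented datasets; the names, subjects, and roll numbers are made up for this example and are not the course's own exercise data.

    import org.apache.spark.sql.SparkSession

    object TransformationsDemo {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("transformations-demo")
          .master("local[*]")
          .getOrCreate()
        val sc = spark.sparkContext

        // groupByKey: group people by their birthday month (invented data).
        val people = sc.parallelize(Seq(
          ("May", "Asha"), ("May", "Tom"), ("June", "Ravi"), ("July", "Mia")))
        val byMonth = people.groupByKey()

        // reduceByKey: total number of students in each subject.
        val subjects = sc.parallelize(Seq(("Maths", 1), ("Physics", 1), ("Maths", 1)))
        val perSubject = subjects.reduceByKey(_ + _)

        // sortByKey: sort students by roll number.
        val students = sc.parallelize(Seq((3, "Ravi"), (1, "Asha"), (2, "Tom")))
        val sorted = students.sortByKey()

        // Actions such as collect() trigger execution and return results to the driver.
        byMonth.collect().foreach { case (month, names) =>
          println(s"$month -> ${names.mkString(", ")}")
        }
        perSubject.collect().foreach(println)
        sorted.collect().foreach(println)

        spark.stop()
      }
    }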
8. Spark RDD Practice
1. Scala Tuples
2. Extract Error Logs from Log Files
3. Frequency of a Word in a Text File
4. Population of Each City
5. Orders Placed by Customers
6. Movies with Average Rating Greater Than 3
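The word-frequency exercise above is the classic RDD word count. A minimal sketch, assuming a placeholder input file called input.txt rather than the course's own data set, looks like this:

    import org.apache.spark.sql.SparkSession

    object WordCount {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("word-count")
          .master("local[*]")
          .getOrCreate()
        val sc = spark.sparkContext

        // Split each line into words, map every word to a count of 1,
        // then add up the counts per word.
        val counts = sc.textFile("input.txt")   // placeholder path
          .flatMap(_.split("\\s+"))
          .map(word => (word, 1))
          .reduceByKey(_ + _)

        counts.collect().foreach(println)
        spark.stop()
      }
    }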
9. Spark DataFrames & Spark SQL
1. DataFrame Introduction
2. DataFrame from JSON Files
3. DataFrame from Parquet Files
4. DataFrame from CSV Files
5. DataFrame from Avro/XML Files
6. Working with Different Compressions
7. DataFrame API Part 1
8. DataFrame API Part 2
9. Spark SQL
10. Working with Hive Tables in Spark
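To round off the DataFrame and Spark SQL topics, here is a hedged Scala sketch that reads a JSON file into a DataFrame, queries it with Spark SQL, and saves the result as a Hive table. The file path, the customer and amount columns, and the order_totals table name are invented for illustration, and enableHiveSupport assumes a Hive metastore is available.

    import org.apache.spark.sql.SparkSession

    object SqlDemo {
      def main(args: Array[String]): Unit = {
        // enableHiveSupport() lets Spark read and write Hive tables
        // when a Hive metastore is available (an assumption in this sketch).
        val spark = SparkSession.builder()
          .appName("sql-demo")
          .master("local[*]")
          .enableHiveSupport()
          .getOrCreate()

        // DataFrame from a JSON file (one JSON object per line).
        val orders = spark.read.json("orders.json")   // placeholder path

        // Register a temporary view and query it with Spark SQL.
        orders.createOrReplaceTempView("orders")
        val totals = spark.sql(
          "SELECT customer, SUM(amount) AS total FROM orders GROUP BY customer")
        totals.show()

        // Save the result as a Hive table (table name invented for this sketch).
        totals.write.mode("overwrite").saveAsTable("order_totals")

        spark.stop()
      }
    }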