£74.99
On-Demand course
8 hours 19 minutes
All levels
This course will help you explore the world of Big Data technologies and frameworks. You will develop the skills to pick the right Big Data technology and framework for your job and build the confidence to design robust Big Data pipelines.
Do you want a guide that will help you pick the right Big Data technology for your project? Or do you want a solid understanding of Big Data architecture and pipelines? This course will help you out. After outlining the course structure and learning objectives, the course takes you through the steps needed to set up the environment. Next, you will understand the Big Data logical architecture, study the evolution of Big Data technologies, and explore Big Data pipelines. Moving along, you will become familiar with ingestion frameworks such as Kafka, Flume, NiFi, and Sqoop. After that, you will learn about key storage frameworks such as HDFS, HBase, Kudu, and Cassandra. Finally, you will go through various data formats and explore key data processing and data analysis frameworks. By the end of this course, you will have a good understanding of Big Data architecture and technologies and will have developed the skills to build real-world Big Data pipelines. All the resources and support files for this course are available at https://github.com/PacktPublishing/Big-Data-for-Architects
Create a Google Cloud account and a Dataproc cluster
Understand the Big Data architecture and pipelines
Study factors to consider while comparing ingestion frameworks
Gain a solid understanding of storage frameworks
Distinguish between text and binary data formats
Find the key differences between the Spark, Tez, and Flink frameworks
Build a scalable Extract, Transform, Load (ETL) pipeline with Kafka Connect
If you are a software engineer who is looking to build Big Data pipelines or preparing for certifications such as CCA175 or CCA159, this video course is for you. A basic understanding of Big Data is needed to get started with this course.
With the help of simple explanations, whiteboard sessions, and interesting activities, this course will familiarize you with Big Data architecture and technologies and give you the confidence to design Big Data pipelines using modern frameworks.
Get a holistic picture of the Big Data ecosystem
Become an expert in choosing the right Big Data technology for your requirements
Get ready to build end-to-end Big Data batch and streaming pipelines
Bhavuk Chawla has over 16 years of experience in IT, including more than 8 years implementing Cloud, ML, AI, and Big Data science projects. He is an official instructor for Google, Confluent, and Cloudera. He has delivered, and continues to deliver, training sessions at companies including Google Singapore, Microsoft Bengaluru (Bangalore), Starbucks Coffee Seattle, Adobe India, EMEA Region, and more. He was recognized by Cloudera as Instructor of the Year 2016 (APAC) for the exceptionally high ratings he received across his training sessions.
1. Introduction
1. Course Structure and Approach: This video highlights the course structure and explains how to approach the course.
2. Course Pre-Requisites: This video focuses on the course prerequisites.
3. Course Audience: This video focuses on the course audience.
4. About the Author: This video introduces you to the author.
2. Setting Up the Environment
1. Setting up a Google Cloud Account: This video demonstrates how to set up a Google Cloud account.
2. Creating a Dataproc Cluster: This video explains how to create a Dataproc cluster.
3. Google Cloud Platform (GCP) Account Best Practices: This video covers best practices for using a GCP account.
3. Holistic View of Architectures and Pipelines
1. Big Data Logical Architecture: This video explains the Big Data logical architecture.
2. Evolution of Big Data Technologies: This video focuses on the evolution of Big Data technologies.
3. Key Big Data Architectures: This video explains key Big Data architectures.
4. Typical Big Data Batch Pipeline: This video introduces you to the typical Big Data batch pipeline.
5. Typical Big Data Streaming Pipeline: This video explains the typical Big Data streaming pipeline.
6. Example 01: Big Data Streaming Pipeline: This video presents an example of a Big Data streaming pipeline.
7. Example 02: Big Data Streaming Pipeline: This video presents another example of a Big Data streaming pipeline.
4. Key Ingestion/Dataflow Frameworks
1. Factors to Consider while Comparing Ingestion Frameworks: This video highlights the factors to consider while comparing ingestion frameworks.
2. Kafka Versus Flume: This video highlights the difference between Kafka and Flume.
3. NiFi Versus Kafka: This video explains the difference between NiFi and Kafka.
4. Sqoop Versus Flume: This video explains the difference between Sqoop and Flume.
5. Sqoop Versus Kafka Connect: This video highlights the difference between Sqoop and Kafka Connect.
6. Installing NiFi: This video demonstrates how to install NiFi.
7. Installing Kafka: This video explains how to install Kafka.
8. Hands-on Kafka and NiFi Integration Background: This video provides background on integrating Kafka and NiFi.
9. Integrating Kafka and NiFi: This video shows how to integrate Kafka and NiFi.
5. Key Storage Frameworks
1. Factors to Consider while Comparing Storage Frameworks: This video highlights the factors to consider while comparing storage frameworks.
2. Hadoop Distributed File System (HDFS) Versus HBase: This video highlights the difference between HDFS and HBase.
3. HBase Versus Kudu: This video explains the difference between HBase and Kudu.
4. Hadoop Distributed File System (HDFS) Versus Kudu: This video explains the difference between HDFS and Kudu.
5. HBase Versus Cassandra: This video highlights the difference between HBase and Cassandra.
6. Data Formats
1. Text Versus Binary: This video highlights the difference between text and binary formats.
2. Interoperability: This video focuses on interoperability.
3. Row-Oriented Versus Column-Oriented: This video explains the difference between row-oriented and column-oriented formats.
4. Splittable Formats: This video introduces you to splittable formats.
5. Schema Evolution: This video focuses on schema evolution.
6. Comparing Data Formats: This video compares data formats.
7. Installing Sqoop on Dataproc Cluster: This video demonstrates how to install Sqoop on a Dataproc cluster.
8. Hands-on Big Data Batch Pipeline Using the Avro Format: This video walks through building a Big Data batch pipeline using the Avro format.
7. Key Data Processing Frameworks
1. Factors to Consider while Comparing Processing Frameworks: This video highlights the factors to consider while comparing processing frameworks.
2. MapReduce (MR) Versus Spark Logical Architecture: This video highlights the difference between the MR and Spark logical architectures.
3. MapReduce (MR) Versus Spark Performance: This video compares the performance of MR and Spark.
4. Spark Versus Tez: This video explains the difference between Spark and Tez.
5. Spark Versus Flink: This video highlights the difference between Spark and Flink.
6. Kafka Streams Versus Spark Streaming: This video highlights the difference between Kafka Streams and Spark Streaming.
7. Spark 2.x Streaming Versus Spark 1.x Streaming: This video explains the difference between Spark 2.x streaming and Spark 1.x streaming.
8. Spark Core Versus Spark Structured Query Language (SQL): This video explains the difference between Spark Core and Spark SQL.
9. Integrating Kafka and Spark Streaming: This video demonstrates how to integrate Kafka and Spark Streaming.
8. Key Data Analysis Frameworks
1. Factors to Consider while Comparing Analysis Frameworks: This video highlights the factors to consider while comparing analysis frameworks.
2. Hive Versus Impala: This video highlights the difference between Hive and Impala.
3. Hive Versus Pig: This video explains the difference between Hive and Pig.
4. Hive Versus Spark Structured Query Language (SQL): This video explains the difference between Hive and Spark SQL.
5. Hive Versus Hive Live Long and Process (LLAP) Versus Impala: This video highlights the differences among Hive, Hive LLAP, and Impala.
6. Hive Versus KSQL: This video explains the difference between Hive and KSQL.
7. KSQL Versus KSQLDB: This video explains the difference between KSQL and KSQLDB.
8. Hands-On KSQL: This video explains how to work with KSQL.
9. Writing to a Stream and Table Using KSQL: This video demonstrates how to write to a stream and a table using KSQL.
10. Streaming Extract, Transform, Load (ETL) Pipeline Background: This video provides background on streaming ETL pipelines.
11. Building a Scalable Extract, Transform, Load (ETL) Pipeline with Kafka Connect - Part 1: This is the first part of a two-part video that shows how to build a scalable ETL pipeline with Kafka Connect.
12. Building a Scalable Extract, Transform, Load (ETL) Pipeline with Kafka Connect - Part 2: This is the second part of the two-part video that demonstrates how to build a scalable ETL pipeline with Kafka Connect.
9. Delta Lake
1. Delta Architecture: This video explains the Delta architecture in detail.
2. Why Delta Lake: This video explains why Delta Lake is important.
3. Challenges with Delta Lake: This video discusses the challenges with Delta Lake.
4. Delta Lake Demo: This video presents a demonstration of Delta Lake.
10. Additional Material
1. Solr Versus Elasticsearch: This video highlights the difference between Solr and Elasticsearch.
2. Cloudera Search Versus Solr: This video explains the difference between Cloudera Search and Solr.
3. Oozie Versus Airflow: This video explains the difference between Oozie and Airflow.
4. KSQL Versus KStreams: This video highlights the difference between KSQL and KStreams.
11. Summary
1. Conclusion: This video provides the course conclusion.