Booking options
Price on Enquiry
Delivered Online
Duration: 4 days
All levels
24 CPD hours
This course is intended for
This course is best suited to developers, engineers, and architects who want to use Hadoop and related tools to solve real-world problems.
Overview
Skills learned in this course include:
Creating a data set with the Kite SDK
Developing custom Flume components for data ingestion
Managing a multi-stage workflow with Oozie
Analyzing data with Crunch
Writing user-defined functions for Hive and Impala
Indexing data with Cloudera Search
Cloudera University's four-day course for designing and building Big Data applications prepares you to analyze and solve real-world problems using Apache Hadoop and associated tools in the enterprise data hub (EDH).
Introduction
Application Architecture
Scenario Explanation
Understanding the Development Environment
Identifying and Collecting Input Data
Selecting Tools for Data Processing and Analysis
Presenting Results to the User
Defining & Using Datasets
Metadata Management
What is Apache Avro?
Avro Schemas
Avro Schema Evolution
Selecting a File Format
Performance Considerations
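To make the Avro topics above concrete, here is a minimal Java sketch that parses an inline schema and builds a record against it; the Order schema and its fields are invented for illustration.

```java
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;

public class AvroSchemaExample {
    // A hypothetical "Order" schema defined inline; real projects usually keep .avsc files.
    private static final String SCHEMA_JSON =
        "{\"type\":\"record\",\"name\":\"Order\",\"fields\":["
      + "{\"name\":\"id\",\"type\":\"long\"},"
      + "{\"name\":\"customer\",\"type\":\"string\"},"
      + "{\"name\":\"total\",\"type\":\"double\",\"default\":0.0}]}";

    public static void main(String[] args) {
        Schema schema = new Schema.Parser().parse(SCHEMA_JSON);

        // Build a record that conforms to the schema.
        GenericRecord order = new GenericData.Record(schema);
        order.put("id", 1001L);
        order.put("customer", "acme");
        order.put("total", 42.50);

        System.out.println(order);
    }
}
```

The default on total is the kind of detail that matters for schema evolution: a reader whose schema includes the field can still consume older data written before it existed.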
Using the Kite SDK Data Module
What is the Kite SDK?
Fundamental Data Module Concepts
Creating New Data Sets Using the Kite SDK
Loading, Accessing, and Deleting a Data Set
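A rough sketch of the Kite SDK data module workflow described above, assuming the kite-data libraries are on the classpath; the dataset URI and schema are placeholders, and the exact builder methods can vary between Kite releases.

```java
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.kitesdk.data.Dataset;
import org.kitesdk.data.DatasetDescriptor;
import org.kitesdk.data.DatasetWriter;
import org.kitesdk.data.Datasets;

public class KiteDatasetExample {
    public static void main(String[] args) {
        Schema schema = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"Order\",\"fields\":["
          + "{\"name\":\"id\",\"type\":\"long\"},"
          + "{\"name\":\"customer\",\"type\":\"string\"}]}");

        // Describe the data set: its schema (and optionally format, partitioning, etc.).
        DatasetDescriptor descriptor = new DatasetDescriptor.Builder()
            .schema(schema)
            .build();

        // Create a new data set in HDFS, identified by a dataset URI.
        Dataset<GenericRecord> orders =
            Datasets.create("dataset:hdfs:/data/orders", descriptor);

        // Write one record, then close the writer.
        GenericRecord order = new GenericData.Record(schema);
        order.put("id", 1L);
        order.put("customer", "acme");

        DatasetWriter<GenericRecord> writer = orders.newWriter();
        try {
            writer.write(order);
        } finally {
            writer.close();
        }

        // Loading and deleting use the same URI.
        Dataset<GenericRecord> loaded = Datasets.load("dataset:hdfs:/data/orders");
        System.out.println(loaded.getDescriptor().getSchema());
        Datasets.delete("dataset:hdfs:/data/orders");
    }
}
```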
Importing Relational Data with Apache Sqoop
What is Apache Sqoop?
Basic Imports
Limiting Results
Improving Sqoop's Performance
Sqoop 2
Capturing Data with Apache Flume
What is Apache Flume?
Basic Flume Architecture
Flume Sources
Flume Sinks
Flume Configuration
Logging Application Events to Hadoop
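For the "Logging Application Events to Hadoop" topic, one common pattern is for the application to hand events to a Flume agent over Avro RPC. A minimal sketch, assuming a Flume agent with an Avro source is listening at the (hypothetical) host and port below:

```java
import java.nio.charset.StandardCharsets;
import org.apache.flume.Event;
import org.apache.flume.EventDeliveryException;
import org.apache.flume.api.RpcClient;
import org.apache.flume.api.RpcClientFactory;
import org.apache.flume.event.EventBuilder;

public class FlumeLogger {
    public static void main(String[] args) throws EventDeliveryException {
        // Connects to a Flume agent whose Avro source listens on this placeholder host/port.
        RpcClient client = RpcClientFactory.getDefaultInstance("flume-agent.example.com", 41414);
        try {
            Event event = EventBuilder.withBody("user=alice action=login", StandardCharsets.UTF_8);
            client.append(event);   // delivered to the agent's channel, then on to its sink
        } finally {
            client.close();
        }
    }
}
```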
Developing Custom Flume Components
Flume Data Flow and Common Extension Points
Custom Flume Sources
Developing a Flume Pollable Source
Developing a Flume Event-Driven Source
Custom Flume Interceptors
Developing a Header-Modifying Flume Interceptor
Developing a Filtering Flume Interceptor
Writing Avro Objects with a Custom Flume Interceptor
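As an illustration of the header-modifying interceptor covered above, the sketch below stamps every event with an arrival-time header (similar in spirit to Flume's built-in timestamp interceptor); the class and header names are made up.

```java
import java.util.List;
import java.util.Map;
import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.interceptor.Interceptor;

// A header-modifying interceptor that stamps each event with its arrival time.
public class TimestampHeaderInterceptor implements Interceptor {

    @Override
    public void initialize() { }

    @Override
    public Event intercept(Event event) {
        Map<String, String> headers = event.getHeaders();
        headers.put("ingest-time", Long.toString(System.currentTimeMillis()));
        return event;
    }

    @Override
    public List<Event> intercept(List<Event> events) {
        for (Event event : events) {
            intercept(event);
        }
        return events;
    }

    @Override
    public void close() { }

    // Flume instantiates interceptors through a Builder named in the agent configuration.
    public static class Builder implements Interceptor.Builder {
        @Override
        public Interceptor build() {
            return new TimestampHeaderInterceptor();
        }

        @Override
        public void configure(Context context) { }
    }
}
```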
Managing Workflows with Apache Oozie
The Need for Workflow Management
What is Apache Oozie?
Defining an Oozie Workflow
Validation, Packaging, and Deployment
Running and Tracking Workflows Using the CLI
Hue UI for Oozie
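Workflows can also be submitted and tracked programmatically through Oozie's Java client, in addition to the CLI and Hue. A hedged sketch, with the server URL, application path, and properties as placeholders:

```java
import java.util.Properties;
import org.apache.oozie.client.OozieClient;
import org.apache.oozie.client.OozieClientException;
import org.apache.oozie.client.WorkflowJob;

public class OozieSubmit {
    public static void main(String[] args) throws OozieClientException, InterruptedException {
        // URL and HDFS path are placeholders for a real Oozie server and deployed workflow app.
        OozieClient oozie = new OozieClient("http://oozie.example.com:11000/oozie");

        Properties conf = oozie.createConfiguration();
        conf.setProperty(OozieClient.APP_PATH, "hdfs://namenode/user/dev/workflows/etl");
        conf.setProperty("nameNode", "hdfs://namenode:8020");
        conf.setProperty("jobTracker", "resourcemanager:8032");

        String jobId = oozie.run(conf);          // submit and start the workflow
        while (oozie.getJobInfo(jobId).getStatus() == WorkflowJob.Status.RUNNING) {
            Thread.sleep(10 * 1000);             // poll until it finishes
        }
        System.out.println("Workflow " + jobId + ": " + oozie.getJobInfo(jobId).getStatus());
    }
}
```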
Processing Data Pipelines with Apache Crunch
What is Apache Crunch?
Understanding the Crunch Pipeline
Comparing Crunch to Java MapReduce
Working with Crunch Projects
Reading and Writing Data in Crunch
Data Collection API Functions
Utility Classes in the Crunch API
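The classic way to see how a Crunch pipeline compares to raw Java MapReduce is a word count. The sketch below (paths are placeholders) reads text, applies a DoFn with parallelDo, and counts the resulting collection:

```java
import org.apache.crunch.DoFn;
import org.apache.crunch.Emitter;
import org.apache.crunch.PCollection;
import org.apache.crunch.PTable;
import org.apache.crunch.Pipeline;
import org.apache.crunch.impl.mr.MRPipeline;
import org.apache.crunch.types.writable.Writables;

public class CrunchWordCount {
    public static void main(String[] args) {
        Pipeline pipeline = new MRPipeline(CrunchWordCount.class);

        PCollection<String> lines = pipeline.readTextFile("/data/input");

        // parallelDo applies a DoFn to every element, much like a mapper.
        PCollection<String> words = lines.parallelDo(new DoFn<String, String>() {
            @Override
            public void process(String line, Emitter<String> emitter) {
                for (String word : line.split("\\s+")) {
                    emitter.emit(word);
                }
            }
        }, Writables.strings());

        PTable<String, Long> counts = words.count();  // group and count, like a reducer

        pipeline.writeTextFile(counts, "/data/wordcounts");
        pipeline.done();
    }
}
```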
Working with Tables in Apache Hive
What is Apache Hive?
Accessing Hive
Basic Query Syntax
Creating and Populating Hive Tables
How Hive Reads Data
Using the RegexSerDe in Hive
Developing User-Defined Functions
What are User-Defined Functions?
Implementing a User-Defined Function
Deploying Custom Libraries in Hive
Registering a User-Defined Function in Hive
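A simple Hive UDF of the kind developed in this module is a class extending UDF with one or more evaluate methods; the sketch below upper-cases a string, and the class name is illustrative.

```java
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// A simple Hive UDF; Impala can also use Java UDFs packaged this way.
public class ToUpperUDF extends UDF {
    public Text evaluate(Text input) {
        if (input == null) {
            return null;
        }
        return new Text(input.toString().toUpperCase());
    }
}
```

Once packaged as a JAR, it would typically be made available with ADD JAR and registered with CREATE TEMPORARY FUNCTION in Hive, while Impala registers Java UDFs with CREATE FUNCTION pointing at the JAR's location.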
Executing Interactive Queries with Impala
What is Impala?
Comparing Hive to Impala
Running Queries in Impala
Support for User-Defined Functions
Data and Metadata Management
Understanding Cloudera Search
What is Cloudera Search?
Search Architecture
Supported Document Formats
Indexing Data with Cloudera Search
Collection and Schema Management
Morphlines
Indexing Data in Batch Mode
Indexing Data in Near Real Time
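Batch and near-real-time indexing in Cloudera Search normally go through morphlines (for example via the MapReduceIndexerTool or a Flume Solr sink), but the underlying document model is plain Solr. As a rough illustration only, the SolrJ sketch below adds a single document to a hypothetical collection:

```java
import java.io.IOException;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class IndexDocument {
    public static void main(String[] args) throws SolrServerException, IOException {
        // Points at a placeholder collection served by Solr under Cloudera Search.
        HttpSolrServer solr = new HttpSolrServer("http://search-node.example.com:8983/solr/orders");

        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "order-1001");
        doc.addField("customer", "acme");
        doc.addField("total", 42.50);

        solr.add(doc);      // send the document to the collection
        solr.commit();      // make it searchable
        solr.shutdown();
    }
}
```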
Presenting Results to Users
Solr Query Syntax
Building a Search UI with Hue
Accessing Impala through JDBC
Powering a Custom Web Application with Impala and Search
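Because Impala speaks the HiveServer2 protocol, a custom web application can usually reach it with the Hive JDBC driver. A minimal sketch; the host, port, database, authentication settings, and table are assumptions that depend on the cluster:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class ImpalaJdbcQuery {
    public static void main(String[] args) throws SQLException {
        // Impala typically accepts HiveServer2-protocol JDBC connections on port 21050;
        // everything in this URL is a placeholder for real cluster settings.
        String url = "jdbc:hive2://impala-daemon.example.com:21050/default;auth=noSasl";

        try (Connection conn = DriverManager.getConnection(url);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(
                 "SELECT customer, SUM(total) FROM orders GROUP BY customer")) {
            while (rs.next()) {
                System.out.println(rs.getString(1) + "\t" + rs.getDouble(2));
            }
        }
    }
}
```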
Nexus Human, established over 20 years ago, stands as a pillar of excellence in the realm of IT and Business Skills Training and education in Ireland and the UK....