
19 Hadoop courses in Coventry delivered Live Online

Introduction to Hadoop Administration (TTDS6503)

By Nexus Human

Duration 3 Days 18 CPD hours This course is intended for This is an introductory-level course designed to teach experienced systems administrators how to install, maintain, monitor, troubleshoot, optimize, and secure Hadoop. Previous Hadoop experience is not required. Overview Working in an engaging, hands-on learning environment, guided by our expert team, attendees will learn to: Understand the benefits of distributed computing Understand the Hadoop architecture (including HDFS and MapReduce) Define administrator participation in Big Data projects Plan, implement, and maintain Hadoop clusters Deploy and maintain additional Big Data tools (Pig, Hive, Flume, etc.) Plan, deploy and maintain HBase on a Hadoop cluster Monitor and maintain hundreds of servers Pinpoint performance bottlenecks and fix them Apache Hadoop is an open source framework for creating reliable and distributable compute clusters. Hadoop provides an excellent platform (with other related frameworks) to process large unstructured or semi-structured data sets from multiple sources to dissect, classify, learn from and make suggestions for business analytics, decision support, and other advanced forms of machine intelligence. This is an introductory-level, hands-on lab-intensive course geared for the administrator (new to Hadoop) who is charged with maintaining a Hadoop cluster and its related components. You will learn how to install, maintain, monitor, troubleshoot, optimize, and secure Hadoop. 
Introduction Hadoop history and concepts Ecosystem Distributions High level architecture Hadoop myths Hadoop challenges (hardware / software) Planning and installation Selecting software and Hadoop distributions Sizing the cluster and planning for growth Selecting hardware and network Rack topology Installation Multi-tenancy Directory structure and logs Benchmarking HDFS operations Concepts (horizontal scaling, replication, data locality, rack awareness) Nodes and daemons (NameNode, Secondary NameNode, HA Standby NameNode, DataNode) Health monitoring Command-line and browser-based administration Adding storage and replacing defective drives MapReduce operations Parallel computing before MapReduce: compare HPC versus Hadoop administration MapReduce cluster loads Nodes and Daemons (JobTracker, TaskTracker) MapReduce UI walk through MapReduce configuration Job config Job schedulers Administrator view of MapReduce best practices Optimizing MapReduce Fool proofing MR: what to tell your programmers YARN: architecture and use Advanced topics Hardware monitoring System software monitoring Hadoop cluster monitoring Adding and removing servers and upgrading Hadoop Backup, recovery, and business continuity planning Cluster configuration tweaks Hardware maintenance schedule Oozie scheduling for administrators Securing your cluster with Kerberos The future of Hadoop

Introduction to Hadoop Administration (TTDS6503)
Delivered Online
Flexible Dates
Price on Enquiry

Cloudera Administrator Training for Apache Hadoop

By Nexus Human

Duration 4 Days 24 CPD hours This course is intended for This course is best suited to systems administrators and IT managers. Overview Skills gained in this training include: determining the correct hardware and infrastructure for your cluster; proper cluster configuration and deployment to integrate with the data center; configuring the FairScheduler to provide service-level agreements for multiple users of a cluster; best practices for preparing and maintaining Apache Hadoop in production; troubleshooting, diagnosing, tuning, and solving Hadoop issues. Cloudera University's four-day administrator training course for Apache Hadoop provides participants with a comprehensive understanding of all the steps necessary to operate and maintain a Hadoop cluster. The Case for Apache Hadoop Why Hadoop? Core Hadoop Components Fundamental Concepts HDFS HDFS Features Writing and Reading Files NameNode Memory Considerations Overview of HDFS Security Using the NameNode Web UI Using the Hadoop File Shell Getting Data into HDFS Ingesting Data from External Sources with Flume Ingesting Data from Relational Databases with Sqoop REST Interfaces Best Practices for Importing Data YARN & MapReduce What Is MapReduce? Basic MapReduce Concepts YARN Cluster Architecture Resource Allocation Failure Recovery Using the YARN Web UI MapReduce Version 1 Planning Your Hadoop Cluster General Planning Considerations Choosing the Right Hardware Network Considerations Configuring Nodes Planning for Cluster Management Hadoop Installation and Initial Configuration Deployment Types Installing Hadoop Specifying the Hadoop Configuration Performing Initial HDFS Configuration Performing Initial YARN and MapReduce Configuration Hadoop Logging Installing and Configuring Hive, Impala, and Pig Hive Impala Pig Hadoop Clients What is a Hadoop Client? 
Installing and Configuring Hadoop Clients Installing and Configuring Hue Hue Authentication and Authorization Cloudera Manager The Motivation for Cloudera Manager Cloudera Manager Features Express and Enterprise Versions Cloudera Manager Topology Installing Cloudera Manager Installing Hadoop Using Cloudera Manager Performing Basic Administration Tasks Using Cloudera Manager Advanced Cluster Configuration Advanced Configuration Parameters Configuring Hadoop Ports Explicitly Including and Excluding Hosts Configuring HDFS for Rack Awareness Configuring HDFS High Availability Hadoop Security Why Hadoop Security Is Important Hadoop's Security System Concepts What Kerberos Is and How it Works Securing a Hadoop Cluster with Kerberos Managing and Scheduling Jobs Managing Running Jobs Scheduling Hadoop Jobs Configuring the FairScheduler Impala Query Scheduling Cluster Maintenance Checking HDFS Status Copying Data Between Clusters Adding and Removing Cluster Nodes Rebalancing the Cluster Cluster Upgrading Cluster Monitoring & Troubleshooting General System Monitoring Monitoring Hadoop Clusters Common Troubleshooting Hadoop Clusters Common Misconfigurations Additional course details: Nexus Humans Cloudera Administrator Training for Apache Hadoop training program is a workshop that presents an invigorating mix of sessions, lessons, and masterclasses meticulously crafted to propel your learning expedition forward. This immersive bootcamp-style experience boasts interactive lectures, hands-on labs, and collaborative hackathons, all strategically designed to fortify fundamental concepts. Guided by seasoned coaches, each session offers priceless insights and practical skills crucial for honing your expertise. Whether you're stepping into the realm of professional skills or a seasoned professional, this comprehensive course ensures you're equipped with the knowledge and prowess necessary for success. 
While we feel this is the best course for the Cloudera Administrator Training for Apache Hadoop course and one of our Top 10 we encourage you to read the course outline to make sure it is the right content for you. Additionally, private sessions, closed classes or dedicated events are available both live online and at our training centres in Dublin and London, as well as at your offices anywhere in the UK, Ireland or across EMEA.

Cloudera Administrator Training for Apache Hadoop
Delivered Online
Flexible Dates
Price on Enquiry

Developer Training for Spark and Hadoop

By Nexus Human

Duration 4 Days 24 CPD hours This course is intended for Hadoop Developers Overview Through instructor-led discussion and interactive, hands-on exercises, participants will navigate the Hadoop ecosystem, learning topics such as: how data is distributed, stored, and processed in a Hadoop cluster; how to use Sqoop and Flume to ingest data; how to process distributed data with Apache Spark; how to model structured data as tables in Impala and Hive; how to choose the best data storage format for different data usage patterns; best practices for data storage. This training course is the best preparation for the challenges faced by Hadoop developers. Participants will learn to identify which tool is the right one to use in a given situation, and will gain hands-on experience in developing using those tools. Course Outline Introduction Introduction to Hadoop and the Hadoop Ecosystem Hadoop Architecture and HDFS Importing Relational Data with Apache Sqoop Introduction to Impala and Hive Modeling and Managing Data with Impala and Hive Data Formats Data Partitioning Capturing Data with Apache Flume Spark Basics Working with RDDs in Spark Writing and Deploying Spark Applications Parallel Programming with Spark Spark Caching and Persistence Common Patterns in Spark Data Processing Spark SQL and DataFrames Conclusion Additional course details: Nexus Humans Developer Training for Spark and Hadoop training program is a workshop that presents an invigorating mix of sessions, lessons, and masterclasses meticulously crafted to propel your learning expedition forward. This immersive bootcamp-style experience boasts interactive lectures, hands-on labs, and collaborative hackathons, all strategically designed to fortify fundamental concepts. Guided by seasoned coaches, each session offers priceless insights and practical skills crucial for honing your expertise. 
Whether you're stepping into the realm of professional skills or a seasoned professional, this comprehensive course ensures you're equipped with the knowledge and prowess necessary for success. While we feel this is the best course for the Developer Training for Spark and Hadoop course and one of our Top 10 we encourage you to read the course outline to make sure it is the right content for you. Additionally, private sessions, closed classes or dedicated events are available both live online and at our training centres in Dublin and London, as well as at your offices anywhere in the UK, Ireland or across EMEA.

Developer Training for Spark and Hadoop
Delivered Online
Flexible Dates
Price on Enquiry

Cloudera Essentials for Apache Hadoop

By Nexus Human

Duration 1 Day 6 CPD hours This course is intended for The course is appropriate for IT managers, architects or anyone who wants to understand the big picture of what Apache Hadoop brings to the enterprise. All levels of technology knowledge are welcome. In this course, students unveil Apache Hadoop, giving themselves a thorough understanding of what the technology is and how it would impact their organizations.

Cloudera Essentials for Apache Hadoop
Delivered Online
Flexible Dates
Price on Enquiry

Cloudera Data Analyst Training - Using Pig, Hive, and Impala with Hadoop

By Nexus Human

Duration 4 Days 24 CPD hours This course is intended for This course is designed for data analysts, business intelligence specialists, developers, system architects, and database administrators. Overview Skills gained in this training include: the features that Pig, Hive, and Impala offer for data acquisition, storage, and analysis; the fundamentals of Apache Hadoop and data ETL (extract, transform, load), ingestion, and processing with Hadoop; how Pig, Hive, and Impala improve productivity for typical analysis tasks; joining diverse datasets to gain valuable business insight; performing real-time, complex queries on datasets. Cloudera University's four-day data analyst training course focusing on Apache Pig and Hive and Cloudera Impala will teach you to apply traditional data analytics and business intelligence skills to big data. Hadoop Fundamentals The Motivation for Hadoop Hadoop Overview Data Storage: HDFS Distributed Data Processing: YARN, MapReduce, and Spark Data Processing and Analysis: Pig, Hive, and Impala Data Integration: Sqoop Other Hadoop Data Tools Exercise Scenarios Explanation Introduction to Pig What Is Pig? Pig's Features Pig Use Cases Interacting with Pig Basic Data Analysis with Pig Pig Latin Syntax Loading Data Simple Data Types Field Definitions Data Output Viewing the Schema Filtering and Sorting Data Commonly-Used Functions Processing Complex Data with Pig Storage Formats Complex/Nested Data Types Grouping Built-In Functions for Complex Data Iterating Grouped Data Multi-Dataset Operations with Pig Techniques for Combining Data Sets Joining Data Sets in Pig Set Operations Splitting Data Sets Pig Troubleshooting & Optimization Troubleshooting Pig Logging Using Hadoop's Web UI Data Sampling and Debugging Performance Overview Understanding the Execution Plan Tips for Improving the Performance of Your Pig Jobs Introduction to Hive & Impala What Is Hive? What Is Impala? 
Schema and Data Storage Comparing Hive to Traditional Databases Hive Use Cases Querying with Hive & Impala Databases and Tables Basic Hive and Impala Query Language Syntax Data Types Differences Between Hive and Impala Query Syntax Using Hue to Execute Queries Using the Impala Shell Data Management Data Storage Creating Databases and Tables Loading Data Altering Databases and Tables Simplifying Queries with Views Storing Query Results Data Storage & Performance Partitioning Tables Choosing a File Format Managing Metadata Controlling Access to Data Relational Data Analysis with Hive & Impala Joining Datasets Common Built-In Functions Aggregation and Windowing Working with Impala How Impala Executes Queries Extending Impala with User-Defined Functions Improving Impala Performance Analyzing Text and Complex Data with Hive Complex Values in Hive Using Regular Expressions in Hive Sentiment Analysis and N-Grams Conclusion Hive Optimization Understanding Query Performance Controlling Job Execution Plan Bucketing Indexing Data Extending Hive SerDes Data Transformation with Custom Scripts User-Defined Functions Parameterized Queries Choosing the Best Tool for the Job Comparing MapReduce, Pig, Hive, Impala, and Relational Databases Which to Choose?

Cloudera Data Analyst Training - Using Pig, Hive, and Impala with Hadoop
Delivered Online
Flexible Dates
Price on Enquiry

Cloudera Training for Apache HBase

By Nexus Human

Duration 4 Days 24 CPD hours This course is intended for This course is appropriate for developers and administrators who intend to use HBase. Overview Skills learned on the course include: the use cases and usage occasions for HBase, Hadoop, and RDBMS; using the HBase shell to directly manipulate HBase tables; designing optimal HBase schemas for efficient data storage and recovery; how to connect to HBase using the Java API, configure the HBase cluster, and administer an HBase cluster; best practices for identifying and resolving performance bottlenecks. Cloudera University's four-day training course for Apache HBase enables participants to store and access massive quantities of multi-structured data and perform hundreds of thousands of operations per second. Introduction to Hadoop & HBase What Is Big Data? Introducing Hadoop Hadoop Components What Is HBase? Why Use HBase? Strengths of HBase HBase in Production Weaknesses of HBase HBase Tables HBase Concepts HBase Table Fundamentals Thinking About Table Design The HBase Shell Creating Tables with the HBase Shell Working with Tables Working with Table Data HBase Architecture Fundamentals HBase Regions HBase Cluster Architecture HBase and HDFS Data Locality HBase Schema Design General Design Considerations Application-Centric Design Designing HBase Row Keys Other HBase Table Features Basic Data Access with the HBase API Options to Access HBase Data Creating and Deleting HBase Tables Retrieving Data with Get Retrieving Data with Scan Inserting and Updating Data Deleting Data More Advanced HBase API Features Filtering Scans Best Practices HBase Coprocessors HBase on the Cluster How HBase Uses HDFS Compactions and Splits HBase Reads & Writes How HBase Writes Data How HBase Reads Data Block Caches for Reading HBase Performance Tuning Column Family Considerations Schema Design Considerations Configuring for Caching Dealing with Time Series and Sequential Data Pre-Splitting Regions HBase Administration and Cluster Management HBase 
Daemons ZooKeeper Considerations HBase High Availability Using the HBase Balancer Fixing Tables with hbck HBase Security HBase Replication & Backup HBase Replication HBase Backup MapReduce and HBase Clusters Using Hive & Impala with HBase Using Hive and Impala with HBase Appendix A: Accessing Data with Python and Thrift Thrift Usage Working with Tables Getting and Putting Data Scanning Data Deleting Data Counters Filters Appendix B: OpenTSDB

Cloudera Training for Apache HBase
Delivered Online
Flexible Dates
Price on Enquiry

Designing and Building Big Data Applications

By Nexus Human

Duration 4 Days 24 CPD hours This course is intended for This course is best suited to developers, engineers, and architects who want to use Hadoop and related tools to solve real-world problems. Overview Skills learned in this course include: creating a data set with Kite SDK; developing custom Flume components for data ingestion; managing a multi-stage workflow with Oozie; analyzing data with Crunch; writing user-defined functions for Hive and Impala; indexing data with Cloudera Search. Cloudera University's four-day course for designing and building Big Data applications prepares you to analyze and solve real-world problems using Apache Hadoop and associated tools in the enterprise data hub (EDH). Introduction Application Architecture Scenario Explanation Understanding the Development Environment Identifying and Collecting Input Data Selecting Tools for Data Processing and Analysis Presenting Results to the User Defining & Using Datasets Metadata Management What is Apache Avro? Avro Schemas Avro Schema Evolution Selecting a File Format Performance Considerations Using the Kite SDK Data Module What is the Kite SDK? Fundamental Data Module Concepts Creating New Data Sets Using the Kite SDK Loading, Accessing, and Deleting a Data Set Importing Relational Data with Apache Sqoop What is Apache Sqoop? Basic Imports Limiting Results Improving Sqoop's Performance Sqoop 2 Capturing Data with Apache Flume What is Apache Flume? 
Basic Flume Architecture Flume Sources Flume Sinks Flume Configuration Logging Application Events to Hadoop Developing Custom Flume Components Flume Data Flow and Common Extension Points Custom Flume Sources Developing a Flume Pollable Source Developing a Flume Event-Driven Source Custom Flume Interceptors Developing a Header-Modifying Flume Interceptor Developing a Filtering Flume Interceptor Writing Avro Objects with a Custom Flume Interceptor Managing Workflows with Apache Oozie The Need for Workflow Management What is Apache Oozie? Defining an Oozie Workflow Validation, Packaging, and Deployment Running and Tracking Workflows Using the CLI Hue UI for Oozie Processing Data Pipelines with Apache Crunch What is Apache Crunch? Understanding the Crunch Pipeline Comparing Crunch to Java MapReduce Working with Crunch Projects Reading and Writing Data in Crunch Data Collection API Functions Utility Classes in the Crunch API Working with Tables in Apache Hive What is Apache Hive? Accessing Hive Basic Query Syntax Creating and Populating Hive Tables How Hive Reads Data Using the RegexSerDe in Hive Developing User-Defined Functions What are User-Defined Functions? Implementing a User-Defined Function Deploying Custom Libraries in Hive Registering a User-Defined Function in Hive Executing Interactive Queries with Impala What is Impala? Comparing Hive to Impala Running Queries in Impala Support for User-Defined Functions Data and Metadata Management Understanding Cloudera Search What is Cloudera Search? Search Architecture Supported Document Formats Indexing Data with Cloudera Search Collection and Schema Management Morphlines Indexing Data in Batch Mode Indexing Data in Near Real Time Presenting Results to Users Solr Query Syntax Building a Search UI with Hue Accessing Impala through JDBC Powering a Custom Web Application with Impala and Search

Designing and Building Big Data Applications
Delivered Online
Flexible Dates
Price on Enquiry

Cloudera Data Scientist Training

By Nexus Human

Duration 4 Days 24 CPD hours This course is intended for The workshop is designed for data scientists who currently use Python or R to work with smaller datasets on a single machine and who need to scale up their analyses and machine learning models to large datasets on distributed clusters. Data engineers and developers with some knowledge of data science and machine learning may also find this workshop useful. Overview Overview of data science and machine learning at scale Overview of the Hadoop ecosystem Working with HDFS data and Hive tables using Hue Introduction to Cloudera Data Science Workbench Overview of Apache Spark 2 Reading and writing data Inspecting data quality Cleansing and transforming data Summarizing and grouping data Combining, splitting, and reshaping data Exploring data Configuring, monitoring, and troubleshooting Spark applications Overview of machine learning in Spark MLlib Extracting, transforming, and selecting features Building and evaluating regression models Building and evaluating classification models Building and evaluating clustering models Cross-validating models and tuning hyperparameters Building machine learning pipelines Deploying machine learning models Spark, Spark SQL, and Spark MLlib PySpark and sparklyr Cloudera Data Science Workbench (CDSW) Hue This workshop covers data science and machine learning workflows at scale using Apache Spark 2 and other key components of the Hadoop ecosystem. The workshop emphasizes the use of data science and machine learning methods to address real-world business challenges. Using scenarios and datasets from a fictional technology company, students discover insights to support critical business decisions and develop data products to transform the business. The material is presented through a sequence of brief lectures, interactive demonstrations, extensive hands-on exercises, and discussions. 
The Apache Spark demonstrations and exercises are conducted in Python (with PySpark) and R (with sparklyr) using the Cloudera Data Science Workbench (CDSW) environment. Overview of data science and machine learning at scale; Overview of the Hadoop ecosystem; Working with HDFS data and Hive tables using Hue; Introduction to Cloudera Data Science Workbench; Overview of Apache Spark 2; Reading and writing data; Inspecting data quality; Cleansing and transforming data; Summarizing and grouping data; Combining, splitting, and reshaping data; Exploring data; Configuring, monitoring, and troubleshooting Spark applications; Overview of machine learning in Spark MLlib; Extracting, transforming, and selecting features; Building and evaluating regression models; Building and evaluating classification models; Building and evaluating clustering models; Cross-validating models and tuning hyperparameters; Building machine learning pipelines; Deploying machine learning models. Additional course details: Nexus Humans Cloudera Data Scientist Training training program is a workshop that presents an invigorating mix of sessions, lessons, and masterclasses meticulously crafted to propel your learning expedition forward. This immersive bootcamp-style experience boasts interactive lectures, hands-on labs, and collaborative hackathons, all strategically designed to fortify fundamental concepts. Guided by seasoned coaches, each session offers priceless insights and practical skills crucial for honing your expertise. 
Whether you're stepping into the realm of professional skills or a seasoned professional, this comprehensive course ensures you're equipped with the knowledge and prowess necessary for success. While we feel this is the best course for the Cloudera Data Scientist Training course and one of our Top 10 we encourage you to read the course outline to make sure it is the right content for you. Additionally, private sessions, closed classes or dedicated events are available both live online and at our training centres in Dublin and London, as well as at your offices anywhere in the UK, Ireland or across EMEA.

Cloudera Data Scientist Training
Delivered Online
Flexible Dates
Price on Enquiry

55321 SQL Server Integration Services

By Nexus Human

Duration 5 Days 30 CPD hours This course is intended for The primary audience for this course is database professionals who need to fulfil a Business Intelligence Developer role. They will need to focus on hands-on work creating BI solutions including Data Warehouse implementation, ETL, and data cleansing. Overview Create sophisticated SSIS packages for extracting, transforming, and loading data Use containers to efficiently control repetitive tasks and transactions Configure packages to dynamically adapt to environment changes Use Data Quality Services to cleanse data Successfully troubleshoot packages Create and Manage the SSIS Catalog Deploy, configure, and schedule packages Secure the SSIS Catalog SQL Server Integration Services is the Community Courseware version of 20767CC Implementing a SQL Data Warehouse. This five-day instructor-led course is intended for IT professionals who need to learn how to use SSIS to build, deploy, maintain, and secure Integration Services projects and packages, and to use SSIS to extract, transform, and load data to and from SQL Server. This course is similar to the retired Course 20767-C: Implementing a SQL Data Warehouse but focuses more on building packages, rather than the entire data warehouse design and implementation. Prerequisites Working knowledge of T-SQL and SQL Server Agent jobs is helpful, but not required. Basic knowledge of the Microsoft Windows operating system and its core functionality. Working knowledge of relational databases. Some experience with database design. 
1 - SSIS Overview Import/Export Wizard Exporting Data with the Wizard Common Import Concerns Quality Checking Imported/Exported Data 2 - Working with Solutions and Projects Working with SQL Server Data Tools Understanding Solutions and Projects Working with the Visual Studio Interface 3 - Basic Control Flow Working with Tasks Understanding Precedence Constraints Annotating Packages Grouping Tasks Package and Task Properties Connection Managers Favorite Tasks 4 - Common Tasks Analysis Services Processing Data Profiling Task Execute Package Task Execute Process Task Expression Task File System Task FTP Task Hadoop Task Script Task Introduction Send Mail Task Web Service Task XML Task 5 - Data Flow Sources and Destinations The Data Flow Task The Data Flow SSIS Toolbox Working with Data Sources SSIS Data Sources Working with Data Destinations SSIS Data Destinations 6 - Data Flow Transformations Transformations Configuring Transformations 7 - Making Packages Dynamic Features for Making Packages Dynamic Package Parameters Project Parameters Variables SQL Parameters Expressions in Tasks Expressions in Connection Managers After Deployment How It All Fits Together 8 - Containers Sequence Containers For Loop Containers Foreach Loop Containers 9 - Troubleshooting and Package Reliability Understanding MaximumErrorCount Breakpoints Redirecting Error Rows Logging Event Handlers Using Checkpoints Transactions 10 - Deploying to the SSIS Catalog The SSIS Catalog Deploying Projects Working with Environments Executing Packages in SSMS Executing Packages from the Command Line Deployment Model Differences 11 - Installing and Administering SSIS Installing SSIS Upgrading SSIS Managing the SSIS Catalog Viewing Built-in SSIS Reports Managing SSIS Logging and Operation Histories Automating Package Execution 12 - Securing the SSIS Catalog Principals Securables Grantable Permissions Granting Permissions Configuring Proxy Accounts Additional course details: Nexus Humans 55321 SQL Server 
Integration Services training program is a workshop that presents an invigorating mix of sessions, lessons, and masterclasses meticulously crafted to propel your learning expedition forward. This immersive bootcamp-style experience boasts interactive lectures, hands-on labs, and collaborative hackathons, all strategically designed to fortify fundamental concepts. Guided by seasoned coaches, each session offers priceless insights and practical skills crucial for honing your expertise. Whether you're stepping into the realm of professional skills or a seasoned professional, this comprehensive course ensures you're equipped with the knowledge and prowess necessary for success. While we feel this is the best course for the 55321 SQL Server Integration Services course and one of our Top 10 we encourage you to read the course outline to make sure it is the right content for you. Additionally, private sessions, closed classes or dedicated events are available both live online and at our training centres in Dublin and London, as well as at your offices anywhere in the UK, Ireland or across EMEA.

55321 SQL Server Integration Services
Delivered Online
Flexible Dates
£2,975

Building Batch Data Analytics Solutions on AWS

By Nexus Human

Duration 1 Day 6 CPD hours This course is intended for: data platform engineers; architects and operators who build and manage data analytics pipelines. Overview In this course, you will learn to: Compare the features and benefits of data warehouses, data lakes, and modern data architectures Design and implement a batch data analytics solution Identify and apply appropriate techniques, including compression, to optimize data storage Select and deploy appropriate options to ingest, transform, and store data Choose the appropriate instance and node types, clusters, auto scaling, and network topology for a particular business use case Understand how data storage and processing affect the analysis and visualization mechanisms needed to gain actionable business insights Secure data at rest and in transit Monitor analytics workloads to identify and remediate problems Apply cost management best practices In this course, you will learn to build batch data analytics solutions using Amazon EMR, an enterprise-grade Apache Spark and Apache Hadoop managed service. You will learn how Amazon EMR integrates with open-source projects such as Apache Hive, Hue, and HBase, and with AWS services such as AWS Glue and AWS Lake Formation. The course addresses data collection, ingestion, cataloging, storage, and processing components in the context of Spark and Hadoop. You will learn to use EMR Notebooks to support both analytics and machine learning workloads. You will also learn to apply security, performance, and cost management best practices to the operation of Amazon EMR. 
Module A: Overview of Data Analytics and the Data Pipeline Data analytics use cases Using the data pipeline for analytics Module 1: Introduction to Amazon EMR Using Amazon EMR in analytics solutions Amazon EMR cluster architecture Interactive Demo 1: Launching an Amazon EMR cluster Cost management strategies Module 2: Data Analytics Pipeline Using Amazon EMR: Ingestion and Storage Storage optimization with Amazon EMR Data ingestion techniques Module 3: High-Performance Batch Data Analytics Using Apache Spark on Amazon EMR Apache Spark on Amazon EMR use cases Why Apache Spark on Amazon EMR Spark concepts Interactive Demo 2: Connect to an EMR cluster and perform Scala commands using the Spark shell Transformation, processing, and analytics Using notebooks with Amazon EMR Practice Lab 1: Low-latency data analytics using Apache Spark on Amazon EMR Module 4: Processing and Analyzing Batch Data with Amazon EMR and Apache Hive Using Amazon EMR with Hive to process batch data Transformation, processing, and analytics Practice Lab 2: Batch data processing using Amazon EMR with Hive Introduction to Apache HBase on Amazon EMR Module 5: Serverless Data Processing Serverless data processing, transformation, and analytics Using AWS Glue with Amazon EMR workloads Practice Lab 3: Orchestrate data processing in Spark using AWS Step Functions Module 6: Security and Monitoring of Amazon EMR Clusters Securing EMR clusters Interactive Demo 3: Client-side encryption with EMRFS Monitoring and troubleshooting Amazon EMR clusters Demo: Reviewing Apache Spark cluster history Module 7: Designing Batch Data Analytics Solutions Batch data analytics use cases Activity: Designing a batch data analytics workflow Module B: Developing Modern Data Architectures on AWS Modern data architectures

Building Batch Data Analytics Solutions on AWS
Delivered Online
Flexible Dates
Price on Enquiry