• Professional Development
  • Medicine & Nursing
  • Arts & Crafts
  • Health & Wellbeing
  • Personal Development

120 Apache courses

Definitive Puppet for engineers

5.0(3)

By Systems & Network Training

Definitive Puppet training course description Puppet is a framework and toolset for configuration management. This course covers Puppet to enable delegates to manage configurations. Hands on sessions follow all the major sections. What will you learn Deploy Puppet. Manage configurations with Puppet. Build hosts with Puppet. Produce reports with Puppet. Definitive Puppet training course details Who will benefit: Anyone working with Puppet. Prerequisites: Linux fundamentals. Duration 2 days Definitive Puppet training course contents Getting started with Puppet What is Puppet, Selecting the right version of Puppet, Installing Puppet, Configuring Puppet. Developing and deploying Puppet The puppet apply command and modes of operation, Foreground Puppet master, Developing Puppet with Vagrant, Environments, Making changes to the development environment, Testing the new environments with the Puppet agent, Environment branching and merging, Dynamic Puppet environments with Git branches, Summary, Resources. Scaling Puppet Identifying the challenges, Running the Puppet master with Apache and Passenger, Testing the Puppet master in Apache, Load balancing multiple Puppet masters, Scaling further, Load balancing alternatives. Measuring performance, Splay time, Summary, Going further, Resources. Externalizing Puppet configuration External node classification, Storing node configuration in LDAP, Summary, Resources. Exporting and storing configuration Virtual resources, Getting started with exported and stored configurations, Using exported resources, Expiring state resources, Summary, Resources. Puppet consoles The foreman, Puppet enterprise console, Puppetboard, Summary, Resources. Tools and integration Puppet forge and the module tool, Searching and installing a module from the forge, Generating a module, Managing module dependencies, Testing the modules, Developing Puppet modules with Geppetto, Summary, Resources. Reporting with Puppet Getting started, Configuring reporting, Report processors, Custom reporting, Other Puppet reporters, Summary, Resources. Extending Facter and Puppet Writing and distributing custom facts, Developing custom types, providers and functions, Summary, Resources, Complex data structures, Additional backends, Hiera functions in depth, Module data bindings, Hiera examples. Jiera-2, Summary, Resources. Mcollective Installing and configuring Mcollective, testing, Mcollective plugins, accessing hosts with Metadata. Hiera Lists, initial Hiera configuration, Hiera command line utility, complex data structures, additional backends, Hiera functions in depth, module data bindings. Hiera-2.

Definitive Puppet for engineers
Delivered in Internationally or OnlineFlexible Dates
£1,727

Apache Spark 3 for Data Engineering and Analytics with Python

By Packt

This course primarily focuses on explaining the concepts of Python and PySpark. It will help you enhance your data analysis skills using structured Spark DataFrames APIs.

Apache Spark 3 for Data Engineering and Analytics with Python
Delivered Online On Demand8 hours 30 minutes
£41.99

Designing and Building Big Data Applications

By Nexus Human

Duration 4 Days 24 CPD hours This course is intended for This course is best suited to developers, engineers, and architects who want to use use Hadoop and related tools to solve real-world problems. Overview Skills learned in this course include:Creating a data set with Kite SDKDeveloping custom Flume components for data ingestionManaging a multi-stage workflow with OozieAnalyzing data with CrunchWriting user-defined functions for Hive and ImpalaWriting user-defined functions for Hive and ImpalaIndexing data with Cloudera Search Cloudera University?s four-day course for designing and building Big Data applications prepares you to analyze and solve real-world problems using Apache Hadoop and associated tools in the enterprise data hub (EDH). IntroductionApplication Architecture Scenario Explanation Understanding the Development Environment Identifying and Collecting Input Data Selecting Tools for Data Processing and Analysis Presenting Results to the Use Defining & Using Datasets Metadata Management What is Apache Avro? Avro Schemas Avro Schema Evolution Selecting a File Format Performance Considerations Using the Kite SDK Data Module What is the Kite SDK? Fundamental Data Module Concepts Creating New Data Sets Using the Kite SDK Loading, Accessing, and Deleting a Data Set Importing Relational Data with Apache Sqoop What is Apache Sqoop? Basic Imports Limiting Results Improving Sqoop?s Performance Sqoop 2 Capturing Data with Apache Flume What is Apache Flume? Basic Flume Architecture Flume Sources Flume Sinks Flume Configuration Logging Application Events to Hadoop Developing Custom Flume Components Flume Data Flow and Common Extension Points Custom Flume Sources Developing a Flume Pollable Source Developing a Flume Event-Driven Source Custom Flume Interceptors Developing a Header-Modifying Flume Interceptor Developing a Filtering Flume Interceptor Writing Avro Objects with a Custom Flume Interceptor Managing Workflows with Apache Oozie The Need for Workflow Management What is Apache Oozie? Defining an Oozie Workflow Validation, Packaging, and Deployment Running and Tracking Workflows Using the CLI Hue UI for Oozie Processing Data Pipelines with Apache Crunch What is Apache Crunch? Understanding the Crunch Pipeline Comparing Crunch to Java MapReduce Working with Crunch Projects Reading and Writing Data in Crunch Data Collection API Functions Utility Classes in the Crunch API Working with Tables in Apache Hive What is Apache Hive? Accessing Hive Basic Query Syntax Creating and Populating Hive Tables How Hive Reads Data Using the RegexSerDe in Hive Developing User-Defined Functions What are User-Defined Functions? Implementing a User-Defined Function Deploying Custom Libraries in Hive Registering a User-Defined Function in Hive Executing Interactive Queries with Impala What is Impala? Comparing Hive to Impala Running Queries in Impala Support for User-Defined Functions Data and Metadata Management Understanding Cloudera Search What is Cloudera Search? Search Architecture Supported Document Formats Indexing Data with Cloudera Search Collection and Schema Management Morphlines Indexing Data in Batch Mode Indexing Data in Near Real Time Presenting Results to Users Solr Query Syntax Building a Search UI with Hue Accessing Impala through JDBC Powering a Custom Web Application with Impala and Search

Designing and Building Big Data Applications
Delivered OnlineFlexible Dates
Price on Enquiry

Cloudera Training for Apache HBase

By Nexus Human

Duration 4 Days 24 CPD hours This course is intended for This course is appropriate for developers and administrators who intend to use HBase. Overview Skills learned on the course include:The use cases and usage occasions for HBase, Hadoop, and RDBMSUsing the HBase shell to directly manipulate HBase tablesDesigning optimal HBase schemas for efficient data storage and recoveryHow to connect to HBase using the Java API, configure the HBase cluster, and administer an HBase clusterBest practices for identifying and resolving performance bottlenecks Cloudera University?s four-day training course for Apache HBase enables participants to store and access massive quantities of multi-structured data and perform hundreds of thousands of operations per second. Introduction to Hadoop & HBase What Is Big Data? Introducing Hadoop Hadoop Components What Is HBase? Why Use HBase? Strengths of HBase HBase in Production Weaknesses of HBase HBase Tables HBase Concepts HBase Table Fundamentals Thinking About Table Design The HBase Shell Creating Tables with the HBase Shell Working with Tables Working with Table Data HBase Architecture Fundamentals HBase Regions HBase Cluster Architecture HBase and HDFS Data Locality HBase Schema Design General Design Considerations Application-Centric Design Designing HBase Row Keys Other HBase Table Features Basic Data Access with the HBase API Options to Access HBase Data Creating and Deleting HBase Tables Retrieving Data with Get Retrieving Data with Scan Inserting and Updating Data Deleting Data More Advanced HBase API Features Filtering Scans Best Practices HBase Coprocessors HBase on the Cluster How HBase Uses HDFS Compactions and Splits HBase Reads & Writes How HBase Writes Data How HBase Reads Data Block Caches for Reading HBase Performance Tuning Column Family Considerations Schema Design Considerations Configuring for Caching Dealing with Time Series and Sequential Data Pre-Splitting Regions HBase Administration and Cluster Management HBase Daemons ZooKeeper Considerations HBase High Availability Using the HBase Balancer Fixing Tables with hbck HBase Security HBase Replication & Backup HBase Replication HBase Backup MapReduce and HBase Clusters Using Hive & Impala with HBase Using Hive and Impala with HBase Appendix A: Accessing Data with Python and Thrift Thrift Usage Working with Tables Getting and Putting Data Scanning Data Deleting Data Counters Filters Appendix B: OpenTSDB

Cloudera Training for Apache HBase
Delivered OnlineFlexible Dates
Price on Enquiry

gRPC [Golang] Master Class: Build Modern API and Microservices

By Packt

Better than REST APIs! Build a fast and scalable HTTP/2 API for a Go microservice with gRPC and protocol buffers (protobufs)

gRPC [Golang] Master Class: Build Modern API and Microservices
Delivered Online On Demand5 hours 25 minutes
£130.99

Building Batch Data Analytics Solutions on AWS

By Nexus Human

Duration 1 Days 6 CPD hours This course is intended for This course is intended for: Data platform engineers Architects and operators who build and manage data analytics pipelines Overview In this course, you will learn to: Compare the features and benefits of data warehouses, data lakes, and modern data architectures Design and implement a batch data analytics solution Identify and apply appropriate techniques, including compression, to optimize data storage Select and deploy appropriate options to ingest, transform, and store data Choose the appropriate instance and node types, clusters, auto scaling, and network topology for a particular business use case Understand how data storage and processing affect the analysis and visualization mechanisms needed to gain actionable business insights Secure data at rest and in transit Monitor analytics workloads to identify and remediate problems Apply cost management best practices In this course, you will learn to build batch data analytics solutions using Amazon EMR, an enterprise-grade Apache Spark and Apache Hadoop managed service. You will learn how Amazon EMR integrates with open-source projects such as Apache Hive, Hue, and HBase, and with AWS services such as AWS Glue and AWS Lake Formation. The course addresses data collection, ingestion, cataloging, storage, and processing components in the context of Spark and Hadoop. You will learn to use EMR Notebooks to support both analytics and machine learning workloads. You will also learn to apply security, performance, and cost management best practices to the operation of Amazon EMR. Module A: Overview of Data Analytics and the Data Pipeline Data analytics use cases Using the data pipeline for analytics Module 1: Introduction to Amazon EMR Using Amazon EMR in analytics solutions Amazon EMR cluster architecture Interactive Demo 1: Launching an Amazon EMR cluster Cost management strategies Module 2: Data Analytics Pipeline Using Amazon EMR: Ingestion and Storage Storage optimization with Amazon EMR Data ingestion techniques Module 3: High-Performance Batch Data Analytics Using Apache Spark on Amazon EMR Apache Spark on Amazon EMR use cases Why Apache Spark on Amazon EMR Spark concepts Interactive Demo 2: Connect to an EMR cluster and perform Scala commands using the Spark shell Transformation, processing, and analytics Using notebooks with Amazon EMR Practice Lab 1: Low-latency data analytics using Apache Spark on Amazon EMR Module 4: Processing and Analyzing Batch Data with Amazon EMR and Apache Hive Using Amazon EMR with Hive to process batch data Transformation, processing, and analytics Practice Lab 2: Batch data processing using Amazon EMR with Hive Introduction to Apache HBase on Amazon EMR Module 5: Serverless Data Processing Serverless data processing, transformation, and analytics Using AWS Glue with Amazon EMR workloads Practice Lab 3: Orchestrate data processing in Spark using AWS Step Functions Module 6: Security and Monitoring of Amazon EMR Clusters Securing EMR clusters Interactive Demo 3: Client-side encryption with EMRFS Monitoring and troubleshooting Amazon EMR clusters Demo: Reviewing Apache Spark cluster history Module 7: Designing Batch Data Analytics Solutions Batch data analytics use cases Activity: Designing a batch data analytics workflow Module B: Developing Modern Data Architectures on AWS Modern data architectures

Building Batch Data Analytics Solutions on AWS
Delivered OnlineFlexible Dates
Price on Enquiry

gRPC [Java] Master Class: Build Modern API and Microservices

By Packt

Better than REST APIs! Build a fast and scalable HTTP/2 API for your microservice with gRPC and protocol buffers (protobufs).

gRPC [Java] Master Class: Build Modern API and Microservices
Delivered Online On Demand5 hours 4 minutes
£68.99

AWS Certified Data Analytics Specialty (2023) Hands-on

By Packt

This course covers the important topics needed to pass the AWS Certified Data Analytics-Specialty exam (AWS DAS-C01). You will learn about Kinesis, EMR, DynamoDB, and Redshift, and get ready for the exam by working through quizzes, exercises, and practice exams, along with exploring essential tips and techniques.

AWS Certified Data Analytics Specialty (2023) Hands-on
Delivered Online On Demand16 hours 33 minutes
£68.99

DP-203T00 Data Engineering on Microsoft Azure

By Nexus Human

Duration 4 Days 24 CPD hours This course is intended for The primary audience for this course is data professionals, data architects, and business intelligence professionals who want to learn about data engineering and building analytical solutions using data platform technologies that exist on Microsoft Azure. The secondary audience for this course includes data analysts and data scientists who work with analytical solutions built on Microsoft Azure. In this course, the student will learn how to implement and manage data engineering workloads on Microsoft Azure, using Azure services such as Azure Synapse Analytics, Azure Data Lake Storage Gen2, Azure Stream Analytics, Azure Databricks, and others. The course focuses on common data engineering tasks such as orchestrating data transfer and transformation pipelines, working with data files in a data lake, creating and loading relational data warehouses, capturing and aggregating streams of real-time data, and tracking data assets and lineage. Prerequisites Successful students start this course with knowledge of cloud computing and core data concepts and professional experience with data solutions. AZ-900T00 Microsoft Azure Fundamentals DP-900T00 Microsoft Azure Data Fundamentals 1 - Introduction to data engineering on Azure What is data engineering Important data engineering concepts Data engineering in Microsoft Azure 2 - Introduction to Azure Data Lake Storage Gen2 Understand Azure Data Lake Storage Gen2 Enable Azure Data Lake Storage Gen2 in Azure Storage Compare Azure Data Lake Store to Azure Blob storage Understand the stages for processing big data Use Azure Data Lake Storage Gen2 in data analytics workloads 3 - Introduction to Azure Synapse Analytics What is Azure Synapse Analytics How Azure Synapse Analytics works When to use Azure Synapse Analytics 4 - Use Azure Synapse serverless SQL pool to query files in a data lake Understand Azure Synapse serverless SQL pool capabilities and use cases Query files using a serverless SQL pool Create external database objects 5 - Use Azure Synapse serverless SQL pools to transform data in a data lake Transform data files with the CREATE EXTERNAL TABLE AS SELECT statement Encapsulate data transformations in a stored procedure Include a data transformation stored procedure in a pipeline 6 - Create a lake database in Azure Synapse Analytics Understand lake database concepts Explore database templates Create a lake database Use a lake database 7 - Analyze data with Apache Spark in Azure Synapse Analytics Get to know Apache Spark Use Spark in Azure Synapse Analytics Analyze data with Spark Visualize data with Spark 8 - Transform data with Spark in Azure Synapse Analytics Modify and save dataframes Partition data files Transform data with SQL 9 - Use Delta Lake in Azure Synapse Analytics Understand Delta Lake Create Delta Lake tables Create catalog tables Use Delta Lake with streaming data Use Delta Lake in a SQL pool 10 - Analyze data in a relational data warehouse Design a data warehouse schema Create data warehouse tables Load data warehouse tables Query a data warehouse 11 - Load data into a relational data warehouse Load staging tables Load dimension tables Load time dimension tables Load slowly changing dimensions Load fact tables Perform post load optimization 12 - Build a data pipeline in Azure Synapse Analytics Understand pipelines in Azure Synapse Analytics Create a pipeline in Azure Synapse Studio Define data flows Run a pipeline 13 - Use Spark Notebooks in an Azure Synapse Pipeline Understand Synapse Notebooks and Pipelines Use a Synapse notebook activity in a pipeline Use parameters in a notebook 14 - Plan hybrid transactional and analytical processing using Azure Synapse Analytics Understand hybrid transactional and analytical processing patterns Describe Azure Synapse Link 15 - Implement Azure Synapse Link with Azure Cosmos DB Enable Cosmos DB account to use Azure Synapse Link Create an analytical store enabled container Create a linked service for Cosmos DB Query Cosmos DB data with Spark Query Cosmos DB with Synapse SQL 16 - Implement Azure Synapse Link for SQL What is Azure Synapse Link for SQL? Configure Azure Synapse Link for Azure SQL Database Configure Azure Synapse Link for SQL Server 2022 17 - Get started with Azure Stream Analytics Understand data streams Understand event processing Understand window functions 18 - Ingest streaming data using Azure Stream Analytics and Azure Synapse Analytics Stream ingestion scenarios Configure inputs and outputs Define a query to select, filter, and aggregate data Run a job to ingest data 19 - Visualize real-time data with Azure Stream Analytics and Power BI Use a Power BI output in Azure Stream Analytics Create a query for real-time visualization Create real-time data visualizations in Power BI 20 - Introduction to Microsoft Purview What is Microsoft Purview? How Microsoft Purview works When to use Microsoft Purview 21 - Integrate Microsoft Purview and Azure Synapse Analytics Catalog Azure Synapse Analytics data assets in Microsoft Purview Connect Microsoft Purview to an Azure Synapse Analytics workspace Search a Purview catalog in Synapse Studio Track data lineage in pipelines 22 - Explore Azure Databricks Get started with Azure Databricks Identify Azure Databricks workloads Understand key concepts 23 - Use Apache Spark in Azure Databricks Get to know Spark Create a Spark cluster Use Spark in notebooks Use Spark to work with data files Visualize data 24 - Run Azure Databricks Notebooks with Azure Data Factory Understand Azure Databricks notebooks and pipelines Create a linked service for Azure Databricks Use a Notebook activity in a pipeline Use parameters in a notebook Additional course details: Nexus Humans DP-203T00 Data Engineering on Microsoft Azure training program is a workshop that presents an invigorating mix of sessions, lessons, and masterclasses meticulously crafted to propel your learning expedition forward. This immersive bootcamp-style experience boasts interactive lectures, hands-on labs, and collaborative hackathons, all strategically designed to fortify fundamental concepts. Guided by seasoned coaches, each session offers priceless insights and practical skills crucial for honing your expertise. Whether you're stepping into the realm of professional skills or a seasoned professional, this comprehensive course ensures you're equipped with the knowledge and prowess necessary for success. While we feel this is the best course for the DP-203T00 Data Engineering on Microsoft Azure course and one of our Top 10 we encourage you to read the course outline to make sure it is the right content for you. Additionally, private sessions, closed classes or dedicated events are available both live online and at our training centres in Dublin and London, as well as at your offices anywhere in the UK, Ireland or across EMEA.

DP-203T00 Data Engineering on Microsoft Azure
Delivered OnlineFlexible Dates
£2,380

Professional Certificate Course in Big Data Infrastructure in London 2024

4.9(261)

By Metropolitan School of Business & Management UK

Dive into the heart of Big Data Infrastructure, exploring storage systems, distributed file frameworks, and processing paradigms. This course provides a comprehensive understanding of key components like HDFS, Apache Spark, and Cassandra, offering insights into their architecture, use cases, and real-world applications. This course is a deep dive into the complex landscape of Big Data Infrastructure. From unravelling the architecture of Apache Spark to dissecting the benefits of distributed file systems, participants gain expertise in assessing, comparing, and implementing various Big Data storage and processing systems. Scalability, fault-tolerance, and industry-specific case studies add practical depth to theoretical knowledge. After the successful completion of this course, you will be able to: Understand the Components of Big Data Infrastructure, Including Storage Systems, Distributed File Systems, and Processing Frameworks. Identify the Characteristics and Benefits of Distributed File Systems Such as Hadoop Distributed File System (H.D.F.S). Describe the Architecture and Capabilities of Apache Spark and its Role in Big Data Processing. Recognise the Use Cases and Benefits of Apache Cassandra as a Distributed N..O.S.Q.L Database. Compare and Contrast Different Big Data Storage and Processing Systems Such as Hadoop, Spark, and Cassandra. Understand the Scalability and Fault-tolerance Mechanisms Used in Big Data Infrastructure, Such as Sharding and Replication. Appreciate the Challenges Associated with Deploying and Managing Big Data Infrastructure, Such as Hardware and Software Configuration and Security Considerations. Explore the intricacies of Big Data Infrastructure, from understanding storage systems to unraveling the nuances of distributed file frameworks and processing engines. Gain a comprehensive view of scalability, fault-tolerance mechanisms, and industry-specific challenges through engaging case studies. Equip yourself to navigate the dynamic landscape of Big Data with confidence and expertise. VIDEO - Course Structure and Assessment Guidelines Watch this video to gain further insight. Navigating the MSBM Study Portal Watch this video to gain further insight. Interacting with Lectures/Learning Components Watch this video to gain further insight. Big Data Infrastructure Self-paced pre-recorded learning content on this topic. Big Data Infrastructure Put your knowledge to the test with this quiz. Read each question carefully and choose the response that you feel is correct. All MSBM courses are accredited by the relevant partners and awarding bodies. Please refer to MSBM accreditation in about us for more details. There are no strict entry requirements for this course. Work experience will be an added advantage to understanding the content of the course. The certificate is designed to enhance the learner's knowledge in the field. This certificate is for everyone who is eager to know more and get updated on current ideas in their respective field. We recommend this certificate for the following audience. Big Data Infrastructure Engineer Hadoop Administrator Spark Developer Cassandra Database Administrator Big Data Solutions Architect Data Infrastructure Manager NoSQL Database Analyst Big Data Consultant Average Completion Time 2 Weeks Accreditation 3 CPD Hours Level Advanced Start Time Anytime 100% Online Study online with ease. Unlimited Access 24/7 unlimited access with pre-recorded lectures. Low Fees Our fees are low and easy to pay online.

Professional Certificate Course in Big Data Infrastructure in London 2024
Delivered Online On Demand14 days
£28