
115 Data Engineering courses delivered Online

Working with Elasticsearch (TTDS6882)

By Nexus Human

Duration: 2 Days (12 CPD hours)

This course is ideally suited for data analysts, IT professionals, and software developers who want to augment their data processing and analytics capabilities. It will also benefit system administrators and data engineers who wish to harness the Elastic Stack for efficient system logging, monitoring, and robust data visualization. With a focus on practical application, the course suits anyone aspiring to solve complex data challenges in real-time environments across diverse industry verticals.

Overview

This course combines instructor-led presentations and demonstrations with hands-on labs and group activities. Throughout the course you'll explore:

• New features and updates introduced in Elastic Stack 7.0
• Fundamentals of the Elastic Stack, including Elasticsearch, Logstash, and Kibana
• Tips for using Elastic Cloud and deploying the Elastic Stack in production environments
• How to install and configure an Elasticsearch architecture
• How to solve the full-text search problem with Elasticsearch
• Powerful analytics capabilities through Elasticsearch aggregations
• How to build a data pipeline that transfers data from a variety of sources into Elasticsearch for analysis
• How to create interactive Kibana dashboards for effective storytelling with your data
• How to secure and monitor the Elastic Stack and use its alerting and reporting capabilities

The Elastic Stack is a powerful combination of tools for distributed search, analytics, logging, and visualization of data, and Elastic Stack 7.0 adds new features and capabilities that help you find unique analytical insights. Geared for experienced data analysts, IT professionals, and software developers who want to augment their data processing and analytics capabilities, Working with Elasticsearch explores how to use the Elastic Stack and Elasticsearch efficiently to build powerful real-time data processing applications. Throughout the two-day hands-on course, you'll delve into the core functionality of each component, gain proficiency in Elasticsearch for distributed search and analytics, Logstash for logging, and Kibana for compelling data visualization, explore crafting custom plugins using Kibana and Beats, and get familiar with Elastic X-Pack, a vital extension for effective security and monitoring. Hands-on labs, demonstrations, and interactive group activities enrich your learning journey throughout the course.

Course Outline

Introducing Elastic Stack
• What is Elasticsearch, and why use it?
• Exploring the components of the Elastic Stack
• Use cases of the Elastic Stack
• Downloading and installing

Getting Started with Elasticsearch
• Using the Kibana Console UI
• Core concepts of Elasticsearch
• CRUD operations
• Creating indexes and taking control of mapping
• REST API overview

Searching - What is Relevant
• The basics of text analysis
• Searching from structured data
• Searching from the full text
• Writing compound queries
• Modeling relationships

Analytics with Elasticsearch
• The basics of aggregations
• Preparing data for analysis
• Metric aggregations
• Bucket aggregations
• Pipeline aggregations
• Substantial lab and case study

Analyzing Log Data
• Log analysis challenges
• Using Logstash
• The Logstash architecture
• Overview of Logstash plugins
• Ingest node

Visualizing Data with Kibana
• Downloading and installing Kibana
• Preparing data
• Kibana UI
• Timelion
• Using plugins
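To make the CRUD and aggregation topics above concrete, here is a minimal sketch using the official Elasticsearch Python client. This is our own illustration, not course material: the course works in the Kibana Console and targets Elastic Stack 7.0, while this sketch assumes the 8.x Python client and a local single-node cluster; index name, mapping, and documents are invented.

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# CRUD: create an index with an explicit mapping, then index a document
es.indices.create(index="orders", mappings={
    "properties": {
        "customer": {"type": "keyword"},
        "amount":   {"type": "float"},
        "note":     {"type": "text"},
    },
})
es.index(index="orders", id="1",
         document={"customer": "acme", "amount": 42.5, "note": "rush delivery"})
es.indices.refresh(index="orders")

# Full-text search on the analyzed "note" field
hits = es.search(index="orders", query={"match": {"note": "delivery"}})
print(hits["hits"]["total"])

# Analytics: a bucket aggregation (terms) with a nested metric (avg)
aggs = es.search(index="orders", size=0, aggs={
    "by_customer": {
        "terms": {"field": "customer"},
        "aggs": {"avg_amount": {"avg": {"field": "amount"}}},
    },
})
print(aggs["aggregations"]["by_customer"]["buckets"])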

Delivered Online · Flexible Dates
Price on Enquiry

Practical Data Science with Amazon SageMaker

By Nexus Human

Duration: 1 Day (6 CPD hours)

This course is intended for a technical audience at an intermediate level.

Overview

Using Amazon SageMaker, this course teaches you how to:
• Prepare a dataset for training
• Train and evaluate a machine learning model
• Automatically tune a machine learning model
• Prepare a machine learning model for production
• Think critically about machine learning model results

In this course, you'll learn how to solve a real-world use case with machine learning and produce actionable results using Amazon SageMaker. The course covers the stages of the typical data science process, from analyzing and visualizing a dataset, through preparing the data and feature engineering, to the practical aspects of model building, training, tuning, and deployment.

Day 1
• Business problem: churn prediction
• Load and display the dataset
• Assess features and determine which Amazon SageMaker algorithm to use
• Use Amazon SageMaker to train, evaluate, and automatically tune the model
• Deploy the model
• Assess the relative cost of errors

Additional course details: Nexus Humans' Practical Data Science with Amazon SageMaker training program is a workshop that presents an invigorating mix of sessions, lessons, and masterclasses crafted to propel your learning forward. This immersive bootcamp-style experience offers interactive lectures, hands-on labs, and collaborative hackathons designed to fortify fundamental concepts. Guided by seasoned coaches, each session offers practical insights and skills for honing your expertise. Whether you're just starting out or a seasoned professional, the course equips you with the knowledge necessary for success. While we feel this is the best course for Practical Data Science with Amazon SageMaker and one of our Top 10, we encourage you to read the course outline to make sure it is the right content for you. Private sessions, closed classes, and dedicated events are available both live online and at our training centres in Dublin and London, as well as at your offices anywhere in the UK, Ireland, or across EMEA.
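The workshop itself runs in SageMaker notebooks; purely as a flavor of the Day 1 flow (train, tune, deploy), here is a hedged sketch with the SageMaker Python SDK. The role ARN, S3 bucket, and the choice of the built-in XGBoost algorithm for churn are placeholder assumptions, not course materials.

import sagemaker
from sagemaker.estimator import Estimator
from sagemaker.inputs import TrainingInput

session = sagemaker.Session()
region = session.boto_region_name
role = "arn:aws:iam::123456789012:role/SageMakerRole"  # placeholder role ARN

# Built-in XGBoost container image for this region
image_uri = sagemaker.image_uris.retrieve("xgboost", region, version="1.5-1")

estimator = Estimator(
    image_uri=image_uri,
    role=role,
    instance_count=1,
    instance_type="ml.m5.xlarge",
    output_path="s3://my-bucket/churn/output",  # placeholder bucket
)
estimator.set_hyperparameters(objective="binary:logistic", num_round=100)

# Train on CSV data already staged in S3 (label in the first column)
train = TrainingInput("s3://my-bucket/churn/train.csv", content_type="text/csv")
estimator.fit({"train": train})

# Deploy the trained model to a real-time endpoint for churn predictions
predictor = estimator.deploy(initial_instance_count=1, instance_type="ml.m5.large")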

Delivered Online · Flexible Dates
Price on Enquiry

KM204 IBM InfoSphere DataStage Essentials (v11.5)

By Nexus Human

Duration: 4 Days (24 CPD hours)

This course is intended for project administrators and ETL developers responsible for data extraction and transformation using DataStage.

Overview

• Describe the uses of DataStage and the DataStage workflow
• Describe the Information Server architecture and how DataStage fits within it
• Describe the Information Server and DataStage deployment options
• Use the Information Server Web Console and the DataStage Administrator client to create DataStage users and to configure the DataStage environment
• Import and export DataStage objects to a file
• Import table definitions for sequential files and relational tables
• Design, compile, run, and monitor DataStage parallel jobs
• Design jobs that read and write to sequential files
• Describe the DataStage parallel processing architecture
• Design jobs that combine data using joins and lookups
• Design jobs that sort and aggregate data
• Implement complex business logic using the DataStage Transformer stage
• Debug DataStage jobs using the DataStage PX Debugger

This course enables project administrators and developers to acquire the skills necessary to develop parallel jobs in DataStage. Students will learn to create parallel jobs that access sequential and relational data, and combine and transform the data.

Course Outline
• Introduction to DataStage
• Deployment
• DataStage Administration
• Work with Metadata
• Create Parallel Jobs
• Access Sequential Data
• Partitioning and Collecting Algorithms
• Combine Data
• Group Processing Stages
• Transformer Stage
• Repository Functions
• Work with Relational Data
• Control Jobs
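DataStage jobs are assembled visually in the Designer rather than written as code, so there is no course code to quote. Purely as a conceptual analogue for readers who think in code, the pandas sketch below mirrors the join (lookup), sort, and aggregate logic that the outline's stages implement; the data is invented.

import pandas as pd

orders = pd.DataFrame({
    "customer_id": [1, 2, 1, 3],
    "amount": [120.0, 75.5, 19.99, 42.0],
})
customers = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "region": ["EMEA", "APAC", "EMEA"],
})

# Join (lookup) orders against the customer reference data
joined = orders.merge(customers, on="customer_id", how="left")

# Sort and aggregate: total and average order amount per region
summary = (joined.groupby("region")["amount"]
                 .agg(total="sum", average="mean")
                 .sort_values("total", ascending=False))
print(summary)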

Delivered Online · Flexible Dates
Price on Enquiry

Cloudera Data Scientist Training

By Nexus Human

Duration: 4 Days (24 CPD hours)

This workshop is designed for data scientists who currently use Python or R to work with smaller datasets on a single machine and who need to scale up their analyses and machine learning models to large datasets on distributed clusters. Data engineers and developers with some knowledge of data science and machine learning may also find it useful.

Overview

• Overview of data science and machine learning at scale
• Overview of the Hadoop ecosystem
• Working with HDFS data and Hive tables using Hue
• Introduction to Cloudera Data Science Workbench (CDSW)
• Overview of Apache Spark 2
• Reading and writing data
• Inspecting data quality
• Cleansing and transforming data
• Summarizing and grouping data
• Combining, splitting, and reshaping data
• Exploring data
• Configuring, monitoring, and troubleshooting Spark applications
• Overview of machine learning in Spark MLlib
• Extracting, transforming, and selecting features
• Building and evaluating regression models
• Building and evaluating classification models
• Building and evaluating clustering models
• Cross-validating models and tuning hyperparameters
• Building machine learning pipelines
• Deploying machine learning models
• Spark, Spark SQL, and Spark MLlib
• PySpark and sparklyr
• Hue

This workshop covers data science and machine learning workflows at scale using Apache Spark 2 and other key components of the Hadoop ecosystem. It emphasizes the use of data science and machine learning methods to address real-world business challenges. Using scenarios and datasets from a fictional technology company, students discover insights to support critical business decisions and develop data products to transform the business. The material is presented through a sequence of brief lectures, interactive demonstrations, extensive hands-on exercises, and discussions. The Apache Spark demonstrations and exercises are conducted in Python (with PySpark) and R (with sparklyr) using the Cloudera Data Science Workbench (CDSW) environment.
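As a taste of the workshop's PySpark track, here is a minimal sketch (invented HDFS path and column names, not workshop material) of the flow the outline describes: read data, summarize and group it, then fit and evaluate a classification model with an MLlib pipeline.

from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.evaluation import BinaryClassificationEvaluator

spark = SparkSession.builder.appName("cdsw-sketch").getOrCreate()

# Reading and inspecting data (hypothetical HDFS path and schema)
df = spark.read.csv("hdfs:///data/accounts.csv", header=True, inferSchema=True)
df.printSchema()
df.groupBy("segment").count().show()  # summarizing and grouping

# Assemble features and fit a logistic regression inside a pipeline
assembler = VectorAssembler(inputCols=["tenure", "spend"], outputCol="features")
lr = LogisticRegression(featuresCol="features", labelCol="churned")
train, test = df.randomSplit([0.8, 0.2], seed=42)
model = Pipeline(stages=[assembler, lr]).fit(train)

# Evaluating the classification model on held-out data
auc = BinaryClassificationEvaluator(labelCol="churned").evaluate(model.transform(test))
print(f"Test AUC: {auc:.3f}")

spark.stop()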

Delivered Online · Flexible Dates
Price on Enquiry

Designing and Building Big Data Applications

By Nexus Human

Duration: 4 Days (24 CPD hours)

This course is best suited to developers, engineers, and architects who want to use Hadoop and related tools to solve real-world problems.

Overview

Skills learned in this course include:
• Creating a data set with the Kite SDK
• Developing custom Flume components for data ingestion
• Managing a multi-stage workflow with Oozie
• Analyzing data with Crunch
• Writing user-defined functions for Hive and Impala
• Indexing data with Cloudera Search

Cloudera University's four-day course for designing and building Big Data applications prepares you to analyze and solve real-world problems using Apache Hadoop and associated tools in the enterprise data hub (EDH).

Course Outline

Introduction

Application Architecture
• Scenario explanation
• Understanding the development environment
• Identifying and collecting input data
• Selecting tools for data processing and analysis
• Presenting results to the user

Defining and Using Datasets
• Metadata management
• What is Apache Avro?
• Avro schemas
• Avro schema evolution
• Selecting a file format
• Performance considerations

Using the Kite SDK Data Module
• What is the Kite SDK?
• Fundamental data module concepts
• Creating new data sets using the Kite SDK
• Loading, accessing, and deleting a data set

Importing Relational Data with Apache Sqoop
• What is Apache Sqoop?
• Basic imports
• Limiting results
• Improving Sqoop's performance
• Sqoop 2

Capturing Data with Apache Flume
• What is Apache Flume?
• Basic Flume architecture
• Flume sources, sinks, and configuration
• Logging application events to Hadoop

Developing Custom Flume Components
• Flume data flow and common extension points
• Custom Flume sources: developing a pollable source and an event-driven source
• Custom Flume interceptors: developing header-modifying and filtering interceptors
• Writing Avro objects with a custom Flume interceptor

Managing Workflows with Apache Oozie
• The need for workflow management
• What is Apache Oozie?
• Defining an Oozie workflow
• Validation, packaging, and deployment
• Running and tracking workflows using the CLI
• Hue UI for Oozie

Processing Data Pipelines with Apache Crunch
• What is Apache Crunch?
• Understanding the Crunch pipeline
• Comparing Crunch to Java MapReduce
• Working with Crunch projects
• Reading and writing data in Crunch
• Data collection API functions
• Utility classes in the Crunch API

Working with Tables in Apache Hive
• What is Apache Hive?
• Accessing Hive
• Basic query syntax
• Creating and populating Hive tables
• How Hive reads data
• Using the RegexSerDe in Hive

Developing User-Defined Functions
• What are user-defined functions?
• Implementing a user-defined function
• Deploying custom libraries in Hive
• Registering a user-defined function in Hive

Executing Interactive Queries with Impala
• What is Impala?
• Comparing Hive to Impala
• Running queries in Impala
• Support for user-defined functions
• Data and metadata management

Understanding Cloudera Search
• What is Cloudera Search?
• Search architecture
• Supported document formats

Indexing Data with Cloudera Search
• Collection and schema management
• Morphlines
• Indexing data in batch mode
• Indexing data in near real time

Presenting Results to Users
• Solr query syntax
• Building a search UI with Hue
• Accessing Impala through JDBC
• Powering a custom web application with Impala and Search
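Avro appears above at the schema level; as an illustration only (the course itself works within the Hadoop and Kite toolchain), this sketch defines a schema with an optional field, a common schema-evolution pattern, and writes and reads records with the third-party fastavro library. Schema names and records are invented.

from fastavro import parse_schema, reader, writer

schema = parse_schema({
    "namespace": "example.edh",
    "type": "record",
    "name": "Order",
    "fields": [
        {"name": "id", "type": "long"},
        {"name": "customer", "type": "string"},
        # A union with null makes the field optional, easing schema evolution
        {"name": "coupon", "type": ["null", "string"], "default": None},
    ],
})

records = [
    {"id": 1, "customer": "acme", "coupon": None},
    {"id": 2, "customer": "globex", "coupon": "SPRING10"},
]

# Write records to an Avro container file, then read them back
with open("orders.avro", "wb") as out:
    writer(out, schema, records)

with open("orders.avro", "rb") as f:
    for rec in reader(f):
        print(rec)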

Delivered Online · Flexible Dates
Price on Enquiry

KM423 IBM InfoSphere DataStage v11.5 - Advanced Data Processing

By Nexus Human

Duration: 2 Days (12 CPD hours)

This course is intended for experienced DataStage developers seeking training in more advanced DataStage job techniques and in techniques for working with complex types of data resources.

Overview

• Use Connector stages to read from and write to database tables
• Handle SQL errors in Connector stages
• Use Connector stages with multiple input links
• Use the File Connector stage to access Hadoop HDFS data
• Optimize jobs that write to database tables
• Use the Unstructured Data stage to extract data from Excel spreadsheets
• Use the Data Masking stage to mask sensitive data processed within a DataStage job
• Use the Hierarchical stage to parse, compose, and transform XML data
• Use the Schema Library Manager to import and manage XML schemas
• Use the Data Rules stage to validate fields of data within a DataStage job
• Create custom data rules for validating data
• Design a job that processes a star schema data warehouse with Type 1 and Type 2 slowly changing dimensions

This course introduces advanced parallel job data processing techniques in DataStage v11.5. You will develop techniques for processing different types of complex data resources, including relational data, unstructured data (Excel spreadsheets), and XML data. In addition, you will learn advanced techniques for processing data, including masking data and validating data with data rules. Finally, you will learn techniques for updating data in a star schema data warehouse using the DataStage SCD (Slowly Changing Dimensions) stage. Even if you are not working with all of these specific types of data, you will benefit from the advanced job design techniques this course covers, techniques that go beyond those in the DataStage Essentials course.

Course Outline

Accessing databases
• Connector stage overview: using Connector stages to read from and write to relational tables; working with Connector stage properties
• Connector stage functionality: before/after SQL, sparse lookups, optimizing insert/update performance
• Error handling in Connector stages: reject links and reject conditions
• Multiple input links: designing jobs that use Connector stages with multiple input links; ordering records across multiple input links
• File Connector stage: reading and writing data to Hadoop file systems
• Demonstrations: handling database errors; parallel jobs with multiple Connector input links; using the File Connector stage to read and write HDFS files

Processing unstructured data
• Using the Unstructured Data stage in DataStage jobs: extracting data from an Excel spreadsheet; specifying a data range for extraction; specifying document properties for extraction
• Demonstration: processing unstructured data

Data masking
• Using the Data Masking stage in DataStage jobs: data masking techniques and policies; applying policies for masking context-aware and generic data types; repeatable replacement; using reference tables; creating custom reference tables
• Demonstration: data masking

Using data rules
• Introduction to data rules: using the Data Rules Editor; selecting data rules; binding data rule variables; output link constraints; adding statistics and attributes to the output information
• Using the Data Rules stage to validate foreign key references in source data
• Creating custom data rules
• Demonstration: using data rules

Processing XML data
• Introduction to the Hierarchical stage and its Assembly editor; using the Schema Library Manager to import and manage XML schemas
• Composing XML data: using the HJoin step to create parent-child relationships between input lists; using the Composer step
• Writing hierarchical data to a relational table; using the Regroup step
• Consuming XML data: using the XML Parser step; propagating columns
• Transforming XML data: using the Aggregate, Sort, Switch, and H-Pivot steps
• Demonstrations: importing XML schemas; composing, consuming, and transforming hierarchical data

Updating a star schema database
• Surrogate keys: designing a job that creates and updates a surrogate key source key file from a dimension table
• Slowly Changing Dimensions (SCD) stage: star schema databases; SCD stage Fast Path pages; specifying purpose codes; dimension update specification; designing a job that processes a star schema database with Type 1 and Type 2 slowly changing dimensions
• Demonstration: build a parallel job that updates a star schema database with two dimensions
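The Data Masking stage is configured in the DataStage Designer, not coded; the Python sketch below only illustrates the repeatable-replacement idea named in the outline: the same input always masks to the same surrogate, so joins across tables are preserved. The key and field shapes are invented for illustration.

import hashlib
import hmac

SECRET_KEY = b"rotate-me"  # placeholder masking key

def mask_value(value: str, alphabet: str = "0123456789", width: int = 9) -> str:
    """Deterministically map a sensitive value to a same-shaped surrogate."""
    digest = hmac.new(SECRET_KEY, value.encode(), hashlib.sha256).digest()
    # Use the keyed hash as a stream of indices into the output alphabet
    return "".join(alphabet[b % len(alphabet)] for b in digest[:width])

# The same value masks identically wherever it appears (repeatable replacement)
print(mask_value("123-45-6789"))
print(mask_value("123-45-6789"))  # same surrogate as above
print(mask_value("987-65-4321"))  # different input, different surrogate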

Delivered Online · Flexible Dates
Price on Enquiry

Mastering Scala with Apache Spark for the Modern Data Enterprise (TTSK7520)

By Nexus Human

Duration: 5 Days (30 CPD hours)

This intermediate-and-beyond level course is geared for experienced technical professionals in various roles, such as developers, data analysts, data engineers, software engineers, and machine learning engineers, who want to leverage Scala and Spark to tackle complex data challenges and develop scalable, high-performance applications across diverse domains. Practical programming experience is required to participate in the hands-on labs.

Overview

Working in a hands-on learning environment led by our expert instructor, you'll:
• Develop a basic understanding of Scala and Apache Spark fundamentals, enabling you to confidently create scalable and high-performance applications
• Learn how to process large datasets efficiently, helping you handle complex data challenges and make data-driven decisions
• Gain hands-on experience with real-time data streaming, allowing you to manage and analyze data as it flows into your applications
• Acquire practical knowledge of machine learning algorithms using Spark MLlib, empowering you to create intelligent applications and uncover hidden insights
• Master graph processing with GraphX, enabling you to analyze and visualize complex relationships in your data
• Discover generative AI technologies using GPT with Spark and Scala, opening up new possibilities for automating content generation and enhancing data analysis

Embark on a journey to master the world of big data with our immersive course on Scala and Spark! Mastering Scala with Apache Spark for the Modern Data Enterprise is a five-day hands-on course designed to provide you with the essential skills and tools to tackle complex data projects using the Scala programming language and Apache Spark, a high-performance data processing engine. Mastering these technologies will enable you to perform a wide range of tasks, from data wrangling and analytics to machine learning and artificial intelligence, across various industries and applications. Guided by our expert instructor, you'll explore the fundamentals of Scala programming and Apache Spark while gaining valuable hands-on experience with Spark programming, RDDs, DataFrames, Spark SQL, and data sources. You'll also explore Spark Streaming, performance optimization techniques, and the integration of popular external libraries, tools, and cloud platforms like AWS, Azure, and GCP. Machine learning enthusiasts will delve into Spark MLlib, covering the basics of machine learning algorithms, data preparation, feature extraction, and techniques such as regression, classification, clustering, and recommendation systems.

Course Outline

Introduction to Scala
• Brief history and motivation
• Differences between Scala and Java
• Basic Scala syntax and constructs
• Scala's functional programming features

Introduction to Apache Spark
• Overview and history
• Spark components and architecture
• Spark ecosystem
• Comparing Spark with other big data frameworks

Basics of Spark Programming
• SparkContext and SparkSession
• Resilient Distributed Datasets (RDDs)
• Transformations and actions
• Working with DataFrames

Spark SQL and Data Sources
• Spark SQL library and its advantages
• Structured and semi-structured data sources
• Reading and writing data in various formats (CSV, JSON, Parquet, Avro, etc.)
• Data manipulation using SQL queries

Basic RDD Operations
• Creating and manipulating RDDs
• Common transformations and actions on RDDs
• Working with key-value data

Basic DataFrame and Dataset Operations
• Creating and manipulating DataFrames and Datasets
• Column operations and functions
• Filtering, sorting, and aggregating data

Introduction to Spark Streaming
• Overview of Spark Streaming
• Discretized Stream (DStream) operations
• Windowed operations and stateful processing

Performance Optimization Basics
• Best practices for efficient Spark code
• Broadcast variables and accumulators
• Monitoring Spark applications

Integrating External Libraries, Tools, and Cloud Platforms
• Using popular external libraries, such as Hadoop and HBase
• Integrating with cloud platforms: AWS, Azure, GCP
• Connecting to data storage systems: HDFS, S3, Cassandra, etc.

Introduction to Machine Learning Basics
• Overview of machine learning
• Supervised and unsupervised learning
• Common algorithms and use cases

Introduction to Spark MLlib
• Overview of Spark MLlib
• MLlib's algorithms and utilities
• Data preparation and feature extraction

Linear Regression and Classification
• Linear regression algorithm
• Logistic regression for classification
• Model evaluation and performance metrics

Clustering Algorithms
• Overview of clustering algorithms
• K-means clustering
• Model evaluation and performance metrics

Collaborative Filtering and Recommendation Systems
• Overview of recommendation systems
• Collaborative filtering techniques
• Implementing recommendations with Spark MLlib

Introduction to Graph Processing
• Overview of graph processing
• Use cases and applications of graph processing
• Graph representations and operations

Introduction to Spark GraphX
• Overview of GraphX
• Creating and transforming graphs
• Graph algorithms in GraphX

Big Data Innovation! Using GPT and Generative AI Technologies with Spark and Scala
• Overview of generative AI technologies
• Integrating GPT with Spark and Scala
• Practical applications and use cases

Bonus Topics / Time Permitting
• Introduction to Spark NLP: overview, preprocessing text data, text classification and sentiment analysis
• Putting It All Together: a capstone project that integrates multiple aspects of the course, including data processing, machine learning, graph processing, and generative AI technologies
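The course itself works in Scala; because Spark's DataFrame API is nearly identical across language bindings, this PySpark sketch (invented S3 path and column names, not course material) shows the read, filter, aggregate, and Spark SQL flow that the outline covers. The Scala equivalent differs mainly in syntax, not structure.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("spark-sql-sketch").getOrCreate()

# Reading data in a columnar format (Parquet), as covered under data sources
events = spark.read.parquet("s3a://example-bucket/events/")  # placeholder path

# Filtering, grouping, and aggregating with DataFrame operations
daily = (events
         .filter(F.col("status") == "ok")
         .groupBy("event_date")
         .agg(F.count("*").alias("n_events"),
              F.avg("latency_ms").alias("avg_latency")))

# The equivalent query expressed through Spark SQL
events.createOrReplaceTempView("events")
daily_sql = spark.sql("""
    SELECT event_date, COUNT(*) AS n_events, AVG(latency_ms) AS avg_latency
    FROM events WHERE status = 'ok' GROUP BY event_date
""")

daily.orderBy(F.desc("n_events")).show(10)
spark.stop()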

Delivered Online · Flexible Dates
Price on Enquiry

Developer Training for Spark and Hadoop

By Nexus Human

Duration: 4 Days (24 CPD hours)

This course is intended for Hadoop developers.

Overview

Through instructor-led discussion and interactive, hands-on exercises, participants will navigate the Hadoop ecosystem, learning topics such as:
• How data is distributed, stored, and processed in a Hadoop cluster
• How to use Sqoop and Flume to ingest data
• How to process distributed data with Apache Spark
• How to model structured data as tables in Impala and Hive
• How to choose the best data storage format for different data usage patterns
• Best practices for data storage

This training course is the best preparation for the challenges faced by Hadoop developers. Participants will learn to identify which tool is the right one to use in a given situation, and will gain hands-on experience developing with those tools.

Course Outline
• Introduction
• Introduction to Hadoop and the Hadoop Ecosystem
• Hadoop Architecture and HDFS
• Importing Relational Data with Apache Sqoop
• Introduction to Impala and Hive
• Modeling and Managing Data with Impala and Hive
• Data Formats
• Data Partitioning
• Capturing Data with Apache Flume
• Spark Basics
• Working with RDDs in Spark
• Writing and Deploying Spark Applications
• Parallel Programming with Spark
• Spark Caching and Persistence
• Common Patterns in Spark Data Processing
• Spark SQL and DataFrames
• Conclusion
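As a flavor of the Spark portion (a sketch, not course material), the snippet below shows the RDD ideas from the outline: transformations are lazy, actions trigger computation, and caching persists an intermediate result that is reused by several actions. The log lines are invented.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("rdd-sketch").getOrCreate()
sc = spark.sparkContext

lines = sc.parallelize([
    "error disk full", "info started", "error timeout", "info done",
])

# Transformations (lazy): nothing runs until an action is called
errors = lines.filter(lambda s: s.startswith("error")).cache()

# Actions trigger the computation; both reuse the cached RDD
print(errors.count())                                # 2
print(errors.map(lambda s: s.split()[1]).collect())  # ['disk', 'timeout']

spark.stop()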

Delivered Online · Flexible Dates
Price on Enquiry

Data Wrangling with Python

By Nexus Human

Duration: 3 Days (18 CPD hours)

Data Wrangling with Python takes a practical approach to equip beginners with the most essential data analysis tools in the shortest possible time. It contains multiple activities that use real-life business scenarios for you to practice and apply your new skills in a highly relevant context.

Overview

By the end of this course, you will be confident in using a diverse array of sources to extract, clean, transform, and format your data efficiently.

You will start with the absolute basics of Python, focusing mainly on data structures. Then you will delve into fundamental data wrangling tools such as the NumPy and pandas libraries. You'll explore why you should stay away from the traditional data cleaning approaches used in other languages and take advantage of the specialized pre-built routines in Python. This combination of Python tips and tricks will also demonstrate how to use the same Python backend to extract and transform data from an array of sources, including the Internet, large database vaults, and Excel financial tables. To help you prepare for more challenging scenarios, you'll cover how to handle missing or incorrect data and reformat it based on the requirements of the downstream analytics tool. The course reinforces concepts through real-world examples and datasets.

Course Outline

Introduction to Data Structures using Python
• Python for data wrangling
• Lists, sets, strings, tuples, and dictionaries

Advanced Operations on Built-In Data Structures
• Advanced data structures
• Basic file operations in Python

Introduction to NumPy, Pandas, and Matplotlib
• NumPy arrays
• Pandas DataFrames
• Statistics and visualization with NumPy and pandas
• Using NumPy and pandas to calculate basic descriptive statistics on a DataFrame

Deep Dive into Data Wrangling with Python
• Subsetting, filtering, and grouping
• Detecting outliers and handling missing values
• Concatenating, merging, and joining
• Useful methods of pandas

Getting Comfortable with Different Kinds of Data Sources
• Reading data from different text-based (and non-text-based) sources
• Introduction to BeautifulSoup4 and web page parsing

Learning the Hidden Secrets of Data Wrangling
• Advanced list comprehension and the zip function
• Data formatting

Advanced Web Scraping and Data Gathering
• Basics of web scraping and the BeautifulSoup library
• Reading data from XML

RDBMS and SQL
• Refresher on RDBMS and SQL
• Using an RDBMS (MySQL/PostgreSQL/SQLite)

Application in Real Life and Conclusion
• Applying your knowledge to a real-life data wrangling task
• An extension to data wrangling
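As a taste of the pandas techniques listed above, here is a minimal sketch with invented data (not course material) covering subsetting, handling missing values, grouping, and merging.

import numpy as np
import pandas as pd

sales = pd.DataFrame({
    "store": ["A", "A", "B", "B", "C"],
    "units": [10, np.nan, 7, 12, np.nan],   # missing values to handle
    "price": [2.5, 2.5, 3.0, 3.0, 4.0],
})
stores = pd.DataFrame({"store": ["A", "B", "C"],
                       "city": ["Leeds", "York", "Bath"]})

# Detecting and filling missing values (here: per-store median)
sales["units"] = sales.groupby("store")["units"].transform(
    lambda s: s.fillna(s.median()))
# Store C has no observed units at all, so its median is NaN; drop what remains
sales = sales.dropna(subset=["units"])

# Subsetting and filtering, then merging in the reference data
big = sales[sales["units"] > 8].merge(stores, on="store", how="left")

# Grouping and summarizing revenue per city
print(big.assign(revenue=big["units"] * big["price"])
         .groupby("city")["revenue"].sum())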

Delivered Online · Flexible Dates
Price on Enquiry

Data Science Projects with Python

By Nexus Human

Duration: 2 Days (12 CPD hours)

This course is intended for data analysts, data scientists, and business analysts who want to get started with using Python and machine learning techniques to analyze data and predict outcomes. Basic knowledge of computer programming and data analytics is a must. Familiarity with mathematical concepts such as algebra and basic statistics will be useful.

Overview

By the end of this course, you will have the skills you need to confidently use various machine learning algorithms to perform detailed data analysis and extract meaningful insights from data.

This course is designed to give you practical guidance on industry-standard data analysis and machine learning tools in Python, with the help of realistic data. It will help you understand how you can use pandas and Matplotlib to critically examine a dataset with summary statistics and graphs, and extract the insights you seek to derive. You will continue to build on your knowledge as you learn how to prepare data and feed it to machine learning algorithms, such as regularized logistic regression and random forest, using the scikit-learn package. You'll discover how to tune the algorithms to provide the best predictions on new and unseen data. In the later sections, you'll understand the workings and output of these algorithms, gaining insight into not only the predictive capabilities of the models but also their reasons for making these predictions.

Course Outline

Data Exploration and Cleaning
• Python and the Anaconda package management system
• Different types of data science problems
• Loading the case study data with Jupyter and pandas
• Data quality assurance and exploration
• Exploring the financial history features in the dataset
• Activity 1: Exploring remaining financial features in the dataset

Introduction to Scikit-Learn and Model Evaluation
• Model performance metrics for binary classification
• Activity 2: Performing logistic regression with a new feature and creating a precision-recall curve

Details of Logistic Regression and Feature Exploration
• Examining the relationships between features and the response
• Univariate feature selection: what it does and doesn't do
• Activity 3: Fitting a logistic regression model and directly using the coefficients

The Bias-Variance Trade-off
• Estimating the coefficients and intercepts of logistic regression
• Cross-validation: choosing the regularization parameter and other hyperparameters
• Activity 4: Cross-validation and feature engineering with the case study data

Decision Trees and Random Forests
• Decision trees
• Random forests: ensembles of decision trees
• Activity 5: Cross-validation grid search with random forest

Imputation of Missing Data, Financial Analysis, and Delivery to Client
• Review of modeling results
• Dealing with missing data: imputation strategies
• Activity 6: Deriving financial insights
• Final thoughts on delivering the predictive model to the client
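As a compact sketch of the course's core loop on synthetic data (not the case-study dataset): fit a regularized logistic regression, let cross-validation choose the regularization strength, and compute precision-recall metrics, echoing Activities 2 and 4.

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegressionCV
from sklearn.metrics import precision_recall_curve, average_precision_score
from sklearn.model_selection import train_test_split

# Synthetic, mildly imbalanced binary classification data
X, y = make_classification(n_samples=2000, n_features=10,
                           weights=[0.8], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Cross-validation chooses the regularization parameter C automatically
clf = LogisticRegressionCV(Cs=10, cv=5, scoring="average_precision",
                           max_iter=1000)
clf.fit(X_tr, y_tr)

# Score the held-out set and build a precision-recall curve
scores = clf.predict_proba(X_te)[:, 1]
precision, recall, _ = precision_recall_curve(y_te, scores)
print(f"Chosen C: {clf.C_[0]:.3g}")
print(f"Average precision on test data: {average_precision_score(y_te, scores):.3f}")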

Delivered Online · Flexible Dates
Price on Enquiry