

Introduction to Hadoop Administration (TTDS6503)

By Nexus Human

Duration: 3 Days (18 CPD hours)

Audience: This is an introductory-level course designed to teach experienced systems administrators how to install, maintain, monitor, troubleshoot, optimize, and secure Hadoop. Previous Hadoop experience is not required.

Overview: Working in an engaging, hands-on learning environment, guided by our expert team, attendees will learn to:
• Understand the benefits of distributed computing
• Understand the Hadoop architecture, including HDFS and MapReduce
• Define administrator participation in Big Data projects
• Plan, implement, and maintain Hadoop clusters
• Deploy and maintain additional Big Data tools (Pig, Hive, Flume, etc.)
• Plan, deploy, and maintain HBase on a Hadoop cluster
• Monitor and maintain hundreds of servers
• Pinpoint performance bottlenecks and fix them

Apache Hadoop is an open-source framework for creating reliable, distributed compute clusters. Together with related frameworks, Hadoop provides an excellent platform for processing large unstructured or semi-structured data sets from multiple sources to dissect, classify, learn from, and make suggestions for business analytics, decision support, and other advanced forms of machine intelligence. This is an introductory-level, hands-on, lab-intensive course geared for the administrator (new to Hadoop) who is charged with maintaining a Hadoop cluster and its related components. You will learn how to install, maintain, monitor, troubleshoot, optimize, and secure Hadoop.

Outline:
• Introduction: Hadoop history and concepts; ecosystem; distributions; high-level architecture; Hadoop myths; Hadoop challenges (hardware/software)
• Planning and installation: selecting software and Hadoop distributions; sizing the cluster and planning for growth; selecting hardware and network; rack topology; installation; multi-tenancy; directory structure and logs; benchmarking
• HDFS operations: concepts (horizontal scaling, replication, data locality, rack awareness); nodes and daemons (NameNode, Secondary NameNode, HA Standby NameNode, DataNode); health monitoring; command-line and browser-based administration; adding storage and replacing defective drives
• MapReduce operations: parallel computing before MapReduce (comparing HPC versus Hadoop administration); MapReduce cluster loads; nodes and daemons (JobTracker, TaskTracker); MapReduce UI walkthrough; MapReduce configuration; job configuration; job schedulers; administrator view of MapReduce best practices; optimizing MapReduce; foolproofing MR (what to tell your programmers); YARN architecture and use
• Advanced topics: hardware monitoring; system software monitoring; Hadoop cluster monitoring; adding and removing servers and upgrading Hadoop; backup, recovery, and business continuity planning; cluster configuration tweaks; hardware maintenance schedule; Oozie scheduling for administrators; securing your cluster with Kerberos; the future of Hadoop
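To give a flavor of the command-line health monitoring the HDFS operations module practices, here is a minimal sketch (not part of the course materials) that wraps two standard HDFS health-check commands in Python; it assumes a host with the Hadoop client installed and configured for your cluster.

```python
# Illustrative sketch only; assumes the `hdfs` CLI from a Hadoop client
# installation is on PATH and pointed at your cluster's configuration.
import subprocess

def run(cmd: list[str]) -> str:
    """Run a command and return its stdout, raising on a non-zero exit."""
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout

# Cluster-wide capacity, live/dead DataNodes, and per-node disk usage.
print(run(["hdfs", "dfsadmin", "-report"]))

# File system integrity: under-replicated, corrupt, or missing blocks.
print(run(["hdfs", "fsck", "/", "-blocks"]))
```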

Introduction to Hadoop Administration (TTDS6503)
Delivered Online, Flexible Dates
Price on Enquiry

From Data to Insights with Google Cloud Platform

By Nexus Human

Duration: 3 Days (18 CPD hours)

Audience: Data analysts, business analysts, and business intelligence professionals, as well as cloud data engineers who will be partnering with data analysts to build scalable data solutions on Google Cloud Platform. To get the most out of this course, we recommend participants have some proficiency with ANSI SQL.

Overview: This course teaches students the following skills:
• Derive insights from data using the analysis and visualization tools on Google Cloud Platform
• Interactively query datasets using Google BigQuery
• Load, clean, and transform data at scale
• Visualize data using Google Data Studio and other third-party platforms
• Distinguish between exploratory and explanatory analytics and when to use each approach
• Explore new datasets and uncover hidden insights quickly and effectively
• Optimize data models and queries for price and performance

Want to know how to query and process petabytes of data in seconds? Curious about data analysis that scales automatically as your data grows? Welcome to the Data Insights course! This four-course accelerated online specialization teaches participants how to derive insights through data analysis and visualization using the Google Cloud Platform. The courses feature interactive scenarios and hands-on labs where participants explore, mine, load, visualize, and extract insights from diverse Google BigQuery datasets. The courses also cover data loading, querying, schema modeling, optimizing performance, query pricing, and data visualization.
Outline:
• Introduction to Data on the Google Cloud Platform: analytics challenges faced by data analysts; Big Data on-premises vs. on the cloud; real-world use cases of companies transformed through analytics on the cloud; Google Cloud Platform project basics; Lab: Getting started with Google Cloud Platform
• Big Data Tools Overview: data analyst tasks, challenges, and Google Cloud Platform data tools; Demo: Analyze 10 billion records with Google BigQuery; nine fundamental Google BigQuery features; comparing GCP tools for analysts, data scientists, and data engineers; Lab: Exploring datasets with Google BigQuery
• Exploring Your Data with SQL: common data exploration techniques; coding high-quality standard SQL; Google BigQuery public datasets; visualization preview: Google Data Studio; Lab: Troubleshoot common SQL errors
• Google BigQuery Pricing: walkthrough of a BigQuery job; calculating BigQuery pricing (storage, querying, and streaming costs); optimizing queries for cost; Lab: Calculate Google BigQuery pricing
• Cleaning and Transforming Your Data: the five principles of dataset integrity; characterizing dataset shape and skew; cleaning and transforming data using SQL; cleaning and transforming data using a new UI, Cloud Dataprep; Lab: Explore and shape data with Cloud Dataprep
• Storing and Exporting Data: permanent vs. temporary tables; saving and exporting query results; performance preview: query cache; Lab: Creating new permanent tables
• Ingesting New Datasets into Google BigQuery: querying from external data sources; avoiding data ingestion pitfalls; ingesting new data into permanent tables; streaming inserts; Lab: Ingesting and querying new datasets
• Data Visualization: data visualization principles; exploratory vs. explanatory analysis approaches; Demo: Google Data Studio UI; connecting Google Data Studio to Google BigQuery; Lab: Exploring a dataset in Google Data Studio
• Joining and Merging Datasets: merging historical data tables with UNION; table wildcards for easy merges; reviewing data schemas (linking data across multiple tables); JOIN examples and pitfalls; Lab: Join and union data from multiple tables
• Advanced Functions and Clauses: SQL CASE statements; analytical window functions; safeguarding data with one-way field encryption; effective subquery and CTE design; comparing SQL and JavaScript UDFs; Lab: Deriving insights with advanced SQL functions
• Schema Design and Nested Data Structures: Google BigQuery vs. traditional RDBMS data architecture; normalization vs. denormalization performance trade-offs; schema review (the good, the bad, and the ugly); arrays and nested data in Google BigQuery; Lab: Querying nested and repeated data
• More Visualization with Google Data Studio: CASE statements and calculated fields; avoiding performance pitfalls with cache considerations; sharing dashboards and data access considerations
• Optimizing for Performance: avoiding Google BigQuery performance pitfalls; preventing hotspots in your data; diagnosing performance issues with the query explanation map; Lab: Optimizing and troubleshooting query performance
• Advanced Insights: introducing Cloud Datalab; Cloud Datalab notebooks and cells; benefits of Cloud Datalab
• Data Access: comparing IAM and BigQuery dataset roles; avoiding access pitfalls; reviewing members, roles, organizations, account administration, and service accounts
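As a taste of the interactive querying the labs practice, here is a minimal sketch (not part of the course materials) using the google-cloud-bigquery Python client against one of BigQuery's public datasets; the credentials setup and the chosen dataset are assumptions.

```python
# Illustrative sketch only: query a BigQuery public dataset from Python.
# Assumes `pip install google-cloud-bigquery` and application default
# credentials for a GCP project (e.g. via `gcloud auth application-default login`).
from google.cloud import bigquery

client = bigquery.Client()  # uses your default project and credentials

# Standard SQL against a public dataset; LIMIT keeps the result set small.
sql = """
    SELECT name, SUM(number) AS total
    FROM `bigquery-public-data.usa_names.usa_1910_2013`
    WHERE state = 'TX'
    GROUP BY name
    ORDER BY total DESC
    LIMIT 10
"""

for row in client.query(sql).result():  # runs the job and waits for it
    print(f"{row.name}: {row.total}")
```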

From Data to Insights with Google Cloud Platform
Delivered Online, Flexible Dates
Price on Enquiry

Cloudera Training for Apache HBase

By Nexus Human

Duration: 4 Days (24 CPD hours)

Audience: This course is appropriate for developers and administrators who intend to use HBase.

Overview: Skills learned on the course include:
• The use cases and usage occasions for HBase, Hadoop, and RDBMS
• Using the HBase shell to directly manipulate HBase tables
• Designing optimal HBase schemas for efficient data storage and recovery
• How to connect to HBase using the Java API, configure the HBase cluster, and administer an HBase cluster
• Best practices for identifying and resolving performance bottlenecks

Cloudera University's four-day training course for Apache HBase enables participants to store and access massive quantities of multi-structured data and perform hundreds of thousands of operations per second.

Outline:
• Introduction to Hadoop & HBase: what is Big Data?; introducing Hadoop; Hadoop components; what is HBase?; why use HBase?; strengths of HBase; HBase in production; weaknesses of HBase
• HBase Tables: HBase concepts; HBase table fundamentals; thinking about table design
• The HBase Shell: creating tables with the HBase shell; working with tables; working with table data
• HBase Architecture Fundamentals: HBase regions; HBase cluster architecture; HBase and HDFS data locality
• HBase Schema Design: general design considerations; application-centric design; designing HBase row keys; other HBase table features
• Basic Data Access with the HBase API: options to access HBase data; creating and deleting HBase tables; retrieving data with Get; retrieving data with Scan; inserting and updating data; deleting data
• More Advanced HBase API Features: filtering scans; best practices; HBase coprocessors
• HBase on the Cluster: how HBase uses HDFS; compactions and splits
• HBase Reads & Writes: how HBase writes data; how HBase reads data; block caches for reading
• HBase Performance Tuning: column family considerations; schema design considerations; configuring for caching; dealing with time series and sequential data; pre-splitting regions
• HBase Administration and Cluster Management: HBase daemons; ZooKeeper considerations; HBase high availability; using the HBase balancer; fixing tables with hbck; HBase security
• HBase Replication & Backup: HBase replication; HBase backup; MapReduce and HBase clusters
• Using Hive & Impala with HBase
• Appendix A: Accessing Data with Python and Thrift: Thrift usage; working with tables; getting and putting data; scanning data; deleting data; counters; filters
• Appendix B: OpenTSDB
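In the spirit of Appendix A (Python and Thrift), here is a minimal sketch, not from the course materials, of basic HBase access with the happybase library; the host name and the 'users' table with a 'cf' column family are hypothetical.

```python
# Illustrative sketch only: basic HBase access from Python over Thrift.
# Assumes `pip install happybase` and an HBase Thrift server reachable
# at thrift-host:9090 (hypothetical host).
import happybase

connection = happybase.Connection('thrift-host', port=9090)
table = connection.table('users')  # assumes a 'users' table with a 'cf' family

# Put: column names are 'family:qualifier'; values are bytes.
table.put(b'row-1', {b'cf:name': b'Ada', b'cf:city': b'London'})

# Get: fetch a single row by key.
print(table.row(b'row-1'))

# Scan: iterate rows matching a key prefix.
for key, data in table.scan(row_prefix=b'row-'):
    print(key, data)

connection.close()
```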

Cloudera Training for Apache HBase
Delivered Online, Flexible Dates
Price on Enquiry

Designing and Building Big Data Applications

By Nexus Human

Duration: 4 Days (24 CPD hours)

Audience: This course is best suited to developers, engineers, and architects who want to use Hadoop and related tools to solve real-world problems.

Overview: Skills learned in this course include:
• Creating a data set with the Kite SDK
• Developing custom Flume components for data ingestion
• Managing a multi-stage workflow with Oozie
• Analyzing data with Crunch
• Writing user-defined functions for Hive and Impala
• Indexing data with Cloudera Search

Cloudera University's four-day course for designing and building Big Data applications prepares you to analyze and solve real-world problems using Apache Hadoop and associated tools in the enterprise data hub (EDH).

Outline:
• Introduction
• Application Architecture: scenario explanation; understanding the development environment; identifying and collecting input data; selecting tools for data processing and analysis; presenting results to the user
• Defining & Using Datasets: metadata management; what is Apache Avro?; Avro schemas; Avro schema evolution; selecting a file format; performance considerations
• Using the Kite SDK Data Module: what is the Kite SDK?; fundamental data module concepts; creating new data sets using the Kite SDK; loading, accessing, and deleting a data set
• Importing Relational Data with Apache Sqoop: what is Apache Sqoop?; basic imports; limiting results; improving Sqoop's performance; Sqoop 2
• Capturing Data with Apache Flume: what is Apache Flume?; basic Flume architecture; Flume sources; Flume sinks; Flume configuration; logging application events to Hadoop
• Developing Custom Flume Components: Flume data flow and common extension points; custom Flume sources; developing a Flume pollable source; developing a Flume event-driven source; custom Flume interceptors; developing a header-modifying Flume interceptor; developing a filtering Flume interceptor; writing Avro objects with a custom Flume interceptor
• Managing Workflows with Apache Oozie: the need for workflow management; what is Apache Oozie?; defining an Oozie workflow; validation, packaging, and deployment; running and tracking workflows using the CLI; Hue UI for Oozie
• Processing Data Pipelines with Apache Crunch: what is Apache Crunch?; understanding the Crunch pipeline; comparing Crunch to Java MapReduce; working with Crunch projects; reading and writing data in Crunch; data collection API functions; utility classes in the Crunch API
• Working with Tables in Apache Hive: what is Apache Hive?; accessing Hive; basic query syntax; creating and populating Hive tables; how Hive reads data; using the RegexSerDe in Hive
• Developing User-Defined Functions: what are user-defined functions?; implementing a user-defined function; deploying custom libraries in Hive; registering a user-defined function in Hive
• Executing Interactive Queries with Impala: what is Impala?; comparing Hive to Impala; running queries in Impala; support for user-defined functions; data and metadata management
• Understanding Cloudera Search: what is Cloudera Search?; search architecture; supported document formats
• Indexing Data with Cloudera Search: collection and schema management; morphlines; indexing data in batch mode; indexing data in near real time
• Presenting Results to Users: Solr query syntax; building a search UI with Hue; accessing Impala through JDBC; powering a custom web application with Impala and Search
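Because the dataset modules lean heavily on Apache Avro, here is a minimal sketch, not from the course materials, of defining an Avro schema and round-tripping records; it uses the Python fastavro package for brevity, whereas the course itself works with Java tooling.

```python
# Illustrative sketch only: define an Avro schema and round-trip records.
# Assumes `pip install fastavro`.
from io import BytesIO
from fastavro import parse_schema, reader, writer

# An Avro schema is plain JSON; adding a field with a default later is
# the kind of compatible change "Avro schema evolution" refers to.
schema = parse_schema({
    "type": "record",
    "name": "WebEvent",
    "fields": [
        {"name": "user_id", "type": "long"},
        {"name": "url", "type": "string"},
        {"name": "referrer", "type": ["null", "string"], "default": None},
    ],
})

buf = BytesIO()
writer(buf, schema, [{"user_id": 42, "url": "/home", "referrer": None}])

buf.seek(0)
for record in reader(buf):
    print(record)
```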

Designing and Building Big Data Applications
Delivered Online, Flexible Dates
Price on Enquiry

Mastering Scala with Apache Spark for the Modern Data Enterprise (TTSK7520)

By Nexus Human

Duration: 5 Days (30 CPD hours)

Audience: This intermediate-and-beyond level course is geared for experienced technical professionals in various roles, such as developers, data analysts, data engineers, software engineers, and machine learning engineers, who want to leverage Scala and Spark to tackle complex data challenges and develop scalable, high-performance applications across diverse domains. Practical programming experience is required to participate in the hands-on labs.

Overview: Working in a hands-on learning environment led by our expert instructor, you'll:
• Develop a basic understanding of Scala and Apache Spark fundamentals, enabling you to confidently create scalable and high-performance applications
• Learn how to process large datasets efficiently, helping you handle complex data challenges and make data-driven decisions
• Gain hands-on experience with real-time data streaming, allowing you to manage and analyze data as it flows into your applications
• Acquire practical knowledge of machine learning algorithms using Spark MLlib, empowering you to create intelligent applications and uncover hidden insights
• Master graph processing with GraphX, enabling you to analyze and visualize complex relationships in your data
• Discover generative AI technologies using GPT with Spark and Scala, opening up new possibilities for automating content generation and enhancing data analysis

Embark on a journey to master the world of big data with our immersive course on Scala and Spark! Mastering Scala with Apache Spark for the Modern Data Enterprise is a five-day hands-on course designed to provide you with the essential skills and tools to tackle complex data projects using the Scala programming language and Apache Spark, a high-performance data processing engine. Mastering these technologies will enable you to perform a wide range of tasks, from data wrangling and analytics to machine learning and artificial intelligence, across various industries and applications. Guided by our expert instructor, you'll explore the fundamentals of Scala programming and Apache Spark while gaining valuable hands-on experience with Spark programming, RDDs, DataFrames, Spark SQL, and data sources. You'll also explore Spark Streaming, performance optimization techniques, and the integration of popular external libraries, tools, and cloud platforms like AWS, Azure, and GCP. Machine learning enthusiasts will delve into Spark MLlib, covering the basics of machine learning algorithms, data preparation, feature extraction, and various techniques such as regression, classification, clustering, and recommendation systems.

Outline:
• Introduction to Scala: brief history and motivation; differences between Scala and Java; basic Scala syntax and constructs; Scala's functional programming features
• Introduction to Apache Spark: overview and history; Spark components and architecture; Spark ecosystem; comparing Spark with other big data frameworks
• Basics of Spark Programming: SparkContext and SparkSession; Resilient Distributed Datasets (RDDs); transformations and actions; working with DataFrames
• Spark SQL and Data Sources: the Spark SQL library and its advantages; structured and semi-structured data sources; reading and writing data in various formats (CSV, JSON, Parquet, Avro, etc.); data manipulation using SQL queries
• Basic RDD Operations: creating and manipulating RDDs; common transformations and actions on RDDs; working with key-value data
• Basic DataFrame and Dataset Operations: creating and manipulating DataFrames and Datasets; column operations and functions; filtering, sorting, and aggregating data
• Introduction to Spark Streaming: overview of Spark Streaming; Discretized Stream (DStream) operations; windowed operations and stateful processing
• Performance Optimization Basics: best practices for efficient Spark code; broadcast variables and accumulators; monitoring Spark applications
• Integrating External Libraries and Tools: using popular external libraries, such as Hadoop and HBase; integrating with cloud platforms (AWS, Azure, GCP); connecting to data storage systems (HDFS, S3, Cassandra, etc.)
• Introduction to Machine Learning Basics: overview of machine learning; supervised and unsupervised learning; common algorithms and use cases
• Introduction to Spark MLlib: overview of Spark MLlib; MLlib's algorithms and utilities; data preparation and feature extraction
• Linear Regression and Classification: the linear regression algorithm; logistic regression for classification; model evaluation and performance metrics
• Clustering Algorithms: overview of clustering algorithms; k-means clustering; model evaluation and performance metrics
• Collaborative Filtering and Recommendation Systems: overview of recommendation systems; collaborative filtering techniques; implementing recommendations with Spark MLlib
• Introduction to Graph Processing: overview of graph processing; use cases and applications of graph processing; graph representations and operations
• Introduction to Spark GraphX: overview of GraphX; creating and transforming graphs; graph algorithms in GraphX
• Big Data Innovation! Using GPT and Generative AI Technologies with Spark and Scala: overview of generative AI technologies; integrating GPT with Spark and Scala; practical applications and use cases
• Bonus Topics / Time Permitting: Introduction to Spark NLP (overview of Spark NLP; preprocessing text data; text classification and sentiment analysis)
• Putting It All Together: work on a capstone project that integrates multiple aspects of the course, including data processing, machine learning, graph processing, and generative AI technologies
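As a flavor of the DataFrame operations covered above: the course works in Scala, but to keep the examples on this page in one language, here is the equivalent read-filter-aggregate workflow sketched in PySpark; the input file and column names are hypothetical.

```python
# Illustrative sketch only (the course itself works in Scala; this is the
# equivalent DataFrame workflow in PySpark). Assumes `pip install pyspark`
# and a hypothetical events.json file.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("dataframe-basics").getOrCreate()

# Read a semi-structured source, then filter, aggregate, and sort: the
# "transformations and actions" pattern from the Spark programming module.
events = spark.read.json("events.json")  # hypothetical input path
daily = (
    events
    .filter(F.col("status") == "ok")
    .groupBy("date")
    .agg(F.count("*").alias("events"), F.avg("latency_ms").alias("avg_latency"))
    .orderBy("date")
)
daily.show()

spark.stop()
```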

Mastering Scala with Apache Spark for the Modern Data Enterprise (TTSK7520)
Delivered Online, Flexible Dates
Price on Enquiry

Cloudera Data Scientist Training

By Nexus Human

Duration: 4 Days (24 CPD hours)

Audience: The workshop is designed for data scientists who currently use Python or R to work with smaller datasets on a single machine and who need to scale up their analyses and machine learning models to large datasets on distributed clusters. Data engineers and developers with some knowledge of data science and machine learning may also find this workshop useful.

This workshop covers data science and machine learning workflows at scale using Apache Spark 2 and other key components of the Hadoop ecosystem, with an emphasis on the use of data science and machine learning methods to address real-world business challenges. Using scenarios and datasets from a fictional technology company, students discover insights to support critical business decisions and develop data products to transform the business. The material is presented through a sequence of brief lectures, interactive demonstrations, extensive hands-on exercises, and discussions. The Apache Spark demonstrations and exercises are conducted in Python (with PySpark) and R (with sparklyr) using the Cloudera Data Science Workbench (CDSW) environment; the key technologies covered are Spark, Spark SQL, and Spark MLlib; PySpark and sparklyr; CDSW; and Hue.

Overview:
• Overview of data science and machine learning at scale
• Overview of the Hadoop ecosystem
• Working with HDFS data and Hive tables using Hue
• Introduction to Cloudera Data Science Workbench
• Overview of Apache Spark 2
• Reading and writing data
• Inspecting data quality
• Cleansing and transforming data
• Summarizing and grouping data
• Combining, splitting, and reshaping data
• Exploring data
• Configuring, monitoring, and troubleshooting Spark applications
• Overview of machine learning in Spark MLlib
• Extracting, transforming, and selecting features
• Building and evaluating regression models
• Building and evaluating classification models
• Building and evaluating clustering models
• Cross-validating models and tuning hyperparameters
• Building machine learning pipelines
• Deploying machine learning models

Additional course details: Nexus Humans Cloudera Data Scientist Training training program is a workshop that presents an invigorating mix of sessions, lessons, and masterclasses meticulously crafted to propel your learning expedition forward.
This immersive bootcamp-style experience boasts interactive lectures, hands-on labs, and collaborative hackathons, all strategically designed to fortify fundamental concepts. Guided by seasoned coaches, each session offers priceless insights and practical skills crucial for honing your expertise. Whether you're just stepping into the realm of professional skills or are a seasoned professional, this comprehensive course ensures you're equipped with the knowledge and prowess necessary for success. While we feel this is the best course for the Cloudera Data Scientist Training course and one of our Top 10, we encourage you to read the course outline to make sure it is the right content for you. Additionally, private sessions, closed classes, or dedicated events are available both live online and at our training centres in Dublin and London, as well as at your offices anywhere in the UK, Ireland, or across EMEA.
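To illustrate the pipeline-building portion of the workshop, here is a minimal PySpark MLlib sketch, not from the course materials; the dataset path and column names are hypothetical.

```python
# Illustrative sketch only: a minimal Spark MLlib pipeline of the kind the
# workshop builds, shown in PySpark. Assumes `pip install pyspark`.
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.regression import LinearRegression
from pyspark.ml.evaluation import RegressionEvaluator

spark = SparkSession.builder.appName("mllib-pipeline").getOrCreate()

df = spark.read.parquet("usage.parquet")  # hypothetical dataset
train, test = df.randomSplit([0.8, 0.2], seed=42)

# Assemble feature columns into a vector, then fit a regression model.
pipeline = Pipeline(stages=[
    VectorAssembler(inputCols=["sessions", "minutes"], outputCol="features"),
    LinearRegression(featuresCol="features", labelCol="spend"),
])
model = pipeline.fit(train)

# Evaluate on the held-out split.
rmse = RegressionEvaluator(labelCol="spend", metricName="rmse") \
    .evaluate(model.transform(test))
print(f"test RMSE: {rmse:.3f}")

spark.stop()
```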

Cloudera Data Scientist Training
Delivered Online, Flexible Dates
Price on Enquiry

Hands-on Data Analysis with Pandas (TTPS4878)

By Nexus Human

Duration: 3 Days (18 CPD hours)

Audience: This course is geared for Python-experienced attendees who wish to be equipped with the skills needed to use pandas to ensure the veracity of their data, visualize it for effective decision-making, and reliably reproduce analyses across multiple datasets.

Overview: Working in a hands-on learning environment, guided by our expert team, attendees will learn to:
• Understand how data analysts and scientists gather and analyze data
• Perform data analysis and data wrangling using Python
• Combine, group, and aggregate data from multiple sources
• Create data visualizations with pandas, matplotlib, and seaborn
• Apply machine learning (ML) algorithms to identify patterns and make predictions
• Use Python data science libraries to analyze real-world datasets
• Use pandas to solve common data representation and analysis problems
• Build Python scripts, modules, and packages for reusable analysis code
• Perform efficient data analysis and manipulation tasks using pandas
• Apply pandas to different real-world domains with the help of step-by-step demonstrations
• Get accustomed to using pandas as an effective data exploration tool

Data analysis has become a necessary skill in a variety of domains where knowing how to work with data and extract insights can generate significant value. Geared for data team members with incoming Python scripting experience, Hands-On Data Analysis with Pandas will show you how to analyze your data, get started with machine learning, and work effectively with the Python libraries often used for data science, such as pandas, NumPy, matplotlib, seaborn, and scikit-learn. Using real-world datasets, you will learn how to use the powerful pandas library to perform data wrangling to reshape, clean, and aggregate your data. Then, you will be able to conduct exploratory data analysis by calculating summary statistics and visualizing the data to find patterns. In the concluding lessons, you will explore some applications of anomaly detection, regression, clustering, and classification using scikit-learn to make predictions based on past data. Students will leave the course armed with the skills required to use pandas to ensure the veracity of their data, visualize it for effective decision-making, and reliably reproduce analyses across multiple datasets.

Outline:
• Introduction to Data Analysis: fundamentals of data analysis; statistical foundations; setting up a virtual environment
• Working with Pandas DataFrames: pandas data structures; bringing data into a pandas DataFrame; inspecting a DataFrame object; grabbing subsets of the data; adding and removing data
• Data Wrangling with Pandas: what is data wrangling?; collecting temperature data; cleaning up the data; restructuring the data; handling duplicate, missing, or invalid data
• Aggregating Pandas DataFrames: database-style operations on DataFrames; DataFrame operations; aggregations with pandas and NumPy; time series
• Visualizing Data with Pandas and Matplotlib: an introduction to matplotlib; plotting with pandas; the pandas.plotting subpackage
• Plotting with Seaborn and Customization Techniques: utilizing seaborn for advanced plotting; formatting; customizing visualizations
• Financial Analysis - Bitcoin and the Stock Market: building a Python package; data extraction with pandas; exploratory data analysis; technical analysis of financial instruments; modeling performance
• Rule-Based Anomaly Detection: simulating login attempts; exploratory data analysis; rule-based anomaly detection
• Getting Started with Machine Learning in Python: learning the lingo; exploratory data analysis; preprocessing data; clustering; regression; classification
• Making Better Predictions - Optimizing Models: hyperparameter tuning with grid search; feature engineering; ensemble methods; inspecting classification prediction confidence; addressing class imbalance; regularization
• Machine Learning Anomaly Detection: exploring the data; unsupervised methods; supervised methods; online learning
• The Road Ahead: data resources; practicing working with data; Python practice
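As a flavor of the wrangling pattern the course teaches (deduplicate, fill missing values, aggregate, reshape), here is a minimal pandas sketch on a tiny made-up temperature table; it is illustrative only, not course material.

```python
# Illustrative sketch only: reshape/clean/aggregate on a small hypothetical
# temperature dataset. Assumes `pip install pandas`.
import pandas as pd

df = pd.DataFrame({
    "station": ["A", "A", "B", "B", "B"],
    "date": pd.to_datetime(
        ["2023-01-01", "2023-01-02", "2023-01-01", "2023-01-02", "2023-01-02"]),
    "temp_c": [3.1, None, 4.0, 4.4, 4.4],
})

clean = (
    df.drop_duplicates()  # handle duplicate rows
      .assign(temp_c=lambda d: d["temp_c"].fillna(d["temp_c"].mean()))  # fill missing
)

# Aggregate per station and day, then reshape to wide form.
daily = clean.groupby(["station", "date"], as_index=False)["temp_c"].mean()
wide = daily.pivot(index="date", columns="station", values="temp_c")
print(wide)
```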

Hands-on Data Analysis with Pandas (TTPS4878)
Delivered Online, Flexible Dates
Price on Enquiry

Big Data Architecture Workshop

By Nexus Human

Duration: 3 Days (18 CPD hours)

Audience:
• Senior executives
• CIOs and CTOs
• Business intelligence executives
• Marketing executives
• Data & business analytics specialists
• Innovation specialists & entrepreneurs
• Academics and other people interested in Big Data

Overview: More specifically, BDAW addresses advanced big data architecture topics, including data formats; transformation; real-time, batch, and machine learning processing; scalability; fault tolerance; security and privacy; and minimizing the risk of an unsound architecture and technology selection.

Big Data Architecture Workshop (BDAW) is a learning event that addresses advanced big data architecture topics. BDAW brings together technical contributors into a group setting to design and architect solutions to a challenging business problem. The workshop addresses big data architecture problems in general, and then applies them to the design of a challenging system. Throughout the highly interactive workshop, students apply concepts to real-world examples, resulting in detailed synergistic discussions. The workshop is conducive for students to learn techniques for architecting big data systems, not only from Cloudera's experience but also from the experiences of fellow students.

Outline:
• Workshop Application Use Cases: Oz Metropolitan; architectural questions; team activity: analyze Metroz application use cases
• Application Vertical Slice: definition; minimizing the risk of an unsound architecture; selecting a vertical slice; team activity: identify an initial vertical slice for Metroz
• Application Processing: real-time and near-real-time processing; batch processing; data access patterns; delivery and processing guarantees; machine learning pipelines; team activity: identify delivery and processing patterns in Metroz, characterize response time requirements, identify machine learning pipelines
• Application Data: the three V's of Big Data; data lifecycle; data formats; transforming data; team activity: Metroz data requirements
• Scalable Applications: scale up, scale out, scale to X; determining if an application will scale; poll: scalable airport terminal designs; Hadoop and Spark scalability; team activity: scaling Metroz
• Fault Tolerant Distributed Systems: principles; transparency; hardware vs. software redundancy; tolerating disasters; stateless functional fault tolerance; stateful fault tolerance; replication and group consistency; fault tolerance in Spark and MapReduce; application tolerance for failures; team activity: identify Metroz component failures and requirements
• Security and Privacy: principles; privacy; threats; technologies; team activity: identify threats and security mechanisms in Metroz
• Deployment: cluster sizing and evolution; on-premise vs. cloud; edge computing; team activity: select deployment for Metroz
• Technology Selection: HDFS; HBase; Kudu; relational database management systems; MapReduce; Spark, including streaming, SparkSQL and SparkML; Hive; Impala; Cloudera Search; data sets and formats; team activity: technologies relevant to Metroz
• Software Architecture: architecture artifacts; one platform or multiple, lambda architecture; team activity: produce a high-level architecture, selected technologies, revisit the vertical slice; vertical slice demonstration

Additional course details: Nexus Humans Big Data Architecture Workshop training program is a workshop that presents an invigorating mix of sessions, lessons, and masterclasses meticulously crafted to propel your learning expedition forward.
This immersive bootcamp-style experience boasts interactive lectures, hands-on labs, and collaborative hackathons, all strategically designed to fortify fundamental concepts. Guided by seasoned coaches, each session offers priceless insights and practical skills crucial for honing your expertise. Whether you're just stepping into the realm of professional skills or are a seasoned professional, this comprehensive course ensures you're equipped with the knowledge and prowess necessary for success. While we feel this is the best course for the Big Data Architecture Workshop course and one of our Top 10, we encourage you to read the course outline to make sure it is the right content for you. Additionally, private sessions, closed classes, or dedicated events are available both live online and at our training centres in Dublin and London, as well as at your offices anywhere in the UK, Ireland, or across EMEA.
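To make the "one platform or multiple, lambda architecture" discussion concrete, here is a minimal sketch, not from the workshop materials, expressing the same aggregation as a batch job and as a streaming job on a single platform (Spark); the input path and the built-in rate test source standing in for a real feed are assumptions.

```python
# Illustrative sketch only: the same aggregation as batch and as streaming
# on one platform. Paths are hypothetical. Assumes `pip install pyspark`.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("batch-vs-stream").getOrCreate()

# Batch view: recompute totals over the full historical dataset.
batch = (spark.read.json("events/")  # hypothetical historical store
         .groupBy("sensor").agg(F.count("*").alias("events")))
batch.show()

# Streaming view: the same logic over a live source; Spark's built-in
# "rate" test source stands in here for Kafka, Flume, or similar.
stream = (spark.readStream.format("rate").option("rowsPerSecond", 5).load()
          .withColumn("sensor", F.col("value") % 3)
          .groupBy("sensor").agg(F.count("*").alias("events")))

query = (stream.writeStream.outputMode("complete")
         .format("console").start())
query.awaitTermination(10)  # run briefly for demonstration, then exit
spark.stop()
```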

Big Data Architecture Workshop
Delivered Online, Flexible Dates
Price on Enquiry

Data Wrangling with Python

By Nexus Human

Duration: 3 Days (18 CPD hours)

Data Wrangling with Python takes a practical approach to equip beginners with the most essential data analysis tools in the shortest possible time. It contains multiple activities that use real-life business scenarios for you to practice and apply your new skills in a highly relevant context.

Overview: By the end of this course, you will be confident in using a diverse array of sources to extract, clean, transform, and format your data efficiently.

In this course you will start with the absolute basics of Python, focusing mainly on data structures. Then you will delve into the fundamental tools of data wrangling, the NumPy and pandas libraries. You'll explore useful insights into why you should stay away from traditional ways of data cleaning, as done in other languages, and take advantage of the specialized pre-built routines in Python. This combination of Python tips and tricks will also demonstrate how to use the same Python backend to extract and transform data from an array of sources, including the Internet, large database vaults, and Excel financial tables. To help you prepare for more challenging scenarios, you'll cover how to handle missing or wrong data and reformat it based on the requirements from the downstream analytics tool. The course will further help you grasp concepts through real-world examples and datasets.

Outline:
• Introduction to Data Structures using Python: Python for data wrangling; lists, sets, strings, tuples, and dictionaries
• Advanced Operations on Built-In Data Structures: advanced data structures; basic file operations in Python
• Introduction to NumPy, Pandas, and Matplotlib: NumPy arrays; pandas DataFrames; statistics and visualization with NumPy and pandas; using NumPy and pandas to calculate basic descriptive statistics on a DataFrame
• Deep Dive into Data Wrangling with Python: subsetting, filtering, and grouping; detecting outliers and handling missing values; concatenating, merging, and joining; useful methods of pandas
• Get Comfortable with Different Kinds of Data Sources: reading data from different text-based (and non-text-based) sources; introduction to BeautifulSoup4 and web page parsing
• Learning the Hidden Secrets of Data Wrangling: advanced list comprehension and the zip function; data formatting
• Advanced Web Scraping and Data Gathering: basics of web scraping and the BeautifulSoup library; reading data from XML
• RDBMS and SQL: refresher of RDBMS and SQL; using an RDBMS (MySQL/PostgreSQL/SQLite)
• Application in Real Life and Conclusion: applying your knowledge to a real-life data wrangling task; an extension to data wrangling
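As a flavor of the multi-source extraction the course covers, here is a minimal sketch, not from the course materials, pulling the same records into pandas from an HTML snippet (via BeautifulSoup4) and from an in-memory SQLite database.

```python
# Illustrative sketch only: one pandas backend, two data sources.
# Assumes `pip install pandas beautifulsoup4`.
import sqlite3
import pandas as pd
from bs4 import BeautifulSoup

html = """<table>
  <tr><td>Alice</td><td>34</td></tr>
  <tr><td>Bob</td><td>29</td></tr>
</table>"""

# Parse rows out of markup into a DataFrame.
soup = BeautifulSoup(html, "html.parser")
rows = [[td.get_text() for td in tr.find_all("td")] for tr in soup.find_all("tr")]
web_df = pd.DataFrame(rows, columns=["name", "age"]).astype({"age": int})

# Load the same shape of data from an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, age INTEGER)")
conn.executemany("INSERT INTO people VALUES (?, ?)", [("Alice", 34), ("Bob", 29)])
sql_df = pd.read_sql("SELECT * FROM people", conn)

# Combine and deduplicate across sources.
print(pd.concat([web_df, sql_df]).drop_duplicates())
```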

Data Wrangling with Python
Delivered Online, Flexible Dates
Price on Enquiry

Data Science for Marketing Analytics

By Nexus Human

Duration: 3 Days (18 CPD hours)

Audience: Data Science for Marketing Analytics is designed for developers and marketing analysts looking to use new, more sophisticated tools in their marketing analytics efforts. It will help if you have prior experience of coding in Python and knowledge of high-school-level mathematics. Some experience with databases, Excel, statistics, or Tableau is useful but not necessary.

Overview: By the end of this course, you will be able to build your own marketing reporting and interactive dashboard solutions.

The course starts by teaching you how to use Python libraries, such as pandas and Matplotlib, to read data from Python, manipulate it, and create plots using both categorical and continuous variables. Then, you'll learn how to segment a population into groups and use different clustering techniques to evaluate customer segmentation. As you make your way through the course, you'll explore ways to evaluate and select the best segmentation approach, and go on to create a linear regression model on customer value data to predict lifetime value. In the concluding sections, you'll gain an understanding of regression techniques and tools for evaluating regression models, and explore ways to predict customer choice using classification algorithms. Finally, you'll apply these techniques to create a churn model for modeling customer product choices.

Outline:
• Data Preparation and Cleaning: data models and structured data; pandas; data manipulation
• Data Exploration and Visualization: identifying the right attributes; generating targeted insights; visualizing data
• Unsupervised Learning, Customer Segmentation: customer segmentation methods; similarity and data standardization; k-means clustering
• Choosing the Best Segmentation Approach: choosing the number of clusters; different methods of clustering; evaluating clustering
• Predicting Customer Revenue Using Linear Regression: understanding regression; feature engineering for regression; performing and interpreting linear regression
• Other Regression Techniques and Tools for Evaluation: evaluating the accuracy of a regression model; using regularization for feature selection; tree-based regression models
• Supervised Learning, Predicting Customer Churn: classification problems; understanding logistic regression; creating a data science pipeline; fine-tuning classification algorithms; support vector machines; decision trees; random forests; preprocessing data for machine learning models; model evaluation; performance metrics
• Modeling Customer Choice: understanding multiclass classification; class-imbalanced data

Additional course details: Nexus Humans Data Science for Marketing Analytics training program is a workshop that presents an invigorating mix of sessions, lessons, and masterclasses meticulously crafted to propel your learning expedition forward. This immersive bootcamp-style experience boasts interactive lectures, hands-on labs, and collaborative hackathons, all strategically designed to fortify fundamental concepts. Guided by seasoned coaches, each session offers priceless insights and practical skills crucial for honing your expertise. Whether you're just stepping into the realm of professional skills or are a seasoned professional, this comprehensive course ensures you're equipped with the knowledge and prowess necessary for success. While we feel this is the best course for the Data Science for Marketing Analytics course and one of our Top 10, we encourage you to read the course outline to make sure it is the right content for you. Additionally, private sessions, closed classes, or dedicated events are available both live online and at our training centres in Dublin and London, as well as at your offices anywhere in the UK, Ireland, or across EMEA.
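To make the segmentation step concrete, here is a minimal scikit-learn sketch, not from the course materials, of the standardize-then-cluster approach on a tiny hypothetical customer table.

```python
# Illustrative sketch only: standardize features, then segment customers
# with k-means. Assumes `pip install pandas scikit-learn`.
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

customers = pd.DataFrame({
    "annual_spend": [120, 150, 900, 950, 40, 60],
    "orders_per_year": [4, 5, 24, 30, 1, 2],
})

# Standardize so spend (large scale) doesn't dominate the distance metric.
X = StandardScaler().fit_transform(customers)

# Assign each customer to one of three segments.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
customers["segment"] = kmeans.fit_predict(X)
print(customers.sort_values("segment"))
```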

Data Science for Marketing Analytics
Delivered Online, Flexible Dates
Price on Enquiry