• Professional Development
  • Medicine & Nursing
  • Arts & Crafts
  • Health & Wellbeing
  • Personal Development

418 Courses in Cardiff delivered Online

Data Analysis Basics

By Compete High

Overview With the ever-increasing demand for Data Analysis in personal & professional settings, this online training aims at educating, nurturing, and upskilling individuals to stay ahead of the curve - whatever their level of expertise in Data Analysis may be. Learning about Data Analysis or keeping up to date on it can be confusing at times, and maybe even daunting! But that's not the case with this course from Compete High. We understand the different requirements coming with a wide variety of demographics looking to get skilled in Data Analysis. That's why we've developed this online training in a way that caters to learners with different goals in mind. The course materials are prepared with consultation from the experts of this field and all the information on Data Analysis is kept up to date on a regular basis so that learners don't get left behind on the current trends/updates. The self-paced online learning methodology by compete high in this Data Analysis Basics course helps you learn whenever or however you wish, keeping in mind the busy schedule or possible inconveniences that come with physical classes. The easy-to-grasp, bite-sized lessons are proven to be most effective in memorising and learning the lessons by heart. On top of that, you have the opportunity to receive a certificate after successfully completing the course! Instead of searching for hours, enrol right away on this Data Analysis Basics course from Compete High and accelerate your career in the right path with expert-outlined lessons and a guarantee of success in the long run.   Who is this course for? While we refrain from discouraging anyone wanting to do this Data Analysis Basics course or impose any sort of restrictions on doing this online training, people meeting any of the following criteria will benefit the most from it: Anyone looking for the basics of Data Analysis, Jobseekers in the relevant domains, Anyone with a ground knowledge/intermediate expertise in Data Analysis, Anyone looking for a certificate of completion on doing an online training on this topic, Students of Data Analysis, or anyone with an academic knowledge gap to bridge, Anyone with a general interest/curiosity   Career Path This Data Analysis Basics course smoothens the way up your career ladder with all the relevant information, skills, and online certificate of achievements. After successfully completing the course, you can expect to move one significant step closer to achieving your professional goals - whether it's securing that job you desire, getting the promotion you deserve, or setting up that business of your dreams. Course Curriculum Module - 01 - Introduction to Data Analysis its Applications Introduction to Data Analysis its Applications 00:00 Module - 02 - Probability Probability Distributions Probability Probability Distributions 00:00 Module - 03 - Decision making and Factors to Account for Decision making and Factors to Account for 00:00 Module - 04 - Data Mining Data Mining 00:00 Module - 05 - Optimization Situation modelling Optimization Situation modelling 00:00

Data Analysis Basics
Delivered Online On Demand5 hours
£4.99

Maximizing Revenue Through Google Analytics Mastery

By Compete High

🚀 Unlock Your Business Potential: Maximizing Revenue Through Google Analytics Mastery 🚀 Are you ready to revolutionize your business and skyrocket your revenue? Introducing our groundbreaking online course: Maximizing Revenue Through Google Analytics Mastery! 🌐💰 In today's digital age, data is power, and Google Analytics is the key to unlocking unparalleled insights into your online presence. Whether you're a seasoned entrepreneur or just starting your online journey, this course is your roadmap to transforming raw data into actionable strategies that will supercharge your revenue streams.   🌟 Why Google Analytics Mastery? 🔍 Uncover Hidden Opportunities: Learn how to navigate the intricate web of data with ease. Discover untapped markets, identify high-converting channels, and capitalize on opportunities you never knew existed. 💡 Strategic Decision-Making: Translate data into actionable insights. Develop a data-driven mindset that empowers you to make informed decisions, optimize your marketing efforts, and maximize your ROI. 📊 Revenue-Boosting Tactics: Dive deep into advanced analytics techniques. From setting up custom tracking to interpreting user behavior, we'll teach you the tactics that turn casual visitors into loyal customers.   🚀 Course Highlights: ✅ Comprehensive Curriculum: Our expertly crafted modules cover everything from Google Analytics basics to advanced strategies for revenue optimization. ✅ Hands-On Learning: Practical exercises and real-world case studies ensure you apply your newfound knowledge in a meaningful way. ✅ Expert Guidance: Learn from industry experts with a proven track record in leveraging Google Analytics for substantial revenue growth. ✅ Lifetime Access: Enjoy unlimited access to course materials, updates, and a supportive community to enhance your learning experience. ✅ Certificate of Mastery: Showcase your expertise with a certificate upon course completion. Course Curriculum Basic Pre Sell 00:00 Overview 00:00 Navigation And Admin 00:00 Navigation And Admin(2) 00:00 Creating a New Google Analytics Account 00:00 Website Account Creation 00:00 Connecting To WordPress Website 00:00 Connecting To HTML Site 00:00 Connect Custom Page and Site Builders 00:00 Setting Up Annotations 00:00 Setting Up Intelligence Events 00:00 Set Up Custom Segments 00:00 Export Data For Analysis 00:00 Set Up Custom Reports 00:00 Set Up Google Integrations 00:00 Google Analytics Templates 00:00 Real Time Reporting 00:00 Setting Up Goals 00:00 Third Party Integrations 00:00 Audience Menu Overview 00:00 Interests and Geography 00:00 Conclusion 00:00 Advanced

Maximizing Revenue Through Google Analytics Mastery
Delivered Online On Demand1 hour
£4.99

Google Analytics

By Compete High

Overview With the ever-increasing demand for Google Analytics in personal & professional settings, this online training aims at educating, nurturing, and upskilling individuals to stay ahead of the curve - whatever their level of expertise in Google Analytics may be. Learning about Google Analytics or keeping up to date on it can be confusing at times, and maybe even daunting! But that's not the case with this course from Compete High. We understand the different requirements coming with a wide variety of demographics looking to get skilled in Google Analytics . That's why we've developed this online training in a way that caters to learners with different goals in mind. The course materials are prepared with consultation from the experts of this field and all the information on Google Analytics is kept up to date on a regular basis so that learners don't get left behind on the current trends/updates. The self-paced online learning methodology by compete high in this Google Analytics course helps you learn whenever or however you wish, keeping in mind the busy schedule or possible inconveniences that come with physical classes. The easy-to-grasp, bite-sized lessons are proven to be most effective in memorising and learning the lessons by heart. On top of that, you have the opportunity to receive a certificate after successfully completing the course! Instead of searching for hours, enrol right away on this Google Analytics course from Compete High and accelerate your career in the right path with expert-outlined lessons and a guarantee of success in the long run. Who is this course for? While we refrain from discouraging anyone wanting to do this Google Analytics course or impose any sort of restrictions on doing this online training, people meeting any of the following criteria will benefit the most from it: Anyone looking for the basics of Google Analytics , Jobseekers in the relevant domains, Anyone with a ground knowledge/intermediate expertise in Google Analytics , Anyone looking for a certificate of completion on doing an online training on this topic, Students of Google Analytics , or anyone with an academic knowledge gap to bridge, Anyone with a general interest/curiosity Career Path This Google Analytics course smoothens the way up your career ladder with all the relevant information, skills, and online certificate of achievements. After successfully completing the course, you can expect to move one significant step closer to achieving your professional goals - whether it's securing that job you desire, getting the promotion you deserve, or setting up that business of your dreams.  Course Curriculum Module 1 Installing Analytics Installing Analytics 00:00 Module 2 The Five Report Suites The Five Report Suites 00:00 Module 3 Basic Date Range Reports Basic Date Range Reports 00:00 Module 4 Goals Goals 00:00 Module 5 Practical Uses for Analytics Practical Uses for Analytics 00:00

Google Analytics
Delivered Online On Demand5 hours
£4.99

Basics of Google Analytics

By Compete High

Overview With the ever-increasing demand for Google Analytics in personal & professional settings, this online training aims at educating, nurturing, and upskilling individuals to stay ahead of the curve - whatever their level of expertise in Google Analytics may be. Learning about Google Analytics or keeping up to date on it can be confusing at times, and maybe even daunting! But that's not the case with this course from Compete High. We understand the different requirements coming with a wide variety of demographics looking to get skilled in Google Analytics . That's why we've developed this online training in a way that caters to learners with different goals in mind. The course materials are prepared with consultation from the experts of this field and all the information on Google Analytics is kept up to date on a regular basis so that learners don't get left behind on the current trends/updates. The self-paced online learning methodology by compete high in this Google Analytics course helps you learn whenever or however you wish, keeping in mind the busy schedule or possible inconveniences that come with physical classes. The easy-to-grasp, bite-sized lessons are proven to be most effective in memorising and learning the lessons by heart. On top of that, you have the opportunity to receive a certificate after successfully completing the course! Instead of searching for hours, enrol right away on this Google Analytics course from Compete High and accelerate your career in the right path with expert-outlined lessons and a guarantee of success in the long run. Who is this course for? While we refrain from discouraging anyone wanting to do this Google Analytics course or impose any sort of restrictions on doing this online training, people meeting any of the following criteria will benefit the most from it: Anyone looking for the basics of Google Analytics , Jobseekers in the relevant domains, Anyone with a ground knowledge/intermediate expertise in Google Analytics , Anyone looking for a certificate of completion on doing an online training on this topic, Students of Google Analytics , or anyone with an academic knowledge gap to bridge, Anyone with a general interest/curiosity Career Path This Google Analytics course smoothens the way up your career ladder with all the relevant information, skills, and online certificate of achievements. After successfully completing the course, you can expect to move one significant step closer to achieving your professional goals - whether it's securing that job you desire, getting the promotion you deserve, or setting up that business of your dreams.    Course Curriculum Module 1 Installing Analytics Installing Analytics 00:00 Module 2 The Five Report Suites The Five Report Suites 00:00 Module 3 Basic Date Range Reports Basic Date Range Reports 00:00 Module 4 Goals Goals 00:00 Module 5 Practical Uses for Analytics Practical Uses for Analytics 00:00

Basics of Google Analytics
Delivered Online On Demand5 hours
£4.99

Introduction to Hadoop Administration (TTDS6503)

By Nexus Human

Duration 3 Days 18 CPD hours This course is intended for This is an introductory-level course designed to teach experienced systems administrators how to install, maintain, monitor, troubleshoot, optimize, and secure Hadoop. Previous Hadoop experience is not required. Overview Working within in an engaging, hands-on learning environment, guided by our expert team, attendees will learn to: Understand the benefits of distributed computing Understand the Hadoop architecture (including HDFS and MapReduce) Define administrator participation in Big Data projects Plan, implement, and maintain Hadoop clusters Deploy and maintain additional Big Data tools (Pig, Hive, Flume, etc.) Plan, deploy and maintain HBase on a Hadoop cluster Monitor and maintain hundreds of servers Pinpoint performance bottlenecks and fix them Apache Hadoop is an open source framework for creating reliable and distributable compute clusters. Hadoop provides an excellent platform (with other related frameworks) to process large unstructured or semi-structured data sets from multiple sources to dissect, classify, learn from and make suggestions for business analytics, decision support, and other advanced forms of machine intelligence. This is an introductory-level, hands-on lab-intensive course geared for the administrator (new to Hadoop) who is charged with maintaining a Hadoop cluster and its related components. You will learn how to install, maintain, monitor, troubleshoot, optimize, and secure Hadoop. Introduction Hadoop history and concepts Ecosystem Distributions High level architecture Hadoop myths Hadoop challenges (hardware / software) Planning and installation Selecting software and Hadoop distributions Sizing the cluster and planning for growth Selecting hardware and network Rack topology Installation Multi-tenancy Directory structure and logs Benchmarking HDFS operations Concepts (horizontal scaling, replication, data locality, rack awareness) Nodes and daemons (NameNode, Secondary NameNode, HA Standby NameNode, DataNode) Health monitoring Command-line and browser-based administration Adding storage and replacing defective drives MapReduce operations Parallel computing before MapReduce: compare HPC versus Hadoop administration MapReduce cluster loads Nodes and Daemons (JobTracker, TaskTracker) MapReduce UI walk through MapReduce configuration Job config Job schedulers Administrator view of MapReduce best practices Optimizing MapReduce Fool proofing MR: what to tell your programmers YARN: architecture and use Advanced topics Hardware monitoring System software monitoring Hadoop cluster monitoring Adding and removing servers and upgrading Hadoop Backup, recovery, and business continuity planning Cluster configuration tweaks Hardware maintenance schedule Oozie scheduling for administrators Securing your cluster with Kerberos The future of Hadoop

Introduction to Hadoop Administration (TTDS6503)
Delivered OnlineFlexible Dates
Price on Enquiry

From Data to Insights with Google Cloud Platform

By Nexus Human

Duration 3 Days 18 CPD hours This course is intended for Data Analysts, Business Analysts, Business Intelligence professionals Cloud Data Engineers who will be partnering with Data Analysts to build scalable data solutions on Google Cloud Platform Overview This course teaches students the following skills: Derive insights from data using the analysis and visualization tools on Google Cloud Platform Interactively query datasets using Google BigQuery Load, clean, and transform data at scale Visualize data using Google Data Studio and other third-party platforms Distinguish between exploratory and explanatory analytics and when to use each approach Explore new datasets and uncover hidden insights quickly and effectively Optimizing data models and queries for price and performance Want to know how to query and process petabytes of data in seconds? Curious about data analysis that scales automatically as your data grows? Welcome to the Data Insights course! This four-course accelerated online specialization teaches course participants how to derive insights through data analysis and visualization using the Google Cloud Platform. The courses feature interactive scenarios and hands-on labs where participants explore, mine, load, visualize, and extract insights from diverse Google BigQuery datasets. The courses also cover data loading, querying, schema modeling, optimizing performance, query pricing, and data visualization. This specialization is intended for the following participants: Data Analysts, Business Analysts, Business Intelligence professionals Cloud Data Engineers who will be partnering with Data Analysts to build scalable data solutions on Google Cloud Platform To get the most out of this specialization, we recommend participants have some proficiency with ANSI SQL. Introduction to Data on the Google Cloud Platform Highlight Analytics Challenges Faced by Data Analysts Compare Big Data On-Premises vs on the Cloud Learn from Real-World Use Cases of Companies Transformed through Analytics on the Cloud Navigate Google Cloud Platform Project Basics Lab: Getting started with Google Cloud Platform Big Data Tools Overview Walkthrough Data Analyst Tasks, Challenges, and Introduce Google Cloud Platform Data Tools Demo: Analyze 10 Billion Records with Google BigQuery Explore 9 Fundamental Google BigQuery Features Compare GCP Tools for Analysts, Data Scientists, and Data Engineers Lab: Exploring Datasets with Google BigQuery Exploring your Data with SQL Compare Common Data Exploration Techniques Learn How to Code High Quality Standard SQL Explore Google BigQuery Public Datasets Visualization Preview: Google Data Studio Lab: Troubleshoot Common SQL Errors Google BigQuery Pricing Walkthrough of a BigQuery Job Calculate BigQuery Pricing: Storage, Querying, and Streaming Costs Optimize Queries for Cost Lab: Calculate Google BigQuery Pricing Cleaning and Transforming your Data Examine the 5 Principles of Dataset Integrity Characterize Dataset Shape and Skew Clean and Transform Data using SQL Clean and Transform Data using a new UI: Introducing Cloud Dataprep Lab: Explore and Shape Data with Cloud Dataprep Storing and Exporting Data Compare Permanent vs Temporary Tables Save and Export Query Results Performance Preview: Query Cache Lab: Creating new Permanent Tables Ingesting New Datasets into Google BigQuery Query from External Data Sources Avoid Data Ingesting Pitfalls Ingest New Data into Permanent Tables Discuss Streaming Inserts Lab: Ingesting and Querying New Datasets Data Visualization Overview of Data Visualization Principles Exploratory vs Explanatory Analysis Approaches Demo: Google Data Studio UI Connect Google Data Studio to Google BigQuery Lab: Exploring a Dataset in Google Data Studio Joining and Merging Datasets Merge Historical Data Tables with UNION Introduce Table Wildcards for Easy Merges Review Data Schemas: Linking Data Across Multiple Tables Walkthrough JOIN Examples and Pitfalls Lab: Join and Union Data from Multiple Tables Advanced Functions and Clauses Review SQL Case Statements Introduce Analytical Window Functions Safeguard Data with One-Way Field Encryption Discuss Effective Sub-query and CTE design Compare SQL and Javascript UDFs Lab: Deriving Insights with Advanced SQL Functions Schema Design and Nested Data Structures Compare Google BigQuery vs Traditional RDBMS Data Architecture Normalization vs Denormalization: Performance Tradeoffs Schema Review: The Good, The Bad, and The Ugly Arrays and Nested Data in Google BigQuery Lab: Querying Nested and Repeated Data More Visualization with Google Data Studio Create Case Statements and Calculated Fields Avoid Performance Pitfalls with Cache considerations Share Dashboards and Discuss Data Access considerations Optimizing for Performance Avoid Google BigQuery Performance Pitfalls Prevent Hotspots in your Data Diagnose Performance Issues with the Query Explanation map Lab: Optimizing and Troubleshooting Query Performance Advanced Insights Introducing Cloud Datalab Cloud Datalab Notebooks and Cells Benefits of Cloud Datalab Data Access Compare IAM and BigQuery Dataset Roles Avoid Access Pitfalls Review Members, Roles, Organizations, Account Administration, and Service Accounts

From Data to Insights with Google Cloud Platform
Delivered OnlineFlexible Dates
Price on Enquiry

Cloudera Training for Apache HBase

By Nexus Human

Duration 4 Days 24 CPD hours This course is intended for This course is appropriate for developers and administrators who intend to use HBase. Overview Skills learned on the course include:The use cases and usage occasions for HBase, Hadoop, and RDBMSUsing the HBase shell to directly manipulate HBase tablesDesigning optimal HBase schemas for efficient data storage and recoveryHow to connect to HBase using the Java API, configure the HBase cluster, and administer an HBase clusterBest practices for identifying and resolving performance bottlenecks Cloudera University?s four-day training course for Apache HBase enables participants to store and access massive quantities of multi-structured data and perform hundreds of thousands of operations per second. Introduction to Hadoop & HBase What Is Big Data? Introducing Hadoop Hadoop Components What Is HBase? Why Use HBase? Strengths of HBase HBase in Production Weaknesses of HBase HBase Tables HBase Concepts HBase Table Fundamentals Thinking About Table Design The HBase Shell Creating Tables with the HBase Shell Working with Tables Working with Table Data HBase Architecture Fundamentals HBase Regions HBase Cluster Architecture HBase and HDFS Data Locality HBase Schema Design General Design Considerations Application-Centric Design Designing HBase Row Keys Other HBase Table Features Basic Data Access with the HBase API Options to Access HBase Data Creating and Deleting HBase Tables Retrieving Data with Get Retrieving Data with Scan Inserting and Updating Data Deleting Data More Advanced HBase API Features Filtering Scans Best Practices HBase Coprocessors HBase on the Cluster How HBase Uses HDFS Compactions and Splits HBase Reads & Writes How HBase Writes Data How HBase Reads Data Block Caches for Reading HBase Performance Tuning Column Family Considerations Schema Design Considerations Configuring for Caching Dealing with Time Series and Sequential Data Pre-Splitting Regions HBase Administration and Cluster Management HBase Daemons ZooKeeper Considerations HBase High Availability Using the HBase Balancer Fixing Tables with hbck HBase Security HBase Replication & Backup HBase Replication HBase Backup MapReduce and HBase Clusters Using Hive & Impala with HBase Using Hive and Impala with HBase Appendix A: Accessing Data with Python and Thrift Thrift Usage Working with Tables Getting and Putting Data Scanning Data Deleting Data Counters Filters Appendix B: OpenTSDB

Cloudera Training for Apache HBase
Delivered OnlineFlexible Dates
Price on Enquiry

Cloudera Data Scientist Training

By Nexus Human

Duration 4 Days 24 CPD hours This course is intended for The workshop is designed for data scientists who currently use Python or R to work with smaller datasets on a single machine and who need to scale up their analyses and machine learning models to large datasets on distributed clusters. Data engineers and developers with some knowledge of data science and machine learning may also find this workshop useful. Overview Overview of data science and machine learning at scale Overview of the Hadoop ecosystem Working with HDFS data and Hive tables using Hue Introduction to Cloudera Data Science Workbench Overview of Apache Spark 2 Reading and writing data Inspecting data quality Cleansing and transforming data Summarizing and grouping data Combining, splitting, and reshaping data Exploring data Configuring, monitoring, and troubleshooting Spark applications Overview of machine learning in Spark MLlib Extracting, transforming, and selecting features Building and evaluating regression models Building and evaluating classification models Building and evaluating clustering models Cross-validating models and tuning hyperparameters Building machine learning pipelines Deploying machine learning models Spark, Spark SQL, and Spark MLlib PySpark and sparklyr Cloudera Data Science Workbench (CDSW) Hue This workshop covers data science and machine learning workflows at scale using Apache Spark 2 and other key components of the Hadoop ecosystem. The workshop emphasizes the use of data science and machine learning methods to address real-world business challenges. Using scenarios and datasets from a fictional technology company, students discover insights to support critical business decisions and develop data products to transform the business. The material is presented through a sequence of brief lectures, interactive demonstrations, extensive hands-on exercises, and discussions. The Apache Spark demonstrations and exercises are conducted in Python (with PySpark) and R (with sparklyr) using the Cloudera Data Science Workbench (CDSW) environment. The workshop is designed for data scientists who currently use Python or R to work with smaller datasets on a single machine and who need to scale up their analyses and machine learning models to large datasets on distributed clusters. Data engineers and developers with some knowledge of data science and machine learning may also find this workshop useful. Overview of data science and machine learning at scaleOverview of the Hadoop ecosystemWorking with HDFS data and Hive tables using HueIntroduction to Cloudera Data Science WorkbenchOverview of Apache Spark 2Reading and writing dataInspecting data qualityCleansing and transforming dataSummarizing and grouping dataCombining, splitting, and reshaping dataExploring dataConfiguring, monitoring, and troubleshooting Spark applicationsOverview of machine learning in Spark MLlibExtracting, transforming, and selecting featuresBuilding and evauating regression modelsBuilding and evaluating classification modelsBuilding and evaluating clustering modelsCross-validating models and tuning hyperparametersBuilding machine learning pipelinesDeploying machine learning models Additional course details: Nexus Humans Cloudera Data Scientist Training training program is a workshop that presents an invigorating mix of sessions, lessons, and masterclasses meticulously crafted to propel your learning expedition forward. This immersive bootcamp-style experience boasts interactive lectures, hands-on labs, and collaborative hackathons, all strategically designed to fortify fundamental concepts. Guided by seasoned coaches, each session offers priceless insights and practical skills crucial for honing your expertise. Whether you're stepping into the realm of professional skills or a seasoned professional, this comprehensive course ensures you're equipped with the knowledge and prowess necessary for success. While we feel this is the best course for the Cloudera Data Scientist Training course and one of our Top 10 we encourage you to read the course outline to make sure it is the right content for you. Additionally, private sessions, closed classes or dedicated events are available both live online and at our training centres in Dublin and London, as well as at your offices anywhere in the UK, Ireland or across EMEA.

Cloudera Data Scientist Training
Delivered OnlineFlexible Dates
Price on Enquiry

Mastering Scala with Apache Spark for the Modern Data Enterprise (TTSK7520)

By Nexus Human

Duration 5 Days 30 CPD hours This course is intended for This intermediate and beyond level course is geared for experienced technical professionals in various roles, such as developers, data analysts, data engineers, software engineers, and machine learning engineers who want to leverage Scala and Spark to tackle complex data challenges and develop scalable, high-performance applications across diverse domains. Practical programming experience is required to participate in the hands-on labs. Overview Working in a hands-on learning environment led by our expert instructor you'll: Develop a basic understanding of Scala and Apache Spark fundamentals, enabling you to confidently create scalable and high-performance applications. Learn how to process large datasets efficiently, helping you handle complex data challenges and make data-driven decisions. Gain hands-on experience with real-time data streaming, allowing you to manage and analyze data as it flows into your applications. Acquire practical knowledge of machine learning algorithms using Spark MLlib, empowering you to create intelligent applications and uncover hidden insights. Master graph processing with GraphX, enabling you to analyze and visualize complex relationships in your data. Discover generative AI technologies using GPT with Spark and Scala, opening up new possibilities for automating content generation and enhancing data analysis. Embark on a journey to master the world of big data with our immersive course on Scala and Spark! Mastering Scala with Apache Spark for the Modern Data Enterprise is a five day hands on course designed to provide you with the essential skills and tools to tackle complex data projects using Scala programming language and Apache Spark, a high-performance data processing engine. Mastering these technologies will enable you to perform a wide range of tasks, from data wrangling and analytics to machine learning and artificial intelligence, across various industries and applications.Guided by our expert instructor, you?ll explore the fundamentals of Scala programming and Apache Spark while gaining valuable hands-on experience with Spark programming, RDDs, DataFrames, Spark SQL, and data sources. You?ll also explore Spark Streaming, performance optimization techniques, and the integration of popular external libraries, tools, and cloud platforms like AWS, Azure, and GCP. Machine learning enthusiasts will delve into Spark MLlib, covering basics of machine learning algorithms, data preparation, feature extraction, and various techniques such as regression, classification, clustering, and recommendation systems. Introduction to Scala Brief history and motivation Differences between Scala and Java Basic Scala syntax and constructs Scala's functional programming features Introduction to Apache Spark Overview and history Spark components and architecture Spark ecosystem Comparing Spark with other big data frameworks Basics of Spark Programming SparkContext and SparkSession Resilient Distributed Datasets (RDDs) Transformations and Actions Working with DataFrames Spark SQL and Data Sources Spark SQL library and its advantages Structured and semi-structured data sources Reading and writing data in various formats (CSV, JSON, Parquet, Avro, etc.) Data manipulation using SQL queries Basic RDD Operations Creating and manipulating RDDs Common transformations and actions on RDDs Working with key-value data Basic DataFrame and Dataset Operations Creating and manipulating DataFrames and Datasets Column operations and functions Filtering, sorting, and aggregating data Introduction to Spark Streaming Overview of Spark Streaming Discretized Stream (DStream) operations Windowed operations and stateful processing Performance Optimization Basics Best practices for efficient Spark code Broadcast variables and accumulators Monitoring Spark applications Integrating External Libraries and Tools, Spark Streaming Using popular external libraries, such as Hadoop and HBase Integrating with cloud platforms: AWS, Azure, GCP Connecting to data storage systems: HDFS, S3, Cassandra, etc. Introduction to Machine Learning Basics Overview of machine learning Supervised and unsupervised learning Common algorithms and use cases Introduction to Spark MLlib Overview of Spark MLlib MLlib's algorithms and utilities Data preparation and feature extraction Linear Regression and Classification Linear regression algorithm Logistic regression for classification Model evaluation and performance metrics Clustering Algorithms Overview of clustering algorithms K-means clustering Model evaluation and performance metrics Collaborative Filtering and Recommendation Systems Overview of recommendation systems Collaborative filtering techniques Implementing recommendations with Spark MLlib Introduction to Graph Processing Overview of graph processing Use cases and applications of graph processing Graph representations and operations Introduction to Spark GraphX Overview of GraphX Creating and transforming graphs Graph algorithms in GraphX Big Data Innovation! Using GPT and Generative AI Technologies with Spark and Scala Overview of generative AI technologies Integrating GPT with Spark and Scala Practical applications and use cases Bonus Topics / Time Permitting Introduction to Spark NLP Overview of Spark NLP Preprocessing text data Text classification and sentiment analysis Putting It All Together Work on a capstone project that integrates multiple aspects of the course, including data processing, machine learning, graph processing, and generative AI technologies.

Mastering Scala with Apache Spark for the Modern Data Enterprise (TTSK7520)
Delivered OnlineFlexible Dates
Price on Enquiry

Designing and Building Big Data Applications

By Nexus Human

Duration 4 Days 24 CPD hours This course is intended for This course is best suited to developers, engineers, and architects who want to use use Hadoop and related tools to solve real-world problems. Overview Skills learned in this course include:Creating a data set with Kite SDKDeveloping custom Flume components for data ingestionManaging a multi-stage workflow with OozieAnalyzing data with CrunchWriting user-defined functions for Hive and ImpalaWriting user-defined functions for Hive and ImpalaIndexing data with Cloudera Search Cloudera University?s four-day course for designing and building Big Data applications prepares you to analyze and solve real-world problems using Apache Hadoop and associated tools in the enterprise data hub (EDH). IntroductionApplication Architecture Scenario Explanation Understanding the Development Environment Identifying and Collecting Input Data Selecting Tools for Data Processing and Analysis Presenting Results to the Use Defining & Using Datasets Metadata Management What is Apache Avro? Avro Schemas Avro Schema Evolution Selecting a File Format Performance Considerations Using the Kite SDK Data Module What is the Kite SDK? Fundamental Data Module Concepts Creating New Data Sets Using the Kite SDK Loading, Accessing, and Deleting a Data Set Importing Relational Data with Apache Sqoop What is Apache Sqoop? Basic Imports Limiting Results Improving Sqoop?s Performance Sqoop 2 Capturing Data with Apache Flume What is Apache Flume? Basic Flume Architecture Flume Sources Flume Sinks Flume Configuration Logging Application Events to Hadoop Developing Custom Flume Components Flume Data Flow and Common Extension Points Custom Flume Sources Developing a Flume Pollable Source Developing a Flume Event-Driven Source Custom Flume Interceptors Developing a Header-Modifying Flume Interceptor Developing a Filtering Flume Interceptor Writing Avro Objects with a Custom Flume Interceptor Managing Workflows with Apache Oozie The Need for Workflow Management What is Apache Oozie? Defining an Oozie Workflow Validation, Packaging, and Deployment Running and Tracking Workflows Using the CLI Hue UI for Oozie Processing Data Pipelines with Apache Crunch What is Apache Crunch? Understanding the Crunch Pipeline Comparing Crunch to Java MapReduce Working with Crunch Projects Reading and Writing Data in Crunch Data Collection API Functions Utility Classes in the Crunch API Working with Tables in Apache Hive What is Apache Hive? Accessing Hive Basic Query Syntax Creating and Populating Hive Tables How Hive Reads Data Using the RegexSerDe in Hive Developing User-Defined Functions What are User-Defined Functions? Implementing a User-Defined Function Deploying Custom Libraries in Hive Registering a User-Defined Function in Hive Executing Interactive Queries with Impala What is Impala? Comparing Hive to Impala Running Queries in Impala Support for User-Defined Functions Data and Metadata Management Understanding Cloudera Search What is Cloudera Search? Search Architecture Supported Document Formats Indexing Data with Cloudera Search Collection and Schema Management Morphlines Indexing Data in Batch Mode Indexing Data in Near Real Time Presenting Results to Users Solr Query Syntax Building a Search UI with Hue Accessing Impala through JDBC Powering a Custom Web Application with Impala and Search

Designing and Building Big Data Applications
Delivered OnlineFlexible Dates
Price on Enquiry