Booking options
Price on Enquiry
Price on Enquiry
Delivered Online
4 days
All levels
Duration
4 Days
24 CPD hours
This course is intended for
The workshop is designed for data scientists who currently use Python or R to work with smaller datasets on a single machine and who need to scale up their analyses and machine learning models to large datasets on distributed clusters. Data engineers and developers with some knowledge of data science and machine learning may also find this workshop useful.
Overview
Overview of data science and machine learning at scale
Overview of the Hadoop ecosystem
Working with HDFS data and Hive tables using Hue
Introduction to Cloudera Data Science Workbench
Overview of Apache Spark 2
Reading and writing data
Inspecting data quality
Cleansing and transforming data
Summarizing and grouping data
Combining, splitting, and reshaping data
Exploring data
Configuring, monitoring, and troubleshooting Spark applications
Overview of machine learning in Spark MLlib
Extracting, transforming, and selecting features
Building and evaluating regression models
Building and evaluating classification models
Building and evaluating clustering models
Cross-validating models and tuning hyperparameters
Building machine learning pipelines
Deploying machine learning models
Spark, Spark SQL, and Spark MLlib
PySpark and sparklyr
Cloudera Data Science Workbench (CDSW)
Hue
This workshop covers data science and machine learning workflows at scale using Apache Spark 2 and other key components of the Hadoop ecosystem. The workshop emphasizes the use of data science and machine learning methods to address real-world business challenges. Using scenarios and datasets from a fictional technology company, students discover insights to support critical business decisions and develop data products to transform the business. The material is presented through a sequence of brief lectures, interactive demonstrations, extensive hands-on exercises, and discussions. The Apache Spark demonstrations and exercises are conducted in Python (with PySpark) and R (with sparklyr) using the Cloudera Data Science Workbench (CDSW) environment. The workshop is designed for data scientists who currently use Python or R to work with smaller datasets on a single machine and who need to scale up their analyses and machine learning models to large datasets on distributed clusters. Data engineers and developers with some knowledge of data science and machine learning may also find this workshop useful.
Overview of data science and machine learning at scaleOverview of the Hadoop ecosystemWorking with HDFS data and Hive tables using HueIntroduction to Cloudera Data Science WorkbenchOverview of Apache Spark 2Reading and writing dataInspecting data qualityCleansing and transforming dataSummarizing and grouping dataCombining, splitting, and reshaping dataExploring dataConfiguring, monitoring, and troubleshooting Spark applicationsOverview of machine learning in Spark MLlibExtracting, transforming, and selecting featuresBuilding and evauating regression modelsBuilding and evaluating classification modelsBuilding and evaluating clustering modelsCross-validating models and tuning hyperparametersBuilding machine learning pipelinesDeploying machine learning models
Additional course details:
Nexus Humans Cloudera Data Scientist Training training program is a workshop that presents an invigorating mix of sessions, lessons, and masterclasses meticulously crafted to propel your learning expedition forward.
This immersive bootcamp-style experience boasts interactive lectures, hands-on labs, and collaborative hackathons, all strategically designed to fortify fundamental concepts.
Guided by seasoned coaches, each session offers priceless insights and practical skills crucial for honing your expertise. Whether you're stepping into the realm of professional skills or a seasoned professional, this comprehensive course ensures you're equipped with the knowledge and prowess necessary for success.
While we feel this is the best course for the Cloudera Data Scientist Training course and one of our Top 10 we encourage you to read the course outline to make sure it is the right content for you.
Additionally, private sessions, closed classes or dedicated events are available both live online and at our training centres in Dublin and London, as well as at your offices anywhere in the UK, Ireland or across EMEA.
Nexus Human, established over 20 years ago, stands as a pillar of excellence in the realm of IT and Business Skills Training and education in Ireland and the UK....