2 Educators providing Pyspark courses delivered Live Online

Nexus Human

London

Nexus Human, established over 20 years ago, stands as a pillar of excellence in the realm of IT and Business Skills Training and education in Ireland and the UK. For over two decades, Nexus Human has been a steadfast source of reliable and high-quality training solutions, catering to a diverse range of professional and educational needs. With a strong reputation in the Training Industry, Nexus Human has consistently demonstrated its commitment to equipping individuals and organisations with the skills and knowledge required to thrive in today's dynamic world. Our training programs span a wide spectrum, encompassing IT certifications, business skills, and much more.

What sets Nexus Human apart is our unwavering dedication to staying at the forefront of industry trends and technology advancements. Our expert instructors, coupled with cutting-edge training resources, ensure that students receive the most up-to-date and relevant knowledge available. The impact of Nexus Human extends far and wide, helping individuals enhance their career prospects and aiding businesses in achieving their goals. This 20-year journey has solidified our institution's standing as a trusted partner in personal and professional growth, offering reliable, excellent training that continues to shape the future. Whether you seek to upskill, reskill, or simply stay ahead of the curve, Nexus Human is the place to turn for an educational experience marked by quality, reliability, and innovation.

Courses matching "Pyspark"

Show all 2

Cloudera Data Scientist Training

By Nexus Human

Duration
4 Days
24 CPD hours

This course is intended for
The workshop is designed for data scientists who currently use Python or R to work with smaller datasets on a single machine and who need to scale up their analyses and machine learning models to large datasets on distributed clusters. Data engineers and developers with some knowledge of data science and machine learning may also find this workshop useful.

Overview
• Overview of data science and machine learning at scale
• Overview of the Hadoop ecosystem
• Working with HDFS data and Hive tables using Hue
• Introduction to Cloudera Data Science Workbench
• Overview of Apache Spark 2
• Reading and writing data
• Inspecting data quality
• Cleansing and transforming data
• Summarizing and grouping data
• Combining, splitting, and reshaping data
• Exploring data
• Configuring, monitoring, and troubleshooting Spark applications
• Overview of machine learning in Spark MLlib
• Extracting, transforming, and selecting features
• Building and evaluating regression models
• Building and evaluating classification models
• Building and evaluating clustering models
• Cross-validating models and tuning hyperparameters
• Building machine learning pipelines
• Deploying machine learning models
• Spark, Spark SQL, and Spark MLlib
• PySpark and sparklyr
• Cloudera Data Science Workbench (CDSW)
• Hue

This workshop covers data science and machine learning workflows at scale using Apache Spark 2 and other key components of the Hadoop ecosystem. The workshop emphasizes the use of data science and machine learning methods to address real-world business challenges. Using scenarios and datasets from a fictional technology company, students discover insights to support critical business decisions and develop data products to transform the business. The material is presented through a sequence of brief lectures, interactive demonstrations, extensive hands-on exercises, and discussions. The Apache Spark demonstrations and exercises are conducted in Python (with PySpark) and R (with sparklyr) using the Cloudera Data Science Workbench (CDSW) environment.

Additional course details:
Nexus Human's Cloudera Data Scientist Training program is a workshop that presents a mix of sessions, lessons, and masterclasses crafted to move your learning forward. This immersive bootcamp-style experience includes interactive lectures, hands-on labs, and collaborative hackathons, all designed to reinforce fundamental concepts. Guided by seasoned coaches, each session offers insights and practical skills for honing your expertise. Whether you are new to professional skills or a seasoned professional, this comprehensive course equips you with the knowledge necessary for success.

While we feel this is the best course for Cloudera Data Scientist Training and one of our Top 10, we encourage you to read the course outline to make sure it is the right content for you. Additionally, private sessions, closed classes or dedicated events are available both live online and at our training centres in Dublin and London, as well as at your offices anywhere in the UK, Ireland or across EMEA.

Cloudera Data Scientist Training
Delivered Online
Flexible Dates
Price on Enquiry

Mastering Scala with Apache Spark for the Modern Data Enterprise (TTSK7520)

By Nexus Human

Duration
5 Days
30 CPD hours

This course is intended for
This intermediate and beyond level course is geared for experienced technical professionals in various roles, such as developers, data analysts, data engineers, software engineers, and machine learning engineers who want to leverage Scala and Spark to tackle complex data challenges and develop scalable, high-performance applications across diverse domains. Practical programming experience is required to participate in the hands-on labs.

Overview
Working in a hands-on learning environment led by our expert instructor you'll:
• Develop a basic understanding of Scala and Apache Spark fundamentals, enabling you to confidently create scalable and high-performance applications.
• Learn how to process large datasets efficiently, helping you handle complex data challenges and make data-driven decisions.
• Gain hands-on experience with real-time data streaming, allowing you to manage and analyze data as it flows into your applications.
• Acquire practical knowledge of machine learning algorithms using Spark MLlib, empowering you to create intelligent applications and uncover hidden insights.
• Master graph processing with GraphX, enabling you to analyze and visualize complex relationships in your data.
• Discover generative AI technologies using GPT with Spark and Scala, opening up new possibilities for automating content generation and enhancing data analysis.

Embark on a journey to master the world of big data with our immersive course on Scala and Spark! Mastering Scala with Apache Spark for the Modern Data Enterprise is a five-day, hands-on course designed to provide you with the essential skills and tools to tackle complex data projects using the Scala programming language and Apache Spark, a high-performance data processing engine. Mastering these technologies will enable you to perform a wide range of tasks, from data wrangling and analytics to machine learning and artificial intelligence, across various industries and applications.

Guided by our expert instructor, you'll explore the fundamentals of Scala programming and Apache Spark while gaining valuable hands-on experience with Spark programming, RDDs, DataFrames, Spark SQL, and data sources. You'll also explore Spark Streaming, performance optimization techniques, and the integration of popular external libraries, tools, and cloud platforms like AWS, Azure, and GCP. Machine learning enthusiasts will delve into Spark MLlib, covering basics of machine learning algorithms, data preparation, feature extraction, and various techniques such as regression, classification, clustering, and recommendation systems.

Introduction to Scala
• Brief history and motivation
• Differences between Scala and Java
• Basic Scala syntax and constructs
• Scala's functional programming features

Introduction to Apache Spark
• Overview and history
• Spark components and architecture
• Spark ecosystem
• Comparing Spark with other big data frameworks

Basics of Spark Programming
• SparkContext and SparkSession
• Resilient Distributed Datasets (RDDs)
• Transformations and Actions
• Working with DataFrames

Spark SQL and Data Sources
• Spark SQL library and its advantages
• Structured and semi-structured data sources
• Reading and writing data in various formats (CSV, JSON, Parquet, Avro, etc.)
• Data manipulation using SQL queries

Basic RDD Operations
• Creating and manipulating RDDs
• Common transformations and actions on RDDs
• Working with key-value data

Basic DataFrame and Dataset Operations
• Creating and manipulating DataFrames and Datasets
• Column operations and functions
• Filtering, sorting, and aggregating data

Introduction to Spark Streaming
• Overview of Spark Streaming
• Discretized Stream (DStream) operations
• Windowed operations and stateful processing

Performance Optimization Basics
• Best practices for efficient Spark code
• Broadcast variables and accumulators
• Monitoring Spark applications

Integrating External Libraries and Tools, Spark Streaming
• Using popular external libraries, such as Hadoop and HBase
• Integrating with cloud platforms: AWS, Azure, GCP
• Connecting to data storage systems: HDFS, S3, Cassandra, etc.

Introduction to Machine Learning Basics
• Overview of machine learning
• Supervised and unsupervised learning
• Common algorithms and use cases

Introduction to Spark MLlib
• Overview of Spark MLlib
• MLlib's algorithms and utilities
• Data preparation and feature extraction

Linear Regression and Classification
• Linear regression algorithm
• Logistic regression for classification
• Model evaluation and performance metrics

Clustering Algorithms
• Overview of clustering algorithms
• K-means clustering
• Model evaluation and performance metrics

Collaborative Filtering and Recommendation Systems
• Overview of recommendation systems
• Collaborative filtering techniques
• Implementing recommendations with Spark MLlib

Introduction to Graph Processing
• Overview of graph processing
• Use cases and applications of graph processing
• Graph representations and operations

Introduction to Spark GraphX
• Overview of GraphX
• Creating and transforming graphs
• Graph algorithms in GraphX

Big Data Innovation! Using GPT and Generative AI Technologies with Spark and Scala
• Overview of generative AI technologies
• Integrating GPT with Spark and Scala
• Practical applications and use cases

Bonus Topics / Time Permitting
Introduction to Spark NLP
• Overview of Spark NLP
• Preprocessing text data
• Text classification and sentiment analysis

Putting It All Together
• Work on a capstone project that integrates multiple aspects of the course, including data processing, machine learning, graph processing, and generative AI technologies.

Mastering Scala with Apache Spark for the Modern Data Enterprise (TTSK7520)
Delivered Online
Flexible Dates
Price on Enquiry