Overview of Data Analyst: Data Analysis in Excel Join our Data Analyst: Data Analysis in Excel course and discover your hidden skills, setting you on a path to success in this area. Get ready to improve your skills and achieve your biggest goals. The Data Analyst: Data Analysis in Excel course has everything you need to get a great start in this sector. Improving and moving forward is key to getting ahead personally. The course is designed to teach you the essentials quickly and well, helping you to get off to a great start in the field. So, what are you waiting for? Enrol now! Get a Quick Look at the Course Content: This Data Analyst: Data Analysis in Excel course will help you to: Learn strategies to boost your workplace efficiency. Hone your skills to help you advance your career. Acquire a comprehensive understanding of various topics and tips. Learn skills that are in high demand among UK employers. This course covers the topics you must know to stand up to tough competition. The future is truly yours to seize with this Data Analyst: Data Analysis in Excel course. Enrol today and complete the course to achieve a certificate that can change your career forever. Details Perks of Learning with IOMH One-To-One Support from a Dedicated Tutor Throughout Your Course. Study Online - Whenever and Wherever You Want. Instant Digital / PDF Certificate. 100% Money Back Guarantee. 12 Months Access. Process of Evaluation After studying the course, an MCQ exam or assignment will test your skills and knowledge. You have to get a score of 60% to pass the test and get your certificate. Certificate of Achievement Certificate of Completion - Digital / PDF Certificate After completing the Data Analyst: Data Analysis in Excel course, you can order your CPD Accredited Digital / PDF Certificate for £5.99. Certificate of Completion - Hard copy Certificate You can get the CPD Accredited Hard Copy Certificate for £12.99. 
Shipping Charges: Inside the UK: £3.99 International: £10.99 Who Is This Course for? This Data Analyst: Data Analysis in Excel course is suitable for anyone aspiring to start a career in a relevant field; even if you are new to this and have no prior knowledge, this course is going to be very easy for you to understand. On the other hand, if you are already working in this sector, this course will be a great source of knowledge for you to improve your existing skills and take them to the next level. This course has been developed with maximum flexibility and accessibility, making it ideal for people who don't have the time to devote to traditional education. Requirements You don't need any educational qualification or experience to enrol in the Data Analyst: Data Analysis in Excel course. Do note: you must be at least 16 years old to enrol. Any internet-connected device, such as a computer, tablet, or smartphone, can access this online course. Career Path The certification and skills you get from this Data Analyst: Data Analysis in Excel course can help you advance your career and gain expertise in several fields, allowing you to apply for high-paying jobs in related sectors. 
Course Curriculum Modifying a Worksheet Insert, Delete, and Adjust Cells, Columns, and Rows 00:10:00 Search for and Replace Data 00:09:00 Use Proofing and Research Tools 00:07:00 Working with Lists Sort Data 00:10:00 Filter Data 00:10:00 Query Data with Database Functions 00:09:00 Outline and Subtotal Data 00:09:00 Analyzing Data Apply Intermediate Conditional Formatting 00:07:00 Apply Advanced Conditional Formatting 00:05:00 Visualizing Data with Charts Create Charts 00:13:00 Modify and Format Charts 00:12:00 Use Advanced Chart Features 00:12:00 Using PivotTables and PivotCharts Create a PivotTable 00:13:00 Analyze PivotTable Data 00:12:00 Present Data with PivotCharts 00:07:00 Filter Data by Using Timelines and Slicers 00:11:00 Working with Multiple Worksheets and Workbooks Use Links and External References 00:12:00 Use 3-D References 00:06:00 Consolidate Data 00:05:00 Using Lookup Functions and Formula Auditing Use Lookup Functions 00:12:00 Trace Cells 00:09:00 Watch and Evaluate Formulas 00:08:00 Automating Workbook Functionality Apply Data Validation 00:13:00 Search for Invalid Data and Formulas with Errors 00:04:00 Work with Macros 00:18:00 Creating Sparklines and Mapping Data Create Sparklines 00:07:00 Map Data 00:07:00 Forecasting Data Determine Potential Outcomes Using Data Tables 00:08:00 Determine Potential Outcomes Using Scenarios 00:09:00 Use the Goal Seek Feature 00:04:00 Forecasting Data Trends 00:05:00
A course by Sekhar Metla, IT Industry Expert Requirements No programming experience needed. You will learn everything you need to know. No software is required in advance of the course (all software used in the course is free). No pre-knowledge is required - you will learn from the basics. Audience Beginner JavaScript, Python and MSSQL developers curious about data science development Anyone who wants to generate new income streams Anyone who wants to build websites Anyone who wants to become financially independent Anyone who wants to start their own business or become freelance Anyone who wants to become a Full stack web developer
GDPR Data Protection Law [Updated 2023] Stay ahead in compliance with our updated 2023 GDPR Data Protection Law course. Equip yourself with the latest in GDPR Data Protection standards. Secure your organisation's future with comprehensive GDPR Data Protection knowledge. Learning Outcomes: Navigate the Introduction to GDPR for compliance. Uphold the Principles of GDPR in data management. Ensure Lawful Basis for Processing personal data. Defend the Rights of Data Subject under GDPR. Differentiate roles of Data Controller and Processor. More Benefits: LIFETIME access Device Compatibility Free Workplace Management Toolkit Key Modules from GDPR Data Protection Law [Updated 2023]: Introduction to GDPR: Familiarise yourself with the GDPR's scope and its impact on GDPR Data Protection practices. Principles of GDPR: Grasp the key GDPR principles that underpin effective GDPR Data Protection strategies. Lawful Basis for Processing: Understand the legal grounds for processing personal data within GDPR Data Protection frameworks. Rights of Data Subject: Recognise the rights individuals hold over their data, a cornerstone of GDPR Data Protection. Data Controller and Data Processor: Define and distinguish between the responsibilities of data controllers and processors under GDPR Data Protection laws. Data Protection by Design and by Default: Implement GDPR Data Protection requirements throughout your data processing activities. Security of Data: Master the security measures required to protect data in line with GDPR Data Protection guidelines. Data Breaches: Learn how to effectively manage and report data breaches in accordance with GDPR Data Protection procedures. Workplace and GDPR: Apply GDPR Data Protection policies within your organisational processes and workplace culture. Transferring Data Outside of EEA: Navigate the complexities of transferring data internationally under GDPR Data Protection rules. 
Exemptions: Identify the exemptions within GDPR Data Protection law and how they may apply to certain data processing scenarios.
Duration 3 Days 18 CPD hours This course is intended for This is an introductory-level course designed to teach experienced systems administrators how to install, maintain, monitor, troubleshoot, optimize, and secure Hadoop. Previous Hadoop experience is not required. Overview Working in an engaging, hands-on learning environment, guided by our expert team, attendees will learn to: Understand the benefits of distributed computing Understand the Hadoop architecture (including HDFS and MapReduce) Define administrator participation in Big Data projects Plan, implement, and maintain Hadoop clusters Deploy and maintain additional Big Data tools (Pig, Hive, Flume, etc.) Plan, deploy and maintain HBase on a Hadoop cluster Monitor and maintain hundreds of servers Pinpoint performance bottlenecks and fix them Apache Hadoop is an open source framework for creating reliable and distributable compute clusters. Hadoop provides an excellent platform (with other related frameworks) to process large unstructured or semi-structured data sets from multiple sources to dissect, classify, learn from and make suggestions for business analytics, decision support, and other advanced forms of machine intelligence. This is an introductory-level, hands-on lab-intensive course geared for the administrator (new to Hadoop) who is charged with maintaining a Hadoop cluster and its related components. You will learn how to install, maintain, monitor, troubleshoot, optimize, and secure Hadoop. 
Introduction Hadoop history and concepts Ecosystem Distributions High level architecture Hadoop myths Hadoop challenges (hardware / software) Planning and installation Selecting software and Hadoop distributions Sizing the cluster and planning for growth Selecting hardware and network Rack topology Installation Multi-tenancy Directory structure and logs Benchmarking HDFS operations Concepts (horizontal scaling, replication, data locality, rack awareness) Nodes and daemons (NameNode, Secondary NameNode, HA Standby NameNode, DataNode) Health monitoring Command-line and browser-based administration Adding storage and replacing defective drives MapReduce operations Parallel computing before MapReduce: compare HPC versus Hadoop administration MapReduce cluster loads Nodes and Daemons (JobTracker, TaskTracker) MapReduce UI walk through MapReduce configuration Job config Job schedulers Administrator view of MapReduce best practices Optimizing MapReduce Fool proofing MR: what to tell your programmers YARN: architecture and use Advanced topics Hardware monitoring System software monitoring Hadoop cluster monitoring Adding and removing servers and upgrading Hadoop Backup, recovery, and business continuity planning Cluster configuration tweaks Hardware maintenance schedule Oozie scheduling for administrators Securing your cluster with Kerberos The future of Hadoop
Duration 3 Days 18 CPD hours This course is intended for Data Analysts, Business Analysts, Business Intelligence professionals Cloud Data Engineers who will be partnering with Data Analysts to build scalable data solutions on Google Cloud Platform Overview This course teaches students the following skills: Derive insights from data using the analysis and visualization tools on Google Cloud Platform Interactively query datasets using Google BigQuery Load, clean, and transform data at scale Visualize data using Google Data Studio and other third-party platforms Distinguish between exploratory and explanatory analytics and when to use each approach Explore new datasets and uncover hidden insights quickly and effectively Optimize data models and queries for price and performance Want to know how to query and process petabytes of data in seconds? Curious about data analysis that scales automatically as your data grows? Welcome to the Data Insights course! This four-course accelerated online specialization teaches course participants how to derive insights through data analysis and visualization using the Google Cloud Platform. The courses feature interactive scenarios and hands-on labs where participants explore, mine, load, visualize, and extract insights from diverse Google BigQuery datasets. The courses also cover data loading, querying, schema modeling, optimizing performance, query pricing, and data visualization. This specialization is intended for the following participants: Data Analysts, Business Analysts, Business Intelligence professionals Cloud Data Engineers who will be partnering with Data Analysts to build scalable data solutions on Google Cloud Platform To get the most out of this specialization, we recommend participants have some proficiency with ANSI SQL. 
Introduction to Data on the Google Cloud Platform Highlight Analytics Challenges Faced by Data Analysts Compare Big Data On-Premises vs on the Cloud Learn from Real-World Use Cases of Companies Transformed through Analytics on the Cloud Navigate Google Cloud Platform Project Basics Lab: Getting started with Google Cloud Platform Big Data Tools Overview Walkthrough Data Analyst Tasks, Challenges, and Introduce Google Cloud Platform Data Tools Demo: Analyze 10 Billion Records with Google BigQuery Explore 9 Fundamental Google BigQuery Features Compare GCP Tools for Analysts, Data Scientists, and Data Engineers Lab: Exploring Datasets with Google BigQuery Exploring your Data with SQL Compare Common Data Exploration Techniques Learn How to Code High Quality Standard SQL Explore Google BigQuery Public Datasets Visualization Preview: Google Data Studio Lab: Troubleshoot Common SQL Errors Google BigQuery Pricing Walkthrough of a BigQuery Job Calculate BigQuery Pricing: Storage, Querying, and Streaming Costs Optimize Queries for Cost Lab: Calculate Google BigQuery Pricing Cleaning and Transforming your Data Examine the 5 Principles of Dataset Integrity Characterize Dataset Shape and Skew Clean and Transform Data using SQL Clean and Transform Data using a new UI: Introducing Cloud Dataprep Lab: Explore and Shape Data with Cloud Dataprep Storing and Exporting Data Compare Permanent vs Temporary Tables Save and Export Query Results Performance Preview: Query Cache Lab: Creating new Permanent Tables Ingesting New Datasets into Google BigQuery Query from External Data Sources Avoid Data Ingesting Pitfalls Ingest New Data into Permanent Tables Discuss Streaming Inserts Lab: Ingesting and Querying New Datasets Data Visualization Overview of Data Visualization Principles Exploratory vs Explanatory Analysis Approaches Demo: Google Data Studio UI Connect Google Data Studio to Google BigQuery Lab: Exploring a Dataset in Google Data Studio Joining and Merging Datasets Merge Historical 
Data Tables with UNION Introduce Table Wildcards for Easy Merges Review Data Schemas: Linking Data Across Multiple Tables Walkthrough JOIN Examples and Pitfalls Lab: Join and Union Data from Multiple Tables Advanced Functions and Clauses Review SQL Case Statements Introduce Analytical Window Functions Safeguard Data with One-Way Field Encryption Discuss Effective Sub-query and CTE design Compare SQL and Javascript UDFs Lab: Deriving Insights with Advanced SQL Functions Schema Design and Nested Data Structures Compare Google BigQuery vs Traditional RDBMS Data Architecture Normalization vs Denormalization: Performance Tradeoffs Schema Review: The Good, The Bad, and The Ugly Arrays and Nested Data in Google BigQuery Lab: Querying Nested and Repeated Data More Visualization with Google Data Studio Create Case Statements and Calculated Fields Avoid Performance Pitfalls with Cache considerations Share Dashboards and Discuss Data Access considerations Optimizing for Performance Avoid Google BigQuery Performance Pitfalls Prevent Hotspots in your Data Diagnose Performance Issues with the Query Explanation map Lab: Optimizing and Troubleshooting Query Performance Advanced Insights Introducing Cloud Datalab Cloud Datalab Notebooks and Cells Benefits of Cloud Datalab Data Access Compare IAM and BigQuery Dataset Roles Avoid Access Pitfalls Review Members, Roles, Organizations, Account Administration, and Service Accounts
Duration 4 Days 24 CPD hours This course is intended for This course is appropriate for developers and administrators who intend to use HBase. Overview Skills learned on the course include: the use cases and usage occasions for HBase, Hadoop, and RDBMS; using the HBase shell to directly manipulate HBase tables; designing optimal HBase schemas for efficient data storage and recovery; how to connect to HBase using the Java API, configure the HBase cluster, and administer an HBase cluster; best practices for identifying and resolving performance bottlenecks. Cloudera University's four-day training course for Apache HBase enables participants to store and access massive quantities of multi-structured data and perform hundreds of thousands of operations per second. Introduction to Hadoop & HBase What Is Big Data? Introducing Hadoop Hadoop Components What Is HBase? Why Use HBase? Strengths of HBase HBase in Production Weaknesses of HBase HBase Tables HBase Concepts HBase Table Fundamentals Thinking About Table Design The HBase Shell Creating Tables with the HBase Shell Working with Tables Working with Table Data HBase Architecture Fundamentals HBase Regions HBase Cluster Architecture HBase and HDFS Data Locality HBase Schema Design General Design Considerations Application-Centric Design Designing HBase Row Keys Other HBase Table Features Basic Data Access with the HBase API Options to Access HBase Data Creating and Deleting HBase Tables Retrieving Data with Get Retrieving Data with Scan Inserting and Updating Data Deleting Data More Advanced HBase API Features Filtering Scans Best Practices HBase Coprocessors HBase on the Cluster How HBase Uses HDFS Compactions and Splits HBase Reads & Writes How HBase Writes Data How HBase Reads Data Block Caches for Reading HBase Performance Tuning Column Family Considerations Schema Design Considerations Configuring for Caching Dealing with Time Series and Sequential Data Pre-Splitting Regions HBase Administration and Cluster Management HBase 
Daemons ZooKeeper Considerations HBase High Availability Using the HBase Balancer Fixing Tables with hbck HBase Security HBase Replication & Backup HBase Replication HBase Backup MapReduce and HBase Clusters Using Hive & Impala with HBase Using Hive and Impala with HBase Appendix A: Accessing Data with Python and Thrift Thrift Usage Working with Tables Getting and Putting Data Scanning Data Deleting Data Counters Filters Appendix B: OpenTSDB
Duration 4 Days 24 CPD hours This course is intended for This course is best suited to developers, engineers, and architects who want to use Hadoop and related tools to solve real-world problems. Overview Skills learned in this course include: creating a data set with Kite SDK; developing custom Flume components for data ingestion; managing a multi-stage workflow with Oozie; analyzing data with Crunch; writing user-defined functions for Hive and Impala; indexing data with Cloudera Search. Cloudera University's four-day course for designing and building Big Data applications prepares you to analyze and solve real-world problems using Apache Hadoop and associated tools in the enterprise data hub (EDH). Introduction Application Architecture Scenario Explanation Understanding the Development Environment Identifying and Collecting Input Data Selecting Tools for Data Processing and Analysis Presenting Results to the User Defining & Using Datasets Metadata Management What is Apache Avro? Avro Schemas Avro Schema Evolution Selecting a File Format Performance Considerations Using the Kite SDK Data Module What is the Kite SDK? Fundamental Data Module Concepts Creating New Data Sets Using the Kite SDK Loading, Accessing, and Deleting a Data Set Importing Relational Data with Apache Sqoop What is Apache Sqoop? Basic Imports Limiting Results Improving Sqoop's Performance Sqoop 2 Capturing Data with Apache Flume What is Apache Flume? 
Basic Flume Architecture Flume Sources Flume Sinks Flume Configuration Logging Application Events to Hadoop Developing Custom Flume Components Flume Data Flow and Common Extension Points Custom Flume Sources Developing a Flume Pollable Source Developing a Flume Event-Driven Source Custom Flume Interceptors Developing a Header-Modifying Flume Interceptor Developing a Filtering Flume Interceptor Writing Avro Objects with a Custom Flume Interceptor Managing Workflows with Apache Oozie The Need for Workflow Management What is Apache Oozie? Defining an Oozie Workflow Validation, Packaging, and Deployment Running and Tracking Workflows Using the CLI Hue UI for Oozie Processing Data Pipelines with Apache Crunch What is Apache Crunch? Understanding the Crunch Pipeline Comparing Crunch to Java MapReduce Working with Crunch Projects Reading and Writing Data in Crunch Data Collection API Functions Utility Classes in the Crunch API Working with Tables in Apache Hive What is Apache Hive? Accessing Hive Basic Query Syntax Creating and Populating Hive Tables How Hive Reads Data Using the RegexSerDe in Hive Developing User-Defined Functions What are User-Defined Functions? Implementing a User-Defined Function Deploying Custom Libraries in Hive Registering a User-Defined Function in Hive Executing Interactive Queries with Impala What is Impala? Comparing Hive to Impala Running Queries in Impala Support for User-Defined Functions Data and Metadata Management Understanding Cloudera Search What is Cloudera Search? Search Architecture Supported Document Formats Indexing Data with Cloudera Search Collection and Schema Management Morphlines Indexing Data in Batch Mode Indexing Data in Near Real Time Presenting Results to Users Solr Query Syntax Building a Search UI with Hue Accessing Impala through JDBC Powering a Custom Web Application with Impala and Search
Duration 5 Days 30 CPD hours This course is intended for This intermediate and beyond level course is geared for experienced technical professionals in various roles, such as developers, data analysts, data engineers, software engineers, and machine learning engineers who want to leverage Scala and Spark to tackle complex data challenges and develop scalable, high-performance applications across diverse domains. Practical programming experience is required to participate in the hands-on labs. Overview Working in a hands-on learning environment led by our expert instructor you'll: Develop a basic understanding of Scala and Apache Spark fundamentals, enabling you to confidently create scalable and high-performance applications. Learn how to process large datasets efficiently, helping you handle complex data challenges and make data-driven decisions. Gain hands-on experience with real-time data streaming, allowing you to manage and analyze data as it flows into your applications. Acquire practical knowledge of machine learning algorithms using Spark MLlib, empowering you to create intelligent applications and uncover hidden insights. Master graph processing with GraphX, enabling you to analyze and visualize complex relationships in your data. Discover generative AI technologies using GPT with Spark and Scala, opening up new possibilities for automating content generation and enhancing data analysis. Embark on a journey to master the world of big data with our immersive course on Scala and Spark! Mastering Scala with Apache Spark for the Modern Data Enterprise is a five day hands on course designed to provide you with the essential skills and tools to tackle complex data projects using Scala programming language and Apache Spark, a high-performance data processing engine. 
Mastering these technologies will enable you to perform a wide range of tasks, from data wrangling and analytics to machine learning and artificial intelligence, across various industries and applications. Guided by our expert instructor, you'll explore the fundamentals of Scala programming and Apache Spark while gaining valuable hands-on experience with Spark programming, RDDs, DataFrames, Spark SQL, and data sources. You'll also explore Spark Streaming, performance optimization techniques, and the integration of popular external libraries, tools, and cloud platforms like AWS, Azure, and GCP. Machine learning enthusiasts will delve into Spark MLlib, covering basics of machine learning algorithms, data preparation, feature extraction, and various techniques such as regression, classification, clustering, and recommendation systems. Introduction to Scala Brief history and motivation Differences between Scala and Java Basic Scala syntax and constructs Scala's functional programming features Introduction to Apache Spark Overview and history Spark components and architecture Spark ecosystem Comparing Spark with other big data frameworks Basics of Spark Programming SparkContext and SparkSession Resilient Distributed Datasets (RDDs) Transformations and Actions Working with DataFrames Spark SQL and Data Sources Spark SQL library and its advantages Structured and semi-structured data sources Reading and writing data in various formats (CSV, JSON, Parquet, Avro, etc.) 
Data manipulation using SQL queries Basic RDD Operations Creating and manipulating RDDs Common transformations and actions on RDDs Working with key-value data Basic DataFrame and Dataset Operations Creating and manipulating DataFrames and Datasets Column operations and functions Filtering, sorting, and aggregating data Introduction to Spark Streaming Overview of Spark Streaming Discretized Stream (DStream) operations Windowed operations and stateful processing Performance Optimization Basics Best practices for efficient Spark code Broadcast variables and accumulators Monitoring Spark applications Integrating External Libraries and Tools, Spark Streaming Using popular external libraries, such as Hadoop and HBase Integrating with cloud platforms: AWS, Azure, GCP Connecting to data storage systems: HDFS, S3, Cassandra, etc. Introduction to Machine Learning Basics Overview of machine learning Supervised and unsupervised learning Common algorithms and use cases Introduction to Spark MLlib Overview of Spark MLlib MLlib's algorithms and utilities Data preparation and feature extraction Linear Regression and Classification Linear regression algorithm Logistic regression for classification Model evaluation and performance metrics Clustering Algorithms Overview of clustering algorithms K-means clustering Model evaluation and performance metrics Collaborative Filtering and Recommendation Systems Overview of recommendation systems Collaborative filtering techniques Implementing recommendations with Spark MLlib Introduction to Graph Processing Overview of graph processing Use cases and applications of graph processing Graph representations and operations Introduction to Spark GraphX Overview of GraphX Creating and transforming graphs Graph algorithms in GraphX Big Data Innovation! 
Using GPT and Generative AI Technologies with Spark and Scala Overview of generative AI technologies Integrating GPT with Spark and Scala Practical applications and use cases Bonus Topics / Time Permitting Introduction to Spark NLP Overview of Spark NLP Preprocessing text data Text classification and sentiment analysis Putting It All Together Work on a capstone project that integrates multiple aspects of the course, including data processing, machine learning, graph processing, and generative AI technologies.
Duration 4 Days 24 CPD hours This course is intended for The workshop is designed for data scientists who currently use Python or R to work with smaller datasets on a single machine and who need to scale up their analyses and machine learning models to large datasets on distributed clusters. Data engineers and developers with some knowledge of data science and machine learning may also find this workshop useful. Overview Overview of data science and machine learning at scale Overview of the Hadoop ecosystem Working with HDFS data and Hive tables using Hue Introduction to Cloudera Data Science Workbench Overview of Apache Spark 2 Reading and writing data Inspecting data quality Cleansing and transforming data Summarizing and grouping data Combining, splitting, and reshaping data Exploring data Configuring, monitoring, and troubleshooting Spark applications Overview of machine learning in Spark MLlib Extracting, transforming, and selecting features Building and evaluating regression models Building and evaluating classification models Building and evaluating clustering models Cross-validating models and tuning hyperparameters Building machine learning pipelines Deploying machine learning models Spark, Spark SQL, and Spark MLlib PySpark and sparklyr Cloudera Data Science Workbench (CDSW) Hue This workshop covers data science and machine learning workflows at scale using Apache Spark 2 and other key components of the Hadoop ecosystem. The workshop emphasizes the use of data science and machine learning methods to address real-world business challenges. Using scenarios and datasets from a fictional technology company, students discover insights to support critical business decisions and develop data products to transform the business. The material is presented through a sequence of brief lectures, interactive demonstrations, extensive hands-on exercises, and discussions. 
The Apache Spark demonstrations and exercises are conducted in Python (with PySpark) and R (with sparklyr) using the Cloudera Data Science Workbench (CDSW) environment. Overview of data science and machine learning at scale; Overview of the Hadoop ecosystem; Working with HDFS data and Hive tables using Hue; Introduction to Cloudera Data Science Workbench; Overview of Apache Spark 2; Reading and writing data; Inspecting data quality; Cleansing and transforming data; Summarizing and grouping data; Combining, splitting, and reshaping data; Exploring data; Configuring, monitoring, and troubleshooting Spark applications; Overview of machine learning in Spark MLlib; Extracting, transforming, and selecting features; Building and evaluating regression models; Building and evaluating classification models; Building and evaluating clustering models; Cross-validating models and tuning hyperparameters; Building machine learning pipelines; Deploying machine learning models. Additional course details: The Nexus Humans Cloudera Data Scientist Training program is a workshop that presents an invigorating mix of sessions, lessons, and masterclasses meticulously crafted to propel your learning expedition forward. This immersive bootcamp-style experience boasts interactive lectures, hands-on labs, and collaborative hackathons, all strategically designed to fortify fundamental concepts. Guided by seasoned coaches, each session offers priceless insights and practical skills crucial for honing your expertise. 
Whether you're stepping into the realm of professional skills or a seasoned professional, this comprehensive course ensures you're equipped with the knowledge and prowess necessary for success. While we feel this is the best course for the Cloudera Data Scientist Training course and one of our Top 10 we encourage you to read the course outline to make sure it is the right content for you. Additionally, private sessions, closed classes or dedicated events are available both live online and at our training centres in Dublin and London, as well as at your offices anywhere in the UK, Ireland or across EMEA.
Duration 3 Days 18 CPD hours This course is intended for This course is geared for Python-experienced attendees who wish to be equipped with the skills they need to use pandas to ensure the veracity of their data, visualize it for effective decision-making, and reliably reproduce analyses across multiple datasets. Overview Working in a hands-on learning environment, guided by our expert team, attendees will learn to: Understand how data analysts and scientists gather and analyze data Perform data analysis and data wrangling using Python Combine, group, and aggregate data from multiple sources Create data visualizations with pandas, matplotlib, and seaborn Apply machine learning (ML) algorithms to identify patterns and make predictions Use Python data science libraries to analyze real-world datasets Use pandas to solve common data representation and analysis problems Build Python scripts, modules, and packages for reusable analysis code Perform efficient data analysis and manipulation tasks using pandas Apply pandas to different real-world domains with the help of step-by-step demonstrations Get accustomed to using pandas as an effective data exploration tool. Data analysis has become a necessary skill in a variety of domains where knowing how to work with data and extract insights can generate significant value. Geared for data team members with prior Python scripting experience, Hands-On Data Analysis with Pandas will show you how to analyze your data, get started with machine learning, and work effectively with Python libraries often used for data science, such as pandas, NumPy, matplotlib, seaborn, and scikit-learn. Using real-world datasets, you will learn how to use the powerful pandas library to perform data wrangling to reshape, clean, and aggregate your data. Then, you will be able to conduct exploratory data analysis by calculating summary statistics and visualizing the data to find patterns. 
In the concluding lessons, you will explore some applications of anomaly detection, regression, clustering, and classification using scikit-learn to make predictions based on past data. Students will leave the course armed with the skills required to use pandas to ensure the veracity of their data, visualize it for effective decision-making, and reliably reproduce analyses across multiple datasets. Introduction to Data Analysis Fundamentals of data analysis Statistical foundations Setting up a virtual environment Working with Pandas DataFrames Pandas data structures Bringing data into a pandas DataFrame Inspecting a DataFrame object Grabbing subsets of the data Adding and removing data Data Wrangling with Pandas What is data wrangling? Collecting temperature data Cleaning up the data Restructuring the data Handling duplicate, missing, or invalid data Aggregating Pandas DataFrames Database-style operations on DataFrames DataFrame operations Aggregations with pandas and numpy Time series Visualizing Data with Pandas and Matplotlib An introduction to matplotlib Plotting with pandas The pandas.plotting subpackage Plotting with Seaborn and Customization Techniques Utilizing seaborn for advanced plotting Formatting Customizing visualizations Financial Analysis - Bitcoin and the Stock Market Building a Python package Data extraction with pandas Exploratory data analysis Technical analysis of financial instruments Modeling performance Rule-Based Anomaly Detection Simulating login attempts Exploratory data analysis Rule-based anomaly detection Getting Started with Machine Learning in Python Learning the lingo Exploratory data analysis Preprocessing data Clustering Regression Classification Making Better Predictions - Optimizing Models Hyperparameter tuning with grid search Feature engineering Ensemble methods Inspecting classification prediction confidence Addressing class imbalance Regularization Machine Learning Anomaly Detection Exploring the data Unsupervised methods 
Supervised methods Online learning The Road Ahead Data resources Practicing working with data Python practice