Duration 3 Days 18 CPD hours This course is intended for This course is geared for Python-experienced attendees who wish to be equipped with the skills you need to use pandas to ensure the veracity of your data, visualize it for effective decision-making, and reliably reproduce analyses across multiple datasets. Overview Working in a hands-on learning environment, guided by our expert team, attendees will learn to: Understand how data analysts and scientists gather and analyze data Perform data analysis and data wrangling using Python Combine, group, and aggregate data from multiple sources Create data visualizations with pandas, matplotlib, and seaborn Apply machine learning (ML) algorithms to identify patterns and make predictions Use Python data science libraries to analyze real-world datasets Use pandas to solve common data representation and analysis problems Build Python scripts, modules, and packages for reusable analysis code Perform efficient data analysis and manipulation tasks using pandas Apply pandas to different real-world domains with the help of step-by-step demonstrations Get accustomed to using pandas as an effective data exploration tool. Data analysis has become a necessary skill in a variety of domains where knowing how to work with data and extract insights can generate significant value. Geared for data team members with incoming Python scripting experience, Hands-On Data Analysis with Pandas will show you how to analyze your data, get started with machine learning, and work effectively with Python libraries often used for data science, such as pandas, NumPy, matplotlib, seaborn, and scikit-learn. Using real-world datasets, you will learn how to use the powerful pandas library to perform data wrangling to reshape, clean, and aggregate your data. Then, you will be able to conduct exploratory data analysis by calculating summary statistics and visualizing the data to find patterns. In the concluding lessons, you will explore some applications of anomaly detection, regression, clustering, and classification using scikit-learn to make predictions based on past data. Students will leave the course armed with the skills required to use pandas to ensure the veracity of their data, visualize it for effective decision-making, and reliably reproduce analyses across multiple datasets. Introduction to Data Analysis Fundamentals of data analysis Statistical foundations Setting up a virtual environment Working with Pandas DataFrames Pandas data structures Bringing data into a pandas DataFrame Inspecting a DataFrame object Grabbing subsets of the data Adding and removing data Data Wrangling with Pandas What is data wrangling? Collecting temperature data Cleaning up the data Restructuring the data Handling duplicate, missing, or invalid data Aggregating Pandas DataFrames Database-style operations on DataFrames DataFrame operations Aggregations with pandas and numpy Time series Visualizing Data with Pandas and Matplotlib An introduction to matplotlib Plotting with pandas The pandas.plotting subpackage Plotting with Seaborn and Customization Techniques Utilizing seaborn for advanced plotting Formatting Customizing visualizations Financial Analysis - Bitcoin and the Stock Market Building a Python package Data extraction with pandas Exploratory data analysis Technical analysis of financial instruments Modeling performance Rule-Based Anomaly Detection Simulating login attempts Exploratory data analysis Rule-based anomaly detection Getting Started with Machine Learning in Python Learning the lingo Exploratory data analysis Preprocessing data Clustering Regression Classification Making Better Predictions - Optimizing Models Hyperparameter tuning with grid search Feature engineering Ensemble methods Inspecting classification prediction confidence Addressing class imbalance Regularization Machine Learning Anomaly Detection Exploring the data Unsupervised methods Supervised methods Online learning The Road Ahead Data resources Practicing working with data Python practice
Duration 3 Days 18 CPD hours This course is intended for Senior Executives CIOs and CTOs Business Intelligence Executives Marketing Executives Data & Business Analytics Specialists Innovation Specialists & Entrepreneurs Academics, and other people interested in Big Data Overview More specifically, BDAW addresses advanced big data architecture topics, including, data formats, transformation, real-time, batch and machine learning processing, scalability, fault tolerance, security and privacy, minimizing the risk of an unsound architecture and technology selection. Big Data Architecture Workshop (BDAW) is a learning event that addresses advanced big data architecture topics. BDAW brings together technical contributors into a group setting to design and architect solutions to a challenging business problem. The workshop addresses big data architecture problems in general, and then applies them to the design of a challenging system. Throughout the highly interactive workshop, students apply concepts to real-world examples resulting in detailed synergistic discussions. The workshop is conducive for students to learn techniques for architecting big data systems, not only from Cloudera?s experience but also from the experiences of fellow students. Workshop Application Use Cases Oz Metropolitan Architectural questions Team activity: Analyze Metroz Application Use Cases Application Vertical Slice Definition Minimizing risk of an unsound architecture Selecting a vertical slice Team activity: Identify an initial vertical slice for Metroz Application Processing Real time, near real time processing Batch processing Data access patterns Delivery and processing guarantees Machine Learning pipelines Team activity: identify delivery and processing patterns in Metroz, characterize response time requirements, identify Machine Learning pipelines Application Data Three V?s of Big Data Data Lifecycle Data Formats Transforming Data Team activity: Metroz Data Requirements Scalable Applications Scale up, scale out, scale to X Determining if an application will scale Poll: scalable airport terminal designs Hadoop and Spark Scalability Team activity: Scaling Metroz Fault Tolerant Distributed Systems Principles Transparency Hardware vs. Software redundancy Tolerating disasters Stateless functional fault tolerance Stateful fault tolerance Replication and group consistency Fault tolerance in Spark and Map Reduce Application tolerance for failures Team activity: Identify Metroz component failures and requirements Security and Privacy Principles Privacy Threats Technologies Team activity: identify threats and security mechanisms in Metroz Deployment Cluster sizing and evolution On-premise vs. Cloud Edge computing Team activity: select deployment for Metroz Technology Selection HDFS HBase Kudu Relational Database Management Systems Map Reduce Spark, including streaming, SparkSQL and SparkML Hive Impala Cloudera Search Data Sets and Formats Team activity: technologies relevant to Metroz Software Architecture Architecture artifacts One platform or multiple, lambda architecture Team activity: produce high level architecture, selected technologies, revisit vertical slice Vertical Slice demonstration Additional course details: Nexus Humans Big Data Architecture Workshop training program is a workshop that presents an invigorating mix of sessions, lessons, and masterclasses meticulously crafted to propel your learning expedition forward. This immersive bootcamp-style experience boasts interactive lectures, hands-on labs, and collaborative hackathons, all strategically designed to fortify fundamental concepts. Guided by seasoned coaches, each session offers priceless insights and practical skills crucial for honing your expertise. Whether you're stepping into the realm of professional skills or a seasoned professional, this comprehensive course ensures you're equipped with the knowledge and prowess necessary for success. While we feel this is the best course for the Big Data Architecture Workshop course and one of our Top 10 we encourage you to read the course outline to make sure it is the right content for you. Additionally, private sessions, closed classes or dedicated events are available both live online and at our training centres in Dublin and London, as well as at your offices anywhere in the UK, Ireland or across EMEA.
Duration 2 Days 12 CPD hours This course is intended for Report Authors Overview Create query models Create reports based on query relationships Introduction to dimensional data Introduction to dimensional data in reports Dimensional report context Focus your dimensional data Calculations and dimensional functions Create advanced dynamic reports This offering teaches Professional Report Authors about advanced report building techniques using relational data models, dimensional data, and ways of enhancing, customizing, managing, and distributing professional reports. The course builds on topics presented in the Fundamentals course. Activities will illustrate and reinforce key concepts during this learning activity. Create query models Build a query and connect it to a report Answer a business question by referencing data in a separate query Create reports based on query relationships Create join relationships between queries Combine data containers based on relationships from different queries Create a report comparing the percentage of change Introduction to dimensional reporting concepts Examine data sources and model types Describe the dimensional approach to queries Apply report authoring styles Introduction to dimensional data in reports Use members to create reports Identify sets and tuples in reports Use query calculations and set definitions Dimensional report context Examine dimensional report members Examine dimensional report measures Use the default measure to create a summarized column in a report Focus your dimensional data Focus your report by excluding members of a defined set Compare the use of the filter() function to a detail filter Filter dimensional data using slicers Calculations and dimensional functions Examine dimensional functions Show totals and exclude members Create a percent of base calculation Create advanced dynamic reports Use query macros Control report output using a query macro Create a dynamic growth report Create a report that displays summary data before detailed data and uses singletons to summarize information Design effective prompts Create a prompt that allows users to select conditional formatting values Create a prompt that provides users a choice between different filters Create a prompt to let users choose a column sort order Create a prompt to let users select a display type Examine the report specification Examine report specification flow Identify considerations when modifying report specifications Customize reporting objects Distribute reports Burst a report to email recipients by using a data item Burst a list report to the IBM Cognos Analytics portal by using a burst table Burst a crosstab report to the IBM Cognos Analytics portal by using a burst table and a master detail relationship Enhance user interaction with HTML Create interactive reports using HTML Include additional information with tooltips Send emails using links in a report Introduction to IBM Cognos Active Reports Examine Active Report controls and variables Create a simple Active Report using Static and Data-driven controls Change filtering and selection behavior in a report Create interaction between multiple controls and variables Active Report charts and decks Create an Active Report with a Data deck Use Master detail relationships with Decks Optimize Active Reports Create an Active Report with new visualizations
Duration 2 Days 12 CPD hours This course is intended for Report authors working with dimensional data sources. Through interactive demonstrations & exercises, participants will learn how to author reports that navigate & manipulate dimensional data structures using the specific dimensional functions & features available in IBM Cognos Analytics. Introduction to Dimensional Concepts Identify different data sources and models Investigate the OLAP dimensional structure Identify dimensional data items and expressions Differentiate the IBM Cognos Analytics query language from SQL and MDX Differentiate relational and dimensional report authoring styles Introduction to Dimensional Data in Reports Work with members Identify sets and tuples in IBM Cognos Analytics Dimensional Report Context Understand the purpose of report context Understand how data is affected by default and root members Focus Your Dimensional Data Compare dimensional queries to relational queries Explain the importance of filtering dimensional queries Evaluate different filtering techniques Filter based on dimensions and members Filter based on measure values Filter using a slicer Calculations & Dimensional Functions Use IBM Cognos Analytics dimensional functions to create sets and tuples Perform arithmetic operations in OLAP queries Identify coercion errors and rules Functions for Navigating Dimesional Hierarchies Navigate dimensional data using family functions Relative Functions Navigate dimensional data using relative functions Navigate dimensional data using relative time functions Advanced Drilling Techniques & Member Sets Understand default drill-up and drill-down functionality Identify cases when you need to override default drilling behavior Configure advanced drilling behavior to support sophisticated use cases Define member sets to support advanced drilling Define member sets to support functions Set Up Drill-Through Reports Navigate from a specific report to a target report Drill down to greater detail and then navigate to target report Navigate between reports created using different data sources End-to-End Workshop Review concepts covered throughout the course
Duration 3 Days 18 CPD hours This course is intended for Data Wrangling with Python takes a practical approach to equip beginners with the most essential data analysis tools in the shortest possible time. It contains multiple activities that use real-life business scenarios for you to practice and apply your new skills in a highly relevant context. Overview By the end of this course, you will be confident in using a diverse array of sources to extract, clean, transform, and format your data efficiently. In this course you will start with the absolute basics of Python, focusing mainly on data structures. Then you will delve into the fundamental tools of data wrangling like NumPy and Pandas libraries. You'll explore useful insights into why you should stay away from traditional ways of data cleaning, as done in other languages, and take advantage of the specialized pre-built routines in Python.This combination of Python tips and tricks will also demonstrate how to use the same Python backend and extract/transform data from an array of sources including the Internet, large database vaults, and Excel financial tables. To help you prepare for more challenging scenarios, you'll cover how to handle missing or wrong data, and reformat it based on the requirements from the downstream analytics tool. The course will further help you grasp concepts through real-world examples and datasets. Introduction to Data Structure using Python Python for Data Wrangling Lists, Sets, Strings, Tuples, and Dictionaries Advanced Operations on Built-In Data Structure Advanced Data Structures Basic File Operations in Python Introduction to NumPy, Pandas, and Matplotlib NumPy Arrays Pandas DataFrames Statistics and Visualization with NumPy and Pandas Using NumPy and Pandas to Calculate Basic Descriptive Statistics on the DataFrame Deep Dive into Data Wrangling with Python Subsetting, Filtering, and Grouping Detecting Outliers and Handling Missing Values Concatenating, Merging, and Joining Useful Methods of Pandas Get Comfortable with a Different Kind of Data Sources Reading Data from Different Text-Based (and Non-Text-Based) Sources Introduction to BeautifulSoup4 and Web Page Parsing Learning the Hidden Secrets of Data Wrangling Advanced List Comprehension and the zip Function Data Formatting Advanced Web Scraping and Data Gathering Basics of Web Scraping and BeautifulSoup libraries Reading Data from XML RDBMS and SQL Refresher of RDBMS and SQL Using an RDBMS (MySQL/PostgreSQL/SQLite) Application in real life and Conclusion of course Applying Your Knowledge to a Real-life Data Wrangling Task An Extension to Data Wrangling
Duration 4 Days 24 CPD hours This course is intended for This course is designed for data analysts, business intelligence specialists, developers, system architects, and database administrators. Overview Skills gained in this training include:The features that Pig, Hive, and Impala offer for data acquisition, storage, and analysisThe fundamentals of Apache Hadoop and data ETL (extract, transform, load), ingestion, and processing with HadoopHow Pig, Hive, and Impala improve productivity for typical analysis tasksJoining diverse datasets to gain valuable business insightPerforming real-time, complex queries on datasets Cloudera University?s four-day data analyst training course focusing on Apache Pig and Hive and Cloudera Impala will teach you to apply traditional data analytics and business intelligence skills to big data. Hadoop Fundamentals The Motivation for Hadoop Hadoop Overview Data Storage: HDFS Distributed Data Processing: YARN, MapReduce, and Spark Data Processing and Analysis: Pig, Hive, and Impala Data Integration: Sqoop Other Hadoop Data Tools Exercise Scenarios Explanation Introduction to Pig What Is Pig? Pig?s Features Pig Use Cases Interacting with Pig Basic Data Analysis with Pig Pig Latin Syntax Loading Data Simple Data Types Field Definitions Data Output Viewing the Schema Filtering and Sorting Data Commonly-Used Functions Processing Complex Data with Pig Storage Formats Complex/Nested Data Types Grouping Built-In Functions for Complex Data Iterating Grouped Data Multi-Dataset Operations with Pig Techniques for Combining Data Sets Joining Data Sets in Pig Set Operations Splitting Data Sets Pig Troubleshoot & Optimization Troubleshooting Pig Logging Using Hadoop?s Web UI Data Sampling and Debugging Performance Overview Understanding the Execution Plan Tips for Improving the Performance of Your Pig Jobs Introduction to Hive & Impala What Is Hive? What Is Impala? Schema and Data Storage Comparing Hive to Traditional Databases Hive Use Cases Querying with Hive & Impala Databases and Tables Basic Hive and Impala Query Language Syntax Data Types Differences Between Hive and Impala Query Syntax Using Hue to Execute Queries Using the Impala Shell Data Management Data Storage Creating Databases and Tables Loading Data Altering Databases and Tables Simplifying Queries with Views Storing Query Results Data Storage & Performance Partitioning Tables Choosing a File Format Managing Metadata Controlling Access to Data Relational Data Analysis with Hive & Impala Joining Datasets Common Built-In Functions Aggregation and Windowing Working with Impala How Impala Executes Queries Extending Impala with User-Defined Functions Improving Impala Performance Analyzing Text and Complex Data with Hive Complex Values in Hive Using Regular Expressions in Hive Sentiment Analysis and N-Grams Conclusion Hive Optimization Understanding Query Performance Controlling Job Execution Plan Bucketing Indexing Data Extending Hive SerDes Data Transformation with Custom Scripts User-Defined Functions Parameterized Queries Choosing the Best Tool for the Job Comparing MapReduce, Pig, Hive, Impala, and Relational Databases Which to Choose?
Duration 3 Days 18 CPD hours This course is intended for Report Authors Overview What is IBM Cognos Analytics ? Reporting Examine dimensionally modelled and dimensional data sources Examine personal data sources and data modules Examine List reports Aggregate measure/fact data Use shared dimensions to create multi-fact queries Add repeated information to reports Create crosstab reports Create complex crosstab reports Format, sort, and aggregate data in a crosstab report Create discontinuous crosstab reports Create Visualization reports Add business logic to reports using IBM Cognos Analytics ? Reporting Focus reports using filters Focus reports using prompts Augment reports using calculations Extend report functionality in IBM Cognos Analytics - Reporting Customize reports with conditional formatting Conditionally format one crosstab measure based on another Drill-through definitions Enhance the report layout Use additional report building techniques This offering provides Business and Professional Authors with an introduction to report building techniques using relational data models. Techniques to enhance, customize, and manage professional reports will be explored. Activities will illustrate and reinforce key concepts during this learning opportunity. What is IBM Cognos Analytics - Reporting? Create a simple list report Create a report from a dimensionally modeled relational data source Examine personal data sources and data modules Upload personal data Upload custom images Use navigation paths Create a report from a personal data source Examine list reports Group data in a list Format columns in a list Include headers and footers in a list Enhance a list report Aggregate measure/fact data Identify differences in aggregation Explore data aggregation Use shared dimensions to create multi-fact queries Create a multi-fact query in a list report Add repeated information to reports Create a mailing list report Create crosstab reports Add measures to a crosstab Data sources for a crosstab Create a simple crosstab report Create complex crosstab reports Add items as peers Create crosstab nodes and crosstab members Create a complex crosstab report Format, sort, and aggregate data in a crosstab Sort, format, and aggregate a crosstab report Create discontinuous crosstab reports Present unrelated items using a discontinuous crosstab Create a visualization report Create and format a visualization report Create a report that uses a Map visualization Show the same data graphically and numerically Focus reports using filters Apply filters to a report Apply a detail filter on fact data in a report Apply a summary filter to a report Focus reports using prompts Create a prompt by adding a parameter Add a value prompt to a report Add a Select & search prompt to a report Create a cascading prompt Augment reports using calculations Add calculations to a report Display prompt selections in the report title Customize reports with conditional formatting Create a multilingual report Highlight exceptional data and conditionally render a column Drill-through definitions Let users navigate to related data in IBM Cognos Analytics Enhance report layout Create a report structured on data items Create a condensed list report Use additional report building techniques Section a report and reuse objects within the same report Reuse layout components in a different report Explore options for reports that contain no data Additional course details: Nexus Humans B6158 IBM Cognos Analytics - Author Reports Fundamentals (v11.0.x) training program is a workshop that presents an invigorating mix of sessions, lessons, and masterclasses meticulously crafted to propel your learning expedition forward. This immersive bootcamp-style experience boasts interactive lectures, hands-on labs, and collaborative hackathons, all strategically designed to fortify fundamental concepts. Guided by seasoned coaches, each session offers priceless insights and practical skills crucial for honing your expertise. Whether you're stepping into the realm of professional skills or a seasoned professional, this comprehensive course ensures you're equipped with the knowledge and prowess necessary for success. While we feel this is the best course for the B6158 IBM Cognos Analytics - Author Reports Fundamentals (v11.0.x) course and one of our Top 10 we encourage you to read the course outline to make sure it is the right content for you. Additionally, private sessions, closed classes or dedicated events are available both live online and at our training centres in Dublin and London, as well as at your offices anywhere in the UK, Ireland or across EMEA.
Duration 2 Days 12 CPD hours This course is intended for If you are a data analyst, data scientist, or a business analyst who wants to get started with using Python and machine learning techniques to analyze data and predict outcomes, this book is for you. Basic knowledge of computer programming and data analytics is a must. Familiarity with mathematical concepts such as algebra and basic statistics will be useful. Overview By the end of this course, you will have the skills you need to confidently use various machine learning algorithms to perform detailed data analysis and extract meaningful insights from data. This course is designed to give you practical guidance on industry-standard data analysis and machine learning tools in Python, with the help of realistic data. The course will help you understand how you can use pandas and Matplotlib to critically examine a dataset with summary statistics and graphs, and extract the insights you seek to derive. You will continue to build on your knowledge as you learn how to prepare data and feed it to machine learning algorithms, such as regularized logistic regression and random forest, using the scikit-learn package. You?ll discover how to tune the algorithms to provide the best predictions on new and unseen data. As you delve into later sections, you?ll be able to understand the working and output of these algorithms and gain insight into not only the predictive capabilities of the models but also their reasons for making these predictions. Data Exploration and Cleaning Python and the Anaconda Package Management System Different Types of Data Science Problems Loading the Case Study Data with Jupyter and pandas Data Quality Assurance and Exploration Exploring the Financial History Features in the Dataset Activity 1: Exploring Remaining Financial Features in the Dataset Introduction to Scikit-Learn and Model Evaluation Introduction Model Performance Metrics for Binary Classification Activity 2: Performing Logistic Regression with a New Feature and Creating a Precision-Recall Curve Details of Logistic Regression and Feature Exploration Introduction Examining the Relationships between Features and the Response Univariate Feature Selection: What It Does and Doesn't Do Building Cloud-Native Applications Activity 3: Fitting a Logistic Regression Model and Directly Using the Coefficients The Bias-Variance Trade-off Introduction Estimating the Coefficients and Intercepts of Logistic Regression Cross Validation: Choosing the Regularization Parameter and Other Hyperparameters Activity 4: Cross-Validation and Feature Engineering with the Case Study Data Decision Trees and Random Forests Introduction Decision trees Random Forests: Ensembles of Decision Trees Activity 5: Cross-Validation Grid Search with Random Forest Imputation of Missing Data, Financial Analysis, and Delivery to Client Introduction Review of Modeling Results Dealing with Missing Data: Imputation Strategies Activity 6: Deriving Financial Insights Final Thoughts on Delivering the Predictive Model to the Client
Duration 2 Days 12 CPD hours This course is intended for This course is designed for professionals in a variety of job roles who are currently using desktop or web-based data management tools such as Microsoft Excel or SQL Server reporting services to perform numerical or general data analysis. They are responsible for connecting to cloud-based data sources, as well as shaping and combining data for the purpose of analysis. They are also looking for alternative ways to analyze business data, visualize insights, and share those insights with peers across the enterprise. This includes capturing and reporting on data to peers, executives, and clients. Overview In this course, you will analyze data with Microsoft Power BI. You will: Analyze data with self-service BI. Connect to data sources. Perform data cleaning, profiling, and shaping. Visualize data with Power BI. Enhance data analysis by adding and customizing visual elements. Model data with calculations. Create interactive visualizations. As technology progresses and becomes more interwoven with our businesses and lives, more data is collected about business and personal activities. This era of 'big data' is a direct result of the popularity and growth of cloud computing, which provides an abundance of computational power and storage, allowing organizations of all sorts to capture and store data. Leveraging that data effectively can provide timely insights and competitive advantages. Creating data-backed visualizations is key for data scientists, or any professional, to explore, analyze, and report insights and trends from data. Microsoft© Power BI© software is designed for this purpose. Power BI was built to connect to a wide range of data sources, and it enables users to quickly create visualizations of connected data to gain insights, show trends, and create reports. Power BI's data connection capabilities and visualization features go far beyond those that can be found in spreadsheets, enabling users to create compelling and interactive worksheets, dashboards, and stories that bring data to life and turn data into thoughtful action. Analyzing Data with Self-Service BI Topic A: Data Analysis and Visualization for Business Intelligence Topic B: Self-Service BI with Microsoft Power BI Connecting to Data Sources Topic A: Create Data Connections Topic B: Configure and Manage Data Relationships Topic C: Save Files in Power BI Performing Data Cleaning, Profiling, and Shaping Topic A: Clean, Transform, and Load Data with the Query Editor Topic B: Profile Data with the Query Editor Topic C: Shape Data with the Query Editor Topic D: Combine and Manage Data Rows Visualizing Data with Power BI Topic A: Create Visualizations in Power BI Topic B: Chart Data in Power BI Enhancing Data Analysis Topic A: Customize Visuals and Pages Topic B: Incorporate Tooltips Modeling Data with Calculations Topic A: Create Calculations with Data Analysis Expressions (DAX) Topic B: Create Calculated Measures and Conditional Columns Creating Interactive Visualizations Topic A: Create and Manage Data Hierarchies Topic B: Filter and Slice Reports Topic C: Create Dashboards Additional course details: Nexus Humans Microsoft Power BI: Data Analysis Practitioner (Second Edition) (v1.3) training program is a workshop that presents an invigorating mix of sessions, lessons, and masterclasses meticulously crafted to propel your learning expedition forward. This immersive bootcamp-style experience boasts interactive lectures, hands-on labs, and collaborative hackathons, all strategically designed to fortify fundamental concepts. Guided by seasoned coaches, each session offers priceless insights and practical skills crucial for honing your expertise. Whether you're stepping into the realm of professional skills or a seasoned professional, this comprehensive course ensures you're equipped with the knowledge and prowess necessary for success. While we feel this is the best course for the Microsoft Power BI: Data Analysis Practitioner (Second Edition) (v1.3) course and one of our Top 10 we encourage you to read the course outline to make sure it is the right content for you. Additionally, private sessions, closed classes or dedicated events are available both live online and at our training centres in Dublin and London, as well as at your offices anywhere in the UK, Ireland or across EMEA.
Duration 2 Days 12 CPD hours This course is intended for Business Analysts, Technical Managers, and Programmers Overview This intensive training course helps students learn the practical aspects of the R programming language. The course is supplemented by many hands-on labs which allow attendees to immediately apply their theoretical knowledge in practice. Over the past few years, R has been steadily gaining popularity with business analysts, statisticians and data scientists as a tool of choice for conducting statistical analysis of data as well as supervised and unsupervised machine learning. What is R ? What is R? ? Positioning of R in the Data Science Space ? The Legal Aspects ? Microsoft R Open ? R Integrated Development Environments ? Running R ? Running RStudio ? Getting Help ? General Notes on R Commands and Statements ? Assignment Operators ? R Core Data Structures ? Assignment Example ? R Objects and Workspace ? Printing Objects ? Arithmetic Operators ? Logical Operators ? System Date and Time ? Operations ? User-defined Functions ? Control Statements ? Conditional Execution ? Repetitive Execution ? Repetitive execution ? Built-in Functions ? Summary Introduction to Functional Programming with R ? What is Functional Programming (FP)? ? Terminology: Higher-Order Functions ? A Short List of Languages that Support FP ? Functional Programming in R ? Vector and Matrix Arithmetic ? Vector Arithmetic Example ? More Examples of FP in R ? Summary Managing Your Environment ? Getting and Setting the Working Directory ? Getting the List of Files in a Directory ? The R Home Directory ? Executing External R commands ? Loading External Scripts in RStudio ? Listing Objects in Workspace ? Removing Objects in Workspace ? Saving Your Workspace in R ? Saving Your Workspace in RStudio ? Saving Your Workspace in R GUI ? Loading Your Workspace ? Diverting Output to a File ? Batch (Unattended) Processing ? Controlling Global Options ? Summary R Type System and Structures ? The R Data Types ? System Date and Time ? Formatting Date and Time ? Using the mode() Function ? R Data Structures ? What is the Type of My Data Structure? ? Creating Vectors ? Logical Vectors ? Character Vectors ? Factorization ? Multi-Mode Vectors ? The Length of the Vector ? Getting Vector Elements ? Lists ? A List with Element Names ? Extracting List Elements ? Adding to a List ? Matrix Data Structure ? Creating Matrices ? Creating Matrices with cbind() and rbind() ? Working with Data Frames ? Matrices vs Data Frames ? A Data Frame Sample ? Creating a Data Frame ? Accessing Data Cells ? Getting Info About a Data Frame ? Selecting Columns in Data Frames ? Selecting Rows in Data Frames ? Getting a Subset of a Data Frame ? Sorting (ordering) Data in Data Frames by Attribute(s) ? Editing Data Frames ? The str() Function ? Type Conversion (Coercion) ? The summary() Function ? Checking an Object's Type ? Summary Extending R ? The Base R Packages ? Loading Packages ? What is the Difference between Package and Library? ? Extending R ? The CRAN Web Site ? Extending R in R GUI ? Extending R in RStudio ? Installing and Removing Packages from Command-Line ? Summary Read-Write and Import-Export Operations in R ? Reading Data from a File into a Vector ? Example of Reading Data from a File into A Vector ? Writing Data to a File ? Example of Writing Data to a File ? Reading Data into A Data Frame ? Writing CSV Files ? Importing Data into R ? Exporting Data from R ? Summary Statistical Computing Features in R ? Statistical Computing Features ? Descriptive Statistics ? Basic Statistical Functions ? Examples of Using Basic Statistical Functions ? Non-uniformity of a Probability Distribution ? Writing Your Own skew and kurtosis Functions ? Generating Normally Distributed Random Numbers ? Generating Uniformly Distributed Random Numbers ? Using the summary() Function ? Math Functions Used in Data Analysis ? Examples of Using Math Functions ? Correlations ? Correlation Example ? Testing Correlation Coefficient for Significance ? The cor.test() Function ? The cor.test() Example ? Regression Analysis ? Types of Regression ? Simple Linear Regression Model ? Least-Squares Method (LSM) ? LSM Assumptions ? Fitting Linear Regression Models in R ? Example of Using lm() ? Confidence Intervals for Model Parameters ? Example of Using lm() with a Data Frame ? Regression Models in Excel ? Multiple Regression Analysis ? Summary Data Manipulation and Transformation in R ? Applying Functions to Matrices and Data Frames ? The apply() Function ? Using apply() ? Using apply() with a User-Defined Function ? apply() Variants ? Using tapply() ? Adding a Column to a Data Frame ? Dropping A Column in a Data Frame ? The attach() and detach() Functions ? Sampling ? Using sample() for Generating Labels ? Set Operations ? Example of Using Set Operations ? The dplyr Package ? Object Masking (Shadowing) Considerations ? Getting More Information on dplyr in RStudio ? The search() or searchpaths() Functions ? Handling Large Data Sets in R with the data.table Package ? The fread() and fwrite() functions from the data.table Package ? Using the Data Table Structure ? Summary Data Visualization in R ? Data Visualization ? Data Visualization in R ? The ggplot2 Data Visualization Package ? Creating Bar Plots in R ? Creating Horizontal Bar Plots ? Using barplot() with Matrices ? Using barplot() with Matrices Example ? Customizing Plots ? Histograms in R ? Building Histograms with hist() ? Example of using hist() ? Pie Charts in R ? Examples of using pie() ? Generic X-Y Plotting ? Examples of the plot() function ? Dot Plots in R ? Saving Your Work ? Supported Export Options ? Plots in RStudio ? Saving a Plot as an Image ? Summary Using R Efficiently ? Object Memory Allocation Considerations ? Garbage Collection ? Finding Out About Loaded Packages ? Using the conflicts() Function ? Getting Information About the Object Source Package with the pryr Package ? Using the where() Function from the pryr Package ? Timing Your Code ? Timing Your Code with system.time() ? Timing Your Code with System.time() ? Sleeping a Program ? Handling Large Data Sets in R with the data.table Package ? Passing System-Level Parameters to R ? Summary Lab Exercises Lab 1 - Getting Started with R Lab 2 - Learning the R Type System and Structures Lab 3 - Read and Write Operations in R Lab 4 - Data Import and Export in R Lab 5 - k-Nearest Neighbors Algorithm Lab 6 - Creating Your Own Statistical Functions Lab 7 - Simple Linear Regression Lab 8 - Monte-Carlo Simulation (Method) Lab 9 - Data Processing with R Lab 10 - Using R Graphics Package Lab 11 - Using R Efficiently