Duration: 5 Days, 30 CPD hours
This course is intended for
  Database Administrators, Database Developers, BI professionals, and Business reporting users.
Overview
  Upon successful completion of this course, students will be able to run queries and retrieve results, perform conditional searches, and retrieve data from multiple tables.
Before starting this course, make sure you meet at least one of the following prerequisites:
  Basic knowledge of the Microsoft Windows operating system and its core functionality.
  Basic working knowledge of relational databases.
In this course, students will gain a good understanding of the Transact-SQL language. They will be able to create queries, sort and filter data, and execute procedures with T-SQL.
Course Outline
  1. Introduction to Microsoft SQL Server
  2. Introduction to T-SQL Querying
  3. Writing SELECT Queries
  4. Querying Multiple Tables
  5. Sorting and Filtering Data
  6. Working with SQL Server Data Types
  7. Using DML to Modify Data
  8. Using Built-In Functions
  9. Grouping and Aggregating Data
  10. Using Subqueries
  11. Using Table Expressions
  12. Using Set Operators
  13. Using Window Ranking, Offset, and Aggregate Functions
  14. Pivoting and Grouping Sets
  15. Executing Stored Procedures
  16. Programming with T-SQL
Duration: 3 Days, 18 CPD hours
This course is intended for
  Database architects
  Database administrators
  Database developers
  Data analysts and scientists
Overview
This course is designed to teach you how to:
  Discuss the core concepts of data warehousing, and the intersection between data warehousing and big data solutions
  Launch an Amazon Redshift cluster and use the components, features, and functionality to implement a data warehouse in the cloud
  Use other AWS data and analytic services, such as Amazon DynamoDB, Amazon EMR, Amazon Kinesis, and Amazon S3, to contribute to the data warehousing solution
  Architect the data warehouse
  Identify performance issues, optimize queries, and tune the database for better performance
  Use Amazon Redshift Spectrum to analyze data directly from an Amazon S3 bucket
  Use Amazon QuickSight to perform data analysis and visualization tasks against the data warehouse
Data Warehousing on AWS introduces you to concepts, strategies, and best practices for designing a cloud-based data warehousing solution using Amazon Redshift, the petabyte-scale data warehouse in AWS. This course demonstrates how to collect, store, and prepare data for the data warehouse by using other AWS services such as Amazon DynamoDB, Amazon EMR, Amazon Kinesis, and Amazon S3. Additionally, this course demonstrates how to use Amazon QuickSight to perform analysis on your data.
Module 1: Introduction to Data Warehousing
  Relational databases
  Data warehousing concepts
  The intersection of data warehousing and big data
  Overview of data management in AWS
  Hands-on lab 1: Introduction to Amazon Redshift
Module 2: Introduction to Amazon Redshift
  Conceptual overview
  Real-world use cases
  Hands-on lab 2: Launching an Amazon Redshift cluster
Module 3: Launching clusters
  Building the cluster
  Connecting to the cluster
  Controlling access
  Database security
  Load data
  Hands-on lab 3: Optimizing database schemas
Module 4: Designing the database schema
  Schemas and data types
  Columnar compression
  Data distribution styles
  Data sorting methods
Module 5: Identifying data sources
  Data sources overview
  Amazon S3
  Amazon DynamoDB
  Amazon EMR
  Amazon Kinesis Data Firehose
  AWS Lambda Database Loader for Amazon Redshift
  Hands-on lab 4: Loading real-time data into an Amazon Redshift database
Module 6: Loading data
  Preparing Data
  Loading data using COPY
  Concurrent write operations
  Troubleshooting load issues
  Hands-on lab 5: Loading data with the COPY command
Module 7: Writing queries and tuning for performance
  Amazon Redshift SQL
  User-Defined Functions (UDFs)
  Factors that affect query performance
  The EXPLAIN command and query plans
  Workload Management (WLM)
  Hands-on lab 6: Configuring workload management
Module 8: Amazon Redshift Spectrum
  Amazon Redshift Spectrum
  Configuring data for Amazon Redshift Spectrum
  Amazon Redshift Spectrum Queries
  Hands-on lab 7: Using Amazon Redshift Spectrum
Module 9: Maintaining clusters
  Audit logging
  Performance monitoring
  Events and notifications
  Lab 8: Auditing and monitoring clusters
  Resizing clusters
  Backing up and restoring clusters
  Resource tagging and limits and constraints
  Hands-on lab 9: Backing up, restoring and resizing clusters
Module 10: Analyzing and visualizing data
  Power of visualizations
  Building dashboards
  Amazon QuickSight editions and features
Duration: 1 Day, 6 CPD hours
This course is intended for
This class is intended for the following:
  Data analysts, data scientists, and business analysts getting started with Google Cloud Platform.
  Individuals responsible for designing pipelines and architectures for data processing, creating and maintaining machine learning and statistical models, querying datasets, visualizing query results and creating reports.
  Executives and IT decision makers evaluating Google Cloud Platform for use by data scientists.
Overview
This course teaches students the following skills:
  Identify the purpose and value of the key Big Data and Machine Learning products in the Google Cloud Platform.
  Use Cloud SQL and Cloud Dataproc to migrate existing MySQL and Hadoop/Pig/Spark/Hive workloads to Google Cloud Platform.
  Employ BigQuery and Cloud Datalab to carry out interactive data analysis.
  Train and use a neural network using TensorFlow.
  Employ ML APIs.
  Choose between different data processing products on the Google Cloud Platform.
This course introduces participants to the Big Data and Machine Learning capabilities of Google Cloud Platform (GCP). It provides a quick overview of the Google Cloud Platform and a deeper dive into its data processing capabilities.
Introducing Google Cloud Platform
  Google Platform Fundamentals Overview.
  Google Cloud Platform Big Data Products.
Compute and Storage Fundamentals
  CPUs on demand (Compute Engine).
  A global filesystem (Cloud Storage).
  CloudShell.
  Lab: Set up an Ingest-Transform-Publish data processing pipeline.
Data Analytics on the Cloud
  Stepping-stones to the cloud.
  Cloud SQL: your SQL database on the cloud.
  Lab: Importing data into CloudSQL and running queries.
  Spark on Dataproc.
  Lab: Machine Learning Recommendations with Spark on Dataproc.
Scaling Data Analysis
  Fast random access.
  Datalab.
  BigQuery.
  Lab: Build machine learning dataset.
Machine Learning
  Machine Learning with TensorFlow.
  Lab: Carry out ML with TensorFlow.
  Pre-built models for common needs.
  Lab: Employ ML APIs.
Data Processing Architectures
  Message-oriented architectures with Pub/Sub.
  Creating pipelines with Dataflow.
  Reference architecture for real-time and batch data processing.
Summary
  Why GCP?
  Where to go from here
  Additional Resources
Additional course details: Nexus Humans Google Cloud Platform Big Data and Machine Learning Fundamentals training program is a workshop that presents an invigorating mix of sessions, lessons, and masterclasses meticulously crafted to propel your learning expedition forward. This immersive bootcamp-style experience boasts interactive lectures, hands-on labs, and collaborative hackathons, all strategically designed to fortify fundamental concepts. Guided by seasoned coaches, each session offers priceless insights and practical skills crucial for honing your expertise. Whether you're stepping into the realm of professional skills or a seasoned professional, this comprehensive course ensures you're equipped with the knowledge and prowess necessary for success. While we feel this is the best course for the Google Cloud Platform Big Data and Machine Learning Fundamentals course and one of our Top 10, we encourage you to read the course outline to make sure it is the right content for you. Additionally, private sessions, closed classes or dedicated events are available both live online and at our training centres in Dublin and London, as well as at your offices anywhere in the UK, Ireland or across EMEA.
In the past, popular thought treated artificial intelligence (AI) as if it were the domain of science fiction or some far-flung future. In the last few years, however, AI has been given new life. The business world has especially given it renewed interest. However, AI is not just another technology or process for the business to consider - it is a truly disruptive force.
Microsoft Power BI Masterclass 2021
Course Overview:
The "Microsoft Power BI Masterclass 2021" provides learners with the skills to become proficient in data analysis and visualization using Power BI. This comprehensive course covers the core functionalities of Power BI, from data preparation and transformation to creating impactful reports and dashboards. Learners will gain valuable insights into data modelling, visualisation, and the use of DAX for advanced calculations. By the end of the course, participants will be able to apply their knowledge to real-world projects, improving their ability to communicate data-driven insights effectively. This course is ideal for professionals and beginners who want to leverage Power BI to unlock the potential of their data.
Course Description:
This masterclass delves into the essential features of Microsoft Power BI, guiding learners through every stage of data analysis. Starting with project setup and data transformation in the Query Editor, the course progresses to advanced topics such as DAX functions and data storytelling. Learners will explore how to build data models, create dashboards, and employ Python in Power BI to enhance their reports. The course also covers Power BI Service for cloud-based analytics, row-level security for data protection, and integrating additional data sources. With a focus on empowering users to communicate insights clearly, the course ensures learners gain the expertise to manage data efficiently, make informed decisions, and stay up to date with evolving tools and features.
Microsoft Power BI Masterclass 2021 Curriculum:
  Module 01: Introduction
  Module 02: Preparing our Project
  Module 03: Data Transformation - The Query Editor
  Module 04: Data Transformation - Advanced
  Module 05: Creating a Data Model
  Module 06: Data Visualization
  Module 07: Power BI & Python
  Module 08: Storytelling with Data
  Module 09: DAX - The Essentials
  Module 10: DAX - The CALCULATE function
  Module 11: Power BI Service - Power BI Cloud
  Module 12: Row-Level Security
  Module 13: More data sources
  Module 14: Next steps to improve & stay up to date
Who is this course for?
  Individuals seeking to enhance their data analysis skills.
  Professionals aiming to advance their data visualization expertise.
  Beginners with an interest in data science or business analytics.
  Business analysts or data professionals looking to upskill in Power BI.
Career Path:
  Data Analyst
  Business Intelligence Analyst
  Data Scientist
  Power BI Developer
  Reporting Analyst
  Data Visualisation Expert
Duration: 3 Days, 18 CPD hours
This course is intended for
  This is an introductory-level course designed to teach experienced systems administrators how to install, maintain, monitor, troubleshoot, optimize, and secure Hadoop. Previous Hadoop experience is not required.
Overview
Working in an engaging, hands-on learning environment, guided by our expert team, attendees will learn to:
  Understand the benefits of distributed computing
  Understand the Hadoop architecture (including HDFS and MapReduce)
  Define administrator participation in Big Data projects
  Plan, implement, and maintain Hadoop clusters
  Deploy and maintain additional Big Data tools (Pig, Hive, Flume, etc.)
  Plan, deploy and maintain HBase on a Hadoop cluster
  Monitor and maintain hundreds of servers
  Pinpoint performance bottlenecks and fix them
Apache Hadoop is an open source framework for creating reliable and distributable compute clusters. Together with related frameworks, Hadoop provides a platform for processing large unstructured or semi-structured data sets from multiple sources: dissecting, classifying, and learning from the data to make suggestions for business analytics, decision support, and other advanced forms of machine intelligence. This is an introductory-level, hands-on, lab-intensive course geared for the administrator (new to Hadoop) who is charged with maintaining a Hadoop cluster and its related components. You will learn how to install, maintain, monitor, troubleshoot, optimize, and secure Hadoop.
Introduction
  Hadoop history and concepts
  Ecosystem
  Distributions
  High level architecture
  Hadoop myths
  Hadoop challenges (hardware / software)
Planning and installation
  Selecting software and Hadoop distributions
  Sizing the cluster and planning for growth
  Selecting hardware and network
  Rack topology
  Installation
  Multi-tenancy
  Directory structure and logs
  Benchmarking
HDFS operations
  Concepts (horizontal scaling, replication, data locality, rack awareness)
  Nodes and daemons (NameNode, Secondary NameNode, HA Standby NameNode, DataNode)
  Health monitoring
  Command-line and browser-based administration
  Adding storage and replacing defective drives
MapReduce operations
  Parallel computing before MapReduce: compare HPC versus Hadoop administration
  MapReduce cluster loads
  Nodes and Daemons (JobTracker, TaskTracker)
  MapReduce UI walk through
  MapReduce configuration
  Job config
  Job schedulers
  Administrator view of MapReduce best practices
  Optimizing MapReduce
  Fool proofing MR: what to tell your programmers
YARN: architecture and use
Advanced topics
  Hardware monitoring
  System software monitoring
  Hadoop cluster monitoring
  Adding and removing servers and upgrading Hadoop
  Backup, recovery, and business continuity planning
  Cluster configuration tweaks
  Hardware maintenance schedule
  Oozie scheduling for administrators
  Securing your cluster with Kerberos
  The future of Hadoop
Duration: 2 Days, 12 CPD hours
This course is intended for
  Audience: Data Scientists, Software Developers, IT Architects, and Technical Managers. Participants should have general knowledge of statistics and programming, and be familiar with Python.
Overview
  NumPy, pandas, Matplotlib, scikit-learn
  Python REPLs
  Jupyter Notebooks
  Data analytics life-cycle phases
  Data repairing and normalizing
  Data aggregation and grouping
  Data visualization
  Data science algorithms for supervised and unsupervised machine learning
Covers theoretical and technical aspects of using Python in Applied Data Science projects and Data Logistics use cases.
Python for Data Science
  Using Modules
  Listing Methods in a Module
  Creating Your Own Modules
  List Comprehension
  Dictionary Comprehension
  String Comprehension
  Python 2 vs Python 3
  Sets (Python 3+)
  Python Idioms
  Python Data Science "Ecosystem"
  NumPy
  NumPy Arrays
  NumPy Idioms
  pandas
  Data Wrangling with pandas' DataFrame
  SciPy
  Scikit-learn
  SciPy or scikit-learn?
  Matplotlib
  Python vs R
  Python on Apache Spark
  Python Dev Tools and REPLs
  Anaconda
  IPython
  Visual Studio Code
  Jupyter
  Jupyter Basic Commands
  Summary
Applied Data Science
  What is Data Science?
  Data Science Ecosystem
  Data Mining vs. Data Science
  Business Analytics vs. Data Science
  Data Science, Machine Learning, AI?
  Who is a Data Scientist?
  Data Science Skill Sets Venn Diagram
  Data Scientists at Work
  Examples of Data Science Projects
  An Example of a Data Product
  Applied Data Science at Google
  Data Science Gotchas
  Summary
Data Analytics Life-cycle Phases
  Big Data Analytics Pipeline
  Data Discovery Phase
  Data Harvesting Phase
  Data Priming Phase
  Data Logistics and Data Governance
  Exploratory Data Analysis
  Model Planning Phase
  Model Building Phase
  Communicating the Results
  Production Roll-out
  Summary
Repairing and Normalizing Data
  Repairing and Normalizing Data
  Dealing with the Missing Data
  Sample Data Set
  Getting Info on Null Data
  Dropping a Column
  Interpolating Missing Data in pandas
  Replacing the Missing Values with the Mean Value
  Scaling (Normalizing) the Data
  Data Preprocessing with scikit-learn
  Scaling with the scale() Function
  The MinMaxScaler Object
  Summary
Descriptive Statistics Computing Features in Python
  Descriptive Statistics
  Non-uniformity of a Probability Distribution
  Using NumPy for Calculating Descriptive Statistics Measures
  Finding Min and Max in NumPy
  Using pandas for Calculating Descriptive Statistics Measures
  Correlation
  Regression and Correlation
  Covariance
  Getting Pairwise Correlation and Covariance Measures
  Finding Min and Max in pandas DataFrame
  Summary
Data Aggregation and Grouping
  Data Aggregation and Grouping
  Sample Data Set
  The pandas.core.groupby.SeriesGroupBy Object
  Grouping by Two or More Columns
  Emulating the SQL's WHERE Clause
  The Pivot Tables
  Cross-Tabulation
  Summary
Data Visualization with matplotlib
  Data Visualization
  What is matplotlib?
  Getting Started with matplotlib
  The Plotting Window
  The Figure Options
  The matplotlib.pyplot.plot() Function
  The matplotlib.pyplot.bar() Function
  The matplotlib.pyplot.pie() Function
  Subplots
  Using the matplotlib.gridspec.GridSpec Object
  The matplotlib.pyplot.subplot() Function
  Hands-on Exercise
  Figures
  Saving Figures to File
  Visualization with pandas
  Working with matplotlib in Jupyter Notebooks
  Summary
Data Science and ML Algorithms in scikit-learn
  Data Science, Machine Learning, AI?
  Types of Machine Learning
  Terminology: Features and Observations
  Continuous and Categorical Features (Variables)
  Terminology: Axis
  The scikit-learn Package
  scikit-learn Estimators
  Models, Estimators, and Predictors
  Common Distance Metrics
  The Euclidean Metric
  The LIBSVM format
  Scaling of the Features
  The Curse of Dimensionality
  Supervised vs Unsupervised Machine Learning
  Supervised Machine Learning Algorithms
  Unsupervised Machine Learning Algorithms
  Choose the Right Algorithm
  Life-cycles of Machine Learning Development
  Data Split for Training and Test Data Sets
  Data Splitting in scikit-learn
  Hands-on Exercise
  Classification Examples
  Classifying with k-Nearest Neighbors (SL)
  k-Nearest Neighbors Algorithm
  k-Nearest Neighbors Algorithm
  The Error Rate
  Hands-on Exercise
  Dimensionality Reduction
  The Advantages of Dimensionality Reduction
  Principal component analysis (PCA)
  Hands-on Exercise
  Data Blending
  Decision Trees (SL)
  Decision Tree Terminology
  Decision Tree Classification in Context of Information Theory
  Information Entropy Defined
  The Shannon Entropy Formula
  The Simplified Decision Tree Algorithm
  Using Decision Trees
  Random Forests
  SVM
  Naive Bayes Classifier (SL)
  Naive Bayesian Probabilistic Model in a Nutshell
  Bayes Formula
  Classification of Documents with Naive Bayes
  Unsupervised Learning Type: Clustering
  Clustering Examples
  k-Means Clustering (UL)
  k-Means Clustering in a Nutshell
  k-Means Characteristics
  Regression Analysis
  Simple Linear Regression Model
  Linear vs Non-Linear Regression
  Linear Regression Illustration
  Major Underlying Assumptions for Regression Analysis
  Least-Squares Method (LSM)
  Locally Weighted Linear Regression
  Regression Models in Excel
  Multiple Regression Analysis
  Logistic Regression
  Regression vs Classification
  Time-Series Analysis
  Decomposing Time-Series
  Summary
Lab Exercises
  Lab 1 - Learning the Lab Environment
  Lab 2 - Using Jupyter Notebook
  Lab 3 - Repairing and Normalizing Data
  Lab 4 - Computing Descriptive Statistics
  Lab 5 - Data Grouping and Aggregation
  Lab 6 - Data Visualization with matplotlib
  Lab 7 - Data Splitting
  Lab 8 - k-Nearest Neighbors Algorithm
  Lab 9 - The k-means Algorithm
  Lab 10 - The Random Forest Algorithm
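To give a flavour of two topics named in the "Repairing and Normalizing Data" chapter above (replacing missing values with the mean, then min-max scaling), here is a minimal sketch. It deliberately uses only the Python standard library rather than pandas or scikit-learn, so it is a stand-in for the idea and not the course's lab code. The function names `fill_missing_with_mean` and `min_max_scale` and the sample `ages` list are illustrative assumptions, not course material.

```python
from statistics import mean


def fill_missing_with_mean(values):
    """Replace None entries with the mean of the observed (non-None) values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute: no observed values")
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def min_max_scale(values):
    """Linearly rescale values so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column has no spread; this sketch maps it to all zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical sample column with two missing entries.
ages = [20, None, 40, 30, None]

repaired = fill_missing_with_mean(ages)  # missing entries become the mean, 30
scaled = min_max_scale(repaired)         # 20 maps to 0.0, 40 maps to 1.0

print(repaired)
print(scaled)
```

In practice the same two steps would typically be done on a pandas DataFrame column and with a scikit-learn scaler, which is the tooling the outline itself names.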
Duration: 1 Day, 6 CPD hours
This course is intended for
  This basic course is for users and developers familiar with earlier versions of IBM InfoSphere Information Server or IBM InfoSphere MDM who want to learn about new features in V11.3.
Overview
The objectives of this course are as follows:
  Learn about the new features of DataStage V11.3
  Learn about the new features of Information Analyzer V11.3
  Learn about the new features of Data Click V11.3
  Learn about the new features of the Information Governance Catalog V11.3
This course is designed to introduce you to new features in data integration and governance in IBM InfoSphere Information Server V11.3 and IBM InfoSphere MDM V11.3.
Outline
  Unit DS: New Features in IBM InfoSphere DataStage V11.3
  Unit DC: New Features in IBM InfoSphere Data Click V11.3
  Unit IA: New Features in IBM InfoSphere Information Analyzer V11.3
  All units are accompanied by hands-on lab exercises.
Additional course details: Nexus Humans KM650 IBM What is New in IBM InfoSphere Data Integration and Governance? V11.3 training program is a workshop that presents an invigorating mix of sessions, lessons, and masterclasses meticulously crafted to propel your learning expedition forward. This immersive bootcamp-style experience boasts interactive lectures, hands-on labs, and collaborative hackathons, all strategically designed to fortify fundamental concepts. Guided by seasoned coaches, each session offers priceless insights and practical skills crucial for honing your expertise. Whether you're stepping into the realm of professional skills or a seasoned professional, this comprehensive course ensures you're equipped with the knowledge and prowess necessary for success. While we feel this is the best course for the KM650 IBM What is New in IBM InfoSphere Data Integration and Governance? V11.3 course and one of our Top 10, we encourage you to read the course outline to make sure it is the right content for you.
Duration: 1 Day, 6 CPD hours
This course is intended for
  Report authors wanting to develop interactive report content, or content disconnected from IBM Cognos servers.
In this course, participants increase their IBM Cognos Analytics experience by building interactive reports using Active Report controls, which can be distributed to and consumed by users in a disconnected environment, including mobile devices.
Introduction to IBM Cognos Active Reports
  Examine IBM Cognos Active Reports
  Convert an existing report into an Active Report
  Add interactions in Active Reports using Active Report connections
  Create a basic Active Report
  Examine interactive behavior of Active Report controls
  Save a report in the IBM Cognos Analytics portal
  Save an Active Report to an MHT file
  Save an Active Report as a report template
  Use an Active Report as a prompt page
  Understand Active Report security
Use Active Report Connections
  Examine Active Report connections
  Filter and select in controls using Active Report connections
  Examine variables
  Use a single variable to control multiple controls
  Use multiple variables to show different data in different controls
  Use Active Report controls to support mobile device usage
Active Report Charts & Decks
  Add charts to active reports
  Understand and optimize chart behavior
  Examine decks and data decks
  Optimize use of decks
  Review Master Detail relationships
  Examine RAVE visualizations
Duration: 2 Days, 12 CPD hours
This course is intended for
  Anyone who works with IBM SPSS Statistics and wants to learn advanced statistical procedures to be able to better answer research questions.
Overview
  Introduction to advanced statistical analysis
  Group variables: Factor Analysis and Principal Components Analysis
  Group similar cases: Cluster Analysis
  Predict categorical targets with Nearest Neighbor Analysis
  Predict categorical targets with Discriminant Analysis
  Predict categorical targets with Logistic Regression
  Predict categorical targets with Decision Trees
  Introduction to Survival Analysis
  Introduction to Generalized Linear Models
  Introduction to Linear Mixed Models
This course provides an application-oriented introduction to advanced statistical methods available in IBM SPSS Statistics. Students will review a variety of advanced statistical techniques and discuss situations in which each technique would be used, the assumptions made by each method, how to set up the analysis, and how to interpret the results. This includes a broad range of techniques for predicting variables, as well as methods to cluster variables and cases.
Introduction to advanced statistical analysis
  Taxonomy of models
  Overview of supervised models
  Overview of models to create natural groupings
Group variables: Factor Analysis and Principal Components Analysis
  Factor Analysis basics
  Principal Components basics
  Assumptions of Factor Analysis
  Key issues in Factor Analysis
  Improve the interpretability
  Use Factor and component scores
Group similar cases: Cluster Analysis
  Cluster Analysis basics
  Key issues in Cluster Analysis
  K-Means Cluster Analysis
  Assumptions of K-Means Cluster Analysis
  TwoStep Cluster Analysis
  Assumptions of TwoStep Cluster Analysis
Predict categorical targets with Nearest Neighbor Analysis
  Nearest Neighbor Analysis basics
  Key issues in Nearest Neighbor Analysis
  Assess model fit
Predict categorical targets with Discriminant Analysis
  Discriminant Analysis basics
  The Discriminant Analysis model
  Core concepts of Discriminant Analysis
  Classification of cases
  Assumptions of Discriminant Analysis
  Validate the solution
Predict categorical targets with Logistic Regression
  Binary Logistic Regression basics
  The Binary Logistic Regression model
  Multinomial Logistic Regression basics
  Assumptions of Logistic Regression procedures
  Testing hypotheses
Predict categorical targets with Decision Trees
  Decision Trees basics
  Validate the solution
  Explore CHAID
  Explore CRT
  Comparing Decision Trees methods
Introduction to Survival Analysis
  Survival Analysis basics
  Kaplan-Meier Analysis
  Assumptions of Kaplan-Meier Analysis
  Cox Regression
  Assumptions of Cox Regression
Introduction to Generalized Linear Models
  Generalized Linear Models basics
  Available distributions
  Available link functions
Introduction to Linear Mixed Models
  Linear Mixed Models basics
  Hierarchical Linear Models
  Modeling strategy
  Assumptions of Linear Mixed Models
Additional course details: Nexus Humans 0G09A IBM Advanced Statistical Analysis Using IBM SPSS Statistics (v25) training program is a workshop that presents an invigorating mix of sessions, lessons,
and masterclasses meticulously crafted to propel your learning expedition forward. This immersive bootcamp-style experience boasts interactive lectures, hands-on labs, and collaborative hackathons, all strategically designed to fortify fundamental concepts. Guided by seasoned coaches, each session offers priceless insights and practical skills crucial for honing your expertise. Whether you're stepping into the realm of professional skills or a seasoned professional, this comprehensive course ensures you're equipped with the knowledge and prowess necessary for success. While we feel this is the best course for the 0G09A IBM Advanced Statistical Analysis Using IBM SPSS Statistics (v25) course and one of our Top 10, we encourage you to read the course outline to make sure it is the right content for you.