8 SRE courses

Site Reliability Engineering (SRE) Foundation (DevOps Institute)

By Nexus Human

Duration: 2 Days (12 CPD hours)

This course is intended for:
The target audience for the SRE Foundation course is professionals including:
* Anyone starting or leading a move towards increased reliability
* Anyone interested in modern IT leadership and organizational change approaches
Business Managers, Business Stakeholders, Change Agents, Consultants, DevOps Practitioners, IT Directors, IT Managers, IT Team Leaders, Product Owners, Scrum Masters, Software Engineers, Site Reliability Engineers, System Integrators, and Tool Providers will benefit from this course.

Overview:
The learning objectives for the SRE Foundation course include a practical understanding of:
* The history of SRE and its emergence at Google
* The inter-relationship of SRE with DevOps and other popular frameworks
* The underlying principles behind SRE
* Service Level Objectives (SLOs) and their user focus
* Service Level Indicators (SLIs) and the modern monitoring landscape
* Error budgets and the associated error budget policies
* Toil and its effect on an organization's productivity
* Some practical steps that can help to eliminate toil
* Observability as something to indicate the health of a service
* SRE tools, automation techniques, and the importance of security
* Anti-fragility, our approach to failure and failure testing
* The organizational impact that introducing SRE brings

The SRE (Site Reliability Engineering) Foundation course is an introduction to the principles & practices that enable an organization to reliably and economically scale critical services. Introducing a site-reliability dimension requires organizational re-alignment, a new focus on engineering & automation, and the adoption of a range of new working paradigms. This course prepares you for the SRE Foundation (SREF) certification.

COURSE INTRODUCTION
* Course Goals
* Course Agenda

SRE PRINCIPLES & PRACTICES
* What is Site Reliability Engineering?
* SRE & DevOps: What is the Difference?
* SRE Principles & Practices

SERVICE LEVEL OBJECTIVES & ERROR BUDGETS
* Service Level Objectives (SLOs)
* Error Budgets
* Error Budget Policies

REDUCING TOIL
* What is Toil?
* Why is Toil Bad?
* Doing Something About Toil

MONITORING & SERVICE LEVEL INDICATORS
* Service Level Indicators (SLIs)
* Monitoring
* Observability

SRE TOOLS & AUTOMATION
* Automation Defined
* Automation Focus
* Hierarchy of Automation Types
* Secure Automation
* Automation Tools

ANTI-FRAGILITY & LEARNING FROM FAILURE
* Why Learn from Failure
* Benefits of Anti-Fragility
* Shifting the Organizational Balance

ORGANIZATIONAL IMPACT OF SRE
* Why Organizations Embrace SRE
* Patterns for SRE Adoption
* On-Call Necessities
* Blameless Post-Mortems
* SRE & Scale

SRE, OTHER FRAMEWORKS, THE FUTURE
* SRE & Other Frameworks
* The Future

EXAM PREPARATIONS
* Exam Requirements, Question Weighting, and Terminology List
* Sample Exam Review

ADDITIONAL COURSE DETAILS:
Nexus Humans Site Reliability Engineering (SRE) Foundation (DevOps Institute) training program is a workshop that combines sessions, lessons, and masterclasses. This bootcamp-style experience includes interactive lectures, hands-on labs, and collaborative hackathons, all designed to reinforce fundamental concepts. Guided by experienced coaches, each session offers insights and practical skills for building your expertise. Whether you are new to professional skills training or a seasoned professional, this course aims to equip you with the knowledge you need. While we feel this is the best course for Site Reliability Engineering (SRE) Foundation (DevOps Institute) and one of our Top 10, we encourage you to read the course outline to make sure it is the right content for you.
Additionally, private sessions, closed classes or dedicated events are available both live online and at our training centres in Dublin and London, as well as at your offices anywhere in the UK, Ireland or across EMEA.
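
As a rough illustration of the error budget topic listed in the outline above, the sketch below converts an availability SLO into allowed downtime for a time window. It is not taken from the course material: the function names, the 30-day window, and the 99.9% target are assumptions chosen for the example.

```python
def error_budget_minutes(slo_percent: float, window_days: int = 30) -> float:
    """Allowed downtime, in minutes, for an availability SLO over a window."""
    if not 0 < slo_percent <= 100:
        raise ValueError("slo_percent must be in (0, 100]")
    window_minutes = window_days * 24 * 60
    # The budget is the share of the window the SLO allows to be "bad".
    return window_minutes * (100 - slo_percent) / 100


def budget_remaining_minutes(slo_percent: float, downtime_minutes: float,
                             window_days: int = 30) -> float:
    """Budget left after observed downtime; a negative value means the SLO is breached."""
    return error_budget_minutes(slo_percent, window_days) - downtime_minutes


if __name__ == "__main__":
    # Hypothetical numbers: a 99.9% SLO over 30 days with 10 minutes of downtime.
    print(round(error_budget_minutes(99.9), 1))
    print(round(budget_remaining_minutes(99.9, 10), 1))
```

Under these assumptions, a 99.9% target over 30 days leaves roughly 43 minutes of downtime. An error budget policy would then state what the team does when that budget is spent.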

Delivered Online, on request
Price on Enquiry

Site Reliability Engineering (SRE) Practitioner (DevOps Institute)

By Nexus Human

Duration: 3 Days (18 CPD hours)

This course is intended for:
The target audience for the SRE Practitioner course is professionals including:
* Anyone focused on large-scale service scalability and reliability
* Anyone interested in modern IT leadership and organizational change approaches
* Business Managers
* Business Stakeholders
* Change Agents
* Consultants
* DevOps Practitioners
* IT Directors
* IT Managers
* IT Team Leaders
* Product Owners
* Scrum Masters
* Software Engineers
* Site Reliability Engineers
* System Integrators
* Tool Providers

Overview:
After completing this course, students will have learned:
* A practical view of how to successfully implement a flourishing SRE culture in your organization
* The underlying principles of SRE, what SRE is not in terms of anti-patterns, and how to recognize and avoid those anti-patterns
* The organizational impact of introducing SRE
* Acing the art of SLIs and SLOs in a distributed ecosystem and extending the usage of Error Budgets beyond the normal to innovate and avoid risks
* Building security and resilience by design in a distributed, zero-trust environment
* How to implement full-stack observability and distributed tracing, and bring about an Observability-driven development culture
* Curating data using AI to move from reactive to proactive and predictive incident management, and using DataOps to build clean data lineage
* Why Platform Engineering is so important in building consistency and predictability of SRE culture
* Implementing practical Chaos Engineering
* Major incident response responsibilities for an SRE based on the incident command framework, and examples of the anatomy of unmanaged incidents
* A perspective on why SRE can be considered the purest implementation of DevOps
* The SRE Execution model
* Understanding the SRE role and why reliability is everyone's problem
* SRE success story learnings

This course introduces a range of practices for advancing service reliability engineering through a mixture of automation, organizational ways of working and business alignment. It is tailored for those focused on large-scale service scalability and reliability.

SRE ANTI-PATTERNS
* Rebranding Ops or DevOps or Dev as SRE
* Users notice an issue before you do
* Measuring until my Edge
* False positives are worse than no alerts
* Configuration management trap for snowflakes
* The Dogpile: Mob incident response
* Point fixing
* Production Readiness Gatekeeper
* Fail-Safe really?

SLO IS A PROXY FOR CUSTOMER HAPPINESS
* Define SLIs that meaningfully measure the reliability of a service from a user's perspective
* Defining System boundaries in a distributed ecosystem for defining correct SLIs
* Use error budgets to help your team have better discussions and make better data-driven decisions
* Overall, Reliability is only as good as the weakest link on your service graph
* Error thresholds when 3rd party services are used

BUILDING SECURE AND RELIABLE SYSTEMS
* SRE and their role in Building Secure and Reliable systems
* Design for Changing Architecture
* Fault tolerant Design
* Design for Security
* Design for Resiliency
* Design for Scalability
* Design for Performance
* Design for Reliability
* Ensuring Data Security and Privacy

FULL-STACK OBSERVABILITY
* Modern Apps are Complex & Unpredictable
* Slow is the new down
* Pillars of Observability
* Implementing Synthetic and End user monitoring
* Observability driven development
* Distributed Tracing
* What happens to Monitoring?
* Instrumenting using Libraries and Agents

PLATFORM ENGINEERING AND AIOPS
* Taking a Platform Centric View solves Organizational scalability challenges such as fragmentation, inconsistency and unpredictability
* How do you use AIOps to improve Resiliency
* How can DataOps help you in the journey
* A simple recipe to implement AIOps
* Indicative measurement of AIOps

SRE & INCIDENT RESPONSE MANAGEMENT
* SRE Key Responsibilities towards incident response
* DevOps & SRE and ITIL
* OODA and SRE Incident Response
* Closed Loop Remediation and the Advantages
* Swarming: Food for Thought
* AI/ML for better incident management

CHAOS ENGINEERING
* Navigating Complexity
* Chaos Engineering Defined
* Quick Facts about Chaos Engineering
* Chaos Monkey Origin Story
* Who is adopting Chaos Engineering
* Myths of Chaos
* Chaos Engineering Experiments
* GameDay Exercises
* Security Chaos Engineering
* Chaos Engineering Resources

SRE IS THE PUREST FORM OF DEVOPS
* Key Principles of SRE
* SREs help increase Reliability across the product spectrum
* Metrics for Success
* Selection of Target areas
* SRE Execution Model
* Culture and Behavioral Skills are key
* SRE Case study

POST-CLASS ASSIGNMENTS/EXERCISES
* Non-abstract Large Scale Design (after Day 1)
* Engineering Instrumentation: Instrumenting Gremlin (after Day 2)
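
The outline's point that reliability is only as good as the weakest link on the service graph can be sketched numerically. The example below is an illustration rather than course content. It assumes every dependency is a hard, serial dependency and that failures are independent, so the end-to-end availability is the product of the individual availabilities. The function name and the sample figures are hypothetical.

```python
from math import prod
from typing import Iterable


def serial_availability(availabilities: Iterable[float]) -> float:
    """End-to-end availability of a chain of hard dependencies.

    Assumes independent failures; each value is a fraction in [0, 1].
    """
    values = list(availabilities)
    if not values:
        raise ValueError("at least one availability is required")
    if any(not 0 <= a <= 1 for a in values):
        raise ValueError("availabilities must be fractions between 0 and 1")
    return prod(values)


if __name__ == "__main__":
    # Hypothetical chain: own service, internal database, third-party API.
    chain = [0.999, 0.999, 0.99]
    overall = serial_availability(chain)
    # The chain can never be more available than its weakest member.
    print(overall <= min(chain))
```

Under this model, a third-party service with a lower availability than your own caps the SLO you can realistically offer, which is one reason error thresholds for external dependencies need separate attention.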

Delivered Online, on request
Price on Enquiry

Logging, Monitoring and Observability in Google Cloud

By Nexus Human

Duration: 3 Days (18 CPD hours)

This course is intended for:
This class is intended for the following customer job roles:
* Cloud architects, administrators, and SysOps personnel
* Cloud developers and DevOps personnel

Overview:
This course teaches participants the following skills:
* Plan and implement a well-architected logging and monitoring infrastructure
* Define Service Level Indicators (SLIs) and Service Level Objectives (SLOs)
* Create effective monitoring dashboards and alerts
* Monitor, troubleshoot, and improve Google Cloud infrastructure
* Analyze and export Google Cloud audit logs
* Find production code defects, identify bottlenecks, and improve performance
* Optimize monitoring costs

This course teaches you techniques for monitoring, troubleshooting, and improving infrastructure and application performance in Google Cloud. Guided by the principles of Site Reliability Engineering (SRE), and using a combination of presentations, demos, hands-on labs, and real-world case studies, attendees gain experience with full-stack monitoring, real-time log management and analysis, debugging code in production, tracing application performance bottlenecks, and profiling CPU and memory usage.

INTRODUCTION TO GOOGLE CLOUD MONITORING TOOLS
* Understand the purpose and capabilities of Google Cloud operations-focused components: Logging, Monitoring, Error Reporting, and Service Monitoring
* Understand the purpose and capabilities of Google Cloud application performance management focused components: Debugger, Trace, and Profiler

AVOIDING CUSTOMER PAIN
* Construct a monitoring base on the four golden signals: latency, traffic, errors, and saturation
* Measure customer pain with SLIs
* Define critical performance measures
* Create and use SLOs and SLAs
* Achieve developer and operation harmony with error budgets

ALERTING POLICIES
* Develop alerting strategies
* Define alerting policies
* Add notification channels
* Identify types of alerts and common uses for each
* Construct and alert on resource groups
* Manage alerting policies programmatically

MONITORING CRITICAL SYSTEMS
* Choose best practice monitoring project architectures
* Differentiate Cloud IAM roles for monitoring
* Use the default dashboards appropriately
* Build custom dashboards to show resource consumption and application load
* Define uptime checks to track aliveness and latency

CONFIGURING GOOGLE CLOUD SERVICES FOR OBSERVABILITY
* Integrate logging and monitoring agents into Compute Engine VMs and images
* Enable and utilize Kubernetes Monitoring
* Extend and clarify Kubernetes monitoring with Prometheus
* Expose custom metrics through code, and with the help of OpenCensus

ADVANCED LOGGING AND ANALYSIS
* Identify and choose among resource tagging approaches
* Define log sinks (inclusion filters) and exclusion filters
* Create metrics based on logs
* Define custom metrics
* Link application errors to Logging using Error Reporting
* Export logs to BigQuery

MONITORING NETWORK SECURITY AND AUDIT LOGS
* Collect and analyze VPC Flow logs and Firewall Rules logs
* Enable and monitor Packet Mirroring
* Explain the capabilities of Network Intelligence Center
* Use Admin Activity audit logs to track changes to the configuration or metadata of resources
* Use Data Access audit logs to track accesses or changes to user-provided resource data
* Use System Event audit logs to track GCP administrative actions

MANAGING INCIDENTS
* Define incident management roles and communication channels
* Mitigate incident impact
* Troubleshoot root causes
* Resolve incidents
* Document incidents in a post-mortem process

INVESTIGATING APPLICATION PERFORMANCE ISSUES
* Debug production code to correct code defects
* Trace latency through layers of service interaction to eliminate performance bottlenecks
* Profile and identify resource-intensive functions in an application

OPTIMIZING THE COSTS OF MONITORING
* Analyze resource utilization costs for monitoring-related components within Google Cloud
* Implement best practices for controlling the cost of monitoring within Google Cloud

Delivered Online, on request
Price on Enquiry

Hands-on Linux - Self-Hosted WordPress for Linux Beginners

By Packt

Master the art of self-hosting WordPress on Linux with our comprehensive video course, designed to empower technical professionals to fully control their web presence.

Delivered Online On Demand
£74.99

AWS Certified Solutions Architect Associate (SAA-C03)

By Packt

Prepare for the AWS Certified Solutions Architect - Associate (SAA-C03) exam. Learn about the AWS Management Console, S3 buckets, instances, database services, cloud security, costs associated with AWS, Amazon Elastic Compute Cloud (EC2), Amazon Virtual Private Cloud (VPC), Amazon Simple Storage Service (S3), and Amazon Elastic Block Store (EBS).

Delivered Online On Demand
£261.99

AWS Solutions Architect Associate (SAA-C02) Exam Prep Course - 2021 UPDATED!

By Packt

With this 2-in-1 course, you will get access to AWS Technical Essentials and AWS Certified Solutions Architect - Associate certification exam content.

Delivered Online On Demand
£88.99

HashiCorp Certified - Consul Associate Course

By Packt

The course will provide a comprehensive overview of Consul and its capabilities, including deploying a single data center, registering services using service discovery, and accessing Consul Key/Value (KV). It is designed for individuals who possess basic terminal skills and have an understanding of application and data center/cloud networking architectures for running applications.

Delivered Online On Demand
£82.99

Red Hat OpenShift Installation Lab (DO322)

By Nexus Human

Duration: 3 Days (18 CPD hours)

This course is intended for:
* Cluster administrators (junior systems administrators, junior cloud administrators) interested in deploying additional clusters to meet increasing demands from their organizations
* Cluster engineers (senior systems administrators, senior cloud administrators, cloud engineers) interested in the planning and design of OpenShift clusters to meet the performance and reliability needs of different workloads, and in creating work books for these installations
* Site reliability engineers (SREs) interested in deploying test bed clusters to validate new settings, updates, customizations, operational procedures, and responses to incidents

Overview:
* Validate infrastructure prerequisites for an OpenShift cluster
* Run the OpenShift installer with custom settings
* Describe and monitor each stage of the OpenShift installation process
* Collect troubleshooting information during an ongoing installation, or after a failed installation
* Complete the configuration of cluster services in a newly installed cluster
* Install OpenShift on a cloud, virtual, or physical infrastructure

Red Hat OpenShift Installation Lab (DO322) teaches essential skills for installing an OpenShift cluster in a range of environments, from proof of concept to production, and how to identify customizations that may be required because of the underlying cloud, virtual, or physical infrastructure. This course is based on Red Hat OpenShift Container Platform 4.6.

1 - INTRODUCTION TO CONTAINER TECHNOLOGY
* Describe how software can run in containers orchestrated by Red Hat OpenShift Container Platform.

2 - CREATE CONTAINERIZED SERVICES
* Provision a server using container technology.

3 - MANAGE CONTAINERS
* Manipulate prebuilt container images to create and manage containerized services.

4 - MANAGE CONTAINER IMAGES
* Manage the life cycle of a container image from creation to deletion.

5 - CREATE CUSTOM CONTAINER IMAGES
* Design and code a Dockerfile to build a custom container image.

6 - DEPLOY CONTAINERIZED APPLICATIONS ON OPENSHIFT
* Deploy single container applications on OpenShift Container Platform.

7 - TROUBLESHOOT CONTAINERIZED APPLICATIONS
* Troubleshoot a containerized application deployed on OpenShift.

8 - DEPLOY AND MANAGE APPLICATIONS ON AN OPENSHIFT CLUSTER
* Use various application packaging methods to deploy applications to an OpenShift cluster, then manage their resources.

9 - DESIGN CONTAINERIZED APPLICATIONS FOR OPENSHIFT
* Select a containerization method for an application and create a container to run on an OpenShift cluster.

10 - PUBLISH ENTERPRISE CONTAINER IMAGES
* Create an enterprise registry and publish container images to it.

11 - BUILD APPLICATIONS
* Describe the OpenShift build process, then trigger and manage builds.

12 - CUSTOMIZE SOURCE-TO-IMAGE (S2I) BUILDS
* Customize an existing S2I base image and create a new one.

13 - CREATE APPLICATIONS FROM OPENSHIFT TEMPLATES
* Describe the elements of a template and create a multicontainer application template.

14 - MANAGE APPLICATION DEPLOYMENTS
* Monitor application health and implement various deployment methods for cloud-native applications.

15 - PERFORM COMPREHENSIVE REVIEW
* Create and deploy cloud-native applications on OpenShift.

Delivered Online, on request
Price on Enquiry
