AWS-DevOPS

SDLC Automation

Apply concepts required to automate a CI/CD pipeline

 Set up repositories

 Set up build services

 Integrate automated testing (e.g., unit tests, integrity tests)

 Set up deployment products/services

 Orchestrate multiple pipeline stages

Determine source control strategies and how to implement them

 Determine a workflow for integrating code changes from multiple contributors

 Assess security requirements and recommend code repository access design

 Reconcile running application versions to repository versions (tags)

 Differentiate different source control types
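
Reconciling running versions against repository tags can be sketched as a simple lookup. This is a minimal illustration, not a prescribed method: the function name `reconcile`, the environment names, and the version strings are all hypothetical, and in practice the inputs would come from the deployment service and the repository.

```python
def reconcile(running_versions, repository_tags):
    """Label each environment by whether its running version matches a tag.

    running_versions: dict of environment name -> version string it reports.
    repository_tags: iterable of tag names that exist in the repository.
    """
    tags = set(repository_tags)
    return {
        env: ("tagged" if version in tags else "untracked")
        for env, version in running_versions.items()
    }


# Hypothetical data for illustration only.
running = {"prod": "v1.4.0", "staging": "v1.5.0-rc1", "dev": "a1b2c3d"}
tags = ["v1.3.0", "v1.4.0", "v1.5.0-rc1"]
report = reconcile(running, tags)
```

An "untracked" result suggests that an environment is running a build that cannot be traced back to a tagged commit.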

Apply concepts required to automate and integrate testing

 Run integration tests as part of code merge process

 Run load/stress testing and benchmark applications at scale

 Measure application health based on application exit codes (robust Health Check)

 Automate unit tests to check pass/fail, code coverage

 Example: CodePipeline, CodeBuild

 Integrate tests with pipeline
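
A health check driven by exit codes might look like the sketch below. It assumes that exit code 0 means healthy and anything else (or a timeout) means unhealthy; the function name `check_health` and the two sample commands are hypothetical stand-ins for a real test or probe command.

```python
import subprocess
import sys


def check_health(command, timeout_seconds=10):
    """Run a check command and map its exit code to a status string."""
    try:
        result = subprocess.run(
            command, capture_output=True, timeout=timeout_seconds
        )
    except subprocess.TimeoutExpired:
        return "unhealthy"  # a hung check is treated as a failure
    return "healthy" if result.returncode == 0 else "unhealthy"


# Hypothetical checks: each child process simply exits with a fixed code.
passing = [sys.executable, "-c", "import sys; sys.exit(0)"]
failing = [sys.executable, "-c", "import sys; sys.exit(3)"]
```

The same convention is what lets a pipeline stage fail automatically when a test runner exits with a nonzero code.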

Apply concepts required to build and manage artifacts securely

 Distinguish storage options based on artifacts security classification

 Translate application requirements into Operating System and package configuration (build specs)

 Determine the code/environment dependencies and required resources

 Example: CodeDeploy AppSpec, CodeBuild buildspec

 Run a code build process

Determine deployment/delivery strategies (e.g., A/B, blue/green, canary, red/black) and how to implement them using AWS services

 Determine the correct delivery strategy based on business needs

 Critique existing deployment strategies and suggest improvements

 Recommend DNS/routing strategies (e.g., Route 53, ELB, ALB, load balancer) based on business continuity goals

 Verify deployment success/failure and automate rollbacks
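
The idea of verifying each step of a canary rollout and rolling back automatically can be sketched in a few lines. This is an illustrative model only: `run_canary`, the traffic percentages, and the 1% error threshold are assumptions, and a real rollout would read error rates from monitoring rather than from a callable.

```python
def run_canary(steps, error_rate_for, max_error_rate=0.01):
    """Shift traffic step by step; roll back as soon as a step looks bad.

    steps: increasing traffic percentages for the new version.
    error_rate_for: callable returning the observed error rate at a step.
    Returns (final_percentage, status).
    """
    current = 0
    for percent in steps:
        if error_rate_for(percent) > max_error_rate:
            return 0, "rolled_back"  # all traffic returns to the old version
        current = percent
    return current, "deployed"


# Hypothetical observations: the new version degrades at 50% traffic.
healthy_result = run_canary([10, 50, 100], lambda percent: 0.001)
failing_result = run_canary(
    [10, 50, 100], lambda percent: 0.05 if percent >= 50 else 0.0
)
```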

Configuration Management and Infrastructure as Code

Determine deployment services based on deployment needs

 Demonstrate knowledge of process flows of deployment models

 Given a specific deployment model, classify and implement relevant AWS services to meet requirements

 Example: given a requirement for DynamoDB, choose CloudFormation instead of OpsWorks

 Determine what to do with rolling updates

Determine application and infrastructure deployment models based on business needs

 Balance different considerations (cost, availability, time to recovery) based on business requirements to choose the best deployment model

 Determine a deployment model given specific AWS services

 Analyze risks associated with deployment models and relevant remedies

Apply security concepts in the automation of resource provisioning

 Choose the best automation tool given requirements

 Demonstrate knowledge of security best practices for resource provisioning (e.g., encrypting data bags, generating credentials on the fly)

 Review IAM policies and assess if sufficient but least privilege is granted for all lifecycle stages of a deployment (e.g., create, update, promote)

 Review credential management solutions (e.g., EC2 Systems Manager Parameter Store, third party)

 Build the automation

 Example: CloudFormation templates, Chef recipes and cookbooks, CodePipeline
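
A first pass at reviewing an IAM policy for least privilege could flag wildcard grants, as in the sketch below. The function name `find_overly_broad_statements`, the sample policy, its account ID, and its resource names are hypothetical; a wildcard check is only one heuristic and does not replace a full review.

```python
def find_overly_broad_statements(policy):
    """Return the Allow statements that use a wildcard action or resource."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may be a bare object
        statements = [statements]
    flagged = []
    for statement in statements:
        if statement.get("Effect") != "Allow":
            continue
        actions = statement.get("Action", [])
        resources = statement.get("Resource", [])
        actions = [actions] if isinstance(actions, str) else actions
        resources = [resources] if isinstance(resources, str) else resources
        wildcard_action = any(a == "*" or a.endswith(":*") for a in actions)
        if wildcard_action or "*" in resources:
            flagged.append(statement)
    return flagged


# Hypothetical policy: one narrowly scoped statement and one broad one.
example_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DeployOnly",
            "Effect": "Allow",
            "Action": ["codedeploy:CreateDeployment", "codedeploy:GetDeployment"],
            "Resource": "arn:aws:codedeploy:us-east-1:111122223333:deploymentgroup:my-app/prod",
        },
        {"Sid": "TooBroad", "Effect": "Allow", "Action": "s3:*", "Resource": "*"},
    ],
}
flagged_sids = [s["Sid"] for s in find_overly_broad_statements(example_policy)]
```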

Determine how to implement lifecycle hooks on a deployment

 Determine appropriate integration techniques to meet project requirements

 Choose the appropriate hook solution (e.g., implement leader node selection after a node failure) in an Auto Scaling group

 Evaluate hook implementation for failure impacts (e.g., if a remote call fails or a dependent service such as Amazon S3 is temporarily unavailable) and recommend resiliency improvements

 Evaluate deployment rollout procedures for failure impacts and evaluate rollback/recovery processes
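
One common resiliency improvement for a hook that depends on a remote call is to retry with exponential backoff. The sketch below assumes that approach; `call_with_retries`, `flaky_download`, and the delay values are hypothetical, and the `sleep` parameter is injectable only so the example runs without waiting.

```python
import time


def call_with_retries(operation, max_attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call operation(); after a failure wait base_delay * 2**attempt and retry."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the hook fail visibly
            sleep(base_delay * (2 ** attempt))


# Hypothetical dependency that fails twice before succeeding.
attempts = {"count": 0}


def flaky_download():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("temporarily unavailable")
    return "bundle downloaded"


delays = []  # record the waits instead of actually sleeping
result = call_with_retries(flaky_download, sleep=delays.append)
```

Re-raising after the final attempt keeps the failure visible to the deployment, so that a rollback can still be triggered.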

Apply concepts required to manage systems using AWS configuration management tools and services

 Identify pros and cons of AWS configuration management tools

 Demonstrate knowledge of configuration management components

 Show the ability to run configuration management services end to end with no assistance while adhering to industry best practices

Monitoring and Logging

Determine how to set up the aggregation, storage, and analysis of logs and metrics

 Implement and configure distributed logs collection and processing (e.g., agents, syslog, flumed, CW agent)

 Aggregate logs (e.g., Amazon S3, CW Logs, intermediate systems (EMR), Kinesis FH – Transformation, ELK/BI)

 Implement custom CW metrics, Log subscription filters

 Manage Log storage lifecycle (e.g., CW to S3, S3 lifecycle, S3 events)
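
Turning log lines into a custom metric can be sketched as parse, count, and shape. The log format, the regular expression, and the names `count_levels` and `to_metric_datum` are assumptions for illustration; the returned dictionary only mirrors the `MetricName`, `Value`, and `Unit` fields of a CloudWatch metric datum and is not sent anywhere.

```python
import re
from collections import Counter

# Assumed line format: "<timestamp> <LEVEL> <message>"
LOG_PATTERN = re.compile(r"^(?P<timestamp>\S+) (?P<level>[A-Z]+) (?P<message>.*)$")


def count_levels(lines):
    """Count log lines per level, skipping lines that do not match."""
    counts = Counter()
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match:
            counts[match.group("level")] += 1
    return counts


def to_metric_datum(counts, level="ERROR"):
    """Shape a count like one custom-metric data point."""
    return {
        "MetricName": f"{level.title()}Count",
        "Value": counts.get(level, 0),
        "Unit": "Count",
    }


# Hypothetical log lines.
sample_lines = [
    "2024-01-01T00:00:00Z INFO service started",
    "2024-01-01T00:00:05Z ERROR upstream timeout",
    "2024-01-01T00:00:09Z ERROR disk full",
    "not a structured line",
]
datum = to_metric_datum(count_levels(sample_lines))
```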

Apply concepts required to automate monitoring and event management of an environment

 Parse logs (e.g., Amazon S3 data events/event logs/ELB/ALB/CF access logs) and correlate with other alarms/events (e.g., CW events to AWS Lambda) and take appropriate action

 Use CloudTrail/VPC flow logs for detective control (e.g., CT, CW log filters, Athena, NACL or WAF rules) and take dependent actions (AWS Step Functions) based on error handling logic (state machine)

 Configure and implement patch/inventory/state management using EC2 Systems Manager (SSM), Inspector, CodeDeploy, OpsWorks, and CW agents

 EC2 retirement/maintenance

 Handle scaling/failover events (e.g., ASG, DB HA, route table/DNS update, Application Config, Auto Recovery, PH dashboard, TA)

 Determine how to automate the creation of monitoring

Apply concepts required to audit, log, and monitor operating systems, infrastructures, and applications

 Monitor end to end service metrics (DDB/S3) using available AWS tools (X-ray with EB and Lambda)

 Verify environment/OS state through auditing (Inspector), Config rules, CloudTrail (process and action), and AWS APIs

 Enable, configure, and analyze custom metrics (e.g., Application metrics, memory, KCL/KPL) and take action

 Ensure container monitoring (e.g., task state, placement, logging, port mapping, LB)

 Distinguish between services that enable service level or OS level monitoring

 Example: AWS services that use OS agents (e.g., Inspector, SSM)

Determine how to implement tagging and other metadata strategies

 Segregate authority based on tagging (lifecycle stages – dev/prod) with Condition context keys

 Utilize Amazon S3 system/user-defined metadata for classification and automation

 Design and implement tag-based deployment groups with CodeDeploy

 Best practice for cost allocation/optimization with tagging
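
Segregating authority by tag usually means an IAM statement whose `Condition` block matches a resource tag. The sketch below builds such a policy document using the `aws:ResourceTag/<key>` condition key; the helper name `tag_scoped_policy` and the tag key `Environment` with value `dev` are hypothetical choices.

```python
import json


def tag_scoped_policy(actions, tag_key, tag_value):
    """Build a policy that allows the actions only on resources with the tag."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": list(actions),
                "Resource": "*",
                "Condition": {
                    "StringEquals": {f"aws:ResourceTag/{tag_key}": tag_value}
                },
            }
        ],
    }


# Hypothetical use: developers may start and stop only dev-tagged instances.
dev_policy = tag_scoped_policy(
    ["ec2:StartInstances", "ec2:StopInstances"], "Environment", "dev"
)
print(json.dumps(dev_policy, indent=2))
```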

Policies and Standards Automation

Apply concepts required to enforce standards for logging, metrics, monitoring, testing, and security

 Detect, report, and respond to governance and security violations

 Apply logging standards across application, operating system, and infrastructure

 Apply context specific application health and performance monitoring

 Outline standards for delivery models for logs and metrics (e.g., JSON, XML, Data Normalization)

Determine how to optimize cost through automation

 Prioritize automation effort to reduce labor costs

 Implement right sizing of workload based on metrics

 Assess ways to improve time to market through automating process orchestration and repeatable tasks

 Diagnose outliers to determine use case fit

 Example: Configuration drift

 Measure and automate cost optimization through events

 Example: Trusted Advisor
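
Right sizing based on metrics can be reduced to a small decision rule. The thresholds below (20% and 80% CPU) and the function name `right_size` are assumptions chosen for illustration, not recommended values; a real decision would also weigh memory, network, and the length of the observation window.

```python
from statistics import mean


def right_size(cpu_samples, low=20.0, high=80.0):
    """Suggest a sizing action from CPU utilization percentages."""
    if not cpu_samples:
        return "no-data"
    average, peak = mean(cpu_samples), max(cpu_samples)
    if peak < low:
        return "downsize"  # even the busiest sample is under the low mark
    if average > high:
        return "upsize"  # sustained load is above the high mark
    return "keep"
```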

Apply concepts required to implement governance strategies

 Generalize governance standards across CI/CD pipeline

 Outline and measure the real-time status of compliance with governance strategies

 Report on compliance with governance strategies

 Deploy governance policies related to self-service capabilities

 Example: Service Catalog, CFN Nag

Incident and Event Response

Troubleshoot issues and determine how to restore operations

 Given an issue, evaluate how to narrow down the unhealthy components as quickly as possible

 Given an increase in load, determine what steps to take to mitigate the impact

 Determine the causes and impacts of a failure

 Example: Deployment, operations

 Determine the best way to restore operations after a failure occurs

 Investigate and correlate logged events with application components

 Example: application source code

Determine how to automate event management and alerting

 Set up automated restores from backup in the event of a catastrophic failure

 Set up methods to deliver alerts and notifications that are appropriate for different types of events

 Assess the quality/actionability of alerts

 Configure metrics appropriate to an application’s SLAs

 Proactively update limits

Apply concepts required to implement automated healing

 Set up the correct scaling strategy to enable auto-healing when a failure occurs (e.g., with Auto Scaling policies)

 Use the correct rollback strategy to avoid impact from failed deployments

 Configure Route 53 to ensure cross-Region failover

 Detect and respond to maintenance or Spot termination events

Apply concepts required to set up event-driven automated actions

 Configure Lambda functions or CloudWatch actions to implement automated actions

 Set up CloudWatch event rules and/or Config rules and targets

 Use AWS Systems Manager or Step Functions to coordinate components (e.g., Lambda, use maintenance windows)

 Configure a build/roll-out process to automatically respond to critical software updates
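
An event-driven action is often a Lambda function that inspects the event and picks a response. The sketch below routes two EC2 event types by their `detail-type` field; the returned action names ("drain", "investigate", "ignore") are hypothetical placeholders for real remediation steps, and the handler makes no AWS calls.

```python
def lambda_handler(event, context):
    """Choose an action for an incoming EC2 event payload."""
    detail_type = event.get("detail-type", "")
    detail = event.get("detail", {})
    instance_id = detail.get("instance-id")

    if detail_type == "EC2 Spot Instance Interruption Warning":
        return {"action": "drain", "instance": instance_id}
    if (
        detail_type == "EC2 Instance State-change Notification"
        and detail.get("state") == "stopped"
    ):
        return {"action": "investigate", "instance": instance_id}
    return {"action": "ignore", "instance": instance_id}


# Hypothetical event for local testing.
spot_event = {
    "source": "aws.ec2",
    "detail-type": "EC2 Spot Instance Interruption Warning",
    "detail": {"instance-id": "i-0123456789abcdef0", "instance-action": "terminate"},
}
spot_response = lambda_handler(spot_event, None)
```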

High Availability, Fault Tolerance, and Disaster Recovery

Determine appropriate use of multi-AZ versus multi-Region architectures

 Determine deployment strategy based on HA/DR requirements

 Determine data replication strategy based on cost and durability requirements

 Determine infrastructure, platform, and services based on HA/DR requirements

 Design for HA/FT/DR based on service availability (i.e., global/regional/single AZ)

Determine how to implement high availability, scalability, and fault tolerance

 Design deployment strategy to support HA/FT/scalability

 Assess statefulness of application infrastructure components

 Use load balancing to distribute traffic across multiple AZs/ASGs/instance types (Spot/M4 vs. C4)/targets

 Use appropriate caching solutions to improve availability and performance

Determine the right services based on business needs (e.g., RTO/RPO, cost)

 Determine cost-effective storage solution for your application

 Example: tiered, archival, EBS type, hot/cold

 Choose a database platform and configuration to meet business requirements

 Choose a cost-effective compute platform based on business requirements

 Example: Spot

 Choose a deployment service/model based on business requirements

 Example: CodeDeploy, blue/green deployment

 Determine when to use managed service vs. self-managed infrastructure (Docker on EC2 vs. ECS)

Determine how to design and automate disaster recovery strategies

 Automate failure detection

 Automate components/environment recovery

 Choose appropriate deployment strategy for environment recovery

 Design automation to support failover in hybrid environment

Evaluate a deployment for points of failure

 Determine appropriate deployment-specific health checks

 Implement failure detection during deployment

 Implement failure event handling/response

 Ensure that resources/components/processes exist to react to failures during deployment

 Look for exit codes on each event of the deployment

 Map errors to different points of deployment
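
Checking exit codes per deployment event and mapping a failure to the point where it occurred can be sketched as follows. The event names follow CodeDeploy's lifecycle hooks for in-place deployments, but the runner `run_deployment` and the sample hooks are hypothetical; real hooks would be scripts whose exit codes are reported by the agent.

```python
LIFECYCLE_EVENTS = [
    "ApplicationStop",
    "BeforeInstall",
    "AfterInstall",
    "ApplicationStart",
    "ValidateService",
]


def run_deployment(hooks):
    """Run hooks in lifecycle order and stop at the first nonzero exit code.

    hooks: dict of event name -> callable returning an integer exit code.
    """
    for event in LIFECYCLE_EVENTS:
        hook = hooks.get(event)
        if hook is None:
            continue  # no script registered for this event
        exit_code = hook()
        if exit_code != 0:
            return {"status": "failed", "event": event, "exit_code": exit_code}
    return {"status": "succeeded", "event": None, "exit_code": 0}


# Hypothetical hooks: the service starts but fails validation.
outcome = run_deployment(
    {
        "BeforeInstall": lambda: 0,
        "ApplicationStart": lambda: 0,
        "ValidateService": lambda: 2,
    }
)
```

Recording the failing event alongside the exit code shows where in the deployment the error occurred, which is the input an automated rollback needs.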