AWS-DevOPS
SDLC Automation
Apply concepts required to automate a CI/CD pipeline
Set up repositories
Set up build services
Integrate automated testing (e.g., unit tests, integrity tests)
Set up deployment products/services
Orchestrate multiple pipeline stages
Determine source control strategies and how to implement them
Determine a workflow for integrating code changes from multiple contributors
Assess security requirements and recommend code repository access design
Reconcile running application versions to repository versions (tags)
Differentiate between source control types
Apply concepts required to automate and integrate testing
Run integration tests as part of code merge process
Run load/stress testing and benchmark applications at scale
Measure application health based on application exit codes (robust Health Check)
Automate unit tests to check pass/fail, code coverage
CodePipeline, CodeBuild, etc.
Integrate tests with pipeline
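The exit-code health check mentioned above can be sketched as follows. This is a minimal illustration, not a prescribed method: the function name `is_healthy` and the timeout default are assumptions, and the check command stands in for whatever test or probe a pipeline stage would run.

```python
import subprocess
import sys


def is_healthy(command, timeout_seconds=10):
    """Return True only when the command exits with status 0 within the timeout."""
    try:
        result = subprocess.run(
            command,
            timeout=timeout_seconds,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except (subprocess.TimeoutExpired, OSError):
        # A hung or missing check command is treated as unhealthy.
        return False
    return result.returncode == 0


if __name__ == "__main__":
    # Hypothetical check commands: one exits 0, the other exits 3.
    passing = [sys.executable, "-c", "raise SystemExit(0)"]
    failing = [sys.executable, "-c", "raise SystemExit(3)"]
    print(is_healthy(passing), is_healthy(failing))
```

Treating timeouts and launch failures as unhealthy keeps the check conservative: a stage passes only on an explicit zero exit code.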
Apply concepts required to build and manage artifacts securely
Distinguish storage options based on artifacts security classification
Translate application requirements into Operating System and package configuration (build specs)
Determine the code/environment dependencies and required resources
Example: CodeDeploy AppSpec, CodeBuild buildspec
Run a code build process
Determine deployment/delivery strategies (e.g., A/B, blue/green, canary, red/black) and how to implement them using AWS services
Determine the correct delivery strategy based on business needs
Critique existing deployment strategies and suggest improvements
Recommend DNS/routing strategies (e.g., Route 53, ELB, ALB, load balancer) based on business continuity goals
Verify deployment success/failure and automate rollbacks
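A canary rollout with automated rollback might be reasoned about with the sketch below. It is a service-agnostic model under stated assumptions: `canary_rollout`, the weight list, and the `is_healthy` callback are hypothetical names, and in practice the traffic shift and health signal would come from the chosen deployment service and its alarms.

```python
def canary_rollout(weights, is_healthy):
    """Shift traffic to the new version step by step.

    weights: increasing percentages of traffic sent to the new version.
    is_healthy: callable that takes the current percentage and returns a bool.
    Returns (final_percentage, history of percentages applied).
    """
    history = []
    for weight in weights:
        history.append(weight)
        if not is_healthy(weight):
            # Roll back: send all traffic back to the old version.
            history.append(0)
            return 0, history
    final = history[-1] if history else 0
    return final, history


if __name__ == "__main__":
    # Hypothetical health signal that fails once half the traffic is shifted.
    print(canary_rollout([10, 50, 100], lambda weight: weight < 50))
```

The history list makes each step auditable, which helps when verifying after the fact whether a deployment succeeded or was rolled back.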
Configuration Management and Infrastructure as Code
Determine deployment services based on deployment needs
Demonstrate knowledge of process flows of deployment models
Given a specific deployment model, classify and implement relevant AWS services to meet requirements
Given the requirement to have DynamoDB, choose CloudFormation instead of OpsWorks
Determine what to do with rolling updates
Determine application and infrastructure deployment models based on business needs
Balance different considerations (cost, availability, time to recovery) based on business requirements to choose the best deployment model
Determine a deployment model given specific AWS services
Analyze risks associated with deployment models and relevant remedies
Apply security concepts in the automation of resource provisioning
Choose the best automation tool given requirements
Demonstrate knowledge of security best practices for resource provisioning (e.g., encrypting data bags, generating credentials on the fly)
Review IAM policies and assess if sufficient but least privilege is granted for all lifecycle stages of a deployment (e.g., create, update, promote)
Review credential management solutions (e.g., Systems Manager Parameter Store, third party)
Build the automation
CloudFormation templates, Chef recipes, cookbooks, CodePipeline, etc.
Determine how to implement lifecycle hooks on a deployment
Determine appropriate integration techniques to meet project requirements
Choose the appropriate hook solution (e.g., implement leader node selection after a node failure) in an Auto Scaling group
Evaluate hook implementation for failure impacts (e.g., if a remote call fails, or if a dependent service such as Amazon S3 is temporarily unavailable) and recommend resiliency improvements
Evaluate deployment rollout procedures for failure impacts and evaluate rollback/recovery processes
Apply concepts required to manage systems using AWS configuration management tools and services
Identify pros and cons of AWS configuration management tools
Demonstrate knowledge of configuration management components
Show the ability to run configuration management services end to end with no assistance while adhering to industry best practices
Monitoring and Logging
Determine how to set up the aggregation, storage, and analysis of logs and metrics
Implement and configure distributed log collection and processing (e.g., agents, syslog, fluentd, CW agent)
Aggregate logs (e.g., Amazon S3, CW Logs, intermediate systems (EMR), Kinesis FH – Transformation, ELK/BI)
Implement custom CW metrics, Log subscription filters
Manage Log storage lifecycle (e.g., CW to S3, S3 lifecycle, S3 events)
Apply concepts required to automate monitoring and event management of an environment
Parse logs (e.g., Amazon S3 data events/event logs/ELB/ALB/CF access logs) and correlate with other alarms/events (e.g., CW events to AWS Lambda) and take appropriate action
Use CloudTrail/VPC flow logs for detective control (e.g., CT, CW log filters, Athena, NACL or WAF rules) and take dependent actions (AWS Step Functions) based on error handling logic (state machine)
Configure and implement patch/inventory/state management using Systems Manager (SSM), Inspector, CodeDeploy, OpsWorks, and CW agents
EC2 retirement/maintenance
Handle scaling/failover events (e.g., ASG, DB HA, route table/DNS update, Application Config, Auto Recovery, PH dashboard, TA)
Determine how to automate the creation of monitoring
Apply concepts required to audit, log, and monitor operating systems, infrastructures, and applications
Monitor end-to-end service metrics (DDB/S3) using available AWS tools (X-Ray with EB and Lambda)
Verify environment/OS state through auditing (Inspector), Config rules, CloudTrail (process and action), and AWS APIs
Enable, configure, and analyze custom metrics (e.g., Application metrics, memory, KCL/KPL) and take action
Ensure container monitoring (e.g., task state, placement, logging, port mapping, LB)
Distinguish between services that enable service level or OS level monitoring
Example: AWS services that use OS agents (e.g., Inspector, SSM)
Determine how to implement tagging and other metadata strategies
Segregate authority based on tagging (lifecycle stages – dev/prod) with Condition context keys
Utilize Amazon S3 system/user-defined metadata for classification and automation
Design and implement tag-based deployment groups with CodeDeploy
Best practice for cost allocation/optimization with tagging
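Segregating authority by tag with a Condition context key can be illustrated with the policy document built below. It is a sketch under assumptions: the helper name `tag_scoped_policy`, the tag key `Environment`, the value `dev`, and the chosen EC2 actions are illustrative only; `aws:ResourceTag/<key>` is the global condition key for tags on the target resource.

```python
import json


def tag_scoped_policy(tag_key, tag_value, actions):
    """Build an IAM policy document that allows actions only on resources with a given tag."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": list(actions),
                "Resource": "*",
                # The request is allowed only when the resource carries the matching tag.
                "Condition": {
                    "StringEquals": {f"aws:ResourceTag/{tag_key}": tag_value}
                },
            }
        ],
    }


if __name__ == "__main__":
    # Hypothetical example: developers may start and stop only dev-tagged instances.
    policy = tag_scoped_policy(
        "Environment", "dev", ["ec2:StartInstances", "ec2:StopInstances"]
    )
    print(json.dumps(policy, indent=2))
```

Not every action supports resource-tag conditions, so a policy like this should be checked against the service's documented condition keys before use.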
Policies and Standards Automation
Apply concepts required to enforce standards for logging, metrics, monitoring, testing, and security
Detect, report, and respond to governance and security violations
Apply logging standards across application, operating system, and infrastructure
Apply context specific application health and performance monitoring
Outline standards for delivery models for logs and metrics (e.g., JSON, XML, Data Normalization)
Determine how to optimize cost through automation
Prioritize automation effort to reduce labor costs
Implement right sizing of workload based on metrics
Assess ways to improve time to market through automating process orchestration and repeatable tasks
Diagnose outliers to determine use case fit
Example: Configuration drift
Measure and automate cost optimization through events
Example: Trusted Advisor
Apply concepts required to implement governance strategies
Generalize governance standards across CI/CD pipeline
Outline and measure the real-time status of compliance with governance strategies
Report on compliance with governance strategies
Deploy governance policies related to self-service capabilities
Example: Service Catalog, CFN Nag
Incident and Event Response
Troubleshoot issues and determine how to restore operations
Given an issue, evaluate how to narrow down the unhealthy components as quickly as possible
Given an increase in load, determine what steps to take to mitigate the impact
Determine the causes and impacts of a failure
Example: Deployment, operations
Determine the best way to restore operations after a failure occurs
Investigate and correlate logged events with application components
Example: application source code
Determine how to automate event management and alerting
Set up automated restores from backup in the event of a catastrophic failure
Set up methods to deliver alerts and notifications that are appropriate for different types of events
Assess the quality/actionability of alerts
Configure metrics appropriate to an application’s SLAs
Proactively update limits
Apply concepts required to implement automated healing
Set up the correct scaling strategy to enable auto-healing when a failure occurs (e.g., with Auto Scaling policies)
Use the correct rollback strategy to avoid impact from failed deployments
Configure Route 53 to ensure cross-Region failover
Detect and respond to maintenance or Spot termination events
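Cross-Region failover in Route 53 is typically expressed as a pair of failover records, sketched below as the change batch that would be passed to the ChangeResourceRecordSets API. The helper name, the record name `app.example.com.`, the documentation-range IP addresses, and the health check ID are placeholders, not values from this guide.

```python
def failover_change_batch(name, primary_ip, secondary_ip, primary_health_check_id, ttl=60):
    """Build a Route 53 change batch with PRIMARY and SECONDARY failover A records."""

    def record(set_identifier, role, ip_address, health_check_id=None):
        record_set = {
            "Name": name,
            "Type": "A",
            "SetIdentifier": set_identifier,
            "Failover": role,  # "PRIMARY" or "SECONDARY"
            "TTL": ttl,
            "ResourceRecords": [{"Value": ip_address}],
        }
        if health_check_id is not None:
            # Route 53 fails over when this health check reports unhealthy.
            record_set["HealthCheckId"] = health_check_id
        return {"Action": "UPSERT", "ResourceRecordSet": record_set}

    return {
        "Comment": "Failover pair (illustrative values)",
        "Changes": [
            record("primary", "PRIMARY", primary_ip, primary_health_check_id),
            record("secondary", "SECONDARY", secondary_ip),
        ],
    }


if __name__ == "__main__":
    batch = failover_change_batch(
        "app.example.com.", "192.0.2.10", "198.51.100.10", "placeholder-health-check-id"
    )
    print(batch["Changes"][0]["ResourceRecordSet"]["Failover"])
```

A short TTL keeps resolvers from caching the primary address long after a failover, at the cost of more DNS queries.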
Apply concepts required to set up event-driven automated actions
Configure Lambda functions or CloudWatch actions to implement automated actions
Set up CloudWatch event rules and/or Config rules and targets
Use AWS Systems Manager or Step Functions to coordinate components (e.g., Lambda, use maintenance windows)
Configure a build/roll-out process to automatically respond to critical software updates
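An event-driven action of the kind listed above could look like the Lambda-style handler below, which routes two EC2 event types to different responses. This is a sketch, assuming the handler is the target of a CloudWatch Events rule; the returned action names (`drain-and-replace`, `verify-capacity`) are hypothetical labels for whatever remediation the team automates.

```python
def handler(event, context=None):
    """Route an EC2 event to an action name based on its detail-type."""
    detail_type = event.get("detail-type", "")
    detail = event.get("detail", {})
    instance_id = detail.get("instance-id")

    if detail_type == "EC2 Spot Instance Interruption Warning":
        # Spot capacity is being reclaimed; drain the instance before it goes away.
        return {"action": "drain-and-replace", "instance": instance_id}

    if detail_type == "EC2 Instance State-change Notification":
        if detail.get("state") == "terminated":
            return {"action": "verify-capacity", "instance": instance_id}

    # Anything else is ignored so unrelated events cannot trigger remediation.
    return {"action": "ignore", "instance": instance_id}


if __name__ == "__main__":
    sample = {
        "source": "aws.ec2",
        "detail-type": "EC2 Spot Instance Interruption Warning",
        "detail": {"instance-id": "i-0123456789abcdef0", "instance-action": "terminate"},
    }
    print(handler(sample))
```

Returning a plain dictionary keeps the routing logic testable without any AWS calls; the actual remediation would be wired in behind each action name.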
High Availability, Fault Tolerance, and Disaster Recovery
Determine appropriate use of multi-AZ versus multi-Region architectures
Determine deployment strategy based on HA/DR requirements
Determine data replication strategy based on cost and durability requirements
Determine infrastructure, platform, and services based on HA/DR requirements
Design for HA/FT/DR based on service availability (i.e., global/regional/single AZ)
Determine how to implement high availability, scalability, and fault tolerance
Design deployment strategy to support HA/FT/scalability
Assess statefulness of application infrastructure components
Use load balancing to distribute traffic across multiple AZs/ASGs/instance types (Spot/M4 vs. C4)/targets
Use appropriate caching solutions to improve availability and performance
Determine the right services based on business needs (e.g., RTO/RPO, cost)
Determine cost-effective storage solution for your application
Example: tiered, archival, EBS type, hot/cold
Choose a database platform and configuration to meet business requirements
Choose a cost-effective compute platform based on business requirements
Example: Spot
Choose a deployment service/model based on business requirements
Example: CodeDeploy, Blue/Green deployment
Determine when to use managed service vs. self-managed infrastructure (Docker on EC2 vs. ECS)
Determine how to design and automate disaster recovery strategies
Automate failure detection
Automate components/environment recovery
Choose appropriate deployment strategy for environment recovery
Design automation to support failover in hybrid environment
Evaluate a deployment for points of failure
Determine appropriate deployment-specific health checks
Implement failure detection during deployment
Implement failure event handling/response
Ensure that resources/components/processes exist to react to failures during deployment
Look for exit codes on each event of the deployment
Map errors to different points of deployment
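Checking exit codes per deployment event and mapping a failure to the point where it occurred can be sketched as below. The event names follow CodeDeploy's EC2 lifecycle hook names; the function `run_deployment` and the hook commands are hypothetical stand-ins for real hook scripts.

```python
import subprocess
import sys


def run_deployment(hooks):
    """Run hook commands in order and stop at the first non-zero exit code.

    hooks: list of (event_name, command) pairs.
    Returns (succeeded, failed_event, exit_code).
    """
    for event_name, command in hooks:
        exit_code = subprocess.run(command).returncode
        if exit_code != 0:
            # The event name tells the operator which point of the deployment failed.
            return False, event_name, exit_code
    return True, None, 0


if __name__ == "__main__":
    ok = [sys.executable, "-c", "raise SystemExit(0)"]
    bad = [sys.executable, "-c", "raise SystemExit(2)"]
    hooks = [
        ("BeforeInstall", ok),
        ("AfterInstall", ok),
        ("ApplicationStart", bad),  # simulated failure at service start
        ("ValidateService", ok),    # never reached
    ]
    print(run_deployment(hooks))
```

Stopping at the first failure avoids running later hooks against a half-deployed application and gives the rollback logic a single, unambiguous failure point.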