Netsys Infotech

AWS DevOps

SDLC Automation

Apply concepts required to automate a CI/CD pipeline

Set up repositories

Set up build services

Integrate automated testing (e.g., unit tests, integrity tests)

Set up deployment products/services

Orchestrate multiple pipeline stages

Determine source control strategies and how to implement them

Determine a workflow for integrating code changes from multiple contributors

Assess security requirements and recommend code repository access design

Reconcile running application versions to repository versions (tags)

Differentiate between source control types

Apply concepts required to automate and integrate testing

Run integration tests as part of code merge process

Run load/stress testing and benchmark applications at scale

Measure application health based on application exit codes (robust Health Check)

Automate unit tests to check pass/fail, code coverage

Example: CodePipeline, CodeBuild, etc.

Integrate tests with pipeline

Apply concepts required to build and manage artifacts securely

Distinguish storage options based on artifacts security classification

Translate application requirements into Operating System and package configuration (build specs)

Determine the code/environment dependencies and required resources

Example: CodeDeploy AppSpec, CodeBuild buildspec

Run a code build process

Determine deployment/delivery strategies (e.g., A/B, blue/green, canary, red/black) and how to implement them using AWS services

Determine the correct delivery strategy based on business needs

Critique existing deployment strategies and suggest improvements

Recommend DNS/routing strategies (e.g., Route 53, ELB, ALB, load balancer) based on business continuity goals

Verify deployment success/failure and automate rollbacks
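The canary and automated-rollback items above can be illustrated with a small decision function. This is a minimal sketch, not an AWS API: the function name, the 1% error threshold, and the 10-point step are all assumptions chosen for illustration. In practice the error rate would come from a monitoring source such as a CloudWatch alarm, and the returned weight would be applied through a routing layer such as weighted DNS records or load balancer target groups.

```python
def next_canary_weight(current_weight, error_rate, max_error_rate=0.01, step=10):
    """Return the percentage of traffic the new version should receive next.

    Hypothetical helper: thresholds and step size are illustrative only.
    A return value of 0 means "roll back: send all traffic to the old version".
    """
    if not 0 <= current_weight <= 100:
        raise ValueError("current_weight must be between 0 and 100")
    if error_rate > max_error_rate:
        # Deployment is considered failed; shift all traffic back.
        return 0
    # Deployment looks healthy; shift a little more traffic, capped at 100%.
    return min(100, current_weight + step)


if __name__ == "__main__":
    weight = 0
    # Assumed error rates observed after each traffic shift.
    for observed_error_rate in [0.0, 0.002, 0.004]:
        weight = next_canary_weight(weight, observed_error_rate)
        print("new version now receives", weight, "percent of traffic")
```

A blue/green cutover is the special case where the step is 100: traffic moves in one shift, and rollback means pointing the router back at the old environment.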

Configuration Management and Infrastructure as Code

Determine deployment services based on deployment needs

Demonstrate knowledge of process flows of deployment models

Given a specific deployment model, classify and implement relevant AWS services to meet requirements

Example: Given a requirement for DynamoDB, choose CloudFormation instead of OpsWorks

Determine what to do with rolling updates

Determine application and infrastructure deployment models based on business needs

Balance different considerations (cost, availability, time to recovery) based on business requirements to choose the best deployment model

Determine a deployment model given specific AWS services

Analyze risks associated with deployment models and relevant remedies

Apply security concepts in the automation of resource provisioning

Choose the best automation tool given requirements

Demonstrate knowledge of security best practices for resource provisioning (e.g., encrypting data bags, generating credentials on the fly)

Review IAM policies and assess if sufficient but least privilege is granted for all lifecycle stages of a deployment (e.g., create, update, promote)

Review credential management solutions (e.g., AWS Systems Manager Parameter Store, third party)

Build the automation

Example: CloudFormation templates, Chef recipes and cookbooks, CodePipeline, etc.
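The IAM policy review item in this group (sufficient but least privilege) can be practiced with a simple checker. The sketch below is an assumption-laden illustration, not an AWS tool: the function name and the rule "flag `*` actions, `service:*` actions, and `*` resources in Allow statements" are hypothetical simplifications. A real review would also consider conditions, NotAction, and resource-level permissions.

```python
def find_wildcard_statements(policy):
    """Return Allow statements that use wildcard actions or resources.

    Hypothetical helper for study purposes; it only inspects the Effect,
    Action, and Resource fields of an IAM policy document given as a dict.
    """
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):
        # IAM allows a single statement object instead of a list.
        statements = [statements]

    findings = []
    for index, statement in enumerate(statements):
        if statement.get("Effect") != "Allow":
            continue
        actions = statement.get("Action", [])
        resources = statement.get("Resource", [])
        if isinstance(actions, str):
            actions = [actions]
        if isinstance(resources, str):
            resources = [resources]
        broad_actions = [a for a in actions if a == "*" or a.endswith(":*")]
        broad_resources = [r for r in resources if r == "*"]
        if broad_actions or broad_resources:
            findings.append(
                {"index": index, "actions": broad_actions, "resources": broad_resources}
            )
    return findings


if __name__ == "__main__":
    # Example policy for a deployment role; the bucket name is made up.
    example_policy = {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": "s3:GetObject",
             "Resource": "arn:aws:s3:::example-artifacts/*"},
            {"Effect": "Allow", "Action": "cloudformation:*", "Resource": "*"},
        ],
    }
    for finding in find_wildcard_statements(example_policy):
        print(finding)
```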

Determine how to implement lifecycle hooks on a deployment

Determine appropriate integration techniques to meet project requirements

Choose the appropriate hook solution (e.g., implement leader node selection after a node failure) in an Auto Scaling group

Evaluate hook implementation for failure impacts (e.g., a remote call fails, or a dependent service such as Amazon S3 is temporarily unavailable) and recommend resiliency improvements

Evaluate deployment rollout procedures for failure impacts and evaluate rollback/recovery processes

Apply concepts required to manage systems using AWS configuration management tools and services

Identify pros and cons of AWS configuration management tools

Demonstrate knowledge of configuration management components

Show the ability to run configuration management services end to end with no assistance while adhering to industry best practices

Monitoring and Logging

Determine how to set up the aggregation, storage, and analysis of logs and metrics

Implement and configure distributed log collection and processing (e.g., agents, syslog, flumed, CW agent)

Aggregate logs (e.g., Amazon S3, CW Logs, intermediate systems (EMR), Kinesis FH – Transformation, ELK/BI)

Implement custom CW metrics and log subscription filters

Manage log storage lifecycle (e.g., CW to S3, S3 lifecycle, S3 events)

Apply concepts required to automate monitoring and event management of an environment

Parse logs (e.g., Amazon S3 data events/event logs/ELB/ALB/CF access logs) and correlate with other alarms/events (e.g., CW events to AWS Lambda) and take appropriate action

Use CloudTrail/VPC Flow Logs for detective control (e.g., CT, CW log filters, Athena, NACL or WAF rules) and take dependent actions (AWS Step Functions) based on error handling logic (state machine)

Configure and implement patch/inventory/state management using EC2 Systems Manager (SSM), Inspector, CodeDeploy, OpsWorks, and CW agents

EC2 retirement/maintenance

Handle scaling/failover events (e.g., ASG, DB HA, route table/DNS update, application config, Auto Recovery, Personal Health Dashboard, Trusted Advisor)

Determine how to automate the creation of monitoring

Apply concepts required to audit, log, and monitor operating systems, infrastructures, and applications

Monitor end-to-end service metrics (DDB/S3) using available AWS tools (X-Ray with Elastic Beanstalk and Lambda)

Verify environment/OS state through auditing (Inspector), Config rules, CloudTrail (process and action), and AWS APIs

Enable, configure, and analyze custom metrics (e.g., Application metrics, memory, KCL/KPL) and take action

Ensure container monitoring (e.g., task state, placement, logging, port mapping, LB)

Distinguish between services that enable service level or OS level monitoring

Example: AWS services that use OS agents (e.g., Inspector, SSM)

Determine how to implement tagging and other metadata strategies

Segregate authority based on tagging (lifecycle stages – dev/prod) with Condition context keys

Utilize Amazon S3 system/user-defined metadata for classification and automation

Design and implement tag-based deployment groups with CodeDeploy

Apply best practices for cost allocation/optimization with tagging
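Segregating authority by tag with Condition context keys, as listed in this group, can be sketched by generating an IAM policy document. The helper below is a hypothetical illustration: the function name, the `Environment` tag key, and the chosen EC2 actions are assumptions. It relies on the `aws:ResourceTag/<key>` global condition key; not every action in every service supports it, so a real policy needs to be checked against each service's documentation.

```python
import json


def tag_scoped_policy(tag_key, tag_value, actions):
    """Build a policy dict that allows `actions` only on resources carrying
    the tag tag_key=tag_value. Hypothetical helper for illustration."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": list(actions),
                "Resource": "*",
                "Condition": {
                    "StringEquals": {"aws:ResourceTag/" + tag_key: tag_value}
                },
            }
        ],
    }


if __name__ == "__main__":
    # Assumed scenario: developers may start and stop only dev-tagged instances.
    dev_policy = tag_scoped_policy(
        "Environment", "dev", ["ec2:StartInstances", "ec2:StopInstances"]
    )
    print(json.dumps(dev_policy, indent=2))
```

A second policy generated with the value `prod` and attached to a different group would give the dev/prod separation the objective describes.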

Policies and Standards Automation

Apply concepts required to enforce standards for logging, metrics, monitoring, testing, and security

Detect, report, and respond to governance and security violations

Apply logging standards across application, operating system, and infrastructure

Apply context specific application health and performance monitoring

Outline standards for delivery models for logs and metrics (e.g., JSON, XML, Data Normalization)

Determine how to optimize cost through automation

Prioritize automation effort to reduce labor costs

Implement right sizing of workload based on metrics

Assess ways to improve time to market through automating process orchestration and repeatable tasks

Diagnose outliers to determine use case fit

Example: Configuration drift

Measure and automate cost optimization through events

Example: Trusted Advisor
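The configuration drift example in this group comes down to comparing the declared configuration with what is actually running. The following is a minimal sketch under stated assumptions: both configurations are flat dictionaries, and the function name and sample keys are invented for illustration. Managed services (for instance CloudFormation drift detection or AWS Config rules) perform this comparison against real resources.

```python
def detect_drift(expected, actual):
    """Return the settings whose actual value differs from the expected one.

    Hypothetical helper: keys missing on either side are reported with None.
    """
    drift = {}
    for key in sorted(set(expected) | set(actual)):
        if expected.get(key) != actual.get(key):
            drift[key] = {"expected": expected.get(key), "actual": actual.get(key)}
    return drift


if __name__ == "__main__":
    # Assumed values: what the template declares vs. what was found running.
    declared = {"instance_type": "t3.micro", "monitoring": True}
    running = {"instance_type": "t3.large", "monitoring": True, "public_ip": True}
    for setting, values in detect_drift(declared, running).items():
        print(setting, values)
```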

Apply concepts required to implement governance strategies

Generalize governance standards across CI/CD pipeline

Outline and measure the real-time status of compliance with governance strategies

Report on compliance with governance strategies

Deploy governance policies related to self-service capabilities

Example: Service Catalog, CFN Nag

Incident and Event Response

Troubleshoot issues and determine how to restore operations

Given an issue, evaluate how to narrow down the unhealthy components as quickly as possible

Given an increase in load, determine what steps to take to mitigate the impact

Determine the causes and impacts of a failure

Example: Deployment, operations

Determine the best way to restore operations after a failure occurs

Investigate and correlate logged events with application components

Example: application source code

Determine how to automate event management and alerting

Set up automated restores from backup in the event of a catastrophic failure

Set up methods to deliver alerts and notifications that are appropriate for different types of events

Assess the quality/actionability of alerts

Configure metrics appropriate to an application’s SLAs

Proactively update limits

Apply concepts required to implement automated healing

Set up the correct scaling strategy to enable auto-healing when a failure occurs (e.g., with Auto Scaling policies)

Use the correct rollback strategy to avoid impact from failed deployments

Configure Route 53 to ensure cross-Region failover

Detect and respond to maintenance or Spot termination events

Apply concepts required to set up event-driven automated actions

Configure Lambda functions or CloudWatch actions to implement automated actions

Set up CloudWatch event rules and/or Config rules and targets

Use AWS Systems Manager or Step Functions to coordinate components (e.g., Lambda, use maintenance windows)

Configure a build/roll-out process to automatically respond to critical software updates
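Event-driven actions of the kind listed above usually pair an event rule with a Lambda function. The sketch below assumes a rule matching the EC2 Spot interruption warning event (source `aws.ec2`, detail-type `EC2 Spot Instance Interruption Warning`) and a handler that only decides what to do. The handler name, the returned dictionary, and the "drain" step are hypothetical; a real function would go on to call AWS APIs, for example to deregister the instance from a load balancer.

```python
# Event pattern a CloudWatch Events rule could use to select the warning.
SPOT_WARNING_PATTERN = {
    "source": ["aws.ec2"],
    "detail-type": ["EC2 Spot Instance Interruption Warning"],
}


def handler(event, context=None):
    """Lambda-style entry point: decide how to react to a Spot warning.

    Illustrative only; it returns a plan instead of calling any AWS API.
    """
    is_spot_warning = (
        event.get("source") in SPOT_WARNING_PATTERN["source"]
        and event.get("detail-type") in SPOT_WARNING_PATTERN["detail-type"]
    )
    if not is_spot_warning:
        return {"handled": False}

    detail = event.get("detail", {})
    return {
        "handled": True,
        "instance_id": detail.get("instance-id"),
        "instance_action": detail.get("instance-action"),
        "next_step": "drain",  # hypothetical follow-up action
    }


if __name__ == "__main__":
    # Sample event shaped like the interruption warning; the ID is made up.
    sample_event = {
        "source": "aws.ec2",
        "detail-type": "EC2 Spot Instance Interruption Warning",
        "detail": {"instance-id": "i-1234567890abcdef0", "instance-action": "terminate"},
    }
    print(handler(sample_event))
```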

High Availability, Fault Tolerance, and Disaster Recovery

Determine appropriate use of multi-AZ versus multi-Region architectures

Determine deployment strategy based on HA/DR requirements

Determine data replication strategy based on cost and durability requirements

Determine infrastructure, platform, and services based on HA/DR requirements

Design for HA/FT/DR based on service availability (i.e., global/regional/single AZ)

Determine how to implement high availability, scalability, and fault tolerance

Design deployment strategy to support HA/FT/scalability

Assess statefulness of application infrastructure components

Use load balancing to distribute traffic across multiple AZs/ASGs/instance types (Spot/M4 vs. C4)/targets

Use appropriate caching solutions to improve availability and performance

Determine the right services based on business needs (e.g., RTO/RPO, cost)

Determine cost-effective storage solution for your application

Example: tiered, archival, EBS type, hot/cold

Choose a database platform and configuration to meet business requirements

Choose a cost-effective compute platform based on business requirements

Example: Spot

Choose a deployment service/model based on business requirements

Example: CodeDeploy, blue/green deployment

Determine when to use managed service vs. self-managed infrastructure (Docker on EC2 vs. ECS)

Determine how to design and automate disaster recovery strategies

Automate failure detection

Automate components/environment recovery

Choose appropriate deployment strategy for environment recovery

Design automation to support failover in hybrid environment

Evaluate a deployment for points of failure

Determine appropriate deployment-specific health checks

Implement failure detection during deployment

Implement failure event handling/response

Ensure that resources/components/processes exist to react to failures during deployment

Look for exit codes on each event of the deployment

Map errors to different points of deployment
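Looking for exit codes on each deployment event and mapping errors to a point in the deployment can be sketched as follows. The event names follow the CodeDeploy lifecycle for an in-place EC2 deployment; the function name and the input format (a dictionary of event name to exit code) are assumptions made for this illustration, and the convention that a non-zero exit code means failure is the usual one for hook scripts.

```python
# Lifecycle events in the order they run during an in-place deployment.
LIFECYCLE_ORDER = [
    "ApplicationStop",
    "DownloadBundle",
    "BeforeInstall",
    "Install",
    "AfterInstall",
    "ApplicationStart",
    "ValidateService",
]


def first_failure(exit_codes):
    """Return (event, exit_code) for the earliest failed event, or None.

    Hypothetical helper: `exit_codes` maps event names to the exit code of
    the script that ran for that event; events without an entry are skipped.
    """
    for event in LIFECYCLE_ORDER:
        code = exit_codes.get(event)
        if code is not None and code != 0:
            return (event, code)
    return None


if __name__ == "__main__":
    # Assumed results collected from the hook scripts of one deployment.
    observed = {"BeforeInstall": 0, "AfterInstall": 2, "ValidateService": 1}
    print("deployment failed at:", first_failure(observed))
```

Reporting the earliest failing event rather than the last one points the investigation at the likely root cause, since later events often fail only as a consequence.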