Understanding AWS Core Services
summary: traditional datacenters vs cloud, AWS cost tools (TCO Calculator, Budgets, resource tags,
Simple Monthly Calculator, Cost Explorer)
CLI: creating access keys; root user access keys should not typically be used
available as version 1 and version 2
-download and install the awscli package using pip or the MSI installer
aws --version (shows version)
aws configure --profile crayroot
enter in access key, secret key, default region name (us-west-2), output format
(json)
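e.g., a quick check that the new profile works (sketch; the calls just prove the credentials resolve):
aws sts get-caller-identity --profile crayroot
aws s3 ls --profile crayroot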
quiz:
scenario1: SDK would be used because it requires code for a custom app
scenario2: console would be used, she's just testing, considering aws
scenario3: requires automation from aws already built in, so CLI only
COMPUTE SERVICES: cloud-based VMs - EC2 (IaaS), Elastic Beanstalk (PaaS), Lambda
(serverless)
EC2 use cases: web application hosting, batch processing, API server, desktop in
the cloud
EC2 concepts: instance types, root device type, AMI, purchase options
compute optimized: 96 vCPUs, $4.60 per hour! 64 vCPU specialized (has GPU): $24.48
per hour
storage optimized: 64 vCPUs, $4.90 per hour
Root Device Type: Instance Store (physically attached to the host server) vs EBS
(persistent storage/separate)
-data on instance store (ephemeral) is lost when the instance stops vs EBS, which can be
snapshotted/migrated to a new EC2 instance
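e.g., snapshotting an EBS volume from the CLI (volume ID is a placeholder):
aws ec2 create-snapshot --volume-id vol-0123456789abcdef0 --description "pre-migration backup"
aws ec2 describe-snapshots --owner-ids self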
AMI: template for an EC2 instance (config/OS/data), can be shared, custom AMIs,
commercial AMIs available in the Marketplace
launching an instance:
security group: by default, allows ALL IPs to try to connect. can specify 'My IP'
so only my address can connect
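rough CLI sketch of locking SSH down to one IP and launching with that group (all IDs are placeholders):
aws ec2 authorize-security-group-ingress --group-id sg-0abc1234 --protocol tcp --port 22 --cidr 203.0.113.10/32
aws ec2 run-instances --image-id ami-0abc1234 --instance-type t2.micro --key-name my-key --security-group-ids sg-0abc1234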
LAMBDA: run code without infrastructure, charged for execution only, memory from
128-3008MB
-integrates with MANY services: event-driven workflows (such as uploading a file)
-reduced maintenance, fault tolerance, scales on demand, based on USAGE
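invoking a function from the CLI looks roughly like this (function name is made up):
aws lambda invoke --function-name process-upload response.json
cat response.json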
QUIZ:
scenario1: reserved instance, EC2, 3 years
-ALL UPFRONT, since we know it is at LEAST 3 years, this would provide the MOST
cost effective option
scenario2: elastic beanstalk - uploading code, supports PHP, scales out of the box
scenario3: spot instances - NO COMMITMENTS, but if you can start/stop without a
problem, it's always spot
CONTENT AND NETWORK DELIVERY SERVICES: Route 53, VPC, Direct Connect, API
Gateway, CloudFront, ELB
-VPC: virtual private cloud, logically isolated, enables NETWORK (ipv4/ipv6),
ranges/subnets/gateways
--supports private/public subnets (external/internal), can use NAT for private, can
connect other VPCs (peering)
--AWS Direct Connect: dedicated network connection to AWS
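minimal VPC/subnet sketch via the CLI (CIDR blocks and IDs are just examples):
aws ec2 create-vpc --cidr-block 10.0.0.0/16
aws ec2 create-subnet --vpc-id vpc-0abc1234 --cidr-block 10.0.1.0/24
aws ec2 create-internet-gateway
aws ec2 attach-internet-gateway --internet-gateway-id igw-0abc1234 --vpc-id vpc-0abc1234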
Route 53: DNS, GLOBAL service (not regional), HA, global resource routing (send
people to an app based on location)
-DNS: translates human-readable domain names into IP addresses
-requires propagation (couple of hours), can configure route53 to failover from us-
east-1 to eu-west-1
CloudFront: uses EDGE locations, CDN, enables users to get content CLOSER to them,
static/dynamic
-includes security features, AWS Shield for DDoS, AWS WAF (web app firewall)
-edge locations are global
API Gateway: managed API mgmt service, monitoring/metrics on API calls, supports
VPC if needed
quiz:
scenario 1: Direct connect, persistent connection to AWS
scenario 2: CloudFront, its a CDN that uses EDGE locations
scenario 3: horizontal, Elastic load balancing
FILE STORAGE SERVICES: S3, S3 Glacier, Elastic Block Store (EBS), Elastic File
System (EFS), AWS Snowball, Snowmobile
non-archival classes:
s3 standard: default, used for frequently accessed data
s3 intelligent-tiering: can move data between tiers based on usage
s3 standard-ia: infrequent access at a discount
s3 one zone-ia: stored in a single AZ, much lower cost, less resilience
s3 transfer acceleration: upload data much faster, uses EDGE locations (CloudFront)
created bucket, uploaded files, set access to objects, enabled static hosting
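roughly the same demo from the CLI (bucket name is made up):
aws s3 mb s3://my-demo-bucket-12345
aws s3 cp index.html s3://my-demo-bucket-12345/ --storage-class STANDARD_IA
aws s3 website s3://my-demo-bucket-12345/ --index-document index.html --error-document error.html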
glacier: 90 days minimum, retrieved in minutes or hours, retrieval fee per GB, 5
times less expensive than s3 std
glacier deep archive: 180 day minimum, hour retrieval, fee per GB, 23x less
expensive
data can be uploaded/retrieved programmatically (this is done via CLI/SDK ONLY!)
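e.g., kicking off a Glacier restore with the CLI (bucket/key are placeholders):
aws s3api restore-object --bucket my-archive-bucket --key backups/2020.tar.gz --restore-request '{"Days":7,"GlacierJobParameters":{"Tier":"Standard"}}'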
EBS: elastic block storage, persistent block storage for EC2, attaches to a single EC2 instance
-redundancy within an AZ, data snapshots, can enable encryption (volume types: SSD, IOPS SSD,
throughput optimized, cold)
--SSD: cost effective, general
--IOPS SSD: high performance/low latency
--Throughput optimized (HDD): frequently accessed, throughput-intensive data
--Cold (HDD): less frequently accessed data
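creating/attaching a volume of a given type from the CLI (gp2/gp3 = general SSD, io1/io2 = provisioned IOPS SSD, st1 = throughput optimized, sc1 = cold; IDs are placeholders):
aws ec2 create-volume --availability-zone us-west-2a --size 100 --volume-type gp3
aws ec2 attach-volume --volume-id vol-0abc1234 --instance-id i-0abc1234 --device /dev/sdf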
EFS: elastic file system, linux-based workloads, fully managed NFS, petabyte scale,
multiple AZs
-standard vs infrequent access
EFS = network filesystem for multiple EC2 instances vs EBS which is attaching
drives to single EC2
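mounting EFS on a Linux instance is roughly (filesystem ID/region are placeholders, assumes the NFS client is installed):
sudo mount -t nfs4 -o nfsvers=4.1 fs-12345678.efs.us-west-2.amazonaws.com:/ /mnt/efs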
Amazon FSx = Windows File server, SMB support, AD integration, NTFS, SSD
quiz:
scenario 1: s3 lifecycle policies, move to cold storage
scenario 2: snowball can migrate petabytes of data, smaller device shipped
scenario 3: EFS is a shared file system for linux EC2 hosts
WRONG 1: partially correct, can use lifecycle policies, but rather than cold storage, move to
S3 infrequent access (discount)
scenario 3 note: data must be petabyte scale or less for EFS to work
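a lifecycle rule like scenario 1's answer, as a sketch (bucket name and 30-day cutoff are just examples) - lifecycle.json:
{
  "Rules": [
    {"ID": "to-ia", "Status": "Enabled", "Filter": {"Prefix": ""},
     "Transitions": [{"Days": 30, "StorageClass": "STANDARD_IA"}]}
  ]
}
aws s3api put-bucket-lifecycle-configuration --bucket my-demo-bucket-12345 --lifecycle-configuration file://lifecycle.json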
quiz:
scenario 1: redshift, since they possibly need encryption/data storage
scenario 2: EC2 and deploy the service
scenario 3: dynamodb with elasticache
APP INTEGRATION SERVICES: SNS (managed pubsub messaging), SQS (managed queue
service), Step Functions
-SNS: pub/sub, decoupled applications, organizes by topic, multiple AWS services,
user notifications
-connect user signup to an SNS topic; subscribers must be listening (subscribed) to receive messages
-SQS: decoupled/fault tolerant apps, 256KB payload/14 days
standard queue: does not guarantee order
FIFO: processed in order
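rough CLI sketch for both (topic/queue names and account ID are made up):
aws sns create-topic --name user-signups
aws sns publish --topic-arn arn:aws:sns:us-west-2:123456789012:user-signups --message "new signup"
aws sqs create-queue --queue-name signups.fifo --attributes FifoQueue=true,ContentBasedDeduplication=true
aws sqs send-message --queue-url https://sqs.us-west-2.amazonaws.com/123456789012/signups.fifo --message-body "new signup" --message-group-id signups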
workflow: user signup, insert into crm + send email, schedule call with
salesperson, wait 1 week, send followup
-integrates with compute, database, messaging (sqs/sns), data processing, machine
learning
quiz:
scenario 1: SQS, user signups go into a queue rather than being dropped if service
is down (fault tolerance)
scenario 2: Step Functions (workflow service)
scenario 3: SNS topics, listening for event types
QUIZ:
scenario 1: AWS Config, monitors desired state config (CAN SET RULES)
scenario 2: CloudFormation is used to automate creation of a lot of services
scenario 3: CloudTrail - who initiates actions, deletions
------------------------------------------------------
INTRO TO SECURITY AND ARCHITECTURE ON AWS: shared responsibility, well architected,
fault tolerance/HA
Acceptable Use Policy (AUP): mass emails and viruses/malware are NOT allowed; pentests ARE allowed (for permitted services)
Least Privilege Access: grant minimum permission to complete tasks
-do NOT use Root account on day to day, use IAM account
Shared Responsibility Model: security/compliance is SHARED between AWS + Customer
-AWS: security of the cloud, customer: security IN the cloud
AWS Well-architected Framework: best practices across five pillars that drive
business value
-operational excellence: running/monitoring for business value
-security: protecting information and business assets
-reliability: recovering from disruptions
-performance efficiency: using resources to achieve business value
-cost optimization: minimal costs for desired value (s3 storage classes, instances,
etc)
QUIZ:
scenario 1: compliance REPORTS are found in AWS Artifact
scenario 2: AWS is not responsible, we are for CODE, data, encryption, etc
WRONG: review the SHARED RESPONSIBILITY MODEL to delineate what AWS is responsible
for
scenario 3: well-architected framework for best practices for developing in AWS
AWS IDENTITIES AND USER MGMT: least privilege access, IAM, IAM types, enabling MFA,
Cognito
-IAM: service that controls access to AWS services, free, authentication (login),
authorization (access)
-federation: external identity provider
DEMO: create a user, attach a policy (AWS managed), select a policy
-rather than attaching policies to each user manually, can add users to groups
adding MFA: done by accessing security permissions for Root user, but...
-for IAM users, click the user then security credentials
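the group approach from the demo as a CLI sketch (group/user names are examples, policy is an AWS managed one):
aws iam create-group --group-name Developers
aws iam attach-group-policy --group-name Developers --policy-arn arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess
aws iam add-user-to-group --group-name Developers --user-name alice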
QUIZ:
scenario 1: create a Group, attach a Role or select a POLICY, then add users to the
Group
--make sure that all members require the same access (least privilege!)
scenario 2: no, he needs to follow least privilege, only the service requires
access to S3
-takeaway here is that ROLES can be assigned to users or services, user may not
require access, just the service!
scenario 3: multi-factor authentication requires more than just a password
data processing:
AWS Glue (Extract Transform Load): data is stored somewhere, extracted, transformed
(e.g. normalizing phone numbers, grouping), placed in a new location for analysis (LOAD)
-fully managed ETL, supports RDS/DynamoDB/Redshift/S3, supports serverless model
(no servers needed, just use it)
Amazon EMR (elastic map reduce): big-data processing using popular tools for S3 and
EC2
-clustered environment, no configuration: Apache Spark/Hive/HBase/Flink/Hudi,
Presto... can use these tools without configuring!
AWS Data Pipeline: workflow/orchestration service for AWS services - managing
processing from point A to point B, ensuring stops at specific points
-managed ETL, supports S3/EMR (elastic map reduce for big data), Redshift,
DynamoDB, RDS
analyzing data: services in place to analyze data, querying data in S3, BI tools
with dashboards, search service for custom apps
-Amazon Athena: serverless, query large scale data in S3, can write queries using
standard SQL (no database required), charged based on data scanned for query
-Amazon Quicksight: fully managed business intelligence, dynamic data dashboards
based on data in AWS, per user or session pricing model
--standard vs enterprise, different capabilities/costpoints
-Amazon Cloudsearch: fully managed search, custom app but make data available to
users, scaling of infrastructure, charged per hour and instance type
--integrate search into custom apps (search through a ton of PDF docs for example)
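e.g., what an Athena query over S3 data looks like from the CLI (database/table/output bucket are placeholders):
aws athena start-query-execution --query-string "SELECT status, count(*) FROM web_logs GROUP BY status" --query-execution-context Database=logs_db --result-configuration OutputLocation=s3://my-athena-results/
aws athena get-query-results --query-execution-id <id returned by the previous call>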
SUMMARY: integrating data from own datacenter, processing data, data analysis, AI +
ML
DISASTER RECOVERY ON AWS: prepare/recover for events that could impact the business
(power, network, physical dmg, flooding, fire, etc)
-what if there was a complete REGION outage?! four approaches recommended by AWS
-Backup/Restore > Pilot Light > Warm Standby > Multi-Site (from cheapest to most
expensive AND from slowest recovery time to fastest)
backup and restore: back up everything in S3, either standard or archival, EBS data
can be stored as snapshots in S3
-in DR, process is started to launch new environment, longest recovery time,
cheapest
pilot light: key infrastructure running in cloud, reduce recovery time, increases
cost to continually run in cloud
-AMIs are prepared for systems, core pieces are running/kept up to date
warm standby: scaled down version of full environment, infrastructure continually
running
multi-site: full environment running in the cloud at all times (multiple regions,
full instance types, near seamless, most expensive)
RTO (recovery time objective): TIME. time it takes to get systems running to ideal
business state
RPO (recovery point objective): amount of DATA loss in terms of time (if the RPO is
1hr, how much DATA is lost)
takeaway: how much TIME it takes versus how much DATA (expressed in time)
QUIZ:
scenario 1: multi-site is a seamless transition
scenario 2: backup/restore or pilot light, most likely backup since that minimizes
cost
scenario 3: few key servers up, warm standby = smaller instance types (almost like
a DEV)
-WRONG: keyword here is that a FEW KEY servers running in the cloud, even if it's
scaled down
auto-scaling group: EC2 instances with rules/scaling, uses LAUNCH template (OS,
instance type, security group)
-define min, max, and DESIRED number of instances (at least, at most, desired
state)
-health checks are performed (if web server, check web url, etc)
-scaling group: exists in 1 or more availability zones in a single region, works with
or without SPOT instances
DEMO: 1 region, 1 VPC, 2 AZs, scaling group in both AZs with desired of 2 (1 in
each)
-application load balancer: distributes traffic to best instance
-1 instance goes down, ALB is informed to stop traffic routing, then autoscaling
group brings up NEW instance
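the demo setup maps roughly to (names/subnets/ARN are placeholders):
aws autoscaling create-auto-scaling-group --auto-scaling-group-name web-asg --launch-template LaunchTemplateName=web-template --min-size 2 --max-size 4 --desired-capacity 2 --vpc-zone-identifier "subnet-aaa1111,subnet-bbb2222" --target-group-arns arn:aws:elasticloadbalancing:us-west-2:123456789012:targetgroup/web/abc123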
-SECRETS manager: credentials, API keys, tokens - natively with RDS, DocumentDB,
Redshift
--auto-rotates credentials, granular access to secrets (which servers have access)
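fetching a secret at runtime is just (secret name is a placeholder):
aws secretsmanager get-secret-value --secret-id prod/db-credentials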
AWS VPN: encrypted tunnel into a VPC not exposed to the public internet, from a data
center or client machines
-site to site vpn: customer gateway to vpn gateway in aws, encrypted traffic
between sites
-direct connect: does not go over public internet
pre-defined solutions: AWS Service Catalog (IT Services) and AWS Marketplace (third
party)
service catalog: service catalog for the cloud, stuff that is already configured
-could be single server or multi-tier apps. can be leveraged to meet COMPLIANCE
-supports lifecycle (version1 > 1.1 > etc)
CodeCommit: source control service using Git, uses IAM policies, alt to
Github/Bitbucket
CodeBuild: no need to manage infrastructure, charged per minute
CodeDeploy: managed deployment service for EC2/Fargate/Lambda and on-premises servers,
provides dashboard in Console
CodePipeline: fully managed continuous delivery for building/testing/deploying,
integrates with Github as well
CodeStar: workflow, automates continuous delivery toolchain, custom dashboards,
only charged for OTHER svcs (free)
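e.g., creating and cloning a CodeCommit repo (repo name is made up; the clone needs HTTPS Git credentials or a credential helper):
aws codecommit create-repository --repository-name my-app
git clone https://git-codecommit.us-west-2.amazonaws.com/v1/repos/my-app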
quiz:
scenario 1: in this case it's AWS Service Catalog because compliance is the keyword
scenario 2: autoscaling with application load balancing
scenario 3: Macie, because it discovers/protects personal info (PII)