Notes On Chapter 6


Chapter 6

AWS Application Services

6.1 Lambda

AWS Lambda is a serverless compute service offered by Amazon Web Services (AWS) that lets
you run code without provisioning or managing servers. You pay only for the compute time your
code consumes (there is no charge when your code isn't running), and you don't have to worry
about the underlying infrastructure, scaling, or server maintenance. Lambda runs your code in
response to specific events, such as HTTP requests, file uploads, changes in databases, or custom
application triggers. You provide the code, and Lambda handles the execution environment,
scaling automatically with the number of incoming requests, which makes it ideal for variable
workloads. You can run code for virtually any type of application or backend service with zero
administration: just upload your code, and Lambda takes care of everything required to run and
scale it with high availability.

Core Concepts

 Event-driven architecture: Lambda functions are invoked by events, such as changes to
data in S3, new records in DynamoDB, or HTTP requests from API Gateway. The
service integrates with many AWS services to trigger functions when specific conditions
are met.
 Function: A Lambda function is the piece of code that you want to run. It can be written
in various programming languages, such as Python, Node.js, Java, Go, Ruby, C#, and
others.
 Execution role: The Lambda function needs permissions to access other AWS resources.
You assign an AWS Identity and Access Management (IAM) role to the Lambda
function to grant it the necessary permissions.
 Event source: This is the service or application that triggers the Lambda function. Event
sources include services like S3, SNS, CloudWatch, DynamoDB, etc.
 Resources: Lambda allows you to define resources such as memory and timeout. You
can configure the memory size (from 128 MB to 10 GB) and execution timeout
(maximum 15 minutes per invocation).

Components of Lambda

To understand how Lambda operates, let’s explore its key components in detail:
1. Function

At the core of AWS Lambda is the function, which contains the code to be executed. A function
in Lambda is defined as a piece of logic written in one of the supported programming languages.

 Key Attributes:
o Handler: Specifies the entry point of your code (the function to be executed when
the event triggers the Lambda).
o Runtime: Determines the language runtime (e.g., Python, Node.js, Java) for
executing the function.
o Deployment Package: The code and dependencies are packaged and uploaded to
Lambda. This can include a zip file or a container image.
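
To make the handler idea concrete, here is a minimal sketch of a Python function that handles an
S3 upload event; the handler name follows the common lambda_function.lambda_handler
convention, and the event shape shown is the standard S3 notification record:

import json

# Entry point: for a handler setting of "lambda_function.lambda_handler",
# Lambda calls this function with the triggering event and a context object.
def lambda_handler(event, context):
    # For an S3 trigger, the event carries a list of records describing the upload.
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        print(f"New object uploaded: s3://{bucket}/{key}")

    # Returning a dict works for synchronous callers such as API Gateway.
    return {"statusCode": 200, "body": json.dumps({"message": "processed"})}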

2. Event Source

An event source is an AWS service, resource, or third-party application that triggers the Lambda
function. Events can be synchronous or asynchronous.

 Examples of Event Sources:
o Synchronous: API Gateway, Elastic Load Balancing (ALB), and Amazon
CloudFront.
o Asynchronous: Amazon S3, DynamoDB Streams, Amazon SNS, and Amazon
EventBridge.
o Custom Sources: Events from external systems or applications can trigger the
function using AWS SDKs.

3. Trigger

A trigger connects an event source to a Lambda function. It defines the conditions under which
the function is invoked.

 Configuration: Triggers can be configured directly in the Lambda Management Console
or using AWS CLI/SDKs.

4. Execution Role (IAM Role)

Lambda requires permissions to interact with AWS resources. The IAM execution role assigned
to a Lambda function defines what resources and services the function can access or modify.

 Example Permissions:
o Access to an S3 bucket to retrieve files.
o Permissions to write logs to Amazon CloudWatch.
o Access to read or write to a DynamoDB table.

5. Environment Variables

Environment variables store configuration settings for the Lambda function. These can include
database connection strings, API keys, or other runtime configurations.

 Features:
o Variables can be encrypted using AWS Key Management Service (KMS).
o Easy to update configurations without changing the function code.
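
As a brief sketch, the handler below reads its configuration from environment variables instead
of hard-coding it; the variable names TABLE_NAME and API_KEY are hypothetical:

import os

import boto3

# Hypothetical configuration values set in the function's environment variables.
TABLE_NAME = os.environ["TABLE_NAME"]          # e.g., a DynamoDB table name
API_KEY = os.environ.get("API_KEY", "")        # optional secret (ideally KMS-encrypted)

# Clients created outside the handler are reused across warm invocations.
dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table(TABLE_NAME)

def lambda_handler(event, context):
    # Use the configured table without hard-coding its name in the code.
    table.put_item(Item={"id": event["id"], "payload": event.get("payload", "")})
    return {"statusCode": 200}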

6. Execution Context

The execution context is the environment in which your function runs. AWS Lambda manages
the lifecycle of the execution environment.

 Components:
o Memory: Memory allocation directly affects CPU and network resources.
o Temporary Storage (/tmp): Provides 512 MB of ephemeral storage per
function instance by default (configurable up to 10 GB).
o Environment Reuse: Lambda reuses execution environments for performance
optimization, enabling caching between invocations.

7. Concurrency and Scaling

Concurrency defines how many function instances can run in parallel. Lambda scales
automatically based on the number of incoming requests.

 Types of Concurrency:
o Unreserved Concurrency: Default behavior where Lambda scales without limits
(subject to account quotas).
o Reserved Concurrency: Ensures a specific number of instances are always
available.
o Provisioned Concurrency: Keeps pre-warmed instances ready to handle bursts
with low latency.
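
As a sketch of how these settings are applied in practice, the boto3 calls below reserve
concurrency for a function and provision pre-warmed capacity for a published version; the
function name and qualifier are placeholders:

import boto3

lambda_client = boto3.client("lambda")

# Reserved concurrency: cap (and guarantee) this function at 50 concurrent instances.
lambda_client.put_function_concurrency(
    FunctionName="my-function",              # placeholder name
    ReservedConcurrentExecutions=50,
)

# Provisioned concurrency: keep 10 pre-warmed environments for a published version.
lambda_client.put_provisioned_concurrency_config(
    FunctionName="my-function",
    Qualifier="1",                           # version or alias; "1" is a placeholder
    ProvisionedConcurrentExecutions=10,
)
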
8. Error Handling

Lambda provides mechanisms to handle failures during execution.

 Retry Behavior:
o Synchronous invocations return errors to the caller, which decides whether to retry.
o Asynchronous invocations are retried up to two times; remaining failures are
handled according to the Dead Letter Queue (DLQ) or event retry configuration.
 DLQ: Failed events can be sent to an Amazon SQS queue or SNS topic for further
analysis.
 Amazon CloudWatch Logs: Captures execution details, including errors, to help with
debugging.
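
A minimal sketch of wiring up this error handling with boto3, assuming an existing SQS queue
to act as the DLQ; the function name and queue ARN are placeholders:

import boto3

lambda_client = boto3.client("lambda")

# Route events that still fail after async retries to an SQS dead-letter queue.
lambda_client.update_function_configuration(
    FunctionName="my-function",                                   # placeholder
    DeadLetterConfig={
        "TargetArn": "arn:aws:sqs:us-east-1:123456789012:my-dlq"  # placeholder ARN
    },
)

# Optionally tune the number of async retries (0-2).
lambda_client.put_function_event_invoke_config(
    FunctionName="my-function",
    MaximumRetryAttempts=1,
)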

9. Monitoring and Logging

AWS Lambda integrates with Amazon CloudWatch to provide metrics, logs, and traces.

 CloudWatch Metrics:
o Tracks invocation count, duration, errors, and throttles.
 CloudWatch Logs:
o Captures runtime logs for debugging and performance tuning.

Lambda is an ideal compute service for application scenarios that need to scale up rapidly, and
scale down to zero when not in demand. For example, you can use Lambda for:

 File processing: Use Amazon Simple Storage Service (Amazon S3) to trigger Lambda data
processing in real time after an upload.
 Stream processing: Use Lambda and Amazon Kinesis to process real-time streaming data for
application activity tracking, transaction order processing, clickstream analysis, data cleansing,
log filtering, indexing, social media analysis, Internet of Things (IoT) device data telemetry, and
metering.
 Web applications: Combine Lambda with other AWS services to build powerful web
applications that automatically scale up and down and run in a highly available configuration
across multiple data centers.
 IoT backends: Build serverless backends using Lambda to handle web, mobile, IoT, and third-
party API requests.
 Mobile backends: Build backends using Lambda and Amazon API Gateway to authenticate and
process API requests. Use AWS Amplify to easily integrate with your iOS, Android, Web, and
React Native frontends.

How AWS Lambda Works


1. Upload the Code: You can upload your code directly (as a ZIP file), use a version control
repository like GitHub, or point to a container image.
2. Set the Trigger: Configure event sources that will trigger the function, such as API Gateway for
HTTP requests or S3 for file uploads.
3. Execute Code: When an event occurs, Lambda invokes the function, passing the event data (like
request parameters or file content) to it.
4. Scaling: Lambda automatically handles scaling. It runs the function code in response to each
trigger, and if the request volume increases, Lambda scales out, invoking multiple instances of
the function.
5. Statelessness: Each invocation of a Lambda function is independent, meaning that there is no
state retention between calls unless you explicitly store data (e.g., in a database or S3 bucket).
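
To make the invocation step concrete, here is a minimal sketch of calling a deployed function
with boto3, as a test harness or another service might; the function name and payload are
placeholders:

import json

import boto3

lambda_client = boto3.client("lambda")

# Synchronous invocation: Lambda runs the function and returns its result.
response = lambda_client.invoke(
    FunctionName="my-function",                    # placeholder
    InvocationType="RequestResponse",              # "Event" would invoke asynchronously
    Payload=json.dumps({"id": "42", "payload": "hello"}).encode("utf-8"),
)

result = json.loads(response["Payload"].read())
print(result)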

Key Features

 No Infrastructure Management: You don’t need to worry about server maintenance,


patching, or scaling. AWS takes care of everything behind the scenes.
 Auto-scaling: Lambda automatically adjusts the number of function instances based on
the request volume. For example, if 1,000 requests happen at once, Lambda can execute
1,000 instances of your function in parallel.
 Pay Only for Usage: You are billed only for the time your function runs. Charges are
based on the number of requests and the duration of code execution, measured in
milliseconds.
 Multiple Language Support: Lambda supports several programming languages, and
you can even bring your own runtime to Lambda.
 Versioning & Aliases: You can maintain different versions of your Lambda function and
use aliases to point to specific versions. This helps in managing the deployment and
promoting different stages (development, staging, production).
 Monitoring and Logging: AWS CloudWatch integrates with Lambda to provide real-
time logging and metrics, such as function invocations, execution duration, and error
counts.
 Error Handling and Retries: Lambda automatically retries failed asynchronous
invocations, such as those triggered by SNS messages. You can configure
dead-letter queues (DLQs) to capture failed events for later inspection.

6.2 Lightsail

Amazon Lightsail is a simplified cloud platform by Amazon Web Services (AWS) designed for
developers, startups, and small businesses. It provides an easy way to launch and manage virtual
private servers (VPS), making it ideal for users who want straightforward, cost-effective hosting
solutions without dealing with the complexities of the broader AWS ecosystem.

Key Features:

1. Pre-configured Instances:
o Lightsail offers pre-configured blueprints for popular applications (e.g., WordPress,
Joomla) and development stacks (e.g., LAMP, Node.js).
o You can also launch plain Linux or Windows servers.
2. Simplified Pricing:
o Lightsail uses a flat-rate pricing model. You pay a fixed monthly fee based on the
instance size and resources (CPU, RAM, storage).
3. Built-in Tools:
o Includes features like DNS management, static IPs, monitoring, and automatic backups.
4. Scalability:
o While Lightsail is designed for simpler workloads, it integrates with other AWS services
for scaling as your application grows.
5. Networking:
o Provides built-in networking capabilities, including load balancers, private networking,
and firewall management.
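
Beyond the console, instances can also be launched programmatically; this is a minimal boto3
sketch, and the instance name, zone, blueprint, and bundle IDs are illustrative values:

import boto3

lightsail = boto3.client("lightsail")

# Launch a single WordPress instance on a small fixed-price bundle (illustrative IDs).
lightsail.create_instances(
    instanceNames=["my-blog"],
    availabilityZone="us-east-1a",
    blueprintId="wordpress",        # pre-configured application blueprint
    bundleId="nano_2_0",            # instance size / fixed monthly price tier
)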

Use Cases:

 Hosting websites and blogs.
 Running small-scale web applications.
 Setting up development or test environments.
 Managing lightweight e-commerce platforms.

Advantages:

 Ease of Use: A user-friendly dashboard simplifies setup and management.
 Cost-Effective: Ideal for predictable workloads with fixed monthly pricing.
 Quick Start: Deploy pre-configured solutions with minimal effort.

Limitations:

 Limited Features: Not as flexible or powerful as the full AWS suite.
 Scaling Constraints: Best for small-to-medium workloads; larger applications may require
transitioning to other AWS services.

6.3 Route 53

Amazon Route 53 is a scalable and highly available Domain Name System (DNS) web service
provided by AWS. It connects user requests to the appropriate resources in AWS or on-premises
environments. Named after port 53 (the DNS port), Route 53 offers domain registration, DNS
routing, and health-checking capabilities.
Key Features:

1. Domain Registration:
o Allows you to register new domain names directly or transfer existing ones.
2. DNS Management:
o Maps domain names to AWS resources like EC2 instances, S3 buckets, or CloudFront
distributions.
o Supports multiple routing policies, such as simple, weighted, latency-based, failover,
geolocation, and more.
3. Health Checks:
o Monitors the health of endpoints (e.g., servers) and routes traffic only to healthy ones.
4. Highly Available:
o Route 53 is designed for high availability and reliability, backed by AWS's global
infrastructure.
5. Scalability:
o Handles DNS queries for large-scale applications, scaling automatically with traffic.
6. Integration with AWS:
o Works seamlessly with AWS services like Elastic Load Balancer, CloudFront, and S3.
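
As a sketch of DNS management in code, the boto3 call below upserts a weighted A record; the
hosted zone ID, domain name, and IP address are placeholders:

import boto3

route53 = boto3.client("route53")

# Upsert a weighted A record: with a second record of weight 30 under another
# set identifier, this endpoint would receive roughly 70% of matching queries.
route53.change_resource_record_sets(
    HostedZoneId="Z0000000EXAMPLE",            # placeholder zone ID
    ChangeBatch={
        "Changes": [{
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": "www.example.com",
                "Type": "A",
                "SetIdentifier": "primary",     # distinguishes weighted records
                "Weight": 70,
                "TTL": 300,
                "ResourceRecords": [{"Value": "203.0.113.10"}],
            },
        }]
    },
)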

Use Cases:

 Managing DNS for websites and web applications.
 Routing traffic for global applications based on latency or geography.
 Ensuring high availability with failover routing.

Advantages:

 Reliability: Built for robust and fail-safe DNS operations.
 Flexibility: Supports advanced routing and load-balancing strategies.
 Security: Integrates with AWS Identity and Access Management (IAM) for access control.

6.4 SNS and SQS

a) Amazon SNS (Simple Notification Service):

Amazon SNS is a fully managed pub/sub messaging service that enables applications, services,
and devices to send and receive notifications. It decouples the producers (publishers) and
consumers (subscribers) in a system.

Key Features:

 Publish/Subscribe Model: Messages published to a topic are delivered to all subscribers (e.g.,
email, SMS, HTTP endpoints, or AWS services like SQS).
 Flexible Protocols: Supports email, SMS, mobile push notifications, and HTTP/S endpoints.
 Fan-out Messaging: A single message can be sent to multiple destinations simultaneously.
 Scalable and Reliable: Can handle millions of messages per second.

Use Cases:

 Sending alerts and notifications (e.g., system health checks, application events).
 Broadcasting messages to multiple endpoints (e.g., mobile devices).
 Triggering workflows by notifying downstream systems.
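
A minimal fan-out sketch with boto3: create a topic, add a subscriber, and publish one message
to all subscribers; the topic name and email address are placeholders:

import boto3

sns = boto3.client("sns")

# Create (or look up) a topic; the name is a placeholder.
topic_arn = sns.create_topic(Name="order-events")["TopicArn"]

# Subscribe an endpoint; every subscriber receives each published message.
sns.subscribe(
    TopicArn=topic_arn,
    Protocol="email",                       # could also be sqs, lambda, https, sms...
    Endpoint="ops@example.com",             # placeholder address (must be confirmed)
)

# Publish once; SNS fans the message out to all confirmed subscribers.
sns.publish(
    TopicArn=topic_arn,
    Subject="Order placed",
    Message='{"orderId": "1234", "status": "PLACED"}',
)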

b) Amazon SQS (Simple Queue Service):

Amazon SQS is a fully managed message queuing service that allows you to decouple and
scale distributed systems, applications, and microservices. It ensures that messages between
components are reliably delivered and processed.

Key Features:

 Message Queuing: Stores messages until they are processed by consumers.
 Two Queue Types:
1. Standard Queue: Offers high throughput with at-least-once delivery (duplicates
possible).
2. FIFO Queue: Ensures exactly-once delivery and message order.
 Decoupling: Isolates producers and consumers for better scalability and reliability.
 Dead-Letter Queues: Handles undelivered messages for troubleshooting.

Use Cases:

 Building decoupled microservices.
 Processing tasks asynchronously (e.g., image processing, data transformations).
 Managing workflows and buffering requests during high-traffic spikes.

When to Use:
 Use SNS for real-time notifications or fan-out scenarios.
 Use SQS for asynchronous task processing or queuing.
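
To illustrate the queue-based decoupling described above, here is a minimal producer/consumer
sketch with boto3; the queue name and message body are placeholders:

import boto3

sqs = boto3.client("sqs")

# Create (or look up) a standard queue; the name is a placeholder.
queue_url = sqs.create_queue(QueueName="image-jobs")["QueueUrl"]

# Producer: enqueue a task for asynchronous processing.
sqs.send_message(QueueUrl=queue_url, MessageBody='{"image": "photo-001.jpg"}')

# Consumer: long-poll for work, process it, then delete the message so it
# is not redelivered after the visibility timeout expires.
messages = sqs.receive_message(
    QueueUrl=queue_url, MaxNumberOfMessages=1, WaitTimeSeconds=10
).get("Messages", [])

for msg in messages:
    print("processing:", msg["Body"])
    sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])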

6.5 CloudWatch

Amazon CloudWatch is a monitoring and observability service provided by AWS. It collects
and tracks metrics, logs, and events from AWS resources, applications, and on-premises systems
in real time, giving you insight into the performance, operational health, and resource utilization
of your infrastructure. Metrics are variables you can measure for your resources and
applications. The CloudWatch home page automatically displays metrics about every AWS
service you use, and you can create custom dashboards to display metrics about your own
applications or custom collections of metrics that you choose. You can create alarms that watch
metrics and send notifications, or automatically make changes to the resources you are
monitoring, when a threshold is breached. For example, you can monitor the CPU usage and disk
reads and writes of your Amazon EC2 instances and use that data to determine whether you
should launch additional instances to handle increased load, or stop under-used instances to save
money. With CloudWatch, you gain system-wide visibility into resource utilization, application
performance, and operational health.

Key Features:

1. Monitoring Metrics:
o Collects and tracks key performance indicators (e.g., CPU utilization, memory usage, and
request counts) for AWS services like EC2, RDS, and Lambda.
2. Logs Management:
o Aggregates, monitors, and analyzes logs from AWS services and applications using
CloudWatch Logs.
o Helps troubleshoot issues with detailed log data.
3. Alarms:
o Sets thresholds for metrics to trigger alarms.
o Alarms can notify users (via SNS) or trigger automated actions (e.g., scaling EC2
instances).
4. Dashboards:
o Creates custom visualizations for metrics across multiple resources.
5. Events:
o Automates responses to operational changes using CloudWatch Events (now part of
Amazon EventBridge), e.g., starting a Lambda function when an S3 object is created.
6. Application Insights:
o Provides AI-powered recommendations for application health and performance
optimization.
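
As a sketch of the metrics and alarms features together, the boto3 calls below publish a custom
metric and create an alarm on it; the namespace, metric name, and SNS topic ARN are
placeholders:

import boto3

cloudwatch = boto3.client("cloudwatch")

# Publish a custom application metric (placeholder namespace and name).
cloudwatch.put_metric_data(
    Namespace="MyApp",
    MetricData=[{"MetricName": "QueueDepth", "Value": 42, "Unit": "Count"}],
)

# Alarm when the metric averages above 100 for two consecutive 5-minute periods;
# the alarm notifies a placeholder SNS topic when it fires.
cloudwatch.put_metric_alarm(
    AlarmName="queue-depth-high",
    Namespace="MyApp",
    MetricName="QueueDepth",
    Statistic="Average",
    Period=300,
    EvaluationPeriods=2,
    Threshold=100,
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:alerts"],  # placeholder ARN
)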

Use Cases:

 Monitoring resource usage and application performance.
 Detecting and responding to anomalies or issues in real time.
 Setting up automated scaling and resource optimization.
 Logging and troubleshooting application errors.

Advantages:

 Centralized Monitoring: Monitors multiple AWS services and custom applications in one place.
 Scalable: Designed to handle the scale of modern cloud applications.
 Automation: Automates responses to operational states (e.g., auto-scaling).
6.6 AWS Elastic Beanstalk

AWS Elastic Beanstalk is a fully managed service that simplifies the deployment and scaling of
web applications and services. It allows developers to upload their code, and Elastic Beanstalk
automatically handles the provisioning, configuration, and management of the underlying infrastructure.
With Elastic Beanstalk you can quickly deploy and manage applications in the AWS Cloud without
having to learn about the infrastructure that runs those applications. Amazon Web Services (AWS)
comprises over one hundred services, each of which exposes an area of functionality. While the variety of
services offers flexibility for how you want to manage your AWS infrastructure, it can be challenging to
figure out which services to use and how to provision them. Elastic Beanstalk reduces management
complexity without restricting choice or control. You simply upload your application, and Elastic
Beanstalk automatically handles the details of capacity provisioning, load balancing, scaling, and
application health monitoring.

Elastic Beanstalk supports applications developed in Go, Java, .NET, Node.js, PHP, Python, and Ruby.
Elastic Beanstalk also supports Docker platforms. With Docker containers you can choose your own
programming language and application dependencies that may not be supported by the other Elastic
Beanstalk platforms. When you deploy your application, Elastic Beanstalk builds the selected supported
platform version and provisions one or more AWS resources, such as Amazon EC2 instances, in your
AWS account to run your application.

You can interact with Elastic Beanstalk by using the Elastic Beanstalk console, the AWS Command Line
Interface (AWS CLI), or eb, a high-level CLI designed specifically for Elastic Beanstalk. You can also
perform most deployment tasks, such as changing the size of your fleet of Amazon EC2 instances or
monitoring your application, directly from the Elastic Beanstalk web interface (console). To use Elastic
Beanstalk, you create an application, upload an application version in the form of an application source
bundle (for example, a Java .war file) to Elastic Beanstalk, and then provide some information about the
application. Elastic Beanstalk automatically launches an environment and creates and configures the
AWS resources needed to run your code. After your environment is launched, you can then manage your
environment and deploy new application versions.
After you create and deploy your application, information about the application—including metrics,
events, and environment status—is available through the Elastic Beanstalk console, APIs, or Command
Line Interfaces, including the unified AWS CLI.
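
As an illustration, the same create-upload-launch workflow can be driven through boto3 (which
the console and eb CLI wrap); this is a minimal sketch, and the application name, S3 bundle
location, and solution stack string are placeholder assumptions:

import boto3

eb = boto3.client("elasticbeanstalk")

# Register the application and a version pointing at an uploaded source bundle.
eb.create_application(ApplicationName="my-app")
eb.create_application_version(
    ApplicationName="my-app",
    VersionLabel="v1",
    SourceBundle={"S3Bucket": "my-bucket", "S3Key": "my-app-v1.zip"},  # placeholders
)

# Launch an environment; Elastic Beanstalk provisions EC2, load balancing, etc.
eb.create_environment(
    ApplicationName="my-app",
    EnvironmentName="my-app-prod",
    SolutionStackName="64bit Amazon Linux 2023 v4.0.0 running Python 3.11",  # placeholder
    VersionLabel="v1",
)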

Key Features:

1. Easy Deployment:
o Supports multiple programming languages, including Java, Python, Node.js, PHP, Ruby,
Go, and .NET.
o Automatically provisions resources like EC2 instances, load balancers, and databases.
2. Managed Infrastructure:
o Handles tasks such as capacity provisioning, load balancing, scaling, and health
monitoring.
3. Customization:
o Offers full control over the underlying resources if needed, allowing you to tweak
configurations.
4. Monitoring and Logs:
o Provides integrated monitoring via AWS CloudWatch and access to application logs.
5. Scaling:
o Supports auto-scaling based on demand, ensuring applications perform well under
varying traffic loads.

Use Cases:

 Hosting scalable web applications and APIs.
 Quickly deploying applications in various languages without managing infrastructure.
 Running development, testing, and production environments.

Advantages:

 Developer-Friendly: Focus on writing code while AWS manages the heavy lifting.
 Flexibility: Combine automation with full control over resources when needed.
 Scalable: Automatically scales to handle fluctuating traffic.

6.7 SageMaker
Amazon SageMaker is a fully managed machine learning (ML) service. With SageMaker, data scientists and
developers can quickly and confidently build, train, and deploy ML models into a production-ready hosted
environment. It provides a UI experience for running ML workflows that makes SageMaker ML tools available
across multiple integrated development environments (IDEs).
With SageMaker, you can store and share your data without having to build and manage your own servers. This
gives you or your organization more time to collaboratively build and develop your ML workflow, and to do it
sooner. SageMaker provides managed ML algorithms that run efficiently against extremely large data in a distributed
environment. With built-in support for bring-your-own-algorithms and frameworks, SageMaker offers flexible
distributed training options that adjust to your specific workflows. Within a few steps, you can deploy a model into a
secure and scalable environment from the SageMaker console.

SageMaker Workflow for Creating ML Models

In machine learning, you teach a computer to make predictions or inferences. First, you use an algorithm and
example data to train a model. Then, you integrate your model into your application to generate inferences in real
time and at scale. The typical workflow for creating an ML model includes three stages in a circular flow, covered
in more detail below:

 Generate example data
 Train a model
 Deploy the model

In most typical scenarios, you perform the following tasks:

1. Generate example data – To train a model, you need example data. The type of data that you need depends on the
business problem that you want the model to solve, which in turn relates to the inferences that you want the model to
generate. For example, to create a model that predicts a number from an input image of a handwritten digit, you need
example images of handwritten numbers to train it.
Data scientists often devote time to exploring and preprocessing example data before using it for model training. To
preprocess data, you typically do the following:
a. Fetch the data – You might have in-house example data repositories, or you might use datasets that are publicly
available. Typically, you pull the dataset or datasets into a single repository.
b. Clean the data – To improve model training, inspect the data and clean it, as needed. For example, if your data has
a country name attribute with values United States and US, you can edit the data to be consistent.
c. Prepare or transform the data – To improve performance, you might perform additional data transformations. For
example, you might choose to combine attributes for a model that predicts the conditions that require de-icing an
aircraft. Instead of using temperature and humidity attributes separately, you can combine those attributes into a new
attribute to get a better model.
In SageMaker, you can preprocess example data using SageMaker APIs with the SageMaker Python SDK in an
integrated development environment (IDE). With the SDK for Python (Boto3) you can fetch, explore, and prepare your
data for model training.
2. Train a model – Model training includes both training and evaluating the model, as follows:
 Training the model – To train a model, you need an algorithm or a pre-trained base model. The algorithm you
choose depends on a number of factors. For a built-in solution, you can use one of the algorithms that SageMaker
provides. For a list of algorithms provided by SageMaker and related considerations, see Built-in algorithms and
pretrained models in Amazon SageMaker. For a UI-based training solution that provides algorithms and models,
see SageMaker JumpStart pretrained models.
You also need compute resources for training. Your resource use depends on the size of your training dataset and
how quickly you need the results. You can use resources ranging from a single general-purpose instance to a
distributed cluster of GPU instances.
 Evaluating the model – After you train your model, you evaluate it to determine whether the accuracy of the
inferences is acceptable. To train and evaluate your model, use the SageMaker Python SDK to send requests to the
model for inferences through one of the available IDEs.
3. Deploy the model – You traditionally re-engineer a model before you integrate it with your application and deploy
it. With SageMaker hosting services, you can deploy your model independently, which decouples it from your
application code.

Machine learning is a continuous cycle. After deploying a model, you monitor the inferences, collect more high-
quality data, and evaluate the model to identify drift. You then increase the accuracy of your inferences by updating
your training data to include the newly collected high-quality data. As more example data becomes available, you
continue retraining your model to increase accuracy.
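
To make the train-and-deploy stages concrete, here is a minimal sketch using the SageMaker
Python SDK with the built-in XGBoost algorithm. The role ARN, bucket paths, algorithm
version, and instance types are illustrative assumptions, and the training data is assumed to
already sit in S3 as CSV:

import sagemaker
from sagemaker import image_uris
from sagemaker.estimator import Estimator
from sagemaker.inputs import TrainingInput

session = sagemaker.Session()
role = "arn:aws:iam::123456789012:role/SageMakerExecutionRole"  # placeholder role ARN

# Resolve the container image for the built-in XGBoost algorithm in this region.
image = image_uris.retrieve("xgboost", session.boto_region_name, version="1.7-1")

# Train: SageMaker provisions the instance, runs the job, and writes the model to S3.
estimator = Estimator(
    image_uri=image,
    role=role,
    instance_count=1,
    instance_type="ml.m5.xlarge",
    output_path="s3://my-bucket/models/",     # placeholder output location
    sagemaker_session=session,
)
estimator.set_hyperparameters(objective="reg:squarederror", num_round=100)
estimator.fit({"train": TrainingInput("s3://my-bucket/train/", content_type="text/csv")})

# Deploy: host the trained model behind a real-time HTTPS endpoint, decoupled
# from the application code that will call it.
predictor = estimator.deploy(initial_instance_count=1, instance_type="ml.m5.large")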

Machine learning environments & Pre-trained models

SageMaker JumpStart provides a wide range of pre-trained models, pre-built solution templates, and examples for
popular problem types. These use the SageMaker SDK as well as Studio Classic.

SageMaker includes the following machine learning environments.

A) Supervised learning

Amazon SageMaker provides several built-in general purpose algorithms that can be used for either classification or
regression problems.
 AutoGluon-Tabular—an open-source AutoML framework that succeeds by ensembling models and stacking them in
multiple layers.
 CatBoost—an implementation of the gradient-boosted trees algorithm that introduces ordered boosting and an
innovative algorithm for processing categorical features.
 Factorization Machines Algorithm—an extension of a linear model that is designed to economically capture
interactions between features within high-dimensional sparse datasets.
 K-Nearest Neighbors (k-NN) Algorithm—a non-parametric method that uses the k nearest labeled points to assign a
value to a new data point: for classification, a label; for regression, a predicted target value computed from the
average of the k nearest points.
 LightGBM—an implementation of the gradient-boosted trees algorithm that adds two novel techniques for improved
efficiency and scalability. These two novel techniques are Gradient-based One-Side Sampling (GOSS) and
Exclusive Feature Bundling (EFB).
 Linear Learner Algorithm—learns a linear function for regression or a linear threshold function for classification.
 TabTransformer—a novel deep tabular data modeling architecture built on self-attention-based Transformers.
 XGBoost algorithm with Amazon SageMaker—an implementation of the gradient-boosted trees algorithm that
combines an ensemble of estimates from a set of simpler and weaker models.

Amazon SageMaker also provides several built-in supervised learning algorithms used for more specialized tasks
during feature engineering and forecasting from time series data.

 Object2Vec Algorithm—a new highly customizable multi-purpose algorithm used for feature engineering. It can
learn low-dimensional dense embeddings of high-dimensional objects to produce features that improve training
efficiencies for downstream models. While this is a supervised algorithm, there are many scenarios in which the
relationship labels can be obtained purely from natural clusterings in data. Even though it requires labeled data for
training, this can occur without any explicit human annotation.
 DeepAR Forecasting Algorithm—a supervised learning algorithm for forecasting scalar (one-
dimensional) time series using recurrent neural networks (RNN).

B) Unsupervised learning

Amazon SageMaker provides several built-in algorithms that can be used for a variety of unsupervised learning
tasks. These include clustering, dimension reduction, pattern recognition, and anomaly detection.

 Principal Component Analysis (PCA) Algorithm—reduces the dimensionality (number of features) within a dataset
by projecting data points onto the first few principal components. The objective is to retain as much information or
variation as possible. For mathematicians, principal components are eigenvectors of the data's covariance matrix.
 K-Means Algorithm—finds discrete groupings within data. This occurs where members of a group are as similar as
possible to one another and as different as possible from members of other groups.
 IP Insights—learns the usage patterns for IPv4 addresses. It is designed to capture associations between IPv4
addresses and various entities, such as user IDs or account numbers.
 Random Cut Forest (RCF) Algorithm—detects anomalous data points within a data set that diverge from otherwise
well-structured or patterned data.

C) Textual analysis

SageMaker provides algorithms that are tailored to the analysis of textual documents. This includes text used in
natural language processing, document classification or summarization, topic modeling or classification, and
language transcription or translation.

 BlazingText algorithm—a highly optimized implementation of the Word2vec and text classification algorithms that
scale to large datasets easily. It is useful for many downstream natural language processing (NLP) tasks.
 Sequence-to-Sequence Algorithm—a supervised algorithm commonly used for neural machine translation.
 Latent Dirichlet Allocation (LDA) Algorithm—an algorithm suitable for determining topics in a set of documents. It
is an unsupervised algorithm, which means that it doesn't use example data with answers during training.
 Neural Topic Model (NTM) Algorithm—another unsupervised technique for determining topics in a set of
documents, using a neural network approach.
 Text Classification - TensorFlow—a supervised algorithm that supports transfer learning with available pretrained
models for text classification.

D) Image processing

SageMaker also provides image processing algorithms that are used for image classification, object detection, and
computer vision.

 Image Classification - MXNet—uses example data with answers (referred to as a supervised algorithm). Use this
algorithm to classify images.
 Image Classification - TensorFlow—uses pretrained TensorFlow Hub models to fine-tune for specific tasks (referred
to as a supervised algorithm). Use this algorithm to classify images.
 Semantic Segmentation Algorithm—provides a fine-grained, pixel-level approach to developing computer vision
applications.
 Object Detection - MXNet—detects and classifies objects in images using a single deep neural network. It is a
supervised learning algorithm that takes images as input and identifies all instances of objects within the image
scene.
 Object Detection - TensorFlow—detects bounding boxes and object labels in an image. It is a supervised learning
algorithm that supports transfer learning with available pretrained TensorFlow models.

SageMaker geospatial capabilities

Build, train, and deploy ML models using geospatial data.

SageMaker Canvas

An AutoML service that gives people with no coding experience the ability to build models and make
predictions with them.

SageMaker Studio

An integrated machine learning environment where you can build, train, deploy, and analyze your models
all in the same application.

SageMaker Studio Lab

A free service that gives customers access to AWS compute resources in an environment based on open-
source JupyterLab.

RStudio on Amazon SageMaker

An integrated development environment for R, with a console, syntax-highlighting editor that supports
direct code execution, and tools for plotting, history, debugging and workspace management.

Major features of SageMaker

SageMaker includes the following major features, listed alphabetically (ignoring any SageMaker prefix).

Amazon Augmented AI

Build the workflows required for human review of ML predictions. Amazon A2I brings human review to
all developers, removing the undifferentiated heavy lifting associated with building human review
systems or managing large numbers of human reviewers.

AutoML step

Create an AutoML job to automatically train a model in Pipelines.

SageMaker Autopilot

Users without machine learning knowledge can quickly build classification and regression models.

Batch Transform

Preprocess datasets, run inference when you don't need a persistent endpoint, and associate input records
with inferences to assist the interpretation of results.
SageMaker Clarify

Improve your machine learning models by detecting potential bias and help explain the predictions that
models make.

Collaboration with shared spaces

A shared space consists of a shared JupyterServer application and a shared directory. All user profiles in an
Amazon SageMaker domain have access to all shared spaces in the domain.

SageMaker Data Wrangler

Import, analyze, prepare, and featurize data in SageMaker Studio. You can integrate Data Wrangler into
your machine learning workflows to simplify and streamline data pre-processing and feature engineering
using little to no coding. You can also add your own Python scripts and transformations to customize
your data prep workflow.

Data Wrangler data preparation widget

Interact with your data, get visualizations, explore actionable insights, and fix data quality issues.

SageMaker Debugger

Inspect training parameters and data throughout the training process. Automatically detect and alert users
to commonly occurring errors such as parameter values getting too large or small.

SageMaker Edge Manager

Optimize custom models for edge devices, create and manage fleets and run models with an efficient
runtime.

SageMaker Experiments

Experiment management and tracking. You can use the tracked data to reconstruct an experiment,
incrementally build on experiments conducted by peers, and trace model lineage for compliance and audit
verifications.

SageMaker Feature Store

A centralized store for features and associated metadata so features can be easily discovered and reused.
You can create two types of stores: an Online store or an Offline store. The Online store can be used for low-
latency, real-time inference use cases, and the Offline store can be used for training and batch inference.

SageMaker Ground Truth

Create high-quality training datasets by using workers along with machine learning to label datasets.

SageMaker Ground Truth Plus


A turnkey data labeling feature to create high-quality training datasets without having to build labeling
applications and manage the labeling workforce on your own.

SageMaker Inference Recommender

Get recommendations on inference instance types and configurations (e.g., instance count, container
parameters, and model optimizations) to use for your ML models and workloads.

Inference shadow tests

Evaluate any changes to your model-serving infrastructure by comparing its performance against the
currently deployed infrastructure.

SageMaker JumpStart

Learn about SageMaker features and capabilities through curated 1-click solutions, example notebooks,
and pretrained models that you can deploy. You can also fine-tune the models and deploy them.

SageMaker ML Lineage Tracking

Track the lineage of machine learning workflows.

SageMaker Model Building Pipelines

Create and manage machine learning pipelines integrated directly with SageMaker jobs.

SageMaker Model Cards

Document information about your ML models in a single place for streamlined governance and reporting
throughout the ML lifecycle.

SageMaker Model Dashboard

A pre-built, visual overview of all the models in your account. Model Dashboard integrates information
from SageMaker Model Monitor, transform jobs, endpoints, lineage tracking, and CloudWatch so you can
access high-level model information and track model performance in one unified view.

SageMaker Model Monitor

Monitor and analyze models in production (endpoints) to detect data drift and deviations in model quality.

SageMaker Model Registry

Versioning, artifact and lineage tracking, approval workflow, and cross account support for deployment
of your machine learning models.

SageMaker Neo

Train machine learning models once, then run anywhere in the cloud and at the edge.

Notebook-based Workflows

Run your SageMaker Studio notebook as a non-interactive, scheduled job.

Preprocessing

Analyze and preprocess data, tackle feature engineering, and evaluate models.

SageMaker Projects

Create end-to-end ML solutions with CI/CD by using SageMaker projects.

Reinforcement Learning

Maximize the long-term reward that an agent receives as a result of its actions.

SageMaker Role Manager

Administrators can define least-privilege permissions for common ML activities using custom and
preconfigured persona-based IAM roles.

SageMaker Serverless Endpoints

A serverless endpoint option for hosting your ML model. Automatically scales in capacity to serve your
endpoint traffic. Removes the need to select instance types or manage scaling policies on an endpoint.

Studio Classic Git extension

A Git extension to enter the URL of a Git repository, clone it into your environment, push changes, and
view commit history.

SageMaker Studio Notebooks

The next generation of SageMaker notebooks that include AWS IAM Identity Center (IAM Identity
Center) integration, fast start-up times, and single-click sharing.

SageMaker Studio Notebooks and Amazon EMR

Easily discover, connect to, create, terminate and manage Amazon EMR clusters in single account and
cross account configurations directly from SageMaker Studio.

SageMaker Training Compiler

Train deep learning models faster on scalable GPU instances managed by SageMaker.

6.8 Amazon Polly

Amazon Polly is a text-to-speech (TTS) service offered by Amazon Web Services (AWS). It
converts written text into natural-sounding speech, enabling developers to create applications
that can "talk" and make content more engaging and accessible. Polly uses advanced deep
learning technologies to synthesize high-quality, lifelike speech.
Key Features of Amazon Polly:

1. Wide Selection of Voices:
o Offers a variety of male and female voices across multiple languages and dialects.
o Includes both standard voices and Neural Text-to-Speech (NTTS) voices for
enhanced naturalness.
2. Neural Text-to-Speech (NTTS):
o Leverages machine learning to produce highly realistic speech.
o Supports Amazon Polly Brand Voice for custom voice creation tailored to a
brand's personality.
3. Languages and Localization:
o Supports a broad range of languages, making it suitable for global audiences.
o Includes localized accents and regional variations.
4. Speech Styles:
o NTTS voices include specific styles like conversational, news reading, or
narration, offering flexibility based on use cases.
5. Custom Lexicons:
o Developers can use custom pronunciation dictionaries to fine-tune how words are
pronounced.
6. Real-Time and Offline Use:
o Capable of generating speech in real time or storing it as audio files for offline
playback.
7. Cost-Effective:
o Pay-as-you-go pricing model, making it suitable for projects of any size.
8. SSML Support:
o Allows the use of Speech Synthesis Markup Language (SSML) tags for
customizing speech output, such as pauses, emphasis, or phoneme adjustments.
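
A minimal sketch of synthesizing speech with boto3; the voice, engine, and output file name are
illustrative choices:

import boto3

polly = boto3.client("polly")

# Synthesize an MP3 with a neural voice (illustrative voice/engine choices).
response = polly.synthesize_speech(
    Text="Hello from Amazon Polly!",
    OutputFormat="mp3",
    VoiceId="Joanna",
    Engine="neural",
)

# The audio arrives as a stream; write it out for playback or storage.
with open("hello.mp3", "wb") as f:
    f.write(response["AudioStream"].read())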

Use Cases:

1. Accessibility:
o Helping visually impaired users consume digital content through audio.
2. Customer Service:
o Powering virtual assistants, chatbots, and IVR systems for enhanced customer
experiences.
3. E-Learning:
o Creating voiceovers for educational content and training materials.
4. Media and Entertainment:
o Automating voiceovers for news, podcasts, or audiobooks.
5. Multilingual Applications:
o Providing speech in various languages for global reach.
6.9 Customer Carbon Footprint Tool

The AWS Customer Carbon Footprint Tool is a service offered by Amazon Web Services
(AWS) to help organizations measure, monitor, and report the carbon emissions associated with
their use of AWS cloud services. This tool provides valuable insights to help customers track
their sustainability goals and optimize their cloud usage to minimize environmental impact.

Key Features of the AWS Customer Carbon Footprint Tool:

1. Carbon Emissions Reporting:
o The tool provides a detailed breakdown of carbon emissions associated with AWS
services used by the customer.
o Reports are based on the Greenhouse Gas Protocol standards, ensuring credibility
and comparability.
2. Historical and Forecasted Data:
o Customers can view their emissions trends over time.
o The tool also provides projections for future emissions, allowing for strategic
planning.
3. Region-Specific Insights:
o It accounts for differences in carbon intensity across AWS regions, providing
region-wise emission details.
4. Renewable Energy Impact:
o The tool highlights the role of AWS’s renewable energy initiatives in reducing
customer emissions.
o Includes insights into how moving workloads to regions with higher renewable
energy usage can further reduce emissions.
5. Actionable Insights for Optimization:
o Recommendations for reducing emissions, such as optimizing workloads,
migrating to more energy-efficient instance types, or using AWS services
designed for sustainability.
6. Integration with Sustainability Reporting:
o The data can be easily incorporated into corporate sustainability reports and
compliance documentation.

Benefits for Customers:

 Transparency: Provides clear and detailed data on how cloud usage contributes to their
carbon footprint.
 Informed Decision-Making: Enables organizations to identify areas for improvement in
their sustainability efforts.
 Alignment with Goals: Helps companies meet carbon neutrality or net-zero
commitments.
The tool is accessible through the AWS Management Console and is available to all AWS
customers at no additional cost. It's a step forward in helping organizations align their digital
transformation efforts with environmental sustainability.

6.10 Amazon Sustainability Data Initiative

The Amazon Sustainability Data Initiative (ASDI) is a program by Amazon Web Services
(AWS) designed to support sustainability research and innovation by providing access to
massive datasets and cloud computing resources. The initiative focuses on fostering
collaboration and enabling scientific breakthroughs in sustainability by making it easier and
more cost-effective for researchers, scientists, and organizations to analyze large-scale
environmental data.

Key Features of ASDI:

1. Free Data Hosting and Access:
o ASDI provides free hosting for eligible sustainability-related datasets, making it
accessible to a global audience.
o Examples include datasets on climate, weather, satellite imagery, biodiversity,
and more.
2. Collaboration with Partners:
o ASDI collaborates with organizations like NASA, NOAA, and the European
Space Agency to make their data accessible in the cloud.
3. Scalable Cloud Computing:
o Through AWS, researchers can leverage powerful computational tools to analyze
massive datasets without needing on-premises infrastructure.
4. Promoting Research and Solutions:
o ASDI encourages the use of these datasets for applications like climate modeling,
disaster response, biodiversity conservation, and renewable energy optimization.
5. Funding and Support:
o AWS offers research credits to help organizations and researchers offset the costs
of using AWS for sustainability projects.

Example Use Cases:

 Disaster Response: Real-time weather and climate data are used for planning and
responding to natural disasters like hurricanes or wildfires.
 Carbon Emissions Monitoring: Analyzing satellite data to track changes in forest cover
or urban emissions.
 Renewable Energy Optimization: Improving the efficiency of wind and solar energy
systems by using weather prediction data.

Important Questions
1) Describe in detail the Amazon Sustainability Data Initiative (ASDI) and the Customer Carbon
Footprint Tool with respect to AWS
2) What is SageMaker? Explain the different machine learning environments and features of
SageMaker, along with the workflow for creating an ML model
3) Describe AWS Elastic Beanstalk in detail
4) Write a short note on CloudWatch
5) Explain SNS and SQS with respect to AWS
6) Explain Route 53 in brief
7) What is AWS Lambda? Explain the components of Lambda in detail
