AWS Data Analytics - Technical - Student


AWS Solutions Training for Partners: Data Analytics on AWS – Technical
Module 1: Course Introduction
Course objectives
In this course, you will learn how to:
• Identify Amazon Web Services (AWS) services in the AWS analytics stack
• Describe decision points and technology selections for data analytics architectures
• Design highly available and fault-tolerant serverless data analytics architectures
• Discuss the AWS Data Pipeline and the customer data analytics journey using the Data Flywheel
• Describe five AWS data analytics technical solutions:
• Modernizing a data warehouse with Amazon Redshift
• Data lakes
• Streaming data
• Data governance
• Machine learning (ML)
• Identify technical engagement strategies and best practices for delivering a proof of concept (POC)
• Locate and use AWS Partner Network (APN) Partner resources for opportunities and training
© 2022 Amazon Web Services, Inc. or its affiliates. All rights reserved. 3
About this course

• This course is for technical professionals at APN Consulting Partner organizations
who are engaged in pre-sales discussions with customers to help architect data
analytics solutions on AWS and answer technical questions about using AWS
data analytics services.
• This 1-day course is focused on educating technical professionals with sufficient
technical knowledge on AWS data analytics services and solutions to
successfully engage with and help customers.
• This course is not designed to be a technical deep dive into AWS data analytics
services and solutions. It provides the necessary resources and learning path
towards gaining deeper knowledge into the services.

Agenda

1st Half
• Module 1: Course Introduction
• Module 2: AWS Data Analytics Stack Portfolio
• Module 3: AWS Data Analytics Solutions – Part I

2nd Half
• Module 4: AWS Data Analytics Solutions – Part II
• Activity: Whiteboard – game analytics pipeline architecture
• Technical Engagement Strategies

Demo -> Real-time Data Streaming into Serverless Data Lake
Module 2: AWS Data Analytics Portfolio
Customer challenges and opportunities for APN Partners
New realities

• Explosion of data: connected devices, apps, and systems generate more data than ever before.
• Demand growing for faster decision making on real-time data.
• Pay-as-you-go pricing allows organizations to analyze data to gain insights.
Common data analytics challenges
What challenges do you see when using big data analytics/technologies? (n=545)
• Inadequate analytical know-how in our company: 53%
• Data privacy issues (safety of personal data): 49%
• Inadequate technical know-how in our company: 48%
• Data security (unauthorized access to company data): 48%

Top four challenges involve knowledge, skill, security, and privacy.
This is your opportunity.

https://bi-survey.com/challenges-big-data-analytics
AWS data analytics portfolio overview
Secure infrastructure for analytics

Customers need multiple levels of security, identity and access management, encryption, and compliance to secure their data lake.

• Security: Amazon GuardDuty, AWS Shield, AWS Well-Architected Tool, Amazon Macie, Amazon Virtual Private Cloud (Amazon VPC)
• Identity: AWS Identity and Access Management (IAM), AWS Single Sign-On, AWS Organizations, Amazon Cognito, AWS Directory Service
• Encryption: AWS Certificate Manager Private Certificate Authority (ACM Private CA), AWS Key Management Service (AWS KMS), encryption at rest, encryption in transit, bring your own keys, hardware security module (HSM) support
• Compliance: AWS Artifact, Amazon Inspector, AWS CloudHSM, AWS CloudTrail
AWS data analytics portfolio
• Data visualization, engagement, and machine learning: AWS Data Exchange, Amazon QuickSight, Amazon Pinpoint, Amazon SageMaker, Amazon Comprehend, Amazon Polly, Amazon Lex, Amazon Rekognition, Amazon Translate
• Analytics: Amazon Redshift, Amazon EMR (Spark and Presto), AWS Glue (Spark and Python), Amazon Athena, Amazon OpenSearch Service, Amazon Kinesis Data Analytics
• Data lake infrastructure and management: Amazon Simple Storage Service (Amazon S3), Amazon S3 Glacier, AWS Lake Formation, AWS Glue
• Data movement: AWS Database Migration Service (AWS DMS) | AWS Snowball | AWS Snowmobile | Amazon Kinesis Data Firehose | Amazon Kinesis Data Streams | Amazon Managed Streaming for Apache Kafka
Data movement services
Help customers move data from on premises to the cloud

AWS DMS | AWS Snowball | AWS Snowmobile
Amazon Kinesis Data Streams | Amazon Kinesis Data Firehose | Amazon Managed Streaming for Apache Kafka
Data lake services

Customers are constrained by volume, variety, veracity, and velocity of on-premises data, and data silos pose a major challenge.

Amazon S3 | Amazon S3 Glacier | AWS Lake Formation | AWS Glue
Analytics services
Help customers extract value out of their data

Amazon Redshift | Amazon EMR | AWS Glue
Amazon Athena | Amazon OpenSearch Service (formerly Amazon ES) | Amazon Kinesis Data Analytics
Data visualization, engagement, and machine learning services
Help customers understand and visualize their data, and use machine learning (ML) for advanced analytics and predictions

AWS Data Exchange | Amazon QuickSight | Amazon SageMaker
AWS value proposition

Standards, formats, and open source

• Apache Flink
• Ganglia
• Apache HBase
• HCatalog
• Hadoop Distributed File System (HDFS)
• Apache Hive
• Hudi
• Java
• JupyterHub
• Apache Kafka
• Apache Livy
• Apache Mahout
• MapReduce
• Apache MXNet
• MySQL
• Apache Oozie
• Apache ORC
• Apache Parquet
• Phoenix
• Apache Pig
• Presto
• Python
• PyTorch
• R
• Scala
• Apache Spark
• Sqoop
• SQL
• TensorFlow
• Tez
• Yarn
• Apache Zeppelin
• Apache Zookeeper

…and many more
AWS alternatives to open source

• Amazon EMR (Spark, Hive, Presto, Flink, HBase). Open source: Hadoop, Spark
• Amazon OpenSearch Service (operational analytics). Open source: Elasticsearch, Logstash, Kibana
• Amazon Managed Streaming for Apache Kafka (real-time analytics). Open source: Kafka
Data analytics pipeline

Data management challenges

How can customers:


• Collect a variety of data types accumulating at varying velocities?
• Collect data from numerous sources accumulating at differing velocities?
• Store massive amounts of data without running out of space?
• Cleanse data and improve its quality before it is analyzed?

Can they automate these steps?

Data analytics pipeline

Data -> Collect -> Store -> Process and analyze -> Visualize -> Insights

Time-to-answer (latency)
Balance of throughput and cost
Data pipeline challenges
Building a data pipeline is challenging. Customers must:
• Manage updates, patches, and software integrations
• Handle increased overhead costs plus need for support
• Maintain focus on the core task of building applications that lead to data insights

AWS data analytics pipeline services
• Collect: Amazon Kinesis Data Firehose, AWS Snowball, Amazon Kinesis Data Streams, AWS Direct Connect, Amazon Managed Streaming for Kafka
• Store: Amazon S3, Amazon S3 Glacier, Amazon DynamoDB, Amazon RDS, Amazon Aurora
• Process and analyze: Amazon EMR, Amazon Athena, Amazon Kinesis Data Analytics, Amazon Redshift, Amazon OpenSearch
• Visualize: Amazon QuickSight, Amazon SageMaker
• Automate: AWS Database Migration Service, AWS Glue
Data Flywheel

Data Flywheel and customer journey
Flywheel stages:
• Migrate data and workloads to the cloud
• Store and manage data
• Modernize data warehouse and build a data lake
• Build data-driven applications
• Innovate with machine learning
• Attract new customers
• Generate more data

Benefits called out along the flywheel:
• Save time, save costs, agility, global distribution, scale and performance
• New and faster insights, broader access to analytics
• Better experiences, deeper engagement, efficient processes

https://pages.awscloud.com/data-flywheel.html
Module 3: Data Analytics Solutions on AWS – Part I
Objectives
In this module, you will learn how to:
• Explain data migration options from on premises to the AWS Cloud
• Describe two AWS data analytics technical solutions
• Modernizing a data warehouse with Amazon Redshift
• Data lakes
Evolution of data architecture

Traditional data warehousing -> Data warehouse modernization -> Data lakes on AWS -> Real-time analytics with streaming data -> Data governance -> Machine learning
Data migration options

Journey to a modern data architecture
Evolution of data architecture

Traditional data warehousing -> Data warehouse modernization -> Data lakes on AWS -> Real-time analytics with streaming data -> Data governance -> Machine learning
Types of data
AWS data migration options

• AWS Direct Connect
• AWS Storage Gateway
  • File gateway
  • Tape gateway
  • Volume gateway
• Amazon S3 Transfer Acceleration
• AWS Snowball
• Amazon Kinesis Data Firehose
• AWS Database Migration Service
Solution 1: Modernizing a data warehouse with Amazon Redshift
Journey to a modern data architecture
Evolution of data architecture

Traditional data warehousing -> Data warehouse modernization -> Data lakes on AWS -> Real-time analytics with streaming data -> Data governance -> Machine learning
Types of data
Data warehouses

Data warehouse defined
Central repository of curated data from different sources
• Data optimized for reporting and data analysis
• Data extracted, cleaned, transformed, and loaded into a data warehouse using an extract, transform, load (ETL) tool

Benefits
• Better decision making
• Consolidated data from many sources
• Improved data quality, consistency, and accuracy
• Access to historical intelligence
• Improved performance

Diagram: Source 1 (flat files), Source 2 (database), Source 3 (database) -> Extract -> Staging area -> Transform and load -> Data warehouse -> Analytics

Data warehouse concepts: https://aws.amazon.com/data-warehouse/
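The extract, transform, load flow described on this slide can be sketched in a few lines. This is a minimal illustration only, not an AWS tool: the sample CSV, the column names, and the `sales` table are hypothetical, and an in-memory SQLite database stands in for the data warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical flat-file source; the column names are illustrative only.
RAW_CSV = """order_id,customer,amount
1, alice ,19.99
2,BOB,5.00
3,alice,
"""

def extract(text):
    """Extract: read rows from a flat-file source."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: normalize names and drop rows with a missing amount."""
    cleaned = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # incomplete record, not loaded
        cleaned.append(
            (int(row["order_id"]), row["customer"].strip().lower(), float(amount))
        )
    return cleaned

def load(conn, rows):
    """Load: write the cleaned rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (order_id INTEGER, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()

def run_etl(text=RAW_CSV):
    """Run the three stages against an in-memory stand-in warehouse."""
    conn = sqlite3.connect(":memory:")
    load(conn, transform(extract(text)))
    return conn
```

Calling `run_etl()` returns a connection whose `sales` table holds only the two complete, cleaned rows.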
OLTP and OLAP comparison
Characteristic | Online Transactional Processing (OLTP), Relational Database | Online Analytical Processing (OLAP), Data Warehouse
Data source | Application create, read, update, delete (CRUD), origin | OLTP and secondary source
Purpose | Capture and store transactional data | Analyze and gain insights from historical data
Workloads | SQL INSERT, UPDATE, DELETE – short and fast queries | ETL focused, batch job to import, JOINs, run complex queries
Database design | Highly normalized, many distinct tables to reduce duplication | Denormalized using fewer tables in STAR and snowflake schema with duplicated data for fast performance
Database size | Depends on the amount of data, typically from MB to TB in size | Growth over time, typically ranges from TB to PB in size
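A small sketch may help contrast the two workloads. It assumes hypothetical `customers`, `orders`, and `sales_fact` tables and uses SQLite purely for illustration; a real OLAP workload would run on a data warehouse, not SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- OLTP side: normalized tables, each fact stored once.
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, region TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    -- OLAP side: denormalized fact table, region duplicated on every row.
    CREATE TABLE sales_fact (order_id INTEGER, customer_name TEXT, region TEXT, amount REAL);
""")

def place_order(customer_id, amount):
    """OLTP workload: a short, fast single-row write."""
    cur = conn.execute(
        "INSERT INTO orders (customer_id, amount) VALUES (?, ?)", (customer_id, amount)
    )
    conn.commit()
    return cur.lastrowid

def refresh_fact_table():
    """Batch job: copy joined OLTP rows into the denormalized table."""
    conn.execute("DELETE FROM sales_fact")
    conn.execute("""
        INSERT INTO sales_fact
        SELECT o.id, c.name, c.region, o.amount
        FROM orders o JOIN customers c ON c.id = o.customer_id
    """)
    conn.commit()

def revenue_by_region():
    """OLAP workload: an aggregate over historical data, no join at query time."""
    return conn.execute(
        "SELECT region, SUM(amount) FROM sales_fact GROUP BY region ORDER BY region"
    ).fetchall()

conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)", [(1, "Ana", "EU"), (2, "Raj", "APAC")]
)
place_order(1, 10.0)
place_order(2, 7.5)
place_order(1, 2.5)
refresh_fact_table()
```

The design point is that the write path touches one narrow row, while the analytical query reads a wide, pre-joined table.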
Traditional architecture and on-premises data warehouse challenges
• Difficult to scale
• Long lead times for hardware procurement
• Complex upgrades are the norm
• High overhead costs for administration
• Expensive licensing and support costs
• Proprietary formats do not support newer open data formats, which results in data silos
• Data not cataloged, unreliable quality
• Licensing cost limits number of users and how much data can be accommodated
• Difficult to integrate with services and tools

Amazon Redshift

Amazon Redshift
Secure data warehouse that extends seamlessly to a data lake

A fully managed data warehouse that is highly integrated with other AWS services. Features include:
• Optimized for high performance
• Support for open file formats
• Petabyte-scale capability
• Support for complex queries and analytics, with data visualization tools
• Secure end-to-end encryption and certified compliance
• Service Level Agreement (SLA) of 99.9 percent
• Based on the open source PostgreSQL database
• Cost efficient
https://aws.amazon.com/redshift/pricing/
Amazon Redshift performance features

• Massively parallel processing (MPP): Breaks a large job into smaller tasks, then distributes the tasks to multiple compute nodes. Result: Faster processing time
• Columnar storage: Data from each column is stored together so the data can be accessed faster, without scanning and sorting all other columns. Result: Compression of stored data improves performance
• Shared-nothing architecture: Independent and resilient nodes without any dependencies. Result: Improves scalability
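To make the columnar storage point concrete, here is a toy sketch. It does not reproduce the Amazon Redshift storage format; the records are hypothetical, and run-length encoding is used only as one simple example of why values stored together by column compress well.

```python
# Row layout: each record keeps all of its fields together.
rows = [
    {"order_id": 1, "region": "EU", "amount": 10.0},
    {"order_id": 2, "region": "EU", "amount": 5.0},
    {"order_id": 3, "region": "EU", "amount": 2.5},
    {"order_id": 4, "region": "US", "amount": 4.0},
]

def to_columns(records):
    """Pivot row-oriented records into one list per column."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns

def run_length_encode(values):
    """Compress consecutive repeats as [value, count] pairs."""
    encoded = []
    for value in values:
        if encoded and encoded[-1][0] == value:
            encoded[-1][1] += 1
        else:
            encoded.append([value, 1])
    return encoded

columns = to_columns(rows)
# An aggregate on one column reads only that column's values.
total_amount = sum(columns["amount"])
# Repeated neighbouring values in a column collapse into short runs.
encoded_region = run_length_encode(columns["region"])
```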
Amazon Redshift architecture

Diagram: Client applications connect through Java Database Connectivity (JDBC) or Open Database Connectivity (ODBC) -> Leader node -> Compute Node 1 (node slices), Compute Node 2 (node slices), all within the data warehouse cluster

https://docs.aws.amazon.com/redshift/index.html
Leader node
Responsible for communication with the client application and compute nodes
Amazon Redshift leader node:
• SQL endpoint
• Metadata
• Query compilation and optimization
• Coordinates parallel SQL processing
• Machine learning (ML) optimizations

Diagram: Leader node -> Compute node 1 (node slices), Compute node 2 (node slices), within the data warehouse cluster
Compute node
Runs queries in parallel and returns the result to the leader node
• SQL running powerhouses
• Compute nodes can load, unload, back up, and restore data to and from Amazon S3.
• Clusters range from 1 to 128 compute nodes.

Diagram: Leader node -> Compute node 1 (node slices), Compute node 2 (node slices), within the data warehouse cluster
Compute node slices
Slices are a symmetric multiprocessing (SMP) mechanism.
• Each compute node is partitioned into slices.
• Slices work in parallel to complete operations.
• Virtual processors contained in each compute node.
• Each slice is allocated an equal amount of memory, compute allowance, and disk space.
• Each slice operates in parallel but can request data from other slices.

Diagram: Compute node 1 and Compute node 2 in the data warehouse cluster; Slice 1 | Slice 2, each with a virtual core, 7.5 GB RAM, and local disk
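The leader node, compute node, and slice roles can be sketched as a fan-out and merge. This is a conceptual illustration only: the slice count, the CRC32 hash, and the function names are assumptions for the example and do not describe how Amazon Redshift distributes rows internally.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

NUM_SLICES = 4  # hypothetical: 2 compute nodes with 2 slices each

def slice_for(key, num_slices=NUM_SLICES):
    """Pick a slice using a stable hash of the distribution key."""
    return zlib.crc32(str(key).encode("utf-8")) % num_slices

def distribute(rows, key):
    """Place every row on exactly one slice."""
    slices = [[] for _ in range(NUM_SLICES)]
    for row in rows:
        slices[slice_for(row[key])].append(row)
    return slices

def partial_sum(slice_rows):
    """Work done by one slice on its own rows."""
    return sum(row["amount"] for row in slice_rows)

def leader_total(rows, key="customer_id"):
    """Leader role: fan out to the slices in parallel, then merge partial results."""
    slices = distribute(rows, key)
    with ThreadPoolExecutor(max_workers=NUM_SLICES) as pool:
        return sum(pool.map(partial_sum, slices))
```

For example, `leader_total([{"customer_id": i, "amount": float(i)} for i in range(10)])` sums the per-slice partial results into one answer.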
Amazon Redshift instance types

https://docs.aws.amazon.com/redshift/latest/gsg/getting-started.html
Management interfaces

https://us-west-2.console.aws.amazon.com/redshiftv2/home?region=us-west-2#query-editor
Solution 2: Data lakes

Journey to a modern data architecture
Evolution of data architecture

Traditional data warehousing -> Data warehouse modernization -> Data lakes on AWS -> Real-time analytics with streaming data -> Data governance -> Machine learning
Types of data
Extract more value from data

Data lake
• Data producers: Applications, IoT devices, End users
• Data lake team: Application developers, Data engineers, Security and governance officer
• Data consumers: Data analysts, Data scientists, Business users
Data lakes defined
Architectural approach for a centralized enterprise data repository stored on Amazon S3
• Stores all structured, semi-structured, unstructured, and binary data at unlimited scale
• Holds curated and raw data
• Uses AWS data analytics tools for analytics
• Increases pace of innovation by extracting insights from data
• Enables more organizational agility
• Reduces cost and delivers results with predictive analytics and ML

Diagram: the data lake (open formats, central catalog) feeds machine learning, business intelligence and analytics, and data warehousing
Reference architecture: Data lake on AWS
• Catalog and search: AWS Glue, Amazon DynamoDB, Amazon ES
• Access and user interface: Amazon API Gateway, IAM, Amazon Cognito
• Data ingestion: AWS Data Exchange, Amazon Kinesis, AWS Direct Connect, AWS DMS, AWS Snowball
• Central storage: Amazon S3
• Processing and analytics: Machine learning, Amazon QuickSight, Amazon EMR, Amazon Redshift, Amazon Athena
• Protect and secure: Amazon CloudWatch, IAM, AWS STS, AWS KMS, AWS CloudTrail
Data services – AWS Glue

Cleansing data
After migration, data still presents challenges:

Data is increasingly diverse, it accumulates rapidly, and it must be cleansed before it is analyzed by many applications.
• Volume, variety, velocity, veracity
• Missing or incorrect data, wrong data format, partial missing data
• Avoid unsearchable data

How can customers provide access to users to gain insights?
AWS Glue crawlers
The AWS Glue crawler runs with an AWS IAM role and uses built-in classifiers.
• JDBC connection: databases (MySQL, MariaDB, PostgreSQL, Amazon Aurora, Oracle) and Amazon Redshift
• NoSQL connection: Amazon DynamoDB
• Object connection: Amazon S3
• Built-in classifiers: Apache Avro, Parquet, ORC, XML, JSON and JSONPaths, AWS CloudTrail, Binary JSON (BSON), Logs, Delimited, … growing
AWS Glue Data Catalog services

The AWS Glue Data Catalog is shared by:
• AWS Glue ETL
• Amazon Athena
• Amazon Redshift lake house
• Amazon EMR
Use case: Log aggregation with ETL

• Sources: AWS service logs, web application logs, and server logs land in an Amazon S3 bucket
• AWS Glue crawler: Update table partition in the AWS Glue Data Catalog
• AWS Glue ETL: Create partition on Amazon S3 (output Amazon S3 bucket)
• Amazon Athena: Query data
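Partitioning logs on Amazon S3 usually means encoding partition values in the object key. The sketch below shows one common convention (Hive-style `key=value` path segments). The `logs/` prefix, the `source` partition, and both function names are assumptions for illustration; they are not prescribed by the slide.

```python
from datetime import datetime, timezone

def partition_key(source, event_time, filename):
    """Build a Hive-style partitioned object key (key=value path segments)."""
    return (
        f"logs/source={source}/"
        f"year={event_time:%Y}/month={event_time:%m}/day={event_time:%d}/"
        f"{filename}"
    )

def parse_partitions(key):
    """Recover the partition values from a key, as a catalog entry would record them."""
    return dict(
        segment.split("=", 1) for segment in key.split("/") if "=" in segment
    )

example_key = partition_key(
    "web", datetime(2022, 3, 7, tzinfo=timezone.utc), "part-0001.json"
)
```

A query that filters on `year`, `month`, or `day` can then skip every object whose key falls outside the requested partitions.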
Data services – AWS Data Exchange and Amazon Athena
AWS Data Exchange
Find and subscribe to third-party data in the cloud

• Find diverse data in one place: More than 1,000 data products; more than 80 data providers
• Analyze data: Download or copy data to Amazon S3; combine, analyze, and model with existing data
• Access third-party data: Streamlined access to data; minimize legal reviews and negotiations
Amazon Athena
Interactive query service to analyze data in Amazon S3 using standard SQL

• No setup costs: Zero setup costs, point to Amazon S3 and start querying
• Pay per query: Pay only for queries run, save 30%–90% on per-query costs through compression
• Open: ANSI SQL interface, JDBC/ODBC drivers, multiple formats, compression types, and complex joins and data types
• Streamlined: Serverless, zero infrastructure, zero administration, integrated with Amazon QuickSight
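The link between compression and per-query cost can be shown with simple arithmetic, assuming cost is proportional to the bytes a query scans. The price per terabyte is deliberately left as a parameter because the slide does not state one; the decimal terabyte and the function names are also assumptions of this sketch.

```python
TB = 10 ** 12  # assumption: decimal terabyte, for illustration only

def query_cost(bytes_scanned, price_per_tb):
    """Cost of one query when billing is proportional to bytes scanned."""
    return bytes_scanned / TB * price_per_tb

def savings_fraction(raw_bytes, compressed_bytes):
    """Fraction of per-query cost saved by scanning compressed data instead."""
    return 1 - compressed_bytes / raw_bytes
```

Under this model, a dataset that compresses to a quarter of its raw size yields `savings_fraction(100, 25) == 0.75`, which sits inside the 30%–90% range quoted on the slide.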
AWS Lake Formation

Challenges of building a secure data lake

Typical steps to build a secure data lake

Ingestion and cleaning (data engineer)
1 Set up storage
2 Move data
3 Cleanse, prepare, and catalog data

Security (data security officer)
4 Configure and enforce security and compliance policies

Analytics and machine learning (data analyst)
5 Make data available for analytics
AWS Lake Formation for a secure data lake

1 Ingest and organize: Automates creating data lake and data ingestion.
2 Secure and control: Sets up fine-grained access control and data governance. To protect data, all access is checked against set policies.
3 Collaborate and use: Search and data discovery using Data Catalog metadata.
4 Monitor and audit: Based on data access and governance policies, alert notifications are raised on policy violation and logged.
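The "secure and control" and "monitor and audit" steps can be illustrated with a toy policy store. This is not the AWS Lake Formation API: the `PolicyStore` class, its methods, and the column-level grant model are hypothetical and only show the idea of checking every access against set policies and logging the outcome.

```python
class PolicyStore:
    """Hypothetical in-memory store of column-level grants with an audit log."""

    def __init__(self):
        self.grants = {}      # (principal, table) -> set of allowed columns
        self.audit_log = []   # one entry per access check

    def grant(self, principal, table, columns):
        """Allow a principal to read the given columns of a table."""
        self.grants.setdefault((principal, table), set()).update(columns)

    def check(self, principal, table, columns):
        """Allow the request only if every requested column has been granted."""
        allowed = self.grants.get((principal, table), set())
        permitted = bool(columns) and set(columns) <= allowed
        # Every decision is logged, including denials.
        self.audit_log.append((principal, table, tuple(sorted(columns)), permitted))
        return permitted

store = PolicyStore()
store.grant("analyst", "orders", ["order_id", "amount"])
```

A request for `["amount"]` is allowed, while a request that also includes an ungranted column such as `"email"` is denied, and both outcomes appear in `audit_log`.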
AWS Lake Formation builds on AWS Glue
• AWS Lake Formation: Blueprints; Security, search, collaboration; Monitoring; Workflow
• AWS Glue: AWS Glue Data Catalog; Connections, databases, tables; AWS Glue ETL jobs; AWS Glue crawlers
AWS Lake Formation benefits

• Comprehensive set of integrated tools enables every user equally: Amazon Athena, Amazon QuickSight, Amazon Redshift, Amazon SageMaker, Amazon EMR
• Centralized management of fine-grained permissions empowers security officers: AWS Lake Formation access control
• Simplified ingest and cleaning enables data engineers to build faster: AWS Glue, Blueprints, ML Transforms, Data Catalog
• Cost effective, durable storage includes global replication capabilities: Amazon S3 data lake storage
Data visualization with Amazon QuickSight
Amazon QuickSight
BI service built for the cloud with pay-per-session pricing and ML insights

• Scalable: Automatically scales with use and activity, with no additional infrastructure requirements. Seamlessly grows with customers.
• Pay for use: Pay monthly or annually. With pay-per-session pricing, customers only pay when they access their reports and dashboards, with no upfront costs.
• Serverless and fully managed: Fully managed cloud application, meaning there's no upfront cost, software to deploy, capacity planning, maintenance, upgrades, or migrations.
• Fully integrated: Deeply integrated with data sources and other AWS services like Amazon Redshift, Amazon S3, Athena, Amazon Aurora, Amazon RDS, IAM, AWS CloudTrail, and Amazon Cloud Directory, providing customers with everything they need for an end-to-end cloud BI solution.
Amazon QuickSight visualization

Serverless data lakes and analytics

• Sources: Web app data, Amazon RDS, other databases, on-premises data, streaming data
• Storage and catalog: Amazon S3 -> AWS Glue crawler -> AWS Glue Data Catalog
• Query and processing: Amazon Athena, Amazon EMR, Amazon Redshift Spectrum
• Visualization: Amazon QuickSight
Summary

Evolution of data architecture

Traditional data warehousing -> Data warehouse modernization -> Data lakes on AWS -> Real-time analytics with streaming data -> Data governance -> Machine learning

• AWS data migration options
• Data warehouse modernization: Amazon Redshift
• Data lakes on AWS: Amazon S3, AWS Glue, AWS Data Exchange, Amazon Athena, AWS Lake Formation, Amazon QuickSight
Module 4: AWS Data Analytics Solutions – Part II
Objectives
In this module, you will learn about three key types of data analytics
technical solutions on AWS:
• Streaming and real-time analytics with Amazon Kinesis
• Data governance
• Extended solution: Insights and monetization with machine learning (ML)
Evolution of data architecture

Traditional data warehousing -> Data warehouse modernization -> Data lakes on AWS -> Real-time analytics with streaming data -> Data governance -> Machine learning
Solution 3: Streaming and real-time analytics with Amazon Kinesis
Journey to a modern data architecture
Evolution of data architecture

Traditional data warehousing -> Data warehouse modernization -> Data lakes on AWS -> Real-time analytics with streaming data -> Data governance -> Machine learning
Types of data used
Streaming data defined
Data that is generated continuously from thousands of data sources, sent simultaneously
• Player-game interactions
• Social media streams
• Music downloads
• Geolocation of cars and devices
• Website clicks
Common use cases: Real-time analytics
The value of data diminishes over time

Time scale: Milliseconds -> Seconds -> Minutes -> Hours
• Messaging between microservices; response analytics (web and mobile application notifications)
• Log ingestion; Internet of Things (IoT) device maintenance; change data capture (CDC)
• Streaming ETL into data lakes and data warehouse
Enabling real-time analytics
Data streaming technology enables a customer to ingest, process, and analyze high
volumes of high-velocity data from a variety of sources, in real time.

Data streaming solution challenges

Challenges of building on-premises, real-time streaming solutions:

• Difficult to set up
• Tricky to scale
• Difficult to achieve high availability
• Integration requires development
• Error prone and complex to manage
• Expensive to maintain
AWS streaming data solutions
Efficiently collect, process, and analyze data streams in real time

Amazon Kinesis Data Streams | Amazon Kinesis Data Firehose | Amazon Kinesis Data Analytics
Data generators: Simple streaming data patterns
• Data producers: Mobile and applications, Amazon Kinesis Agent, Amazon Kinesis Producer Library (KPL), Amazon CloudWatch Logs, Amazon CloudWatch Events, AWS IoT, Apache Kafka
• Streaming services: Amazon Kinesis Data Analytics, Amazon Kinesis Data Streams, Amazon Kinesis Data Firehose
• Data consumers: Amazon Simple Storage Service (S3), Amazon EMR, Amazon Redshift, Amazon EC2, Amazon Kinesis Connector Library, Amazon Kinesis Data Streams
Amazon Kinesis Data Streams

Amazon Kinesis Data Streams
Massively scalable, highly durable data ingestion and processing service
optimized for real-time data streaming
• Data collected is available within 70 milliseconds
• Real-time analytics: dashboards, anomaly detection, dynamic pricing
• Synchronously replicates data across 3 Availability Zones in a Region
• Serverless, can scale dynamically to handle MB to TB each hour and thousands to millions of PutRecords each second
• No upfront cost; low, pay-as-you-go pricing
https://aws.amazon.com/kinesis/data-streams/faqs/?nc=sn&loc=5
How Kinesis Data Streams works

• Input: Capture and send data
• Amazon Kinesis Data Streams: Ingest and store data streams for processing
• Build custom, real-time applications: Amazon Kinesis Data Analytics, Spark on Amazon EMR, Amazon EC2, AWS Lambda
• Output: Analyze streaming data using BI tools
Kinesis Data Streams architecture
• Data producers: Amazon EC2 instances, client, mobile client, traditional server
• Amazon Kinesis data stream: Shard 1, Shard 2, ... Shard N
• Data record: Sequence #, partition key, data blob
• Data consumers: EC2 instances, Amazon Kinesis Data Analytics, Amazon Kinesis Data Firehose
• Destinations: Amazon S3, Amazon DynamoDB, Amazon Redshift, Amazon EMR, Amazon Kinesis Data Firehose
https://aws.amazon.com/kinesis/data-streams/faqs/?nc=sn&loc=5
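How a partition key selects a shard can be sketched as follows. Kinesis Data Streams maps partition keys to 128-bit integers with an MD5 hash and assigns records to shards by hash key range; the assumption in this sketch is that the shards split that range into equal contiguous parts, which is not guaranteed after resharding. The function names are illustrative.

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 digests are 128-bit integers

def hash_key(partition_key):
    """Map a partition key to a 128-bit integer with MD5."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")

def shard_for(partition_key, num_shards):
    """Return the shard index whose hash key range contains the key.

    Assumes the shards divide the hash space into equal contiguous ranges.
    """
    range_size = HASH_SPACE // num_shards
    return min(hash_key(partition_key) // range_size, num_shards - 1)
```

Records that share a partition key always land on the same shard, which is what preserves their relative order.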
Kinesis Data Streams provisioning

Amazon Kinesis Data Firehose

How Kinesis Data Firehose works

• Input: Capture and send data
• Amazon Kinesis Data Firehose: Prepares and loads data continuously to the selected destinations
• Durably store the data for analytics: Amazon S3, Amazon Redshift, Amazon OpenSearch Service, Splunk
• Output: Analyze streaming data using analytics tools
Kinesis Data Streams and Kinesis Data Firehose
Characteristics of Amazon Kinesis Data Streams compared with Amazon Kinesis Data Firehose:
• Processing time
  Data Streams: As fast as 70 milliseconds after ingestion
  Data Firehose: Between 60–900 seconds
• Stream storage and duration
  Data Streams: In shards, default 24 hours and up to 365 days
  Data Firehose: Max buffer size 128 MB and max time 900 seconds
• Data transformation and conversion
  Data Streams: None
  Data Firehose: Uses AWS Lambda and AWS Glue
• Data producer
  Both: Amazon Kinesis Agent, applications using Amazon Kinesis Producer Library (KPL), AWS SDK for Java, Amazon CloudWatch Logs and CloudWatch Events, AWS IoT
• Data consumer
  Data Streams: AWS Lambda, Amazon Kinesis Data Analytics, Amazon Kinesis Data Firehose, applications using the Kinesis Client Library (KCL) and SDK for Java
  Data Firehose: Amazon S3, Amazon Redshift, Amazon OpenSearch Service (formerly Amazon ES), Splunk, and Amazon Kinesis Data Analytics
• Data compression
  Data Streams: None
  Data Firehose: gzip, Snappy, Zip, or no data compression

https://aws.amazon.com/kinesis/data-streams/faqs/?nc=sn&loc=5
https://aws.amazon.com/kinesis/data-firehose/faqs/?nc=sn&loc=5
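The 60–900 second processing time of Kinesis Data Firehose follows from how it buffers: records accumulate until either a size threshold or a time threshold is reached, whichever comes first, and the batch is then delivered. The sketch below simulates that rule in plain Python. The class name `BufferedDelivery`, its methods, and the tiny thresholds in the usage are hypothetical and chosen for readability; this is not the Firehose API.

```python
class BufferedDelivery:
    """Collect records and deliver a batch when a size or time limit is hit."""

    def __init__(self, max_bytes, max_seconds, deliver):
        self.max_bytes = max_bytes
        self.max_seconds = max_seconds
        self.deliver = deliver        # callable that receives a list of records
        self.records = []
        self.size = 0
        self.opened_at = None         # time the current buffer received its first record

    def put(self, record, now):
        """Add one record (bytes) at time `now` (seconds) and flush if due."""
        if self.opened_at is None:
            self.opened_at = now
        self.records.append(record)
        self.size += len(record)
        self._flush_if_due(now)

    def tick(self, now):
        """Call periodically so a quiet stream still flushes on the time limit."""
        self._flush_if_due(now)

    def _flush_if_due(self, now):
        if not self.records:
            return
        too_big = self.size >= self.max_bytes
        too_old = now - self.opened_at >= self.max_seconds
        if too_big or too_old:
            self.deliver(list(self.records))
            self.records.clear()
            self.size = 0
            self.opened_at = None


batches = []
buffer = BufferedDelivery(max_bytes=10, max_seconds=60, deliver=batches.append)
buffer.put(b"aaaa", now=0)
buffer.put(b"bbbb", now=1)
buffer.put(b"cccc", now=2)    # 12 bytes buffered: size limit reached, batch delivered
buffer.put(b"dd", now=10)
buffer.tick(now=69)           # 59 seconds old: not yet due
buffer.tick(now=70)           # 60 seconds old: time limit reached, batch delivered
print(len(batches))
```

Larger buffers produce fewer, bigger objects at the destination at the cost of higher delivery latency, which is the trade-off behind the "near real-time" label.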
When to use Kinesis Data Streams and Kinesis Data Firehose

Amazon Kinesis Data Streams: For data streaming applications with massive ingestion requirements
• Requires data to be sent to consumer analytics services for millisecond response time
• Massively scalable
• Data retention time ranging from hours to days
• Example: Real-time gaming

Amazon Kinesis Data Firehose: For data streaming applications that require near real-time responses in seconds
• Need for data augmentation, data transformation, or data compression
• Need to save data to Amazon S3, Amazon Redshift, Amazon ES, Splunk, or send data to Amazon Kinesis Data Analytics for analytics
• Example: Log analytics
Amazon Kinesis Data Analytics

Amazon Kinesis Data Analytics
• Input: Capture streaming data with Amazon MSK, Amazon Kinesis Data Streams, Amazon Kinesis Data Firehose, or other data sources
• Amazon Kinesis Data Analytics: Query and analyze streaming data
• Output: Send processed data to analytics tools to create alerts and respond in real time
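A typical "query and analyze" step on a stream is a windowed aggregation. The sketch below counts events per key in fixed, non-overlapping (tumbling) windows. It is plain Python written to illustrate the idea, not Kinesis Data Analytics syntax; the function name, the 60-second window, and the sample events are assumptions.

```python
from collections import Counter


def tumbling_window_counts(events, window_seconds):
    """Count events per (window start, key) using non-overlapping windows."""
    counts = Counter()
    for timestamp, key in events:
        # Every timestamp falls into exactly one window.
        window_start = int(timestamp // window_seconds) * window_seconds
        counts[(window_start, key)] += 1
    return dict(counts)


# Hypothetical (timestamp in seconds, page) click events.
events = [(0, "home"), (12, "home"), (59, "cart"),
          (60, "home"), (61, "cart"), (119, "cart")]
counts = tumbling_window_counts(events, window_seconds=60)
for (window_start, page), count in sorted(counts.items()):
    print(window_start, page, count)
```

An alert can then be expressed as a threshold on a window's count, for example a sudden spike in one window for one key.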
Use case: Clickstream analytics
Evolve from batch processing to real-time analytics
1. Input: Websites send clickstream data
2. Amazon Kinesis Data Firehose: Collects the data and sends to Kinesis Data Analytics
3. Amazon Kinesis Data Analytics: Processes data in near-real time
4. Amazon Kinesis Data Firehose: Loads processed data into Amazon Redshift
5. Amazon Redshift: Runs analytics models to identify content recommendations
6. Output: Readers see personalized content suggestions and increase engagement
Solution 4: Data governance

Journey to a modern data architecture
Evolution of data architecture, by types of data used:
• Traditional data warehousing
• Data warehouse modernization
• Data lakes on AWS
• Real-time analytics with streaming data
• Data governance
• Machine learning
Challenges of data in data lakes

• Securing data
• Auditing data usage
• Managing data access
• Safeguarding sensitive data and PII
• Maintaining regulations and mandates

Resolving PII dangers
• Do these issues need to be resolved?
• Is there a solution architecture that solves all PII issues?
• What best practices can be used to mitigate PII dangers?
[Diagram] Dangers to personally identifiable information (PII): consumer consent violation, external hacking, data breach, second-party misuse, spyware, unsecured devices, espionage, rogue agents
Amazon Macie
1. Enable Amazon Macie: Enable Amazon Macie with one click in the AWS Management Console or with a single API call
2. Continually evaluate the Amazon S3 environment: Automatically generates an inventory of Amazon S3 buckets and details on the bucket-level security and access controls
3. Discover sensitive data: Analyzes buckets using ML and pattern matching to discover sensitive data, like PII
   • Financial
   • Personal
   • National
   • Medical
   • Credentials and secrets
4. Take action: Generates findings and sends them to Amazon CloudWatch Events for integration into workflows and remediation actions
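To make "pattern matching" concrete, the sketch below scans text for a few sensitive-data shapes with regular expressions. It is a simplified illustration, not how Amazon Macie is implemented: the patterns, the labels, and the function name `find_sensitive` are assumptions, and real detectors combine many more patterns with ML and validation logic.

```python
import re

# Illustrative patterns only; each one is a loose approximation.
PATTERNS = {
    "email": re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "aws_access_key_id": re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b"),
}


def find_sensitive(text):
    """Return (label, matched text) pairs for every pattern hit in the text."""
    findings = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            findings.append((label, match.group()))
    return findings


# Sample values are placeholders, including the documentation-style key ID.
sample = "Contact jane@example.com, SSN 123-45-6789, key AKIAIOSFODNN7EXAMPLE"
for label, value in find_sensitive(sample):
    print(label, value)
```

Pattern matching alone produces false positives (any nine digits in the right shape look like an SSN), which is one reason managed services pair it with ML-based classification.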
Data lakes and machine learning
Machine learning requires:
• More data: Collect all types of data
• Flexibility: Define schema during analysis
• Scalability: Scale storage and compute (CPU or GPU) independently
• Data transformation and processing: Run a broad set of processing and analytics on the same data without movement
• Security: Networking, identity, encryption, and compliance
[Diagram] Data sources (OLTP, ERP, CRM, LOB, devices, web, sensors, social) feed a data warehouse and a data lake with a Data Catalog. These support business analytics, data warehouse and interactive queries, big data and real-time processing, and AI and machine learning.
Amazon SageMaker
Machine learning at enterprise scale

Build (notebooks for common problems, high-performance algorithms)
• Managed Jupyter for enterprise data science
• Sample notebooks for most common use cases
• Single-pass, streaming training algorithms

Train and tune (one-click training, hyperparameter optimization)
• Training models at scale without DevOps assistance
• ML on ML to optimize hyperparameters

Deploy and manage (one-click deployment, fully managed elastic hosting)
• Deploy to production with a single call
• Fully managed, production-grade inferences

https://aws.amazon.com/machine-learning/?nc2=h_ql_prod_ml
Machine learning solutions on data lakes
[Diagram] Components shown: Amazon S3, Kinesis Data Firehose, AWS Glue, Amazon S3 (processed data), Amazon SageMaker, AWS Lambda, Kinesis Data Analytics (real-time ratings), AWS Database Migration Service, Amazon DynamoDB
Use case: Next Caller

https://www.youtube.com/watch?v=K27WjYwyqw8&list=PLhr1KZpdzukdeX8mQ2qO73bg6UKQHYsHb&index=1&did=ta_card&trk=ta_card
Summary

Evolution of data architecture
• Traditional data warehousing
• Data warehouse modernization
• Data lakes on AWS
• Real-time analytics with streaming data: Kinesis Data Streams, Kinesis Data Firehose, Kinesis Data Analytics
• Data governance: Amazon Macie
• Machine learning: Amazon SageMaker
Module 5: AWS Technical
Conversations and Engagement
AWS six-phase strategy
for implementing a data
analytics solution

Data analytics projects: A phased strategy
• Phase 1: Data analytics in the cloud assessment
• Phase 2: Use case identification
• Phase 3: Architecture and data migration
• Phase 4: POC delivery
• Phase 5: Application tuning and optimization
• Phase 6: Migration from POC to production
Phase 1: Data analytics in the cloud assessment: Identify challenges
Goal: Agreement from the customer: “Yes, AWS is the right place to build my data analytics solution.”
Objectives:
• Provide AWS services overview
• Emphasize how the pieces fit together
• Introduce customer references and use cases
• Conduct differentiation conversations
Data analytics in the cloud assessment: AWS best practices
• Do: Remember that different stakeholders have different goals and points of view
  Avoid: Focusing only on the IT department
• Do: Focus on the business value of untapped data
  Avoid: Focusing first on technology instead of business value
Phase 2: Use case identification
Goal: Identify a high-impact use case to run as the customer’s first AWS data analytics POC
Objectives:
• Collect critical business questions
• Identify available data sources
• Match AWS data tools and services to data
• Select business question with most immediate value
Use case exploration
1. Brainstorm as many business discovery questions as possible with the customer
2. Identify the single business discovery question that delivers the most immediate and impactful business value
3. Clearly articulate the question, the data available to inform the answer, and the AWS tools required
Use case identification: AWS best practices
• Do: Focus on specific use cases and associated data
  Avoid: Focusing first on technology instead of business value
• Do: Look for highly variable workloads
  Avoid: Assuming 100 percent active use when calculating costs
• Do: Focus on the benefits of managed services
  Avoid: Emphasizing reductions in IT headcount, staff, and resources
Phase 3: Architecture and data migration
Goal: Define the AWS architecture blueprint and migrate the customer’s data into AWS
Objectives:
• Identify mix of tools and services
• Solidify architecture
• Reinforce with reference architecture
• Migrate customer data
Architecture and data migration: APN Partner best practices
• Do: Engage AWS
  • AWS Partner Development Managers
  • Partner Solutions Architects
  • AWS Professional Services
• Avoid: Engaging AWS Support too late in the process
Phase 4: Proof of concept delivery
Goal: Deliver a successful first project with AWS data analytics offerings that demonstrates immediate, impactful business value
Objectives:
• Build out blueprint architecture
• Tailor to customer use case
• Apply AWS environment best practices
Phase 5: Application tuning and optimization
Goal: Fine-tune the POC
Objectives:
• Tune queries for fastest turnaround time
• Re-evaluate and replace
• Troubleshoot errors and roadblocks
• Automate to optimize cost
Phase 6: Migration from POC to production
Goal: Move the POC into production. Ensure that the POC becomes a real application.
Objectives:
• Communicate POC value to senior management
• Demonstrate value and results to the organization
• Migrate applications from dev, test to production
• Troubleshoot errors and roadblocks
Phase 6: POC to production best practices

Do
• Identify groups and roles in the organization that requested the POC
• Create a thought-out plan
• Set up a continuous integration and continuous delivery (CI/CD) pipeline
• Set up metrics and alarms for production environment
• Continue engagement with the customer
AWS POC best practices

Possible POC pitfalls
• Project incorrectly scoped
• Lack of skills to complete POC
Delivering a successful POC
• Before anything else: Identify the end goal
• Aim for optimal architecture upfront
• Identify how to measure success
POC success factors
• Work with a C-level POC sponsor
• Create a well-scoped SOW
• Use a Partner solutions architect to review the POC and SOW
• Have well-defined deliverables
• Identify all services for the POC
• Follow an agile process
• Track key risks and have a contingency plan
• Use an AWS PDM or AWS PSM to help identify applicable POC funding programs
Example: Defining and measuring success criteria
Each success criterion is listed with its before value, its after value, and a measurement example:
• Cost, capital expenditure (capex) / operational expenditure (opex): Before $1.16 million; After $500,000; 57% savings in capex
• Time-to-deployment: Before 12 months (on premises); After 3 months (AWS Cloud); 75% faster time-to-deployment
• Time-to-value: Before data ingest window of 16 hours; After data ingest window of 7 hours; 56% faster time-to-value
• Time-to-market: Before 9 months; After 1 month; 89% faster in launching new products
• New products: Before 2 new products annually; After 6 new products annually; 200% increase in products launched
• Data availability: Before batch only; After micro-batch, real-time streaming; 80% faster time to data visibility
• Customer engagement: Before 30,000 page views; After 37,500 page views; 25% increase in customer engagement
https://calculator.aws/#/addService
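The measurement examples are simple before/after ratios. The sketch below shows the arithmetic for four of the criteria, using the before and after figures from the example. The helper names are hypothetical, and "savings" or "faster" is assumed to mean the reduction expressed as a share of the before value.

```python
def percent_reduction(before, after):
    """Reduction from before to after, as a percentage of the before value."""
    return (before - after) / before * 100


def percent_increase(before, after):
    """Increase from before to after, as a percentage of the before value."""
    return (after - before) / before * 100


capex_savings = percent_reduction(1_160_000, 500_000)      # dollars
deployment_speedup = percent_reduction(12, 3)              # months
product_growth = percent_increase(2, 6)                    # products per year
engagement_growth = percent_increase(30_000, 37_500)       # page views

print(f"Capex savings: {capex_savings:.0f}%")
print(f"Faster time-to-deployment: {deployment_speedup:.0f}%")
print(f"Increase in products launched: {product_growth:.0f}%")
print(f"Increase in customer engagement: {engagement_growth:.0f}%")
```

Agreeing on the formula with the customer before the POC starts avoids disputes later about what a percentage in the results actually measures.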
AWS Well-Architected review
using the Analytics Lens
10 design principles:
Analytics applications, 1–5
1. Automate data ingestion to handle big data
2. Design ingestion for failures and duplicates
3. Preserve original source data
4. Describe data with metadata
5. Establish data lineage
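Principle 2 (design ingestion for failures and duplicates) usually comes down to making ingestion idempotent, because producers retry after failures and the same record can arrive more than once. The sketch below drops events whose ID has already been ingested. The `event_id` field, the in-memory `seen_ids` set, and the list used as a sink are assumptions for illustration; a real pipeline would need a durable store for seen IDs.

```python
def ingest(events, seen_ids, sink):
    """Append unseen events to the sink and return how many were accepted."""
    accepted = 0
    for event in events:
        event_id = event["event_id"]
        if event_id in seen_ids:
            continue  # duplicate delivery, for example after a producer retry
        sink.append(event)
        seen_ids.add(event_id)
        accepted += 1
    return accepted


seen_ids = set()
sink = []
first_batch = [{"event_id": "e1", "score": 10}, {"event_id": "e2", "score": 7}]
retried_batch = [{"event_id": "e2", "score": 7}, {"event_id": "e3", "score": 4}]

print(ingest(first_batch, seen_ids, sink))    # both events are new
print(ingest(retried_batch, seen_ids, sink))  # e2 is skipped, e3 is accepted
```

Because replaying a batch leaves the sink unchanged, the producer can safely retry whenever it is unsure whether a write succeeded.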

10 design principles:
Analytics applications, 6–10
6. Use the right ETL tool for the job
7. Orchestrate ETL workflows
8. Tier storage appropriately
9. Secure, protect, and manage the entire analytics pipeline
10. Design for scalable and reliable analytics pipelines

Activity: Game Analytics Pipeline Architecture
Game analytics pipeline solution architecture

Role
You are a Partner solution engineer (SE) helping a cloud gaming architect at a hot new startup.

Goal
Whiteboard an AWS architecture for a game analytics pipeline for a multi-player game with over five million gamers worldwide.

Requirements
• Enable the ingestion of streaming data from millions of gamers playing from their desktop PCs now, and eventually mobile devices
• Enable customers to capture real-time analytics, monitoring the game and gamers to improve the gamer experience and the game, and for monetization
• Enable internal teams to track activities such as key performance indicators (KPIs), system performance, user activity, gamer satisfaction reporting, and expenses

Constraints
• Small IT staff
• Low budget
Whiteboard: Game analytics pipeline architecture

Who are the actors in a gaming application?

Whiteboard: Data producers and consumers
[Diagram]
• Data producers: PCs (AWS SDK)
• Data consumers: Live ops, service teams, data engineers, data analysts

What do the gamers generate?
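As one possible answer to the question above, gamers generate a continuous stream of small events. The sketch below models such an event and serializes it into the shape a streaming producer would send. The schema (field names, event types) is entirely hypothetical; the `PartitionKey` and `Data` names mirror the parameters of the Kinesis `PutRecord` operation, but no AWS call is made here.

```python
import json
import time
import uuid
from dataclasses import asdict, dataclass, field


@dataclass
class GameEvent:
    """Hypothetical gameplay event emitted by a game client."""
    player_id: str
    event_type: str
    attributes: dict = field(default_factory=dict)
    event_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    timestamp: float = field(default_factory=time.time)

    def to_record(self):
        """Serialize to a partition key plus a JSON data blob."""
        return {
            "PartitionKey": self.player_id,  # keeps one player's events on one shard
            "Data": json.dumps(asdict(self)).encode("utf-8"),
        }


event = GameEvent("player-42", "level_complete", {"level": 3, "duration_s": 95})
record = event.to_record()
print(record["PartitionKey"], len(record["Data"]), "bytes")
```

Including a unique `event_id` in every event lets downstream stages detect duplicates, and a client-side timestamp lets analytics window on when the event happened rather than when it arrived.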
Whiteboard: Data producers and consumers
[Diagram]
• Data producers: PCs (AWS SDK)
• Events stream: To be determined
• Data consumers: Live ops, service teams, data engineers, data analysts

Which AWS services would be most suitable for ingesting the real-time streaming gaming data?
Whiteboard: Events stream
[Diagram]
• Data producers: PCs (AWS SDK)
• Events stream: Kinesis Data Streams
• Data consumers: Live ops, service teams, data engineers, data analysts

Which AWS services would be most suitable for processing the event streaming data?
Whiteboard: Streaming analytics
[Diagram]
• Data producers: PCs (AWS SDK)
• Events stream: Kinesis Data Streams
• Streaming analytics: Kinesis Data Analytics, AWS Lambda
• Data consumers: Live ops, service teams, data engineers, data analysts

Which AWS services would be most suitable for making the processed data available to data consumers?
Whiteboard: Metrics and notifications
[Diagram]
• Data producers: PCs (AWS SDK)
• Events stream: Kinesis Data Streams
• Streaming analytics: Kinesis Data Analytics, AWS Lambda
• Metrics and notifications: CloudWatch, Amazon SNS
• Data consumers: Live ops, service teams, data engineers, data analysts

How and where should you store massive amounts of growing data? Which Kinesis services can write data to storage?
Whiteboard: Streaming ingestion
[Diagram]
• Data producers: PCs (AWS SDK)
• Events stream: Kinesis Data Streams
• Streaming analytics: Kinesis Data Analytics, AWS Lambda
• Metrics and notifications: CloudWatch, Amazon SNS
• Streaming ingestion: Kinesis Data Firehose, AWS Lambda
• Data consumers: Live ops, service teams, data engineers, data analysts

What do you do with augmented streaming data? Which AWS services are ideal for this?
Whiteboard: Data lake integration and ETL
[Diagram]
• Data producers: PCs (AWS SDK)
• Events stream: Kinesis Data Streams
• Streaming analytics: Kinesis Data Analytics, AWS Lambda
• Metrics and notifications: CloudWatch, Amazon SNS
• Streaming ingestion: Kinesis Data Firehose, AWS Lambda
• Data lake integration and ETL: Amazon S3, AWS Glue
• Data consumers: Live ops, service teams, data engineers, data analysts

How will the business users be able to view the data? Which AWS tools would you recommend?
Whiteboard: Data visualization and interactive analytics
[Diagram]
• Data producers: PCs (AWS SDK)
• Events stream: Kinesis Data Streams
• Streaming analytics: Kinesis Data Analytics, AWS Lambda
• Metrics and notifications: CloudWatch, Amazon SNS
• Streaming ingestion: Kinesis Data Firehose, AWS Lambda
• Data lake integration and ETL: Amazon S3, AWS Glue
• Interactive analytics: Athena, QuickSight
• Data consumers: Live ops, service teams, data engineers, data analysts

Which services are needed to complete the full architecture?
Game analytics pipeline architecture
[Diagram]
• Data producers: PC, mobile, servers and backend (AWS SDK)
• Solution API: API Gateway (events), Lambda authorizer
• Events stream: Kinesis Data Streams
• Streaming analytics: Kinesis Data Analytics, AWS Lambda
• Metrics and notifications: CloudWatch, Amazon SNS
• Streaming ingestion: Kinesis Data Firehose, AWS Lambda
• Data lake integration and ETL: Amazon S3, AWS Glue
• Interactive analytics: Athena, QuickSight
• Configuration data: Admin configures apps through configuration endpoints, AWS Lambda, and DynamoDB
• Data consumers: Live ops, service teams, data engineers, data analysts
Try it yourself

Game Analytics Pipeline references
• AWS Solution Implementation: Game Analytics Pipeline
  https://aws.amazon.com/solutions/implementations/game-analytics-pipeline/
• Serverless Game Analytics Workshop
  https://serverless-game-analytics.workshop.aws/en
• New Game Technology Learning Path
  https://aws.amazon.com/blogs/gametech/new-game-tech-learning-path-is-your-training-walkthrough/
Demo - Serverless Data Lake
[Diagram] Real-time data streaming into a serverless data lake. Components shown: raw data (JSON), Glue ETL, transformed data (Parquet), Glue crawler, Glue Data Catalog, Athena, QuickSight
Module 6: APN Partner
Opportunities and Resources
APN Partners and
AWS for Data Analytics

Discounting and funding programs
• Migration and POC funding programs
• Enterprise Discount Program (EDP)
AWS Data and Analytics Competency categories
• Data Analytics Platforms: Provide a set of integrated tools to solve data analytics challenges within a standard framework
• NoSQL/New SQL: Provide highly scalable databases that organize data into a structure
• Data Integration and Preparation: Enable customers to move and consolidate data from disparate sources, transform it, and prepare it for analytics
• Business Intelligence (BI) and Data Visualization: Help customers turn raw data into actionable business information, such as reporting, dashboards, and data visualization
• Data Governance and Security: Help customers discover, categorize, and control their data
Best practices after identifying an opportunity
• Register your opportunity through APN Partner Central
• Use existing Partner programs
• Achieve AWS Data and Analytics competency
• Cultivate strong relationships with AWS sales teams
Collaboration workflow

Before SA involvement
• Register an opportunity on APN Partner Central
• Receive approval from AWS PSM
• Engage AWS sales
• Engage AWS account or Partner SA

Direct SA involvement
• Conduct a big data POC
• Validate the POC
• Build a reference solution
• Build and deliver the live solution
AWS Professional Services
• Global team of experts
• Collaborate with APN Partners to help customers realize their desired business outcomes in AWS Cloud
• Reach out to APN Partners when they need additional resources

AWS Professional Services: https://aws.amazon.com/professional-services/
Data solutions in AWS
Marketplace

AWS Marketplace

https://aws.amazon.com/marketplace/search/results?searchTerms=data+and+analytics
Call to action

Build a data analytics practice on AWS
• Build packaged solutions
• Know your Partner Solutions Architect
• Ask for customer references
• Develop customer workshops
• Engage with AWS service teams
• Achieve an APN competency
Call to action
• Use the Data Flywheel to perform assessments
• Work with your Partner team to schedule an Immersion Day for your customers
• View the analytics customer case studies: https://aws.amazon.com/big-data/datalakes-and-analytics/
• Create a specialized service around one of the analytics services
• Participate in the AWS Data Lab: https://aws.amazon.com/aws-data-lab/
• Build relationships with APN teams for funding opportunities for your marketing and sales efforts
• Prepare for the AWS Data Analytics – Specialty certification
Thank you

© 2022 Amazon Web Services, Inc. or its affiliates. All rights reserved. This work may not be reproduced or redistributed, in whole or in part, without prior written permission
from Amazon Web Services, Inc. Commercial copying, lending, or selling is prohibited. Corrections, feedback, or other questions? Contact us at
https://support.aws.amazon.com/#/contacts/aws-training. All trademarks are the property of their owners.
