© 2022, Amazon Web Services, Inc. or its affiliates. All rights reserved.

ANT336

Building data mesh architectures on AWS

Ian Meyers, Director, Product Management, AWS Analytics Service
Nivas Shankar, Principal Product Manager, AWS Lake Formation
Travis Muhlestein, Chief Data & Analytics Officer, GoDaddy

Agenda

Modern data strategy on AWS
Customer challenges
Why build a data mesh?
Data mesh – Design goals and four core principles
Data mesh architecture on AWS
Why GoDaddy chose a data mesh pattern
How GoDaddy built a data mesh using AWS modern data architecture
Conclusion

Modern data strategy

[Diagram elements: data lakes, analytics, machine learning, databases, catalog, governance, data sources, and people, apps, and devices]

Modern data strategy on AWS

[Services shown: Amazon S3, Amazon DynamoDB, Amazon EMR, Amazon OpenSearch Service, Amazon Aurora, Amazon Redshift, Amazon SageMaker]

Security, durability, availability
Simplicity, ease of use, and ease of operations
Price/performance
Data connectivity and integration
Data governance

Customer challenges

More sources: data sources are increasing with the pace of the business
More business units: data and analytics are table stakes for innovation, and all business units are all-in
Stronger governance: enterprises need stronger governance to protect their data so that it can be used safely

Sharing data in an enterprise can be challenging

Lines of business have made investments in cloud-based data lakes and analytics that are purpose-built to solve their specific business problem
These systems are often unique to the type of data and the algorithms being applied, and don’t always translate to other problems

[Diagram: many separate data lakes]

Common data lake challenges we hear from our customers

“Finding the data I need is too difficult”
“I just want to get access to the data I need”
“It’s difficult to meet all requirements across differing business units”
“I wish to focus on innovating with data, not on maintaining and administering a data lake”
“My data science team should easily find the datasets they seek and have the ability to share them with others”
“My team needs to own datasets, pipelines, and repositories that are isolated from other teams”
“Why doesn’t our organization treat data as a product?”
“If I share data, I’ve lost control”
“Current data architecture is complex, monolithic, and slow to change”
“There is a mismatch between executive leadership goals, business line deliverables, and incentives”
“Our internal policies on what can be shared are unclear, and there is a lack of incentive to share”
“I need to create a model to support sharing from both producers and consumers of data”

Sharing data in an enterprise can be challenging

It may seem that the only way to effectively govern data is through a single, centralized architecture and technology
This can increase adoption friction and reduce the velocity of delivering business outcomes

[Chart: delivery velocity over time]

Centralized data lakes are complex to scale across business units

Analytics, business intelligence, and machine learning: Amazon Redshift (data warehousing), Amazon EMR (Hadoop + Spark), Amazon Athena (interactive analytics), AWS Data Exchange, Amazon QuickSight (visualizations), Amazon SageMaker (ML), Amazon Kinesis Data Analytics (real time), Amazon OpenSearch Service (operational analytics), third-party BI tools
AWS Lake Formation: access control, Data Catalog
Data lake: Amazon S3 data lake storage
Data movement: AWS Database Migration Service | AWS Snowball | AWS Snowmobile | Amazon Kinesis Data Firehose | Amazon Kinesis Data Streams | Amazon Managed Streaming for Apache Kafka

Sharing data in an enterprise can be challenging

Often data is being shared (and maybe reshared), but in an ad hoc, ungoverned way based on team connections
This can increase risk associated with protecting sensitive data

Sharing data in an enterprise can be challenging
Organizational incentives around data sharing can be misaligned

“Everyone wants to be a consumer, no one wants to be a producer”

[Image: “Fresh quality data for sale” sign]

Why data mesh?

Use existing investments in data platforms and treat them as independent "domains"
Improve data governance by pushing policy down into data domains
Provide a clear mechanism for centralized data discovery
Provide self-service data sharing features that allow domain owners to grant access to consumers
Measure and invest in data products based on usage and business value

[Diagram: a data domain (Amazon S3, AWS Glue Data Catalog, AWS Lake Formation, Amazon Redshift) connected to the data mesh through resource links. Mesh elements: AWS Control Tower; self-service provisioning through the AWS Console, AWS SDK, AWS Service Catalog, and an AWS CloudFormation template library; Lake Formation data governance; resource sharing policies; AWS Organizations service control policies]

Data mesh: Design goals and four key principles

Data mesh: Four core principles

Data domain ownership (data owner)
A data mesh features data domains as nodes, which exist in data lake accounts; it is founded on decentralization and on distributing data responsibility to the people closest to the data

Federated computational governance (data steward)
Federated data governance is how data products are shared, delivering discoverable metadata and auditability based on federated decision-making and accountability structures

Data as a product (data engineer)
A data producer contributes one or more data products to a central catalog in a data mesh account; data as a product (DaaP) must be autonomous, discoverable, secure, correct, and usable

Self-serve sharing (data consumer)
The platform streamlines the experience of data users to discover, access, and use data products; it also streamlines the experience of data providers to build, deploy, and maintain data products

Data mesh design goals

Enable organizations to get value from data at scale
Create business-oriented data products that can support the top strategic goals
Give business domains federated governance through lightweight centralized policy, removing bottlenecks
Encourage data-driven agility
Support the sharing of data products, with the goal of delighting data users

Modern data architecture consists of five parts

Scalable data lake and warehouses
Purpose-built data services: non-relational databases, big data processing, log analytics, relational databases, data warehousing, machine learning
Data sharing
Unified governance
Data discovery

[Diagram: the five parts arranged around a central data domain]

Data mesh architecture
Decentralized, lightweight federated governance across domain-oriented data systems to drive governed sharing

Consumers 1 to N: each consumer is a domain using services such as Amazon Redshift, Amazon EMR, Amazon Athena, Amazon QuickSight, Amazon SageMaker, and notebooks
Unified policy management (federated governance): centralized governance & audit; federated access control; organization-wide resource sharing; data share; DataZone data projects and data products
Producers 1 to N: each producer is a domain with Amazon Simple Storage Service (S3) and an Amazon Redshift data share

Data mesh principle #1: Data domain ownership
Data owners are accountable for making their data products reliable, available, and accurate

Data owner: accountability for the data domain and the consumption of its data products
› Create: owns data created or transformed in my organization
› Store: owns data stored in my organization
› Share: owns how my organization uses the data

Owner responsibilities
› Register and catalog data assets
› Protect and secure organization data
› Manage data quality
› Maintain ease of use and atomic integrity

[Diagram: AWS Organizations with organizational units (lines of business), each holding bounded contexts across a data lake, data marketplace, and data warehouse. LOB 1 covers Product 1 and Product 2, LOB 2 covers products and services, and a general reference data domain covers country, currency, and geography. Account types shown: producer, analytical, operational, business unit, and customer.]
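The ownership rule above can be sketched as a small in-memory registry in which every data product maps to exactly one accountable domain, and each domain is an AWS account inside an organizational unit (line of business). This is a minimal illustration under those assumptions, not an AWS API; `DataDomain`, `OwnershipRegistry`, and the account IDs are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataDomain:
    """A domain node: one AWS account inside an organizational unit (line of business)."""
    name: str
    account_id: str           # hypothetical 12-digit AWS account ID
    organizational_unit: str  # e.g. "LOB 1" or "General"


@dataclass
class OwnershipRegistry:
    """Maps each data product name to exactly one accountable owning domain."""
    owners: dict = field(default_factory=dict)

    def register(self, product: str, domain: DataDomain) -> None:
        current = self.owners.get(product)
        # Re-registering by the same owner is allowed; a second owner is rejected.
        if current is not None and current != domain:
            raise ValueError(f"{product!r} is already owned by {current.name!r}")
        self.owners[product] = domain

    def owner_of(self, product: str) -> DataDomain:
        return self.owners[product]


# Usage: LOB 1 owns its product data; a shared reference-data domain owns "country".
lob1 = DataDomain("lob1-products", "111111111111", "LOB 1")
reference = DataDomain("reference-data", "222222222222", "General")
registry = OwnershipRegistry()
registry.register("product_1", lob1)
registry.register("country", reference)
```

Rejecting a second owner keeps accountability unambiguous; in this sketch, transferring ownership would have to be a separate, explicit operation.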

Data mesh principle #2: Data as a product
Domain-driven design techniques to formulate and establish bounded contexts for data products

Data product properties: natively accessible; discoverable & addressable; trustworthy & truthful; interoperable

› Each team manages its data and organizes it as data products
› Each product provides one or more interfaces that allow others to interact with it (e.g., APIs, SQL, reports)
› Remove usability friction and meet users where they are
› Provide all supporting metadata, including lineage
› Data products are valuable on their own

[Diagram: a data engineer; a data domain with a data registry, data lake, data warehouse, data marketplace, data lifecycle, and operationalized data quality, built with AWS Glue and Amazon EMR; sources such as devices, web, sensors, and social; federated data governance through AWS Lake Formation (Data Catalog, data attributes, policy control, data permissions)]
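One way to make the data-as-a-product properties checkable is to describe each product with the metadata the slide lists (owner, interfaces, lineage) and report which properties are still unsupported. This is a hypothetical sketch: the `DataProduct` fields and the mapping from fields to properties are illustrative assumptions, not a Lake Formation or AWS Glue schema.

```python
from dataclasses import dataclass, field


@dataclass
class DataProduct:
    name: str
    owner_domain: str
    description: str = ""
    catalog_address: str = ""  # hypothetical "database.table" address in the catalog
    interfaces: list = field(default_factory=list)  # e.g. ["sql", "api", "report"]
    lineage: list = field(default_factory=list)     # upstream datasets
    tags: dict = field(default_factory=dict)        # e.g. {"classification": "internal"}


def missing_properties(product: DataProduct) -> list:
    """Return the data-as-a-product properties this descriptor does not yet support."""
    missing = []
    if not (product.description and product.tags):
        missing.append("discoverable")
    if not product.catalog_address:
        missing.append("addressable")
    if not product.interfaces:
        missing.append("natively accessible")
    if not product.lineage:
        missing.append("trustworthy")
    return missing


orders = DataProduct(
    name="orders_daily",
    owner_domain="commerce",
    description="Daily order facts",
    catalog_address="commerce_db.orders_daily",
    interfaces=["sql"],
    lineage=["commerce_db.orders_raw"],
    tags={"classification": "internal"},
)
```

A catalog onboarding step could, for example, refuse to publish a product until `missing_properties` returns an empty list.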

Data mesh principle #3: Self-serve sharing
Ecosystem of self-serve data infrastructure with open protocols

› Design for the generalist majority (i.e., make it easy to use and adopt, with no specialist skills needed)
› Enable personas to discover, learn, understand, consume, and maintain data products
› Build a collection of interoperable data products that lets cross-functional domains produce and consume data easily and autonomously, which allows the mesh to scale
› Data products must include data, metadata, code, and policy, all as a single unit of value
› Abstract complexity through automation

[Diagram: data consumers using Amazon Athena, AI/ML, Amazon Redshift, and Amazon QuickSight; steps to authorize the principal and to query from client services; federated data governance through AWS Lake Formation (Data Catalog, data attributes, policy control, data permissions); a data domain with a data lake, data warehouse, and data marketplace]
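“Abstract complexity through automation” can be as simple as turning an approved access request into an AWS Lake Formation grant. A minimal sketch, assuming the caller passes a `boto3.client("lakeformation")` (or any object with the same `grant_permissions` method) and that the request has already been approved; the function name and example identifiers are hypothetical.

```python
def grant_table_access(lf_client, consumer_principal, catalog_id, database, table,
                       columns=None):
    """Grant SELECT on a table, or on a subset of its columns, to a consumer principal."""
    if columns:
        # Column-level grant: only the listed columns become readable.
        resource = {"TableWithColumns": {
            "CatalogId": catalog_id,
            "DatabaseName": database,
            "Name": table,
            "ColumnNames": list(columns),
        }}
    else:
        resource = {"Table": {
            "CatalogId": catalog_id,
            "DatabaseName": database,
            "Name": table,
        }}
    return lf_client.grant_permissions(
        Principal={"DataLakePrincipalIdentifier": consumer_principal},
        Resource=resource,
        Permissions=["SELECT"],
    )


# Example call (requires AWS credentials and Lake Formation admin rights):
# import boto3
# lf = boto3.client("lakeformation")
# grant_table_access(lf, "arn:aws:iam::333333333333:role/analyst",
#                    "111111111111", "sales", "orders", columns=["order_id", "amount"])
```

Wrapping the grant in one function means a request portal or ticket workflow only has to collect the principal, table, and optional columns; the Lake Formation request shape stays in one place.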

Data mesh principle #4: Federated data governance
Governance model that embraces decentralization and domain self-sovereignty through decision-making led by a federation of data product owners

[Diagram: federated access control to data owners. Elements: data domain owners A and B; data domains A, B, and N; DataZone; a data consumer and a data engineer; consumer Y; security credentials, access auditing, and entitlements; data stewards (organization); PCI/SOX auditors]

› Decentralized implementation of governance, with standard authorization
› The governance team is a guild of representatives from all teams taking part in the data mesh
› The guild creates global policies and standards to achieve interoperability
› Policies are executed automatically by the data domains (e.g., data classification and privacy, compliance, security, documentation, and interoperability)
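The split between global policies and execution by the domains maps naturally onto Lake Formation tag-based access control (LF-Tags): the governance guild defines a shared tag vocabulary, domains tag their own tables, and grants are expressed against tag values instead of individual tables. A minimal sketch, assuming a `boto3.client("lakeformation")` is passed in; the `classification` tag key and its values are hypothetical.

```python
TAG_KEY = "classification"                          # hypothetical, agreed by the guild
TAG_VALUES = ["public", "internal", "confidential"]  # hypothetical global vocabulary


def define_classification_tag(lf_client, catalog_id):
    """Central step: create the organization-wide LF-Tag once."""
    lf_client.create_lf_tag(CatalogId=catalog_id, TagKey=TAG_KEY, TagValues=TAG_VALUES)


def tag_table(lf_client, catalog_id, database, table, value):
    """Domain step: each owner classifies its own tables with the shared tag."""
    lf_client.add_lf_tags_to_resource(
        CatalogId=catalog_id,
        Resource={"Table": {"CatalogId": catalog_id, "DatabaseName": database, "Name": table}},
        LFTags=[{"CatalogId": catalog_id, "TagKey": TAG_KEY, "TagValues": [value]}],
    )


def grant_by_tag(lf_client, catalog_id, principal, allowed_values):
    """Grant SELECT/DESCRIBE on every table whose classification is in allowed_values."""
    lf_client.grant_permissions(
        CatalogId=catalog_id,
        Principal={"DataLakePrincipalIdentifier": principal},
        Resource={"LFTagPolicy": {
            "CatalogId": catalog_id,
            "ResourceType": "TABLE",
            "Expression": [{"TagKey": TAG_KEY, "TagValues": list(allowed_values)}],
        }},
        Permissions=["SELECT", "DESCRIBE"],
    )
```

Because the grant targets a tag expression, a table that a domain later tags `internal` falls under the same grant without a new per-table permission.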

Data mesh architecture pattern: Data lake data products sharing

Accounts: data domain organizations 1 to N (producers), a central organization (federated governance account), and data domain organizations 1 to N (consumers)

1. Register data location
2. Grant tables and column tags
3. Create local database
4. Create table resource link
5. Transform & enrich (AWS Glue, Amazon EMR) and populate Data Catalog metadata
6. Grant share tables and column tags
7. Grant consumer permission
8. Subscribe and query shared data products (Amazon Athena, Amazon Redshift)
9. Share asset collections (AWS Data Exchange)

The central catalog (AWS Lake Formation Data Catalog, with data projects and data products) is updated regularly from each data domain.
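Steps 1, 3, and 4 of this pattern correspond to concrete API calls. A minimal sketch, assuming `boto3` Lake Formation and AWS Glue clients are passed in, that the producer has already shared the table with the calling account (step 2), and that the bucket, database, and table names are hypothetical placeholders.

```python
def register_location(lf_client, bucket, prefix):
    """Step 1 (producer): register an S3 location so Lake Formation governs it."""
    arn = f"arn:aws:s3:::{bucket}/{prefix}"
    lf_client.register_resource(ResourceArn=arn, UseServiceLinkedRole=True)
    return arn


def create_resource_link(glue_client, local_db, link_name,
                         producer_catalog_id, producer_db, producer_table):
    """Steps 3-4: create a local database, then a table resource link inside it."""
    # Raises AlreadyExistsException if the database exists; a real script would check first.
    glue_client.create_database(DatabaseInput={"Name": local_db})
    # A resource link is a catalog table whose TargetTable points at the shared table.
    glue_client.create_table(
        DatabaseName=local_db,
        TableInput={
            "Name": link_name,
            "TargetTable": {
                "CatalogId": producer_catalog_id,
                "DatabaseName": producer_db,
                "Name": producer_table,
            },
        },
    )
```

Consumers then query the link name (for example from Athena), and Lake Formation resolves it to the producer's table, subject to the grants made in steps 6 and 7.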

Data mesh architecture pattern: Data lake and data warehouse data product sharing

Accounts: data domain organizations 1 to N (producers), a central organization (federated governance account), and data domain organizations 1 to N (consumers)

1. Register data location (data lake, Data Catalog)
2. Register data share tables (Amazon Redshift data share)
3. Grant data share tables
4. Transform & enrich and populate metadata
5. Grant data share tables
6. Query shared dataset

Each account's AWS Lake Formation Data Catalog holds tables and column tags; the central catalog also holds data projects and data products.
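On the warehouse side, the producer's data share in step 2 is created with Amazon Redshift data sharing SQL. The helper below only builds the statements; it assumes trusted, already-valid identifiers (no quoting or escaping is done), and the share, schema, table, account, and namespace values are hypothetical. Cross-account shares must also be authorized by the producer and associated by the consumer, and routing the share through the Lake Formation catalog as this slide shows involves further setup; neither is covered here.

```python
def producer_datashare_sql(share, schema, tables, consumer_account):
    """Statements a producer cluster runs to publish tables through a data share."""
    statements = [
        f"CREATE DATASHARE {share};",
        f"ALTER DATASHARE {share} ADD SCHEMA {schema};",
    ]
    statements += [f"ALTER DATASHARE {share} ADD TABLE {schema}.{t};" for t in tables]
    statements.append(f"GRANT USAGE ON DATASHARE {share} TO ACCOUNT '{consumer_account}';")
    return statements


def consumer_database_sql(local_db, share, producer_account, producer_namespace):
    """Statement a consumer runs to mount the share as a local database."""
    return (f"CREATE DATABASE {local_db} FROM DATASHARE {share} "
            f"OF ACCOUNT '{producer_account}' NAMESPACE '{producer_namespace}';")


for stmt in producer_datashare_sql("sales_share", "public", ["orders"], "444455556666"):
    print(stmt)
```

Generating the statements from a product definition keeps what is shared in code review rather than in ad hoc console sessions.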

GoDaddy’s journey with AWS

Our vision
is to radically shift the global economy toward
life-fulfilling entrepreneurial ventures

Our mission
is to empower entrepreneurs everywhere,
making opportunity more inclusive for all

Our strategy
We champion everyday entrepreneurs by
empowering them with sage guidance set in
seamlessly intuitive experiences
to securely name, create, and grow their ventures in
select markets, leveraging the exponential power of
our community at global scale to deliver profitable
revenue growth

At GoDaddy, our goal is to partner with our customers at every point on this wheel

[Diagram wheel: domain, logo, email, website, bio site, create posts, hosting and security, online store, physical store, offline point of sale, payments, messaging commerce, marketplaces and social media, customer relationship management; segments: digital identity, ubiquitous presence, connected commerce]

The largest domain registrar in the world
One of the world’s top WordPress hosting brands
One of the top-branded email providers in the world

21M+ customers | 84M+ domains | 15M+ human-guided moments*
85% customer retention | 12% website share | 65+ Care net promoter score (NPS)
55 global sites | 10M+ mailboxes | $26B gross merchandise volume (GMV)

GoDaddy & AWS – 5 years of strategic collaboration

2018: GoDaddy selects AWS for its scale, performance, and acceleration; app services established & first TLZs; 27 teams onboarded to AWS (on goal of 7)
2019: Public cloud portal launches; 130+ teams and 800 developers migrate to AWS (7/2019); 100% new product development on AWS
2020: GoDaddy on-premises Hadoop migration begins; 150 teams & 97 prod workloads on AWS; MAIT Team established; Amazon Prime collaboration with websites and marketing
2021: Neustar migration to AWS; Amazon registrar migrates 113K domains to GD; 300+ teams, 165 prod workloads, and 651 projects on AWS; security scans, golden AMIs, node rotation
2022: Websites & marketing migration begins; migrated from Hadoop to AWS; migration of 100K customer website builder sites into AWS
2023: Amazon registrar plans to migrate 1 million domains to GoDaddy

Why GoDaddy chose a data mesh pattern

Data mesh: Can a data mesh create customer clarity?
Multi-domain – insights & sharing

[Diagram: data discovery + data governance + data sharing = a unified data experience for the data consumer; activities shown: discovery, preparation, augmentation, interaction, exploration]

Multi-domain federated customer actions

[Figure: customer signals (care, churn risk, user conversion, upsell/cross-sell, high and low product use) shown for cohorts such as first-time users, customers, bots, and repeat visitors, across journey stages such as learn, manage products, shop, and setup]

Example values shown:
First milestone use: 10,480
Manage products: 11,202
Product engagement: 20,042
Entitlements: GoDaddy Payments, W+M, Office 365
Upcoming renewal: Web+Marketing; upcoming renewal date: 10/2/2023; autopay: on
Top product name: WAM; total active time: 14 hrs 22 mins; websites published: 23; websites updated: 204

GoDaddy data mesh – Customer layers

Data owner: data domain ownership
Data engineer: data as a product
Data steward: data governance council
Data consumer: self-serve sharing

Customer tiers (multi-domain products)
Visiting user (tier 0)
Prospective customer (tier 1)
Highly or lowly engaged (tier 2)
Conversion (tier 3)
Account (tier 4)
High-value account (tier 5)
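The customer tiers can be represented as an ordered enumeration, with each customer placed in the highest tier whose signal is present. The signal names and the tier-to-signal mapping below are hypothetical illustrations of how several producer domains' data might feed one multi-domain product; the slide does not define GoDaddy's actual tiering rules.

```python
from enum import IntEnum


class CustomerTier(IntEnum):
    VISITING_USER = 0
    PROSPECTIVE_CUSTOMER = 1
    ENGAGED = 2             # "highly or lowly engaged"
    CONVERSION = 3
    ACCOUNT = 4
    HIGH_VALUE_ACCOUNT = 5


# Hypothetical signal per tier, checked from the highest tier down; in a mesh each
# signal could come from a different producer domain (web, commerce, billing, ...).
TIER_SIGNALS = [
    (CustomerTier.HIGH_VALUE_ACCOUNT, "high_value"),
    (CustomerTier.ACCOUNT, "has_account"),
    (CustomerTier.CONVERSION, "purchased"),
    (CustomerTier.ENGAGED, "engaged"),
    (CustomerTier.PROSPECTIVE_CUSTOMER, "signed_up"),
]


def assign_tier(signals: dict) -> CustomerTier:
    """Return the highest tier whose signal is truthy; default to a visiting user."""
    for tier, key in TIER_SIGNALS:
        if signals.get(key):
            return tier
    return CustomerTier.VISITING_USER
```

Using `IntEnum` keeps tiers comparable (`tier >= CustomerTier.CONVERSION`), which is convenient when consumers filter on a minimum tier.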

How GoDaddy built a data mesh using AWS modern data architecture

Data mesh: Data owner, data domain ownership

[Diagram: producer domains (systems of record, utility domains, 1st-party and 3rd-party data, services, products) bring data ingress into Amazon S3 through self-service*, built on AWS native and third-party services]

Data mesh: Data consumer, self-serve sharing

[Diagram: producer domains bring data ingress into Amazon S3; consumer domains receive data products and business insights through data egress; both sides are self-service*, built on AWS native and third-party services]

Data mesh: Data engineer, data as a product

[Diagram: adds streaming & batch data processing and a data lake (storage & metadata) on Amazon S3 between the producer and consumer domains. Services and tools shown: Amazon Redshift, Amazon EMR, Amazon Athena, Spark, Tecton, Amazon SageMaker, Tableau Cloud, Alation, AWS Glue, searchability (catalog), and AWS Lake Formation]

Data mesh: Data steward, data governance council

[Diagram: adds enhanced governance, covering data governance* and the data interfaces]

Data mesh: Utility domains
Self-service, modern cloud data platform: delivers reliable, secure, near real-time customer 360 data

[Diagram: adds notebooks for machine learning, data tooling, and experimentation]

Data mesh

[Diagram: the complete platform with steps 1 to 5 marked: producer domains (1), data lake (2), notebooks (3), governance (4), and data egress (5)]

Data mesh: Producer domains

1. Producers instrument DOMAIN* telemetry and share data via the data platform
2. Producers register DOMAIN* data in the catalog

[Diagram: producer domains (systems of record, utility domains, 1st-party and 3rd-party data, services, products) send data ingress through self-service*]
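Registering domain data in the catalog (step 2) could look like the following for a Parquet dataset on Amazon S3. A minimal sketch, assuming a `boto3.client("glue")` is passed in and the target database already exists; the database, table, column, and S3 names are hypothetical.

```python
PARQUET_IN = "org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat"
PARQUET_OUT = "org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat"
PARQUET_SERDE = "org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe"


def parquet_table_input(name, s3_location, columns, partition_keys=()):
    """Build a Glue TableInput for an external Parquet table (columns: (name, type) pairs)."""
    return {
        "Name": name,
        "TableType": "EXTERNAL_TABLE",
        "Parameters": {"classification": "parquet"},
        "PartitionKeys": [{"Name": n, "Type": t} for n, t in partition_keys],
        "StorageDescriptor": {
            "Columns": [{"Name": n, "Type": t} for n, t in columns],
            "Location": s3_location,
            "InputFormat": PARQUET_IN,
            "OutputFormat": PARQUET_OUT,
            "SerdeInfo": {"SerializationLibrary": PARQUET_SERDE},
        },
    }


def register_table(glue_client, database, table_input):
    """Publish the table definition so the data product is discoverable in the catalog."""
    glue_client.create_table(DatabaseName=database, TableInput=table_input)


page_events = parquet_table_input(
    "page_events",
    "s3://example-telemetry-bucket/page_events/",
    columns=[("customer_id", "bigint"), ("event_type", "string"), ("event_ts", "timestamp")],
    partition_keys=[("dt", "string")],
)
```

In practice an AWS Glue crawler can infer the same definition; building it explicitly keeps the schema under the producing domain's control.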

Data mesh: Utility domains
Self-service, modern cloud data platform: delivers reliable, secure, near real-time customer 360 data

3. Inferencing*: request access to a table (DB-API)

[Diagram: notebooks for machine learning, data tooling, and experimentation; streaming & batch data processing]
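The DB-API named on this slide is Python's standard database interface (PEP 249). Because compliant drivers expose the same `cursor`/`execute`/`fetchall` shape, notebook code can read a granted table without depending on which engine serves it. The sketch below uses the standard-library `sqlite3` module purely as a stand-in driver so it runs anywhere; the table and column names are hypothetical, and an Athena or Redshift driver may use a different placeholder style (check the driver's `paramstyle`).

```python
import sqlite3


def fetch_customer_tiers(conn, table, customer_ids):
    """Read rows for the given customers through any PEP 249 (DB-API) connection."""
    # Table names cannot be bound as parameters, so only trusted, catalog-approved
    # names should reach this f-string; the customer IDs are bound as parameters.
    placeholders = ", ".join("?" for _ in customer_ids)  # qmark style, as sqlite3 uses
    sql = (f"SELECT customer_id, tier FROM {table} "
           f"WHERE customer_id IN ({placeholders}) ORDER BY customer_id")
    cur = conn.cursor()
    try:
        cur.execute(sql, list(customer_ids))
        return cur.fetchall()
    finally:
        cur.close()


# Stand-in data so the example is self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer_tiers (customer_id INTEGER, tier INTEGER)")
conn.executemany("INSERT INTO customer_tiers VALUES (?, ?)", [(1, 2), (2, 5), (3, 0)])
rows = fetch_customer_tiers(conn, "customer_tiers", [2, 1])
```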

Data mesh: Governance

4. The data governance council enforces policies

[Diagram: enhanced governance, covering data governance* and the data interfaces]
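A governance council's policies become enforceable once they are expressed as checks that run before a share is granted. The sketch below encodes one hypothetical policy: every column must carry a classification from an agreed list, and a consumer may only receive columns whose classification it is cleared for. The taxonomy and the function name are assumptions for illustration, not GoDaddy's actual rules.

```python
CLASSIFICATIONS = ("public", "internal", "confidential", "pii")  # hypothetical taxonomy


def policy_violations(columns: dict, consumer_clearance: set) -> list:
    """columns maps column name -> classification (None if untagged); returns violations."""
    problems = []
    for name, level in columns.items():
        if level is None:
            problems.append(f"{name}: missing classification tag")
        elif level not in CLASSIFICATIONS:
            problems.append(f"{name}: unknown classification {level!r}")
        elif level not in consumer_clearance:
            problems.append(f"{name}: consumer not cleared for {level!r}")
    return problems


columns = {"customer_id": "internal", "email": "pii", "notes": None}
violations = policy_violations(columns, {"public", "internal"})
```

A sharing workflow could block the grant, or narrow it to the compliant columns, whenever this list is non-empty.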

Data mesh: Consumer accounts

5. Consumers use shared DOMAIN* data through data egress (data products, business insights)

[Diagram: self-service* data egress]
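For step 5, a consumer domain can query a shared table (for example through a resource-link database) with Amazon Athena. A minimal sketch, assuming a `boto3.client("athena")` is passed in and that the database name and S3 output location (both hypothetical here) exist and are permitted by Lake Formation; it only waits for completion, and results would then be read with `get_query_results`.

```python
import time

TERMINAL_STATES = ("SUCCEEDED", "FAILED", "CANCELLED")


def run_athena_query(athena, sql, database, output_s3, poll_seconds=1.0):
    """Start a query and poll until Athena reports a terminal state."""
    query_id = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_s3},
    )["QueryExecutionId"]
    while True:
        status = athena.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]["Status"]
        if status["State"] in TERMINAL_STATES:
            return query_id, status["State"]
        time.sleep(poll_seconds)


# Example call (requires AWS credentials and Lake Formation grants on the shared table):
# import boto3
# athena = boto3.client("athena")
# run_athena_query(athena, "SELECT count(*) FROM orders_link",
#                  "shared_sales", "s3://example-athena-results/")
```

The loop has no overall timeout; a production version would cap the wait and surface the failure reason from the status on `FAILED`.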

GoDaddy conceptual/domain architecture

Business outcomes

• Created hierarchical views of data products at different levels so that business users can analyze information and make business decisions faster
• Built an automated access management framework that enables self-served access to data within and across lines of business
• Accelerated data platform adoption to 10+ LOBs and 300+ teams globally, with more to come
• Enabled data scientists to find and access the data needed to build ML models across LOBs
• Achieved tens of millions of dollars in cost savings from data reuse and better management of purchased datasets

Thank you!

Please complete the session survey in the mobile app
