Storage Options in the AWS Cloud
October 2013
© 2016, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Notices
This document is provided for informational purposes only. It represents AWS’s
current product offerings and practices as of the date of issue of this document,
which are subject to change without notice. Customers are responsible for
making their own independent assessment of the information in this document
and any use of AWS’s products or services, each of which is provided “as is”
without warranty of any kind, whether express or implied. This document does
not create any warranties, representations, contractual commitments,
conditions or assurances from AWS, its affiliates, suppliers or licensors. The
responsibilities and liabilities of AWS to its customers are controlled by AWS
agreements, and this document is not part of, nor does it modify, any agreement
between AWS and its customers.
Contents
Introduction
Traditional vs. Cloud-Based Storage Alternatives
Amazon Simple Storage Service (Amazon S3)
Amazon Glacier
Amazon Elastic Block Store (Amazon EBS) Volumes
Amazon EC2 Instance Store Volumes
AWS Import/Export
AWS Storage Gateway
Amazon CloudFront
Amazon Simple Queue Service (Amazon SQS)
Amazon Relational Database Service (Amazon RDS)
Amazon DynamoDB
Amazon ElastiCache
Amazon Redshift
Databases on Amazon EC2
Contributors
Further Reading
Cloud Storage Use Cases
AWS Storage Services
Other Resources
Amazon Web Services – Storage Options in the AWS Cloud
Introduction
Amazon Web Services (AWS) is a flexible, cost-effective, easy-to-use cloud
computing platform. This whitepaper helps architects and developers
understand the primary data storage options available in the AWS cloud. We
provide an overview of each storage option, describe ideal usage patterns,
performance, durability and availability, cost model, scalability and elasticity,
interfaces, and anti-patterns.
Amazon EC2 Instance Storage: Temporary block storage volumes for Amazon EC2 virtual machines
Databases on Amazon EC2: Self-managed database on an Amazon EC2 instance
For additional comparison categories among the AWS storage collection, see the
AWS Storage Quick Reference.
can serve as an origin store for a content delivery network (CDN), such as
Amazon CloudFront. Because of Amazon S3’s elasticity, it works particularly
well for hosting web content with extremely spiky bandwidth demands. Also,
because no storage provisioning is required, Amazon S3 works well for fast-growing
websites hosting data-intensive, user-generated content, such as video and
photo-sharing sites.
Amazon S3 is also commonly used as a data store for computation and large-
scale analytics, such as analyzing financial transactions, clickstream analytics,
and media transcoding. Because of the horizontal scalability of Amazon S3, you
can access your data from multiple computing nodes concurrently without
being constrained by a single connection.
Performance
Access to Amazon S3 from within Amazon EC2 in the same region is fast.
Amazon S3 is designed so that server-side latencies are insignificant relative to
Internet latencies. Amazon S3 is also built to scale storage, requests, and users to
support a virtually unlimited number of web-scale applications. If you access
Amazon S3 using multiple threads, multiple applications, or multiple clients
concurrently, total Amazon S3 aggregate throughput will typically scale to rates
that far exceed what any single server can generate or consume.
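One common way to exploit this concurrency for a single large object is to split it into byte ranges and fetch them in parallel with HTTP Range requests, which Amazon S3 GET supports. The sketch below only computes the inclusive "bytes=start-end" header values. The 8 MB part size is an arbitrary assumption, and the actual GET calls (issued by separate threads or clients) are left out.

```python
def byte_ranges(object_size, part_size=8 * 1024 * 1024):
    """Split an object of object_size bytes into HTTP Range header values.

    Each value uses the inclusive "bytes=start-end" form, so the parts can be
    requested concurrently and reassembled in order afterwards.
    """
    if object_size <= 0 or part_size <= 0:
        raise ValueError("object_size and part_size must be positive")
    ranges = []
    for start in range(0, object_size, part_size):
        end = min(start + part_size, object_size) - 1  # inclusive end offset
        ranges.append(f"bytes={start}-{end}")
    return ranges

print(byte_ranges(20, part_size=8))  # ['bytes=0-7', 'bytes=8-15', 'bytes=16-19']
```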
metadata (e.g., object name, size, keywords, and so on). Metadata in the database
can easily be indexed and queried, making it very efficient to locate an object’s
reference via a database query. This result can then be used to pinpoint and
then retrieve the object itself from Amazon S3.
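The following is a minimal sketch of this index-then-fetch pattern. It uses Python's built-in sqlite3 module as a stand-in for whatever database you actually run. The table layout and column names are hypothetical. Only the object key is stored, so the query result tells you which key to retrieve from Amazon S3.

```python
import sqlite3

# In-memory database standing in for the metadata index (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE objects (s3_key TEXT PRIMARY KEY, size_bytes INTEGER, keyword TEXT)"
)
conn.execute("CREATE INDEX idx_keyword ON objects (keyword)")
conn.executemany(
    "INSERT INTO objects VALUES (?, ?, ?)",
    [
        ("photos/2013/beach.jpg", 2_400_000, "beach"),
        ("photos/2013/city.jpg", 1_900_000, "city"),
        ("video/2013/beach.mp4", 95_000_000, "beach"),
    ],
)

def keys_for_keyword(keyword):
    """Return the S3 object keys whose indexed metadata matches keyword."""
    rows = conn.execute(
        "SELECT s3_key FROM objects WHERE keyword = ? ORDER BY s3_key", (keyword,)
    )
    return [row[0] for row in rows]

print(keys_for_keyword("beach"))
# ['photos/2013/beach.jpg', 'video/2013/beach.mp4']
```

Each returned key would then be used in an ordinary Amazon S3 GET.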
For noncritical data that can be reproduced easily if needed, such as transcoded
media or image thumbnails, you can use the Reduced Redundancy Storage
(RRS) option in Amazon S3, which provides a lower level of durability at a lower
storage cost. Objects stored using the RRS option have less redundancy than
objects stored using standard Amazon S3 storage. In either case, your data is
still stored on multiple devices in multiple locations. RRS is designed to provide
99.99% durability per object over a given year. While RRS is less durable than
standard Amazon S3, it is still designed to provide 400 times more durability
than a typical disk drive.
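Annual durability translates directly into an expected number of lost objects. With N objects each stored at durability d, about N × (1 − d) are expected to be lost per year. The sketch applies that arithmetic to the 99.99% RRS figure above. Treating object losses as independent is a simplifying assumption.

```python
def expected_annual_losses(num_objects, durability):
    """Expected number of objects lost per year, assuming independent losses."""
    return num_objects * (1.0 - durability)

# 99.99% durability (RRS, from the text): about 1 loss per year per 10,000 objects.
rrs_loss = expected_annual_losses(10_000, 0.9999)
print(round(rrs_loss, 6))  # 1.0
```

By the same arithmetic, the "400 times" comparison implies an annual loss rate for a typical disk drive of roughly 400 × 0.01% = 4%.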
Cost Model
With Amazon S3, you pay only for what you use and there is no minimum fee.
Amazon S3 has three pricing components: storage (per GB per month), data
transfer in or out (per GB per month), and requests (per n thousand requests per
month). For new customers, AWS provides a free usage tier which includes up to
Interfaces
Amazon S3 provides standards-based REST and SOAP web services APIs for
both management and data operations. These APIs allow Amazon S3 objects
(files) to be stored in uniquely named buckets (top-level folders). Each object
must have a unique object key (file name) that serves as an identifier for the
object within that bucket. While Amazon S3 is a web-based object store rather
than a traditional file system, you can easily emulate a file system hierarchy
(folder1/folder2/file) in Amazon S3 by creating object key names that
correspond to the full path name of each file.
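Amazon S3 listing requests accept a prefix and a delimiter. Keys that share the same text between the prefix and the next delimiter are rolled up into "common prefixes," which is what lets folder1/folder2/file keys be browsed like directories. The pure-Python sketch below imitates that grouping over a local list of keys so the behavior is visible without calling the service. It models the semantics only and is not the S3 API itself.

```python
def list_level(keys, prefix="", delimiter="/"):
    """Mimic one level of an S3 prefix/delimiter listing over a list of keys.

    Returns (files, folders): keys directly under prefix, and the rolled-up
    common prefixes that behave like subfolders.
    """
    files, folders = [], set()
    for key in sorted(keys):
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        cut = rest.find(delimiter)
        if cut == -1:
            files.append(key)  # no further delimiter: an object at this level
        else:
            folders.add(prefix + rest[: cut + len(delimiter)])  # roll up into a "folder"
    return files, sorted(folders)

keys = ["folder1/folder2/file", "folder1/notes.txt", "folder1/folder2/other", "readme"]
print(list_level(keys, prefix="folder1/"))
# (['folder1/notes.txt'], ['folder1/folder2/'])
```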
Anti-Patterns
Amazon S3 is optimal for storing numerous classes of information that are
relatively static and benefit from its durability, availability, and elasticity
features. However, in a number of situations Amazon S3 is not the optimal
solution. Amazon S3 has the following anti-patterns:
Amazon Glacier
Amazon Glacier is an extremely low-cost storage service that provides highly
secure, durable, and flexible storage for data backup and archival.5 With
Amazon Glacier, customers can reliably store their data for as little as $0.01 per
gigabyte per month. Amazon Glacier enables customers to offload the
administrative burdens of operating and scaling storage to AWS, so that they
don’t have to worry about capacity planning, hardware provisioning, data
replication, hardware failure detection and repair, or time-consuming hardware
migrations.
You store data in Amazon Glacier as archives. An archive can represent a single
file or you may choose to combine several files to be uploaded as a single
archive. Retrieving archives from Amazon Glacier requires the initiation of a job.
You organize your archives in vaults. You can control access to your vaults using
the AWS Identity and Access Management (IAM) service.
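Amazon Glacier stores whatever bytes you upload as an archive, so combining several files into one archive happens on the client before upload. One possible approach is to bundle the files into a compressed tar stream in memory, sketched here with Python's standard tarfile module. The tar format and the file names are illustrative choices, not something Amazon Glacier requires.

```python
import io
import tarfile

def bundle_files(files):
    """Pack {name: bytes} into a single gzip-compressed tar archive held in memory."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buffer.getvalue()  # these bytes would be uploaded as one archive

def list_bundle(archive_bytes):
    """Return the member names of a bundle produced by bundle_files."""
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r:gz") as tar:
        return tar.getnames()

archive = bundle_files({"logs/2013-10-01.txt": b"day one", "logs/2013-10-02.txt": b"day two"})
print(list_bundle(archive))  # ['logs/2013-10-01.txt', 'logs/2013-10-02.txt']
```

The trade-off is that the individual files are no longer separately addressable in the vault, so you need to track which archive holds which file.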
Amazon Glacier is designed for use with other Amazon Web Services. Amazon
S3 allows you to seamlessly move data between Amazon S3 and Amazon Glacier
using data lifecycle policies. You can also use AWS Import/Export to accelerate
moving large amounts of data into Amazon Glacier using portable storage
devices for transport.
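A lifecycle policy is a small declarative document attached to a bucket. The sketch below builds one as a Python dictionary in the shape used by the current Amazon S3 PutBucketLifecycleConfiguration operation: a rule with a prefix filter and a transition to the GLACIER storage class. The rule ID scheme, the logs/ prefix, and the 30-day and 365-day values are hypothetical. The call that would attach the policy to a bucket is omitted.

```python
import json

def glacier_lifecycle_rule(prefix, days_to_glacier, days_to_expire=None):
    """Build a lifecycle configuration that moves objects under prefix to Glacier."""
    rule = {
        "ID": f"archive-{prefix.rstrip('/') or 'all'}",  # hypothetical naming scheme
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Transitions": [{"Days": days_to_glacier, "StorageClass": "GLACIER"}],
    }
    if days_to_expire is not None:
        # Optionally delete the archived objects after a longer period.
        rule["Expiration"] = {"Days": days_to_expire}
    return {"Rules": [rule]}

print(json.dumps(glacier_lifecycle_rule("logs/", 30, 365), indent=2))
```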
Performance
Amazon Glacier is a low-cost storage service designed to store data that is
infrequently accessed and long lived. Amazon Glacier jobs typically complete in
3 to 5 hours.
Cost Model
With Amazon Glacier, you pay only for what you use and there is no minimum
fee. In normal use, Amazon Glacier has three pricing components: storage (per
GB per month), data transfer out (per GB per month), and requests (per
thousand UPLOAD and RETRIEVAL requests per month).
Note that Amazon Glacier is designed with the expectation that retrievals are
infrequent and unusual, and data will be stored for extended periods of time.
You can retrieve up to 5% of your average monthly storage (pro-rated daily) for
free each month. If you choose to retrieve more than this amount of data in a
month, you are charged an additional (per GB) retrieval fee. There is also a pro-
rated charge (per GB) for items deleted prior to 90 days.
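Both rules reduce to simple arithmetic, shown in the sketch below under three assumptions:

- The daily free allowance is 5% of the month's average stored data, spread evenly across the days of the month.
- The early-deletion charge bills the remaining days of the 90-day minimum at a per-GB-month storage rate, using 30-day months.
- The price is a placeholder parameter, not a published rate.

```python
def daily_free_retrieval_gb(avg_stored_gb, days_in_month=30):
    """Free retrieval allowance per day: 5% of average storage, pro-rated daily."""
    return 0.05 * avg_stored_gb / days_in_month

def early_deletion_charge(gb, days_stored, price_per_gb_month, minimum_days=90):
    """Pro-rated charge for deleting data before the 90-day minimum (30-day months)."""
    remaining_days = max(0, minimum_days - days_stored)
    return gb * price_per_gb_month * remaining_days / 30

# 12,000 GB stored: about 20 GB per day can be retrieved without a retrieval fee.
print(round(daily_free_retrieval_gb(12_000), 6))  # 20.0
# 100 GB deleted after 60 days at a placeholder $0.01/GB-month: 30 days still billed.
print(round(early_deletion_charge(100, 60, 0.01), 4))  # 1.0
```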
Interfaces
There are two ways to use Amazon Glacier, each with its own set of interfaces.
The Amazon Glacier APIs provide both management and data operations.
information on how to use Amazon Glacier from Amazon S3, see the Object
Lifecycle Management section of the Amazon S3 Developer Guide.8
Note that when using Amazon Glacier as a storage class in Amazon S3, you use
the Amazon S3 APIs, and when using “native” Amazon Glacier, you use the
Amazon Glacier APIs. Objects archived to Amazon Glacier via Amazon S3 can
only be listed and retrieved via the Amazon S3 APIs or the AWS Management
Console—they are not visible as archives in an Amazon Glacier vault.
Anti-Patterns
Amazon Glacier has the following anti-patterns:
durability. The same snapshot can be used to instantiate as many volumes as you
wish. These snapshots can be copied across AWS regions, making it easier to
leverage multiple AWS regions for geographical expansion, data center migration
and disaster recovery. Sizes for Amazon EBS volumes range from 1 GB to 1 TB,
and are allocated in 1 GB increments.
Performance
Amazon EBS provides two volume types: standard volumes and Provisioned
IOPS volumes. They differ in performance characteristics and pricing model,
allowing you to tailor your storage performance and cost to the needs of your
applications. You can attach and stripe across multiple volumes of either type to
increase the I/O performance available to your Amazon EC2 applications.
Standard volumes offer cost-effective storage for applications with moderate or
bursty I/O requirements. Standard volumes are designed to deliver
approximately 100 input/output operations per second (IOPS) on average with a
best-effort ability to burst to hundreds of IOPS. Standard volumes are also well
suited for use as boot volumes, where the burst capability provides fast instance
start-up times.
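The sketch below makes the striping arithmetic concrete. It estimates how many standard volumes a RAID 0 stripe would need to reach both a target average IOPS and a target capacity. It rests on these assumptions:

- About 100 IOPS per standard volume, the average quoted above.
- Ideal linear scaling across the stripe.
- Volume sizes between 1 GB and 1 TB (taken as 1,024 GB), allocated in 1 GB increments.

Real stripes rarely scale perfectly, and burst behavior is ignored.

```python
import math

STANDARD_VOLUME_IOPS = 100   # approximate average from the text
MAX_VOLUME_GB = 1024         # 1 TB upper bound, sizes in whole GB

def plan_standard_stripe(target_iops, target_capacity_gb):
    """Return (volume_count, size_gb_per_volume) for an idealized RAID 0 stripe."""
    count = max(
        math.ceil(target_iops / STANDARD_VOLUME_IOPS),    # enough volumes for IOPS
        math.ceil(target_capacity_gb / MAX_VOLUME_GB),    # enough volumes for space
    )
    size_gb = max(1, math.ceil(target_capacity_gb / count))  # 1 GB increments
    return count, size_gb

print(plan_standard_stripe(target_iops=800, target_capacity_gb=2000))  # (8, 250)
```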
Because Amazon EBS volumes are network-attached devices, other network I/O
performed by the instance, as well as the total load on the shared network, can
affect individual Amazon EBS volume performance. To enable your Amazon EC2
instances to fully utilize the Provisioned IOPS on an Amazon EBS volume, you
can launch selected Amazon EC2 instance types as Amazon EBS-optimized
instances. Amazon EBS-optimized instances deliver dedicated throughput
between Amazon EC2 and Amazon EBS, with options between 500 Mbps and
1,000 Mbps depending on the instance type used. When attached to Amazon
EBS-optimized instances, Provisioned IOPS volumes are designed to deliver
within 10% of the Provisioned IOPS performance 99.9% of the time.
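The dedicated EBS-optimized bandwidth also caps how many I/O operations an instance can drive at a given I/O size. The arithmetic below converts a link rate in megabits per second into that ceiling and computes the lower edge of the "within 10%" band. The 16 KB I/O size is an assumption chosen for illustration, and protocol overhead is ignored.

```python
def iops_ceiling(link_mbps, io_size_kb):
    """Maximum IOPS a dedicated link can carry at a given I/O size (no overhead)."""
    bytes_per_second = link_mbps * 1_000_000 / 8
    return bytes_per_second / (io_size_kb * 1024)

def provisioned_iops_floor(provisioned_iops):
    """Lower bound of the 'within 10% of Provisioned IOPS' design target."""
    return 0.9 * provisioned_iops

print(int(iops_ceiling(1000, 16)))   # 7629
print(provisioned_iops_floor(2000))  # 1800.0
```

Under these assumptions, a 1,000 Mbps link tops out at roughly 7,600 16 KB operations per second, regardless of how many IOPS are provisioned on the attached volumes.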
The combination of Amazon EC2 and Amazon EBS enables you to use many of
the same disk performance optimization techniques that you would use with on-
premises servers and storage. For example, by attaching multiple Amazon EBS
volumes to a single Amazon EC2 instance, you can partition the total application
I/O load by allocating one volume for database log data, one or more volumes
for database file storage, and other volumes for file system data. Each separate
Amazon EBS volume can be configured as Amazon EBS standard or Amazon
EBS Provisioned IOPS as needed.
To maximize both the durability and availability of your Amazon EBS data, you
should create snapshots of your Amazon EBS volumes frequently. (For data
Cost Model
With Amazon EBS, you pay only for what you use. Amazon EBS pricing has three
components: provisioned storage, I/O requests, and snapshot storage. Amazon
EBS standard volumes are charged per GB-month of provisioned storage and
per million I/O requests. Amazon EBS Provisioned IOPS volumes are charged
per GB-month of provisioned storage and per Provisioned IOPS-month. For
both volume types, Amazon EBS snapshots are charged per GB-month of data
stored. Amazon EBS snapshot copy is charged for the data transferred between
regions, and for the standard Amazon EBS snapshot charges in the destination
region. It’s important to remember that for Amazon EBS volumes, you are
charged for provisioned (allocated) storage, whether or not you actually use it.
For Amazon EBS snapshots, you are charged only for storage actually used
(consumed). Note that Amazon EBS snapshots are incremental and compressed,
so the storage used in any snapshot is generally much less than the storage
consumed on an Amazon EBS volume.
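The two pricing models can be compared with a small estimator. All prices below are placeholder parameters, not actual rates; check the pricing page for those. The point is the structure:

- Standard volumes add a per-million-request charge.
- Provisioned IOPS volumes add a per-IOPS-month charge.
- Snapshots are billed on consumed rather than provisioned GB.

```python
def standard_volume_monthly(gb, io_requests, price_gb_month, price_per_million_io):
    """Standard volume: provisioned GB-months plus I/O requests."""
    return gb * price_gb_month + io_requests / 1_000_000 * price_per_million_io

def piops_volume_monthly(gb, provisioned_iops, price_gb_month, price_iops_month):
    """Provisioned IOPS volume: provisioned GB-months plus provisioned IOPS-months."""
    return gb * price_gb_month + provisioned_iops * price_iops_month

def snapshot_monthly(consumed_gb, price_gb_month):
    """Snapshots are billed on data actually stored, not on the volume size."""
    return consumed_gb * price_gb_month

# Placeholder prices, for illustration only.
print(round(standard_volume_monthly(100, 50_000_000, 0.05, 0.05), 2))  # 7.5
print(round(piops_volume_monthly(100, 1000, 0.125, 0.10), 2))          # 112.5
```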
Note that there is no charge for transferring information among the various AWS
storage offerings (e.g., Amazon EC2 instance with Amazon EBS, Amazon S3,
Amazon RDS, and so on) as long as they are within the same AWS region.
Pricing information for Amazon EBS can be found at Amazon EC2 Pricing.10
Interfaces
Amazon offers management APIs for Amazon EBS in both SOAP and REST
formats. These are used to create, delete, describe, attach, and detach Amazon
EBS volumes for your Amazon EC2 instances; to create, delete, and describe
snapshots from Amazon EBS to Amazon S3; and to copy snapshots from one
region to another. If you prefer to work with a graphical tool, the AWS
Management Console gives you all the capabilities of the API in an easy-to-use
browser interface. Regardless of how you create your Amazon EBS volume, note
that all storage is allocated at the time of volume creation, and that you are
charged for this allocated storage even if you don’t write data to it.
There is no AWS data API for Amazon EBS. Instead, Amazon EBS presents a
block-device interface to the Amazon EC2 instance. That is, to the Amazon EC2
instance, an Amazon EBS volume appears just like a local disk drive. To write to
and read data from Amazon EBS volumes, you therefore use the native file
system I/O interfaces of your chosen operating system.
Anti-Patterns
As described previously, Amazon EBS is ideal for information that needs to be
persisted beyond the life of a single Amazon EC2 instance. However, in certain
situations other AWS storage options may be more appropriate. Amazon EBS
has the following anti-patterns:
• Temporary storage—If you are using Amazon EBS for temporary storage
(such as scratch disks, buffers, queues, and caches), consider using local
instance store volumes, Amazon SQS, or ElastiCache (Memcached or
Redis).
• Static data or web content—If your data doesn’t change that often,
Amazon S3 may represent a more cost-effective and scalable solution for
storing this fixed information. Also, web content served out of Amazon
EBS requires a web server running on Amazon EC2, while you can
deliver web content directly out of Amazon S3.
High I/O and high storage instance types provide Amazon EC2 instance storage targeted to
specific use cases. High I/O instances provide instance store volumes backed by
SSD, and are ideally suited for many high performance database workloads.
Example applications include NoSQL databases like Cassandra and MongoDB.
High storage instances support much higher storage density per Amazon EC2
instance, and are ideally suited for applications that benefit from high sequential
I/O performance across very large datasets. Example applications include data
warehouses, Hadoop storage nodes, seismic analysis, cluster file systems, etc.
Note that applications using instance storage for persistent data generally
provide data durability through replication, or by periodically copying data to
durable storage.
Performance
The non-SSD-based instance store volumes in most Amazon EC2 instance
families have performance characteristics similar to standard Amazon EBS
volumes. Because the Amazon EC2 instance virtual machine and the local
instance store volumes are located in the same physical server, interaction with
this storage is very fast, particularly for sequential access. To increase aggregate
IOPS, or to improve sequential disk throughput, multiple instance store volumes
can be grouped together using RAID 0 (disk striping) software. Because the
bandwidth to the disks is not limited by the network, aggregate sequential
throughput for multiple instance volumes can be higher than for the same
number of Amazon EBS volumes.
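RAID 0 gets its throughput by spreading consecutive stripe units round-robin across the member disks, so a large sequential transfer keeps every disk busy at once. The sketch below shows that address mapping. The 64 KB stripe unit and four-disk layout are arbitrary example values, and real software RAID implementations add their own metadata and tuning options. Note also that RAID 0 has no redundancy: losing any member disk loses the array.

```python
def raid0_locate(logical_offset, num_disks, stripe_unit=64 * 1024):
    """Map a logical byte offset on a RAID 0 array to (disk index, offset on that disk)."""
    unit_index, within_unit = divmod(logical_offset, stripe_unit)
    disk = unit_index % num_disks            # round-robin across member disks
    row = unit_index // num_disks            # full stripes that precede this unit
    return disk, row * stripe_unit + within_unit

# With 4 disks, consecutive 64 KB units land on disks 0, 1, 2, 3, then wrap to 0.
print([raid0_locate(i * 64 * 1024, 4)[0] for i in range(6)])  # [0, 1, 2, 3, 0, 1]
print(raid0_locate(5 * 64 * 1024 + 10, 4))                     # (1, 65546)
```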
The SSD instance store volumes in the Amazon EC2 high I/O instances provide
from tens of thousands to hundreds of thousands of low-latency, random 4 KB
IOPS. Because of the I/O characteristics of SSD devices, write
performance can be variable. For more information, see High I/O Instances in
the Amazon EC2 User Guide.13
The instance store volumes on Amazon EC2 high storage instances provide very
high storage density and high sequential read and write performance. High
storage instances are capable of delivering 2.6 GB/sec of sequential read and
write performance when using a block size of 2 MB. For more information, see
High Storage Instances in the Amazon EC2 User Guide.14
You should not use local instance store volumes for any data that must persist
over time, such as permanent file or database storage, without providing for data
persistence by replicating your data, or by periodically copying data to durable
storage such as Amazon EBS or Amazon S3. Note that this also applies to the
special-purpose SSD and high-density instance store volumes in the high I/O and
high storage instance types.
Cost Model
The cost of the Amazon EC2 instance includes any local instance store volumes,
if the instance type provides them. While there is no additional charge for data
storage on local instance store volumes, note that data transferred to and from
Amazon EC2 instance store volumes from other Availability Zones or outside of
an Amazon EC2 region may incur data transfer charges, and additional charges
will apply for use of any persistent storage, such as Amazon S3, Amazon Glacier,
Amazon EBS volumes, and Amazon EBS snapshots. Pricing information for
Amazon EC2, Amazon EBS, and data transfer can be found at Amazon EC2
Pricing.16
While you can’t increase or decrease the number of instance store volumes on a
single Amazon EC2 instance, this storage is still scalable and elastic, in that you
can scale the total amount of instance store up or down by increasing or
decreasing the number of running Amazon EC2 instances.
Local instance store volumes are tied to a particular Amazon EC2 instance, and
are fixed in number and size for a given Amazon EC2 instance type, so the
scalability and elasticity of this storage is tied to the number of Amazon EC2
instances.
However, you can achieve full storage elasticity by including one of the other
suitable storage options, such as Amazon S3 or Amazon EBS, in your Amazon
EC2 storage strategy.
Interfaces
There is no separate management API for Amazon EC2 instance store volumes.
Instead, instance store volumes are specified using the block device mapping
feature of the Amazon EC2 API and the AWS Management Console. You cannot
create or destroy instance store volumes, but you can control whether or not they
are exposed to the Amazon EC2 instance, and what device name is used.
There is also no separate data API for instance store volumes. Just like Amazon
EBS volumes, instance store volumes present a block-device interface to the
Amazon EC2 instance. That is, to the Amazon EC2 instance, an instance store
volume appears just like a local disk drive. To write to and read data from
instance store volumes, you therefore use the native file system I/O interfaces of
your chosen operating system.
Note that in some cases, a local instance store volume device will be attached to
the Amazon EC2 instance upon launch, but must be formatted with an
appropriate file system and mounted before use. Also, keep careful track of your
block device mappings. There is no simple way for an application running on an
Amazon EC2 instance to determine which block device is an instance store
(ephemeral) volume and which is an Amazon EBS (persistent) volume.
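Because the instance itself cannot easily tell the two kinds of device apart, one practical habit is to keep the launch-time block device mapping and derive the answer from it. In the Amazon EC2 API, an instance store entry carries a VirtualName such as ephemeral0, while an Amazon EBS entry carries an Ebs structure. The helper below is a hypothetical sketch that classifies such a mapping list. The device names are examples only, and the operating system may present them under different names (for example, /dev/sdb appearing as /dev/xvdb).

```python
def classify_mappings(block_device_mappings):
    """Split an EC2-style block device mapping list into instance store and EBS devices."""
    ephemeral, ebs = [], []
    for mapping in block_device_mappings:
        name = mapping["DeviceName"]
        if "Ebs" in mapping:
            ebs.append(name)                      # persistent Amazon EBS volume
        elif mapping.get("VirtualName", "").startswith("ephemeral"):
            ephemeral.append(name)                # instance store (ephemeral) volume
        # Entries with NoDevice suppress a device and are ignored here.
    return {"instance_store": ephemeral, "ebs": ebs}

mappings = [
    {"DeviceName": "/dev/sda1", "Ebs": {"VolumeSize": 8, "DeleteOnTermination": True}},
    {"DeviceName": "/dev/sdb", "VirtualName": "ephemeral0"},
    {"DeviceName": "/dev/sdc", "VirtualName": "ephemeral1"},
    {"DeviceName": "/dev/sdd", "NoDevice": ""},
]
print(classify_mappings(mappings))
# {'instance_store': ['/dev/sdb', '/dev/sdc'], 'ebs': ['/dev/sda1']}
```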
Anti-Patterns
Amazon EC2 local instance store volumes are fast, free (that is, included in the
price of the Amazon EC2 instance) “scratch volumes” best suited for storing
temporary data that can easily be regenerated, or data that is replicated for
durability. In many situations, however, other AWS storage options may be more
appropriate. Amazon EC2 instance store volumes have the following anti-
patterns:
AWS Import/Export
AWS Import/Export accelerates moving large amounts of data into and out of
AWS using portable storage devices for transport.17 AWS transfers your data
directly onto and off of storage devices using Amazon’s high-speed internal
network and bypassing the Internet. For significant datasets, AWS
Import/Export is often faster than Internet transfer and more cost effective
than upgrading your connectivity.
AWS Import/Export supports importing and exporting data into and out of
several types of AWS storage, including Amazon EBS snapshots, Amazon S3
buckets, and Amazon Glacier vaults.
Performance
Each AWS Import/Export station is capable of loading data at over 100 MB per
second, but in most cases the rate of the data load will be bounded by a
combination of the read or write speed of your portable storage device and, for
Amazon S3 data loads, the average object (file) size.
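Because the slower of the station and the device sets the pace, load time can be roughly estimated from the minimum of the two rates. The sketch uses the 100 MB per second station figure from the text as a conservative bound and treats the device's sustained rate as an input. It ignores the per-object overhead that small files add to Amazon S3 loads.

```python
STATION_MB_PER_S = 100  # from the text: each station loads at over 100 MB/s

def load_hours(dataset_gb, device_mb_per_s):
    """Estimated data-loading hours, bounded by the slower of station and device."""
    rate_mb_per_s = min(STATION_MB_PER_S, device_mb_per_s)
    seconds = dataset_gb * 1000 / rate_mb_per_s  # decimal GB -> MB
    return seconds / 3600

# A 2,000 GB device that sustains 80 MB/s is device-bound: about 6.9 hours.
print(round(load_hours(2000, 80), 1))  # 6.9
```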
Cost Model
With AWS Import/Export, you pay only for what you use. AWS Import/Export
has three pricing components: a per-device fee, a data load time charge (per
data-loading-hour), and possible return shipping charges (for expedited
shipping, or shipping to destinations not local to that AWS Import/Export
region). For the destination storage, the standard Amazon EBS snapshot,
Amazon S3, and Amazon Glacier request and storage pricing applies. Pricing
information can be found at AWS Import/Export Pricing.18
The aggregate total amount of data that can be imported is virtually unlimited.
Interfaces
To upload or download data, you must create and submit an AWS
Import/Export job for each storage device shipped. Each job request requires a
manifest file, a YAML-formatted text file that contains a set of key-value pairs
that supply the required information—such as your device ID, secret access key,
and return address—necessary to complete the job.
Jobs can be created using a command line tool (the AWS Import/Export Web
Service Tool), the AWS SDK for Java, or a native REST API. The job request is
tied to the storage device through a signature file in the root directory (for
Amazon S3 import jobs), or by a barcode taped to the device (for Amazon EBS
and Amazon Glacier jobs).
Anti-Patterns
AWS Import/Export is optimal for large data that would take too long to load
over the Internet, so the anti-pattern is simply data that is more easily
transferred over the Internet. If your data can be transferred over the Internet in
less than one week, AWS Import/Export may not be the ideal solution.
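The one-week rule of thumb can be checked with a quick estimate of Internet transfer time. The sketch assumes you sustain a fixed fraction of link capacity (80% here, an arbitrary example) and uses decimal units. Real transfers also depend on latency, protocol overhead, and competing traffic.

```python
ONE_WEEK_HOURS = 7 * 24

def internet_transfer_hours(dataset_gb, link_mbps, utilization=0.8):
    """Hours to move dataset_gb over a link of link_mbps at the given sustained utilization."""
    bits = dataset_gb * 8 * 1000**3
    return bits / (link_mbps * 1_000_000 * utilization) / 3600

def prefer_import_export(dataset_gb, link_mbps, utilization=0.8):
    """Rule of thumb: consider shipping devices if the upload would exceed a week."""
    return internet_transfer_hours(dataset_gb, link_mbps, utilization) > ONE_WEEK_HOURS

print(round(internet_transfer_hours(10_000, 100), 1))  # 277.8
print(prefer_import_export(10_000, 100))               # True
print(prefer_import_export(1_000, 100))                # False
```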
Gateway-cached volumes allow you to utilize Amazon S3 for your primary data,
while retaining some portion of it locally in a cache for frequently accessed data.
These volumes minimize the need to scale your on-premises storage
infrastructure, while still providing your applications with low-latency access to
their frequently accessed data. You can create storage volumes up to 32 TB in
size and mount them as iSCSI devices from your on-premises application
servers. Data written to these volumes is stored in Amazon S3, with only a cache
of recently written and recently read data stored locally on your on-premises
storage hardware.
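The cached-volume idea can be modeled in a few lines: a complete copy of the data in Amazon S3 plus a bounded local cache of recently read and written data. This is a conceptual sketch only, and it makes three assumptions:

- A dict stands in for Amazon S3.
- Caching works at block granularity with least-recently-used eviction.
- Writes go straight to the remote store.

None of these describe how the gateway is actually implemented.

```python
from collections import OrderedDict

class CachedVolume:
    """Toy model of a gateway-cached volume: remote primary store plus local LRU cache."""

    def __init__(self, cache_blocks):
        self.remote = {}              # stands in for the full copy in Amazon S3
        self.cache = OrderedDict()    # local cache, most recently used block last
        self.cache_blocks = cache_blocks
        self.remote_reads = 0

    def _touch(self, block, data):
        self.cache[block] = data
        self.cache.move_to_end(block)
        if len(self.cache) > self.cache_blocks:
            self.cache.popitem(last=False)  # evict the least recently used block

    def write(self, block, data):
        self.remote[block] = data     # every write lands in the primary (remote) store
        self._touch(block, data)      # and recently written data stays local

    def read(self, block):
        if block in self.cache:
            data = self.cache[block]  # low-latency local hit
        else:
            self.remote_reads += 1    # miss: fetch from the remote store
            data = self.remote[block]
        self._touch(block, data)
        return data

vol = CachedVolume(cache_blocks=2)
for i in range(3):
    vol.write(i, f"block-{i}".encode())
vol.read(2)
vol.read(0)                # block 0 was evicted, so this read goes remote
print(vol.remote_reads)    # 1
```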
Performance
As the AWS Storage Gateway VM sits between your application, Amazon S3, and
underlying on-premises storage, the performance you experience will be
dependent upon a number of factors, including the speed and configuration of
your underlying local disks, the network bandwidth between your iSCSI
initiator and gateway VM, the amount of local storage allocated to the gateway
VM, and the bandwidth between the gateway VM and Amazon S3. For gateway-
cached volumes, to provide low-latency read access to your on-premises
applications, it’s important that you provide enough local cache storage to store
your recently accessed data. Our AWS Storage Gateway documentation provides
guidance on how to optimize your environment setup for best performance,
including how to properly size your local storage.20
AWS Storage Gateway efficiently uses your Internet bandwidth to speed up the
upload of your on-premises application data to AWS. AWS Storage Gateway
only uploads data that has changed, which minimizes the amount of data sent
over the Internet. You can also use AWS Direct Connect to further increase
throughput and reduce your network costs by establishing a dedicated network
connection between your on-premises gateway and AWS.21
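One generic way to upload only what changed is to hash fixed-size blocks and compare the hashes with those recorded at the last upload. The sketch below illustrates that technique with SHA-256 and a 4-byte block size chosen to keep the example readable. It stands in for the general idea and is not the gateway's actual change-tracking mechanism.

```python
import hashlib

def block_hashes(data, block_size):
    """SHA-256 digest of each fixed-size block of data."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]

def changed_blocks(previous_hashes, data, block_size):
    """Indexes of blocks whose content differs from the last upload, or are new."""
    current = block_hashes(data, block_size)
    return [
        i for i, digest in enumerate(current)
        if i >= len(previous_hashes) or previous_hashes[i] != digest
    ]

old = b"AAAABBBBCCCC"
new = b"AAAAXXXXCCCCDDDD"
print(changed_blocks(block_hashes(old, 4), new, block_size=4))  # [1, 3]
```

Only blocks 1 and 3 would be sent in this example. The unchanged blocks 0 and 2 never cross the network.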
Cost Model
With AWS Storage Gateway, you pay only for what you use. AWS Storage
Gateway has four pricing components: gateway usage (per gateway per month),
snapshot storage usage (per GB per month), volume storage usage (per GB per
month), and data transfer out (per GB per month). Pricing information can be
found at AWS Storage Gateway Pricing.22
Interfaces
The AWS Management Console can be used to download the AWS Storage
Gateway VM image. You can then select between a gateway-cached or gateway-
stored configuration, activate your on-premises gateway by associating your gateway’s IP
address with your AWS account, select an AWS region, and create AWS Storage
Gateway volumes and attach these volumes as iSCSI devices to your on-
premises application servers.
You can begin using the AWS Storage Gateway in just a few steps. To get started,
you simply:
storage will be used for your frequently accessed data. For gateway-stored
configurations, it will be used for your primary data.
4. Activate your on-premises gateway by associating your gateway’s IP address with
your AWS account and select an AWS region for your gateway to store
uploaded data.
5. Use the AWS Management Console to create AWS Storage Gateway
volumes and attach these volumes as iSCSI devices to your on-premises
application servers.
By following these steps, you can begin using your existing on-premises
applications to seamlessly store data in Amazon S3. These applications can now
write data to their attached AWS Storage Gateway volumes. Your application data
will either be stored directly in Amazon S3 for substantial cost savings on
primary storage (gateway-cached volumes), or will be stored locally and backed
up to Amazon S3 for durable and cost-effective backups (gateway-stored
volumes).
Anti-Patterns
AWS Storage Gateway has the following anti-patterns:
Amazon CloudFront
Amazon CloudFront content delivery network (CDN) is a web service for content
delivery.24 Amazon CloudFront makes your website's dynamic, static, and
streaming content available from a global network of edge locations. When a
visitor requests a file from your website, he or she is invisibly redirected to a
copy of the file at the nearest edge location, which results in faster download
times than if the visitor had accessed the content from a data center farther
away. Amazon CloudFront caches content at edge locations for a period of time
that you specify.
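This caching behavior can be sketched as a small time-to-live cache: serve a copy from the edge until the configured time has passed, then go back to the origin. The class, the origin callback, and the explicit clock parameter are hypothetical and exist only to make the behavior testable. The sketch models the idea, not CloudFront's internals.

```python
class EdgeCache:
    """Toy edge location: caches origin responses for ttl_seconds."""

    def __init__(self, fetch_from_origin, ttl_seconds):
        self.fetch_from_origin = fetch_from_origin
        self.ttl_seconds = ttl_seconds
        self.entries = {}          # path -> (body, time cached)
        self.origin_fetches = 0

    def get(self, path, now):
        cached = self.entries.get(path)
        if cached is not None and now - cached[1] < self.ttl_seconds:
            return cached[0]       # fresh copy served from the edge
        self.origin_fetches += 1   # missing or expired: go back to the origin
        body = self.fetch_from_origin(path)
        self.entries[path] = (body, now)
        return body

edge = EdgeCache(lambda path: f"contents of {path}", ttl_seconds=60)
edge.get("/logo.png", now=0)     # first request: fetched from origin
edge.get("/logo.png", now=30)    # within TTL: served from edge
edge.get("/logo.png", now=90)    # TTL expired: fetched again
print(edge.origin_fetches)       # 2
```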
Amazon CloudFront supports all files that can be served over HTTP. This
includes dynamic web pages, such as HTML or PHP pages, any popular static
files that are a part of your web application, such as website images, audio,
video, media files or software downloads. For on-demand media files, you can
Amazon CloudFront is optimized to work with other Amazon Web Services, like
Amazon S3, Amazon EC2, Elastic Load Balancing, and Amazon Route
53. Amazon CloudFront also works seamlessly with any non-AWS origin server,
which stores the original, definitive versions of your files.
Performance
Amazon CloudFront is designed for low-latency and high-bandwidth delivery of
content. Amazon CloudFront speeds up the distribution of your content by
routing end users to the edge location that can best serve the end user's request
in a worldwide network of edge locations. Typically, requests are routed to the
nearest Amazon CloudFront edge location in terms of latency. This dramatically
reduces the number of networks that your users' requests must pass through
and improves performance. End users get both lower latency—the time it takes
to load the first byte of the object—and higher sustained data transfer rates
needed to deliver popular objects to end users at scale.
a central point of failure. Copies of your files are now held in edge locations
around the world.
Cost Model
With Amazon CloudFront, there are no long-term contracts or required
minimum monthly commitments—you pay only for as much content as you
actually deliver through the service. Amazon CloudFront has two pricing
components: regional data transfer out (per GB) and requests (per 10,000).
Note that it is often cheaper (as well as faster) to deliver popular content from
Amazon S3 through Amazon CloudFront rather than directly from Amazon S3.
Pricing information can be found at Amazon CloudFront Pricing.25
Interfaces
There are several ways to manage and configure Amazon CloudFront. The AWS
Management Console provides an easy way to manage Amazon CloudFront. All
the features of the Amazon CloudFront API are supported. For example, you can
enable or disable distributions, configure CNAMEs, and enable end-user
logging. You can also use the Amazon CloudFront command line tools, the
native REST API, or one of the supported SDKs.
Anti-Patterns
Amazon CloudFront is optimal for delivery of popular static or dynamic content.
However, in a number of situations Amazon CloudFront is not the optimal
solution. Amazon CloudFront has the following anti-patterns:
While Amazon SQS and other message queuing services are usually thought of
as asynchronous communication protocols, Amazon SQS can also be viewed
as a class of temporary but durable data storage for many classes of
applications. Use of Amazon SQS as temporary storage can minimize the use of
other storage mechanisms, such as temporary disk files.
Use of the Amazon SQS queue enables the number of worker instances to scale
up or down, and also enables the processing power of each single worker
instance to scale up or down, to suit the total workload, without any application
changes.
Performance
Amazon SQS is a distributed queuing system that is optimized for horizontal
scalability, not for single-threaded sending or receiving speeds. A single client
can send or receive Amazon SQS messages at a rate of about 5 to 50 messages
per second. Higher receive performance can be achieved by requesting multiple
messages (up to 10) in a single call. It may take several seconds before a
message that has been sent to a queue is available to be received.
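Each call can return at most 10 messages, so batching reduces the number of round trips. The sketch below uses hypothetical helper names. It computes the minimum number of calls needed at a given batch size and splits work into batches of at most 10. Treat `calls_needed` as a lower bound, because a real receive call may return fewer messages than requested.

```python
import math
from typing import Iterable, List, TypeVar

T = TypeVar("T")

MAX_MESSAGES_PER_CALL = 10  # per-call limit described in the text


def calls_needed(num_messages: int, batch_size: int = MAX_MESSAGES_PER_CALL) -> int:
    """Minimum number of calls to move num_messages at batch_size per call."""
    if not 1 <= batch_size <= MAX_MESSAGES_PER_CALL:
        raise ValueError("batch_size must be between 1 and 10")
    return math.ceil(num_messages / batch_size)


def chunk(items: Iterable[T], size: int = MAX_MESSAGES_PER_CALL) -> List[List[T]]:
    """Split items into lists of at most `size`, one list per API call."""
    if not 1 <= size <= MAX_MESSAGES_PER_CALL:
        raise ValueError("size must be between 1 and 10")
    out: List[List[T]] = []
    batch: List[T] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            out.append(batch)
            batch = []
    if batch:
        out.append(batch)  # final, partially filled batch
    return out


print(calls_needed(1000, 1), calls_needed(1000))  # → 1000 100
```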
Cost Model
With Amazon SQS, you pay only for what you use and there is no minimum fee.
To get started and to support simple applications, Amazon SQS provides a free
tier of service which provides 100,000 requests per month at no charge. Beyond
the free tier, Amazon SQS pricing is based on the number of requests (priced per
10,000 requests) and the amount of data transferred in and out (priced per GB
per month). Pricing information can be found at Amazon SQS Pricing.27
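As a worked example of the free tier, this sketch computes the monthly request charge after the first 100,000 requests. The per-10,000 rate is a hypothetical placeholder. Data transfer is ignored, and charges are prorated linearly instead of being rounded to whole blocks, which is a simplifying assumption.

```python
FREE_REQUESTS_PER_MONTH = 100_000  # free tier figure stated in the text


def sqs_request_cost(requests: int, price_per_10k: float) -> float:
    """Monthly request charge beyond the free tier (data transfer excluded).

    price_per_10k is a hypothetical rate; linear proration is an assumption.
    """
    billable = max(0, requests - FREE_REQUESTS_PER_MONTH)
    return (billable / 10_000) * price_per_10k


print(sqs_request_cost(50_000, 0.01))  # within the free tier → 0.0
```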
Interfaces
Amazon SQS can be accessed through an HTTP Query web services interface, as
well as through SDKs for Java, PHP, Ruby, and .NET. The Amazon SQS APIs
provide both management and data interfaces. Five APIs make it easy for
developers to get started with Amazon SQS: CreateQueue, SendMessage,
ReceiveMessage, ChangeMessageVisibility, and DeleteMessage. Additional APIs
provide advanced functionality.
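A toy model makes it easier to see how ReceiveMessage, ChangeMessageVisibility, and DeleteMessage work together. The in-memory class below is not the AWS SDK. Its method names only echo the five operations, and its constructor plays the role of CreateQueue. The injectable clock is an assumption added for testability. The model shows the central idea: a received message is hidden for a visibility timeout and reappears unless it is deleted, so a worker that fails mid-task does not lose the message. A real queue may not return messages in the order they were sent, unlike this toy.

```python
import itertools
import uuid
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

MAX_MESSAGES_PER_RECEIVE = 10  # per-call limit described in the text


@dataclass
class _Message:
    body: str
    visible_at: float = 0.0               # earliest time the message can be received
    receipt_handle: Optional[str] = None  # new handle issued on each receive


class ToyQueue:
    """Single-process model of the five core operations (hypothetical names)."""

    def __init__(self, visibility_timeout: float, clock: Callable[[], float]):
        # Constructing the object stands in for CreateQueue.
        self.visibility_timeout = visibility_timeout
        self.clock = clock
        self._messages: Dict[str, _Message] = {}
        self._ids = itertools.count()

    def send_message(self, body: str) -> str:
        msg_id = str(next(self._ids))
        self._messages[msg_id] = _Message(body)
        return msg_id

    def receive_message(self, max_messages: int = 1) -> List[Tuple[str, str]]:
        if not 1 <= max_messages <= MAX_MESSAGES_PER_RECEIVE:
            raise ValueError("max_messages must be between 1 and 10")
        now = self.clock()
        received = []
        for msg in self._messages.values():
            if len(received) == max_messages:
                break
            if msg.visible_at <= now:
                # Hide the message from other consumers for the timeout.
                msg.visible_at = now + self.visibility_timeout
                msg.receipt_handle = uuid.uuid4().hex
                received.append((msg.receipt_handle, msg.body))
        return received

    def change_message_visibility(self, receipt_handle: str, timeout: float) -> None:
        for msg in self._messages.values():
            if msg.receipt_handle == receipt_handle:
                msg.visible_at = self.clock() + timeout

    def delete_message(self, receipt_handle: str) -> None:
        for msg_id, msg in list(self._messages.items()):
            if msg.receipt_handle == receipt_handle:
                del self._messages[msg_id]
```

A worker would typically receive a message, process it, and then delete it. If processing might outlast the timeout, the worker can call `change_message_visibility` to extend it.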
Anti-Patterns
Amazon SQS has the following anti-patterns:
Amazon RDS to store the large or binary data, and store a pointer to the
data in Amazon SQS.
Performance
Amazon RDS delivers high performance through a combination of configurable
instances running on Amazon’s proven, world-class infrastructure with fully-
automated maintenance and backup operations. Available database
configurations range from a small instance (64-bit platform with 1.7 GB of RAM
and 1 Amazon EC2 compute unit (ECU)) up to a quadruple extra-large instance
(64-bit platform with 68 GB of RAM and 26 ECUs).
The Amazon RDS Multi-AZ deployment feature enhances both the durability
and the availability of your database by synchronously replicating your data
between a primary Amazon RDS DB instance and a standby instance in another
Availability Zone. In the unlikely event of a DB component failure or an
Availability Zone failure, Amazon RDS will automatically fail over to the standby
(which typically takes about three minutes) and the database transactions can
be resumed as soon as the standby is promoted. The synchronous replication
helps to prevent loss of data.
Cost Model
With Amazon RDS, you pay only for what you use and there is no minimum fee.
Amazon RDS offers a tiered pricing structure, based on the size of the database
instance, the deployment type (Single-AZ/Multi-AZ), and the AWS region.
Pricing for Amazon RDS is based on several factors: the DB instance hours (per
hour), the amount of provisioned database storage (per GB-month), I/O requests
(per million requests), additional backup storage (per GB-month), and data
transfer in / out (per GB per month). Pricing information can be found at
Amazon Relational Database Service Pricing.29
Amazon RDS for MySQL also enables you to scale out beyond the capacity of a
single database deployment for read-heavy database workloads by creating one
or more read replicas. Read replicas use MySQL’s built-in asynchronous
replication capability, and can be used in conjunction with the synchronous
replication provided by Amazon RDS Multi-AZ deployments.
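With read replicas in place, an application has to decide which queries go to the primary and which go to a replica. This sketch uses hypothetical endpoint names and a deliberately naive rule that treats statements beginning with SELECT as reads. It round-robins reads across the replicas and sends everything else to the primary. Read replica replication is asynchronous, so a replica may lag behind the primary. Reads that must see the latest write can be forced to the primary with `require_fresh=True`.

```python
import itertools
from typing import List


class ReadWriteRouter:
    """Route writes to the primary and spread reads over read replicas.

    Endpoint names are hypothetical; the SELECT-prefix test is a simplification.
    """

    def __init__(self, primary: str, replicas: List[str]):
        self.primary = primary
        self._replicas = itertools.cycle(replicas) if replicas else None

    def endpoint_for(self, sql: str, require_fresh: bool = False) -> str:
        is_read = sql.lstrip().upper().startswith("SELECT")
        if is_read and not require_fresh and self._replicas is not None:
            return next(self._replicas)  # round-robin across replicas
        return self.primary  # writes, fresh reads, or no replicas configured


router = ReadWriteRouter("primary.example", ["replica1.example", "replica2.example"])
print(router.endpoint_for("SELECT * FROM orders"))  # → replica1.example
```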
Interfaces
Amazon RDS APIs and the AWS Management Console provide a management
interface that allows you to create, delete, modify, and terminate Amazon RDS
DB instances; to create DB snapshots; and to perform point-in-time restores. To
start using Amazon RDS, you simply use the AWS Management Console or
Amazon RDS APIs to launch a database instance (DB instance), selecting the
DB engine (MySQL, Oracle or SQL Server), license type, DB instance class, and
storage capacity that best meets your needs.
There is no AWS data API for Amazon RDS. Once the database is created, AWS
provides a DNS endpoint through which you can connect to your DB instance
using your favorite database tool or programming language. Since you have
direct access to a native MySQL, Oracle, or SQL Server database engine, most
tools designed for these engines should work unmodified with Amazon RDS.
After your schema and data are in place, you interact with your information via
standard SQL, as well as JDBC and other popular APIs, and any graphical tools
that can work with relational data. There are no code changes to be made to let
your application interact with Amazon RDS. You simply replace your database
server’s address (e.g., dbserver.example.com) in your database connection string
with the public DNS endpoint (e.g., myinstance.c0cafggtpzd2.us-east-1.rds.amazonaws.com)
provided by AWS when you create the instance. This
DNS endpoint will remain the same for the lifetime of your instance, even after
failover of an Amazon RDS Multi-AZ deployment. Aside from configuring the
endpoint, everything else about your database-centric application is unchanged.
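Suppose your application builds its database connection from a URL-style string. That is a common convention in drivers and ORMs but is an assumption here, since JDBC and some other drivers use their own formats. In that case, moving to Amazon RDS can come down to swapping the host. The hypothetical helper below replaces only the host and keeps the user, password, port, database name, and query parameters as they were.

```python
from urllib.parse import urlsplit, urlunsplit


def point_at_endpoint(conn_url: str, new_host: str) -> str:
    """Return conn_url with only its host replaced (hypothetical helper)."""
    parts = urlsplit(conn_url)
    userinfo = ""
    if parts.username is not None:
        userinfo = parts.username
        if parts.password is not None:
            userinfo += ":" + parts.password
        userinfo += "@"
    port = f":{parts.port}" if parts.port is not None else ""
    # Rebuild the network location around the new host; other parts are untouched.
    return urlunsplit(parts._replace(netloc=f"{userinfo}{new_host}{port}"))


print(point_at_endpoint(
    "mysql://app:secret@dbserver.example.com:3306/orders",
    "myinstance.c0cafggtpzd2.us-east-1.rds.amazonaws.com",
))  # → mysql://app:secret@myinstance.c0cafggtpzd2.us-east-1.rds.amazonaws.com:3306/orders
```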
Anti-Patterns
Amazon RDS is a great solution for a cloud-based, fully-managed relational
database, but in a number of scenarios it may not be the right choice. Amazon
RDS has the following anti-patterns:
Amazon DynamoDB
Amazon DynamoDB is a fast, fully-managed NoSQL database service that
makes it simple and cost-effective to store and retrieve any amount of data, and
serve any level of request traffic.30 Amazon DynamoDB helps offload the
administrative burden of operating and scaling a highly-available distributed
database cluster. This storage alternative meets the latency and throughput
requirements of highly demanding applications by providing extremely fast and
predictable performance with seamless throughput and storage scalability.
Common use cases include: mobile apps, gaming, digital ad serving, live voting
and audience interaction for live events, sensor networks, log ingestion, access
control for web-based content, metadata storage for Amazon S3 objects, e-
commerce shopping carts, and web session management. Many of these use
cases require a highly available and scalable database because downtime or
performance degradation has an immediate negative impact on an
organization’s business.
Performance
Amazon DynamoDB’s use of SSDs and limited indexing on attributes provides high
throughput and low latency (single-digit milliseconds are typical for average
server-side response times) and drastically reduces the cost of read and write
operations. As datasets grow, performance must remain predictable so that low
latency can be maintained for the workload. This predictable performance can be achieved by
defining the provisioned throughput capacity required for a given table. Behind
the scenes, the service handles the provisioning of resources to achieve the
requested throughput rate, which takes the burden away from the customer to
have to think about instances, hardware, memory, and other factors that can
affect an application’s throughput rate. Provisioned throughput capacity
reservations are elastic and can be increased or decreased on demand.
Cost Model
With Amazon DynamoDB, you pay only for what you use and there is no
minimum fee. Amazon DynamoDB has three pricing components: provisioned
throughput capacity (per hour), indexed data storage (per GB per month), and data
transfer in or out (per GB per month). New customers can start using Amazon
DynamoDB for free as part of the AWS Free Usage Tier. Pricing information can
be found at Amazon DynamoDB Pricing.31
Interfaces
Amazon DynamoDB provides a low-level REST API, as well as higher-level
SDKs for Java, .NET, and PHP that wrap the low-level REST API and provide
some object-relational mapping (ORM) functions. These APIs provide both a
management and data interface for Amazon DynamoDB. The API currently
offers thirteen operations that enable table management (creating, listing,
deleting, and obtaining metadata) and working with attributes (getting, writing,
and deleting attributes; query using an index, and full scan). While standard
SQL isn’t available for Amazon DynamoDB, you may use the Query and Scan
operations to retrieve a set of items and attributes based on criteria that
you provide. You can also work with Amazon
DynamoDB using the AWS Management Console.
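A key-based query and a full scan differ in how many items they read, which affects both latency and consumed throughput. The toy table below is not the DynamoDB API, and its class and method names are hypothetical. It groups items by hash key to show that a query examines only the items that share one key, while a scan examines every item and then filters.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple


class ToyTable:
    """In-memory illustration of key-based Query versus full Scan."""

    def __init__(self, hash_key: str):
        self.hash_key = hash_key
        self._partitions: Dict[object, List[dict]] = defaultdict(list)

    def put_item(self, item: dict) -> None:
        self._partitions[item[self.hash_key]].append(item)

    def query(self, key_value) -> Tuple[List[dict], int]:
        """Read only the items stored under one hash key: (results, items examined)."""
        items = list(self._partitions.get(key_value, []))
        return items, len(items)

    def scan(self, predicate: Callable[[dict], bool]) -> Tuple[List[dict], int]:
        """Read every item, then filter: (results, items examined)."""
        examined, results = 0, []
        for items in self._partitions.values():
            for item in items:
                examined += 1
                if predicate(item):
                    results.append(item)
        return results, examined
```

When a table holds 100 items and 3 of them share a key, both approaches return the same 3 items. The query examines 3 items, and the scan examines all 100.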
Anti-Patterns
Amazon DynamoDB has the following anti-patterns:
• BLOB data—If you plan on storing large (greater than 64 KB) BLOB
data, such as digital video, images, or music, you’ll want to consider
Amazon S3. However, Amazon DynamoDB still has a role to play in this
scenario, for keeping track of metadata (e.g., item name, size, date
created, owner, location, and so on) about your binary objects.
• Large data with low I/O rate—Amazon DynamoDB uses SSD drives and
is optimized for workloads with a high I/O rate per GB stored. If you plan
to store very large amounts of data that are infrequently accessed, other
storage options, such as Amazon S3, may be a better choice.
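The BLOB anti-pattern above points to a split design: the binary payload goes to Amazon S3, and a small metadata item goes to Amazon DynamoDB. In this sketch both stores are plain dictionaries that stand in for the real services. The content-addressed key layout is a hypothetical choice, and a real implementation would use the respective SDKs.

```python
import hashlib
from datetime import datetime, timezone
from typing import Dict


class ObjectCatalog:
    """Payloads go to a blob store; only metadata and a pointer go in the table."""

    def __init__(self) -> None:
        self.blob_store: Dict[str, bytes] = {}  # stand-in for an Amazon S3 bucket
        self.metadata: Dict[str, dict] = {}     # stand-in for an Amazon DynamoDB table

    def put(self, name: str, data: bytes, owner: str) -> dict:
        # Content-addressed key: an illustrative choice, not a requirement.
        location = "objects/" + hashlib.sha256(data).hexdigest()
        self.blob_store[location] = data
        record = {
            "name": name,
            "size": len(data),
            "owner": owner,
            "created": datetime.now(timezone.utc).isoformat(),
            "location": location,  # pointer to the payload, not the payload itself
        }
        self.metadata[name] = record
        return record

    def get(self, name: str) -> bytes:
        return self.blob_store[self.metadata[name]["location"]]
```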
Amazon ElastiCache
ElastiCache is a web service that makes it easy to deploy, operate, and scale a
distributed, in-memory cache in the cloud.32 ElastiCache improves the
performance of web applications by allowing you to retrieve information from a
fast, managed, in-memory caching system, instead of relying entirely on slower
disk-based databases. ElastiCache supports two popular open-source caching
engines: Memcached and Redis.
Performance
Performance of a cache layer is very dependent on the caching strategy and the
hit rate at the application level, so it is difficult to provide general guidance.
Choose the total amount of cache memory needed based on the size of your
dataset and the expected access pattern. Then divide this by the memory per
cache node to get the required number of cache nodes. Make sure that you can
maintain acceptable performance without overloading the database backend in
the event of the failure and replacement of one or more cache nodes. You can
easily add or remove cache nodes from a running cluster, but you cannot change
the cache node type in a running cluster.
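The sizing rule above is simple arithmetic. This sketch rounds the node count up and adds spare nodes so that the cluster can lose and replace a node without overloading the database backend. The spare-node count is an assumed headroom policy. For the per-node figure, use the memory actually available for cache data, which may be less than the node's total RAM. For example, 50 GB of cache data on nodes with 6 GB each gives ceil(8.33) = 9 nodes, plus 1 spare, for a total of 10.

```python
import math


def cache_nodes_needed(total_cache_gb: float, memory_per_node_gb: float,
                       spare_nodes: int = 1) -> int:
    """Nodes = ceil(total cache memory / memory per node) + spare nodes.

    spare_nodes is an assumed headroom policy, not a service requirement.
    """
    if memory_per_node_gb <= 0:
        raise ValueError("memory_per_node_gb must be positive")
    return math.ceil(total_cache_gb / memory_per_node_gb) + spare_nodes


print(cache_nodes_needed(50, 6))  # → 10
```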
With the Memcached engine, all ElastiCache nodes in a single cache cluster are
provisioned in a single Availability Zone. ElastiCache automatically monitors
the health of your cache nodes and replaces them in the event of network
partitioning, host hardware, or software failure. In the event of cache node
failure, the cluster remains available, but performance may be reduced due to
time needed to repopulate the cache in the new “cold” cache nodes. To provide
enhanced fault-tolerance for Availability Zone failures or cold-cache effects, you
can run redundant cache clusters in different Availability Zones.
Cost Model
With ElastiCache, you pay only for what you use and there is no minimum fee.
ElastiCache has only a single pricing component: pricing is per cache node-hour
consumed. For new customers, AWS provides a free usage tier that includes up
to 750 hours of micro cache node usage. Pricing information can be found at
Amazon ElastiCache Pricing.33
Interfaces
To control and manage ElastiCache—to create, describe, reboot, modify, and
destroy cache clusters—you can use the AWS Management Console, the
ElastiCache command line tools, the HTTP Query API, and various SDKs.
To read and write data to an ElastiCache cache cluster, you simply use the normal
Memcached and Redis APIs. Existing applications using Memcached or Redis
can use ElastiCache with almost no modifications other than changing the port
or DNS name used to connect. For a Memcached application, you use standard
operations like get, set, incr and decr in exactly the same way as you would in
your existing Memcached deployments. For a Redis application, you use GET,
SET, EXPIRE, and the various flavors of PUSH and POP exactly as you do with
existing Redis applications.
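A common way to use these get and set operations is the cache-aside pattern. The application reads from the cache first, falls back to the database on a miss, and then populates the cache for subsequent reads. In this sketch, `FakeCache` is a hypothetical dict-backed stand-in for a Memcached or Redis client, and `get_user` and the `load_from_db` callback are illustrative names. Real client libraries have their own method signatures for setting an expiry.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class FakeCache:
    """Dict-backed stand-in offering get/set with a time-to-live."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._data: Dict[str, Tuple[Any, float]] = {}
        self._clock = clock

    def get(self, key: str) -> Optional[Any]:
        entry = self._data.get(key)
        if entry is None or entry[1] <= self._clock():
            return None  # miss or expired
        return entry[0]

    def set(self, key: str, value: Any, ttl: float) -> None:
        self._data[key] = (value, self._clock() + ttl)


def get_user(cache: FakeCache, user_id: int,
             load_from_db: Callable[[int], Any], ttl: float = 300) -> Any:
    """Cache-aside read: cache first, database on a miss, then populate the cache."""
    key = f"user:{user_id}"
    value = cache.get(key)
    if value is None:
        value = load_from_db(user_id)  # slower, disk-based source of truth
        cache.set(key, value, ttl)
    return value
```

The time-to-live bounds how stale a cached entry can become. If a cache node is replaced, the first reads simply fall through to the database until the new node is warm again.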
Anti-Patterns
Amazon ElastiCache has the following anti-patterns:
• Persistent data—If you need very fast access to data, but also need
strong data durability (persistence), Amazon DynamoDB is probably a
better choice.
Amazon Redshift
Amazon Redshift is a fast, fully-managed, petabyte-scale data warehouse service
that makes it simple and cost-effective to efficiently analyze all your data using
your existing business intelligence tools.34 It is optimized for datasets that range
from a few hundred gigabytes to a petabyte or more.
Performance
Amazon Redshift uses a variety of innovations to obtain very high query
performance on datasets ranging in size from hundreds of gigabytes to a
petabyte or more. It uses columnar storage, data compression, and zone maps
to reduce the amount of I/O needed to perform queries. Amazon Redshift has a
massively parallel processing (MPP) architecture that parallelizes and
distributes SQL operations to take advantage of all available resources. The
underlying hardware is designed for high-performance data processing and uses
locally attached storage to maximize throughput.
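Zone maps let a query skip blocks that cannot contain matching rows. The sketch below is a conceptual model and not Amazon Redshift's implementation. It records the minimum and maximum value of each fixed-size block and reads only the blocks whose range overlaps the predicate. Skipping works best when the column is sorted, as in the example, because then the block ranges do not overlap.

```python
from typing import List, Tuple

Block = Tuple[int, int, List[int]]  # (min, max, values)


def build_zone_map(values: List[int], block_size: int) -> List[Block]:
    """Split a column into fixed-size blocks and record each block's min and max."""
    blocks = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        blocks.append((min(block), max(block), block))
    return blocks


def range_count(zone_map: List[Block], low: int, high: int) -> Tuple[int, int]:
    """Count values in [low, high]; returns (matching_count, blocks_read)."""
    count = blocks_read = 0
    for block_min, block_max, block in zone_map:
        if block_max < low or block_min > high:
            continue  # zone map proves no match, so this block is never read
        blocks_read += 1
        count += sum(low <= v <= high for v in block)
    return count, blocks_read


zm = build_zone_map(list(range(1000)), block_size=100)
print(range_count(zm, 250, 349))  # → (100, 2): only 2 of 10 blocks are read
```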
Cost Model
With Amazon Redshift, you can pay as you go and there are no upfront costs.
Amazon Redshift has three pricing components: data warehouse node hours,
backup storage, and data transfer. Compute node hours are the total number of
hours run across all compute nodes for the billing period. Backup storage is the
storage associated with automated and manual snapshots for an Amazon
Redshift data warehouse cluster. Increasing the backup retention period or
taking additional snapshots increases the backup storage consumed by the
Amazon Redshift data warehouse cluster. There is no additional charge for
backup storage up to 100% of your provisioned storage for an active data
warehouse cluster.
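For an active cluster, the free backup allowance reduces to a single subtraction. This sketch uses a hypothetical function name and returns the backup storage that would be billable for an active cluster. It does not model node hours, data transfer, or clusters that are no longer active.

```python
def billable_backup_gb(backup_gb: float, provisioned_gb: float) -> float:
    """Backup storage beyond the free allowance of 100% of provisioned storage."""
    return max(0.0, backup_gb - provisioned_gb)


print(billable_backup_gb(2600, 2000))  # → 600
```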
Interfaces
The Amazon Redshift Query API provides a management interface to manage
data warehouse clusters programmatically. Additionally, the AWS SDKs for
Java, .NET, and other languages provide class libraries that wrap the underlying
Amazon Redshift API to simplify your programming tasks. If you prefer a more
interactive way of managing clusters, you can use the Amazon Redshift console
and the AWS CLI.
The Amazon Redshift APIs do not provide a data interface. Amazon Redshift is
a SQL data warehouse and uses industry standard ODBC and JDBC connections
and PostgreSQL drivers. Once you’ve provisioned your cluster, you can connect
to it, start loading data, and run queries using the same SQL-based tools and
business intelligence applications you use today. For more information, see the
Amazon Redshift Partners page.36
Data can be loaded into Amazon Redshift from a range of data sources including
Amazon S3, Amazon DynamoDB, and AWS Data Pipeline. Amazon Redshift
attempts to load data in parallel into each compute node to maximize the rate at
which data can be ingested into the data warehouse cluster. For more
information on loading data into Amazon Redshift, see the Amazon Redshift
Getting Started Guide.37
Anti-Patterns
Amazon Redshift has the following anti-patterns:
Performance
The performance of a relational database instance on Amazon EC2 depends on
many factors, including the Amazon EC2 instance type, the number and
configuration of Amazon EBS volumes, the database software and its
configuration, and the application workload. In general, you can expect
database performance on Amazon EC2 to be similar to the performance of the
same database installed on similarly configured on-premises equipment. We
encourage you to benchmark your actual application on several Amazon EC2
instance types using several storage configurations to select the best
configuration.
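A small timing harness makes such comparisons repeatable. In this sketch, the hypothetical `benchmark` function times a caller-supplied workload several times and reports the median, minimum, and maximum wall-clock latency. The caller provides the workload, for example a representative set of your application's queries. You can run the same harness on each candidate instance type and storage configuration and then compare the medians.

```python
import statistics
import time
from typing import Callable, Dict


def benchmark(workload: Callable[[], None], runs: int = 5) -> Dict[str, float]:
    """Time a workload `runs` times and summarize wall-clock latency in seconds."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        workload()
        timings.append(time.perf_counter() - start)
    return {
        "median_s": statistics.median(timings),
        "min_s": min(timings),
        "max_s": max(timings),
    }
```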
Cost Model
By running a database on Amazon EC2, you pay only for what you use and there
is no minimum fee. The cost of running your own database on Amazon EC2
depends on the size and number of Amazon EC2 instances used to run your
database, the size of the Amazon EBS volumes used for database storage, the
amount of data transferred in and out of Amazon EC2, and, in many cases, the
license cost of the third-party database software. Many open-source database
packages use a no-cost license model; some commercial software vendors use
the Amazon DevPay model; many others provide a bring-your-own-license
model. Contact your database software vendor or Amazon Web Services to
understand the license cost pricing model that applies. Pricing information for
Amazon EC2, Amazon EBS, and data transfer can be found at Amazon EC2
Pricing.39
Anti-Patterns
Running your own relational database on Amazon EC2 is a great solution for
many users, but a number of scenarios exist where other solutions might be the
better choice. Self-managed relational databases on Amazon EC2 have the
following anti-patterns:
them, you may find Amazon S3 to be a better choice. You can use a
database to manage the metadata.
Contributors
The following individuals and organizations contributed to this document:
• Joseph Baron
• Sanjay Kotecha
Further Reading
For additional help, see the following sources.
• Amazon Glacier
• Amazon EBS
• AWS Import/Export
• Amazon CloudFront
• Amazon SQS
• Amazon RDS
• Amazon DynamoDB
• Amazon ElastiCache
• Amazon Redshift
Other Resources
• AWS SDKs, IDE Toolkits, and Command Line Tools
Notes
1 http://media.amazonwebservices.com/AWS_Storage_Use_Cases.pdf
2 https://aws.amazon.com/s3/
3 http://aws.amazon.com/free/
4 http://aws.amazon.com/s3/pricing/
5 http://aws.amazon.com/glacier/
6 http://aws.amazon.com/glacier/pricing/
7 http://aws.amazon.com/sns/
8 http://docs.aws.amazon.com/AmazonS3/latest/dev/object-lifecycle-mgmt.html
9 http://aws.amazon.com/ebs/
10 http://aws.amazon.com/ec2/pricing/#EBS
11 http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/
12 Most Amazon EC2 instance types provide local instance storage volumes;
however, the micro and M3 instances are Amazon EBS-only and do not
provide local instance storage. Also, instances that use Amazon EBS for the
root device (boot from Amazon EBS) do not expose the instance store volumes
by default. If desired, you can expose the instance store volumes at instance
launch time by specifying a block device mapping. For more information, see
the Amazon EC2 User Guide.
13 http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/storage_instances.html
14 http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/high_storage_instances.html
15 http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-instance-lifecycle.html
16 http://aws.amazon.com/ec2/pricing/
17 http://aws.amazon.com/importexport/
18 http://aws.amazon.com/importexport/pricing/
19 http://aws.amazon.com/storagegateway/
20 http://docs.amazonwebservices.com/storagegateway/latest/userguide/
21 http://aws.amazon.com/directconnect/
22 http://aws.amazon.com/storagegateway/pricing/
23 https://console.aws.amazon.com/storagegateway
24 http://aws.amazon.com/cloudfront/
25 http://aws.amazon.com/cloudfront/pricing/
26 http://aws.amazon.com/sqs/
27 http://aws.amazon.com/sqs/pricing/
28 http://aws.amazon.com/rds/
29 http://aws.amazon.com/rds/pricing/
30 http://aws.amazon.com/dynamodb/
31 http://aws.amazon.com/dynamodb/pricing/
32 http://aws.amazon.com/elasticache/
33 http://aws.amazon.com/elasticache/pricing/
34 http://aws.amazon.com/redshift/
35 http://aws.amazon.com/redshift/pricing/
36 http://aws.amazon.com/redshift/partners/
37 http://docs.aws.amazon.com/redshift/latest/gsg/welcome.html
38 http://aws.amazon.com/ec2/
39 http://aws.amazon.com/ec2/pricing/