h10502 Vfcache Intro WP
Abstract
This white paper is an introduction to EMC VFCache. It
describes the implementation details of the product and
provides performance, usage considerations, and major
customer benefits when using VFCache.
February 2012
INTRODUCTION TO EMC VFCACHE
VFCache is a server Flash-caching solution
VFCache accelerates reads and ensures data protection
VFCache extends EMC FAST Suite to server
2 Introduction to EMC VFCache
Copyright 2012 EMC Corporation. All Rights Reserved.
EMC believes the information in this publication is accurate as
of its publication date. The information is subject to change
without notice.
The information in this publication is provided as is. EMC
Corporation makes no representations or warranties of any kind
with respect to the information in this publication, and
specifically disclaims implied warranties of merchantability or
fitness for a particular purpose.
Use, copying, and distribution of any EMC software described in
this publication requires an applicable software license.
For the most up-to-date listing of EMC product names, see EMC
Corporation Trademarks on EMC.com.
VMware, ESX, VMware vCenter, and VMware vSphere are
registered trademarks or trademarks of VMware, Inc. in the
United States and/or other jurisdictions. All other trademarks
used herein are the property of their respective owners.
Part Number: H10502.1
Table of Contents
Executive summary
EMC VFCache solution
Introduction
Audience
Terminology
Use cases of Flash technology
VFCache advantages over DAS
Cache in the storage array
Flash cell architecture
VFCache design concepts
Business benefits
Implementation details
Read Hit example
Read Miss example
Write example
VMware implementation
Split-card feature
VFCache management
Performance Considerations
Locality of reference
Warm-up time
Workload characteristics
Throughput versus latency
Other bottlenecks in the environment
Write performance dependent on back-end array
Usage guidelines and characteristics
Specifications
Constraints
Stale data
Application use case and performance
Test results
Conclusion
References
Executive summary
Since the first deployment of Flash technology in disk modules (commonly known as
solid-state drives or SSDs) by EMC in enterprise arrays, it has been EMC's goal to
expand the use of this technology throughout the storage environment.
The combination of the requirement for high performance and the rapidly falling cost-
per-gigabyte of Flash technology has led to the concept of a caching tier. A caching
tier is a large-capacity secondary cache using Flash technology that is positioned
between the server application and the storage media.
EMC VFCache solution
EMC VFCache is a server Flash-caching solution that reduces latency and
accelerates throughput to dramatically improve application performance by using
intelligent caching software and PCIe Flash technology.
VFCache accelerates reads and protects data by using a write-through cache to the
networked storage to deliver persistent high availability, integrity, and disaster
recovery.
VFCache coupled with array-based EMC
server
VFCache vendor driver on the ESX server
VFCache software in each virtual machine that needs to be accelerated using
VFCache
In a VMware environment, the VFCache software includes the VFCache driver, CLI
package, and VFCache Agent. The VFCache software does not need to be installed
on all the virtual machines in the server. Only those virtual machines that need to
be accelerated using VFCache need to have VFCache software installed.
VFCache VSI Plug-in for VFCache management in the VMware vCenter client
This is usually the laptop that the administrator uses for connecting to the vCenter
server.
You have to create a datastore using the VFCache hardware on the ESX server. Once
the VFCache datastore has been created, the rest of the setup can be managed using
the VFCache VSI plug-in. In order for a virtual machine to use the VFCache datastore, a
virtual disk (vDisk) for the virtual machine's cache device must be created within the
VFCache datastore. vDisks can be created either through the VFCache VSI plug-in or
directly using the vSphere client. This virtual disk needs to be added to the virtual
machine.
The cache configuration and management steps from this point on are similar to the
steps that you would follow in a physical server environment. These can be done
using either the CLI on the virtual machine or the VSI plug-in on the vCenter client.
More details on installing VFCache in VMware environments can be found in the
VFCache Installation Guide for VMware and the VFCache VMware Plug-in Administration
Guide, available on Powerlink.
Depending on the cache size required on each virtual machine, an appropriately sized
cache vDisk can be created from the VFCache datastore and assigned to the virtual
machine. If you want to change the size of VFCache on a particular virtual machine,
you need to do the following:
1. Shut down the virtual machine.
2. Increase the size of the cache vDisk assigned to the virtual machine.
3. Restart the virtual machine.
VFCache is a local resource at the virtual machine level in the ESX server. This has the
same consequences as any other local resource on a server. For example, you cannot
configure an automatic failover for a virtual machine that has VFCache. You cannot
use features like VMware vCenter Distributed Resource Scheduler (vCenter DRS) for
clusters or VMware vCenter Site Recovery Manager (vCenter SRM) for replication.
You cannot use VFCache in a cluster that balances application workloads by
automatically performing vMotion from heavily used hosts to less-utilized hosts. If
you are planning to use vMotion functionality, you should:
1. Stop VFCache on the source virtual machine.
2. Remove VFCache from the source virtual machine.
3. Perform vMotion from the source virtual machine to the destination virtual
machine.
4. Restore caching on the destination virtual machine.
Both RDM and VMFS volumes are supported with VFCache.
Split-card feature
EMC VFCache has a unique "split-card" feature, which allows you to use part of the
server Flash as a cache and another part of the server Flash as DAS. When using the
DAS portion of this feature, both read and write operations from the application are
done directly on the PCIe Flash capacity in the server.
The contents of the DAS portion do not persist to any storage array. Therefore, it is
highly recommended that you use the DAS portion only for temporary data, such as
operating system swap space and temp file space. This feature provides an option for
you to simultaneously use the card as a caching device and as a storage device for
temporary data.
When this functionality is used, the same Flash capacity and PCIe resources are
shared between the cache and DAS portions. Therefore, cache performance may be
lower than when the PCIe card is used solely as a caching solution.
VFCache management
VFCache does not require sophisticated management software. However, there is a
CLI for management of the product. There is also an option of using a VSI plug-in for
VFCache management in VMware environments.
Performance Considerations
VFCache is a write-through caching product rather than a Flash storage solution, so
there are certain things that need to be considered when evaluating VFCache
performance.
Locality of reference
The key to maximizing VFCache performance is the locality of reference in the
application workload. Applications that reference a small area of storage with very
high frequency will benefit the most from using VFCache. Examples of this are
database indices and reference tables. If the locality of reference is low, the
application may get less benefit after promoting a data chunk into VFCache. Very low
locality will result in few or no promotions and thus no benefit.
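The effect of locality can be illustrated with a small simulation. The sketch below is not the VFCache caching algorithm: it assumes a plain LRU cache that promotes a page on its first miss, and the workload generator, page counts, and cache size are hypothetical values chosen only to contrast a skewed workload with a uniform one.

```python
import random
from collections import OrderedDict


def lru_hit_rate(accesses, cache_pages):
    """Replay page numbers through an LRU cache and return the read-hit rate."""
    cache = OrderedDict()
    hits = 0
    for page in accesses:
        if page in cache:
            hits += 1
            cache.move_to_end(page)        # mark as most recently used
        else:
            cache[page] = True             # promote on first miss (simplification)
            if len(cache) > cache_pages:
                cache.popitem(last=False)  # evict the least recently used page
    return hits / len(accesses)


def make_workload(n_ios, total_pages, hot_pages, hot_fraction, seed=0):
    """hot_fraction of the I/Os target the first hot_pages pages; the rest are uniform."""
    rng = random.Random(seed)
    workload = []
    for _ in range(n_ios):
        if rng.random() < hot_fraction:
            workload.append(rng.randrange(hot_pages))
        else:
            workload.append(rng.randrange(total_pages))
    return workload


# Hypothetical volume of 100,000 pages with a 2,000-page cache.
high_locality = make_workload(50_000, 100_000, 1_000, 0.9)
low_locality = make_workload(50_000, 100_000, 1_000, 0.0)

print("high locality hit rate:", lru_hit_rate(high_locality, 2_000))
print("low locality hit rate: ", lru_hit_rate(low_locality, 2_000))
```

With the same cache size, the skewed workload should report a far higher hit rate than the uniform one, which is the behavior this section describes.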
Warm-up time
VFCache needs some warm-up time before it shows significant performance
improvement. Warm-up time consists mostly of promotion operations into VFCache.
This happens when the VFCache has just been installed and is empty. This also
happens when the working data set of the application has changed dramatically, and
the current VFCache data is no longer being referenced. During this phase, the
VFCache read-hit rate is low, so the response time is more like that of the SAN
storage. As the VFCache hit rate increases, the performance starts improving and
stabilizes when a large part of the application working set has been promoted into
VFCache. In internal tests using a 1.2 TB Oracle database, the throughput increased
to more than twice the baseline values in 30 minutes with a TPC-C-like workload.
Among other things, warm-up time depends on the number and type of storage media
in the back-end SAN storage. For example, a setup of 80 SAS drives will have a
shorter warm-up time than a setup with 20 SAS drives. Similarly, a setup with SAS
hard-disk drives (HDDs) in the back end will warm up faster than with NL-SAS HDDs in
the back end. This is because NL-SAS drives typically have a higher response time
than SAS drives. When you are designing application layouts, it is important to
remember that there is a warm-up time before stable VFCache performance is
reached.
In a demo or a proof of concept, the warm-up can be sped up by reading
sequentially through the test area using a 64 KB I/O size. Once the working set has been
promoted, the benchmark test can be run again to compare the numbers with the
baseline numbers. CLI commands can be used to find out how many pages have been
promoted into VFCache. This gives you an idea of what percentage of the working set
has been promoted into the cache.
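The percentage mentioned above is simple arithmetic once the promoted page count is known. The following sketch assumes the 8 KB cache page size given in the Specifications section, treats 1 KB as 1024 bytes, and assumes every promoted page belongs to the working set. The function name and the sample figures are hypothetical, and the page count must come from the CLI output.

```python
PAGE_SIZE_BYTES = 8 * 1024  # cache page size from the Specifications section


def promoted_percent(pages_promoted, working_set_bytes, page_size=PAGE_SIZE_BYTES):
    """Estimate the percentage of the working set currently held in the cache."""
    if working_set_bytes <= 0:
        raise ValueError("working_set_bytes must be positive")
    promoted_bytes = pages_promoted * page_size
    # Cap at 100 percent in case pages outside the working set were promoted.
    return min(100.0, 100.0 * promoted_bytes / working_set_bytes)


# Hypothetical figures: 6,553,600 promoted pages against a 100 GiB working set.
print(promoted_percent(6_553_600, 100 * 1024**3))
```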
If you are comparing the performance against PCIe Flash DAS solutions, the initial
performance numbers of VFCache will be lower because the cache needs to warm up
before the stable performance numbers are shown. In the case of DAS solutions, all
read and write operations happen from the PCIe Flash and there is no warm-up phase.
Therefore, initial performance numbers should not be compared between a caching
and a DAS solution.
Workload characteristics
The final performance benefit that you can expect from VFCache depends on the
application workload characteristics. EMC recommends that you do not enable
VFCache for storage volumes that do not have a suitable workload profile. This
enables you to have more caching resources available for those volumes that are a
good fit for VFCache. For example:
Read/write ratio
VFCache provides read acceleration, so the higher the read/write ratio of the
workload, the more performance benefit you get.
Working set size
You should have an idea of the working set size of the application relative to the
cache size. If the working set is smaller than the cache size, the whole working set
will get promoted into the cache and you will see very good performance
numbers. However, if the working set is much bigger than the cache, the
performance benefit will be less. The maximum performance benefit is for those
workloads where the same data is read multiple times or where the application
reads the same data multiple times after writing it once.
Random versus sequential workloads
An EMC storage array is very efficient in processing sequential workloads from
your applications. The storage array uses its own cache and other mechanisms
like prefetching to accomplish this. However, if there is any randomness in the
workload pattern, the performance is lower because of the seek times involved
with accessing data on mechanical drives. The storage array cache is also of
limited use in this case because different applications using the storage array will
compete for the same storage array cache resource. Flash technology does not
have any latency associated with seek times to access the data. VFCache
therefore shows the greatest performance difference when the application workload
has a high proportion of random I/O.
Concurrency
Mechanical drives in the storage array have only one or two read/write heads,
which means that only a limited number of I/Os can be processed at any one point in
time from one disk. So when there are multiple threads in the application trying to
access data from the storage array, response times tend to go up because the
I/Os need to wait in the queue before they are processed. However, storage and
caching devices using Flash technology typically have multiple channels
internally that can process multiple I/Os at the same time. Therefore, VFCache
shows the maximum performance difference when the application workload has a
high degree of concurrency. The application should request multiple I/Os at the
same time.
I/O Size
Large I/O sizes tend to be bandwidth-driven and reduce the performance gap
between Flash technology and non-Flash technologies. Applications with smaller
I/O sizes (for example, 8 KB) show the maximum performance benefit when using
VFCache.
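How the read/write ratio and the cache hit rate combine can be approximated with a weighted average. The sketch below is a simplified model rather than a measured result. It assumes that read hits are served at Flash latency, that read misses and all writes complete at array latency (write-through), and it uses hypothetical latency values of 0.1 ms for Flash and 5 ms for the array.

```python
def mean_latency_ms(read_fraction, hit_rate, flash_ms, array_ms):
    """Average I/O latency for a write-through read cache under a simple model."""
    # Reads: hits are served from Flash, misses go to the array.
    read_ms = hit_rate * flash_ms + (1.0 - hit_rate) * array_ms
    # Writes: always complete at array speed in a write-through design.
    write_ms = array_ms
    return read_fraction * read_ms + (1.0 - read_fraction) * write_ms


FLASH_MS = 0.1   # hypothetical Flash read latency
ARRAY_MS = 5.0   # hypothetical array latency

for read_fraction in (0.5, 0.7, 0.9):
    latency = mean_latency_ms(read_fraction, 0.8, FLASH_MS, ARRAY_MS)
    print(f"read fraction {read_fraction:.0%}: {latency:.3f} ms")
```

In this model the average latency falls as either the read fraction or the hit rate rises, which matches the guidance on read/write ratio and working set size.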
Throughput versus latency
There are some applications that can push the storage environment to the limit to
provide as many IOPS as possible. Using VFCache in those application environments
will show very high IOPS at very low response times. However, there are also
applications that do not require very high IOPS, but they require very low response
times.
You can see the benefit of using VFCache in these application environments. Even
though the application issues relatively few I/Os, whenever the I/Os are issued, they
will be serviced with a very low response time. For example, a web application may
not have a lot of activity in general, but whenever a user issues a request, the
response will be very quick.
Other bottlenecks in the environment
VFCache helps improve throughput and reduce latencies for the applications.
However, any drastic improvement in application throughput may expose new
underlying performance bottlenecks and/or anomalies. Addressing these may include
application tuning, such as increasing buffer cache sizes or other changes that
increase concurrency. For example, in a typical customer deployment, a Microsoft SQL
Server administrator should not enable VFCache on the log files. An inefficient
storage layout design of the log files may be exposed as a bottleneck when VFCache
improves the throughput and latency of the SQL Server database.
Write performance dependent on back-end array
VFCache provides acceleration to read I/Os from the application. Any write operations
that the application issues still happen at the best speed that the back-end storage
array can provide. At a fixed read/write ratio from an application, this tends to limit
the net potential increase in read throughput. For example, if the storage array is
overloaded and is processing write operations at a very slow rate, VFCache will not be
able to accelerate additional application reads.
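The ceiling that a fixed read/write ratio places on read throughput can be worked out directly. The sketch below assumes that the application keeps its read/write mix constant and that the array's sustainable write rate is the only limit. The function name and the 3,000 write IOPS figure are hypothetical.

```python
def max_read_iops(array_write_iops, read_fraction):
    """Upper bound on read IOPS when writes are capped by the array and the mix is fixed."""
    if not 0.0 <= read_fraction < 1.0:
        raise ValueError("read_fraction must be in [0, 1)")
    write_fraction = 1.0 - read_fraction
    # Total IOPS cannot exceed the point at which the write share saturates the array.
    total_iops = array_write_iops / write_fraction
    return total_iops * read_fraction


# Hypothetical array that sustains 3,000 write IOPS under a 70/30 read/write mix.
print(max_read_iops(3_000, 0.7))
# If the array slows to 1,500 write IOPS, the read ceiling drops with it.
print(max_read_iops(1_500, 0.7))
```

Under these assumptions, halving the array's write rate halves the read throughput that the application can reach, regardless of how fast the cache serves read hits.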
Once VFCache has been enabled on a particular source volume, every I/O from the
application needs to access the VFCache card, whether it is a read or a write
operation. In most cases, the processing capability of VFCache will be much greater
than what the storage array can provide, so VFCache will not be a performance
bottleneck in the data path. However, if a very large number of disks on the storage
array are dedicated to a single host application, and they are fully utilized in terms of
IOPS, the throughput that the storage array could provide without VFCache might be
more than what VFCache can process. In this scenario, VFCache may provide minimal
performance benefit to the application.
Usage guidelines and characteristics
This section provides some of the usage guidelines and salient features of VFCache.
Since VFCache does not store any data that has not already been written on
the storage array, the application data is protected and is persisted on the
storage array if anything happens to VFCache on the server. However, the
cache would need to be warmed up again after the server starts up.
In a physical environment, you can enable or disable VFCache at the source
volume level or LUN level. In a virtual environment, the VFCache capacity
needs to be partitioned for individual virtual machines, as applicable. This
allocated cache capacity inside the virtual machine can then be configured at
vDisk-level granularity. The minimum size for the cache vDisk is 20 GB.
There is no hard limit on the maximum number of server volumes on which
VFCache can be enabled. However, if you enable it on a very large number of
volumes, that may create resource starvation for those volumes that could
actually benefit from VFCache. EMC recommends that VFCache not be enabled
for those volumes that are least likely to gain any performance benefit from
VFCache. This allows other volumes that are a good fit for VFCache to get the
maximum processing and cache capacity resources.
PowerPath optimizes the use of multiple data paths between supported
servers and storage systems, providing a performance boost by doing load
balancing between the paths. VFCache improves the application performance
even further by helping to move the most frequently accessed data closer to
the application by using PCIe Flash technology for write-through caching.
PowerPath and VFCache are complementary EMC products for scaling mission-
critical applications in virtual and physical environments, including cloud
deployments. Additionally, since VFCache sits above the multipathing
software in the I/O stack, it can work with any multipathing solution on the
market. PowerPath and VFCache are purchased separately.
VFCache is complementary to FAST VP and FAST Cache features on the storage
array. However, it is not required to have FAST VP or FAST Cache on the storage
array to use VFCache.
VFCache only accelerates read operations from the application. The write
operations will be limited by the speed with which the back-end array can
process the writes.
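The data-protection property described in these guidelines follows from the write-through design. The sketch below is a generic write-through cache, not the VFCache driver: the class and method names are hypothetical, dictionaries stand in for the storage array and the server Flash, there is no eviction, and updating the cached copy on a write is an assumption.

```python
class WriteThroughCache:
    """Generic write-through read cache (illustrative only)."""

    def __init__(self, backend):
        self.backend = backend  # dict standing in for the storage array
        self.cache = {}         # dict standing in for server Flash

    def read(self, block):
        if block in self.cache:
            return self.cache[block]   # read hit: served from Flash
        data = self.backend[block]     # read miss: fetch from the array
        self.cache[block] = data       # promote (simplified: always on first miss)
        return data

    def write(self, block, data):
        self.backend[block] = data     # the array always receives the write
        self.cache[block] = data       # keep the cached copy consistent (assumption)

    def lose_cache(self):
        """Model a server restart or card failure: cached contents are gone."""
        self.cache.clear()


array = {0: b"old"}
cache = WriteThroughCache(array)
cache.write(0, b"new")
cache.lose_cache()
print(cache.read(0))  # data comes back from the array; the cache must warm up again
```

Because every write reaches the array before the cache is considered current, losing the cache costs only warm-up time, not data.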
Specifications
The cache page size that is used internally in VFCache is 8 KB, but it will work
seamlessly with applications where the predominant I/O size is other than 8
KB. The cache page size is fixed and is not customizable.
VFCache needs to be installed in PCIe Gen2, x8 slots in the server. It can also be
installed in x16 PCIe slots, but VFCache will use only 8 lanes.
Similarly, if it is installed in an x4 PCIe slot in the server, VFCache will perform
sub-optimally.
VFCache cards are available in 300 GB capacity.
Only one VFCache card can be used per server.
VFCache supports the following connection protocols between the server and
the storage array:
o 4 Gb/s Fibre Channel
o 8 Gb/s Fibre Channel
VFCache is compliant with the Trade Agreements Act (TAA). The following main
requirements are certified as not applicable to VFCache:
o FIPS 140-2
o Common Criteria
o Platform Hardening
o Research Remote Access
Constraints
VFCache does not provide connectivity between the server and the SAN storage
array. You still need to use an HBA card to connect to the back-end storage array
where the data is eventually stored.
VFCache is not supported on blade servers. Blade servers require a customized
version of the card. For the most current list of supported operating systems and
servers, refer to E-Lab Interoperability Navigator.
VFCache is currently not supported in shared-disk environments or active/active
clusters. However, shared disk clusters in VMware environments are supported
since VFCache is implemented at the virtual machine level rather than the ESX
server level.
By default, the VFCache driver intercepts I/Os of up to 64 KB.
Any I/O larger than 64 KB will not be intercepted by VFCache. Applications with
larger I/O sizes are typically bandwidth sensitive and have sequential workloads,
which would not benefit from a caching solution like VFCache.
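The 64 KB interception limit and the 8 KB cache page size from the Specifications section can be combined into a short worked example. The sketch below assumes that cache pages are aligned to 8 KB boundaries on the source volume and that 1 KB is 1024 bytes. The function names are hypothetical and do not correspond to any VFCache interface.

```python
PAGE_SIZE = 8 * 1024        # cache page size from the Specifications section
MAX_INTERCEPT = 64 * 1024   # default maximum I/O size that is intercepted


def is_intercepted(io_bytes, max_bytes=MAX_INTERCEPT):
    """I/Os larger than the limit bypass the cache."""
    return io_bytes <= max_bytes


def pages_touched(offset, length, page_size=PAGE_SIZE):
    """Indices of the cache pages overlapped by the byte range [offset, offset + length)."""
    if length <= 0:
        return range(0)
    first = offset // page_size
    last = (offset + length - 1) // page_size
    return range(first, last + 1)


print(is_intercepted(64 * 1024))          # at the limit
print(is_intercepted(128 * 1024))         # above the limit
print(list(pages_touched(4096, 8192)))    # an unaligned 8 KB I/O spans two pages
print(len(pages_touched(0, 64 * 1024)))   # pages covered by an aligned 64 KB I/O
```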
Stale data
Stale data because of storage array snapshots
If any operations modify the application data without the knowledge of the server,
it is possible to have stale data in VFCache. For example, if a LUN snapshot were
taken on the array and later used to roll back changes on the source LUN, the
server would have no knowledge of any changes that had been done on the array,
which would result in VFCache having stale data that had not been updated with
the contents from the snapshot. As a workaround in this case, you need to
manually stop and restart the VFCache software driver for the source volume.
Note The whole cache device does not need to be stopped, only the source
volume on which the snapshot operations are being done needs to be
stopped. When you restart the VFCache software driver on the source
volume, a new source ID is automatically generated for that source
volume, which invalidates the old VFCache contents for the source volume
and starts caching the new data from the snapshot. The application then
gets access to new data from the snapshot.
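The invalidation behavior described in the note can be modeled by keying cached pages on the source ID. The sketch below illustrates the idea only and is not the VFCache implementation. The class name, the ID counter, and the dictionary-based cache are hypothetical.

```python
import itertools


class SourceVolumeCache:
    """Cache entries are keyed by (source_id, page); a new ID orphans old entries."""

    _ids = itertools.count(1)  # hypothetical source ID generator

    def __init__(self):
        self.pages = {}
        self.source_id = next(self._ids)

    def put(self, page, data):
        self.pages[(self.source_id, page)] = data

    def get(self, page):
        # Only entries written under the current source ID are visible.
        return self.pages.get((self.source_id, page))

    def restart(self):
        """Model stopping and restarting caching on this source volume."""
        self.source_id = next(self._ids)


volume = SourceVolumeCache()
volume.put(7, b"before rollback")
print(volume.get(7))   # cached copy is visible
volume.restart()       # array-side rollback happened; restart caching on the volume
print(volume.get(7))   # old copy is no longer returned; the next read goes to the array
```

In this model the old entries are not erased immediately. They simply become unreachable under the new source ID, so the cache starts filling again with data read from the array.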
Stale data in VMware environments
If you use VMware, you should be careful when the VMware snapshot feature is
being used. VFCache metadata is kept in the virtual machine memory; therefore, it
will be a part of the virtual machine snapshot image when a virtual machine
snapshot is taken. This means that when this snapshot image is used to roll back
the virtual machine, the old metadata is restored and potentially causes data
corruption.
You must purge the VFCache before the virtual machine suspend and resume
operations. This is handled using scripts that are automatically installed when the
VFCache Agent is installed in the virtual machine. These scripts are automatically
invoked when these virtual machine operations are run. In Windows
environments, you should take care to ensure that other programs or installations
in the virtual machine do not change the default suspend/resume scripts in such
a way that the VFCache scripts are not executed on those events. VFCache can
also be purged manually before suspend and resume operations in the virtual
machine, if needed.
Application use case and performance
VFCache helps you boost the performance of your latency- and response-time-sensitive
applications, typically database applications (such as Oracle, SQL Server, and DB2),
OLTP applications, web applications, and financial
trading applications. VFCache is not suitable for more write-intensive or sequential
applications such as data warehousing, streaming media, or Big Data applications.
Use cases are shown in Figure 9.
Figure 9: VFCache Use Cases
The horizontal axis represents a typical read/write ratio of an application workload.
The left side represents write-heavy applications such as backups. The right side
represents read-heavy applications such as reporting tools.
The vertical axis represents the locality of reference or skew of the application's
workload. The lower end represents applications that have very low locality of
reference, and the top side represents applications where a majority of the I/Os go to
a very small set of data.
You will achieve the greatest results with VFCache in high-read applications and
applications with a highly concentrated skew of data.
Test results
EMC conducted application-specific tests with VFCache to determine potential
performance benefits when this product is used. Here is a summary of the VFCache
benefits with a couple of applications:
SQL Server
With a TPC-E-like workload in a 750 GB Microsoft SQL Server 2008 R2
environment, the number of transactions increased three times and the latency
was reduced by 87 percent when VFCache was introduced in the configuration.
Oracle
With a TPC-C-like workload in a 1.2 TB Oracle 11g R2 physical environment,
the number of transactions increased three times and the latency was reduced
by 50 percent when VFCache was introduced in the configuration. The test
workload had 70 percent reads and 30 percent writes.
In a VMware setup with 1.2 TB Oracle Database 11g R2 and TPC-C-like
workload, the number of transactions increased by 80 percent when VFCache
was introduced in the configuration. The test workload had 70 percent reads
and 30 percent writes.
For more information on application-specific guidelines and test results, refer to the
list of white papers provided in the References section.
Conclusion
There are multiple ways in which Flash technology can be used in a customer
environment today, for example, Flash in the server or the storage array, Flash used
as a cache or a tier. The key, however, is the software that brings all of this together,
using different technologies at the right place and time for the right price.
VFCache uses EMC FAST technology in the storage array and FAST in the server to
provide this benefit most appropriately, as simply and as easily as possible.
VFCache dramatically accelerates the performance of read-intensive applications.
VFCache software caches the most frequently used data on the server-based PCIe
card, which puts the data closer to the application. It extends FAST technology
into the server by ensuring that the right data is placed in the right storage at the
right time.
The intelligent caching algorithms in VFCache promote the most frequently
referenced data into the PCIe server Flash to provide the best possible
performance and latency to the application.
VFCache provides you with the flexibility to use the same PCIe device as a caching
solution as well as a storage solution for temporary data.
VFCache suits many but not all customer environments, and it is important that you
understand the application workload characteristics properly when choosing and
using VFCache.
VFCache protects data by using a write-through algorithm, which means that writes
persist to the back-end storage array. While other vendors promise the performance
of PCIe Flash technology, EMC VFCache provides this performance with protection.
References
The following documents are available on Powerlink:
EMC VFCache Data sheet
VFCache Installation and Administration Guide for Windows and Linux
VFCache Release Notes for Windows and Linux
VFCache Installation Guide for VMware
VFCache Release Notes for VMware
VFCache VMware Plug-in Administration Guide
Considerations for Choosing SLC versus MLC Flash
EMC VFCache Accelerates Oracle - EMC VFCache, EMC Symmetrix VMAX and
VMAXe, Oracle Database 11g
EMC VFCache Accelerates Virtualized Oracle - EMC VFCache, EMC Symmetrix VMAX
and VMAXe, VMware vSphere, Oracle Database 11g
EMC VFCache Accelerates Oracle - EMC VFCache, EMC VNX, EMC FAST Suite,
Oracle Database 11g
EMC VFCache Accelerates Microsoft SQL Server - EMC VFCache, EMC VNX,
Microsoft SQL Server 2008
The following Demartek analyst report and video are available on EMC.com:
EMC VFCache Flash Caching Solution Evaluation