CR-2-496 - NetApp Flash Pool Deep Dive


CR-2-496

NetApp Flash Pool Deep Dive
Jay White
Agenda
 Overview

 How it works

 Design considerations

 Sizing and analysis

 Summary

Overview

NetApp VST Overview

Virtual Storage Tier: Data-driven • Real-time • Self-managing

2009: Flash Cache Introduced - System Cache • Plug & Play • Very Successful
2012: Flash Pool Introduced - Aggregate Cache • Broaden Usage • Entry Systems
2012: Flash Accel Announced - Server Cache Software • Intelligent • Vendor Agnostic

NetApp VST Strategy

End-to-end flash options, including the Server Cache Partner Program

Server: Flash Accel
− Application and server performance acceleration
− Coherency and integration with Data ONTAP

Storage: Flash Cache
− System level read cache for all aggregates/volumes
− Plug and play with no administration overhead

Storage: Flash Pool
− Aggregate level R/W cache with per volume policies
− Persistent cache availability across failover events

Flash Pool Overview
Flash Pool adds an SSD cache to an existing HDD aggregate, which provides
 Offload of expensive HDD operations into the SSD cache to absorb peaks in workloads
 A persistent cache across failover events, allowing the SSD cache to be immediately available (no rewarming)
 Reduced HDD spindle count while achieving the same performance at a lower total configuration cost

Flash Pool Overview
Flash Pool does not …
 Accelerate write operations – Data ONTAP is already write optimized!
 Reduce or alleviate high CPU or memory utilization
 Cache sequential operations (read or write) or large block writes (>16KB)
 Increase maximum IOPS or throughput limits for a system

Data ONTAP Requirements
Software requirements
 Data ONTAP 8.1.1 or later operating in 7-Mode or Cluster-Mode
 HDD-based 64-bit aggregate
 Aggregate state must be healthy (cannot be FAILED, LIMBO, offline, or in a foreign state)
 Can be a RAID-DP, RAID 4, or SyncMirror aggregate
 Root aggregate support requires a PVR (8.1.1 only)
 No license is required – it’s free!
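As a quick illustration, these requirements can be checked from the 7-Mode CLI before enabling Flash Pool. This is a minimal sketch; aggr1 is a hypothetical aggregate name:

aggr status -v aggr1

The status output should show a 64-bit block format, an online/normal state, and a raid_dp, raid4, or SyncMirror configuration.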

Platform Support Requirements
Supported platforms
 Entry-level:
− FAS2220, FAS2240-2 and FAS2240-4
 Midrange:
− FAS/V3160, FAS/V3170, FAS/V3240, and FAS/V3270
 High-end:
− FAS/V6030, FAS/V6040, FAS/V6070, FAS/V6080,
FAS/V6210, FAS/V6240, and FAS/V6280

Storage Requirements
Supported disk shelves
 DS14mk4, DS2246, DS424X and DS4486
− Recommended stack depth <= 4 shelves
 V-Series must use only NetApp SSD/HDD
 Supported drive types:
− SAS, SATA (SATA, BSAS and MSATA)
− FC (DS14mk4 FC only, no MetroCluster FC + Flash Pool)
 Supported SSD:
− X441A-R5 100GB SLC SSD

How It Works

SSD Cache
 All SSD data drives in the aggregate provide cache capacity accessible by Flash Pool
− Individual read and write cache capacity is variable based on the cache policies set and the actual cache usage pattern
 Some of the available cache capacity is used to store Flash Pool metadata and reserves that are used to facilitate consistent long term write performance

Read Caching
1. The first read request goes to HDD – the block is brought into memory and sent to the requestor.
2. When the read is evicted from memory, if it matches the insertion policy (random read), it is copied into the SSD cache.
3. Any additional requests for the same block are serviced from the SSD cache – the block is copied back into system memory and sent to the requestor.

[Diagram: read path between HDD, system memory, and the SSD cache]
Block in HDD = actual block
Block in memory = copy of actual block
Block in SSD cache = copy of actual block

Overwrite Caching
1. The first random write is sent to HDD in a CP. All sequential writes are sent to HDD.
2. An overwrite of the same random block arrives in memory – if it matches the insertion policy (small block random overwrite), it is sent to the SSD cache in a CP.
3. The actual block now resides in SSD (the block on HDD is invalid) – it will eventually be de-staged to HDD when evicted from the SSD cache.

[Diagram: overwrite path between NVRAM, system memory, the SSD cache, and HDD]
Block in HDD = actual block
Block in memory = actual block
Block in SSD cache = actual block

Eviction Scanner
Purpose of the eviction scanner
 Runs in order to evict cold blocks to make room for new blocks that are being inserted, starting when:
− The cache is 75%+ used
− The trending usage is expected to exceed 75% within the next hour
 Each scanner pass demotes a block, eventually to the point that the block is evicted from cache
− It takes multiple scanner passes to result in an eviction, especially if the block is being accessed between scanner passes
Read Cache Management
 Read-cached blocks move through five states: Hot → Warm → Neutral → Cold → Evict*
 New blocks are inserted at the Neutral state
 Reading a block promotes it one state toward Hot
 Each eviction scanner pass demotes a block one state toward Evict

*Evicted blocks are overwritten with new blocks

Write Cache Management
 Write-cached blocks move through three states: Neutral → Cold → Evict*
 New blocks are inserted at the Neutral state
 An overwrite of a cached block promotes it back toward Neutral
 Each eviction scanner pass demotes a block one state toward Evict

*Evicted blocks are moved to HDD

Cache Policies
Cache policies can be modified for each volume that resides in the Flash Pool
 The “priority” command is used to modify volume cache policies
priority hybrid-cache set <vol_name> <read|write>-cache=<policy>
 Access the command in the node shell for Data ONTAP operating in Cluster-Mode
 Volumes with caching disabled continue to be excluded from Flash Cache

Cache Policies
Read cache policies
 none: disable read caching
 random-read: (default) cache randomly read data as determined by Data ONTAP
 random-read-write: cache a copy in the read cache of all blocks that are inserted into the write cache
 meta: all metadata blocks of the volumes in the aggregate plus directory blocks will be cached

Cache Policies
Write cache policies
 none: disable write caching
 random-write: (default) cache small block random overwrites as determined by Data ONTAP
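For illustration, the following node shell commands apply these policies one at a time using the syntax shown earlier; vol_db and vol_logs are hypothetical volume names:

priority hybrid-cache set vol_db read-cache=random-read-write
priority hybrid-cache set vol_db write-cache=random-write
priority hybrid-cache set vol_logs read-cache=none
priority hybrid-cache set vol_logs write-cache=none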

Design Considerations

Flash Pool Capacity Limits
 The sum of the usable capacity of all SSD data drives from each Flash Pool on a system counts toward the cache limit
 SSD cache does not count toward the maximum aggregate capacity, but does count toward the system’s maximum spindle limit
 The per-node cache limit cannot be exceeded
− HA pair limits are listed to indicate that the remaining node in a failover can serve the full cache capacity of both nodes

High-end Platform Limits (DOT 8.1.1)

Platform   Per Controller Maximum Cache   Per HA Pair Maximum Cache   Number of 100GB SSD Data Drives
FAS6280    6.0TB                          12.0TB                      66 per controller
FAS6240    6.0TB                          12.0TB                      66 per controller
FAS6210    2.0TB                          4.0TB                       22 per controller
FAS6080    2.0TB                          4.0TB                       22 per controller
FAS6070    2.0TB                          4.0TB                       22 per controller
FAS6040    1.0TB                          2.0TB                       11 per controller
FAS6030    1.0TB                          2.0TB                       11 per controller

Midrange/Entry Platform Limits (DOT 8.1.1)

Platform    Per Controller Maximum Cache   Per HA Pair Maximum Cache   Number of 100GB SSD Data Drives
FAS3270     1.0TB                          2.0TB                       11 per controller
FAS3240     0.5TB                          1.0TB                       5 per controller
FAS3170     1.0TB                          2.0TB                       11 per controller
FAS3160     0.5TB                          1.0TB                       5 per controller
FAS2240-2   300GB total (per HA pair)                                  3 per HA pair
FAS2240-4   300GB total (per HA pair)                                  3 per HA pair
FAS2220     300GB total (per HA pair)                                  3 per HA pair

*FAS/V3210, FAS/V3140 and FAS/V3070 are not supported with Flash Pool

Minimum SSDs
A recommended minimum number of SSDs should be used in a Flash Pool to make sure the SSDs do not become a bottleneck

System Family         Min # of SSD Data Drives   Total Min # of SSDs (Data + Parity + Spare)
Entry-level Systems   1                          RAID-DP: 1+2+0 (3) / RAID 4: 1+1+1 (3)
Midrange Systems      3                          RAID-DP: 3+2+1 (6) / RAID 4: 3+1+1 (5)
High-end Systems      9                          RAID-DP: 9+2+1 (12) / RAID 4: 9+1+1 (11)

Flash Cache and Flash Pool
Flash Cache and Flash Pool can coexist on the same system
 Any aggregate containing SSDs (Flash Pool or homogeneous SSD) is excluded from Flash Cache (including volumes with Flash Pool caching disabled)
 Both products have unique attributes that should be considered based on your workload requirements
 The combined Flash Cache and Flash Pool (data drives) capacity counts toward the maximum cache capacity per node/controller

Flash Cache, Flash Pool or Both?

Flash Cache (PCIe)                            Flash Pool (SSD + HDD)
System level cache                            Aggregate level cache
Caches reads for all (non-flash) volumes      Caches reads and random overwrites
Does not count towards spindle limits         Per volume cache policies
Plug and play deployment and administration   Persistence across planned and unplanned failover events

Snapshot Copies
 Snapshots and read cache
− Read cache blocks are copies of the actual blocks that are on the HDD
− Volume snapshots only lock the block on HDD
 Snapshots and write cache
− Write cache blocks are the actual block (the HDD block is invalid)
− Volume snapshots lock the virtual block, which allows the physical block to change when the block is evicted
− Aggregate snapshots (not recommended) lock the physical block (pinning blocks in the cache)

Storage Efficiency
 Deduplication
− Blocks that are deduplicated on HDD are cached as deduplicated blocks
 Clones
− Cloned blocks are cached in the SSD cache
 Compression
− Compressed blocks are not cached
− Blocks in compressed volumes that are not compressed can still be cached
POC Recommendations
 Workload recommendations
− Write block size should be 16KB or smaller and contain a high percentage of random writes
− Read block size can be any size and contain a high percentage of random reads
 System recommendations
− 20% CPU and memory headroom prior to deployment
− If the working set size fits completely in the cache, you can look to reduce spindle count or use slower drives
− If you are unsure, or the working set size is larger than the cache capacity, do not change media type or reduce spindle count by more than 30%, unless you are willing to experience reduced response times/throughput on cache misses

Sizing and Analysis

Sizing Flash Pool Configurations
 It is imperative that the working set size fits into the available Flash Pool caching capacity
− Reductions in HDD drive count are based on the working set fitting into the cache capacity
− If you overrun the cache capacity, you become more dependent on the HDD configuration
− While the cache is initially warming, the configuration will be dependent on the HDD
 Flash Pool cache capacity = (SSD data drives * usable SSD capacity) * 0.75
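As a worked illustration only, assuming roughly 85GB of usable capacity per 100GB SSD data drive (the exact right-sized figure depends on the drive), a Flash Pool with 9 SSD data drives provides approximately 9 × 85GB × 0.75 ≈ 574GB of effective caching capacity.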

Sizing Flash Pool Configurations
System Performance Modeler (SPM)
 It is critical to understand the workload characteristics
− Read to write mix
− Random to sequential mix
− Transfer size
− Working set size
− Rate of change in the working set
 Understanding these factors will aid in determining per volume cache policies, cache size, and likelihood of encountering small block random overwrites
Predictive Cache Statistics
PCS cannot be used if Flash Cache or Flash Pool already exist on the system
 Flash Pool caching estimation
− Assume +10% for the actual Flash Pool % replaced
− Set the PCS cache size to 75% of the actual SSD cache (data drives) being implemented
− In workloads with heavy random overwrites, PCS is less effective because it only accounts for reads; adjust the results down by 10-20%
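As a sketch of how PCS is typically enabled in 7-Mode (the flexscale option names and size syntax are assumptions to verify against your release documentation), for a planned cache of six 100GB SSD data drives at roughly 85GB usable each (~510GB, of which 75% ≈ 380GB):

options flexscale.enable pcs
options flexscale.pcs_size 380GB
stats show -p flexscale-access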

Monitoring Performance
 A new preset for Flash Pool exists for the “stats show” command
− “stats show -p hybrid_aggr”
 Debug level command
 Displays statistics specific to Flash Pool
 Detailed information is available in TR-4070
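For example, a minimal 7-Mode sketch (the slide notes this is a debug level command; the exact privilege level required is an assumption):

priv set advanced
stats show -p hybrid_aggr
priv set admin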

Summary

Key Takeaways
 Size the SSD cache capacity to fit the working set size
 Flash Pool is most effective in highly random environments containing small block operations
 Use System Performance Modeler (SPM) to size Flash Pool configurations
 Data ONTAP is optimized for writes – Flash Pool does not accelerate writes or cache all random write operations!
Resources
The following resources are available on the Field Portal

 Flash Pool Technical FAQ

 NetApp Flash Pool SE Technical Presentation

 TR-4070: Flash Pool Design and Implementation Guide

 Flash Pool Datasheet

 PB: Pure SSD and Flash Pool with DS424X disk shelf

© 2012 NetApp, Inc. All rights reserved. No portions of this document may be reproduced
without prior written consent of NetApp, Inc. Specifications are subject to change without notice.
NetApp, the NetApp logo, and Go further, faster, are trademarks or registered trademarks of
NetApp, Inc. in the United States and/or other countries. All other brands or products are
trademarks or registered trademarks of their respective holders and should be treated as such.
Additional Slides

Configuration
 Creating a Flash Pool
− The HDD 64-bit aggregate must already exist
− Two-step process (see the sketch below):
1. Set the aggregate option for Flash Pool
– 7-Mode: aggr options <aggr_name> hybrid_enabled on
– Cluster-Mode: storage aggregate modify -aggregate <aggr_name> -hybrid-enabled true
2. Add SSDs into a new RAID group
 SSD RAID groups cannot be removed once added (you must destroy the aggregate to repurpose the drives)
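Putting both steps together, a minimal 7-Mode sketch might look like the following; aggr1 is a hypothetical aggregate name and the aggr add disk-type/count arguments (-T SSD 6) are assumptions to confirm in the aggr man page for your release:

aggr options aggr1 hybrid_enabled on
aggr add aggr1 -T SSD 6

A roughly equivalent Cluster-Mode sketch (again, the add-disks parameters are assumptions):

storage aggregate modify -aggregate aggr1 -hybrid-enabled true
storage aggregate add-disks -aggregate aggr1 -disktype SSD -diskcount 6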
Expanding SSD RGs
 A recommended minimum number of SSDs should be used to expand any existing Flash Pool SSD RAID group(s)

System Family         Growth Increment (SSD Data Drives)
Entry-level Systems   1
Midrange Systems      3
High-end Systems      6
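As an illustration of growing an existing SSD RAID group in 7-Mode, the following is a sketch only; aggr1 and rg1 are hypothetical names, and the -g and -T arguments are assumptions to confirm in the aggr man page:

aggr add aggr1 -g rg1 -T SSD 3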
