PowerScale Concepts-SSP
Participant Guide
Prerequisite Skills
To get familiar with the content and successfully complete this course, a student
must have a suitable knowledge base or skill set. The student must be familiar with:
• Networking fundamentals such as TCP/IP, DNS, and routing.
• An introduction to storage, such as NAS and SAN, their differences, and basic
storage principles and features.
Storage Evolution
The following tabs show the evolution of data storage over the past several years.
The first general-purpose system became operational in 1946. It was called the
Electronic Numerical Integrator and Computer (ENIAC). ENIAC used more than
17,000 vacuum tubes and 70,000 resistors to hold a 10-digit decimal number in its
memory. The data was output as punch cards, a format that IBM continued to use
until the early 1960s.
In the 1960s, magnetic tapes eclipsed punch cards as the way to store corporate
system data.
During the mid-1960s, magnetic tapes gave way to the hard disk drive. The first
hard drive was the size of two refrigerators, required fifty disks to store less than
four MB of data, and was called the IBM 350 RAMAC.
Less than 30 years after two refrigerator-sized units stored less than four MB, the
average consumer could store about one-third of that amount on a three-and-a-
half-inch plastic disk.
During the 1980s, the personal computer revolution introduced miniaturization. This
revolution brought a wide array of storage form factors.
The USB (Universal Serial Bus) was introduced and today, a commercial USB flash
drive can store as much as 1 TB.
During the data storage evolution, the two types of data that developed were
structured data and unstructured data1. PowerScale specializes in storing
unstructured data.
• Structured data examples: records or files, census records, economic catalogs,
phone directories, customer contact records, and spreadsheets.
• Unstructured data examples: photos, documents, and presentations.
The following table shows the difference between block-based data and file-based
data.
• Block-based data: best for high I/O and low latency.
• File-based data: too large for database apps and high I/O.
Digital Transformation
With unstructured data being the majority of data storage growth, a solution was
needed.
An International Data Corporation (IDC) study that was published in 2018 showed
that the amount of digital data created, captured, and replicated worldwide grew
exponentially. This finding was based on the proliferation of then-new technologies
such as voice over IP, RFID, smart phones, and consumer use of GPS. Also, the
continuance of data generators such as digital cameras, HD TV broadcasts, digital
games, ATMs, email, video conferencing, and medical imaging added to data
growth.
PowerScale clusters are a NAS solution. There are two types of NAS architectures:
scale-up and scale-out.
Scale-Up
• A scale-up solution5 has controllers that connect to trays of disks and provide
computational throughput.
• Traditional NAS is great for specific types of workflows, especially those
applications that require block-level access.
Scale-Out
5The two controllers can run active/active or active-passive. For more capacity,
add another disk array. Each of these components is added individually. As more
systems are added, NAS sprawl becomes an issue.
• With a clustered NAS solution, or scale-out architecture, all the NAS boxes, or
PowerScale nodes, belong to a unified cluster with a single point of
management.
• In a scale-out solution6, the computational throughput, disks, disk protection,
and management are combined and exist for a single cluster.
6Not all clustered NAS solutions are the same. Some vendors overlay a
management interface across multiple independent NAS boxes. This gives a
unified management interface but does not unify the file system. While this
approach does ease the management overhead of traditional NAS, it still does not
scale well.
Scale-out NAS
Scale-out NAS7 is now a mainstay in most data center environments. The next
wave of scale-out NAS innovation has enterprises embracing the value8 of NAS
and adopting it as the core of their infrastructure.
8 Enterprises have a low-tolerance attitude toward data loss and data-unavailable
situations and expect features that simplify management. Organizations see
massive scale and performance with smaller data center rack footprints than
performance-centric workloads typically drive.
DAS and RAID
• In the early days of system data, corporations stored data on hard drives in a
server. The intellectual property of the company depended entirely on the
continuous functionality of the hard drive. Thus, to minimize risk, corporations
mirrored the data using a Redundant Array of Inexpensive Disks (RAID)
controller.
• RAID disks were directly attached to a server so that the server thought the
hard drives were part of it. This technique is called Direct Attached Storage
(DAS).
SAN
• As applications proliferated, soon there were many servers, each with its own
DAS. This worked fine, with some drawbacks.
• If one server's DAS was full while another server's DAS was empty, the empty
DAS could not share its space with the full DAS. Due to this limitation of DAS,
the SAN was introduced, which effectively used a volume manager and RAID.
NAS
9 PCs work differently from the storage file server over the network: NAS
communicates from one file system to another file system.
10 The volume manager does not know how the data is protected because that is
handled by the RAID controller.
Big data refers to the huge volume of digital information generated by various
businesses. It is a collection of data at an enterprise scale, an amount of data that
exceeds a petabyte (one million gigabytes).
The "three Vs" - volume, velocity, and variety - often arrive together. When they
combine, administrators truly feel the need for high-performance, higher-capacity
storage. The three Vs generate the challenges of managing big data.
Growing data has also forced an evolution in storage architecture over the years
due to the amount of maintained data. PowerScale is a big data solution because it
can handle the volume, velocity, and variety that defines the fundamentals of big
data.
1: Challenge: Nonflexible data protection. When you have big data volumes of
information to store, it had better be there, dependably. If an organization relies on
RAID to protect against data loss or corruption, the failure of a single disk drive
causes a disproportionate inconvenience. The most popular RAID implementation
scheme allows the failure of only two drives before data loss. (A sizable, big data
installation easily has more than 1000 individual hard drives, so odds are at least
one drive is down at any time.) The simpler answer is to protect data using a
different scheme.
2: What is meant by volume? Consider any global website that works at scale. One
example of big data volume is the YouTube press page that says YouTube ingests
100 hours of video every minute.
3: The best example of variety is the migration of the world to social media. On a
platform such as Facebook, people post all kinds of file formats: text, photos, video,
polls, and more. Many kinds of data at that scale represent big data variety.
The table shows the challenges along with their corresponding solutions.
13 Some data storage architectures use two controllers, sometimes called servers
or filers, to run a stack of many hard drives. You can scale capacity by adding more
hard drives, but it is difficult to scale performance. In a given storage stack, the
hard drives offer nothing but capacity. All the intelligence of the system, including
computer processing and RAM, must come from the two filers. If the horsepower of
the two filers becomes insufficient, the architecture does not enable you to add
more filers. You start over with another stack and two more filers. In contrast, every
node in a PowerScale cluster contains capacity plus computing power plus
memory. The nodes work in parallel, so each node you add scales out linearly. In
other words, all aspects of the cluster scale up, including capacity and
performance.
14 Due to the architectural restrictions, SAN and scale-up NAS end up with several
isolated stacks of storage. Many sites have a different storage stack for each
application or department; a backup storage stack is an example. There is no
automated way to rebalance data across the stacks; instead, an administrator has
to manually arrange a data migration. If the R&D stack performs product testing
that generates results at big data velocity, the company may establish an HPC
stack, which could reach capacity rapidly. Other departments or workflows may
have independent storage stacks with extra capacity remaining, but there is no
automated way for R&D to offload their HPC overflow. In contrast, a PowerScale
cluster distributes data across all its nodes to keep them all at equal capacity. You
do not have one node that is overworked while other nodes sit idle. There are no
hot spots, and thus, no manual data migrations. If the goal is to keep pace with big
data velocity, automated balancing makes more sense.
A scale-out data lake is a large storage solution where vast amounts of data from
other solutions or locations are combined into a single store.
PowerScale helps you unlock the structure within your data and address the
challenges of unstructured data management. PowerScale is the next evolution of
OneFS, the operating system powering the scale-out NAS platform.
The PowerScale family includes Isilon nodes and PowerScale nodes, with
PowerScale OneFS running across all of them. The software-defined architecture of
OneFS gives you simplicity at scale, intelligent insights, and the ability to have any
data anywhere it needs to be - at the edge, core, or cloud.
A Data Lake17 is a central data repository that stores data from various sources,
such as file shares, web apps, and the cloud. It enables businesses to access the
same data for various uses and enables the manipulation of data using various
clients, analyzers, and applications.
The data is real-time production data with no need to copy or move it from an
external source, like another Hadoop cluster, into the Data Lake.
17The Data Lake provides tiers that are based on data usage, and the ability to
instantly increase the storage capacity when needed.
CloudPools
The PowerScale CloudPools software enables you to select from various public
cloud services or use a private cloud. CloudPools offers the flexibility of another tier
of storage that is off-premises and off-cluster, and provides a lower TCO18 for
archival-type data.
• Treats cloud storage as another cluster-connected tier.
• Policy-based automated tiering.
• Address rapid data growth and optimize data center storage resources - use
valuable on-site storage resources for active data.
• Send rarely used or accessed data to the cloud.
• Seamless integration with data – retrieves at any time.
• Data remains encrypted in the cloud until retrieval.
• Connect to ECS, another PowerScale cluster, Amazon S3, Virtustream,
Microsoft Azure, Google Cloud, and Alibaba.
• Policies automatically move specified files to the cloud (see the sketch after this
list).
• CloudPools traffic optimization reduces the number of network round trips when
recalling data from a CloudPools target.
• Customers who want to run their own internal clouds can use a PowerScale
installation as the core of their cloud.
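To make the policy-based tiering idea concrete, here is a minimal sketch of how an
age-based archive policy could select candidate files. The 90-day threshold, the
directory path, and the function name are hypothetical; this is not the CloudPools
implementation, only an illustration of the decision that CloudPools automates.

```python
import os
import time

# Hypothetical policy: archive files not accessed in the last 90 days.
ARCHIVE_AFTER_DAYS = 90

def files_to_tier(directory: str, age_days: int = ARCHIVE_AFTER_DAYS):
    """Return paths whose last access time is older than the policy threshold."""
    cutoff = time.time() - age_days * 24 * 60 * 60
    matches = []
    for root, _, files in os.walk(directory):
        for name in files:
            path = os.path.join(root, name)
            if os.path.getatime(path) < cutoff:
                matches.append(path)
    return matches

# Example: list candidate files under a hypothetical directory.
if __name__ == "__main__":
    for path in files_to_tier("/ifs/archive-candidates"):
        print("candidate for cloud tier:", path)
```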
PowerScale Nodes
PowerScale has multiple servers that are called nodes, which combine to create a
cluster. Each cluster behaves as a single, central storage system. PowerScale is
designed for large volumes of unstructured data. The design goal for the
PowerScale nodes is to keep the simplicity of NAS while providing the agility of the
cloud at the cost of commodity hardware.
The PowerScale family has different offerings that are based on the need for
performance and capacity. The modular architecture of these platforms enables
you to scale out compute and capacity separately. OneFS powers all the nodes.
PowerScale Family
The following tabs describe the different offerings that the Gen 6 family provides.
F-Series
The F-series nodes sit at the top of both performance and capacity with all-flash
arrays for ultracompute and high capacity. The all-flash platforms can accomplish
250-300k protocol operations per chassis and get 15 GB/s aggregate read
throughput from the chassis. Even when the cluster scales, the latency remains
predictable.
• F80019
• F81020
• F60021
• F20022
• F90023
19 The F800 is suitable for workflows that require extreme performance and
efficiency.
20 The F810 is suitable for workflows that require extreme performance and
efficiency. The F810 also provides high-speed inline data deduplication and inline
data compression. It delivers up to 3:1 efficiency, depending on your specific
dataset and workload.
21 Ideal for small, remote clusters that require exceptional system performance.
23 Provides the maximum performance of all-NVMe storage in a cost-effective
design. Each F900 node hosts 24 NVMe SSDs and allows you to scale raw storage
capacity from 46 TB to 368 TB per node and up to 93 PB of raw capacity per
cluster. The F900 includes inline compression and deduplication. The minimum
number of PowerScale nodes per cluster is three, while the maximum cluster size is
252 nodes. The F900 is best suited for Media and Entertainment 8 K, genomics,
algorithmic trading, artificial intelligence, machine learning, and HPC workloads.
H-Series
After the F-series nodes, next in terms of computing power are the H-series nodes.
These hybrid storage platforms are highly flexible and strike a balance between
large capacity and high-performance storage to provide support for a broad range
of enterprise file workloads.
• H40026
• H50027
24 1) Digital media: small and medium-size studios
2) Enterprise edge: remote and branch offices along with edge locations that
require high-performance local storage
3) Healthcare, Life Sciences: Genomics sequencing, digital pathology, small
hospitals, clinics
25 1) Digital media: 4 K, 8 K, broadcast, real-time streaming, and post-production.
26 The H400 provides a balance of performance, capacity, and value to support a
wide range of file workloads. It delivers up to 3 GB/s bandwidth per chassis and
provides capacity options ranging from 120 TB to 720 TB per chassis.
27 The H500 is a versatile hybrid platform that delivers up to 5 GB/s bandwidth per
chassis with a capacity ranging from 120 TB to 720 TB per chassis. It is an ideal
choice for organizations looking to consolidate and support a broad range of file
workloads on a single platform.
• H560028
• H60029
• H700
• H7000
• General Use Cases30
A-Series
The A-series nodes have less compute power than the other nodes and are
designed for data archival purposes. The archive platforms can be combined with
new or existing all-flash and hybrid storage systems into a single cluster that
provides an efficient tiered storage solution.
• A200
• A300
• A2000
28 The H5600 combines massive scalability – 960 TB per chassis and up to 8 GB/s
bandwidth in an efficient, highly dense, deep 4U chassis. The H5600 delivers inline
data compression and deduplication. It is designed to support a wide range of
demanding, large-scale file applications and workloads.
29 The H600 is designed to provide high performance at value. It delivers up to
120,000 IOPS and up to 12 GB/s bandwidth per chassis and is ideal for high-
performance computing (HPC) workloads that do not require the extreme
performance of all-flash.
30 1) Digital media: broadcast, real-time streaming, rendering, and post-production.
2) Enterprise file services: home directories, file shares, and group and project
data. 3) Analytics: big data analytics, Hadoop, and Splunk log analytics.
• A3000
• General Use Cases31
Performance nodes
• PowerScale P10032
• PowerScale B10033
32 The P100 performance nodes enable a cluster to be composed of nodes that are
CPU-bound. They provide additional CPU horsepower for compute-bound
applications and additional DRAM that can be used as L1 cache. The P100 nodes
can also be part of a solution that targets a specific workload to meet specific cost
and performance targets. A single P100 node can be added to a cluster, and P100
nodes can be added in single-node increments. The P100 supports inline
compression and deduplication.
33 Provides the ability to back up OneFS powered clusters using the two-way
NDMP protocol. The B100 is delivered in a cost-effective form factor to address the
SLA targets and tape backup needs of a wide variety of workloads. Each node
delivers Fibre Channel ports that can connect directly to a tape subsystem or a
Storage Area Network (SAN). The B100 does not contain any local storage. A
single B100 node can be added to a cluster, and B100 nodes can be added in
single-node increments. The B100 supports inline compression and deduplication.
The Isilon and PowerScale Gen 6 platforms are based on a proprietary architecture
that is designed by Dell Technologies. Gen 6 (old and new) requires a minimum of
four nodes to form a cluster. You must add nodes to the cluster in pairs. The
chassis holds four compute nodes and twenty drive sled slots. Both compute
modules in a node pair power-on immediately when one of the nodes connects to a
power source.
Rear View and Front View of an Isilon Gen 6 or PowerScale Gen 6 chassis.
1: The compute module bays of the two nodes make up one node pair. Scaling out
a cluster with Gen 6 nodes is done by adding more node pairs. You cannot mix
node types in the same node pair.
2: Each Gen 6 node provides two ports for front-end connectivity. The connectivity
options for clients and applications are 10 GbE, 25 GbE, 40 GbE, and 100 GbE.
3: Each node can have 1 or 2 SSDs that are used as L3 cache, global namespace
acceleration (GNA), or other SSD strategies.
4: Each Gen 6 node provides two ports for back-end connectivity. A Gen 6 node
supports 10 GbE, 25 GbE, 40 GbE, and InfiniBand.
5: Power supply unit - Peer node redundancy: When a compute module power
supply failure takes place, the power supply from the peer node temporarily
provides power to both nodes.
6: Each node has five drive sleds. Depending on the length of the chassis and type
of the drive, each node can handle up to 30 drives or as few as 15. A drive sled
must always have the same type of disk drive.
7: You cannot mix 2.5" and 3.5" drive sleds in a node. Disks in a sled are all the
same type.
8: The sled can be either a short sled or a long sled.
9: The chassis comes in two different depths, the normal depth is about 37 inches
and the deep chassis is about 40 inches.
10: Large journals offer flexibility in determining when data should be moved to the
disk. Each node has a dedicated M.2 vault drive for the journal. Each node mirrors
its journal to its peer node. The node writes the journal contents to the vault when a
power loss occurs. A backup battery helps maintain power while data is stored in
the vault.
Graphic shows the front and rear view of an F200 or F600 node pool.
1: Scaling out an F200 or an F600 node pool only requires adding one node.
3: Each F200 and F600 node provides two ports for backend connectivity. The
PCIe slot 1 is used.
4: Redundant power supply units - When a power supply fails, the secondary
power supply in the node provides power. Power is supplied to the system equally
from both PSUs when the Hot Spare feature is disabled. Hot Spare is configured
using the iDRAC settings.
5: Disks in a node are all the same type. Each F200 node has four SAS SSDs.
6: The nodes come in two different 1U models, the F200 and F600. You need
nodes of the same type to form a cluster.
7: The F200 front-end connectivity uses the rack network daughter card (rNDC).
Graphic shows the front and rear view of an F900 node.
1: Drive slots: Enable you to install drives. The F900 supports 24 x 2.5” front-
accessible, hot-plug SSDs that are secured by a removable front bezel.
2: Left control panel: Contains system health and system ID, and status LED.
3: Right control panel: Contains the power button, VGA port, iDRAC Direct
microUSB port, and USB 3.0 ports.
5: Back-end NIC: Two InfiniBand connections or dual port 100G NIC supporting
40G or 100G connection.
10: USB 3.0 port: Enables you to connect USB devices to the system.
11: rNDC: The NIC ports that are integrated on the network daughter card (NDC)
provide front-end network connectivity.
Node Interconnectivity
1: Backend ports int-a and int-b. The int-b port is the upper port. Gen 6 backend
ports are identical for InfiniBand and Ethernet, and cannot be identified by looking
at the node. If Gen 6 nodes are integrated into a Gen 5 or earlier cluster, the
backend uses InfiniBand. Note that there is a procedure to convert an InfiniBand
backend to Ethernet if the cluster no longer has pre-Gen 6 nodes.
2: PowerScale nodes with different backend speeds can connect to the same
backend switch and not see any performance issues. For example, an environment
has a mixed cluster where A200 nodes have 10 GbE backend ports and H600
nodes have 40 GbE backend ports. Both node types can connect to a 40 GbE
switch without affecting the performance of other nodes on the switch. The 40 GbE
switch provides 40 GbE to the H600 nodes and 10 GbE to the A200 nodes.
3: Some nodes, such as archival nodes, may use all of their 10 GbE port
bandwidth, while other workflows might need the full 40 GbE port bandwidth.
Ethernet performance is comparable to InfiniBand, so there should be no
performance bottlenecks with mixed-performance nodes in a single cluster.
Administrators should not see any performance differences when moving from
InfiniBand to Ethernet.
Quick Scalability
A PowerScale cluster can be expanded in about 60 seconds. The primary benefit of
the scale-out NAS approach is how easily the cluster grows: an administrator
expands storage by adding a new node.
In PowerScale, once the node is racked and cabled, adding it to the cluster takes
just a few minutes. That is because OneFS policies automatically discover the
node, set up addresses for the node, incorporate the node into the cluster, and
begin rebalancing capacity across all nodes to take advantage of the new space.
The node fully configures itself, and once it is ready to accept new data, it begins to
take data from the other nodes to autobalance the entire cluster.
Networking Architecture
Network: There are two types of networks that are associated with a cluster:
internal and external.
Clients connect to the cluster using Ethernet connections35 that are available on all
nodes.
The following view shows the complete cluster, combining hardware, software, and
networks:
34 In general, keeping the network configuration simple provides the best results
with the lowest amount of administrative overhead. OneFS offers network
provisioning rules to automate the configuration of additional nodes as clusters
grow.
35 Because each node provides its own Ethernet ports, the amount of network
bandwidth available to the cluster increases as nodes are added.
The external Ethernet layer carries the client protocols: NFS, SMB, S3, HTTP, FTP,
HDFS, and Swift. Backend communication is PowerScale internal.
OneFS supports a single cluster36 on the internal network. This back-end network,
which is configured with redundant switches for high availability, acts as the
backplane for the cluster.37
External Network
The external network provides connectivity for clients over standard file-based
protocols. It supports link aggregation, and network scalability is provided through
software in OneFS. A Gen 6 node has two front-end ports - 10 GigE, 25 GigE, or
40 GigE - and one 1 GigE port for management. Gen 6.5 nodes have two front-end
ports - 10 GigE, 25 GigE, or 100 GigE.
Interconnect
Back-end Network
InfiniBand
Ethernet
Clients can access the cluster using DNS, and the enhanced functionality38
provides the connection distribution policies shown in the graphic. It also provides
continuous availability39 (CA) capabilities.
2: Determines the average CPU utilization on each available network interface and
selects the network interface with lightest processor usage.
3: Selects the next available network interface on a rotating basis. This selection is
the default method. Without a SmartConnect license for advanced settings, this is
the only method available for load balancing.
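The round robin and CPU utilization policies described above can be illustrated with
a small sketch. The interface names and CPU figures are hypothetical; this is not
how the cluster implements connection balancing, only a demonstration of the two
selection rules.

```python
import itertools

# Hypothetical list of node interfaces and their current CPU utilization (percent).
interfaces = [
    {"name": "node1:ext-1", "cpu": 35.0},
    {"name": "node2:ext-1", "cpu": 12.0},
    {"name": "node3:ext-1", "cpu": 78.0},
]

# Round robin: hand out interfaces in rotation (the default policy described above).
_rotation = itertools.cycle(interfaces)

def pick_round_robin():
    return next(_rotation)["name"]

# CPU utilization: pick the interface with the lightest processor usage.
def pick_least_cpu():
    return min(interfaces, key=lambda i: i["cpu"])["name"]

if __name__ == "__main__":
    print([pick_round_robin() for _ in range(4)])  # cycles through the interfaces
    print(pick_least_cpu())                        # node2:ext-1 (lowest CPU)
```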
N + M Data Protection
OneFS writes parity, also called FEC protection, alongside data. In the example
below, OneFS uses the parity (green) to determine the missing pieces: if blue +
yellow = green, a missing piece can be recalculated from the remaining pieces and
the parity.
Forward error correction (FEC) is a technique used for controlling errors in data
transmission at high speeds. With FEC, the source sends redundant error-
correcting code along with the data frame, and the destination uses it to recover the
data without errors.
FEC enables the customer to choose the number of bits of parity to implement.
One bit of parity for many disks is known as N+1; two parity points for many disks
are known as N+2, and so on.
With the N+1 protection, data is 100% available even if a drive or a node fails.
With N+2, N+3, and N+4 protection, data is 100% available if multiple drives or
nodes fail.
The graphic shows a group of nodes (Node 5 through Node 8) with two
simultaneous failures and data remaining available.
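The following sketch illustrates the single-parity (N+1) idea with a simple XOR parity
block: a lost piece is rebuilt from the surviving pieces plus the parity. OneFS
computes FEC protection across nodes and supports multiple parity units (N+2,
N+3, N+4), so treat this only as an illustration of the reconstruction principle, not as
the OneFS implementation.

```python
# Simplified N+1 illustration: one parity block protects a stripe of data blocks.

def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together."""
    result = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            result[i] ^= b
    return bytes(result)

data = [b"blue....", b"yellow..", b"red....."]   # three 8-byte data blocks
parity = xor_blocks(data)                        # the "green" protection block

# Simulate losing one block, then rebuild it from the survivors plus parity.
lost_index = 1
survivors = [blk for i, blk in enumerate(data) if i != lost_index]
rebuilt = xor_blocks(survivors + [parity])

assert rebuilt == data[lost_index]
print("rebuilt block:", rebuilt)
```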
During the write operation, with OneFS, the file from the client is striped across the
nodes. The system breaks the file-based data into smaller logical sections called
stripe units. The smallest element in a stripe unit is 8 KB, and each stripe unit is
128 KB, or sixteen 8 KB blocks.
If the data file is larger than 128 KB, the next part of the file is written to a second
node. If the file is larger than 256 KB, the third part is written to a third node, and so
on.
The graphic illustrates a 384-kilobyte file with three stripe units and one FEC unit.
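The stripe-unit arithmetic behind the example can be sketched in a few lines. The
function name is illustrative, and the protection count is fixed at one FEC unit to
match the graphic; real protection levels vary with the configured policy.

```python
BLOCK_SIZE = 8 * 1024          # 8 KB blocks
STRIPE_UNIT = 16 * BLOCK_SIZE  # 128 KB stripe units (sixteen 8 KB blocks)

def stripe_layout(file_size_bytes: int, protection_units: int = 1):
    """Return how many stripe units and protection (FEC) units a file needs."""
    data_units = -(-file_size_bytes // STRIPE_UNIT)  # ceiling division
    return data_units, protection_units

# The 384 KB example from the graphic: three stripe units plus one FEC unit,
# each written to a different node.
data_units, fec_units = stripe_layout(384 * 1024)
print(f"{data_units} stripe units + {fec_units} FEC unit")  # 3 stripe units + 1 FEC unit
```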
Shown is a Gen 6 cluster that can scale out to 252 nodes with a single spine switch for each
backend network.
43 Creates an n-way, redundant fabric that scales as nodes are added to the
cluster, providing 100% data availability even with four simultaneous node failures
depending on cluster size.
44 Each node shares the management workload and acts independently as a point
of access for incoming data requests.
45 When there is a large influx of simultaneous requests.
Benefits of OneFS
OneFS supports access to the same file using different protocols and
authentication methods simultaneously. SMB clients that authenticate using Active
Directory (AD), and NFS clients that authenticate using LDAP, can access the
same file with their appropriate permissions applied.
• OneFS translates Windows Security Identifiers (SIDs) and UNIX User Identities
(UIDs) into a common identity format (see the sketch after this list).
• Different authentication sources are supported.
• Permissions activities are transparent to the client.
• Clients authenticate against the correct source.
• File access behaves as each protocol expects.
• The correct permissions are applied - OneFS stores the appropriate permissions
for each identity or group.
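Conceptually, the SID/UID translation works like a lookup into a common identity,
as in the sketch below. The SIDs, UIDs, and usernames are entirely hypothetical,
and the actual OneFS identity mapping is far richer; the point is only that two
external identifiers resolve to one on-cluster identity so the same permissions apply.

```python
# Conceptual only: map Windows SIDs and UNIX UIDs onto a common identity so
# the same file permissions apply regardless of protocol. Values are hypothetical.
identity_map = {
    "SID:S-1-5-21-1004-500":  "user:jsmith",
    "UID:10023":              "user:jsmith",
    "SID:S-1-5-21-1004-1107": "user:apatel",
    "UID:10057":              "user:apatel",
}

def resolve(external_id: str) -> str:
    """Return the common identity used to evaluate file permissions."""
    return identity_map.get(external_id, "unknown")

# An SMB client authenticated by AD and an NFS client authenticated by LDAP
# resolve to the same identity, so the same permissions are applied.
assert resolve("SID:S-1-5-21-1004-500") == resolve("UID:10023")
```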
Authentication
SSH
AD
• Active Directory can serve many functions, but the primary reason for joining
the cluster to an AD domain46 is to let the AD domain controller perform user
and group authentication.
LDAP
NIS
File Provider
Local Provider
• The local provider provides authentication and lookup facilities for user
accounts added by an administrator.
• OneFS supports local user and group authentication using the web
administration interface.
47 An advantage of LDAP is the open nature of its directory services and the ability
to use LDAP across many platforms.
Kerberos Provider
Policy-Based Automation
• Includes the way data is distributed across the cluster and on each node.
• Includes how client connections are distributed among the nodes, and when
and how maintenance tasks are run.
Management Interfaces
• Serial Console48
• Web Administration Interface (WebUI)49
• Command Line Interface (CLI)50
• Platform Application Programming Interface (PAPI)51
• Front Panel Display52
48 The serial console is used for initial cluster configurations by establishing serial
access to the node designated as node 1.
49 The browser-based OneFS web administration interface provides secure access
to manage the cluster.
50 Access to the command-line interface is through a Secure Shell (SSH)
connection to any node in the cluster.
51 The PAPI is divided into two functional areas: one area enables cluster
configuration, management, and monitoring, and the other enables operations on
files and directories on the cluster.
Who is allowed to access and make configuration changes using the cluster
management tools? In addition to the integrated root and admin users, OneFS
provides role-based access control (RBAC). With RBAC, you can define privileges
to customize access to administration features in the OneFS WebUI, CLI, and for
PAPI management.
• Grant or deny access to management features.
• RBAC
• Set of global admin privileges
• Five preconfigured admin roles
• Zone RBAC (ZRBAC)
• Set of admin privileges specific to an access zone
• Two preconfigured admin roles
• Can create custom roles.
• Assign users to one or more roles.
If there is an issue with your cluster, there are two types of support available. You
can manually upload log files to the Dell Technologies support FTP site or use
Secure Remote Services.
• Manual logfile uploads:
  • Performed as needed.
  • Logfiles accompany support requests.
• Secure Remote Services:
  • Broader product support.
  • 24x7 remote monitoring - monitors on a node-by-node basis and sends alerts
    regarding the health of devices.
  • Allows remote cluster access - requires permission.
  • Secure authentication with AES 256-bit encryption and RSA digital
    certificates.
• Log files provide detailed information about the cluster activities.
• Remote session that is established through SSH or the WebUI - support
personnel can run scripts that gather diagnostic data about cluster settings and
operations. Data is sent to a secure FTP site where service professionals can
open support cases and troubleshoot on the cluster.
Data distribution is how OneFS spreads data across the cluster. Various models of
PowerScale nodes, or node types, can be present in a cluster. Nodes are assigned
to node pools based on the model type, number of drives, and the size of the
drives.
The cluster can have multiple node pools, and groups of node pools can be
combined to form tiers of storage. Data is distributed among the different node
pools based on the highest percentage of available space, so the data target can
be a pool or a tier anywhere on the cluster (a simplified sketch of this placement
logic follows the list below).
• Node Pool53
• Policy Options54
• Tiers55
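A simplified sketch of the placement logic mentioned above: pick the node pool with
the highest percentage of available space as the write target. The pool names and
capacities are hypothetical, and real OneFS placement also honors file pool
policies and tier membership.

```python
# Hypothetical node pools with raw and used capacity (TB).
node_pools = [
    {"name": "f600_pool", "capacity_tb": 300, "used_tb": 240},
    {"name": "h700_pool", "capacity_tb": 960, "used_tb": 480},
    {"name": "a300_pool", "capacity_tb": 720, "used_tb": 650},
]

def available_pct(pool):
    """Fraction of the pool's capacity that is still free."""
    return (pool["capacity_tb"] - pool["used_tb"]) / pool["capacity_tb"]

def pick_target_pool(pools):
    """Choose the pool with the highest percentage of available space."""
    return max(pools, key=available_pct)

print(pick_target_pool(node_pools)["name"])  # h700_pool (50% free)
```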
You can optimize data input and output to match the workflows for your business.
By default, optimization is managed cluster-wide, but you can manage individual
directories or individual files. The data access pattern can be optimized for random,
streaming, or concurrent access.
56 Random disables prefetch for both data and metadata. Random is most useful
when the workload I/O is highly random. Using the Random setting mitigates cache
"pollution" caused by prefetching blocks into cache that are never read.
57 Sequential access enables aggressive prefetch on reads, increases the size of
file coalescers in the OneFS write cache, and changes the layout of files on disk
(uses more disks in the FEC stripes). Streaming is most useful in workloads that do
heavy sequential reads and writes.
58 Concurrency is a compromise between Streaming and Random. Concurrency
enables some prefetch to help sequential workloads, but not enough that the cache
gets "polluted" when the workload becomes more random. Concurrency is for
general-purpose use cases, good for most workload types or for mixed workload
environments. Concurrency also enables Adaptive Prefetch, whereby OneFS
attempts to adjust the amount of prefetch based on the characteristics of the
incoming I/O.
59 Prefetch proactively loads four, five, and sometimes even six minutes into
Performance optimization is the first thing a customer notices about their cluster in
day-to-day operations. But what does the average administrator notice second?60
Data protection level indicates how many components in a cluster can malfunction
without loss of data. The features are mentioned below.
• Flexible and configurable.
• Virtual hot spare - allocate disk space to hold data as it is rebuilt when a disk
drive fails.
• Select FEC protection by node pool, directory, or file.
• Extra protection creates more FEC stripes, increasing overhead.
• Standard functionality is available in the unlicensed version of SmartPools.
60They notice when a cluster has issues after they notice how great it works. They
want it to be fast, and they want it to work. That is why data protection is essential.
You can subdivide capacity usage by assigning storage quotas to users, groups,
and directories.
• Policy-based quota management.
• Nesting - place a quota on a department, and then a smaller quota on each
department user, and a different quota on the department file share.
• Thin provisioning - shows available storage even if capacity is not available.
• Quota types (illustrated in the sketch below) are:
• Accounting - informational only; usage can exceed the quota.
• Enforcement soft limit - a notification is sent when exceeded.
• Enforcement hard limit - writes are denied.
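A minimal sketch of the three quota behaviors, assuming a hypothetical
check_quota() helper; real OneFS soft limits also include a grace period, which is
omitted here for brevity.

```python
def check_quota(usage_bytes, limit_bytes, quota_type):
    """Return True if a write is allowed under the given quota type."""
    if usage_bytes <= limit_bytes:
        return True
    if quota_type == "accounting":
        return True                      # informational only, usage may exceed limit
    if quota_type == "soft":
        notify("soft limit exceeded")    # notification sent, write still allowed
        return True
    if quota_type == "hard":
        return False                     # writes denied once the limit is reached
    raise ValueError(quota_type)

def notify(message):
    print("quota alert:", message)

print(check_quota(90, 100, "hard"))   # True  - under the limit
print(check_quota(110, 100, "soft"))  # True  - alert sent, write allowed
print(check_quota(110, 100, "hard"))  # False - write denied
```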
InsightIQ is a powerful tool that monitors one or more clusters and then presents
the data in a robust graphical interface with reports that can be exported. You can
examine the information, break out the specific details you want, and even take
advantage of usage growth and prediction features.
InsightIQ offers:
• Monitor system usage - performance and file system analytics.
• Requires a server or VMware system external to cluster.
• Free InsightIQ license.
Each stripe is protected separately with forward error correction (FEC) protection
blocks, or parity.
• Protected at data stripe - one or two data or protection stripe units are contained
on a single node for any given data stripe.
• Striped across nodes.
• Variable protection levels - set separately for node pools, directories, or even
individual files.
• High availability is integrated - data is spread onto many drives and multiple
nodes, all ready to help reassemble the data when a component fails.
Data resiliency is the ability to recover past versions of a file that has changed over
time. Eventually, every storage admin gets asked to roll back to a previous “known
good” version of a file. OneFS provides this capability using snapshots.
• File change rollback technology is called snapshots.
• Copy-on-write (CoW) - writes the original blocks to the snapshot version first,
and then writes the new data to the file system; this incurs a double-write
penalty but causes less fragmentation (see the sketch after this list).
• Redirect-on-write (RoW) - writes changes into available file system space and
then updates pointers to reference the new blocks; there is no double-write
penalty but more fragmentation.
• Policy-based
• Scheduled snapshots
• Policies determine the snapshot schedule, path to the snapshot location,
and snapshot retention periods.
• Deletions happen as part of a scheduled job or are deleted manually.
• Out of order deletion allowed, but not recommended.
• Some system processes use with no license required.
• Full capability requires SnapshotIQ license.
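A simplified copy-on-write sketch: before the live block is overwritten, the original is
copied into the snapshot so the older version can still be read. The in-memory dicts
stand in for on-disk blocks and are purely illustrative, not the OneFS
implementation.

```python
# Simplified copy-on-write (CoW) demonstration.
file_blocks = {0: b"original block 0", 1: b"original block 1"}
snapshot = {}   # holds preserved blocks; empty until something changes

def cow_write(block_id, new_data):
    # Copy the original block into the snapshot first (the double write),
    # then write the new data into the live file system.
    if block_id not in snapshot:
        snapshot[block_id] = file_blocks[block_id]
    file_blocks[block_id] = new_data

cow_write(1, b"changed block 1")

print(file_blocks[1])                   # b'changed block 1' (live data)
print(snapshot.get(1, file_blocks[1]))  # b'original block 1' (snapshot view)
```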
Replication keeps a copy of data from one cluster on another cluster. OneFS
replicates during normal operations, from one PowerScale cluster to another.
Replication may be one-to-one or one-to-many PowerScale clusters.
Cluster-to-cluster synchronization:
• Copy - new files on the source are copied to the target, while files deleted on
the source remain unchanged on the target.
• Synchronization - works in one direction only; the source and target clusters
maintain identical file sets, except that the files on the target are read-only.
Replication can be configured per directory or for specific types of data, and you
can set exceptions to include or exclude specific files. Replication jobs can run:
• On manual start
• On a schedule
• When changes are made
Bandwidth throttling
Bandwidth throttling is used on replication jobs to optimize resources for high-
priority workflows. The graphic shows replication from a source cluster to a target
cluster.
Data Retention
Data retention is the ability to prevent data from being deleted or modified before
any future date. In OneFS, you can configure data retention at the directory level,
so that different directories can have different retention policies. You can also use
policies to automatically commit certain types of files for retention.
• Two modes of retention
• Enterprise (more flexible) - enable privileged deletes by an administrator.
• Compliance (more secure) - designed to meet SEC regulatory requirements.
Once data is committed to disk, individuals cannot change or delete the data
until the retention clock expires - OneFS prohibits clock changes.
• Compatible with SyncIQ replication.
• Requires SmartLock license.
A2000
The A2000 is an ideal solution for high-density, deep archive storage that
safeguards data efficiently for long-term retention. The A2000 stores up to 1280 TB
per chassis and scales to over 80 PB in a single cluster.
A300
An ideal active archive storage solution that combines high performance, near-
primary accessibility, value, and ease of use. The A300 provides between 120 TB
to 960 TB per chassis and scales to 60 PB in a single cluster. The A300 includes
inline compression and deduplication capabilities.
A3000
An ideal solution for high-performance, high-density, deep archive storage that
safeguards data efficiently for long-term retention. The A3000 stores up to 1280 TB
per chassis and scales to 80 PB in a single cluster. The A3000 includes inline
compression and deduplication capabilities.
H700
Provides maximum performance and value to support demanding file workloads.
The H700 provides capacity up to 960 TB per chassis. The H700 includes inline
compression and deduplication capabilities.
H7000
Provides a versatile, high-performance, high-capacity hybrid platform with up to
1280 TB per chassis. The deep-chassis H7000 is ideal for consolidating a range of
file workloads on a single platform. The H7000 includes inline compression and
deduplication capabilities.