PowerScale Concepts-SSP
Participant Guide
Prerequisite Skills
To get familiar with the content and successfully complete this course, a student
must have a suitable knowledge base or skill set. The student must be familiar with:
• Networking fundamentals such as TCP/IP, DNS, and routing.
• An introduction to storage, such as NAS and SAN, their differences, and basic
storage principles and features.
Storage Evolution
The following tabs show the evolution of data storage over the past several years.
The first general-purpose system became operational in 1946. It was called the
Electronic Numerical Integrator and Computer (ENIAC). ENIAC used more than
17,000 vacuum tubes and 70,000 resistors to hold a 10-digit decimal number in its
memory. The data was output as punch cards, a format that IBM continued to use
until the early 1960s.
In the 1960s, magnetic tapes eclipsed punch cards as the way to store corporate
system data.
During the mid-1960s, magnetic tapes gave way to the hard disk drive. The first
hard drive was the size of two refrigerators, required fifty disks to store less than
four MB of data, and was called the IBM 350 RAMAC.
Less than 30 years after two refrigerator-sized units stored less than four MB, the
average consumer could store about one-third of that amount on a three-and-a-
half-inch plastic disk.
During the 1980s, the personal computer revolution introduced miniaturization. This
revolution brought a wide array of storage form factors.
The USB (Universal Serial Bus) was introduced and today, a commercial USB flash
drive can store as much as 1 TB.
During the data storage evolution, the two types of data that developed were
structured data and unstructured data1. PowerScale specializes in storing
unstructured data.
• Structured data examples: records or files, census records, economic catalogs,
phone directories, customer contact records, and spreadsheets.
• Unstructured data examples: photos, documents, and presentations.
The following table shows the difference between block-based data and file-based
data.
• Block-based data: best for high I/O and low latency.
• File-based data: too large for database apps and high I/O.
Digital Transformation
With unstructured data being the majority of data storage growth, a solution was
needed.
An International Data Corporation (IDC) study that was published in 2018 showed
that the amount of digital data created, captured, and replicated worldwide grew
exponentially. This finding was based on the proliferation of then-new technologies
such as voice over IP, RFID, smart phones, and consumer use of GPS. Also, the
continuance of data generators such as digital cameras, HD TV broadcasts, digital
games, ATMs, email, video conferencing, and medical imaging added to data
growth.
PowerScale clusters are a NAS solution. There are two types of NAS architectures:
scale-up and scale-out.
Scale-Up
• A scale-up solution5 has controllers that connect to trays of disks and provide
computational throughput.
• Traditional NAS is great for specific types of workflows, especially those
applications that require block-level access.
Scale-Out
5The two controllers can run active/active or active-passive. For more capacity,
add another disk array. Each of these components is added individually. As more
systems are added, NAS sprawl becomes an issue.
• With a clustered NAS solution, or scale-out architecture, all the NAS boxes, or
PowerScale nodes, belong to a unified cluster with a single point of
management.
• In a scale-out solution6, the computational throughput, disks, disk protection,
and management are combined and exist for a single cluster.
6Not all clustered NAS solutions are the same. Some vendors overlay a
management interface across multiple independent NAS boxes. This gives a
unified management interface but does not unify the file system. While this
approach does ease the management overhead of traditional NAS, it still does not
scale well.
Scale-out NAS
Scale-out NAS7 is now a mainstay in most data center environments. The next
wave of scale-out NAS innovation has enterprises embracing the value8 of NAS
and adopting it as the core of their infrastructure.
8 Enterprises have a low-tolerance attitude toward data loss and data-unavailable
situations and expect features that simplify management. Organizations see
massive scale and performance with smaller data center rack footprints than
performance-centric workloads typically drive.
DAS and RAID
• In the early days of system data, corporations stored data on hard drives in a
server. The intellectual property of the company depended entirely on the
continuous functionality of the hard drive. Thus, to minimize risk, corporations
mirrored the data using a Redundant Array of Inexpensive Disks (RAID)
controller.
• RAID disks were directly attached to a server so that the server thought the
hard drives were part of it. This technique is called Direct Attached Storage
(DAS).
SAN
• As applications proliferated, soon there were many servers, each with its own
DAS. This worked fine, with some drawbacks.
• If one server's DAS was full while another server's DAS was empty, the empty
DAS could not share its space with the full DAS. Due to this limitation of DAS,
the SAN was introduced, which effectively used a volume manager and RAID.
NAS
9 PCs work differently from the storage file server over the network: NAS
communicates from one file system to another file system.
10 The volume manager does not know how the data is protected because that is
handled by the RAID controller.
Big data refers to the huge volume of digital information generated by various
businesses. It is a collection of data at an enterprise scale, an amount of data that
exceeds a petabyte (one million gigabytes).
The "three Vs" - volume, velocity, and variety - often arrive together. When they
combine, administrators truly feel the need for high-performance, higher-capacity
storage. The three Vs generate the challenges of managing big data.
Growing data has also forced an evolution in storage architecture over the years
due to the amount of maintained data. PowerScale is a big data solution because it
can handle the volume, velocity, and variety that defines the fundamentals of big
data.
1: Challenge: Nonflexible data protection. When you have big data volumes of
information to store, it had better be there, dependably. If an organization relies on
RAID to protect against data loss or corruption, the failure of a single disk drive
causes a disproportionate inconvenience. The most popular RAID implementation
scheme allows the failure of only two drives before data loss. (A sizable, big data
installation easily has more than 1000 individual hard drives, so odds are at least
one drive is down at any time.) The simpler answer is to protect data using a
different scheme.
2: What is meant by volume? Consider any global website that works at scale. One
example of big data volume is the YouTube press page that says YouTube ingests
100 hours of video every minute.
3: The best example of variety is the migration of the world to social media. On a
platform such as Facebook, people post all kinds of file formats: text, photos, video,
polls, and more. Many kinds of data at that scale represent big data variety.
The table shows the challenges along with their corresponding solutions.
13 Some data storage architectures use two controllers, sometimes called servers
or filers, to run a stack of many hard drives. You can scale capacity by adding more
hard drives, but it is difficult to scale performance. In a given storage stack, the
hard drives offer nothing but capacity. All the intelligence of the system, including
computer processing and RAM, must come from the two filers. If the horsepower of
the two filers becomes insufficient, the architecture does not enable you to add
more filers. You start over with another stack and two more filers. In contrast, every
node in a PowerScale cluster contains capacity plus computing power plus
memory. The nodes work in parallel, so each node you add scales out linearly. In
other words, all aspects of the cluster scale up, including capacity and
performance.
14 Due to the architectural restrictions, SAN and scale-up NAS end up with several
isolated stacks of storage. Many sites have a different storage stack for each
application or department; a backup storage stack is an example. There is no
automated way to rebalance data across the stacks; instead, an administrator has
to manually arrange a data migration. If the R&D stack performs product testing
that generates results at big data velocity, the company may establish an HPC
stack, which could reach capacity rapidly. Other departments or workflows may
have independent storage stacks with extra capacity remaining, but there is no
automated way for R&D to offload their HPC overflow. In contrast, a PowerScale
cluster distributes data across all its nodes to keep them all at equal capacity. You
do not have one node that is overworked while other nodes sit idle. There are no
hot spots, and thus, no manual data migrations. If the goal is to keep pace with big
data velocity, automated balancing makes more sense.
A scale-out data lake is a large storage solution where vast amounts of data from
other solutions or locations are combined into a single store.
PowerScale helps you unlock the structure within your data and address the
challenges of unstructured data management. PowerScale is the next evolution of
OneFS, the operating system powering the scale-out NAS platform.
The PowerScale family includes Isilon nodes and PowerScale nodes, with
PowerScale OneFS running across all of them. The software-defined architecture of
OneFS gives you simplicity at scale, intelligent insights, and the ability to have any
data anywhere it needs to be - at the edge, core, or cloud.
A Data Lake17 is a central data repository that stores data from various sources,
such as file shares, web apps, and the cloud. It enables businesses to access the
same data for various uses and enables the manipulation of data using various
clients, analyzers, and applications.
The data is real-time production data with no need to copy or move it from an
external source, like another Hadoop cluster, into the Data Lake.
17The Data Lake provides tiers that are based on data usage, and the ability to
instantly increase the storage capacity when needed.
CloudPools
The PowerScale CloudPools software enables you to select from various public
cloud services or use a private cloud. CloudPools offers the flexibility of another tier
of storage that is off-premises and off-cluster, and provides a lower TCO18 for
archival-type data.
• Treats cloud storage as another cluster-connected tier.
• Policy-based automated tiering.
• Address rapid data growth and optimize data center storage resources - use
valuable on-site storage resources for active data.
• Send rarely used or accessed data to the cloud.
• Seamless integration with data – retrieves at any time.
• Data remains encrypted in the cloud until retrieval.
• Connect to ECS, another PowerScale cluster, Amazon S3, Virtustream,
Microsoft Azure, Google Cloud, and Alibaba.
• Policies automatically move specified files to the cloud (see the sketch after this
list).
• CloudPools traffic optimization reduces the number of network round trips when
recalling data from a CloudPools target.
• Customers who want to run their own internal clouds can use a PowerScale
installation as the core of their cloud.
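To make the policy-based tiering idea concrete, here is a minimal sketch of how an
age-based archive policy could select candidate files. The 90-day threshold, the
directory path, and the function name are hypothetical; this is not the CloudPools
implementation, only an illustration of the decision that CloudPools automates.

```python
import os
import time

# Hypothetical policy: archive files not accessed in the last 90 days.
ARCHIVE_AFTER_DAYS = 90

def files_to_tier(directory: str, age_days: int = ARCHIVE_AFTER_DAYS):
    """Return paths whose last access time is older than the policy threshold."""
    cutoff = time.time() - age_days * 24 * 60 * 60
    matches = []
    for root, _, files in os.walk(directory):
        for name in files:
            path = os.path.join(root, name)
            if os.path.getatime(path) < cutoff:
                matches.append(path)
    return matches

# Example: list candidate files under a hypothetical directory.
if __name__ == "__main__":
    for path in files_to_tier("/ifs/archive-candidates"):
        print("candidate for cloud tier:", path)
```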
PowerScale Nodes
PowerScale has multiple servers that are called nodes, which combine to create a
cluster. Each cluster behaves as a single, central storage system. PowerScale is
designed for large volumes of unstructured data. The design goal for the
PowerScale nodes is to keep the simplicity of NAS while providing the agility of the
cloud at the cost of commodity hardware.
The PowerScale family has different offerings that are based on the need for
performance and capacity. The modular architecture of these platforms enables
you to scale out compute and capacity separately. OneFS powers all the nodes.
PowerScale Family
The following tabs describe the different offerings that the Gen 6 family provides.
F-Series
The F-series nodes sit at the top of both performance and capacity with all-flash
arrays for ultracompute and high capacity. The all-flash platforms can accomplish
250-300k protocol operations per chassis and get 15 GB/s aggregate read
throughput from the chassis. Even when the cluster scales, the latency remains
predictable.
• F80019
• F81020
• F60021
• F20022
• F90023
19 The F800 is suitable for workflows that require extreme performance and
efficiency.
20 The F810 is suitable for workflows that require extreme performance and
efficiency. The F810 also provides high-speed inline data deduplication and inline
data compression. It delivers up to 3:1 efficiency, depending on your specific
dataset and workload.
21 Ideal for small, remote clusters that require exceptional system performance.
23 Provides the maximum performance of all-NVMe storage in a cost-effective
design. Each F900 node hosts 24 NVMe SSDs and allows you to scale raw storage
capacity from 46 TB to 368 TB per node and up to 93 PB of raw capacity per
cluster. The F900 includes inline compression and deduplication. The minimum
number of PowerScale nodes per cluster is three, while the maximum cluster size is
252 nodes. The F900 is best suited for Media and Entertainment 8 K, genomics,
algorithmic trading, artificial intelligence, machine learning, and HPC workloads.
H-Series
After the F-series nodes, next in terms of computing power are the H-series nodes.
These hybrid storage platforms are highly flexible and strike a balance between
large capacity and high-performance storage to provide support for a broad range
of enterprise file workloads.
• H40026
• H50027
24 1) Digital media: small and medium-size studios
2) Enterprise edge: remote and branch offices along with edge locations that
require high-performance local storage
3) Healthcare, Life Sciences: Genomics sequencing, digital pathology, small
hospitals, clinics
25 1) Digital media: 4 K, 8 K, broadcast, real-time streaming, and post-production.
26 The H400 provides a balance of performance, capacity, and value to support a
wide range of file workloads. It delivers up to 3 GB/s bandwidth per chassis and
provides capacity options ranging from 120 TB to 720 TB per chassis.
27 The H500 is a versatile hybrid platform that delivers up to 5 GB/s bandwidth per
chassis with a capacity ranging from 120 TB to 720 TB per chassis. It is an ideal
choice for organizations looking to consolidate and support a broad range of file
workloads on a single platform.
• H560028
• H60029
• H700
• H7000
• General Use Cases30
A-Series
The A-series nodes have less compute power than the other nodes and are
designed for data archival purposes. The archive platforms can be combined with
new or existing all-flash and hybrid storage systems into a single cluster that
provides an efficient tiered storage solution.
• A200
• A300
• A2000
28 The H5600 combines massive scalability – 960 TB per chassis and up to 8 GB/s
bandwidth in an efficient, highly dense, deep 4U chassis. The H5600 delivers inline
data compression and deduplication. It is designed to support a wide range of
demanding, large-scale file applications and workloads.
29 The H600 is designed to provide high performance at value. It delivers up to
120,000 IOPS and up to 12 GB/s bandwidth per chassis and is ideal for high-
performance computing (HPC) workloads that do not require the extreme
performance of all-flash.
30 1) Digital media: broadcast, real-time streaming, rendering, and post-production.
2) Enterprise file services: home directories, file shares, and group and project
data. 3) Analytics: big data analytics, Hadoop, and Splunk log analytics.
• A3000
• General Use Cases31
Performance nodes
• PowerScale P10032
• PowerScale B10033
32 The P100 performance nodes enable a cluster to be composed of nodes that are
CPU-bound. They provide additional CPU horsepower for compute-bound
applications and additional DRAM that can be used as L1 cache. The P100 nodes
can also be part of a solution that targets a specific workload to meet specific cost
and performance targets. A single P100 node can be added to a cluster, and P100
nodes can be added in single-node increments. The P100 supports inline
compression and deduplication.
33 Provides the ability to back up OneFS powered clusters using the two-way
NDMP protocol. The B100 is delivered in a cost-effective form factor to address the
SLA targets and tape backup needs of a wide variety of workloads. Each node
delivers Fibre Channel ports that can connect directly to a tape subsystem or a
Storage Area Network (SAN). The B100 does not contain any local storage. A
single B100 node can be added to a cluster, and B100 nodes can be added in
single-node increments. The B100 supports inline compression and deduplication.
The Isilon and PowerScale Gen 6 platforms are based on a proprietary architecture
that is designed by Dell Technologies. Gen 6 (old and new) requires a minimum of
four nodes to form a cluster. You must add nodes to the cluster in pairs. The
chassis holds four compute nodes and twenty drive sled slots. Both compute
modules in a node pair power-on immediately when one of the nodes connects to a
power source.
Rear View and Front View of an Isilon Gen 6 or PowerScale Gen 6 chassis.
1: The compute module bays of the two nodes make up one node pair. Scaling out
a cluster with Gen 6 nodes is done by adding more node pairs. You cannot mix
node types in the same node pair.
2: Each Gen 6 node provides two ports for front-end connectivity. The connectivity
options for clients and applications are 10 GbE, 25 GbE, 40 GbE, and 100 GbE.
3: Each node can have 1 or 2 SSDs that are used as L3 cache, global namespace
acceleration (GNA), or other SSD strategies.
4: Each Gen 6 node provides two ports for back-end connectivity. A Gen 6 node
supports 10 GbE, 25 GbE, 40 GbE, and InfiniBand.
5: Power supply unit - Peer node redundancy: When a compute module power
supply failure takes place, the power supply from the peer node temporarily
provides power to both nodes.
6: Each node has five drive sleds. Depending on the length of the chassis and type
of the drive, each node can handle up to 30 drives or as few as 15. A drive sled
must always have the same type of disk drive.
7: You cannot mix 2.5" and 3.5" drive sleds in a node. Disks in a sled are all the
same type.
8: The sled can be either a short sled or a long sled.
9: The chassis comes in two different depths, the normal depth is about 37 inches
and the deep chassis is about 40 inches.
10: Large journals offer flexibility in determining when data should be moved to the
disk. Each node has a dedicated M.2 vault drive for the journal. Each node mirrors
its journal to its peer node. The node writes the journal contents to the vault when a
power loss occurs. A backup battery helps maintain power while data is stored in
the vault.
Graphic shows the front and rear view of an F200 or F600 node pool.
1: Scaling out an F200 or an F600 node pool only requires adding one node.
3: Each F200 and F600 node provides two ports for backend connectivity. The
PCIe slot 1 is used.
4: Redundant power supply units - When a power supply fails, the secondary
power supply in the node provides power. Power is supplied to the system equally
from both PSUs when the Hot Spare feature is disabled. Hot Spare is configured
using the iDRAC settings.
5: Disks in a node are all the same type. Each F200 node has four SAS SSDs.
6: The nodes come in two different 1U models, the F200 and F600. You need
nodes of the same type to form a cluster.
7: The F200 front-end connectivity uses the rack network daughter card (rNDC).
Graphic shows the front and rear view of an F900 node.
1: Drive slots: Enable you to install drives. The F900 supports 24 x 2.5” front-
accessible, hot-plug SSDs that are secured by a removable front bezel.
2: Left control panel: Contains system health and system ID, and status LED.
3: Right control panel: Contains the power button, VGA port, iDRAC Direct
microUSB port, and USB 3.0 ports.
5: Back-end NIC: Two InfiniBand connections or dual port 100G NIC supporting
40G or 100G connection.
10: USB 3.0 port: Enables you to connect USB devices to the system.
11: rNDC: The NIC ports that are integrated on the network daughter card (NDC)
provide front-end network connectivity.
Node Interconnectivity
1: Backend ports int-a and int-b. The int-b port is the upper port. Gen 6 backend
ports are identical for InfiniBand and Ethernet, and cannot be identified by looking
at the node. If Gen 6 nodes are integrated into a Gen 5 or earlier cluster, the
backend uses InfiniBand. Note that there is a procedure to convert an InfiniBand
backend to Ethernet if the cluster no longer has pre-Gen 6 nodes.
2: PowerScale nodes with different backend speeds can connect to the same
backend switch and not see any performance issues. For example, an environment
has a mixed cluster where A200 nodes have 10 GbE backend ports and H600
nodes have 40 GbE backend ports. Both node types can connect to a 40 GbE
switch without affecting the performance of other nodes on the switch. The 40 GbE
switch provides 40 GbE to the H600 nodes and 10 GbE to the A200 nodes.
3: Some nodes, such as archival nodes, may use all of their 10 GbE port
bandwidth, while other workflows might need the full 40 GbE port bandwidth.
Ethernet performance is comparable to InfiniBand, so there should be no
performance bottlenecks with mixed-performance nodes in a single cluster.
Administrators should not see any performance differences when moving from
InfiniBand to Ethernet.
Quick Scalability
A PowerScale cluster can be expanded in about 60 seconds. The primary benefit of
the scale-out NAS approach is how easily the cluster grows: an administrator
expands storage by adding a new node.
In PowerScale, once the node is racked and cabled, adding it to the cluster takes
just a few minutes. That is because OneFS policies automatically discover the
node, set up addresses for the node, incorporate the node into the cluster, and
begin rebalancing capacity across all nodes to take advantage of the new space.
The node fully configures itself, and once it is ready to accept new data, it begins to
take data from the other nodes to autobalance the entire cluster.
Networking Architecture
Network: There are two types of networks that are associated with a cluster:
internal and external.
Clients connect to the cluster using Ethernet connections35 that are available on all
nodes.
The following view shows the complete cluster, combining hardware, software, and
networks:
34 In general, keeping the network configuration simple provides the best results
with the lowest amount of administrative overhead. OneFS offers network
provisioning rules to automate the configuration of additional nodes as clusters
grow.
35 Because each node provides its own Ethernet ports, the amount of network
bandwidth available to the cluster increases as nodes are added.
The external Ethernet layer carries the client protocols: NFS, SMB, S3, HTTP, FTP,
HDFS, and Swift. Backend communication is PowerScale internal.
OneFS supports a single cluster36 on the internal network. This back-end network,
which is configured with redundant switches for high availability, acts as the
backplane for the cluster.37
External Network
The external network provides connectivity for clients over standard file-based
protocols. It supports link aggregation, and network scalability is provided through
software in OneFS. A Gen 6 node has two front-end ports - 10 GigE, 25 GigE, or
40 GigE - and one 1 GigE port for management. Gen 6.5 nodes have two front-end
ports - 10 GigE, 25 GigE, or 100 GigE.
Interconnect
Back-end Network
InfiniBand
Ethernet
Clients can access the cluster using DNS, and the enhanced functionality38
provides the connection distribution policies shown in the graphic. It also provides
continuous availability39 (CA) capabilities.
2: Determines the average CPU utilization on each available network interface and
selects the network interface with lightest processor usage.
3: Selects the next available network interface on a rotating basis. This selection is
the default method. Without a SmartConnect license for advanced settings, this is
the only method available for load balancing.
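The round robin and CPU utilization policies described above can be illustrated with
a small sketch. The interface names and CPU figures are hypothetical; this is not
how the cluster implements connection balancing, only a demonstration of the two
selection rules.

```python
import itertools

# Hypothetical list of node interfaces and their current CPU utilization (percent).
interfaces = [
    {"name": "node1:ext-1", "cpu": 35.0},
    {"name": "node2:ext-1", "cpu": 12.0},
    {"name": "node3:ext-1", "cpu": 78.0},
]

# Round robin: hand out interfaces in rotation (the default policy described above).
_rotation = itertools.cycle(interfaces)

def pick_round_robin():
    return next(_rotation)["name"]

# CPU utilization: pick the interface with the lightest processor usage.
def pick_least_cpu():
    return min(interfaces, key=lambda i: i["cpu"])["name"]

if __name__ == "__main__":
    print([pick_round_robin() for _ in range(4)])  # cycles through the interfaces
    print(pick_least_cpu())                        # node2:ext-1 (lowest CPU)
```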
N + M Data Protection
OneFS writes parity, also called FEC protection, alongside data. In the example
below, OneFS uses the parity (green) to determine the missing pieces: if blue +
yellow = green, a missing piece can be recalculated from the remaining pieces and
the parity.
Forward error correction (FEC) is a technique used for controlling errors in data
transmission at high speeds. With FEC, the source sends redundant error-
correcting code along with the data frame, and the destination uses it to recover the
data without errors.
FEC enables the customer to choose the number of bits of parity to implement.
One bit of parity for many disks is known as N+1; two parity points for many disks
are known as N+2, and so on.
With the N+1 protection, data is 100% available even if a drive or a node fails.
With N+2, N+3, and N+4 protection, data is 100% available if multiple drives or
nodes fail.
The graphic shows a group of nodes (Node 5 through Node 8) with two
simultaneous failures and data remaining available.
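The following sketch illustrates the single-parity (N+1) idea with a simple XOR parity
block: a lost piece is rebuilt from the surviving pieces plus the parity. OneFS
computes FEC protection across nodes and supports multiple parity units (N+2,
N+3, N+4), so treat this only as an illustration of the reconstruction principle, not as
the OneFS implementation.

```python
# Simplified N+1 illustration: one parity block protects a stripe of data blocks.

def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together."""
    result = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            result[i] ^= b
    return bytes(result)

data = [b"blue....", b"yellow..", b"red....."]   # three 8-byte data blocks
parity = xor_blocks(data)                        # the "green" protection block

# Simulate losing one block, then rebuild it from the survivors plus parity.
lost_index = 1
survivors = [blk for i, blk in enumerate(data) if i != lost_index]
rebuilt = xor_blocks(survivors + [parity])

assert rebuilt == data[lost_index]
print("rebuilt block:", rebuilt)
```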
During the write operation, with OneFS, the file from the client is striped across the
nodes. The system breaks the file-based data into smaller logical sections called
stripe units. The smallest element in a stripe unit is 8 KB, and each stripe unit is
128 KB, or sixteen 8 KB blocks.
If the data file is larger than 128 KB, the next part of the file is written to a second
node. If the file is larger than 256 KB, the third part is written to a third node, and so
on.
The graphic illustrates a 384-kilobyte file with three stripe units and one FEC unit.
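The stripe-unit arithmetic behind the example can be sketched in a few lines. The
function name is illustrative, and the protection count is fixed at one FEC unit to
match the graphic; real protection levels vary with the configured policy.

```python
BLOCK_SIZE = 8 * 1024          # 8 KB blocks
STRIPE_UNIT = 16 * BLOCK_SIZE  # 128 KB stripe units (sixteen 8 KB blocks)

def stripe_layout(file_size_bytes: int, protection_units: int = 1):
    """Return how many stripe units and protection (FEC) units a file needs."""
    data_units = -(-file_size_bytes // STRIPE_UNIT)  # ceiling division
    return data_units, protection_units

# The 384 KB example from the graphic: three stripe units plus one FEC unit,
# each written to a different node.
data_units, fec_units = stripe_layout(384 * 1024)
print(f"{data_units} stripe units + {fec_units} FEC unit")  # 3 stripe units + 1 FEC unit
```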
Shown is a Gen 6 cluster that can scale out to 252 nodes with a single spine switch for each
backend network.
43 Creates an n-way, redundant fabric that scales as nodes are added to the
cluster, providing 100% data availability even with four simultaneous node failures
depending on cluster size.
44 Each node shares the management workload and acts independently as a point
of access for incoming data requests.
45 When there is a large influx of simultaneous requests.
Benefits of OneFS
OneFS supports access to the same file using different protocols and
authentication methods simultaneously. SMB clients that authenticate using Active
Directory (AD), and NFS clients that authenticate using LDAP, can access the
same file with their appropriate permissions applied.
• OneFS translates Windows Security Identifiers (SIDs) and UNIX User Identities
(UIDs) into a common identity format (see the sketch after this list).
• Different authentication sources are supported.
• Permissions activities are transparent to the client.
• Clients authenticate against the correct source.
• File access behaves as each protocol expects.
• The correct permissions are applied - OneFS stores the appropriate permissions
for each identity or group.
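Conceptually, the SID/UID translation works like a lookup into a common identity,
as in the sketch below. The SIDs, UIDs, and usernames are entirely hypothetical,
and the actual OneFS identity mapping is far richer; the point is only that two
external identifiers resolve to one on-cluster identity so the same permissions apply.

```python
# Conceptual only: map Windows SIDs and UNIX UIDs onto a common identity so
# the same file permissions apply regardless of protocol. Values are hypothetical.
identity_map = {
    "SID:S-1-5-21-1004-500":  "user:jsmith",
    "UID:10023":              "user:jsmith",
    "SID:S-1-5-21-1004-1107": "user:apatel",
    "UID:10057":              "user:apatel",
}

def resolve(external_id: str) -> str:
    """Return the common identity used to evaluate file permissions."""
    return identity_map.get(external_id, "unknown")

# An SMB client authenticated by AD and an NFS client authenticated by LDAP
# resolve to the same identity, so the same permissions are applied.
assert resolve("SID:S-1-5-21-1004-500") == resolve("UID:10023")
```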
Authentication
SSH
AD
• Active Directory can serve many functions, but the primary reason for joining
the cluster to an AD domain46 is to let the AD domain controller perform user
and group authentication.
LDAP
NIS
File Provider
Local Provider
• The local provider provides authentication and lookup facilities for user
accounts added by an administrator.
• OneFS supports local user and group authentication using the web
administration interface.
47 An advantage of LDAP is the open nature of its directory services and the ability
to use LDAP across many platforms.
Kerberos Provider
Policy-Based Automation
• Includes the way data is distributed across the cluster and on each node.
• Includes how client connections are distributed among the nodes, and when
and how maintenance tasks are run.
Management Interfaces
• Serial Console48
• Web Administration Interface (WebUI)49
• Command Line Interface (CLI)50
• Platform Application Programming Interface (PAPI)51
• Front Panel Display52
48 The serial console is used for initial cluster configurations by establishing serial
access to the node designated as node 1.
49 The browser-based OneFS web administration interface provides secure access
to manage the cluster.
50 Access to the command-line interface is through a Secure Shell (SSH)
connection to any node in the cluster.
51 The PAPI is divided into two functional areas: one area enables cluster
configuration, management, and monitoring, and the other enables operations on
files and directories on the cluster.
Who is allowed to access and make configuration changes using the cluster
management tools? In addition to the integrated root and admin users, OneFS
provides role-based access control (RBAC). With RBAC, you can define privileges
to customize access to administration features in the OneFS WebUI, CLI, and for
PAPI management.
• Grant or deny access to management features.
• RBAC
• Set of global admin privileges
• Five preconfigured admin roles
• Zone RBAC (ZRBAC)
• Set of admin privileges specific to an access zone
• Two preconfigured admin roles
• Can create custom roles.
• Assign users to one or more roles.
If there is an issue with your cluster, there are two types of support available. You
can manually upload log files to the Dell Technologies support FTP site or use
Secure Remote Services.
• Manual logfile uploads:
  • Performed as needed.
  • Logfiles accompany support requests.
• Secure Remote Services:
  • Broader product support.
  • 24x7 remote monitoring - monitors on a node-by-node basis and sends alerts
    regarding the health of devices.
  • Allows remote cluster access - requires permission.
  • Secure authentication with AES 256-bit encryption and RSA digital
    certificates.
• Log files provide detailed information about the cluster activities.
• Remote session that is established through SSH or the WebUI - support
personnel can run scripts that gather diagnostic data about cluster settings and
operations. Data is sent to a secure FTP site where service professionals can
open support cases and troubleshoot on the cluster.
Data distribution is how OneFS spreads data across the cluster. Various models of
PowerScale nodes, or node types, can be present in a cluster. Nodes are assigned
to node pools based on the model type, number of drives, and the size of the
drives.
The cluster can have multiple node pools, and groups of node pools can be
combined to form tiers of storage. Data is distributed among the different node
pools based on the highest percentage of available space, so the data target can
be a pool or a tier anywhere on the cluster (a simplified sketch of this placement
logic follows the list below).
• Node Pool53
• Policy Options54
• Tiers55
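A simplified sketch of the placement logic mentioned above: pick the node pool with
the highest percentage of available space as the write target. The pool names and
capacities are hypothetical, and real OneFS placement also honors file pool
policies and tier membership.

```python
# Hypothetical node pools with raw and used capacity (TB).
node_pools = [
    {"name": "f600_pool", "capacity_tb": 300, "used_tb": 240},
    {"name": "h700_pool", "capacity_tb": 960, "used_tb": 480},
    {"name": "a300_pool", "capacity_tb": 720, "used_tb": 650},
]

def available_pct(pool):
    """Fraction of the pool's capacity that is still free."""
    return (pool["capacity_tb"] - pool["used_tb"]) / pool["capacity_tb"]

def pick_target_pool(pools):
    """Choose the pool with the highest percentage of available space."""
    return max(pools, key=available_pct)

print(pick_target_pool(node_pools)["name"])  # h700_pool (50% free)
```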
You can optimize data input and output to match the workflows for your business.
By default, optimization is managed cluster-wide, but you can manage individual
directories or individual files. The data access pattern can be optimized for random,
streaming, or concurrent access.
56 Random disables prefetch for both data and metadata. Random is most useful
when the workload I/O is highly random. Using the Random setting mitigates cache
"pollution" caused by prefetching blocks into cache that are never read.
57 Sequential access enables aggressive prefetch on reads, increases the size of
file coalescers in the OneFS write cache, and changes the layout of files on disk
(uses more disks in the FEC stripes). Streaming is most useful in workloads that do
heavy sequential reads and writes.
58 Concurrency is a compromise between Streaming and Random. Concurrency
enables some prefetch to help sequential workloads, but not enough that the cache
gets "polluted" when the workload becomes more random. Concurrency is for
general-purpose use cases, good for most workload types or for mixed workload
environments. Concurrency also enables Adaptive Prefetch, whereby OneFS
attempts to adjust the amount of prefetch based on the characteristics of the
incoming I/O.
59 Prefetch proactively loads four, five, and sometimes even six minutes into
Performance optimization is the first thing a customer notices about their cluster in
day-to-day operations. But what does the average administrator notice second?60
Data protection level indicates how many components in a cluster can malfunction
without loss of data. The features are mentioned below.
• Flexible and configurable.
• Virtual hot spare - allocate disk space to hold data as it is rebuilt when a disk
drive fails.
• Select FEC protection by node pool, directory, or file.
• Extra protection creates more FEC stripes, increasing overhead.
• Standard functionality is available in the unlicensed version of SmartPools.
60They notice when a cluster has issues after they notice how great it works. They
want it to be fast, and they want it to work. That is why data protection is essential.
You can subdivide capacity usage by assigning storage quotas to users, groups,
and directories.
• Policy-based quota management.
• Nesting - place a quota on a department, and then a smaller quota on each
department user, and a different quota on the department file share.
• Thin provisioning - shows available storage even if capacity is not available.
• Quota types (illustrated in the sketch below) are:
• Accounting - informational only; usage can exceed the quota.
• Enforcement soft limit - a notification is sent when exceeded.
• Enforcement hard limit - writes are denied.
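A minimal sketch of the three quota behaviors, assuming a hypothetical
check_quota() helper; real OneFS soft limits also include a grace period, which is
omitted here for brevity.

```python
def check_quota(usage_bytes, limit_bytes, quota_type):
    """Return True if a write is allowed under the given quota type."""
    if usage_bytes <= limit_bytes:
        return True
    if quota_type == "accounting":
        return True                      # informational only, usage may exceed limit
    if quota_type == "soft":
        notify("soft limit exceeded")    # notification sent, write still allowed
        return True
    if quota_type == "hard":
        return False                     # writes denied once the limit is reached
    raise ValueError(quota_type)

def notify(message):
    print("quota alert:", message)

print(check_quota(90, 100, "hard"))   # True  - under the limit
print(check_quota(110, 100, "soft"))  # True  - alert sent, write allowed
print(check_quota(110, 100, "hard"))  # False - write denied
```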
InsightIQ is a powerful tool that monitors one or more clusters and then presents
the data in a robust graphical interface with reports that can be exported. You can
examine the information, break out the specific details you want, and even take
advantage of usage growth and prediction features.
InsightIQ offers:
• Monitor system usage - performance and file system analytics.
• Requires a server or VMware system external to cluster.
• Free InsightIQ license.
Each stripe is protected separately with forward error correction (FEC) protection
blocks, or parity.
• Protected at data stripe - one or two data or protection stripe units are contained
on a single node for any given data stripe.
• Striped across nodes.
• Variable protection levels - set separately for node pools, directories, or even
individual files.
• High availability is integrated - data is spread onto many drives and multiple
nodes, all ready to help reassemble the data when a component fails.
Data resiliency is the ability to recover past versions of a file that has changed over
time. Eventually, every storage admin gets asked to roll back to a previous “known
good” version of a file. OneFS provides this capability using snapshots.
• File change rollback technology is called snapshots.
• Copy-on-write (CoW) - writes the original blocks to the snapshot version first,
and then writes the new data to the file system; this incurs a double-write
penalty but causes less fragmentation (see the sketch after this list).
• Redirect-on-write (RoW) - writes changes into available file system space and
then updates pointers to reference the new blocks; there is no double-write
penalty but more fragmentation.
• Policy-based
• Scheduled snapshots
• Policies determine the snapshot schedule, path to the snapshot location,
and snapshot retention periods.
• Deletions happen as part of a scheduled job or are deleted manually.
• Out of order deletion allowed, but not recommended.
• Some system processes use with no license required.
• Full capability requires SnapshotIQ license.
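A simplified copy-on-write sketch: before the live block is overwritten, the original is
copied into the snapshot so the older version can still be read. The in-memory dicts
stand in for on-disk blocks and are purely illustrative, not the OneFS
implementation.

```python
# Simplified copy-on-write (CoW) demonstration.
file_blocks = {0: b"original block 0", 1: b"original block 1"}
snapshot = {}   # holds preserved blocks; empty until something changes

def cow_write(block_id, new_data):
    # Copy the original block into the snapshot first (the double write),
    # then write the new data into the live file system.
    if block_id not in snapshot:
        snapshot[block_id] = file_blocks[block_id]
    file_blocks[block_id] = new_data

cow_write(1, b"changed block 1")

print(file_blocks[1])                   # b'changed block 1' (live data)
print(snapshot.get(1, file_blocks[1]))  # b'original block 1' (snapshot view)
```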
Replication keeps a copy of data from one cluster on another cluster. OneFS
replicates during normal operations, from one PowerScale cluster to another.
Replication may be one-to-one or one-to-many PowerScale clusters.
Cluster-to-cluster synchronization:
• Copy - new files on the source are copied to the target, while files deleted on
the source remain unchanged on the target.
• Synchronization - works in one direction only; the source and target clusters
maintain identical file sets, except that the files on the target are read-only.
Replication can be configured per directory or for specific types of data, and you
can set exceptions to include or exclude specific files. Replication jobs can run:
• On manual start
• On a schedule
• When changes are made
Bandwidth throttling
Bandwidth throttling is used on replication jobs to optimize resources for high-
priority workflows. The graphic shows replication from a source cluster to a target
cluster.
Data Retention
Data retention is the ability to prevent data from being deleted or modified before
any future date. In OneFS, you can configure data retention at the directory level,
so that different directories can have different retention policies. You can also use
policies to automatically commit certain types of files for retention.
• Two modes of retention
• Enterprise (more flexible) - enable privileged deletes by an administrator.
• Compliance (more secure) - designed to meet SEC regulatory requirements.
Once data is committed to disk, individuals cannot change or delete the data
until the retention clock expires - OneFS prohibits clock changes.
• Compatible with SyncIQ replication.
• Requires SmartLock license.
A2000
The A2000 is an ideal solution for high-density, deep archive storage that
safeguards data efficiently for long-term retention. The A2000 stores up to 1280 TB
per chassis and scales to over 80 PB in a single cluster.
A300
An ideal active archive storage solution that combines high performance, near-
primary accessibility, value, and ease of use. The A300 provides between 120 TB
to 960 TB per chassis and scales to 60 PB in a single cluster. The A300 includes
inline compression and deduplication capabilities.
A3000
An ideal solution for high-performance, high-density, deep archive storage that
safeguards data efficiently for long-term retention. The A3000 stores up to 1280 TB
per chassis and scales to 80 PB in a single cluster. The A3000 includes inline
compression and deduplication capabilities.
H700
Provides maximum performance and value to support demanding file workloads.
The H700 provides capacity up to 960 TB per chassis. The H700 includes inline
compression and deduplication capabilities.
H7000
Provides a versatile, high-performance, high-capacity hybrid platform with up to
1280 TB per chassis. The deep-chassis H7000 is ideal for consolidating a range of
file workloads on a single platform. The H7000 includes inline compression and
deduplication capabilities.