TSW03232ESES-SWGdefinedStorage IBM PDF
Software Defined Storage
Making Everything Easier!™
Learn to:
• Control storage costs
• Eliminate storage bottlenecks
• Use IBM GPFS to solve storage management challenges
Lawrence C. Miller, CISSP
Scott Fadden
Any hardware with storage controlled in software
Traditional storage systems have become costly performance bottlenecks for enterprises struggling with ever-growing data challenges. This book explains how software defined storage enables organizations to significantly reduce their storage costs while improving performance, reliability, and scalability with any hardware and intelligent software that performs essential storage functions.
• Increase flexibility: traditional storage systems limit your options and lock you into a rigid and unadaptable solution
• Simplify management: automated policy-driven storage management makes it easy to implement policies for information life cycle management and other storage administration tasks
• Empower global collaboration: low-latency access to data from anywhere in the world to enable innovation and increase productivity
Open the book and find:
• Software defined storage systems that meet your business needs
• Performance bottlenecks that exist in your storage infrastructure
• Features and capabilities of IBM General Parallel File System (GPFS)
• Turnkey software defined storage solutions that are ready to deploy
• Innovative IBM GPFS use cases for complex storage challenges in different industries
Go to Dummies.com® for videos, step-by-step examples, how-to articles, or to shop!
ISBN: 978-1-118-84178-5
Not for resale
These materials are © 2014 John Wiley & Sons, Inc. Any dissemination, distribution, or unauthorized use is strictly prohibited.
Software Defined Storage
Software Defined Storage For Dummies®, IBM Platform Computing Edition
Published by
John Wiley & Sons, Inc.
111 River St.
Hoboken, NJ 07030-5774
www.wiley.com
Copyright © 2014 by John Wiley & Sons, Inc., Hoboken, New Jersey
No part of this publication may be reproduced, stored in a retrieval system or transmitted in any
form or by any means, electronic, mechanical, photocopying, recording, scanning or otherwise,
except as permitted under Sections 107 or 108 of the 1976 United States Copyright Act, without the
prior written permission of the Publisher. Requests to the Publisher for permission should be
addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ
07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permissions.
Trademarks: Wiley, For Dummies, the Dummies Man logo, The Dummies Way, Dummies.com,
Making Everything Easier, and related trade dress are trademarks or registered trademarks of
John Wiley & Sons, Inc. and/or its affiliates in the United States and other countries, and may not
be used without written permission. IBM, the IBM logo, and GPFS are trademarks or registered
trademarks of International Business Machines Corporation. All other trademarks are the property
of their respective owners. John Wiley & Sons, Inc., is not associated with any product or vendor
mentioned in this book.
For general information on our other products and services, or how to create a custom For
Dummies book for your business or organization, please contact our Business Development
Department in the U.S. at 877-409-4177, contact info@dummies.biz, or visit www.wiley.com/go/custompub. For information about licensing the For Dummies brand for products or services, contact BrandedRights&Licenses@Wiley.com.
ISBN: 978-1-118-84178-5 (pbk); ISBN: 978-1-118-84314-7 (ebk)
Manufactured in the United States of America
10 9 8 7 6 5 4 3 2 1
Table of Contents
Introduction
Chapter 1: Storage 101
    Data Access and Management Challenges
    Three Important Functions of Storage
    Defining Types of Storage
    Hard Disk and SSD Technologies
    Cluster File Systems
Introduction
The rapid, accelerating growth of data, transactions, and
digitally aware devices is straining today’s IT infrastruc-
ture and operations. At the same time, storage costs are
increasing and user expectations and cost pressures are
rising. This staggering growth of data has led to the need for
high-performance streaming, data access, and collaborative
data sharing.
If you work with big data in the cloud or deal with structured
and unstructured data for analytics, you need software defined
storage. Software defined storage uses standard compute,
network, and storage hardware; all the storage functions are
done in software, such as IBM GPFS, that provides automated,
policy-driven, application-aware storage services through
orchestration of the underlying storage infrastructure in
support of an overall software defined environment.
2 Software Defined Storage For Dummies
Foolish Assumptions
It’s been said that most assumptions have outlived their
uselessness, but I’ll assume a few things nonetheless! Basically,
I assume that you know a little something about storage
technologies and storage management challenges. As such, this
book is written primarily for technical readers and decision
makers, such as storage administrators and IT managers.
Thank you for reading; I hope you enjoy the book, and please take
care of your writers. Seriously, this icon points out helpful
suggestions and useful nuggets of information.
Proceed at your own risk . . . well, okay — it’s actually nothing
that hazardous. These helpful alerts offer practical advice to
help you avoid making potentially costly mistakes.
Chapter 1
Storage 101
In This Chapter
▶ Recognizing data access and management challenges
▶ Knowing the basics of what storage does
▶ Understanding different types of storage
▶ Distinguishing between different storage technologies
▶ Looking at cluster file systems
Figure 1-1: Storage requirements are devouring CAPEX and OPEX resources. (The figure spans 1980 to 2013, with data volume climbing from gigabytes through terabytes, petabytes, and exabytes to zettabytes, each step a 1000X increase. Data doubles approximately every 2 years, while storage budgets increase only 1-5%.)
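To see why the gap in Figure 1-1 matters, here is the compounding arithmetic as a small Python sketch. It assumes the figure's 1-5% budget increase is per year and uses an illustrative 10-year horizon; neither assumption comes from the figure itself.

```python
def data_growth(years: float, doubling_period_years: float = 2.0) -> float:
    """Growth multiple for data that doubles every `doubling_period_years`."""
    return 2.0 ** (years / doubling_period_years)


def budget_growth(years: int, annual_increase: float) -> float:
    """Growth multiple for a budget that compounds by `annual_increase` each year."""
    return (1.0 + annual_increase) ** years


HORIZON_YEARS = 10  # illustrative horizon (assumption)

for rate in (0.01, 0.05):  # low and high ends of the 1-5% range, read as per-year
    data = data_growth(HORIZON_YEARS)
    budget = budget_growth(HORIZON_YEARS, rate)
    # Cost per stored byte must fall by this factor just to keep pace.
    print(f"budget +{rate:.0%}/yr: data x{data:.0f}, budget x{budget:.2f}, "
          f"cost/byte must fall {data / budget:.1f}x")
```

Over ten years the data grows 32-fold, while a budget rising 5 percent a year grows only about 1.6-fold, so the cost of storing each byte has to drop roughly 20-fold just to stand still.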
Some of this growth can be addressed with larger hard disk
drives and ever-faster networking components.
But as these technologies advance, making the data useful
becomes more difficult. Larger hard disk drives enable you to
store more data, but in many cases the hardware appliances
that utilize these drives aren’t able to keep up.
Block storage
Block-based storage stores data on a hard disk as a sequence
of bits or bytes of a fixed size or length (a block). In a block-
based storage system, a server’s operating system (OS)
connects to the hard drives. Block storage is accessible via
a number of client interfaces including
Direct-attached storage
DAS is the simplest and cheapest type of storage for computer
systems. As the name implies, DAS is directly attached to the
computer or server via a bus interface (see Figure 1-2).
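Whether the disks are direct attached or reached over a SAN, the block model described above means the device only understands numbered, fixed-size blocks; the file system or database on the server decides what each block holds. A toy Python sketch, assuming a 4,096-byte block size (real devices and file systems use various sizes); the class and function names are hypothetical:

```python
from typing import List, Tuple

BLOCK_SIZE = 4096  # bytes per block (assumed; real devices vary)


class ToyBlockDevice:
    """Toy in-memory block device: data is addressed only by block number."""

    def __init__(self, num_blocks: int, block_size: int = BLOCK_SIZE) -> None:
        self.block_size = block_size
        self.blocks: List[bytes] = [bytes(block_size) for _ in range(num_blocks)]

    def write_block(self, block_number: int, data: bytes) -> None:
        # The device has no notion of files: it stores exactly one block.
        if len(data) != self.block_size:
            raise ValueError("block writes must be exactly one block long")
        self.blocks[block_number] = data

    def read_block(self, block_number: int) -> bytes:
        return self.blocks[block_number]


def locate(byte_offset: int, block_size: int = BLOCK_SIZE) -> Tuple[int, int]:
    """Map a byte offset to (block number, offset within that block)."""
    return divmod(byte_offset, block_size)
```

For example, `locate(10_000)` returns `(2, 1808)`: byte 10,000 lives 1,808 bytes into block 2, so the server reads block 2 from the device and picks out that byte itself.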
Figure 1-2: DAS connects hard disks directly to the computer or server via a bus interface. (Computer, bus, and disk(s) with file system.)
[Figure: a Fibre Channel switch and a SAN storage system that protects and manages data.]
A SAN can be used by multiple servers. Each server has
one or more fast, dedicated storage connections to one or
more storage arrays. A SAN allows multiple computers to
share access to a set of storage controllers. This provides
great flexibility for maintaining enterprise IT infrastruc-
tures. In large organizations, SANs enable a division of
labor where the system administrators manage the com-
puters and the storage administrators manage the SAN.
On its own, a SAN can’t share data between separate LUNs or
volumes, even within the same SAN, unless a cluster file system
is used. Having multiple computers sharing
access to the same data is important to many applications
and workflows — making a cluster file system a necessary
addition to SANs used for these purposes.
File storage
File-based storage systems, such as NAS appliances, are often
referred to as “filers” and store data on a hard disk as files in a
directory structure. These devices have their own processors
and OS and are accessed by using a standard protocol over a
TCP/IP network. Common protocols include
NAS appliances are fairly common in datacenters today.
However, NAS appliances have several significant disadvan-
tages. They’re typically slower than DAS or SAN and can be
storage performance bottlenecks because all data has to
go through the NAS’s own processors. NAS appliances also
have limited scalability. When a NAS appliance fills up, you
add another, and another, and so on. This creates “islands of
storage” that are very inefficient to manage (see Figure 1-4).
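The inefficiency of islands of storage is easy to show with a deliberately simplified Python sketch: on separate appliances a new file must fit on a single box, while one scale-out pool can use free space anywhere. The capacities are hypothetical, and real scale-out file systems stripe data across nodes rather than simply summing free space.

```python
from typing import List


def fits_on_islands(free_gb: List[float], file_gb: float) -> bool:
    """Separate NAS appliances: the whole file must fit on one of them."""
    return any(free >= file_gb for free in free_gb)


def fits_in_pool(free_gb: List[float], file_gb: float) -> bool:
    """One scale-out pool: free space on every node is usable together."""
    return sum(free_gb) >= file_gb


free = [300.0, 250.0, 200.0]  # hypothetical free space on three filers, in GB
print(fits_on_islands(free, 500.0))  # no single filer has 500GB free
print(fits_in_pool(free, 500.0))     # 750GB free in total
```

The 750GB of free space is stranded across three boxes: a 500GB data set fails on the islands yet fits easily once the capacity is managed as one pool.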
Object storage
Object-based storage systems store data as objects in containers
within a flat address space, instead of the hierarchical,
directory-based file systems that are common in block- and
file-based storage systems (see Figure 1-5).
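Figure 1-5: Comparing the file system to the object-based storage system. (A file system locates data through a directory hierarchy such as c:\ with DOCS, APPS, and DOS folders, by a path like /c/apps/games/b; an object store locates it by a unique object ID such as 83568qw4590pxvbn, with metadata such as date, size, and camera.)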
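The contrast in Figure 1-5 can be sketched in Python: a hierarchical file system walks one directory per path component, while an object store finds data in a single lookup by unique ID and keeps descriptive metadata alongside it. This is a toy model; the class and function names are hypothetical, and using a random UUID as the object ID is an assumption (real object stores generate IDs in various ways).

```python
import uuid
from typing import Any, Dict, Tuple


class ToyObjectStore:
    """Flat address space: every object is found by a unique ID, not a path."""

    def __init__(self) -> None:
        self._objects: Dict[str, Tuple[bytes, Dict[str, Any]]] = {}

    def put(self, data: bytes, metadata: Dict[str, Any]) -> str:
        object_id = uuid.uuid4().hex  # unique ID; no directories involved
        self._objects[object_id] = (data, dict(metadata))
        return object_id

    def get(self, object_id: str) -> bytes:
        return self._objects[object_id][0]

    def metadata(self, object_id: str) -> Dict[str, Any]:
        return dict(self._objects[object_id][1])


def lookup_path(tree: Dict[str, Any], path: str) -> Any:
    """Hierarchical file system: walk one directory level per path component."""
    node: Any = tree
    for part in path.strip("/").split("/"):
        node = node[part]
    return node


if __name__ == "__main__":
    store = ToyObjectStore()
    oid = store.put(b"photo bytes", {"date": "2013-10-01", "camera": "example-cam"})
    print(oid, store.metadata(oid))
    tree = {"c": {"apps": {"games": {"b": b"game data"}}}}
    print(lookup_path(tree, "/c/apps/games/b"))
```

Because the object store never walks a directory tree, adding more objects does not deepen any hierarchy; the trade-off is that applications must keep track of object IDs rather than familiar paths.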
Figure 1-6: Different hard drive technologies require a tradeoff between capacity, performance, and cost. (The figure plots cost against capacity for 300GB 15,000 rpm 2.5-inch drives; 600GB and 900GB 10,000 rpm 2.5-inch drives; and 3TB and 4TB 7,200 rpm 3.5-inch drives.)
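One reason the faster-spinning drives in Figure 1-6 perform better is rotational latency: on average a drive waits half a revolution for the requested sector to come under the head. The Python below computes only that component; seek time and transfer rate also affect real drive performance.

```python
def avg_rotational_latency_ms(rpm: int) -> float:
    """On average the target sector is half a revolution away from the head."""
    ms_per_revolution = 60_000 / rpm
    return ms_per_revolution / 2


for rpm in (15_000, 10_000, 7_200):  # the spindle speeds in Figure 1-6
    print(f"{rpm:>6,} rpm: {avg_rotational_latency_ms(rpm):.2f} ms")
```

That works out to 2.00 ms, 3.00 ms, and about 4.17 ms respectively, which is part of why the high-capacity 7,200 rpm drives are cheaper per gigabyte but slower.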
Chapter 2
What Is Software Defined Storage?
NAS devices are relatively inexpensive but have limited
scalability. When you run out of space on a NAS, you simply
add more NAS devices. However, this isn’t a true scale-out
capability because each individual NAS is presented as
separate, standalone storage that’s separately managed.
Cost efficiency
Rather than using expensive proprietary hardware, software
defined storage uses standard hardware to dramatically lower
both acquisition costs and total cost of ownership (TCO) for an enterprise-class storage
solution. The software in a software defined storage solution is
standards based and manages the storage infrastructure as well
as the data in the storage system.
[Figure: measured I/O performance, IBM GPFS versus conventional NAS.]
Figure 2-2: GPFS provides a common storage plane.
Focusing on OpenStack
OpenStack has a modular architecture with various components including
✓ OpenStack Compute (Nova): A cloud computing fabric controller
✓ Block Storage (Cinder): Provides persistent block-level storage devices
✓ Object Storage (Swift): Scalable redundant storage system
With OpenStack, you can control pools of processing, storage, and networking resources throughout a datacenter. And while OpenStack provides open source versions of block and object storage, many OpenStack developers have identified a need for more robust storage to support Cloud-scale applications. While many OpenStack developers feel they can architect around limitations in OpenStack compute capabilities and robustness, storage has a much “higher bar” as far as resiliency and reliability go.
Responding to the need for robust software defined storage, the OpenStack “Havana” release includes an OpenStack Block Storage Cinder driver for IBM GPFS, giving architects who build public, private, and hybrid clouds access to the features and capabilities of the industry’s leading enterprise software-defined storage system. And the Cinder driver is just the beginning. IBM’s vision for GPFS and OpenStack is to create a single scale-out data plane for the entire data center or multiple connected data centers worldwide.
GPFS unifies OpenStack VM images, block devices, objects, and files with support for Nova, Cinder, Swift, and Glance, along with POSIX interfaces like NFS and CIFS for integrating legacy applications. The ability to use a single GPFS file system to manage volumes (Cinder), images (Glance), and shared file systems (Manila), and to use file clones to share data quickly and efficiently within and between components, will be a big advantage for Cloud-scale application developers using OpenStack.
The robustness and features of GPFS combined with OpenStack Swift object extensions could provide an enterprise-grade object store with high storage efficiency, tape integration, wide-area replication, transparent tiering, checksums, snapshots, and ACLs: capabilities most object-based storage offerings can’t match today. OpenStack on GPFS delivers compelling efficiencies in a single unified storage solution that can support object and file access to the same data with robust and efficient GPFS Native RAID data protection. OpenStack Swift object storage on GPFS can reduce the amount of raw storage you need to use compared to object storage systems that rely strictly on replication.
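The raw-capacity claim in the last paragraph comes down to simple arithmetic. The Python sketch below compares full-copy replication with a parity-based (erasure-code) layout. The 3-copy and 8+3 figures are illustrative assumptions, not settings taken from this book or from any particular Swift or GPFS Native RAID configuration.

```python
def raw_for_replication(usable_tb: float, copies: int) -> float:
    """Raw capacity when every object is stored as `copies` full copies."""
    return usable_tb * copies


def raw_for_erasure_code(usable_tb: float, data_strips: int, parity_strips: int) -> float:
    """Raw capacity when data is cut into strips and protected by parity strips."""
    return usable_tb * (data_strips + parity_strips) / data_strips


usable = 100.0  # TB of user data (illustrative)
print("3 copies:", raw_for_replication(usable, 3), "TB raw")
print("8+3 code:", raw_for_erasure_code(usable, 8, 3), "TB raw")
```

For 100TB of user data, three full copies need 300TB of raw disk, while an 8+3 layout needs 137.5TB. The 8+3 layout also survives three lost strips per stripe, whereas three-way replication survives the loss of two copies.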
Chapter 3
Digging Deeper into IBM GPFS
[Figure: a computer cluster of Linux, Windows, and AIX servers with parallel access.]
When many servers need to use the same set of files at the
same time, the file system needs to ensure that all the files
are protected, so one server can’t change a file without the
other servers knowing about the change. Keeping thousands
of servers “in the loop” on file status is difficult and scaling up
is even harder.
The server that initially holds the tokens for all files that are
not in use is called the token manager. You can assign one
or more servers to be a token manager. Multiple token man-
agers help each other out by sharing the workload and by
taking over when a fellow token manager fails. When a file is
opened, the token manager hands off the token for that file
to the server that’s opening the file. The server using the file
is now responsible for all metadata changes to that file. If a
server wants to open a file that is already open on another
server, the token manager redirects the request to the server
that already has the file open and lets the two servers work
out the details among themselves. This sharing of metadata
maintenance across the entire cluster is what makes GPFS scale
very effectively. Many other file storage technologies rely on a
single metadata server or centralized database. Such a
centralized approach quickly limits how much data can be
stored. GPFS addresses this limitation by sharing the workload.
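The token hand-off described above can be sketched in a few lines of Python. This is a toy model, not GPFS code: the class and function names are hypothetical, real GPFS tokens are finer-grained than whole files, and the hash-based split of files across several token managers is an assumption used only to illustrate how multiple managers can share the workload.

```python
import zlib
from typing import Dict, List


class TokenManager:
    """Toy token manager: tracks which server holds each open file's token."""

    def __init__(self) -> None:
        self.holders: Dict[str, str] = {}  # file path -> server holding its token

    def open_file(self, server: str, path: str) -> str:
        """Return the server responsible for the file's metadata after this open."""
        holder = self.holders.get(path)
        if holder is None:
            # File not in use: hand its token to the opening server.
            self.holders[path] = server
            return server
        # File already open elsewhere: redirect the requester to the current
        # holder; the two servers then work out the details between themselves.
        return holder


def manager_for(path: str, managers: List[str]) -> str:
    """Pick which token manager handles a file (hash partitioning is an assumption)."""
    return managers[zlib.crc32(path.encode()) % len(managers)]


if __name__ == "__main__":
    managers = {name: TokenManager() for name in ("tm1", "tm2")}
    path = "/gpfs/fs1/results.csv"  # hypothetical path
    tm = managers[manager_for(path, list(managers))]
    print(tm.open_file("node1", path))  # node1 takes the token
    print(tm.open_file("node2", path))  # node2 is redirected to node1
```

The manager only records who holds each token; once a file is open, metadata work for it happens on the holding server, which is why adding servers spreads the load instead of concentrating it on one metadata server.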
Cluster Configurations
When it comes to cluster configuration options, GPFS is a
multi-function tool. The same GPFS software is installed on
all the servers in a cluster. What a server does, and how it
participates in the cluster, is based on the hardware it has
available and what you need it to do. Cluster configuration
is independent of which file system features you require.
Cluster configuration options fall into the
three categories covered in this section.
Shared disk
A shared disk cluster is the most basic environment. In this
configuration, the storage is directly attached to all servers
in the cluster, as shown in Figure 3-2. Application data flows
over the SAN, and control information flows among the GPFS
servers in the cluster over a TCP/IP network.
Figure 3-2: SAN-attached storage. (GPFS nodes connect through a SAN to SAN storage.)
This configuration is best for small clusters (1 to 50 servers)
when all servers in the cluster need the highest performance
access to the data. For example, this configuration is good
for high-speed data access for digital media applications or a
storage infrastructure for data analytics.
Figure 3-3: Network block I/O. (NSD clients connect over a LAN to NSD servers, which connect through a SAN to SAN storage.)
GPFS provides the ability to designate separate IP interfaces
for intra-cluster communication and the public network. This
provides a more clearly defined separation of communication
traffic. An NSD server architecture is well suited to clusters
with sufficient network bandwidth between the I/O servers and
the clients. Example workloads include statistical applications
such as financial fraud detection, supply chain management,
and data mining.
GPFS multi-cluster
GPFS multi-cluster allows you to utilize the GPFS NSD protocol
to share data across clusters. With this feature, you let other
clusters access one or more of your file systems, and you
mount file systems that belong to other GPFS clusters for
which you’ve been authorized. In a multi-cluster environment,
the administrator controls which specific file systems another
GPFS cluster can access. This feature permits clusters to share
data at higher performance levels than file sharing technolo-
gies like NFS or CIFS.
Figure 3-4: Multi-cluster configuration. (Cluster A and Cluster B, with LAN and SAN connections.)
World-wide data sharing
So what can you do if your network link is really long or not
so reliable? You can use a feature in GPFS called Active File
Management (AFM). Instead of direct access to the data in the
other cluster, as in a multi-cluster configuration, AFM lets you
create a copy of the data when and where you need to access it.
At first glance, AFM may seem like any other cache. But when
you start looking at what you can do with these basic behaviors,
the options start multiplying. To understand some of the
possible options, take a look at how AFM can be used.
The isolation between the cache and target is what makes this
model scale so well. You can have as many as 1,000 read-only
caches because each cache only has to track one relationship.
And then it gets interesting! Beyond the many-to-one cache
relationships, you can also cascade caches. A target can have
a dual personality; it can be a cache and a target at the same
time. For example, say a target exists in New York. The data
originates in London and is cached in New York. The office in
Tokyo needs a copy of the same data, so the cache in Tokyo
uses the copy in New York as the target.
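The cascading example above can be modeled as a read-through cache in Python. This is a simplified sketch, not the AFM implementation: it covers only read-only caching, the names and file path are hypothetical, and it ignores the other cache modes, revalidation of stale data, and write handling that a real deployment involves.

```python
from typing import Dict, Optional


class Site:
    """Toy AFM-style site: a home for data, or a cache of one target site."""

    def __init__(self, name: str, target: Optional["Site"] = None) -> None:
        self.name = name
        self.target = target                # None means this site is the home
        self.files: Dict[str, bytes] = {}
        self.remote_fetches = 0             # reads that had to go to the target

    def read(self, path: str) -> bytes:
        if path in self.files:
            return self.files[path]         # served locally
        if self.target is None:
            raise FileNotFoundError(path)
        data = self.target.read(path)       # the target may itself be a cache
        self.remote_fetches += 1
        self.files[path] = data             # keep a copy for later readers
        return data


if __name__ == "__main__":
    london = Site("London")
    london.files["/proj/survey.dat"] = b"original data"  # hypothetical file
    new_york = Site("New York", target=london)            # cache of London
    tokyo = Site("Tokyo", target=new_york)                # cache of New York
    tokyo.read("/proj/survey.dat")   # fills New York, then Tokyo
    tokyo.read("/proj/survey.dat")   # now served from Tokyo's local copy
    print(tokyo.remote_fetches, new_york.remote_fetches)
```

Each cache knows about exactly one target, which mirrors the isolation the text credits for AFM's scalability: Tokyo never needs to know that New York's data originally came from London.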
Figure 3-5: Global namespace using AFM. (Three sites, Store1, Store2, and Store3, each give clients access to the same paths, /global/data1 through /global/data6; each site holds part of the data locally and caches the rest from the other sites.)
In this example, each site is the target for one third of the
data. The other two sites have cache relationships with the
other sites. This means that no matter which site you log into,
you see exactly the same file system structure and you have
access to all the data — it may just take a little longer to read
a file if it has not yet been copied to your site.
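The global namespace in Figure 3-5 can be sketched as a lookup table in Python: every site shows the same six paths, and the only difference between sites is whether a path is served locally or through a cache. Which site is home to which data set is a hypothetical assignment chosen for illustration.

```python
SITES = ["Store1", "Store2", "Store3"]
DATASETS = [f"/global/data{i}" for i in range(1, 7)]

# Hypothetical assignment: each site is the target (home) for one third of the data.
HOME = {path: SITES[i * len(SITES) // len(DATASETS)] for i, path in enumerate(DATASETS)}


def resolve(site: str, path: str) -> str:
    """Describe how a client logged in at `site` reaches `path`."""
    home = HOME[path]
    return "local" if home == site else f"cache of {home}"


if __name__ == "__main__":
    for site in SITES:
        # Every site sees the same directory structure.
        print(site, [resolve(site, p) for p in DATASETS])
```

A client never needs to know which row of this table applies; the path is the same everywhere, and only the first read of a remote file pays the cost of fetching it.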
For example, you can create a threshold rule that moves files
out of the high-performance pool when that pool is more than
80 percent full, thereby mitigating potential bottlenecks in the
high-performance pool. GPFS information life cycle management
capabilities and benefits include
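The threshold rule described above can be simulated in Python to show the logic. GPFS expresses such rules in its own policy language, not Python; here the 60 percent stopping point and the oldest-access-first ordering are assumptions, since the text specifies only the 80 percent trigger.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PoolFile:
    name: str
    size_gb: float
    last_access_day: int  # lower number = accessed longer ago


def files_to_migrate(files: List[PoolFile], capacity_gb: float,
                     high: float = 0.80, low: float = 0.60) -> List[str]:
    """Return names of files to move out of the fast pool.

    Nothing moves until occupancy exceeds `high`; then the least recently
    accessed files are chosen until occupancy drops to `low` or below.
    """
    used = sum(f.size_gb for f in files)
    if used <= high * capacity_gb:
        return []
    chosen: List[str] = []
    for f in sorted(files, key=lambda f: f.last_access_day):
        if used <= low * capacity_gb:
            break
        chosen.append(f.name)
        used -= f.size_gb
    return chosen


if __name__ == "__main__":
    pool = [PoolFile("a.dat", 30, 1), PoolFile("b.dat", 30, 5), PoolFile("c.dat", 25, 3)]
    print(files_to_migrate(pool, capacity_gb=100))  # pool is 85% full
```

Using a separate low-water mark keeps the rule from firing again as soon as a single new file pushes the pool back over 80 percent.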
GPFS-powered Enterprise
Solutions
GPFS is under the covers of some of the most game-changing
enterprise products today, including the SAP HANA in-memory
database appliance, IBM DB2 PureScale, and IBM SONAS
(Scale-out NAS).
server that belongs to a PureScale cluster is called a member.
Each member can simultaneously access the same database
for both read and write operations. Designed mainly for OLTP
(online transaction processing) workloads with many con-
current transactions, DB2 PureScale offers almost unlimited
scale-out in terms of number of members that it can support.
Tests with up to 128 members show near-linear scalability.
IBM SONAS
Excessive proliferation of NAS (discussed in Chapter 1)
and standalone file servers to cater to rapidly growing
enterprise storage demands has resulted in an increase in
storage system disparity and administrative and energy costs
of IT assets. Typically, more than 50 percent of user data in an
enterprise is inactive. This is because projects that once used
such data actively are no longer current, or because technologies
and the business have evolved. Manual data life cycle management
in such cases is neither scalable nor able to keep up with
demanding, I/O-intensive application workloads and their
performance needs.
Chapter 4
Getting to Know the IBM GPFS Storage Server
This chapter talks about the IBM System x GPFS Storage Server (GSS). GSS combines the performance of System x servers with GPFS software to offer a high-performance, scalable building-block approach to modern storage needs, known as software defined storage. GSS allows you to start with a configuration that meets your organization’s current needs and expand capacity and bandwidth with each additional GSS to meet your future needs.
GSS packaging
GPFS Native RAID (GNR) software is capable of running on a variety of hardware platforms, but to be a reliable solution it needs to be tested with each server and disk tray. The GPFS Native RAID software was first delivered on the IBM P775 Supercomputing platform. This hardware has a specially designed disk tray that holds 384 drives. This hardware is excellent but not practical for many customers, so IBM released the GPFS Storage Server as a means to provide this advanced storage technology on standard hardware to more customers.
Figure 4-1 shows the hardware that’s used in the GSS and the
different shipping configurations.
Hard disks don’t report some read faults and occasionally fail
to write data while claiming to have written it. These errors
are referred to as silent errors, phantom writes, dropped writes,
and off-track writes. GPFS Native RAID implements an end-to-end
checksum, calculated and appended to the data by the client, to
detect silent data corruption.
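The end-to-end idea is that the client, not the disk, computes the checksum and verifies it when the data comes back. A minimal Python sketch, using CRC-32 from the standard library as a stand-in (this book does not say which checksum GPFS Native RAID uses). It catches bytes that change after the write; detecting a dropped write, where an older but self-consistent block stays in place, needs additional information such as version numbers, which this sketch omits.

```python
import zlib


def client_write(payload: bytes) -> bytes:
    """Client side: append a 4-byte CRC-32 before handing data to storage."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def client_read(stored: bytes) -> bytes:
    """Client side: recompute the checksum and refuse data that no longer matches."""
    payload, checksum = stored[:-4], stored[-4:]
    if zlib.crc32(payload).to_bytes(4, "big") != checksum:
        raise OSError("checksum mismatch: silent data corruption detected")
    return payload


if __name__ == "__main__":
    block = client_write(b"seismic trace 42")       # hypothetical payload
    flipped = bytes([block[0] ^ 0x01]) + block[1:]  # simulate a silent bit flip on disk
    print(client_read(block))
    try:
        client_read(flipped)
    except OSError as err:
        print(err)
```

Because the check happens at the client, corruption introduced anywhere along the path (disk, controller, or network) is caught, not just errors the drive itself notices.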
With declustered RAID, you don’t have idle spare disks sitting
and waiting to be called into service. Because data is spread
over all available drives, including spares, a hot spare becomes
hot spare space. So, instead of assigning a drive to be a spare,
the RAID software just keeps space free for failure events.
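A toy Python model makes the hot-spare-space idea concrete: strips are rotated across every drive, and when one drive fails, each lost strip is rebuilt into free space on a different surviving drive, so rebuild work is spread out instead of funneled into one idle spare. The rotation scheme and least-loaded placement are simplifying assumptions, not the GPFS Native RAID algorithm, and the sketch does not model the reserved space itself or the parity math.

```python
from typing import Dict, List


def declustered_layout(num_drives: int, num_stripes: int, width: int) -> List[List[int]]:
    """Rotate each stripe's strips across all drives so every drive holds data."""
    return [[(s + k) % num_drives for k in range(width)] for s in range(num_stripes)]


def rebuild_targets(stripes: List[List[int]], failed: int, num_drives: int) -> Dict[int, int]:
    """Choose where each lost strip is rebuilt: free space on the least-loaded
    surviving drive that does not already hold a strip of that stripe."""
    load = [0] * num_drives
    targets: Dict[int, int] = {}
    for i, stripe in enumerate(stripes):
        if failed not in stripe:
            continue
        candidates = [d for d in range(num_drives) if d != failed and d not in stripe]
        target = min(candidates, key=lambda d: load[d])
        load[target] += 1
        targets[i] = target
    return targets


if __name__ == "__main__":
    stripes = declustered_layout(num_drives=10, num_stripes=20, width=4)
    targets = rebuild_targets(stripes, failed=0, num_drives=10)
    # Rebuild writes land on several drives instead of one dedicated spare.
    print(sorted(set(targets.values())))
```

With a dedicated hot spare, every rebuilt strip would be written to that single drive, making its write speed the bottleneck; spreading the writes lets all surviving drives share the work.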
Chapter 5
Software Defined Storage in the Real World
Higher Education
Higher education institutions must support research in
many fields over years of study. This research requires
high-performance computing (HPC) systems and storage
arrays. The United Kingdom’s Durham University Institute
for Computational Cosmology is one example of a research
institute with extreme computing and storage needs.
The ICC now needs no air chillers; all of the cooling requirements
are met by the iDataPlex water cooling system. As a
result, the machine room’s power usage effectiveness
(PUE) score has been reduced to around 1.2, which means
that more than 80 percent of the electricity used by the room
goes directly to the clusters themselves, rather than powering
ancillary systems.
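PUE is total facility power divided by the power delivered to IT equipment, so the share of electricity reaching the clusters is simply its reciprocal. A one-function Python check of the figure above:

```python
def it_power_share(pue: float) -> float:
    """Fraction of total facility power that reaches IT equipment (PUE = total / IT)."""
    if pue < 1.0:
        raise ValueError("PUE cannot be below 1.0")
    return 1.0 / pue


print(f"PUE 1.2 -> {it_power_share(1.2):.1%} of power reaches the clusters")
```

At a PUE of 1.2 that is about 83 percent, consistent with the “more than 80 percent” stated in the text.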
COSMA5 has also seen the ICC adopt new software for cluster
management: IBM Platform HPC, which includes IBM Platform
MPI for communication between processes and IBM Platform
LSF for workload management.
Energy
The energy industry has massive processing and storage
challenges. For example, the Oil and Gas (O&G) companies
explore vast, remote areas around the world in search of new
oil reserves. Seismic imaging and ultrasound data enable
these companies to map what is beneath the surface, both on
land and at sea.
network that spans more than 1,200 towns. The company has
more than 2.2 million industrial, commercial, and domestic
customers and supplies approximately 390 million cubic
feet of natural gas each year. SSGC also owns and operates
Pakistan’s only gas meter manufacturing facility, with annual
production capacity of more than 750,000 meters.
The IBM Smart Analytics System 7700 solution is a pre-integrated
and optimized stack of data warehouse management software,
analytics tools, storage, and IBM Power Systems servers.
Designed to deliver optimal performance and flexibility for
business analytics, the IBM solution was also fast and easy
to deploy. As a complete business-ready analytics solution, the
IBM Smart Analytics System 7700 offers simple deployment
and operation, yet provides a rich and complete stack of
technologies with the resiliency required for SSGC to analyze
data with confidence and focus on business issues rather than
on platform integration.
The IBM solution gives SSGC faster, more accurate, and more
comprehensive reporting and analytical capabilities, helping
the company to identify hot spots for leaks and pilferage,
to visualize supply-and-demand issues, and to undertake
what-if analyses to plan more efficient processes. The creation
of a data warehouse to store operational data for analysis
and reporting has given SSGC a single version of the truth,
improving the consistency and reliability of reporting. Thanks
to the data warehouse, SSGC can also now easily leverage
information from a wide variety of sources, providing a richer
view of operational performance across multiple dimensions
and a better understanding of customer behavior.
Engineering Analysis
Engineering analysis enables design engineers to analyze
the individual components of a complete system, applying
scientific principles and processes, to understand how the
system operates.
Infiniti Red Bull Racing uses a high-performance computing
(HPC) cluster, featuring IBM GPFS, to power both its design
applications and its near real-time race analytics, giving the
racing team the edge it needs to design and run the best cars
on the track.
With computational fluid dynamics (CFD), Infiniti Red Bull
Racing can perform virtual wind-tunnel testing on new car
designs as a first step in determining the impact of design
changes on a vehicle’s aerodynamics. Simulation is a critical
factor in analyzing design improvements and requires huge
amounts of processing power.
To support top performance for its design and analytics
applications, the team relies on IBM GPFS, which provides
high-speed file access to applications running on multiple
nodes of the HPC cluster.
The Infiniti Red Bull Racing team knew that as demand for the
data generated by the HPC cluster took off, managing the
workload would be essential. After initially examining open
source alternatives, the team turned to IBM Platform Computing
to make better use of its enormous compute resources using
intelligent scheduling and resource allocation.
Life Sciences
The life sciences industry encompasses numerous fields
including agriculture, biochemistry, food science, genetics,
health and medicine, medical devices and imaging, and
pharmaceuticals, among others.
The use of GPFS will make it easier for the Institute to
continue expanding the storage environment, as the file
system has practically unlimited scalability.
“By using a mix of IBM disk and tape storage, we can balance
price and performance, improving the cost-efficiency of the
solution significantly,” says Flory. “This approach allows us
to offer a high-performance solution at a competitive price,
which we believe puts us ahead of our competitors.”
JSC had to put major effort into managing its previous storage
infrastructure, which relied on expensive “hardware defined”
data protection in the form of dedicated RAID controllers.
Replacing that infrastructure with a software defined storage
environment (IBM GPFS Storage Server) allowed JSC to redirect
the abundant computing power available across the GPFS cluster
on demand, dynamically supporting RAID rebuilds at much higher
speeds and at half the cost of conventional hardware defined
storage systems.
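Why software can take over a RAID controller's job is easiest to see with single-parity reconstruction: parity is computed from the data strips, and a lost strip is rebuilt from the survivors using ordinary CPU cycles on any node. The sketch below is a simplified illustration under that assumption, not how IBM GPFS Storage Server works internally (its software RAID uses stronger erasure codes and a declustered layout); the function names are hypothetical.

```python
from functools import reduce

def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length strips."""
    return bytes(x ^ y for x, y in zip(a, b))

def compute_parity(strips: list[bytes]) -> bytes:
    """Parity strip: XOR of all data strips (tolerates one lost strip)."""
    return reduce(xor_bytes, strips)

def rebuild_strip(surviving: list[bytes], parity: bytes) -> bytes:
    """Recover the single missing data strip by XOR-ing parity with the survivors."""
    return reduce(xor_bytes, surviving, parity)

if __name__ == "__main__":
    data = [b"GPFS", b"SDS!", b"RAID"]    # three equal-length data strips
    parity = compute_parity(data)
    lost = data.pop(1)                     # simulate losing the disk holding strip 1
    recovered = rebuild_strip(data, parity)
    assert recovered == lost
    print(recovered)                       # b'SDS!'
```

Because the rebuild is plain computation, it can run on whichever cluster nodes have spare cycles, which is the property the JSC example relies on.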
Chapter 6
Ten Ways to Use Software Defined Storage
Improve Application Performance
Data-intensive applications are defined by the fact that they
need to read or write a large amount of data to get the job
done. In theory, speeding up a data-intensive application is
easy: use a faster storage infrastructure. In reality, it may
not be so easy, because many storage solutions don’t scale
efficiently. When the storage device does not scale, the only
way to improve I/O performance is to use many smaller data
containers and modify your applications to use these separate
containers concurrently. If you don’t spread a project’s data
across multiple containers, adding containers can create data
hot spots. A data hot spot occurs when a set of data that
requires a high level of I/O resides on a single NAS appliance.
GPFS eliminates hot spots by spreading data across all of the
available storage hardware.
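To see why spreading data helps, consider a toy model in which every block of a data set receives the same amount of I/O. The sketch below assumes simple round-robin placement (GPFS's real block allocation is more sophisticated) and compares the busiest device when all blocks sit on one appliance with the busiest device when blocks are striped across all devices. The block and device counts and the function names are hypothetical.

```python
from collections import Counter

def place_blocks(num_blocks: int, num_devices: int, striped: bool) -> Counter:
    """Return how many blocks land on each device.

    striped=True  -> round-robin across all devices (wide striping)
    striped=False -> every block on device 0 (a single NAS appliance)
    """
    load = Counter()
    for block in range(num_blocks):
        device = block % num_devices if striped else 0
        load[device] += 1
    return load

def hottest_device_share(load: Counter, num_blocks: int) -> float:
    """Fraction of all block I/O served by the busiest device."""
    return max(load.values()) / num_blocks

if __name__ == "__main__":
    blocks, devices = 1000, 8
    for striped in (False, True):
        load = place_blocks(blocks, devices, striped)
        share = hottest_device_share(load, blocks)
        print(f"striped={striped}: busiest device serves {share:.1%} of I/O")
```

With eight devices, striping cuts the busiest device's share from 100% to 12.5%, so no single device becomes the hot spot that limits throughput.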
with other data, metadata is spread across all available
storage, and metadata management is distributed across
the entire cluster. Also, many metadata-intensive workloads
perform much better with GPFS because of its distributed
metadata and load-balancing features. Applications that
require dynamic load balancing need a file system that has
excellent I/O performance and is highly reliable. GPFS performs
like a local file system, with the added flexibility, scalability,
and reliability of a clustered file system.
and in different locations, in order to better predict the
reservoir properties and drive new simulations.
✓ File placement
✓ File management
After a file has been created, GPFS knows much more about
it. In addition to the attributes available when a file is created,
GPFS now knows additional information, including the size of
the file, how long it’s been since someone accessed the file,
and whether or not it’s been changed. Policies that operate on
existing files are called file management policies, and they allow
you to move, replicate, or delete files. You can use file
management policies to move data from one pool to another
without changing the file’s location in the directory structure.
One popular use for file management policies doesn’t involve
moving data at all: you can use them for reporting. The policy
syntax is very powerful, allowing you to generate custom
reports, for example, on the types of files using the most space.
On Linux and AIX, you can gather this information with other
tools, but the policy engine is very fast: it can scan the
metadata of millions of files per second.
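Real GPFS policies are written as rules in GPFS's SQL-like policy language and applied with the mmapplypolicy command. To show the underlying idea without reproducing that syntax, the Python sketch below evaluates two hypothetical rules over an in-memory list of file metadata records: a management rule that reassigns files not accessed for 30 days from the `system` pool to a `nearline` pool (only the pool changes, never the path), and a report that totals space by file extension. The record fields, the `nearline` pool name, the paths, and the 30-day threshold are all assumptions for illustration.

```python
import os
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class FileRecord:
    path: str
    size: int           # bytes
    atime: datetime     # last access time
    pool: str           # storage pool currently holding the data

def apply_migration_rule(files, from_pool, to_pool, max_idle_days, now):
    """Move cold files between pools; the directory path is left untouched."""
    moved = []
    cutoff = now - timedelta(days=max_idle_days)
    for f in files:
        if f.pool == from_pool and f.atime < cutoff:
            f.pool = to_pool            # only the placement changes, not f.path
            moved.append(f.path)
    return moved

def space_by_extension(files):
    """Report total bytes per file extension, largest first."""
    totals = defaultdict(int)
    for f in files:
        ext = os.path.splitext(f.path)[1] or "(none)"
        totals[ext] += f.size
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)

if __name__ == "__main__":
    now = datetime(2014, 1, 31)
    files = [  # hypothetical paths and sizes
        FileRecord("/gpfs/proj/run1.dat", 5_000, now - timedelta(days=90), "system"),
        FileRecord("/gpfs/proj/run2.dat", 3_000, now - timedelta(days=2), "system"),
        FileRecord("/gpfs/proj/notes.txt", 200, now - timedelta(days=45), "system"),
    ]
    print(apply_migration_rule(files, "system", "nearline", 30, now))
    print(space_by_extension(files))
```

The real policy engine scans file system metadata directly rather than a Python list, which is where its speed comes from; the sketch only mirrors the selection logic.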
For additional protection, you can have GPFS create two or three
synchronous copies of the file system metadata, the file data,
or both. When data is replicated, any copy can serve reads in
an active-active mode. You can even tell GPFS which copy to
read from, in order to keep read access local when you are
replicating across sites. This helps read-intensive workloads
and keeps certain traffic off the WAN. When both copies are
located in the same data center, for example, reads come from
both copies, potentially doubling your read bandwidth.
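The read behavior described above, preferring a local replica when copies span sites and otherwise spreading reads across all copies, can be modeled in a few lines. This is a hypothetical sketch of the behavior, not GPFS's internal algorithm or its configuration interface; the `ReplicaReader` class, replica IDs, and site names are assumptions.

```python
from itertools import cycle

class ReplicaReader:
    """Choose which synchronous replica serves each read (hypothetical model)."""

    def __init__(self, replicas, local_site=None):
        # replicas: list of (replica_id, site) pairs, e.g. two or three copies
        local = [r for r, site in replicas if site == local_site]
        # Round-robin over local replicas if any exist, otherwise over all replicas.
        self._next = cycle(local or [r for r, _ in replicas])

    def pick(self):
        return next(self._next)

if __name__ == "__main__":
    # Same data center: both copies share the read load.
    same_dc = ReplicaReader([("copy-A", "dc1"), ("copy-B", "dc1")], local_site="dc1")
    print([same_dc.pick() for _ in range(4)])

    # Replicated across sites: reads stay on the local copy and off the WAN.
    cross_site = ReplicaReader([("copy-A", "siteX"), ("copy-B", "siteY")], local_site="siteX")
    print([cross_site.pick() for _ in range(4)])
```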
HDFS and GPFS both provide the basic storage tools needed
for MapReduce workloads, but that is where the similarities
end. HDFS is a basic storage solution for MapReduce,
whereas GPFS is an enterprise storage software solution that
supports MapReduce. Some limitations of HDFS include
✓ Block-level information not exposed to applications
✓ Ability to open, read, and append to any section of a file
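Opening a file and modifying any section of it in place is a POSIX capability; HDFS, by contrast, follows a write-once model in which existing files can only be appended to. The sketch below uses only Python's standard `os.pread` and `os.pwrite` calls (available on Unix-like systems) on a temporary local file to show an in-place update at an arbitrary offset. The helper names are hypothetical, and the temporary file merely stands in for a file on a POSIX file system such as GPFS.

```python
import os
import tempfile

def overwrite_at(path: str, offset: int, data: bytes) -> None:
    """Write `data` at an arbitrary byte offset of an existing file (in-place update)."""
    fd = os.open(path, os.O_RDWR)
    try:
        os.pwrite(fd, data, offset)
    finally:
        os.close(fd)

def read_at(path: str, offset: int, length: int) -> bytes:
    """Read `length` bytes starting at `offset` without reading the whole file."""
    fd = os.open(path, os.O_RDONLY)
    try:
        return os.pread(fd, length, offset)
    finally:
        os.close(fd)

if __name__ == "__main__":
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"record-000|record-001|record-002")
        path = tmp.name
    overwrite_at(path, 11, b"RECORD-001")   # patch the middle record in place
    print(read_at(path, 0, 32))             # b'record-000|RECORD-001|record-002'
    os.remove(path)
```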
Cloud Storage
Cloud storage provides a scalable, virtualized infrastructure as
a service, hiding the complexity of fine-grained resource
management from the end user. According to IDC (International
Data Corporation), the amount of information in the world
is set to grow 44-fold in the next decade, with much of that
increase coming from the rise in cloud computing.
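A 44-fold increase over ten years implies a compound annual growth rate of 44^(1/10) - 1, or roughly 46 percent per year. The short calculation below makes that arithmetic explicit. It assumes steady compound growth, which the IDC projection does not necessarily claim, and the function name is illustrative.

```python
def implied_annual_growth(factor: float, years: int) -> float:
    """Compound annual growth rate that multiplies a quantity by `factor` in `years`."""
    return factor ** (1 / years) - 1

rate = implied_annual_growth(44, 10)
print(f"{rate:.1%} per year")   # 46.0% per year
```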
Enterprise Analytics
For enterprise analytics, such as detecting credit card fraud or
generating customer-specific marketing offers, scalability,
reliability, and ready access to data are critical. Analytics is a
data-driven application that delivers measurable business
value.