OS Storage Management
• Increasing storage capacity might be the easiest and most popular solution to the ever-expanding quantities of
data. Just go out and buy more storage. Triple your storage every year by adding a couple of hundred terabytes (or
petabytes!). But if you don't know how to maximize your current resources, this approach is like buying a
warehouse around the corner and leaving it half empty and disorganized.
• This is where Storage Management comes into play!
• Storage Management will attempt to maximize your current resources through various techniques, software,
hardware, or processes.
What is Storage Management?
• Storage Management refers to the processes that help make data storage easier through software or techniques. It
tries to improve and maximize the efficiency of data storage resources. Storage management processes can deal
with local or external storage such as NAS, SAN, USB drives, SSDs, HDDs, the Cloud, etc.
• The goal of storage management is to improve the performance of resources, not to expand capacity.
Mass-Storage Structure
• The main mass-storage system in modern computers is secondary storage, which is usually provided by hard disk
drives (HDD) and nonvolatile memory (NVM) devices.
• Hard disk drives and nonvolatile memory devices are the major secondary storage I/O units on most
computers. Modern secondary storage is structured as large one-dimensional arrays of logical blocks.
A disk drive motor spins the platters at high speed. Most drives rotate 60 to 250 times per second, specified in terms of rotations
per minute (RPM). Common drives spin at 5,400, 7,200, 10,000, and 15,000 RPM. Some drives power down when not in
use and spin up upon receiving an I/O request. Rotation speed relates to transfer rates.
The transfer rate is the rate at which data flow between the drive and the computer. Another performance aspect, the
positioning time, or random-access time, consists of two parts: the time necessary to move the disk arm to the desired
cylinder, called the seek time, and the time necessary for the desired sector to rotate to the disk head, called the rotational
latency.
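These quantities are easy to compute. A rough sketch follows, ignoring transfer time and controller overhead: the average rotational latency is the time for half a rotation at the given RPM.

```python
# Average rotational latency is half a rotation: 0.5 / (RPM / 60) seconds.
# Average access time = seek time + rotational latency (transfer time ignored).

def avg_rotational_latency_ms(rpm: int) -> float:
    """Time for half a rotation, in milliseconds."""
    return 0.5 / (rpm / 60) * 1000

def avg_access_time_ms(avg_seek_ms: float, rpm: int) -> float:
    """Simplified average random-access time."""
    return avg_seek_ms + avg_rotational_latency_ms(rpm)

for rpm in (5400, 7200, 15000):
    print(f"{rpm} RPM: avg rotational latency = {avg_rotational_latency_ms(rpm):.2f} ms")
```

For example, a 7,200 RPM drive rotates 120 times per second, so half a rotation takes about 4.17 ms, which is why faster spindles improve random-access performance.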
• The disk head flies on an extremely thin cushion (measured in microns) of air or another gas, such as helium, and there
is a danger that the head will make contact with the disk surface. Although the disk platters are coated with a thin
protective layer, the head will sometimes damage the magnetic surface. This accident is called a head crash.
• A head crash normally cannot be repaired; the entire disk must be replaced, and the data on the disk are lost unless they
were backed up to other storage or RAID protected.
Nonvolatile Memory Devices
• Nonvolatile memory (NVM) devices are growing in importance. NVM devices are electrical rather than
mechanical.
• Flash-memory-based NVM is frequently used in a disk-drive-like container, in which case it is called a solid-
state disk (SSD). In other instances, it takes the form of a USB drive (also known as a thumb drive or flash drive) or a DRAM stick.
It is also surface-mounted onto motherboards as the main storage in devices like smartphones.
• Magnetic tape was used as an early secondary-storage medium. Although it is nonvolatile and can hold large
quantities of data, its access time is slow compared with that of main memory and drives.
Secondary Storage Connection Methods
• A secondary storage device is attached to a computer by the system bus or an I/O bus.
• Several kinds of buses are available, including advanced technology attachment (ATA), serial ATA (SATA),
eSATA, serial attached SCSI (SAS), universal serial bus (USB), and fibre channel (FC).
• The most common connection method is SATA. Because NVM devices are much faster than HDDs, the industry
created a special, fast interface for NVM devices called NVM express (NVMe).
• NVMe directly connects the device to the system PCI bus, increasing throughput and decreasing latency
compared with other connection methods.
• The data transfers on a bus are carried out by special electronic processors called controllers (or host-bus
adapters (HBA)).
• The host controller is the controller at the computer end of the bus.
• A device controller is built into each storage device. To perform a mass storage I/O operation, the computer
places a command into the host controller, typically using memory-mapped I/O ports.
Address Mapping
• Storage devices are addressed as large one-dimensional arrays of logical blocks, where the logical block is the
smallest unit of transfer. Each logical block maps to a physical sector or semiconductor page.
• By using this mapping on an HDD, we can, at least in theory, convert a logical block number into an old-style
disk address that consists of a cylinder number, a track number within that cylinder, and a sector number within
that track.
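A minimal sketch of that conversion, assuming an idealized geometry in which every track holds the same number of sectors (real drives use zoned recording, so the controller hides the true mapping):

```python
def block_to_chs(lbn: int, sectors_per_track: int, tracks_per_cylinder: int):
    """Map a logical block number to (cylinder, track, sector), assuming
    blocks are numbered sequentially through each track, then each
    cylinder. Geometry values here are illustrative."""
    sectors_per_cylinder = sectors_per_track * tracks_per_cylinder
    cylinder, remainder = divmod(lbn, sectors_per_cylinder)
    track, sector = divmod(remainder, sectors_per_track)
    return cylinder, track, sector

# e.g. with 63 sectors per track and 16 tracks per cylinder:
print(block_to_chs(100000, 63, 16))  # (99, 3, 19)
```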
HDD Scheduling
• One of the responsibilities of the operating system is to use the hardware efficiently. For HDDs, meeting this
responsibility entails minimizing access time and maximizing data transfer bandwidth.
• The device bandwidth is the total number of bytes transferred, divided by the total time between the first request
for service and the completion of the last transfer. We can improve both the access time and the bandwidth by
managing the order in which storage I/O requests are serviced.
Whenever a process needs I/O to or from the drive, it issues a system call to the operating system. The request specifies
several pieces of information:
➢ Whether this operation is input or output
➢ The open file handle indicating the file to operate on
➢ What the memory address for the transfer is
➢ The amount of data to transfer
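The four pieces of information above can be modeled as a simple record; the field names below are illustrative, not an actual operating-system API.

```python
from dataclasses import dataclass

@dataclass
class IORequest:
    # One queued storage I/O request, mirroring the list above.
    is_write: bool       # whether the operation is input or output
    file_handle: int     # open file handle indicating the file to operate on
    memory_address: int  # memory address for the transfer
    length: int          # amount of data to transfer, in bytes

# A 4 KiB read into a (hypothetical) buffer address:
req = IORequest(is_write=False, file_handle=3, memory_address=0x7F00, length=4096)
```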
SCAN Scheduling
In the SCAN algorithm, the disk arm starts at one end of the disk and moves toward the other end, servicing requests as it
reaches each cylinder, until it gets to the other end of the disk. At the other end, the direction of head movement is
reversed, and servicing continues. The head continuously scans back and forth across the disk.
The SCAN algorithm is sometimes called the elevator algorithm, since the disk arm behaves just like an elevator in a
building, first servicing all the requests going up and then reversing to service requests the other way.
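The SCAN service order can be sketched as follows. This toy version only computes the order in which a fixed queue of cylinder requests is serviced; a real scheduler also sweeps all the way to the disk edge before reversing and handles requests arriving dynamically.

```python
def scan(requests, head, direction):
    """Return the SCAN (elevator) service order for a queue of cylinder
    requests, given the current head position and initial direction
    ('up' toward higher cylinders, or 'down')."""
    up = sorted(c for c in requests if c >= head)
    down = sorted((c for c in requests if c < head), reverse=True)
    return up + down if direction == "up" else down + up

# Head at cylinder 53, moving toward higher cylinders:
print(scan([98, 183, 37, 122, 14, 124, 65, 67], 53, "up"))
```

With the head at 53 moving up, the requests at 65, 67, 98, 122, 124, and 183 are serviced on the outbound sweep, then 37 and 14 after the direction reverses.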
C-SCAN Scheduling
• Circular SCAN (C-SCAN) scheduling is a variant of SCAN designed to provide a more uniform wait time. Like
SCAN, C-SCAN moves the head from one end of the disk to the other, servicing requests along the way. When
the head reaches the other end, however, it immediately returns to the beginning of the disk without servicing any
requests on the return trip.
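C-SCAN differs from SCAN only in the wrap-around; a comparable sketch:

```python
def c_scan(requests, head):
    """Return the C-SCAN service order: service upward from the head to
    the end of the disk, then jump back to the beginning (servicing
    nothing on the return trip) and continue upward."""
    outbound = sorted(c for c in requests if c >= head)
    wrapped = sorted(c for c in requests if c < head)
    return outbound + wrapped

print(c_scan([98, 183, 37, 122, 14, 124, 65, 67], 53))
```

Note that after 183 the head returns to cylinder 0 and then services 14 and 37 in ascending order, giving every cylinder a more uniform wait than SCAN.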
NVM Scheduling
• The disk-scheduling algorithms just discussed apply to mechanical platter-based storage like HDDs. They focus
primarily on minimizing the amount of disk head movement. NVM devices do not contain moving disk heads and
commonly use a simple FCFS policy.
• Sequential access is optimal for mechanical devices like HDD and tape because the data to be read or written is
near the read/write head. Random-access I/O, which is measured in input/output operations per second (IOPS),
causes HDD disk head movement. Naturally, random access I/O is much faster on NVM. An HDD can produce
hundreds of IOPS, while an SSD can produce hundreds of thousands of IOPS.
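Those IOPS figures make the gap concrete. Using the illustrative rates from the text, servicing one million random requests differs by three orders of magnitude:

```python
def service_time_seconds(num_requests: int, iops: int) -> float:
    """Time to service a random-I/O workload at a sustained IOPS rate."""
    return num_requests / iops

# Illustrative figures only: hundreds of IOPS for an HDD,
# hundreds of thousands for an SSD.
hdd_time = service_time_seconds(1_000_000, 200)       # 5000 s (~83 minutes)
ssd_time = service_time_seconds(1_000_000, 200_000)   # 5 s
print(f"HDD: {hdd_time:.0f} s, SSD: {ssd_time:.0f} s")
```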
Error Detection and Correction
• Error detection determines if a problem has occurred: for example, a bit in DRAM spontaneously changed from a 0
to a 1, the contents of a network packet changed during transmission, or a block of data changed between when it
was written and when it was read.
• By detecting the issue, the system can halt an operation before the error is propagated, report the error to the user
or administrator, or warn of a device that might be starting to fail or has already failed.
• Parity is one form of checksum, which uses modular arithmetic to compute, store, and compare values on fixed-
length words. Another error-detection method, common in networking, is the cyclic redundancy check (CRC),
which uses a hash function to detect multiple-bit errors.
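A toy illustration of both ideas; `checksum16` here is a made-up modular checksum for demonstration, not a standard algorithm:

```python
def parity_bit(word: int) -> int:
    """Even-parity bit for an integer word: 1 if the count of 1-bits is
    odd, so that word-plus-parity always has an even number of 1-bits."""
    return bin(word).count("1") % 2

def checksum16(data: bytes) -> int:
    """Toy checksum using modular arithmetic: sum of bytes mod 2**16."""
    return sum(data) % (1 << 16)

assert parity_bit(0b1011) == 1  # three 1-bits -> odd, parity bit set
assert parity_bit(0b1001) == 0  # two 1-bits -> even
print(checksum16(b"hello"))
```

A single flipped bit changes the parity, so the error is detected; parity alone cannot say *which* bit changed, which is where error-correcting codes come in.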
• An error-correction code (ECC) not only detects the problem, but also corrects it. The correction is done by
using algorithms and extra amounts of storage. The codes vary based on how much extra storage they need and
how many errors they can correct.
• The ECC is error correcting because it contains enough information, if only a few bits of data have been
corrupted, to enable the controller to identify which bits have changed and calculate what their correct values
should be. It then reports a recoverable soft error. If too many changes occur, and the ECC cannot correct the
error, a noncorrectable hard error is signaled. The controller automatically does the ECC processing whenever a
sector or page is read or written.
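A classic example of such a code is Hamming(7,4), which protects 4 data bits with 3 parity bits and can correct any single-bit error. A compact sketch (illustrative; storage controllers use stronger codes over much larger words):

```python
def hamming74_encode(data_bits):
    """Encode 4 data bits into a 7-bit Hamming codeword.
    Positions 1, 2, 4 hold parity; 3, 5, 6, 7 hold data (1-indexed)."""
    c = [0] * 8  # index 0 unused so positions match the usual 1-indexing
    c[3], c[5], c[6], c[7] = data_bits
    for p in (1, 2, 4):
        # Parity bit p covers every position whose index has bit p set.
        c[p] = sum(c[i] for i in range(1, 8) if i & p) % 2
    return c[1:]

def hamming74_correct(codeword):
    """Return (corrected codeword, error position); position 0 = no error."""
    c = [0] + list(codeword)
    syndrome = 0
    for p in (1, 2, 4):
        if sum(c[i] for i in range(1, 8) if i & p) % 2:
            syndrome += p  # the failing checks spell out the error position
    if syndrome:
        c[syndrome] ^= 1  # flip the single erroneous bit back
    return c[1:], syndrome

cw = hamming74_encode([1, 0, 1, 1])
bad = list(cw)
bad[4] ^= 1  # corrupt position 5 (1-indexed)
fixed, pos = hamming74_correct(bad)
print(pos, fixed == cw)  # the syndrome names the flipped bit
```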
• Error detection and correction are frequently differentiators between consumer products and enterprise products.
ECC is used in some systems for DRAM error correction and data path protection, for example.
Boot Block
• For a computer to start running, when it is powered up or rebooted, it must have an initial program to run. This
initial bootstrap loader tends to be simple.
• This tiny bootstrap loader program is also smart enough to bring in a full bootstrap program from secondary
storage. The full bootstrap program is stored in the “boot blocks” at a fixed location on the device. A device that
has a boot partition is called a boot disk or system disk.
Bad Blocks
• Because disks have moving parts and small tolerances, they are prone to failure. Sometimes the failure is
complete; in this case, the disk needs to be replaced and its contents restored from backup media to the new disk.
More frequently, one or more sectors become defective. Most disks even come from the factory with bad blocks.
Depending on the disk and controller in use, these blocks are handled in a variety of ways.
• Low-level formatting also sets aside spare sectors not visible to the operating system. The controller can be told to
replace each bad sector logically with one of the spare sectors. This scheme is known as sector sparing or
forwarding.
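Sector sparing can be sketched as a remap table maintained by the controller; the class below is an illustration, not real drive firmware:

```python
class SectorSparingController:
    """Sketch of sector sparing: the controller redirects accesses to
    bad sectors onto reserved spare sectors, invisibly to the OS."""

    def __init__(self, spares):
        self.spares = list(spares)  # spare sectors hidden from the OS
        self.remap = {}             # bad logical sector -> spare sector

    def mark_bad(self, sector):
        """Replace a bad sector with the next available spare."""
        if sector not in self.remap:
            self.remap[sector] = self.spares.pop(0)

    def physical(self, sector):
        """Translate a logical sector to the sector actually accessed."""
        return self.remap.get(sector, sector)

ctl = SectorSparingController(spares=[1000, 1001])
ctl.mark_bad(57)
print(ctl.physical(57), ctl.physical(58))  # 57 is forwarded to a spare
```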
• As an alternative to sector sparing, some controllers can be instructed to replace a bad block by sector slipping.
• Recoverable soft errors may trigger a device activity in which a copy of the block data is made and the block is
spared or slipped. An unrecoverable hard error, however, results in lost data.
Swap-Space Management
• Swapping occurs when the amount of physical memory reaches a critically low point and processes
are moved from memory to swap space to free available memory.
• Swap-space management is another low-level task of the operating system. Virtual memory uses secondary
storage space as an extension of main memory. Since drive access is much slower than memory access, using
swap space significantly decreases system performance.
• The main goal for the design and implementation of swap space is to provide the best throughput for the virtual
memory system.
Swap-Space Use
• Swap space is used in various ways by different operating systems, depending on the memory-management
algorithms in use. The amount of swap space needed on a system can therefore vary from a few megabytes of disk
space to gigabytes, depending on the amount of physical memory, the amount of virtual memory it is backing, and
the way in which the virtual memory is used.
Network-Attached Storage
• Network-attached storage (NAS) provides access to storage across a network. An NAS device can be either a
special-purpose storage system or a general computer system that provides its storage to other hosts on the network.
Network-attached storage provides a convenient way for all the computers on a LAN to share a pool of storage
with the same ease of naming and access enjoyed with local host-attached storage.
Storage-Area Network
• A storage-area network (SAN) is a private network (using storage protocols rather than networking protocols)
connecting servers and storage units. The power of a SAN lies in its flexibility. Multiple hosts and multiple
storage arrays can attach to the same SAN, and storage can be dynamically allocated to hosts. The storage arrays
can be RAID protected or unprotected drives (Just a Bunch of Disks (JBOD)).
• A SAN switch allows or prohibits access between the hosts and the storage. As one example, if a host is running
low on disk space, the SAN can be configured to allocate more storage to that host. SANs make it possible for
clusters of servers to share the same storage and for storage arrays to include multiple direct host connections.
• Another SAN interconnect is InfiniBand (IB), a special-purpose bus architecture that provides hardware and
software support for high-speed interconnection networks for servers and storage units.
• Storage management techniques or software can be divided into the following four subsets:
1. Performance
2. Availability
3. Recoverability
4. Capacity
Common Storage Management Processes
(Purestorage, 2021)
• While the exact tools and techniques used to manage data storage will differ between organizations, here's a list of
common storage management processes:
• Provisioning: The process of assigning storage capacity to computers, servers, and other machines.
• Virtualization: The practice of leveraging virtual machines (VMs) to decouple software from hardware operating
environments.
• Containerization: A type of virtualization in which fully packaged and portable computing environments are
used to decouple software from their operating systems.
• Data compression: A technique used to free up drive space, close memory gaps, reduce retrieval times, and
otherwise maximize your storage capacity.
• Data migration: The process of moving data from one storage location to another.
• Data replication: From snapshots to mirroring, data replication involves storing the same data in multiple storage
locations for redundancy, resilience, and reliability.
• Disaster recovery: Tools, planning, policies, and procedures for restoring IT operations to normal in the wake of
a disaster.
• Automation: From simple scripts to DevOps, automation covers all tools and techniques related to automating
data storage management processes.
Storage Management Advantages
1. Reduce Capital and Operational Expenses:
The most significant expense when it comes to storage is maintaining and operating the infrastructure. CapEx can
be reduced because a business will not have to expand storage capacity as often. OpEx can also be reduced as ongoing
storage operations are decreased.
2. Makes Management Easier:
Storage management systems can help users save time through automated tasks, centralized consoles, or remote
logging. They can also reduce the number of IT staff needed to run the storage infrastructure. Storage management can also
make virtualized or cloud environments easier to manage from a single location.
3. Enhance Performance:
One of the main goals of storage management is to improve the performance of the existing storage resources. For
example, compressing data can dramatically reduce the amount of storage and improve file transfer speeds. Automatic
storage provisioning can reduce the time it takes to provision storage resources.
4. Speed and Flexibility:
Storage management solutions should be able to work in real-time and adapt to sudden changes in the storage resources.
For example, storage replication is a managed service that replicates stored data in real-time. Storage virtualization can
also help improve flexibility and reduce wasted storage. Virtualization can create a pool of physical storage from multiple
devices into a single logical storage device. Storage capacity can be easily reallocated as business needs change.
5. Higher Availability:
This is probably one of the biggest benefits of storage management. For example, technologies such as Replication,
Snapshot and Mirroring, Migration, and Disaster Recovery (DR) can help you achieve higher availability and reliability
of data. All these storage techniques can help back up and restore data fast, but some can also serve as primary storage.
Storage Resource Management vs. Storage Management
• SRM is a software solution that aims to optimize the speed and performance of the storage space that is used in
the Storage Area Network (SAN).
• It uses a variety of methodologies to identify underutilized storage, allocate storage capacity, transfer old data to
alternative media, and can predict storage trends and requirements. SRM also helps to manage configurations,
policies, storage media, and more.
• Although it sounds the same, Storage Resource Management (SRM) is a different concept than storage
management.
• SRM refers to the specific software, either as standalone or as a part of a software bundle. SRMs are becoming
increasingly sophisticated.
Networked Storage Management vs. Storage Management
• Networked Storage Management solutions such as the Storage Area Network (SAN) or the Network-Attached
Storage (NAS) device are also similar concepts when it comes to managing storage.
• A SAN particularly is a computer network used to improve the accessibility of storage resources.
• The elements in a SAN could be disk arrays or tapes that are networked together; they are not managed by a
server but are controlled by SAN management software.
• Most of the time, SAN and NAS solutions use proprietary software and hardware, which makes them less flexible.
• As mentioned above, SRM aims to improve the speed and efficiency of the storage used in SANs. So with an
SRM, you could manage all SANs from a single server.
Implementing Storage Management
• Storage management is a broad concept that includes techniques, software, and processes aimed at improving the
performance, availability, recoverability, and capacity of storage resources.
• The first step in implementing storage management would be to train IT personnel and storage administrators on
storage management best practices.
• There are several storage management standards bodies and organizations that are good starting points for information.
The Future of Storage Management Technology
• SRMs, NAS, SAN, and DAS (Direct Attached Storage) solutions present a fantastic set of methodologies to
improve the performance, availability, or recoverability of your data.
• But all of these have one common problem: you cannot manage all your storage resources from a single console.
You need one admin console for each piece of storage.
• The new concept of Software-Defined Storage (SDS) promises to revolutionize the way we store, manage, and
collect data. SDS decouples data storage policy-based provisioning and management from its underlying
hardware.
• Software-Defined Storage virtualizes NAS, SAN, and DAS hardware as virtual disks. In other words, SDS
introduces storage virtualization to separate the storage hardware resources from the management software.
• The best thing is that SDS can be implemented in a wide variety of appliances, from generic servers to traditional
SANs, NAS devices, etc. Since the intelligence is decoupled from the hardware, it does not need any proprietary device
to run.
• The intelligence running on SDS will provide all the methodologies shown above, such as thin provisioning,
snapshot, mirroring, backup and DR, automation, etc.
• The benefits of SDS for storage management:
➢ Manage and unify every storage resource, such as DAS and NAS, in your data center.
➢ No dependence on proprietary hardware.
➢ 100% virtualized storage with the management interface that comes with countless features.
➢ Manage all storage from a single console, take snapshots, compress, create backups and DR copies, etc.
Software-defined Storage
Block storage
• Block storage, also called block-level storage, is a type of IT storage typically used in storage area network (SAN)
environments. Data is stored in volumes, also referred to as blocks. The volumes are treated as individual hard
disks, so block devices can be mounted by guest operating systems as if they were physical disks.
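Block-level access can be sketched as seeking to `block_number * BLOCK_SIZE` on the underlying device; the 4 KiB block size below is a common but assumed value, and an in-memory buffer stands in for a real block device.

```python
import io

BLOCK_SIZE = 4096  # a common logical block size; illustrative

def read_block(device, block_number: int) -> bytes:
    """Read one logical block from a seekable block device by computing
    its byte offset from the block number."""
    device.seek(block_number * BLOCK_SIZE)
    return device.read(BLOCK_SIZE)

# Simulate a tiny 3-block device with an in-memory buffer.
disk = io.BytesIO(bytes(3 * BLOCK_SIZE))
print(len(read_block(disk, 2)))  # 4096
```

This is the same addressing model a guest operating system uses when it mounts a block volume as if it were a physical disk.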
File storage
• File storage, also called file-level or file-based storage, is a type of IT storage for storing data in a hierarchical
structure. Data is saved in files inside folders, nested within other folders, analogous to an office file cabinet. File
storage is presented to both the system storing it and the system retrieving it in the same format. Data can be
accessed using the Network File System (NFS) protocol for Unix or Linux, or the Server Message Block (SMB)
protocol for Microsoft Windows.
Object storage
• Object storage, also called object-based storage, is a type of IT storage for unstructured, non-hierarchical data
(such as email messages, documents, videos, graphics, audio files and web pages) known as objects. An object is
data bundled with the metadata that describes its contents. Each object is assigned a globally unique ID, and
objects are retrieved by applications that present the object ID to object storage.
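A toy object store makes the ID-based model concrete. Using a UUID as the object ID is an assumption for this sketch; real object stores generate IDs in various ways.

```python
import uuid

class ObjectStore:
    """Toy object store: each object is data bundled with metadata,
    stored and retrieved by a globally unique ID."""

    def __init__(self):
        self._objects = {}

    def put(self, data: bytes, metadata: dict) -> str:
        """Store an object and return its newly assigned unique ID."""
        object_id = str(uuid.uuid4())
        self._objects[object_id] = (data, metadata)
        return object_id

    def get(self, object_id: str):
        """Retrieve (data, metadata) by presenting the object ID."""
        return self._objects[object_id]

store = ObjectStore()
oid = store.put(b"video bytes...", {"content-type": "video/mp4"})
data, meta = store.get(oid)
```

Note there is no path or folder hierarchy: the flat ID namespace is exactly what distinguishes object storage from file storage.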
Storage Area Network
• A storage area network (SAN) is a dedicated high-speed computer network that connects storage devices with
servers. SANs can deliver shared pools of storage to multiple servers, on or off premises. Each server accesses the
shared storage as if it were a drive directly attached to it. A SAN moves IT storage resources off of the user
network and reorganizes them into an independent, high-performance storage network. SANs provide block-level
storage, typically via a Fibre Channel connection. They require proprietary hardware, making them more
expensive to scale than software-defined storage (SDS) or cloud-based solutions.
Storage solutions
• Storage solutions are ways to save or archive computer data in electromagnetic, optical, digital or other formats.
Data may be stored on premises, on external drives, on remote devices, on removable media, or online (in the
cloud). For large amounts of data, businesses often use storage area networks (SANs), network-attached storage
(NAS) devices, software-defined storage (SDS), or cloud-based storage.
Cloud storage
• Cloud storage is a storage-as-a-service model that remotely stores, manages, backs up and shares data over the
Internet. Digital data is stored in logical pools, while the physical storage may be distributed across multiple
servers and locations. Typically, a third-party provider owns and maintains the hardware and network connections,
reducing capital expenses for the customer. Customers may add or remove storage capacity on demand, and pay
only for the storage resources used each month. The cloud storage fee may include a per-gigabyte cost plus the
cost of managing, securing and backing up the data. Most cloud storage providers offer a tiered rate structure,
where archived information costs less than frequently accessed information.
Network-attached storage
• Network-attached storage (NAS) is a type of dedicated file storage device that provides local area network (LAN)
nodes with file-based shared storage through a standard Ethernet connection. NAS devices, which typically do not
have a keyboard or display, are configured and managed with a browser-based utility program. Each NAS device
resides on the LAN as an independent network node and has its own IP address.
Storage Hypervisor
• A storage hypervisor is a supervisory program that manages multiple pools of consolidated storage as virtual
resources. It treats all the storage hardware it manages as generic, even dissimilar and incompatible devices.
Similar to a virtual server hypervisor, a storage hypervisor may run on a specific hardware platform or be
hardware independent. It can run on a physical server, on a virtual machine, inside a hypervisor OS, or in a
storage network. Also like a virtual server hypervisor, a storage hypervisor separates the direct link between
physical and logical resources.
Ceph storage
• Ceph storage is a software-defined storage solution that distributes data across clusters of storage resources. It is a
fault-tolerant and scale-out storage system, where multiple Ceph storage nodes (servers) cooperate to present a
single storage system that can hold many petabytes (1PB = 1,000 TB = 1,000,000 GB) of data.
• Ceph is a software-defined storage (SDS) platform that unifies the storage of block, object, and file data into a
distributed computer cluster. It is a component of the OpenStack set of open-source cloud management tools.
Ceph storage clusters can run on standard x86 servers, using the Controlled Replication Under Scalable Hashing
(CRUSH) algorithm to distribute data evenly across the cluster. This allows cluster nodes to access data without
the bottlenecks common to centralized storage architectures.
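The idea of hashing data to nodes without a central lookup table can be sketched with rendezvous (highest-random-weight) hashing. Note this is only CRUSH-inspired: the real CRUSH algorithm additionally accounts for device weights and hierarchical failure domains.

```python
import hashlib

def place(object_id: str, nodes: list, replicas: int = 2) -> list:
    """Pick `replicas` nodes for an object by ranking every node on a
    hash of (object_id, node). Any client computes the same placement
    with no central directory, and it is deterministic per object."""
    def score(node):
        return hashlib.sha256(f"{object_id}:{node}".encode()).hexdigest()
    return sorted(nodes, key=score, reverse=True)[:replicas]

nodes = ["node-a", "node-b", "node-c", "node-d"]
print(place("object-42", nodes))
```

Because each object's placement is recomputed rather than looked up, there is no metadata server in the data path, which is the bottleneck-avoidance property the text describes.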
Disk-to-disk
• Disk-to-disk (or D2D) refers to the disk-to-disk method of backup storage. With D2D, a computer hard disk is
backed up to another hard disk rather than to a tape. Disk-to-disk systems are random-access storage, not linear or
sequential like tape storage. Thus, D2D can send and receive multiple concurrent data streams. Relative to tape
backup, D2D offers shorter backup windows, faster restores and quicker access. Random access D2D systems
allow more incremental backups per full backup. As a result, time-consuming full backups can be scheduled less
frequently. Unlike tape, individual files can be recovered from D2D without scanning the entire backup volume.
Storage Resource Management
• Storage resource management (SRM) is the software used to manage storage networks and devices in order to
optimize the efficiency and capacity of storage hardware. SRM usually allocates storage capacity based on
company policies. It can manage assets, storage configuration, data and media migration, events, availability and
quotas. Storage resource management can also automate data backup, data recovery and performance analysis.
Some SRM products identify storage usage, availability and performance by application, business unit or user,
enabling IT consumption tracking and chargeback.
Backup storage
• Backup storage keeps copies of data actively in use, providing redundancy in case of hardware failure or data loss.
Unlike long-term, archive or cold storage, backup storage must enable the rapid retrieval and restoration of
backup data. Backup storage devices are disk-based hardware appliances bundled with specialized software that
manages encryption, network connectivity, data deduplication and compression. A remote backup appliance
(outside of a corporate data center) helps provide business continuity and disaster recovery. Backup storage often
uses both disk-to-disk (D2D) and magnetic tape systems as storage media.
Enterprise storage
• Enterprise storage is a centralized repository for business-critical information that provides data sharing, data
management and data protection across multiple (and often dissimilar) computer systems. Whether cloud-based,
on premises, or as a hybrid cloud solution, enterprise storage can handle large volumes of data and large numbers
of users. It offers better performance, availability and scalability than traditional storage options.
Software-defined storage
• Software-defined storage (SDS) is a computer platform that creates a virtualized network of storage resources by
separating the management software from its underlying storage hardware. SDS resources may be spread across
multiple servers and pooled or shared as if they reside on one physical device. Businesses using applications that
generate large amounts of unstructured data (such as data analytics, genomics and multimedia websites) or
DevOps environments that require flexible storage provisioning for new applications may adopt SDS technology
to add storage capacity as needed. Software-defined storage is part of a larger industry trend that also includes
software-defined networking (SDN), software-defined infrastructure (SDI), and software-defined data center
(SDDC).