
INFORMATION STORAGE MANAGEMENT UNIT-1

1.1 Introduction to Information Storage


1.1.1 Data
Data is a collection of raw facts from which conclusions might be drawn.
Handwritten letters, a printed book, a family photograph, printed and duly signed
copies of mortgage papers, a bank’s ledgers, and an airline ticket are all examples that
contain data.
Before the advent of computers, the methods for creating and sharing data were limited to a few forms, such as paper and film. Today, the same data can be converted into more convenient forms, such as an e-mail message, an e-book, a digital image, or a digital movie. This data can be generated using a computer and stored as strings of binary numbers (0s and 1s).

The following is a list of some of the factors that have contributed to the
growth of digital data:
 Increase in data-processing capabilities
 Lower cost of digital storage
 Affordable and faster communication technology
 Proliferation of applications and smart devices

1.1.2 Data Explosion


Inexpensive and easier ways to create, collect, and store all types of data, coupled with increasing individual and business needs, have led to accelerated data growth, popularly termed the data explosion.

1.1.3 Types of Data


Data can be classified as structured or unstructured based on how it is stored
and managed.
Structured data is organized in rows and columns in a rigidly defined format
so that applications can retrieve and process it efficiently. Structured data is typically
stored using a database management system (DBMS).

Data is unstructured if its elements cannot be stored in rows and columns, which makes it difficult for applications to query and retrieve. For example, customer contacts may be stored in various forms such as sticky notes, e-mail messages, business cards, or even digital format files, such as .doc, .txt, and .pdf.
A vast majority of new data being created today is unstructured. The industry is
challenged with new architectures, technologies, techniques, and skills to store,
manage, analyze, and derive value from unstructured data from numerous sources.
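To make the distinction concrete, here is a minimal sketch in Python (hypothetical field names and values, not from the text): the same customer contact held as structured data that can be queried directly, and as unstructured free text that must be parsed first.

import re

# Structured: fixed columns, easy to query (hypothetical schema).
structured_contact = {"name": "A. Rao", "phone": "555-0100", "city": "Pune"}

# Unstructured: the same facts buried in free text on a sticky note or e-mail.
unstructured_contact = "Call A. Rao (Pune office) tomorrow, number is 555-0100."

# A structured lookup is direct...
print(structured_contact["phone"])
# ...while unstructured data needs parsing or search to extract the same value.
print(re.search(r"\d{3}-\d{4}", unstructured_contact).group())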

1.1.4 Big Data


Big data is a new and evolving concept, which refers to data sets whose sizes
are beyond the capability of commonly used software tools to capture, store, manage,
and process within acceptable time limits.
The big data ecosystem consists of the following:
1. Devices that collect data from multiple locations and also generate new
data about this data (metadata).
2. Data collectors who gather data from devices and users.
3. Data aggregators that compile the collected data to extract meaningful
information.
4. Data users and buyers who benefit from the information collected and
aggregated by others in the data value chain.

1.1.5 Information
Information is the intelligence and knowledge derived from data.
Businesses analyze raw data in order to identify meaningful trends. On the
basis of these trends, a company can plan or modify its strategy. For example, a
retailer identifies customers’ preferred products and brand names by analyzing their
purchase patterns and maintaining an inventory of those products.

Effective data analysis not only extends its benefits to existing businesses, but also creates the potential for new business opportunities by using the information in creative ways. A job portal is an example. In order to reach a wider set of prospective employers, job seekers post their résumés on various websites offering job search facilities.
These websites collect the résumés and post them on centrally accessible
locations for prospective employers. In addition, companies post available positions
on job search sites. Job-matching software matches keywords from résumés to
keywords in job postings. In this manner, the job search engine uses data and turns it
into information for employers and job seekers.

1.1.6 Storage
Data created by individuals or businesses must be stored so that it is easily
accessible for further processing. In a computing environment, devices designed for
storing data are termed storage devices or simply storage.
The type of storage used varies based on the type of data and the rate at which
it is created and used. Devices, such as a media card in a cell phone or digital camera,
DVDs, CD-ROMs, and disk drives in personal computers are examples of storage
devices.
Businesses have several options available for storing data, including internal
hard disks, external disk arrays, and tapes.

1.1.7 Evolution of Storage Architecture


Organizations had centralized computers (mainframes) and information
storage devices (tape reels and disk packs) in their data center. In earlier
implementations of open systems, the storage was typically internal to the server.
These storage devices could not be shared with any other servers.
This approach is referred to as server-centric storage architecture. In this architecture, each server has a limited number of storage devices, and any administrative task, such as maintenance of the server or increasing storage capacity, might result in unavailability of information.
The proliferation of departmental servers in an enterprise resulted in unprotected, unmanaged, fragmented islands of information and increased capital and operating expenses.
To overcome these challenges, storage evolved from server-centric to
information-centric architecture. In this architecture, storage devices are managed
centrally and independent of servers. These centrally-managed storage devices are
shared with multiple servers.

1.2 Data Center Infrastructure


The data center infrastructure includes hardware components, such as
computers, storage systems, network devices, and power backups; and software
components, such as applications, operating systems, and management software.

1.2.1 Core Elements of a Data Center


Five core elements are essential for the functionality of a data center:
1. Application: A computer program that provides the logic for computing operations
2. Database management system (DBMS): Provides a structured way to store data
in logically organized tables that are interrelated
3. Host or compute: A computing platform (hardware, firmware, and software) that
runs applications and databases
4. Network: A data path that facilitates communication among various networked
devices
5. Storage: A device that stores data persistently for subsequent use

Figure 1-5 shows an example of an online order transaction system that involves the
five core elements of a data center and illustrates their functionality in a business
process.

1.2.2 Key Characteristics of a Data Center


1. Availability: A data center should ensure the availability of information when
required. Unavailability of information could cost millions of dollars per hour to
businesses, such as financial services, telecommunications, and e-commerce.
2. Security: Data centers must establish policies, procedures, and core element
integration to prevent unauthorized access to information.
3. Scalability: Business growth often requires deploying more servers, new
applications, and additional databases. Data center resources should scale based on
requirements, without interrupting business operations.
4. Performance: All the elements of the data center should provide optimal
performance based on the required service levels.
5. Data integrity: Data integrity refers to mechanisms, such as error correction codes
or parity bits, which ensure that data is stored and retrieved exactly as it was received.
6. Capacity: Data center operations require adequate resources to store and process large amounts of data efficiently. When capacity requirements increase, the data center must provide additional capacity without interrupting availability, or with minimal disruption. Capacity may be managed by reallocating the existing resources or by adding new resources.
7. Manageability: A data center should provide easy and integrated management of
all its elements. Manageability can be achieved through automation and reduction of
human intervention in common tasks.

1.2.3 Managing a Data Center


Managing a data center involves many tasks. The key management activities
include the following:
1. Monitoring: It is a continuous process of gathering information on various
elements and services running in a data center. The aspects of a data center that are
monitored include security, performance, availability, and capacity.
2. Reporting: Reporting is done periodically on resource performance, capacity, and utilization. Reporting tasks help to establish business justifications and chargeback of costs associated with data center operations.
3. Provisioning: It is a process of providing the hardware, software, and other resources required to run a data center. Provisioning activities primarily include resource management to meet capacity, availability, performance, and security requirements.

1.3 Virtualization and Cloud Computing


1.3.1 Virtualization
Virtualization is a technique of abstracting physical resources, such as
compute, storage, and network, and making them appear as logical resources.
Virtualization has existed in the IT industry for several years and in different
forms. Common examples of virtualization are virtual memory used on compute
systems and partitioning of raw disks.
Virtualization enables pooling of physical resources and provides an aggregated view of their capabilities. For example, storage virtualization enables multiple pooled storage devices to appear as a single large storage entity.

1.3.2 Cloud computing


Cloud computing enables individuals or businesses to use IT resources as a
service over the network. It provides highly scalable and flexible computing that
enables provisioning of resources on demand.
Users can scale their demand for computing resources, including storage capacity, up or down with minimal management effort or service provider interaction.
Cloud computing enables consumption-based metering; therefore, consumers
pay only for the resources they use, such as CPU hours used, amount of data
transferred, and gigabytes of data stored.
Cloud infrastructure is usually built upon virtualized data centers, which
provide resource pooling and rapid provisioning of resources. Information storage in
virtualized and cloud environments is detailed later in the book.

1.4 Disk Drive Components


The key components of a hard disk drive are the platter, spindle, read-write head, actuator arm assembly, and controller board.

I/O operations in an HDD are performed by rapidly moving the arm across the rotating flat platters coated with magnetic particles. Data is transferred between the disk controller and the magnetic platters through the read-write (R/W) head, which is attached to the arm. Data can be recorded and erased on magnetic platters any number of times.

Platter
A typical HDD consists of one or more flat circular disks called platters.
The data is recorded on these platters in binary codes (0s and 1s).
A platter is a rigid, round disk coated with magnetic material on both
surfaces (top and bottom).
Data can be written to or read from both surfaces of the platter. The number of
platters and the storage capacity of each platter determine the total capacity of the
drive.

Spindle
A spindle connects all the platters and is connected to a motor.
The spindle motor rotates at a constant speed.
Common spindle speeds are 5,400 rpm, 7,200 rpm, 10,000 rpm, and 15,000 rpm.

Read/Write Head
Read/Write (R/W) heads, as shown in Figure 2-7, read and write data from or
to platters.
Drives have two R/W heads per platter, one for each surface of the platter. The
R/W head changes the magnetic polarization on the surface of the platter when writing
data.
While reading data, the head detects the magnetic polarization on the surface of
the platter. During reads and writes, the R/W head senses the magnetic polarization
and never touches the surface of the platter.
When the spindle is rotating, there is a microscopic air gap maintained between
the R/W heads and the platters, known as the head flying height. This air gap is
removed when the spindle stops rotating and the R/W head rests on a special area on
the platter near the spindle. This area is called the landing zone.
The landing zone is coated with a lubricant to reduce friction between the head
and the platter. If the drive malfunctions and the R/W head accidentally touches the
surface of the platter outside the landing zone, a head crash occurs. In a head crash,
the magnetic coating on the platter is scratched and may cause damage to the R/W
head. A head crash generally results in data loss.

Actuator Arm Assembly


R/W heads are mounted on the actuator arm assembly, which positions the
R/W head at the location on the platter where the data needs to be written or read
(refer to Figure 2-7).
The R/W heads for all platters on a drive are attached to one actuator arm
assembly and move across the platters simultaneously.
INFORMATION STORAGE MANAGEMENT UNIT-1

Drive Controller Board


The controller is a printed circuit board, mounted at the bottom of a disk drive.
It consists of a microprocessor, internal memory, circuitry, and firmware.
The firmware controls the power to the spindle motor and the speed of the
motor. It also manages the communication between the drive and the host.
In addition, it controls the R/W operations by moving the actuator arm and
switching between different R/W heads, and performs the optimization of data access.

Physical Disk Structure

Data on the disk is recorded on tracks. The tracks are numbered, starting from zero, from the outer edge of the platter.
The number of tracks per inch (TPI) on the platter (or the track density)
measures how tightly the tracks are packed on a platter. Each track is divided into
smaller units called sectors.
A sector is the smallest, individually addressable unit of storage.

A cylinder is a set of identical tracks on both surfaces of each drive platter.


The location of R/W heads is referred to by the cylinder number, not by the
track number.
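As a worked example, the addressable capacity of a drive follows directly from its geometry: cylinders × heads × sectors per track × bytes per sector. The numbers below are assumed purely for illustration and ignore zoned bit recording, which varies sectors per track by zone.

# Hypothetical geometry, not taken from any specific drive.
cylinders = 10_000          # identical track positions across all platter surfaces
heads = 8                   # one R/W head per platter surface
sectors_per_track = 500     # simplified: ignores zoned bit recording
bytes_per_sector = 512

capacity_bytes = cylinders * heads * sectors_per_track * bytes_per_sector
print(f"{capacity_bytes / 10**9:.1f} GB")   # 20.5 GB for these assumed numbers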

Zoned Bit Recording


Platters are made of concentric tracks; the outer tracks can hold more data than
the inner tracks because the outer tracks are physically longer than the inner tracks.
Zoned bit recording uses the disk more efficiently. As shown in Figure 2-9 (b), this mechanism groups tracks into zones based on their distance from the center of the disk. The zones are numbered, with the outermost zone being zone 0.

1.5 Disk Drive Performance


A disk drive is an electromechanical device that governs the overall
performance of the storage system environment.
Disk Service Time
Disk service time is the time taken by a disk to complete an I/O request.
Components that contribute to the service time on a disk drive are seek time,
rotational latency, and data transfer rate.

Seek Time
The seek time (also called access time) describes the time taken to position the
R/W heads across the platter with a radial movement (moving along the radius of the
platter). In other words, it is the time taken to position and settle the arm and the head
over the correct track.
The seek time of a disk is typically specified by the drive manufacturer. The
average seek time on a modern disk is typically in the range of 3 to 15 milliseconds.

Rotational Latency
To access data, the actuator arm moves the R/W head over the platter to a
particular track while the platter spins to position the requested sector under the
R/W head.

The time taken by the platter to rotate and position the data under the R/W head
is called rotational latency.

Data Transfer Rate


The data transfer rate (also called transfer rate) refers to the average amount of data per unit time that the drive can deliver to the HBA.
In a read operation, the data first moves from the disk platters to the R/W heads; then it moves to the drive’s internal buffer. Finally, data moves from the buffer through the interface to the host HBA.
In a write operation, the data moves from the HBA to the internal buffer of the
disk drive through the drive’s interface. The data then moves from the buffer to the
R/W heads. Finally, it moves from the R/W heads to the platters.

Internal transfer rate is the speed at which data moves from a platter’s
surface to the internal buffer (cache) of the disk. The internal transfer rate takes into
account factors such as the seek time and rotational latency.
External transfer rate is the rate at which data can move through the interface to the HBA. The external transfer rate is generally the advertised speed of the interface, such as 133 MB/s.
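A short worked example, using assumed drive specifications, of how the three components add up to the disk service time for a single I/O:

# Assumed specifications (illustrative only).
avg_seek_ms = 5.0                 # from the drive data sheet
rpm = 15_000
transfer_rate_mb_s = 40.0         # internal transfer rate
io_size_kb = 64

# Average rotational latency is the time for half a revolution.
rotational_latency_ms = 0.5 / (rpm / 60) * 1000                      # = 2.0 ms
transfer_time_ms = io_size_kb / 1024 / transfer_rate_mb_s * 1000     # ≈ 1.6 ms

service_time_ms = avg_seek_ms + rotational_latency_ms + transfer_time_ms
print(f"Disk service time ≈ {service_time_ms:.1f} ms")               # ≈ 8.6 ms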

Disk I/O Controller Utilization


Utilization of a disk I/O controller has a significant impact on the I/O response time. I/O requests arrive at the controller at the rate generated by the application. This rate is also called the arrival rate. These requests are held in the I/O queue, and the I/O controller processes them one by one.
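The effect of controller utilization on response time can be sketched with the usual single-queue approximation (a simplification, not taken from the text above): average response time ≈ service time / (1 − utilization), so response time grows sharply as utilization approaches 100%.

# Assumed values, single-queue (M/M/1-style) approximation.
service_time_ms = 0.5                     # time the controller needs per I/O
for arrival_rate_iops in (500, 1000, 1500, 1900):
    utilization = arrival_rate_iops * (service_time_ms / 1000)   # fraction of time busy
    response_ms = service_time_ms / (1 - utilization)            # service + queuing delay
    print(f"{arrival_rate_iops} IOPS -> U={utilization:.0%}, response ≈ {response_ms:.1f} ms")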

1.6 Introduction to Flash Drives


Flash drives, also referred to as solid state drives (SSDs), are a new generation of drives that deliver the ultra-high performance required by performance-sensitive applications.
Flash drives use semiconductor-based solid state memory (flash memory) to
store and retrieve data. Unlike conventional mechanical disk drives, flash drives
contain no moving parts; therefore, they do not have seek and rotational latencies.
Flash drives deliver a high number of IOPS with very low response times.

1.6.1 Components and Architecture of Flash Drives


The key components of a flash drive are the controller, I/O interface, mass storage (a collection of memory chips), and cache. The controller manages the functioning of the drive, and the I/O interface provides power and data access. Mass storage is an array of nonvolatile NAND (negated AND) memory chips used for storing data.
Cache serves as a temporary space or buffer for data transaction and
operations. Generally, the larger the number of flash memory chips and channels, the
higher the drive’s internal bandwidth, and ultimately the higher the drive’s
performance.
Flash drives typically have eight to 24 channels. Memory chips in flash drives
are logically organized in blocks and pages.
A page is the smallest object that can be read or written on a flash drive. Pages
are grouped together into blocks.

1.7 RAID
RAID, or Redundant Array of Independent Disks, is a technology that connects multiple secondary storage devices and uses them as a single storage medium.
RAID consists of an array of disks in which multiple disks are connected together to achieve different goals. RAID levels define the use of the disk arrays.

1.7.1 RAID Implementation Methods


1. Software RAID
Software RAID uses host-based software to provide RAID functions. It is
implemented at the operating-system level and does not use a dedicated hardware
controller to manage the RAID array.
Software RAID implementations offer cost and simplicity benefits when
compared with hardware RAID. However, they have the following limitations:
Performance: Software RAID affects overall system performance. This is due to the additional CPU cycles required to perform RAID calculations.
Supported features: Software RAID does not support all RAID levels.
Operating system compatibility: Software RAID is tied to the host operating
system; hence, upgrades to software RAID or to the operating system should be
validated for compatibility. This leads to inflexibility in the data-processing
environment.

2. Hardware RAID
In hardware RAID implementations, a specialized hardware controller is
implemented either on the host or on the array.
Controller card RAID is a host-based hardware RAID implementation in which
a specialized RAID controller is installed in the host, and disk drives are connected
to it. Manufacturers also integrate RAID controllers on motherboards.
A host-based RAID controller is not an efficient solution in a data center
environment with a large number of hosts.
The key functions of the RAID controllers are as follows:
 Management and control of disk aggregations
 Translation of I/O requests between logical disks and physical disks
 Data regeneration in the event of disk failures

1.7.2 RAID Array Components


A RAID array is an enclosure that contains a number of disk drives and the supporting hardware to implement RAID. A subset of disks within a RAID array can be grouped to form logical associations called logical arrays, also known as a RAID set or a RAID group.

1.7.3 RAID Techniques


RAID techniques — striping, mirroring, and parity — form the basis for
defining various RAID levels. These techniques determine the data availability and
performance characteristics of a RAID set.

Striping
Striping is one of the fundamental techniques used in RAID (Redundant Array
of Independent Disks) to improve data storage performance. It involves dividing data
into smaller chunks (often referred to as blocks or stripes) and writing these chunks
sequentially across multiple hard drives in the RAID array. Each chunk of data is
typically of the same size, and the data is distributed across the drives in a round-robin
fashion.
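A minimal sketch of round-robin striping (a hypothetical helper, not any specific RAID implementation):

def stripe(data: bytes, num_disks: int, chunk_size: int):
    """Split data into fixed-size chunks and assign them to disks round-robin."""
    disks = [[] for _ in range(num_disks)]
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    for index, chunk in enumerate(chunks):
        disks[index % num_disks].append(chunk)   # chunk 0 -> disk 0, chunk 1 -> disk 1, ...
    return disks

# Example: 8 chunks of 4 bytes spread across 4 disks, 2 chunks per disk.
print(stripe(b"ABCDEFGHIJKLMNOPQRSTUVWXYZ012345", 4, 4))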

Mirroring
Mirroring is one of the core techniques used in RAID (Redundant Array of
Independent Disks) to enhance data redundancy and fault tolerance. It involves
duplicating data across multiple hard drives in the RAID array.

Parity
Parity is a key concept in RAID (Redundant Array of Independent Disks) that
is used to provide fault tolerance and data recovery capabilities in certain RAID
levels. Parity is a calculated value that is used to reconstruct data in the event of a
drive failure.
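A minimal sketch of parity-based reconstruction: the parity chunk is the XOR of the data chunks, so any single missing chunk can be recomputed by XOR-ing the surviving chunks with the parity (illustrative only; real arrays operate on fixed-size blocks and stripes).

from functools import reduce

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

d0, d1, d2 = b"\x10\x20", b"\x0f\x0f", b"\xaa\x55"     # data chunks on three disks
parity = reduce(xor_bytes, (d0, d1, d2))               # stored on the parity disk

# The disk holding d1 fails: rebuild it from the survivors plus parity.
rebuilt_d1 = reduce(xor_bytes, (d0, d2, parity))
assert rebuilt_d1 == d1
print(rebuilt_d1)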

1.7.4 RAID Levels


1. RAID-0 (Striping)
2. RAID-1 (Mirroring)
3. RAID-2 (Bit-Level Striping with Dedicated Parity)
4. RAID-3 (Byte-Level Striping with Dedicated Parity)
5. RAID-4 (Block-Level Striping with Dedicated Parity)
6. RAID-5 (Block-Level Striping with Distributed Parity)
7. RAID-6 (Block-Level Striping with Double Parity)

RAID 0
o RAID level 0 provides data striping, i.e., data is spread across multiple disks. Because it relies on striping alone, if one disk fails, all data in the array is lost.
o This level doesn't provide fault tolerance but increases system performance.

Advantages
1. It is easy to implement.
2. It utilizes the storage capacity in a better way.
Disadvantages
1. A single drive loss can result in the complete failure of the system.
2. Not a good choice for a critical system.

RAID 1
This level is called mirroring of data, as it copies the data from drive 1 to drive 2. It provides 100% redundancy in case of a failure.
Only half the drive space is used to store the data. The other half of the drive is just a mirror of the already stored data.

Advantages of RAID 1:
The main advantage of RAID 1 is fault tolerance. In this level, if one disk
fails, then the other automatically takes over.
In this level, the array will function even if any one of the drives fails.
Disadvantages of RAID 1:
In this level, one extra drive is required per drive for mirroring, so the expense
is higher.

RAID 2
RAID 2 records an Error Correction Code (ECC), based on Hamming codes, for its data, striped across different disks.
Like level 0, each data bit in a word is recorded on a separate disk, and the ECC codes of the data words are stored on a different set of disks.

Due to its complex structure and high cost, RAID 2 is not commercially
available.

Advantages of RAID 2:
This level uses one designated drive to store parity.
It uses the Hamming code for error detection.
Disadvantages of RAID 2:
It requires an additional drive for error detection.

RAID 3
RAID 3 consists of byte-level striping with dedicated parity. In this level,
the parity information is stored for each disk section and written to a dedicated
parity drive.
In case of drive failure, the parity drive is accessed, and data is
reconstructed from the remaining devices. Once the failed drive is replaced, the
missing data can be restored on the new drive.
In this level, data can be transferred in bulk. Thus high-speed data
transmission is possible.

Advantages of RAID 3:
In this level, data is regenerated using parity drive.
It contains high data transfer rates.
In this level, data is accessed in parallel.
Disadvantages of RAID 3:
It requires an additional drive for parity.
It gives slow performance when operating on small files.

RAID 4
RAID 4 consists of block-level striping with a dedicated parity disk. Instead of duplicating data, RAID 4 adopts a parity-based approach.
This level allows recovery from at most one disk failure, due to the way parity works. In this level, if more than one disk fails, then there is no way to recover the data.
Both level 3 and level 4 require at least three disks to implement RAID.

Advantages
It helps in reconstructing the data if at most one disk is lost.
Disadvantages
It cannot help in reconstructing the data when more than one disk is lost.

RAID 5
RAID 5 is a slight modification of the RAID 4 system. The only difference is that in RAID 5, the parity rotates among the drives.
It consists of block-level striping with distributed parity.
As with RAID 4, this level allows recovery from at most one disk failure. If more than one disk fails, then there is no way to recover the data.
This level was introduced to improve random write performance.

Advantages of RAID 5:
This level is cost effective and provides high performance.
In this level, parity is distributed across the disks in an array.
It is used to make the random write performance better.
Disadvantages of RAID 5:

In this level, disk failure recovery takes longer, as parity has to be calculated from all the available drives.
This level cannot survive concurrent drive failures.

RAID 6
This level is an extension of RAID 5. It contains block-level striping with two parity segments (double parity).
RAID 6 helps when there is more than one disk failure: a pair of independent parities is generated and stored on multiple disks at this level. You need at least four disk drives for this level.
There are also hybrid RAIDs, which make use of more than one RAID level nested one after the other, to fulfill specific requirements.

Advantages
Very high data accessibility.
Fast read data transactions.
Disadvantages
Due to double parity, it has slow write data transactions.
Extra space is required.
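As a rough comparison, the usable capacity of the common levels can be computed from the number of drives and the redundancy overhead (assumed drive count and size; standard formulas, not from the text above):

# Assumed: 6 drives of 2 TB each.
n, drive_tb = 6, 2

usable = {
    "RAID 0": n * drive_tb,            # striping only, no redundancy
    "RAID 1": (n // 2) * drive_tb,     # mirrored pairs, half the space usable
    "RAID 5": (n - 1) * drive_tb,      # one drive's worth of parity
    "RAID 6": (n - 2) * drive_tb,      # two drives' worth of parity
}
for level, tb in usable.items():
    print(f"{level}: {tb} TB usable out of {n * drive_tb} TB raw")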

1.7.5 Advantages of RAID


1. Increased data reliability: RAID provides redundancy, which means that if one
disk fails, the data can be recovered from the remaining disks in the array. This
makes RAID a reliable storage solution for critical data.
2. Improved performance: RAID can improve performance by spreading data across multiple disks. This allows multiple read/write operations to occur concurrently, which can speed up data access.
3. Scalability: RAID can be scaled by adding more disks to the array. This means
that storage capacity can be increased without having to replace the entire storage
system.
4. Cost-effective: Some RAID configurations, such as RAID 0, can be
implemented with low-cost hardware. This makes RAID a cost-effective solution
for small businesses or home users.

1.7.6 Disadvantages of RAID


1. Cost: Some RAID configurations, such as RAID 5 or RAID 6, can be expensive
to implement. This is because they require additional hardware or software to
provide redundancy.
2. Performance limitations: Some RAID configurations, such as RAID 1 or RAID 5, can have performance limitations. For example, RAID 1 can only write data as fast as a single drive, while RAID 5 can have slower write speeds due to the parity calculations required.
3. Complexity: RAID can be complex to set up and maintain. This is especially
true for more advanced configurations, such as RAID 5 or RAID 6.
4. Increased risk of data loss: While RAID provides redundancy, it is not a
substitute for proper backups. If multiple drives fail simultaneously, data loss can
still occur.

1.8 Intelligent Storage Systems


RAID technology made an important contribution to enhancing storage
performance and reliability, but disk drives, even with a RAID implementation, could
not meet the performance requirements of today’s applications.
With advancements in technology, a new breed of storage solutions, known as
intelligent storage systems, has evolved. These storage systems are configured with a
large amount of memory (called cache) and multiple I/O paths and use sophisticated
algorithms to meet the requirements of performance-sensitive applications.

1.8.1 Components of an Intelligent Storage System


An intelligent storage system consists of four key components: front end,
cache, back end, and physical disks. An I/O request received from the host at the
front-end port is processed through cache and back end, to enable storage and retrieval
of data from the physical disk.

A read request can be serviced directly from cache if the requested data is found in the cache. In modern intelligent storage systems, the front end, cache, and back end are typically integrated on a single board.

Front End
The front end provides the interface between the storage system and the host. It consists of two components: front-end ports and front-end controllers. Typically, a front end has redundant controllers for high availability, and each controller contains multiple ports that enable large numbers of hosts to connect to the intelligent storage system.
Front-end controllers route data to and from cache via the internal data bus.
When the cache receives the write data, the controller sends an acknowledgment
message back to the host.

Cache
Cache is semiconductor memory where data is placed temporarily to reduce the
time required to service I/O requests from the host.
Cache improves storage system performance by isolating hosts from the
mechanical delays associated with rotating disks or hard disk drives (HDD).
Accessing data from cache is fast and typically takes less than a millisecond.
On intelligent arrays, write data is first placed in cache and then written to disk.

Structure of Cache
Cache is organized into pages; a page is the smallest unit of cache allocation. The size of a cache page is configured according to the application I/O size.
Cache consists of the data store and tag RAM. The data store holds the data, whereas the tag RAM tracks the location of the data in the data store (see Figure 4-2) and on the disk.
Entries in tag RAM indicate where data is found in cache and where the data belongs on the disk.

Read Operation with Cache


When a host issues a read request, the storage controller reads the tag RAM to determine whether the required data is available in cache. If the requested data is found in the cache, it is called a read cache hit (or read hit), and the data is sent directly to the host, without any disk operation. If the requested data is not found in cache, it is called a cache miss, and the data must be read from the disk.
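A minimal sketch of this read path (hypothetical class, not an actual array's firmware): the tag lookup comes first, a hit is served from cache, and a miss is read from disk and then kept in cache.

class ReadCache:
    def __init__(self, disk: dict):
        self.disk = disk          # stands in for the physical drives
        self.pages = {}           # data store: block address -> data
    def read(self, block: int):
        if block in self.pages:                      # tag lookup -> read hit
            return self.pages[block], "hit"
        data = self.disk[block]                      # read miss: go to disk
        self.pages[block] = data                     # keep it for future reads
        return data, "miss"

cache = ReadCache(disk={7: b"payload"})
print(cache.read(7))   # (b'payload', 'miss') the first time
print(cache.read(7))   # (b'payload', 'hit') afterward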

Write Operation with Cache


Write operations with cache provide performance advantages over writing
directly to disks. When an I/O is written to cache and acknowledged, it is completed
in far less time (from the host’s perspective) than it would take to write directly to
disk.
A write operation with cache is implemented in the following ways:
Write-back cache: Data is placed in cache and an acknowledgment is sent to the host
immediately. Later, data from several writes are committed (de-staged) to the disk.
Write response times are much faster because the write operations are isolated from
the mechanical delays of the disk. However, uncommitted data is at risk of loss if
cache failures occur.
Write-through cache: Data is placed in the cache and immediately written to the
disk, and an acknowledgment is sent to the host. Because data is committed to disk as
it arrives, the risks of data loss are low, but the write-response time is longer because
of the disk operations.
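A minimal sketch contrasting the two policies (hypothetical functions): write-back acknowledges as soon as the data is in cache and de-stages later, while write-through acknowledges only after the data is also on disk.

cache, disk = {}, {}

def write_back(block, data):
    cache[block] = data            # ack to host now; data is only in (volatile) cache
    return "ack"                   # de-staging to disk happens later, in the background

def destage():
    disk.update(cache)             # commit everything held in cache to disk

def write_through(block, data):
    cache[block] = data
    disk[block] = data             # disk write completes before the host sees the ack
    return "ack"

write_back(1, b"a"); write_through(2, b"b")
print(disk)      # {2: b'b'} -- block 1 is still at risk until destage() runs
destage()
print(disk)      # {1: b'a', 2: b'b'}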

Cache Data Protection


Cache is volatile memory, so a power failure or any kind of cache failure will
cause loss of the data that is not yet committed to the disk. This risk of losing
uncommitted data held in cache can be mitigated using cache mirroring and
cache vaulting:

Cache mirroring
Cache mirroring is a technique used in computer systems, particularly in
storage systems and RAID controllers, to enhance data reliability and availability by
duplicating data stored in a cache.
The primary purpose of cache mirroring is to provide redundancy for cached
data, ensuring that critical information remains available even in the event of a cache
failure.
Cache vaulting:
In cache vaulting, the storage system uses battery power during a power failure to dump the contents of cache to a dedicated set of physical disks, called vault drives; when power is restored, the data is written back to cache and then destaged to the intended disks.
Cache vaulting is particularly important in enterprise-level storage systems where data integrity and availability are critical. It provides an additional layer of protection against data loss caused by power disruptions or unexpected system failures, reducing the risk of data corruption or inconsistencies.

Back End
The back end provides an interface between cache and the physical disks.
It consists of two components: back-end ports and back-end controllers.
The back-end controls data transfers between cache and the physical disks.
From cache, data is sent to the back end and then routed to the destination disk.

The back-end controller communicates with the disks when performing reads
and writes and also provides additional, but limited, temporary data storage. The
algorithms implemented on back-end controllers provide error detection and
correction, along with RAID functionality.

Physical Disk
Physical disks are connected to the back-end storage controller and provide
persistent data storage. Modern intelligent storage systems provide support to a variety
of disk drives with different speeds and types, such as FC, SATA, SAS, and flash
drives.

1.8.2 Types of Intelligent Storage Systems


Intelligent storage systems generally fall into one of the following two
categories:
High-End Storage Systems
High-end storage systems, referred to as active-active arrays, are generally
aimed at large enterprise applications. These systems are designed with a large
number of controllers and cache memory.

Midrange Storage Systems



Midrange storage systems are also referred to as active-passive arrays and are
best suited for small- and medium-sized enterprise applications. They also provide
optimal storage solutions at a lower cost. In an active-passive array, a host can
perform I/Os to a LUN only through the controller that owns the LUN.
Midrange storage systems are typically designed with two controllers, each of
which contains host interfaces, cache, RAID controllers, and interface to disk drives.

1.9 Storage provisioning


Storage provisioning is the process of allocating and managing storage
resources to meet the data storage requirements of an organization. It involves the
allocation of storage capacity, performance, and features to applications, users, or
systems based on their specific needs.
The goal of storage provisioning is to efficiently utilize storage resources while
ensuring data availability, performance, and scalability.

1.9.1 Traditional Storage Provisioning


It is also known as thick provisioning. In traditional storage provisioning, physical storage drives are logically grouped together, and a required RAID level is applied to them to form a set, called a RAID set.
The number of drives in the RAID set and the RAID level determine the availability, capacity, and performance of the RAID set.
RAID sets usually have a large capacity because they combine the total capacity of the individual drives in the set. Logical units are created from the RAID sets by partitioning the available capacity into smaller units.

These units are then assigned to the hosts based on their storage requirements.
Logical units are spread across all the physical drives that belong to that set. Each
logical unit created from the RAID set is assigned a unique ID, called a logical unit
number (LUN).

1.9.2 Virtual Storage Provisioning


It is also known as thin provisioning. Virtual provisioning enables creating and presenting a LUN with more capacity than is physically allocated to it on the storage system. A LUN created using virtual provisioning is called a thin LUN to distinguish it from the traditional LUN.
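A minimal sketch of the idea (hypothetical class): a thin LUN reports its full advertised size to the host but consumes physical capacity only as data is actually written.

class ThinLUN:
    def __init__(self, advertised_gb: int):
        self.advertised_gb = advertised_gb     # what the host sees
        self.allocated_gb = 0                  # what the pool has actually given out
    def write(self, gb: int):
        self.allocated_gb += gb                # capacity is allocated on demand

lun = ThinLUN(advertised_gb=500)
lun.write(20)
print(lun.advertised_gb, lun.allocated_gb)     # 500 seen by the host, 20 consumed
# A traditional (thick) LUN would have reserved the full 500 GB at creation time.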
