ISM Unit 1
The following is a list of some of the factors that have contributed to the
growth of digital data:
Increase in data-processing capabilities
Lower cost of digital storage
Affordable and faster communication technology
Proliferation of applications and smart devices
1.1.5 Information
Information is the intelligence and knowledge derived from data.
Businesses analyze raw data in order to identify meaningful trends. On the
basis of these trends, a company can plan or modify its strategy. For example, a
retailer identifies customers’ preferred products and brand names by analyzing their
purchase patterns, and then maintains an inventory of those products.
INFORMATION STORAGE MANAGEMENT UNIT-1
Effective data analysis not only extends its benefits to existing businesses, but
also creates the potential for new business opportunities by using the information in
creative ways. A job portal is an example. In order to reach a wider set of prospective
employers, job seekers post their résumés on various websites offering job search
facilities.
These websites collect the résumés and post them on centrally accessible
locations for prospective employers. In addition, companies post available positions
on job search sites. Job-matching software matches keywords from résumés to
keywords in job postings. In this manner, the job search engine uses data and turns it
into information for employers and job seekers.
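The keyword matching described above can be sketched in a few lines. This is a minimal illustration only: the function names `keywords` and `match_score`, the scoring rule (fraction of posting keywords found in the résumé), and the sample strings are assumptions made for this example, not a description of how any real job site works.

```python
import re

def keywords(text):
    """Return the set of lowercase words found in a piece of text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def match_score(resume, posting):
    """Fraction of the posting's keywords that also appear in the resume."""
    wanted = keywords(posting)
    if not wanted:
        return 0.0
    return len(wanted & keywords(resume)) / len(wanted)

# Hypothetical sample data: raw text goes in, a ranking signal comes out.
resume = "Storage administrator with RAID, SAN and backup experience"
posting = "storage administrator RAID SAN"
score = match_score(resume, posting)
print(score)
```

The raw résumé and posting text are data; the score that ranks candidates for an employer is the information derived from it.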
1.1.6 Storage
Data created by individuals or businesses must be stored so that it is easily
accessible for further processing. In a computing environment, devices designed for
storing data are termed storage devices or simply storage.
The type of storage used varies based on the type of data and the rate at which
it is created and used. Devices, such as a media card in a cell phone or digital camera,
DVDs, CD-ROMs, and disk drives in personal computers are examples of storage
devices.
Businesses have several options available for storing data, including internal
hard disks, external disk arrays, and tapes.
Figure 1-5 shows an example of an online order transaction system that involves the
five core elements of a data center and illustrates their functionality in a business
process.
I/O operations in an HDD are performed by rapidly moving the arm across the
rotating flat platters coated with magnetic particles. Data is transferred between the
disk controller and magnetic platters through the read-write (R/W) head which is
attached to the arm. Data can be recorded and erased on magnetic platters any number
of times.
Platter
A typical HDD consists of one or more flat circular disks called platters.
The data is recorded on these platters in binary codes (0s and 1s).
A platter is a rigid, round disk coated with magnetic material on both
surfaces (top and bottom).
Data can be written to or read from both surfaces of the platter. The number of
platters and the storage capacity of each platter determine the total capacity of the
drive.
Spindle
A spindle connects all the platters and is connected to a motor.
The motor of the spindle rotates at a constant speed.
Common spindle speeds are 5,400 rpm, 7,200 rpm, 10,000 rpm, and 15,000 rpm.
Read/Write Head
Read/Write (R/W) heads, as shown in Figure 2-7, read and write data from or
to platters.
Drives have two R/W heads per platter, one for each surface of the platter. The
R/W head changes the magnetic polarization on the surface of the platter when writing
data.
While reading data, the head detects the magnetic polarization on the surface of
the platter. During reads and writes, the R/W head senses the magnetic polarization
and never touches the surface of the platter.
When the spindle is rotating, there is a microscopic air gap maintained between
the R/W heads and the platters, known as the head flying height. This air gap is
removed when the spindle stops rotating and the R/W head rests on a special area on
the platter near the spindle. This area is called the landing zone.
The landing zone is coated with a lubricant to reduce friction between the head
and the platter. If the drive malfunctions and the R/W head accidentally touches the
surface of the platter outside the landing zone, a head crash occurs. In a head crash,
the magnetic coating on the platter is scratched and may cause damage to the R/W
head. A head crash generally results in data loss.
Data on the disk is recorded on tracks. The tracks are numbered, starting from
zero, from the outer edge of the platter.
The number of tracks per inch (TPI) on the platter (or the track density)
measures how tightly the tracks are packed on a platter. Each track is divided into
smaller units called sectors.
A sector is the smallest, individually addressable unit of storage.
Seek Time
The seek time (also called access time) describes the time taken to position the
R/W heads across the platter with a radial movement (moving along the radius of the
platter). In other words, it is the time taken to position and settle the arm and the head
over the correct track.
The seek time of a disk is typically specified by the drive manufacturer. The
average seek time on a modern disk is typically in the range of 3 to 15 milliseconds.
Rotational Latency
To access data, the actuator arm moves the R/W head over the platter to a
particular track while the platter spins to position the requested sector under the
R/W head.
The time taken by the platter to rotate and position the data under the R/W head
is called rotational latency.
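Rotational latency depends directly on the spindle speed. A common rule of thumb, assumed here rather than stated in these notes, is that the average rotational latency equals half the time of one full rotation. The sketch below applies that rule to the common spindle speeds; the function name is chosen for this example.

```python
def average_rotational_latency_ms(rpm):
    """Average rotational latency in milliseconds (half of one full rotation)."""
    full_rotation_ms = 60_000 / rpm   # one minute is 60,000 ms
    return full_rotation_ms / 2

for rpm in (5400, 7200, 10000, 15000):
    print(f"{rpm} rpm: {average_rotational_latency_ms(rpm):.2f} ms")
```

Under this assumption a 15,000 rpm drive averages 2 ms of rotational latency, while a 5,400 rpm drive averages about 5.6 ms.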
Internal transfer rate is the speed at which data moves from a platter’s
surface to the internal buffer (cache) of the disk. The internal transfer rate takes into
account factors such as the seek time and rotational latency.
External transfer rate is the rate at which data can move through the interface
to the HBA. The external transfer rate is generally the advertised speed of the
interface, such as 133 MB/s for ATA.
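Seek time, rotational latency, and transfer rate can be combined into a rough estimate of the time a disk needs to service one I/O. The sketch below assumes the simple additive model (seek time + average rotational latency + transfer time) and takes average rotational latency as half a rotation. The function name and the sample figures (5 ms seek, 15,000 rpm, 4 KB block, 40 MB/s internal transfer rate) are hypothetical values chosen for illustration.

```python
def service_time_ms(seek_ms, rpm, block_kb, transfer_rate_mb_s):
    """Estimate the time to service one I/O, in milliseconds."""
    # Average rotational latency: half of one full rotation (assumption).
    rotational_latency_ms = (60_000 / rpm) / 2
    # Taking 1 MB = 1000 KB, dividing KB by MB/s yields milliseconds.
    transfer_ms = block_kb / transfer_rate_mb_s
    return seek_ms + rotational_latency_ms + transfer_ms

estimate = service_time_ms(seek_ms=5, rpm=15_000, block_kb=4, transfer_rate_mb_s=40)
print(f"{estimate:.1f} ms")
```

With these figures the mechanical components (seek and rotation) dominate; the data transfer itself accounts for only about 0.1 ms.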
1.7 RAID
RAID, or Redundant Array of Independent Disks, is a technology that
connects multiple secondary storage devices and uses them as a single storage medium.
RAID consists of an array of disks in which multiple disks are
connected together to achieve different goals. RAID levels define the use of disk
arrays.
2. Hardware RAID
In hardware RAID implementations, a specialized hardware controller is
implemented either on the host or on the array.
Controller card RAID is a host-based hardware RAID implementation in which
a specialized RAID controller is installed in the host, and disk drives are connected
to it. Manufacturers also integrate RAID controllers on motherboards.
A host-based RAID controller is not an efficient solution in a data center
environment with a large number of hosts.
The key functions of the RAID controllers are as follows:
Management and control of disk aggregations
Translation of I/O requests between logical disks and physical disks
Data regeneration in the event of disk failures
Striping
Striping is one of the fundamental techniques used in RAID (Redundant Array
of Independent Disks) to improve data storage performance. It involves dividing data
into smaller chunks (often referred to as blocks or stripes) and writing these chunks
sequentially across multiple hard drives in the RAID array. Each chunk of data is
typically of the same size, and the data is distributed across the drives in a round-robin
fashion.
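The round-robin distribution can be sketched as follows. This is a toy model, assuming equal-size chunks and disks represented as Python lists; `stripe` and `locate_block` are names invented for this example.

```python
def stripe(data, num_disks, chunk_size):
    """Split data into fixed-size chunks and deal them out round-robin."""
    disks = [[] for _ in range(num_disks)]
    for start in range(0, len(data), chunk_size):
        chunk_index = start // chunk_size
        disks[chunk_index % num_disks].append(data[start:start + chunk_size])
    return disks

def locate_block(logical_block, num_disks):
    """Map a logical chunk number to (disk index, position on that disk)."""
    return logical_block % num_disks, logical_block // num_disks

layout = stripe(b"ABCDEFGHIJKL", num_disks=3, chunk_size=2)
for index, chunks in enumerate(layout):
    print(f"disk {index}: {chunks}")
```

Because consecutive chunks land on different disks, a large sequential read or write can keep all drives busy at the same time, which is where the performance gain comes from.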
Mirroring
Mirroring is one of the core techniques used in RAID (Redundant Array of
Independent Disks) to enhance data redundancy and fault tolerance. It involves
duplicating data across multiple hard drives in the RAID array.
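A minimal sketch of a two-disk mirror is shown below. Each disk is modelled as a dictionary, and the class name `MirroredPair` and its methods are assumptions made for illustration; a real controller also handles rebuilds and read balancing, which are omitted here.

```python
class MirroredPair:
    """Toy two-disk mirror: each disk is a dict of block number -> data."""

    def __init__(self):
        self.disks = [{}, {}]
        self.failed = [False, False]

    def write(self, block, data):
        # Every write goes to all healthy disks, so the copies stay identical.
        for index, disk in enumerate(self.disks):
            if not self.failed[index]:
                disk[block] = data

    def read(self, block):
        # Serve the read from the first healthy disk.
        for index, disk in enumerate(self.disks):
            if not self.failed[index]:
                return disk[block]
        raise IOError("both disks have failed")

    def fail(self, index):
        self.failed[index] = True

pair = MirroredPair()
pair.write(0, b"payroll")
pair.fail(0)                    # simulate losing the first disk
survivor_copy = pair.read(0)    # still readable from the second disk
```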
Parity
Parity is a key concept in RAID (Redundant Array of Independent Disks) that
is used to provide fault tolerance and data recovery capabilities in certain RAID
levels. Parity is a calculated value that is used to reconstruct data in the event of a
drive failure.
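The notes do not say how parity is calculated. A widely used choice, assumed in this sketch, is the bytewise XOR of the data blocks in a stripe: XOR-ing the surviving blocks with the parity block yields the missing block. The function name `xor_blocks` and the sample bytes are invented for the example.

```python
from functools import reduce

def xor_blocks(blocks):
    """Bytewise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

data_blocks = [b"\x0f\x10", b"\xf0\x01", b"\x33\x44"]
parity = xor_blocks(data_blocks)

# Simulate losing the second block and rebuilding it from the rest plus parity.
rebuilt = xor_blocks([data_blocks[0], data_blocks[2], parity])
print(rebuilt == data_blocks[1])
```

The same function computes the parity and performs the rebuild, because XOR is its own inverse.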
RAID 0
o RAID level 0 provides data striping, i.e., data is placed across
multiple disks. Because it relies on striping alone, with no redundancy, if one
disk fails then all data in the array is lost.
o This level doesn't provide fault tolerance but increases the system
performance.
Advantages
1. It is easy to implement.
2. It uses the full storage capacity of the drives, since no space is reserved for redundancy.
Disadvantages
1. A single drive loss can result in the complete failure of the system.
2. Not a good choice for a critical system.
RAID 1
This level is called mirroring of data as it copies the data from drive 1 to drive
2. It provides 100% redundancy in case of a failure.
Only half of the total drive capacity is used to store data. The other half
holds a mirror copy of the already stored data.
Advantages of RAID 1:
The main advantage of RAID 1 is fault tolerance. In this level, if one disk
fails, then the other automatically takes over.
In this level, the array will function even if any one of the drives fails.
Disadvantages of RAID 1:
In this level, one extra drive is required per drive for mirroring, so the expense
is higher.
RAID 2
RAID 2 records an Error Correction Code (ECC), based on the Hamming code, for its
data, which is striped across different disks.
Like level 0, it stripes data, but at the bit level: each data bit in a word is
recorded on a separate disk, and the ECC codes of the data words are stored on a
different set of disks.
Due to its complex structure and high cost, RAID 2 is not commercially
available.
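To make the Hamming-code idea concrete, the sketch below uses the small Hamming(7,4) code: four data bits are protected by three check bits, and a single flipped bit can be located and corrected. The code size is an assumption chosen for illustration, as these notes do not specify one. In RAID 2 terms, each of the seven bits would sit on a separate disk, so the error position identifies the faulty disk.

```python
def hamming74_encode(data_bits):
    """Encode 4 data bits [d1, d2, d3, d4] into a 7-bit Hamming codeword."""
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4   # check bit for positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4   # check bit for positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4   # check bit for positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]

def hamming74_correct(codeword):
    """Return (corrected codeword, 1-based error position, or 0 if clean)."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    position = s1 + 2 * s2 + 4 * s3   # syndrome read as a binary number
    if position:
        c[position - 1] ^= 1          # flip the faulty bit back
    return c, position

codeword = hamming74_encode([1, 0, 1, 1])
damaged = list(codeword)
damaged[4] ^= 1                        # corrupt position 5 (one "disk")
repaired, error_position = hamming74_correct(damaged)
```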
Advantages of RAID 2:
This level uses dedicated drives to store the error correction codes.
It uses the Hamming code for error detection and correction.
Disadvantages of RAID 2:
It requires additional drives for the error correction codes.
RAID 3
RAID 3 consists of byte-level striping with dedicated parity. In this level,
the parity information is stored for each disk section and written to a dedicated
parity drive.
In case of drive failure, the parity drive is accessed, and data is
reconstructed from the remaining devices. Once the failed drive is replaced, the
missing data can be restored on the new drive.
In this level, data can be transferred in bulk. Thus high-speed data
transmission is possible.
Advantages of RAID 3:
In this level, data is regenerated using the parity drive.
It provides high data transfer rates.
In this level, data is accessed in parallel.
Disadvantages of RAID 3:
It requires an additional drive for parity.
It performs slowly when operating on small files.
RAID 4
Advantages
It helps in reconstructing the data if at most one disk fails.
Disadvantages
It cannot reconstruct the data when more than one disk fails.
RAID 5
RAID 5 is a slight modification of the RAID 4 system. The only
difference is that in RAID 5, the parity rotates among the drives.
It consists of block-level striping with distributed parity.
Same as RAID 4, this level allows recovery of at most 1 disk failure. If more
than one disk fails, then there is no way for data recovery.
This level was introduced to make the random write performance
better.
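The rotation of parity among the drives can be sketched as below. The exact placement rule differs between implementations; this example assumes one simple layout in which parity starts on the last disk and moves one disk to the left with each stripe. The function names are invented for the example.

```python
def parity_disk(stripe_number, num_disks):
    """Index of the disk that holds parity for a given stripe."""
    return (num_disks - 1 - stripe_number) % num_disks

def stripe_layout(stripe_number, num_disks):
    """Return a row such as ['D', 'D', 'D', 'P'] (D = data, P = parity)."""
    parity_index = parity_disk(stripe_number, num_disks)
    return ["P" if disk == parity_index else "D" for disk in range(num_disks)]

rows = [stripe_layout(stripe, 4) for stripe in range(4)]
for stripe, row in enumerate(rows):
    print(f"stripe {stripe}: {' '.join(row)}")
```

Because no single disk holds all the parity, parity updates for random writes are spread across the array instead of queuing on one dedicated drive.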
Advantages of RAID 5:
This level is cost effective and provides high performance.
In this level, parity is distributed across the disks in an array.
It is used to make the random write performance better.
Disadvantages of RAID 5:
In this level, disk failure recovery takes longer, as the missing data has to be
recalculated from all remaining drives.
This level cannot survive concurrent drive failures.
RAID 6
This level is an extension of RAID 5. It contains block-level striping
with two independent parity blocks per stripe.
RAID 6 helps when there is more than one disk failure: it tolerates up to two
concurrent failures. A pair of independent parities is generated and stored on
multiple disks at this level. At least four disk drives are needed for this level.
There are also hybrid RAIDs, which make use of more than one RAID level
nested one after the other, to fulfill specific requirements.
Advantages
Very high data availability.
Fast read data transactions.
Disadvantages
Due to double parity, it has slow write data transactions.
Extra space is required.
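The capacity cost of each RAID level discussed above can be compared with a small calculation. The sketch assumes all disks in the array have the same size and that RAID 1 is a simple two-way mirror; `usable_capacity` is a name chosen for this example.

```python
def usable_capacity(level, num_disks, disk_size_gb):
    """Usable capacity in GB for a few RAID levels, assuming equal-size disks."""
    if level == 0:
        data_disks = num_disks          # striping only, no redundancy
    elif level == 1:
        data_disks = num_disks / 2      # half the disks hold mirror copies
    elif level in (3, 4, 5):
        data_disks = num_disks - 1      # one disk's worth of parity
    elif level == 6:
        data_disks = num_disks - 2      # two disks' worth of parity
    else:
        raise ValueError(f"unsupported RAID level: {level}")
    return data_disks * disk_size_gb

for level, disks in ((0, 4), (1, 2), (5, 4), (6, 4)):
    print(f"RAID {level} with {disks} x 1000 GB: {usable_capacity(level, disks, 1000)} GB")
```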
A read request can be serviced directly from cache if the requested data is
found in the cache. In modern intelligent storage systems, front end, cache, and back
end are typically integrated on a single board.
Front End
The front end provides the interface between the storage system and the host. It
consists of two components: front-end ports and front-end controllers. Typically,
a front end has redundant controllers for high availability, and each controller contains
multiple ports that enable large numbers of hosts to connect to the intelligent storage
system.
Front-end controllers route data to and from cache via the internal data bus.
When the cache receives the write data, the controller sends an acknowledgment
message back to the host.
Cache
Cache is semiconductor memory where data is placed temporarily to reduce the
time required to service I/O requests from the host.
Cache improves storage system performance by isolating hosts from the
mechanical delays associated with rotating disks or hard disk drives (HDD).
Accessing data from cache is fast and typically takes less than a millisecond.
On intelligent arrays, write data is first placed in cache and then written to disk.
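The read and write behaviour described above can be sketched with a toy cache in front of a "disk" dictionary. This is an illustration under simplifying assumptions (no page replacement, no size limit); the class name `WriteBackCache` and its methods are invented for this example and do not model any particular array.

```python
class WriteBackCache:
    """Toy cache in front of a backing 'disk' dict."""

    def __init__(self, disk):
        self.disk = disk      # backing store: block address -> data
        self.pages = {}       # cached data: block address -> data
        self.dirty = set()    # addresses written to cache but not yet to disk

    def read(self, address):
        if address in self.pages:        # read hit: served from cache
            return self.pages[address], "hit"
        data = self.disk[address]        # read miss: fetch from disk
        self.pages[address] = data
        return data, "miss"

    def write(self, address, data):
        # The write is acknowledged as soon as the data is in cache.
        self.pages[address] = data
        self.dirty.add(address)
        return "ack"

    def flush(self):
        # Later, the dirty pages are written out to disk.
        for address in sorted(self.dirty):
            self.disk[address] = self.pages[address]
        self.dirty.clear()

disk = {7: b"old"}
cache = WriteBackCache(disk)
first = cache.read(7)            # miss: data comes from disk
second = cache.read(7)           # hit: data comes from cache
cache.write(7, b"new")
on_disk_before_flush = disk[7]   # disk still holds the old data
cache.flush()
on_disk_after_flush = disk[7]
```

The window between the acknowledgment and the flush is why cached write data needs protection against cache failure and power loss.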
Structure of Cache
Cache is organized into pages; a page is the smallest unit of cache allocation.
The size of a cache page is configured according to the application I/O size.
Cache consists of the data store and tag RAM. The data store holds the data
whereas the tag RAM tracks the location of the data in the data store (see Figure 4-2)
and in the disk.
Entries in tag RAM indicate where data is found in cache and where the data
belongs on the disk.
Cache mirroring
Cache mirroring is a technique used in computer systems, particularly in
storage systems and RAID controllers, to enhance data reliability and availability by
duplicating data stored in a cache.
The primary purpose of cache mirroring is to provide redundancy for cached
data, ensuring that critical information remains available even in the event of a cache
failure.
Cache vaulting
Cache vaulting is particularly important in enterprise-level storage systems
where data integrity and availability are critical. It provides an additional layer of
protection against data loss caused by power disruptions or unexpected system
failures, reducing the risk of data corruption or inconsistencies.
Back End
The back end provides an interface between cache and the physical disks.
It consists of two components: back-end ports and back-end controllers.
The back end controls data transfers between cache and the physical disks.
From cache, data is sent to the back end and then routed to the destination disk.
The back-end controller communicates with the disks when performing reads
and writes and also provides additional, but limited, temporary data storage. The
algorithms implemented on back-end controllers provide error detection and
correction, along with RAID functionality.
Physical Disk
Physical disks are connected to the back-end storage controller and provide
persistent data storage. Modern intelligent storage systems provide support to a variety
of disk drives with different speeds and types, such as FC, SATA, SAS, and flash
drives.
Midrange storage systems are also referred to as active-passive arrays and are
best suited for small- and medium-sized enterprise applications. They also provide
optimal storage solutions at a lower cost. In an active-passive array, a host can
perform I/Os to a LUN only through the controller that owns the LUN.
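The ownership rule of an active-passive array can be sketched as a simple lookup. The class name, the controller names "A" and "B", and the returned strings are hypothetical, and real arrays also support failover of LUN ownership, which is left out here.

```python
class ActivePassiveArray:
    """Toy model: each LUN is owned by exactly one controller."""

    def __init__(self):
        self.owner = {}   # LUN id -> name of the owning controller

    def assign(self, lun, controller):
        self.owner[lun] = controller

    def io(self, lun, via_controller):
        # I/O succeeds only on the path through the owning controller.
        if self.owner[lun] != via_controller:
            return "rejected: controller does not own this LUN"
        return "ok"

array = ActivePassiveArray()
array.assign(lun=0, controller="A")
array.assign(lun=1, controller="B")
active_path = array.io(lun=0, via_controller="A")
passive_path = array.io(lun=0, via_controller="B")
```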
Midrange storage systems are typically designed with two controllers, each of
which contains host interfaces, cache, RAID controllers, and interface to disk drives.
These units are then assigned to the hosts based on their storage requirements.
Logical units are spread across all the physical drives that belong to that set. Each
logical unit created from the RAID set is assigned a unique ID, called a logical unit
number (LUN).