Lecture#5 Chap 6 RAID
RAID: Redundant Array of Independent Disks
With the use of multiple disks, there is a wide
variety of ways in which the data can be
organized and in which redundancy can be
added to improve reliability.
Fortunately, industry has agreed on a
standardized scheme for multiple-disk
database design, known as RAID (Redundant
Array of Independent Disks).
The RAID scheme consists of seven levels, zero
through six.
The term RAID was originally coined in a paper by
a group of researchers at the University of
California at Berkeley.
The paper outlined various RAID configurations
and applications and introduced the definitions of
the RAID levels that are still used.
The RAID strategy employs multiple disk drives and
distributes data in such a way as to enable
simultaneous access to data from multiple drives,
improving I/O performance and allowing easier
incremental increases in capacity.
All RAID levels share three common
characteristics:
1. RAID is a set of physical disk drives viewed by
the operating system as a single logical
drive.
2. Data are distributed across the physical
drives of an array in a scheme known as
striping, described subsequently.
3. Redundant disk capacity is used to store
parity information, which guarantees data
recoverability in case of a disk failure.
RAID 0
No redundancy
Used in a few applications, such as some on
supercomputers, in which performance and
capacity are primary concerns and low cost is
more important than improved reliability.
Data striped across all disks
Round-robin striping
Increased speed:
Multiple data requests are probably not on
the same disk
Disks seek in parallel
A set of data is likely to be striped across
multiple disks
Strips
The logical disk is divided into strips;
these strips may be physical blocks,
sectors, or some other unit.
The strips are mapped round robin to
consecutive physical disks in the RAID
array.
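As a small illustration of this round-robin mapping (a sketch in Python; the function name and the choice of 4 disks are ours, not from the slides), logical strip i lands on disk i mod n, at offset i div n within that disk:

def locate_strip(logical_strip: int, n_disks: int) -> tuple[int, int]:
    """Map a logical strip to (physical disk, strip index on that disk)."""
    return logical_strip % n_disks, logical_strip // n_disks

# With 4 disks, logical strips 0..7 wrap around the array round robin:
print([locate_strip(i, 4) for i in range(8)])
# [(0, 0), (1, 0), (2, 0), (3, 0), (0, 1), (1, 1), (2, 1), (3, 1)]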
Data Mapping
The performance of any of the RAID levels
depends critically on the request patterns of
the host system and on the layout of the data.
These issues can be most clearly addressed in
RAID 0, which does not involve redundancy.
For high data transfer rate
requirements
First, a high transfer capacity must exist along the
entire path between host memory and the individual
disk drives.
This includes internal controller buses, host system I/O
buses, I/O adapters, and host memory buses.
The second requirement is that the application must
make I/O requests that drive the disk array efficiently.
This requirement is met if the typical request is for large
amounts of logically contiguous data, compared to the
size of a strip.
In this case, a single I/O request involves the parallel
transfer of data from multiple disks, increasing the
effective transfer rate compared to a single-disk transfer.
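A rough back-of-the-envelope sketch (the 150 MB/s figure and the 4-disk spread are assumptions for illustration only): for a large contiguous request striped over k disks, the effective rate approaches k times the single-disk rate, ignoring seek and rotational overhead.

single_disk_rate_mb_s = 150      # assumed sustained rate of one drive
k_disks = 4                      # the request's strips are spread over 4 disks
effective_rate = k_disks * single_disk_rate_mb_s
print(effective_rate)            # ~600 MB/s for large sequential transfers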
Problems to consider
For an individual I/O request for a small
amount of data (strip size is relatively small),
the I/O time is dominated by the motion of
the disk heads (seek time) and the movement
of the disk (rotational latency).
If the strip size is relatively large, so that a
single I/O request only involves a single disk
access, then multiple waiting I/O requests can
be handled in parallel, reducing the queuing
time for each request.
Effective load balancing is achieved only if
there are typically multiple I/O requests
outstanding. The performance will also be
influenced by the strip size.
RAID Level 1
RAID 1 differs from RAID levels 2 through 6 in the
way in which redundancy is achieved.
In these other RAID schemes, some form of parity
calculation is used to introduce redundancy,
whereas in RAID 1, redundancy is achieved by the
simple expedient of duplicating all the data.
Data striping is used, as in RAID 0. But in this case,
each logical strip is mapped to two separate
physical disks so that every disk in the array has a
mirror disk that contains the same data.
RAID 1 can also be implemented without data
striping, though this is less common.
Positive aspects to the RAID 1
organization
A read request can be serviced by either of
the two disks that contain the requested
data, whichever one involves the minimum
seek time plus rotational latency.
A write request requires that both
corresponding strips be updated, but this can
be done in parallel.
The write performance is dictated by the slower
of the two writes (i.e., the one that involves the
larger seek time plus rotational latency).
Recovery from a failure is simple. When a
drive fails, the data may still be accessed
from the second drive.
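A minimal sketch of the RAID 1 idea (the lists and the pretend latency figures are illustrative assumptions, not a real driver): reads can be served from whichever mirror is cheaper to reach, while writes must update both copies and finish only when the slower one does.

def mirrored_read(strip, mirrors, costs):
    """Serve the read from the mirror with the smaller seek + rotational cost."""
    i = min(range(len(mirrors)), key=lambda k: costs[k])
    return mirrors[i][strip]

def mirrored_write(strip, data, mirrors, costs):
    """Update both copies; completion time is set by the slower disk."""
    for m in mirrors:
        m[strip] = data
    return max(costs)

mirrors = [{0: "A"}, {0: "A"}]       # two disks holding identical strips
costs = [3, 7]                       # pretend seek + rotational latency in ms
print(mirrored_read(0, mirrors, costs))          # served by the cheaper mirror
print(mirrored_write(0, "B", mirrors, costs))    # 7: dictated by the slower write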
Disadvantage
The principal disadvantage of RAID 1 is the
cost;
it requires twice the disk space of the logical
disk that it supports.
Because of that, a RAID 1 configuration is
likely to be limited to drives that store system
software and data and other highly critical
files.
In these cases, RAID 1 provides real-time copy
of all data so that in the event of a disk failure,
all of the critical data are still immediately
available.
In a transaction-oriented environment,
RAID 1 can achieve high I/O request rates
if the bulk of the requests are reads.
In this situation, the performance of RAID
1 can approach double that of RAID 0.
RAID Level 2
RAID levels 2 and 3 make use of a parallel access
technique.
In a parallel access array, all member disks
participate in the execution of every I/O request.
Typically, the spindles of the individual drives are
synchronized so that each disk head is in the same
position on each disk at any given time.
As in the other RAID schemes, data striping is used. In
the case of RAID 2 and 3, the strips are very small,
often as small as a single byte or word.
With RAID 2, an error-correcting code is calculated
across corresponding bits on each data disk, and the
bits of the code are stored in the corresponding bit
positions on multiple parity disks.
Typically, a Hamming code is used, which is able to
correct single-bit errors and detect double-bit errors.
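To make the idea concrete, here is a minimal Hamming(7,4) sketch (a toy with one bit per "disk"; RAID 2's actual strip and disk geometry is not shown): in a valid codeword the XOR of the positions of the set bits is zero, so after a single bit flip that XOR points at the bad position.

def hamming_encode(d3, d5, d6, d7):
    """Place 4 data bits at positions 3, 5, 6, 7 and compute parity bits 1, 2, 4."""
    p1 = d3 ^ d5 ^ d7
    p2 = d3 ^ d6 ^ d7
    p4 = d5 ^ d6 ^ d7
    return [p1, p2, d3, p4, d5, d6, d7]        # codeword positions 1..7

def hamming_correct(code):
    """Locate and flip a single erroneous bit using the syndrome."""
    syndrome = 0
    for pos in range(1, 8):
        if code[pos - 1]:
            syndrome ^= pos                    # XOR of positions of set bits
    if syndrome:
        code[syndrome - 1] ^= 1                # syndrome is the error position
    return code

word = hamming_encode(1, 0, 1, 1)
word[4] ^= 1                                   # simulate one disk returning a bad bit
assert hamming_correct(word) == hamming_encode(1, 0, 1, 1)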
Pros and cons
Although RAID 2 requires fewer disks than RAID 1, it
is still rather costly.
The number of redundant disks is proportional to
the log of the number of data disks. On a single
read, all disks are simultaneously accessed.
The requested data and the associated error-
correcting code are delivered to the array
controller.
If there is a single-bit error, the controller can
recognize and correct the error instantly, so that
the read access time is not slowed.
On a single write, all data disks and parity disks
must be accessed for the write operation.
RAID 2 would only be an effective choice
in an environment in which many disk
errors occur.
Given the high reliability of individual disks
and disk drives, RAID 2 is overkill and is not
implemented.
RAID Level 3
RAID 3 is organized in a similar fashion to RAID
2.
The difference is that RAID 3 requires only a
single redundant disk, no matter how large
the disk array.
RAID 3 employs parallel access, with data
distributed in small strips.
Instead of an error-correcting code, a simple
parity bit is computed for the set of individual
bits in the same position on all of the data
disks.
REDUNDANCY
In the event of a drive failure, the parity
drive is accessed and data is
reconstructed from the remaining
devices.
Once the failed drive is replaced, the
missing data can be restored on the new
drive and operation resumed.
Data reconstruction is simple.
Consider an array of five drives in which X0
through X3 contain data and X4 is the
parity disk. The parity for the ith bit is
calculated as follows:
X4(i) = X3(i) ⊕ X2(i) ⊕ X1(i) ⊕ X0(i)
Suppose that drive X1 has failed. Adding
X4(i) ⊕ X1(i) to both sides of the preceding
equation gives
X1(i) = X4(i) ⊕ X3(i) ⊕ X2(i) ⊕ X0(i)
Thus, the contents of each strip of data on X1 can
be regenerated from the contents of the
corresponding strips on the remaining disks in the
array.
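The same relationship, sketched in Python with single bytes standing in for whole strips (the values are arbitrary examples):

x = [0b1011, 0b0110, 0b1100, 0b0001]      # data disks X0..X3
x4 = x[0] ^ x[1] ^ x[2] ^ x[3]            # parity disk X4

# If X1 fails, the XOR of the parity and the surviving data regenerates it:
rebuilt_x1 = x4 ^ x[0] ^ x[2] ^ x[3]
assert rebuilt_x1 == x[1]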
This principle is true for RAID levels 3 through 6.
In the event of a disk failure, all of the data are still
available in what is referred to as reduced mode.
In this mode, for reads, the missing data are
regenerated on the fly using the exclusive-OR
calculation.
When data are written to a reduced RAID 3 array,
consistency of the parity must be maintained for
later regeneration.
Return to full operation requires that the failed disk
be replaced and the entire contents of the failed
disk be regenerated on the new disk.
PERFORMANCE
Because data are striped in very small strips,
RAID 3 can achieve very high data transfer
rates.
Any I/O request will involve the parallel
transfer of data from all of the data disks. For
large transfers, the performance
improvement is especially noticeable.
On the other hand, only one I/O request can
be executed at a time. Thus, in a transaction-
oriented environment, performance suffers.
RAID Level 4
RAID levels 4 through 6 make use of an
independent access technique.
In an independent access array, each
member disk operates independently, so
that separate I/O requests can be
satisfied in parallel.
Because of this, independent access
arrays are more suitable for applications
that require high I/O request rates
As in the other RAID schemes, data
striping is used. In the case of RAID 4
through 6, the strips are relatively large.
With RAID 4, a bit-by-bit parity strip is
calculated across corresponding strips on
each data disk, and the parity bits are
stored in the corresponding strip on the
parity disk.
For a small write that updates a single strip,
the array management software must read the
old user strip and the old parity strip to
calculate the new parity.
Then it can update these two strips with
the new data and the newly calculated
parity.
Thus, each strip write involves two reads
and two writes.
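A brief sketch of this read-modify-write update (function and variable names are illustrative): the new parity is the old parity XOR the old data XOR the new data, so only the target strip and the parity strip need to be touched.

def small_write_parity(old_data: int, new_data: int, old_parity: int) -> int:
    """New parity = old parity XOR old data XOR new data (bitwise, per strip)."""
    return old_parity ^ old_data ^ new_data

d = [0b1010, 0b0110, 0b0011]              # three data strips, one byte each
p = d[0] ^ d[1] ^ d[2]                    # parity strip
new_d1 = 0b1111
p = small_write_parity(d[1], new_d1, p)   # two reads (old strip, old parity)...
d[1] = new_d1                             # ...then two writes (new strip, new parity)
assert p == d[0] ^ d[1] ^ d[2]            # parity stays consistent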
Pros and cons
In the case of a larger size I/O write that
involves strips on all disk drives, parity is easily
computed by calculation using only the new
data bits.
Thus, the parity drive can be updated in
parallel with the data drives and there are no
extra reads or writes.
In any case, every write operation must
involve the parity disk, which therefore can
become a bottleneck.
RAID Level 5
RAID 5 is organized in a similar fashion to
RAID 4.
The difference is that RAID 5 distributes the
parity strips across all disks.
A typical allocation is a round-robin
scheme.
For an n-disk array, the parity strip is on a
different disk for the first n stripes, and the
pattern then repeats.
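A minimal sketch of one possible round-robin placement (the rotation direction and starting disk vary between implementations; this particular layout is an assumption, not the only one):

def parity_disk(stripe: int, n_disks: int) -> int:
    """Disk index holding the parity strip of the given stripe (rotates each stripe)."""
    return (n_disks - 1) - (stripe % n_disks)

n = 4
print([parity_disk(s, n) for s in range(8)])   # [3, 2, 1, 0, 3, 2, 1, 0]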
Pros and cons
The distribution of parity strips across all
drives avoids the potential I/O bottleneck
found in RAID 4.
RAID Level 6
In the RAID 6 scheme, two different parity
calculations are carried out and stored in
separate blocks on different disks.
Thus, a RAID 6 array whose user data
require N disks consists of N + 2 disks.
P and Q are two different data check
algorithms.
One of the two is the exclusive-OR
calculation used in RAID 4 and 5.
But the other is an independent data
check algorithm.
This makes it possible to regenerate data
even if two disks containing user data fail.
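As a rough illustration of the idea (a toy sketch assuming the common Reed-Solomon style Q check over GF(2^8); real controllers differ in the exact second algorithm, and single bytes stand in for whole strips): with the XOR parity P and the independent check Q, any two failed data disks can be rebuilt.

def gf_mul(a, b, poly=0x11d):
    """Multiply two bytes in GF(2^8), reducing by the polynomial 0x11d."""
    p = 0
    while b:
        if b & 1:
            p ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= poly
    return p

def gf_pow(a, n):
    r = 1
    for _ in range(n):
        r = gf_mul(r, a)
    return r

def gf_inv(a):
    return gf_pow(a, 254)              # a^254 is the multiplicative inverse in GF(2^8)

def compute_pq(data):
    """P is plain XOR parity; Q weights disk i by 2^i before XOR-ing."""
    p = q = 0
    for i, d in enumerate(data):
        p ^= d
        q ^= gf_mul(gf_pow(2, i), d)
    return p, q

def rebuild_two(data, x, y, p, q):
    """Recover failed data disks x and y from the survivors plus P and Q."""
    a, b = p, q
    for i, d in enumerate(data):
        if i not in (x, y):
            a ^= d                         # a = D_x XOR D_y
            b ^= gf_mul(gf_pow(2, i), d)   # b = 2^x*D_x XOR 2^y*D_y
    gx, gy = gf_pow(2, x), gf_pow(2, y)
    dx = gf_mul(b ^ gf_mul(gy, a), gf_inv(gx ^ gy))
    return dx, dx ^ a

data = [0x11, 0x22, 0x33, 0x44]            # four data disks, one byte each
p, q = compute_pq(data)
assert rebuild_two(data, 1, 3, p, q) == (0x22, 0x44)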
Pros and cons
The advantage of RAID 6 is that it provides
extremely high data availability.
Three disks would have to fail to cause data to be
lost.
On the other hand, RAID 6 incurs a substantial
write penalty, because each write affects two
parity blocks.
A RAID 6 controller can suffer more than a 30% drop
in overall write performance compared with a
RAID 5 implementation.
RAID 5 and RAID 6 read performance is
comparable.