UNIT - 4 Dbms
IMPLEMENTATION TECHNIQUES
RAID
Introduction
• RAID: Redundant Array of Independent Disks
• Technology in which multiple secondary disks are connected
together to increase the performance, data redundancy or both
• To achieve data redundancy, the same data is backed up onto
another disk, so that it can still be retrieved and operated on if a disk fails
• It consists of an array of disks in which multiple disks are
connected to achieve different goals
• Main advantage: the array of disks can be presented as a single
disk
Need for RAID
• RAID is a technology that is used to increase the
performance
• It is used for increased reliability of data storage
• An array of multiple disks accessed in parallel will give
greater throughput than a single disk
• With multiple disks and a suitable redundancy scheme, your
system can stay up and running when a disk fails, and even
while the replacement disk is being installed and its data
restored
Why RAID?
• Disk performance
• Disk redundancy
• Provide data security
• Protects Data from disk failure
Features
1. Contains the set of physical disk drives
2. The OS views the separate disks as a single logical disk
3. The data is distributed across the physical drives of the array
4. In case of disk failure, the parity information can be used
to recover the data
RAID Levels
Various levels in RAID
RAID 0
RAID 1
RAID 2
RAID 3
RAID 4
RAID 5
RAID 6
RAID 0
• RAID level 0 divides data into block units and writes them across
a number of disks
• Each disk receives a block of data to write/read in parallel
• As data is placed across multiple disks, it is also called data
striping
• There is no parity checking of data
• If data in one drive gets corrupted then all the data would be lost.
Thus RAID 0 does not support data recovery
• RAID 0 implementation requires minimum 2 disks
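The block placement described above can be sketched in a few lines. This is a minimal illustration that assumes simple round-robin placement; the function names `stripe_location` and `stripe` are hypothetical and do not come from any RAID tool.

```python
def stripe_location(block_number, num_disks):
    """Map a logical block number to (disk index, row on that disk),
    assuming blocks are handed out to the disks in round-robin order."""
    if num_disks < 2:
        raise ValueError("RAID 0 needs at least 2 disks")
    return block_number % num_disks, block_number // num_disks


def stripe(blocks, num_disks):
    """Distribute blocks across disks; result[d] lists the blocks on disk d."""
    disks = [[] for _ in range(num_disks)]
    for number, block in enumerate(blocks):
        disk, _row = stripe_location(number, num_disks)
        disks[disk].append(block)
    return disks


# Five blocks over two disks: even-numbered blocks land on disk 0,
# odd-numbered blocks on disk 1, so both disks can work in parallel.
layout = stripe(["B0", "B1", "B2", "B3", "B4"], num_disks=2)
```

Because no block is stored twice and no parity is kept, losing either disk in this layout loses part of every large file.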
Advantages
1. Implementation is easy
2. It increases speed
3. No overhead of parity calculation
4. Throughput speed is increased
– Disk seek in parallel
– Multiple data requests probably not on same disk
– A set of data is likely to be striped across multiple disks
Disadvantages
No error checking of data
RAID 1
• It is called mirroring because it copies data onto two disk drives
simultaneously
• The automatic duplication of the data means there is little
likelihood of data loss or system downtime
• This level provides 100% redundancy in case of failure
• Only half of the total disk space is used to store data. The other
half is just a mirror of the already stored data
• Data can be read from either disk but is written on both disks
Advantage
1. Fault tolerance
2. Used to store systems software and other highly critical files
Disadvantage
1. Cost. Since data is duplicated, storage costs increase
RAID 2
• In RAID 2 mechanism, all disks participate in the execution of
every I/O request
• Data striping is used, at the bit level
• This level stores an Error Correcting Code (ECC), typically a
Hamming code, for the data striped across the disks
• The data is stored on one set of disks and the ECC is stored on
another set of disks
• This level has a complex structure, high cost, and high overhead.
Hence it is not used commercially or implemented in practice
RAID 3
• Data is divided into byte units and written across multiple disk
drives
• Parity information is stored for each disk section and written to a
dedicated parity drive
• All disks can be accessed in parallel
• Data can be transferred in bulk. Thus high speed data
transmission is possible
• In case of drive failure, the parity drive is accessed and data is
reconstructed from the remaining devices
• Once the failed drive is replaced, the missing data can be restored
on the new drive
• RAID 3 can provide very high data transfer rates
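The reconstruction step above can be illustrated with a short sketch. It assumes the parity is the byte-wise XOR of the data blocks, which is the usual choice but is not stated on the slide; `xor_blocks` and `rebuild_missing` are hypothetical helper names.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks (assumed parity function)."""
    if not blocks:
        raise ValueError("need at least one block")
    size = len(blocks[0])
    if any(len(block) != size for block in blocks):
        raise ValueError("all blocks must have the same length")
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def rebuild_missing(surviving_blocks, parity):
    """Recover the single lost block from the survivors and the parity."""
    return xor_blocks(list(surviving_blocks) + [parity])


# Three data drives plus one dedicated parity drive.
data = [b"\x0f\xa0", b"\x33\x0c", b"\x55\xff"]
parity = xor_blocks(data)

# Drive 1 fails; its contents are rebuilt from drives 0, 2 and the parity.
recovered = rebuild_missing([data[0], data[2]], parity)
```

XOR works here because XOR-ing a value with itself cancels it out, so combining the parity with every surviving block leaves exactly the missing one. Only a single failed drive can be rebuilt this way.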
RAID 4
• In this level, an entire block of data is written onto data disks and
then the parity is generated and stored on a different disk
• Note that level 3 uses byte-level striping, whereas level 4 uses
block-level striping
• Both level 3 and level 4 require at least three disks to implement
RAID
• It consists of block-level striping with a parity disk
RAID 5
• RAID 5 is a modification of RAID level 4
• RAID 5 writes whole data blocks onto different disks, but the
parity bits generated for data block stripe are distributed among
all the data disks rather than storing them on a different dedicated
disk
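How the parity moves from disk to disk can be sketched as follows. The rotation rule used here (parity starts on the last disk and shifts one disk to the left per stripe) is only one of several possible layouts and is an assumption for illustration; `raid5_stripe_layout` is a hypothetical name.

```python
def raid5_stripe_layout(stripe, num_disks):
    """Return what each disk holds for one stripe: 'P' for the parity block,
    'D0', 'D1', ... for data blocks. The rotation rule is an assumption."""
    if num_disks < 3:
        raise ValueError("RAID 5 needs at least 3 disks")
    parity_disk = (num_disks - 1 - stripe) % num_disks
    layout = []
    data_index = 0
    for disk in range(num_disks):
        if disk == parity_disk:
            layout.append("P")
        else:
            layout.append(f"D{data_index}")
            data_index += 1
    return layout


# Layout of the first three stripes on a 3-disk array.
first_stripes = [raid5_stripe_layout(s, 3) for s in range(3)]
```

With three disks the parity block sits on disk 2, then disk 1, then disk 0, and the pattern repeats. No single disk becomes the parity bottleneck that the dedicated parity disk is in RAID 4.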
RAID 6
• RAID 6 is an extension of RAID level 5
• In this level, two independent parities are generated and stored in
distributed fashion among multiple disks
• Two parities provide additional fault tolerance
• This level requires at least four disk drives to implement RAID.
The factors to be taken into account in choosing a RAID level
are:
1. Performance requirements in terms of number of I/O
operations
2. Performance when a disk has failed
3. Performance during rebuild
FILE ORGANIZATION
Introduction
• A file organization is a method of arranging records in a file when the
file is stored on disk
• File organization is a logical relationship among various records.
• This method defines how file records are mapped onto disk blocks.
• File organization is used to describe the way in which the records
are stored in terms of blocks, and the blocks are placed on the
storage medium.
• A file is organized logically as a sequence of records
• Record is a sequence of fields
• 2 types of records
1. Fixed Length Record
2. Variable Length Record
Objective
• Records should be selectable as quickly as possible (optimal
selection of records).
• Insert, delete, and update operations on the records
should be quick and easy.
• Insert, update, and delete operations should not introduce
duplicate records.
• Records should be stored efficiently, to minimize the cost of
storage.
Types
[Figure: variable-length record layout. Each variable-length attribute is described by its starting address and length, and a null bitmap shows which attributes of the record have a null value.]
ORGANIZATION OF RECORDS
IN FILE
Introduction
There are three commonly used approaches to organizing records
in a file:
1. Sequential File Organization
2. Heap File Organization
3. Hash File Organization
Sequential File Organization
• The sequential file organization is a simple file organization
method in which the records are stored based on the search key
value
• This method is the easiest method for file organization
• In this method, records are stored sequentially
• This method can be implemented in two ways:
1. Pile File Method
2. Sorted File Method
Pile File Method
• It is quite a simple method. In this method, we store the records
in a sequence, i.e., one after another. The records are
stored in the order in which they are inserted into the table.
• To update or delete a record, the record is first
searched for in the memory blocks. When it is found, it
is marked for deletion, and (for an update) the new record is inserted.
Insertion of new record
Suppose we have a sequence of records R1, R3, and so on up to R9 and R8.
A record is simply a row in the table. Suppose
we want to insert a new record R2 in the sequence, then it will be
placed at the end of the file.
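A minimal sketch of the pile file behaviour described above, assuming records are kept in an in-memory list that stands in for the file. The class `PileFile` and its method names are hypothetical.

```python
class PileFile:
    """Unordered sequential file: records are kept in arrival order."""

    def __init__(self):
        self._slots = []  # each slot is [record, deleted_flag]

    def insert(self, record):
        # New records always go to the end of the file.
        self._slots.append([record, False])

    def delete(self, record):
        # Linear search; the record is only marked as deleted.
        for slot in self._slots:
            if slot[0] == record and not slot[1]:
                slot[1] = True
                return True
        return False

    def update(self, old_record, new_record):
        # Mark the old version as deleted and append the new version.
        if self.delete(old_record):
            self.insert(new_record)
            return True
        return False

    def records(self):
        return [record for record, deleted in self._slots if not deleted]


pile = PileFile()
for name in ["R1", "R3", "R9", "R8"]:
    pile.insert(name)
pile.insert("R2")  # R2 is placed after R8, not between R1 and R3
```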
Sorted File Method
• In this method, the new record is always inserted at the file's end,
and then the file is sorted in ascending or descending order.
Sorting of records is based on the primary key or any other key.
• When a record is modified, the record is updated first
and then the file is sorted again, so that the updated record ends up in the
right place.
Insertion of new record
Suppose there is a preexisting sorted sequence of records R1, R3,
and so on up to R6 and R7. Suppose a new record R2 has to be
inserted in the sequence, then it will be inserted at the end of the file,
and then the sequence will be sorted.
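The insertion just described can be sketched as below, assuming integer keys held in a Python list that stands in for the file. `insert_sorted` is a hypothetical helper.

```python
def insert_sorted(records, new_key):
    """Sorted file insertion as described: append at the end, then re-sort."""
    records.append(new_key)  # step 1: the new record goes to the end
    records.sort()           # step 2: the whole sequence is sorted again
    return records


# Keys 1, 3, 6, 7 stand for records R1, R3, R6, R7.
sorted_file = insert_sorted([1, 3, 6, 7], 2)
```

Re-sorting after every insertion is what makes this method costly. In Python, `bisect.insort` from the standard library places a key directly at its sorted position in a list without a full re-sort.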
Pros of sequential file organization
• It is a fast and efficient method for handling huge amounts of data.
• In this method, files can be easily stored on cheaper storage
media such as magnetic tapes.
• It is simple in design. It does not require much effort to store the data.
• This method is used when most of the records have to be accessed
like grade calculation of a student, generating the salary slip, etc.
• This method is used for report generation or statistical calculations.
Cons of sequential file organization
• It wastes time, because we cannot jump directly to the record that is
required; we have to move through the file sequentially.
• Sorted file method takes more time and space for sorting the
records.
Heap File Organization
• It is the simplest and most basic type of organization.
• It works with data blocks.
• In heap file organization, the records are inserted at the file's end.
When the records are inserted, it doesn't require the sorting and
ordering of records.
• When the data block is full, the new record is stored in some
other block. This new data block need not be the very next
data block; the DBMS can select any data block in the memory to store
new records. The heap file is also known as an unordered file.
• In the file, every record has a unique id, and every page in a file
is of the same size. It is the DBMS's responsibility to store and
manage the new records.
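A small sketch of heap file insertion and search, assuming fixed-capacity blocks modelled as Python lists. The class `HeapFile` and the default `block_capacity` are hypothetical choices for illustration.

```python
class HeapFile:
    """Unordered file made of fixed-capacity data blocks."""

    def __init__(self, block_capacity=2):
        self.block_capacity = block_capacity
        self.blocks = []  # each block is a list of (record_id, data) pairs

    def insert(self, record_id, data):
        # Use any block that still has free space; no ordering is kept.
        for block in self.blocks:
            if len(block) < self.block_capacity:
                block.append((record_id, data))
                return
        # Every block is full: allocate a new block for the record.
        self.blocks.append([(record_id, data)])

    def search(self, record_id):
        # No ordering and no index, so every block may have to be scanned.
        for block in self.blocks:
            for rid, data in block:
                if rid == record_id:
                    return data
        return None


heap = HeapFile(block_capacity=2)
heap.insert(7, "Asha")
heap.insert(3, "Ravi")
heap.insert(9, "Meena")  # first block is full, so a second block is used
```

Insertion is cheap because nothing is sorted, while `search` shows why lookups become slow as the file grows.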
Pros of Heap file organization
• It is a very good method of file organization for bulk insertion. If
a large amount of data needs to be loaded into the
database at one time, then this method is best suited.
• In case of a small database, fetching and retrieving of records is
faster than with sequential file organization.
Cons of Heap file organization
• This method is inefficient for large databases because it takes
time to search for or modify a record.
Hash File Organization
• Hash File Organization uses the computation of hash function on
some fields of the records.
• The hash function's output determines the location of disk block
where the records are to be placed.
• When a record has to be retrieved using the hash key columns, the
address is generated, and the whole record is retrieved using that
address.
• In the same way, when a new record has to be inserted, then the
address is generated using the hash key and record is directly
inserted. The same process is applied in the case of delete and
update.
• In this method, there is no need to search or sort the entire
file. Records are not kept in any order; each record is placed in the
block that its hash value selects.
INDEXING AND HASHING
Introduction
• An index is a data structure that organizes data records on the
disk to make the retrieval of data efficient
• The search key for an index is a collection of one or more fields of
the records, using which we can efficiently retrieve the data that
satisfies the search conditions
• Indexes are required to speed up search operations on a file
of records
• 2 types of indices:
1. Ordered Indices
2. Hash Indices
Ordered Indices: indexing is based on a sorted ordering of the search-key values
Hash Indices: indexing is based on a uniform distribution of values
across a range of buckets. The address of a bucket is obtained using
the hash function
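As a rough illustration of an ordered index, the sketch below keeps (search key, record pointer) pairs sorted and uses binary search for lookups. The class `OrderedIndex` and the use of block numbers as record pointers are assumptions made for the example.

```python
import bisect


class OrderedIndex:
    """Index entries kept in sorted order of the search key."""

    def __init__(self, entries):
        # entries: iterable of (search_key, record_pointer) pairs
        self._entries = sorted(entries)
        self._keys = [key for key, _pointer in self._entries]

    def lookup(self, key):
        """Binary search for an exact key; returns its pointer or None."""
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._entries[i][1]
        return None

    def range(self, low, high):
        """Pointers of all entries with low <= key <= high, in key order."""
        start = bisect.bisect_left(self._keys, low)
        stop = bisect.bisect_right(self._keys, high)
        return [pointer for _key, pointer in self._entries[start:stop]]


index = OrderedIndex([(30, 2), (10, 0), (20, 1)])
```

Because the keys are sorted, range queries such as `index.range(10, 20)` are cheap. A hash index does not offer this, since its buckets keep no ordering.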
Several techniques
HASHING
Introduction
• In this method, data is stored in data blocks whose
addresses are generated by a hash function
• The memory locations where these records are stored are called
data blocks or buckets
• The hash function can use any column value to generate
the address
• Typically the primary key is used to generate the hash index, i.e., the address of the
data block
• The hash function can range from a simple mathematical function to a
complex one
Eg: hash function h(key) = key % 5

Computation     Bucket   Key stored
25 % 5 = 0      0        25
31 % 5 = 1      1        31
42 % 5 = 2      2        42
63 % 5 = 3      3        63
49 % 5 = 4      4        49
33 % 5 = 3      Bucket 3 already holds 63, so overflow occurs
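The example above can be reproduced with a short sketch. It assumes each bucket holds exactly one key, as in the example; `insert_all` is a hypothetical helper name.

```python
def insert_all(keys, num_buckets=5):
    """Place each key in bucket key % num_buckets, one key per bucket.
    Keys that hash to an occupied bucket are reported as overflows."""
    buckets = [None] * num_buckets
    overflow = []
    for key in keys:
        address = key % num_buckets
        if buckets[address] is None:
            buckets[address] = key
        else:
            overflow.append(key)  # bucket already taken
    return buckets, overflow


buckets, overflow = insert_all([25, 31, 42, 63, 49, 33])
# buckets  -> [25, 31, 42, 63, 49]
# overflow -> [33], because 33 % 5 = 3 and bucket 3 already holds 63
```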
Types of Hashing
Hashing has two types
1. Static Hashing
2. Dynamic Hashing
Static Hashing
• In static hashing, when a search-key value is provided, the hash
function always computes the same address
• Eg:
If a mod 4 hash function is used, then it generates only 4
values (0 to 3)
• The output address shall always be same for that function
• The number of buckets provided remains unchanged at all times
• Operations used in static hashing are:
1. Insertion
2. Search
3. Delete
Advantages:
1. It is simple to implement
2. It allows speedy data storage
Disadvantages:
1. There is a fixed number of buckets. This creates a
problem if the number of records grows or shrinks
2. Ordered access on the hash key is inefficient
Overflow Chaining
• When buckets are full, a new bucket is allocated for the same
hash result and is linked after the previous one. This mechanism
is called Closed Hashing
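Overflow chaining can be sketched as follows, assuming integer keys, a modulo hash function, and a fixed capacity per bucket. The classes `Bucket` and `ChainedHashFile` are hypothetical.

```python
class Bucket:
    def __init__(self, capacity):
        self.capacity = capacity
        self.keys = []
        self.overflow = None  # link to the next bucket in the chain


class ChainedHashFile:
    """Static hashing with overflow chaining (closed hashing)."""

    def __init__(self, num_buckets=5, capacity=1):
        self.capacity = capacity
        self.buckets = [Bucket(capacity) for _ in range(num_buckets)]

    def insert(self, key):
        bucket = self.buckets[key % len(self.buckets)]
        # Walk the chain; allocate and link a new bucket when all are full.
        while len(bucket.keys) >= bucket.capacity:
            if bucket.overflow is None:
                bucket.overflow = Bucket(self.capacity)
            bucket = bucket.overflow
        bucket.keys.append(key)

    def contains(self, key):
        bucket = self.buckets[key % len(self.buckets)]
        while bucket is not None:
            if key in bucket.keys:
                return True
            bucket = bucket.overflow
        return False

    def chain(self, address):
        """Keys of the primary bucket and each overflow bucket, in order."""
        result = []
        bucket = self.buckets[address]
        while bucket is not None:
            result.append(list(bucket.keys))
            bucket = bucket.overflow
        return result


table = ChainedHashFile(num_buckets=5, capacity=1)
for key in [25, 31, 42, 63, 49, 33]:
    table.insert(key)
```

Here 33 hashes to the same address as 63, so it is stored in a new bucket linked after the bucket that holds 63.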
Open Hashing
• The open hashing is a form of static hashing technique
• When a collision occurs, i.e., if the hash function returns an
address that is already occupied by some data record, then the
next available data block is used to store the new record instead of
overwriting the old record
• This technique is also called linear probing
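Eg:
Insertion of record 105 into the hash table with the hash function
h(key) = key mod 10. Since 105 mod 10 = 5 and slots 5 and 6 are already
occupied, the record goes into the next free slot, 7.

Index   Stud_RollNo (before)   Stud_RollNo (after)
0       10                     10
1       1                      1
2       22                     22
3
4
5       55                     55
6       106                    106
7                              105
8       88                     88
9       19                     19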
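The insertion in this example can be traced with a small sketch, assuming the table is a Python list in which `None` marks a free slot. `linear_probe_insert` is a hypothetical helper.

```python
def linear_probe_insert(table, key):
    """Insert key using h(key) = key mod len(table) with linear probing.
    Returns the slot used; raises OverflowError if the table is full."""
    size = len(table)
    home = key % size
    for step in range(size):
        slot = (home + step) % size  # wrap around at the end of the table
        if table[slot] is None:
            table[slot] = key
            return slot
    raise OverflowError("hash table is full")


# State of the table before inserting 105 (None marks a free slot).
table = [10, 1, 22, None, None, 55, 106, None, 88, 19]
slot = linear_probe_insert(table, 105)
# 105 mod 10 = 5; slots 5 and 6 are occupied, so slot 7 is used.
```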
Advantages:
1. It is faster technique
2. It is simple to implement
Disadvantages:
1. It forms clusters, as the record is simply inserted into the next free
available slot
2. If the hash table gets full, subsequent records cannot
be accommodated
Dynamic Hashing
• The problem with static hashing is that it does not expand or
shrink dynamically as the size of the database grows or
shrinks
• Dynamic hashing provides a mechanism in which data buckets are added
and removed dynamically and on demand
• The most commonly used technique of dynamic hashing is
extendible hashing
Extendible Hashing
• Extendible hashing is a dynamic hashing technique in which, if a
bucket overflows, that bucket is split and its data entries are
redistributed; the bucket address table is doubled when necessary
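A compact sketch of extendible hashing is given below. It assumes distinct non-negative integer keys and uses the low-order bits of the key itself as the hash value; the class names, the default bucket capacity, and the choice of low-order bits are all assumptions for illustration, not a prescribed design.

```python
class Bucket:
    def __init__(self, local_depth, capacity):
        self.local_depth = local_depth  # number of hash bits this bucket uses
        self.capacity = capacity
        self.keys = []


class ExtendibleHash:
    """Assumes distinct, non-negative integer keys."""

    def __init__(self, bucket_capacity=2):
        self.capacity = bucket_capacity
        self.global_depth = 1
        # Bucket address table (directory) with 2 ** global_depth entries.
        self.directory = [Bucket(1, bucket_capacity), Bucket(1, bucket_capacity)]

    def _index(self, key):
        # Directory slot = the low global_depth bits of the key.
        return key & ((1 << self.global_depth) - 1)

    def insert(self, key):
        bucket = self.directory[self._index(key)]
        if len(bucket.keys) < bucket.capacity:
            bucket.keys.append(key)
            return
        # Overflow: double the directory only if this bucket already
        # uses all global_depth bits, then split the bucket.
        if bucket.local_depth == self.global_depth:
            self.directory = self.directory + self.directory
            self.global_depth += 1
        self._split(bucket)
        self.insert(key)  # retry; may trigger a further split

    def _split(self, bucket):
        bucket.local_depth += 1
        new_bucket = Bucket(bucket.local_depth, self.capacity)
        new_bit = 1 << (bucket.local_depth - 1)
        # Slots with the newly used bit set now point to the new bucket.
        for i, entry in enumerate(self.directory):
            if entry is bucket and (i & new_bit):
                self.directory[i] = new_bucket
        # Redistribute the entries of the split bucket.
        old_keys, bucket.keys = bucket.keys, []
        for k in old_keys:
            self.directory[self._index(k)].keys.append(k)

    def contains(self, key):
        return key in self.directory[self._index(key)].keys


table = ExtendibleHash(bucket_capacity=2)
for key in [0, 4, 8, 1, 3]:
    table.insert(key)
```

Only the overflowing bucket is split. The other directory entries keep pointing at their existing buckets, so most of the data stays where it is when the file grows.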
Difference between Static and Dynamic Hashing

S. No.  Static Hashing                           Dynamic Hashing
3.      Open hashing and closed hashing are      Extendible hashing and linear hashing
        forms of static hashing                  are forms of dynamic hashing
4.      Space overhead is more                   Minimum space overhead due to
                                                 dynamic nature
5.      As the file grows, the performance of    There is no degradation in
        the static hash function decreases       performance when the file grows
6.      The bucket address table is not          The bucket address table is required
        required
7.      The bucket is directly accessed          The bucket address table is used to
                                                 access the bucket
THANK YOU!!!