DBMS Unit 5
Lecture by
K Chandrasena Chary
Associate Professor & TPO
Department of CSE,
Sree Chaitanya Institute of
Technological Sciences,
Karimnagar, Telangana
Mobile:9885744643
Mail: chandu518@gmail.com
Syllabus
UNIT – IV Course Overview:
FILE ORGANIZATION & INDEXING
Introduction to File organization
Different Types of File organizations
Introduction to Indexing
Primary & secondary indexes
Index Data structures
Tree & Hash based Indexing
File Organization in DBMS
1. A file is a collection of records. Using the primary key, we can access the records. The
type of file organization used for a given set of records determines the type and
frequency of access it supports.
2. File organization is a logical relationship among various records.
3. It defines how file records are mapped onto disk blocks.
4. File organization describes the way in which the records are stored in terms of
blocks, and how the blocks are placed on the storage medium.
5. The first approach to mapping the database to files is to use several files and store
records of only one fixed length in any given file.
6. An alternative approach is to structure our files so that they can hold records of
multiple lengths.
7. Files of fixed length records are easier to implement than the files of variable length
records.
Types of file organization:
There are several methods of file organization. Each method has pros and
cons depending on how the records are accessed or selected.
2. Hash Clusters:
It is similar to the indexed cluster. In a hash cluster, instead of storing the records based on the
cluster key, we generate a hash key value from the cluster key and store the records
with the same hash key value together.
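The grouping described above can be sketched in a few lines of Python. This is only an illustration: the bucket count, the modulo hash function, and the sample rows are assumptions made for the example, not part of the lecture.

```python
from collections import defaultdict

NUM_BUCKETS = 4  # hypothetical number of hash buckets


def hash_key(cluster_key: int) -> int:
    """Hypothetical hash function: cluster key modulo the bucket count."""
    return cluster_key % NUM_BUCKETS


# Made-up rows of the form (cluster key, payload).
rows = [(10, "Alice"), (14, "Bob"), (11, "Carol"), (18, "Dave")]

# Records whose cluster keys hash to the same value are stored together.
cluster = defaultdict(list)
for dept_id, name in rows:
    cluster[hash_key(dept_id)].append((dept_id, name))

for bucket_no in sorted(cluster):
    print(bucket_no, cluster[bucket_no])
```

Keys 10, 14 and 18 all hash to 2 here, so their records end up in the same bucket even though the cluster keys differ.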
Pros of Cluster file organization
1. The cluster file organization is used when there are frequent requests for
joining the tables with the same joining condition.
2. It gives efficient results when there is a 1:M mapping between the
tables.
Index structure
An index has two columns:
1. The first column is the search key, which contains a copy of the primary key
or candidate key of the table. The search key values are stored in sorted order so
that the corresponding data can be accessed easily.
2. The second column is the data reference. It contains a set of pointers
holding the address of the disk block where the value of the particular key can be found.
Indexing Methods
Ordered indices
The indices are usually sorted to make searching faster. The indices which are sorted are known as
ordered indices.
Example: Suppose we have an employee table with thousands of records, each of which is 10
bytes long. The IDs start with 1, 2, 3, ... and so on, and we have to search for the employee with ID 543.
In the case of a database with no index, we have to search the disk blocks from the start until we reach
543. The DBMS will read the record after reading 543*10 = 5430 bytes. In the case of an index
(assuming each index entry is 2 bytes long), the DBMS will read the record after reading
542*2 = 1084 bytes, which is far less than in the previous case.
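The arithmetic in this example can be checked with a short Python sketch. The 2-byte index entry size is an assumption implied by the 542*2 figure; the variable names are made up for illustration.

```python
RECORD_SIZE = 10       # bytes per data record (from the example)
INDEX_ENTRY_SIZE = 2   # bytes per index entry (assumption implied by 542*2)
TARGET_ID = 543

# Without an index: read records 1 through 543 sequentially.
bytes_without_index = TARGET_ID * RECORD_SIZE

# With an index: skip past the 542 index entries that precede ID 543.
bytes_with_index = (TARGET_ID - 1) * INDEX_ENTRY_SIZE

print(bytes_without_index)  # 5430
print(bytes_with_index)     # 1084
```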
Primary Index
If the index is created on the basis of the primary key of the table, then it is known as primary
indexing. These primary keys are unique to each record and have a 1:1 relation with the
records. As primary keys are stored in sorted order, the search operation is
quite efficient. The primary index can be classified into two types:
Dense index and Sparse index.
Dense Index
The dense index contains an index record for every search key value in the data file. It makes
searching faster. Here, the number of records in the index table is the same as the number of
records in the main table. It needs more space to store the index records themselves. The index records
hold the search key and a pointer to the actual record on the disk.
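A minimal Python sketch of a dense index follows. The sample records are made up, and a list position stands in for a disk address; both are assumptions for illustration only.

```python
import bisect

# Data file: records sorted by search key. The list position stands in
# for a disk address (an assumption for this sketch).
data_file = [(101, "Asha"), (105, "Ravi"), (109, "Meena"), (112, "Kiran")]

# Dense index: one (search key, pointer) entry for every record.
dense_index = [(key, pos) for pos, (key, _) in enumerate(data_file)]
index_keys = [key for key, _ in dense_index]


def dense_lookup(key):
    """Binary-search the index, then follow the pointer to the record."""
    i = bisect.bisect_left(index_keys, key)
    if i < len(index_keys) and index_keys[i] == key:
        _, pos = dense_index[i]
        return data_file[pos]
    return None


print(dense_lookup(109))
print(dense_lookup(110))
```

Because every key has its own entry, a lookup that misses in the index can stop without touching the data file at all.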
Sparse index
In a sparse index, an index record appears only for some of the items in the data file. Each
index record points to a block. Instead of pointing to every record in the main table, the index
points to records in the main table at intervals.
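A sparse index lookup can be sketched as follows, assuming one index entry per block that holds the first key of that block. The block layout and sample records are hypothetical.

```python
import bisect

# Data file split into blocks; records are sorted by search key.
blocks = [
    [(101, "Asha"), (105, "Ravi")],
    [(109, "Meena"), (112, "Kiran")],
    [(120, "Sunil"), (127, "Latha")],
]

# Sparse index: one entry per block, holding the first key in that block.
sparse_index = [block[0][0] for block in blocks]


def sparse_lookup(key):
    """Find the block that could hold the key, then scan inside it."""
    # Last index entry whose key is <= the search key.
    b = bisect.bisect_right(sparse_index, key) - 1
    if b < 0:
        return None
    for k, value in blocks[b]:
        if k == key:
            return (k, value)
    return None


print(sparse_lookup(112))
print(sparse_lookup(106))
```

The index holds three entries for six records, at the cost of a short scan inside the chosen block.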
Clustering Index
A clustered index can be defined as an ordered data
file. Sometimes the index is created on non-primary
key columns which may not be unique for each
record.
Suppose we want to insert a new record into the file, but the data bucket address generated
by the hash function is not empty, that is, data already exists at that address. This situation in
static hashing is known as bucket overflow. This is a critical situation in this method.
To overcome this situation, there are various methods. Some commonly used methods are
as follows:
1. Open Hashing
When a hash function generates an address at which data is already
stored, the next available bucket is allocated to the record. This mechanism is called Linear
Probing.
For example: suppose R3 is a new record which needs to be inserted, and the hash function
generates address 112 for R3. But the generated address is already full. So the system
searches for the next available data bucket, 113, and assigns R3 to it.
2. Closed Hashing
When a bucket is full, a new data bucket is allocated for the same hash result and is
linked after the previous one. This mechanism is known as Overflow chaining.
For example: Suppose R3 is a new record which needs to be inserted into the table, and the hash
function generates address 110 for it. But this bucket is too full to store the new data. In this
case, a new bucket is added after bucket 110 and is linked to it.
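The two overflow-handling methods above can be compared in a small Python sketch. The bucket capacity of one record, the four bucket addresses, and the existing records R1 and R2 are assumptions chosen to reproduce the R3 examples; the function names are hypothetical.

```python
BUCKET_CAPACITY = 1               # hypothetical: one record per bucket
ADDRESSES = [110, 111, 112, 113]  # hypothetical bucket addresses


def insert_linear_probing(buckets, home, record):
    """Open hashing: if the home bucket is full, try the following buckets."""
    start = ADDRESSES.index(home)
    for step in range(len(ADDRESSES)):
        addr = ADDRESSES[(start + step) % len(ADDRESSES)]
        if len(buckets[addr]) < BUCKET_CAPACITY:
            buckets[addr].append(record)
            return addr
    raise RuntimeError("all buckets are full")


def insert_overflow_chaining(chains, home, record):
    """Closed hashing: if the chain's last bucket is full, link a new one."""
    chain = chains[home]  # list of buckets; chain[0] is the primary bucket
    if len(chain[-1]) >= BUCKET_CAPACITY:
        chain.append([])  # new overflow bucket linked after the previous one
    chain[-1].append(record)
    return len(chain) - 1  # position of the bucket within the chain


# Linear probing: R3 hashes to 112, which is full, so it lands in 113.
probe_buckets = {110: ["R1"], 111: [], 112: ["R2"], 113: []}
probe_addr = insert_linear_probing(probe_buckets, 112, "R3")

# Overflow chaining: R3 hashes to 110, which is full, so a bucket is chained on.
chains = {110: [["R1"]], 111: [[]], 112: [["R2"]], 113: [[]]}
chain_pos = insert_overflow_chaining(chains, 110, "R3")

print(probe_addr, probe_buckets)
print(chain_pos, chains[110])
```

With linear probing the record moves to a different address, so a search must also probe forward; with overflow chaining the record stays reachable from its home address through the chain.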
Dynamic Hashing
1. The dynamic hashing method is used to overcome the problems of static hashing like
bucket overflow.
2. In this method, data buckets grow or shrink as the number of records increases or decreases. This
method is also known as the Extendable hashing method.
3. This method makes hashing dynamic, i.e., it allows insertion or deletion without
resulting in poor performance.
How to search a key
1. First, calculate the hash address of the key.
2. Check how many bits are used in the directory. Call this number of bits i.
3. Take the least significant i bits of the hash address.
4. This gives an index into the directory. Using this index, go to the directory and find the
bucket address where the record might be.
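The search steps above reduce to a bit mask. The sketch below assumes the hash address is already available as an integer; the function name and the sample directory are hypothetical, with bucket names chosen to match B0 to B3.

```python
def directory_index(hash_address: int, i: int) -> int:
    """Return the least significant i bits of the hash address."""
    return hash_address & ((1 << i) - 1)


# Hypothetical directory using i = 2 bits, so it has 2**2 = 4 entries.
directory = ["B0", "B1", "B2", "B3"]

hash_address = 0b10111                 # made-up hash address
idx = directory_index(hash_address, 2)  # last two bits are 11
print(idx, directory[idx])
```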
How to insert a new record
Firstly, you have to follow the same procedure for retrieval, ending up in some bucket.
If there is still space in that bucket, then place the record in it. If the bucket is full,
then we will split the bucket and redistribute the records.
For example: Consider the following grouping of keys into buckets, depending on the
last two bits of their hash address:
The last two bits of the hash addresses of keys 2 and 4 are 00, so they go into bucket B0. The last two
bits for keys 5 and 6 are 01, so they go into bucket B1. The last two bits for keys 1 and 3
are 10, so they go into bucket B2. The last two bits for key 7 are 11, so it goes
into B3.
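The insert-and-split procedure can be sketched in Python as below. This is a simplified illustration under stated assumptions, not a definitive implementation: integer keys serve as their own hash address (so the resulting grouping differs from the example above), the bucket capacity is 2, and the class names and the local depth bookkeeping are choices made for this sketch.

```python
class Bucket:
    def __init__(self, local_depth, capacity):
        self.local_depth = local_depth  # number of bits this bucket depends on
        self.capacity = capacity
        self.items = {}                 # key -> value


class ExtendibleHash:
    def __init__(self, capacity=2):
        self.global_depth = 1           # the "i" bits used by the directory
        self.capacity = capacity
        self.directory = [Bucket(1, capacity), Bucket(1, capacity)]

    def _hash(self, key):
        return key  # assumption: integer keys are their own hash address

    def _index(self, key):
        # Least significant global_depth bits of the hash address.
        return self._hash(key) & ((1 << self.global_depth) - 1)

    def get(self, key):
        return self.directory[self._index(key)].items.get(key)

    def insert(self, key, value):
        while True:
            bucket = self.directory[self._index(key)]
            if key in bucket.items or len(bucket.items) < bucket.capacity:
                bucket.items[key] = value
                return
            self._split(bucket)  # bucket is full: split, then retry

    def _split(self, bucket):
        if bucket.local_depth == self.global_depth:
            # Double the directory; the new half mirrors the old half.
            self.directory = self.directory + self.directory
            self.global_depth += 1
        bucket.local_depth += 1
        sibling = Bucket(bucket.local_depth, self.capacity)
        high_bit = 1 << (bucket.local_depth - 1)
        # Repoint the directory slots whose newly examined bit is 1.
        for i, b in enumerate(self.directory):
            if b is bucket and i & high_bit:
                self.directory[i] = sibling
        # Redistribute the records between the old bucket and its sibling.
        old_items, bucket.items = bucket.items, {}
        for k, v in old_items.items():
            self.directory[self._index(k)].items[k] = v


table = ExtendibleHash(capacity=2)
for k in range(1, 8):
    table.insert(k, f"R{k}")
print(table.global_depth, len(table.directory))
print([table.get(k) for k in range(1, 8)])
```

Only the overflowing bucket is split; the directory doubles only when that bucket was already using every directory bit.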
Advantages of dynamic hashing
1. In this method, the performance does not decrease as the data grows in the system. It
simply increases the amount of memory used to accommodate the data.
2. In this method, memory is well utilized as it grows and shrinks with the data. No
memory is left lying unused.
3. This method is good for the dynamic database where data grows and shrinks frequently.
Disadvantages of dynamic hashing
1. In this method, if the data size increases, then the number of buckets also increases. The
addresses of the data are maintained in the bucket address table.
2. This is because the data addresses keep changing as buckets grow and shrink.
3. If there is a huge increase in data, maintaining the bucket address table becomes tedious.
4. The bucket overflow situation can also occur in this method, but it takes longer to
reach this situation than with static hashing.
Wish You All The Best