sqlIndexes2
Sparse Index
In a sparse index, index records are not created for every search key. An index record here contains a search key and a pointer to the actual data on the disk. To search for a record, we first follow the index record to reach the approximate location of the data. If the data we are looking for is not at the location we reach by following the index, the system starts a sequential search from there until the desired data is found.
Introduction to Indexing
The main goal of designing a database is faster access to any data in it and quicker inserts, deletes and updates, because no one likes waiting. When a database is very large, even the smallest transaction takes time to perform. In order to reduce the time spent in transactions, indexes are used. Indexes are similar to the catalogue in a library or to the index of a book: they make our search simpler and quicker. The same concept is applied in a DBMS to access the records stored in files.
When records are stored in primary memory such as RAM, accessing them is very easy and quick. But records are far too many to keep in RAM; they are huge in number, so we have to store them in secondary memory such as a hard disk. As we have already seen, records are not stored in memory the way we see them – as tables. They are stored in the form of files in different data blocks, and each block is capable of storing one or more records depending on its size.
When we have to retrieve some required data or perform a transaction on it, we have to pull it from secondary memory, perform the transaction and save it back. In order to do all these activities, we need a link between the records and the data blocks so that we know where each record is stored. This link between the records and the data blocks is called an index. It acts like a bridge between the records and the data blocks.
Most of the time, we use the primary key to create the index. Sometimes we have to fetch records based on other columns of the table which are not the primary key. In such cases we create an index on those columns.
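For instance – a minimal sketch, assuming a hypothetical student table with a last_name column – such an index can be created with standard SQL:

CREATE INDEX idx_student_lastname
on student (last_name);

Queries that filter on last_name can then reach the matching records through this index instead of scanning every data block.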
Ordered Indices
Imagine we have a student table with thousands of records, each of which is 10 bytes long. Imagine their IDs start from 1, 2, 3… and go on, and we have to search for the student with ID 678. In a normal database with no index, the disk blocks are searched from the beginning until record 678 is reached, so the DBMS will reach this record only after reading 677 * 10 = 6770 bytes. But if we have an index on the ID column, then the address of each record's location is stored as (1, 200), (2, 201) … (678, 877) and so on. One can imagine it as a smaller table with an index column and an address column. Now if we want to search for the record with ID 678, it is searched using the index, i.e. only 677 * 2 = 1354 bytes are traversed, which is far less than before. Hence retrieving the record from the disk becomes faster. In most cases these indexes are kept sorted to make searching faster. If the indexes are sorted, they are called ordered indices.
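As a small sketch, assuming a hypothetical student table whose ID column is not already indexed, the index from this example can be created and used like this:

CREATE INDEX idx_student_id
on student (id);

-- this point lookup can now use the index instead of
-- reading every disk block from the beginning
SELECT * FROM student WHERE id = 678;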
Primary Index
If the index is created on the primary key of the table, it is called primary indexing. Since primary keys are unique to each record and have a 1:1 relation with the records, it is much easier to fetch a record using them. Also, the primary keys are kept in sorted order, which helps the performance of transactions.
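A minimal sketch of such a table is shown below. In most database systems, declaring a primary key automatically creates an index on it, so no separate CREATE INDEX statement is needed for that column; the remaining columns are hypothetical and only serve the later examples.

CREATE TABLE student (
    id         INT PRIMARY KEY,   -- indexed automatically by most engines
    first_name VARCHAR(50),
    last_name  VARCHAR(50),
    age        INT,
    city       VARCHAR(50),
    course     VARCHAR(50),
    semester   INT,
    email      VARCHAR(100)
);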
Primary indexing is of two types – Dense Index and Sparse Index.
Dense Index
In this case, an index is created on the primary key as well as on the other columns on which we perform transactions. That means the user can fire queries not only on the primary key column; he can query on any column in the table according to his requirement. Creating an index only on the primary key will not help in this case, hence an index on each of the search key columns is stored. This method is called a dense index.
For example, a student can be searched based on his ID, which is the primary key. In addition, we may search for a student by his first name, last name, a particular age group, place of residence, the course he has opted for, and so on. That means most of the columns in the table can be used to search for a student based on different criteria. But if we only have an index on the ID, these other searches will not be efficient. Hence indexes on the other search columns are also stored to make those fetches faster.
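As a sketch, assuming the hypothetical student table above, this simply means creating an index for every column that is commonly searched on, in addition to the one on last_name shown earlier:

CREATE INDEX idx_student_firstname on student (first_name);
CREATE INDEX idx_student_age on student (age);
CREATE INDEX idx_student_city on student (city);
CREATE INDEX idx_student_course on student (course);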
Though this addresses quick search on any search key, the space used for the indexes and addresses becomes an overhead in memory. Here the (index, address) mapping becomes almost as large as (table records, address) itself. Hence more and more space is consumed to store the indexes as the number of records increases.
Sparse Index
Let us see how the above example of a dense index is converted into a sparse index. As described earlier, in a sparse index an index record is not created for every search key, so the (index, address) mapping stays much smaller; a search follows the nearest index entry and then scans the data sequentially until the required record is found.
Secondary Index
In sparse indexing, as the table size grows, the (index, address) mapping file also grows. These mappings are usually kept in primary memory so that fetching an address is fast, and the actual data is then read from secondary memory based on the address obtained from the mapping. If the size of this mapping grows, fetching the address itself becomes slower, and the sparse index is no longer efficient. In order to overcome this problem, the next level of sparse indexing is introduced, i.e. secondary indexing.
In this method, another level of indexing is introduced to reduce the size of the (index, address) mapping. Initially, large ranges of the column values are selected so that the first level of the mapping stays small. Each range is then further divided into smaller ranges. The first level of the mapping is stored in primary memory so that address fetches are faster. The second level of the mapping and the actual data are stored in secondary memory – the hard disk.
In the above diagram, we can see that the column values are first divided into groups of 100. These groups are stored in primary memory. In secondary memory, these groups are further divided into sub-groups, and the actual data records are then stored in physical memory. Notice that each address in the first-level index points to the first address of a block in the second level, and each second-level index address points to the first address of a data block. If we have to search for any value in between these entries, the corresponding addresses are looked up in the first and second levels respectively; then the data block at that address is scanned linearly until the data is found.
For example, to search for 111 in the above diagram, the first-level index is searched for the largest entry that is less than or equal to 111, which gives 100. Then, at the second level, the largest entry less than or equal to 111 is found again, which gives 110. Now it goes to the data block at address 110 and scans each record until it gets 111. This is how a search is done in this method. Inserts, deletes and updates are done in the same manner.
Clustering Index
In some cases, the index is created on non-primary-key columns, which may not be unique for each record. In such cases, in order to identify the records faster, we group two or more columns together to obtain unique values and create an index on them. This method is known as a clustering index. Basically, records with similar characteristics are grouped together, and indexes are created for these groups.
For example, students studying in each semester are grouped together: 1st semester students, 2nd semester students, 3rd semester students, and so on. In the above diagram we can see that an index entry is created for each semester in the index file. In the data blocks, the students of each semester are grouped together to form a cluster, and the address in the index file points to the beginning of each cluster. Within the data blocks, the requested student ID is then searched sequentially.
New records are inserted into the clusters based on their group. In the above case, if a new student joins the 3rd semester, his record is inserted into the semester 3 cluster in secondary memory. The same is done for updates and deletes.
If any cluster runs short of space, new data blocks are added to that cluster. This method of file organization is better than the other methods as it provides a clean distribution of records, hence making search easier and faster. But in each cluster there may be some unused space left over, so it takes more memory compared to other methods.
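A rough sketch of this idea in SQL, assuming the hypothetical student table above, is a composite index that groups the semester and ID columns together. Whether the rows themselves are physically clustered by this index depends on the database engine; some systems, such as SQL Server, provide an explicit clustered index for that.

CREATE INDEX idx_student_sem_id
on student (semester, id);

A search for a particular student of a particular semester can then go to that semester's group in the index first and look up the ID within it.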
Unique Indexes
Unique indexes are used not only for performance, but also for data integrity. A
unique index does not allow any duplicate values to be inserted into the table. The
basic syntax is as follows.
CREATE UNIQUE INDEX index_name
on table_name (column_name);
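-- an existing index can later be removed with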
DROP INDEX index_name;
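As a concrete sketch, assuming the hypothetical student table above with its email column, a unique index rejects any insert that would duplicate an already indexed value:

CREATE UNIQUE INDEX idx_student_email
on student (email);

INSERT INTO student (id, email) VALUES (1, 'anil@example.com');  -- succeeds
INSERT INTO student (id, email) VALUES (2, 'anil@example.com');  -- fails: duplicate email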