Unit-6 DBMS - Indexing
Database System Concepts - 5th Edition 11.2 ©Silberschatz, Korth and Sudarshan
Physical Storage Media
Physical Storage Media (Cont.)
Flash memory
NOR Flash
Fast reads, very slow erase, lower capacity
Used to store program code in many
embedded devices
NAND Flash
Page-at-a-time read/write, multi-page erase
High capacity (several GB)
Widely used as data storage mechanism in
portable devices
Physical Storage Media (Cont.)
Magnetic-disk
Data is stored on spinning disk, and read/written
magnetically
Primary medium for the long-term storage of data;
typically stores entire database.
Data must be moved from disk to main memory for
access, and written back for storage
direct-access – possible to read data on disk in any
order, unlike magnetic tape
Survives power failures and system crashes
disk failure can destroy data: is rare but does
happen
Physical Storage Media (Cont.)
Optical storage
non-volatile, data is read optically from a spinning
disk using a laser
CD-ROM (640 MB) and DVD (4.7 to 17 GB) most
popular forms
Write-once, read-many (WORM) optical disks used for
archival storage (CD-R, DVD-R, DVD+R)
Multiple write versions also available (CD-RW, DVD-
RW, DVD+RW, and DVD-RAM)
Reads and writes are slower than with magnetic
disk
Juke-box systems, with large numbers of removable
disks, a few drives, and a mechanism for automatic
loading/unloading of disks available for storing large
volumes of data
Physical Storage Media (Cont.)
Tape storage
non-volatile, used primarily for backup (to
recover from disk failure), and for archival data
sequential-access – much slower than disk
very high capacity (40 to 300 GB tapes
available)
tape can be removed from drive ⇒ storage costs much
cheaper than disk, but drives are expensive
Tape jukeboxes available for storing massive
amounts of data
hundreds of terabytes (1 terabyte = 10^12 bytes)
to even a petabyte (1 petabyte = 10^15 bytes)
Storage Hierarchy
Storage Hierarchy (Cont.)
primary storage: Fastest media but volatile (cache,
main memory).
secondary storage: next level in hierarchy, non-
volatile, moderately fast access time
also called on-line storage
E.g. flash memory, magnetic disks
tertiary storage: lowest level in hierarchy, non-
volatile, slow access time
also called off-line storage
E.g. magnetic tape, optical storage
Magnetic Hard Disk Mechanism
NOTE: Diagram is schematic, and simplifies the structure of actual disk drives
Magnetic Disks
Read-write head
Positioned very close to the platter surface (almost
touching it)
Reads or writes magnetically encoded information.
Surface of platter divided into circular tracks
Over 50K-100K tracks per platter on typical hard disks
Each track is divided into sectors.
Sector size typically 512 bytes
Typical sectors per track: 500 (on inner tracks) to
1000 (on outer tracks)
To read/write a sector
disk arm swings to position head on right track
platter spins continually; data is read/written as
sector passes under head
Magnetic Disks (Cont.)
Head-disk assemblies
multiple disk platters on a single spindle (1 to 5
usually)
one head per platter, mounted on a common
arm.
Cylinder i consists of ith track of all the platters
Earlier generation disks were susceptible to “head-
crashes” leading to loss of all data on disk
Current generation disks are less susceptible to
such disastrous failures, but individual sectors
may get corrupted
Disk Controller
Disk controller – interfaces between the computer
system and the disk drive hardware.
accepts high-level commands to read or write a
sector
initiates actions such as moving the disk arm to
the right track and actually reading or writing the
data
Computes and attaches checksums to each
sector to verify that data is read back correctly
If data is corrupted, with very high probability
stored checksum won’t match recomputed
checksum
Ensures successful writing by reading back sector
after writing it
Performs remapping of bad sectors
Disk Subsystem
Performance Measures of Disks
Access time – the time it takes from when a read or
write request is issued to when data transfer begins.
Consists of:
Seek time – time it takes to reposition the arm over
the correct track.
Average seek time is 1/2 the worst case seek
time.
– Would be 1/3 if all tracks had the same number
of sectors, and we ignore the time to start and
stop arm movement
4 to 10 milliseconds on typical disks
Rotational latency – time it takes for the sector to
be accessed to appear under the head.
Average latency is 1/2 of the worst case latency.
4 to 11 milliseconds on typical disks (5400 to
15000 r.p.m.)
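These two components can be combined in a short worked example (a Python sketch; the 8 ms seek time and 7200 rpm figures below are assumed for illustration, not taken from the slides):

```python
# Worked example: average disk access time = average seek time
# + average rotational latency (half of one full revolution).

def avg_rotational_latency_ms(rpm):
    """Half of one full revolution, in milliseconds."""
    ms_per_revolution = 60_000 / rpm
    return ms_per_revolution / 2

seek_ms = 8.0                                  # assumed typical seek time
latency_ms = avg_rotational_latency_ms(7200)   # assumed 7200 rpm drive
access_ms = seek_ms + latency_ms

print(round(latency_ms, 2))  # 4.17 ms
print(round(access_ms, 2))   # 12.17 ms
```

Note that both figures fall inside the 4–11 ms ranges quoted above.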
Performance Measures (Cont.)
Data-transfer rate – the rate at which data can be
retrieved from or stored to the disk.
25 to 100 MB per second max rate, lower for inner
tracks
Multiple disks may share a controller, so rate that
controller can handle is also important
E.g. ATA-5: 66 MB/sec, SATA: 150 MB/sec, Ultra
320 SCSI: 320 MB/s
Fibre Channel (FC 2 Gb): 256 MB/s
Performance Measures (Cont.)
Mean time to failure (MTTF) – the average time the
disk is expected to run continuously without any
failure.
Typically 3 to 5 years
Probability of failure of new disks is quite low,
corresponding to a theoretical MTTF of 500,000 to
1,200,000 hours for a new disk
E.g., an MTTF of 1,200,000 hours for a new disk
means that given 1000 relatively new disks, on
average one will fail every 1200 hours
MTTF decreases as disk ages
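The arithmetic behind that 1200-hour claim, as a sketch:

```python
# Across a population of N disks, each with the given MTTF, the
# expected time between failures in the population is MTTF / N.
mttf_hours = 1_200_000
n_disks = 1000

hours_per_failure = mttf_hours / n_disks
print(hours_per_failure)       # 1200.0 hours between failures
print(hours_per_failure / 24)  # 50.0 -> roughly one failure every 50 days
```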
Optical Disks
Compact disk-read only memory (CD-ROM)
Seek time about 100 msec (optical read head is
heavier and slower)
Higher latency (3000 RPM) and lower data-
transfer rates (3-6 MB/s) compared to magnetic
disks
Digital Video Disk (DVD)
DVD-5 holds 4.7 GB , variants up to 17 GB
Slow seek time, for same reasons as CD-ROM
Record once versions (CD-R and DVD-R)
Magnetic Tapes
Hold large volumes of data and provide high transfer rates
Few GB for DAT (Digital Audio Tape) format, 10-40 GB with
DLT (Digital Linear Tape) format, 100 – 400 GB+ with
Ultrium format, and 330 GB with Ampex helical scan
format
Transfer rates from few to 10s of MB/s
Currently the cheapest storage medium
Tapes are cheap, but cost of drives is very high
Very slow access time in comparison to magnetic disks and
optical disks
limited to sequential access.
Some formats (Accelis) provide faster seek (10s of
seconds) at cost of lower capacity
Used mainly for backup, for storage of infrequently used
information, and as an off-line medium for transferring
information from one system to another.
Tape jukeboxes used for very large capacity storage
(terabyte (10^12 bytes) to petabyte (10^15 bytes))
Storage Access
A database file is partitioned into fixed-length
storage units called blocks. Blocks are units of
both storage allocation and data transfer.
Database system seeks to minimize the number of
block transfers between the disk and memory. We
can reduce the number of disk accesses by
keeping as many blocks as possible in main
memory.
Buffer – portion of main memory available to store
copies of disk blocks.
Buffer manager – subsystem responsible for
allocating buffer space in main memory.
Buffer Manager
Programs call on the buffer manager when they need a
block from disk.
Buffer manager does the following:
1. If the block is already in the buffer, return the
address of the block in main memory.
2. If the block is not in the buffer:
a. Allocate space in the buffer for the block,
replacing (throwing out) some other block, if
required, to make space for the new block.
b. The replaced block is written back to disk only if it
was modified since the most recent time that
it was written to/fetched from the disk.
c. Read the block from the disk into the buffer, and
return the address of the block in main memory
to the requester.
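The steps above can be sketched as a toy buffer manager (illustrative only; a real buffer pool would use a proper replacement policy and pin counts — here eviction is simple FIFO, and a dict stands in for the disk):

```python
from collections import OrderedDict

class BufferManager:
    def __init__(self, disk, capacity):
        self.disk = disk             # dict: block_id -> bytes ("the disk")
        self.capacity = capacity
        self.buffer = OrderedDict()  # block_id -> [data, dirty_flag]

    def get_block(self, block_id):
        if block_id in self.buffer:            # already buffered: return it
            return self.buffer[block_id][0]
        if len(self.buffer) >= self.capacity:  # make room: throw out a block
            victim, (data, dirty) = self.buffer.popitem(last=False)
            if dirty:                          # write back only if modified
                self.disk[victim] = data
        self.buffer[block_id] = [self.disk[block_id], False]
        return self.buffer[block_id][0]

disk = {1: b"block-1", 2: b"block-2", 3: b"block-3"}
bm = BufferManager(disk, capacity=2)
print(bm.get_block(1))  # b'block-1' (read from disk)
print(bm.get_block(1))  # b'block-1' (served from buffer, no disk access)
```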
Buffer-Replacement Policies
File Organization
The database is stored as a collection of files.
Each file is a sequence of records. A record is a
sequence of fields.
One approach:
assume record size is fixed
each file has records of one particular type only
different files are used for different relations
This case is easiest to implement; will consider
variable length records later.
Fixed-Length Records
Simple approach:
Store record i starting from byte n × (i – 1), where n is
the size of each record.
Record access is simple but records may cross
blocks
Modification: do not allow records to cross block
boundaries
Deletion of record i:
alternatives:
move records i + 1, . . ., n
to i, . . . , n – 1
move record n to i
do not move records, but
link all free records on a
free list
Free Lists
Store the address of the first deleted record in the file header.
Use this first record to store the address of the second
deleted record, and so on
Can think of these stored addresses as pointers since they
“point” to the location of a record.
More space efficient representation: reuse space for normal
attributes of free records to store pointers. (No pointers
stored in in-use records.)
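The free-list scheme can be sketched as follows (a toy in-memory model; the slot layout and the FREE marker are illustrative, not a real record format):

```python
# The "file header" holds the index of the first free slot; each free
# slot reuses its own space to store the index of the next free slot.
FREE_END = -1

class FixedRecordFile:
    def __init__(self, n_slots):
        self.slots = [None] * n_slots  # None = never used
        self.free_head = FREE_END      # header pointer to first free slot

    def delete(self, i):
        # reuse the deleted slot's space to hold the next-free pointer
        self.slots[i] = ("FREE", self.free_head)
        self.free_head = i

    def insert(self, record):
        if self.free_head != FREE_END:         # reuse a freed slot first
            i = self.free_head
            self.free_head = self.slots[i][1]  # pop from the free list
        else:
            i = self.slots.index(None)         # naive: first unused slot
        self.slots[i] = record
        return i

f = FixedRecordFile(4)
f.insert(("A-102", "Perryridge", 400))
f.insert(("A-305", "Round Hill", 350))
f.delete(0)
print(f.insert(("A-222", "Redwood", 700)))  # 0 -- the freed slot is reused
```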
Variable-Length Records
Variable-length records arise in database
systems in several ways:
Storage of multiple record types in a file.
Record types that allow variable lengths for
one or more fields.
Record types that allow repeating fields
(used in some older data models).
Variable-Length Records: Slotted Page
Structure
Sequential File Organization
Suitable for applications that require
sequential processing of the entire file
The records in the file are ordered by a
search-key
Sequential File Organization
(Cont.)
Deletion – use pointer chains
Insertion –locate the position where the record is to be
inserted
if there is free space insert there
if no free space, insert the record in an overflow
block
In either case, pointer chain must be updated
Need to reorganize the file
from time to time to restore
sequential order
Multitable Clustering File Organization
(cont.)
Data Dictionary Storage (Cont.)
Catalog structure
Relational representation on disk
specialized data structures designed for
efficient access, in memory
A possible catalog representation:
Relation_metadata = (relation_name, number_of_attributes,
storage_organization, location)
Attribute_metadata = (relation_name, attribute_name,
domain_type, position, length)
User_metadata = (user_name, encrypted_password, group)
Index_metadata = (relation_name, index_name, index_type,
index_attributes)
View_metadata = (view_name, definition)
Record Representation
Records with fixed length fields are easy to
represent
Similar to records (structs) in programming
languages
Extensions to represent null values
E.g. a bitmap indicating which attributes are
null
Variable length fields can be represented by a pair
(offset, length)
offset: the location within the record, length:
field length.
All fields start at a predefined location, but extra
indirection is required for variable-length fields.
[Figure: sample record layout with fields account_number,
branch_name, balance, and a null-bitmap]
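The (offset, length) representation with a null bitmap can be sketched like this (the struct layout and field order are assumptions for illustration, not a real DBMS record format):

```python
import struct

def encode_record(branch_name, balance):
    """Fixed part: offset(H) length(H) balance(i) null_bitmap(B) = 9 bytes;
    the variable-length branch_name bytes follow at the end."""
    var = branch_name.encode() if branch_name is not None else b""
    null_bitmap = 0b1 if branch_name is None else 0b0
    return struct.pack("<HHiB", 9, len(var), balance, null_bitmap) + var

rec = encode_record("Perryridge", 400)
offset, length, balance, bitmap = struct.unpack_from("<HHiB", rec)
print(rec[offset:offset + length].decode())  # Perryridge
print(balance)                               # 400
```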
Ordered Indices
Dense Index Files
Dense index — Index record appears for every
search-key value in the file.
Sparse Index Files
Sparse Index: contains index records for only some search-key values.
Applicable when records are sequentially ordered on search-key
To locate a record with search-key value K we:
Find index record with largest search-key value ≤ K
Search file sequentially starting at the record to which the index
record points
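A minimal sketch of this lookup, assuming one index entry per block holding that block's least search-key value (the sample keys and block numbers are illustrative):

```python
import bisect

index_keys = ["Brighton", "Mianus", "Redwood"]  # least key in each block
index_blocks = [0, 1, 2]

def locate_block(k):
    """Return the block whose index key is the largest one <= k;
    the file is then scanned sequentially from that block."""
    i = bisect.bisect_right(index_keys, k) - 1
    return index_blocks[max(i, 0)]

print(locate_block("Perryridge"))  # 1 -- start scanning at block 1 (Mianus)
```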
Sparse Index Files (Cont.)
Compared to dense indices:
Less space and less maintenance overhead for
insertions and deletions.
Generally slower than dense index for locating
records.
Good tradeoff: sparse index with an index entry for
every block in file, corresponding to least search-key
value in the block.
Multilevel Index
If primary index does not fit in memory, access
becomes expensive.
Solution: treat primary index kept on disk as a
sequential file and construct a sparse index on it.
outer index – a sparse index of primary index
inner index – the primary index file
If even outer index is too large to fit in main
memory, yet another level of index can be created,
and so on.
Indices at all levels must be updated on insertion
or deletion from the file.
Multilevel Index (Cont.)
Index Update: Record Deletion
If deleted record was the only record in the file with its
particular search-key value, the search-key is deleted from
the index also.
Single-level index deletion:
Dense indices – deletion of search-key: similar to file
record deletion.
Sparse indices –
if deleted key value exists in the index, the value is
replaced by the next search-key value in the file (in
search-key order).
If the next search-key value already has an index entry,
the entry is deleted instead of being replaced.
Index Update: Record Insertion
Single-level index insertion:
Perform a lookup using the key value from
inserted record
Dense indices – if the search-key value does not
appear in the index, insert it.
Sparse indices – if index stores an entry for each
block of the file, no change needs to be made to
the index unless a new block is created.
If a new block is created, the first search-key
value appearing in the new block is inserted
into the index.
Multilevel insertion (as well as deletion) algorithms
are simple extensions of the single-level algorithms
Secondary Indices Example
Primary and Secondary Indices
Indices offer substantial benefits when searching
for records.
BUT: Updating indices imposes overhead on
database modification --when a file is modified,
every index on the file must be updated,
Sequential scan using primary index is efficient,
but a sequential scan using a secondary index is
expensive
Each record access may fetch a new block from
disk
Block fetch requires about 5 to 10 milliseconds,
versus about 100 nanoseconds for memory access
B+-Tree Index Files
B+-Tree Index Files (Cont.)
A B+-tree is a rooted tree satisfying the following properties:
B+-Tree Node Structure
Typical node
Leaf Nodes in B+-Trees
Properties of a leaf node:
For i = 1, 2, . . ., n–1, pointer Pi either points to a file
record with search-key value Ki, or to a bucket of
pointers to file records, each record having search-
key value Ki. Only need bucket structure if search-
key does not form a primary key.
If Li, Lj are leaf nodes and i < j, Li’s search-key
values are less than Lj’s search-key values
Pn points to next leaf node in search-key order
Non-Leaf Nodes in B+-Trees
Non leaf nodes form a multi-level sparse index on the
leaf nodes. For a non-leaf node with m pointers:
All the search-keys in the subtree to which P1 points
are less than K1
For 2 ≤ i ≤ n – 1, all the search-keys in the subtree to
which Pi points have values greater than or equal to
Ki–1 and less than Ki
All the search-keys in the subtree to which Pn points
have values greater than or equal to Kn–1
Example of a B+-tree
Example of B+-tree
Observations about B+-trees
Since the inter-node connections are done by pointers,
“logically” close blocks need not be “physically” close.
The non-leaf levels of the B+-tree form a hierarchy of
sparse indices.
The B+-tree contains a relatively small number of levels
Level below root has at least 2 × ⌈n/2⌉ values
Next level has at least 2 × ⌈n/2⌉ × ⌈n/2⌉ values
.. etc.
If there are K search-key values in the file, the tree
height is no more than ⌈log⌈n/2⌉(K)⌉
thus searches can be conducted efficiently.
Insertions and deletions to the main file can be handled
efficiently, as the index can be restructured in
logarithmic time (as we shall see).
Queries on B+-Trees
Find all records with a search-key value of k.
1. N=root
2. Repeat
1. Examine N for the smallest search-key value > k.
2. If such a value exists, assume it is Ki. Then set N = Pi
3. Otherwise k ≥ Kn–1. Set N = Pn
Until N is a leaf node
3. If for some i, key Ki = k follow pointer Pi to the desired record
or bucket.
4. Else no record with search-key value k exists.
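The procedure above, run on a tiny hand-built tree (the dict-based node layout is an illustrative assumption; in a leaf, ptrs[i] stands in for the record or bucket for keys[i]):

```python
def find(root, k):
    n = root
    while not n["leaf"]:
        # smallest key value > k: descend to that pointer, else last pointer
        for i, key in enumerate(n["keys"]):
            if k < key:
                n = n["ptrs"][i]
                break
        else:
            n = n["ptrs"][-1]   # k >= all keys in this node
    for i, key in enumerate(n["keys"]):
        if key == k:
            return n["ptrs"][i]
    return None  # no record with search-key value k exists

leaf1 = {"leaf": True, "keys": ["Brighton", "Downtown"], "ptrs": ["r1", "r2"]}
leaf2 = {"leaf": True, "keys": ["Mianus", "Perryridge"], "ptrs": ["r3", "r4"]}
root = {"leaf": False, "keys": ["Mianus"], "ptrs": [leaf1, leaf2]}
print(find(root, "Perryridge"))  # r4
print(find(root, "Redwood"))     # None
```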
Queries on B+-Trees (Cont.)
If there are K search-key values in the file, the
height of the tree is no more than ⌈log⌈n/2⌉(K)⌉.
A node is generally the same size as a disk block,
typically 4 kilobytes
and n is typically around 100 (40 bytes per index
entry).
With 1 million search key values and n = 100
at most log50(1,000,000) = 4 nodes are accessed
in a lookup.
Contrast this with a balanced binary tree with 1
million search key values — around 20 nodes are
accessed in a lookup
above difference is significant since every node
access may need a disk I/O, costing around 20
milliseconds
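Evaluating the bound with these numbers:

```python
import math

n, K = 100, 1_000_000
height = math.ceil(math.log(K, n // 2))  # log base 50 of one million
print(height)  # 4 -- at most 4 nodes on a root-to-leaf path

binary_height = math.ceil(math.log2(K))  # balanced binary tree, same K
print(binary_height)  # 20 -- five times as many node accesses
```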
Updates on B+-Trees: Insertion
Updates on B+-Trees: Insertion
(Cont.)
Splitting a leaf node:
take the n (search-key value, pointer) pairs (including
the one being inserted) in sorted order. Place the
first ⌈n/2⌉ in the original node, and the rest in a new
node.
let the new node be p, and let k be the least key
value in p. Insert (k,p) in the parent of the node being
split.
If the parent is full, split it and propagate the split
further up.
Splitting of nodes proceeds upwards till a node that is
not full is found.
In the worst case the root node may be split
increasing the height of the tree by 1.
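A sketch of the leaf split (in-memory lists stand in for node contents; the sample keys are illustrative):

```python
import bisect, math

def split_leaf(keys, ptrs, new_key, new_ptr):
    """Insert the new pair in sorted order, keep the first ceil(n/2)
    pairs in the old node, move the rest to a new node, and return
    the key to insert into the parent."""
    i = bisect.bisect_left(keys, new_key)
    keys = keys[:i] + [new_key] + keys[i:]
    ptrs = ptrs[:i] + [new_ptr] + ptrs[i:]
    half = math.ceil(len(keys) / 2)
    old = (keys[:half], ptrs[:half])
    new = (keys[half:], ptrs[half:])
    return old, new, keys[half]   # keys[half]: least key in the new node

old, new, up = split_leaf(["Brighton", "Downtown", "Mianus"],
                          ["r1", "r2", "r3"], "Clearview", "r4")
print(old[0])  # ['Brighton', 'Clearview']
print(new[0])  # ['Downtown', 'Mianus']
print(up)      # Downtown -- inserted into the parent
```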
Insertion in B+-Trees (Cont.)
Splitting a non-leaf node: when inserting (k,p) into
an already full internal node N
Copy N to an in-memory area M with space for
n+1 pointers and n keys
Insert (k,p) into M
Copy P1, K1, …, K⌈n/2⌉–1, P⌈n/2⌉ from M back into
node N
Copy P⌈n/2⌉+1, K⌈n/2⌉+1, …, Kn, Pn+1 from M into newly
allocated node N′
Insert (K⌈n/2⌉, N′) into parent N
Read pseudocode in book!
Updates on B+-Trees: Deletion
Find the record to be deleted, and remove it from the
main file and from the bucket (if present)
Remove (search-key value, pointer) from the leaf node if
there is no bucket or if the bucket has become empty
If the node has too few entries due to the removal, and
the entries in the node and a sibling fit into a single
node, then merge siblings:
Insert all the search-key values in the two nodes into
a single node (the one on the left), and delete the
other node.
Delete the pair (Ki–1, Pi), where Pi is the pointer to the
deleted node, from its parent, recursively using the
above procedure.
Updates on B+-Trees: Deletion
Otherwise, if the node has too few entries due to
the removal, but the entries in the node and a
sibling do not fit into a single node, then
redistribute pointers:
Redistribute the pointers between the node and
a sibling such that both have more than the
minimum number of entries.
Update the corresponding search-key value in
the parent of the node.
The node deletions may cascade upwards till a
node which has ⌈n/2⌉ or more pointers is found.
If the root node has only one pointer after deletion,
it is deleted and the sole child becomes the root.
Examples of B+-Tree Deletion
Examples of B+-Tree Deletion
(Cont.)
Examples of B+-Tree Deletion
(Cont.)
Example of B+-tree Deletion (Cont.)
B+-Tree File Organization (Cont.)
B-Tree Index Files
Similar to B+-tree, but B-tree allows search-key
values to appear only once; eliminates redundant
storage of search keys.
Search keys in nonleaf nodes appear nowhere else
in the B-tree; an additional pointer field for each
search key in a nonleaf node must be included.
Generalized B-tree leaf node
B-Tree Index File Example
B-Tree Index Files (Cont.)
Advantages of B-Tree indices:
May use fewer tree nodes than a corresponding B+-Tree.
Sometimes possible to find search-key value before
reaching leaf node.
Disadvantages of B-Tree indices:
Only small fraction of all search-key values are found
early
Non-leaf nodes are larger, so fan-out is reduced. Thus,
B-Trees typically have greater depth than corresponding
B+-Tree
Insertion and deletion more complicated than in B+-Trees
Implementation is harder than B+-Trees.
Typically, advantages of B-Trees do not outweigh
disadvantages.
Multiple-Key Access
Use multiple indices for certain types of queries.
Example:
select account_number
from account
where branch_name = 'Perryridge' and balance = 1000
Possible strategies for processing query using indices
on single attributes:
1. Use index on branch_name to find accounts with
branch name Perryridge; test balance = 1000
2. Use index on balance to find accounts with
balances of $1000; test branch_name = “Perryridge”.
3. Use branch_name index to find pointers to all
records pertaining to the Perryridge branch.
Similarly use index on balance. Take intersection of
both sets of pointers obtained.
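Strategy 3 can be sketched with each single-attribute index modeled as a plain dict mapping values to sets of record pointers (the sample data is illustrative):

```python
# Each index maps an attribute value to the set of record pointers
# (here plain ints) for records with that value.
branch_index = {"Perryridge": {10, 17, 42}, "Downtown": {5, 9}}
balance_index = {1000: {5, 17, 88}, 500: {10}}

# Intersect the two pointer sets: records satisfying both conditions.
ptrs = branch_index["Perryridge"] & balance_index[1000]
print(sorted(ptrs))  # [17]
```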
Indices on Multiple Keys
Composite search keys are search keys containing
more than one attribute
E.g. (branch_name, balance)
Lexicographic ordering: (a1, a2) < (b1, b2) if either
a1 < b1, or
a1=b1 and a2 < b2
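Python tuples compare lexicographically, which is exactly this ordering, so it can be checked directly:

```python
# (a1, a2) < (b1, b2) iff a1 < b1, or a1 = b1 and a2 < b2
print(("Downtown", 500) < ("Perryridge", 100))  # True  (a1 < b1)
print(("Downtown", 500) < ("Downtown", 900))    # True  (a1 = b1, a2 < b2)
print(("Perryridge", 100) < ("Downtown", 900))  # False (a1 > b1)
```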
Indices on Multiple Attributes
Non-Unique Search Keys
Alternatives:
Buckets on separate block (bad idea)
List of tuple pointers with each key
Low space overhead, no extra cost for
queries
Extra code to handle read/update of long lists
Deletion of a tuple can be expensive if there
are many duplicates on search key (why?)
Make search key unique by adding a record-
identifier
Extra storage overhead for keys
Simpler code for insertion/deletion
Widely used
Other Issues in Indexing
Covering indices
Add extra attributes to index so (some) queries can avoid
fetching the actual records
Particularly useful for secondary indices
– Why?
Can store extra attributes only at leaf
Record relocation and secondary indices
If a record moves, all secondary indices that store record
pointers have to be updated
Node splits in B+-tree file organizations become very
expensive
Solution: use primary-index search key instead of record
pointer in secondary index
Extra traversal of primary index to locate record
– Higher cost for queries, but node splits are cheap
Add record-id if primary-index search key is non-unique
Hashing
Example of Hash File Organization
Example of Hash File Organization
Hash file
organization of
account file, using
branch_name as
key
(see previous slide
for details).
Hash Functions
Worst hash function maps all search-key values to the
same bucket; this makes access time proportional to the
number of search-key values in the file.
An ideal hash function is uniform, i.e., each bucket is
assigned the same number of search-key values from the
set of all possible values.
Ideal hash function is random, so each bucket will have
the same number of records assigned to it irrespective of
the actual distribution of search-key values in the file.
Typical hash functions perform computation on the
internal binary representation of the search-key.
For example, for a string search-key, the binary
representations of all the characters in the string
could be added and the sum modulo the number of
buckets could be returned.
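That example hash function, written out directly:

```python
# Sum the character codes of the string key, modulo the bucket count.
def bucket_of(key, n_buckets):
    return sum(ord(c) for c in key) % n_buckets

print(bucket_of("Perryridge", 10))  # 3
```

Note this simple scheme is neither very uniform nor random in practice (anagrams collide, short keys cluster), which is why real systems use stronger hash functions.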
Handling of Bucket Overflows
Bucket overflow can occur because of
Insufficient buckets
Skew in distribution of records. This can occur due
to two reasons:
multiple records have same search-key value
chosen hash function produces non-uniform
distribution of key values
Although the probability of bucket overflow can be
reduced, it cannot be eliminated; it is handled by using
overflow buckets.
Handling of Bucket Overflows
(Cont.)
Hash Indices
Hashing can be used not only for file organization, but
also for index-structure creation.
A hash index organizes the search keys, with their
associated record pointers, into a hash file structure.
Strictly speaking, hash indices are always secondary
indices
if the file itself is organized using hashing, a separate
primary hash index on it using the same search-key
is unnecessary.
However, we use the term hash index to refer to both
secondary index structures and hash organized files.
Example of Hash Index
Deficiencies of Static Hashing
In static hashing, function h maps search-key values to a
fixed set B of bucket addresses. Databases grow or
shrink with time.
If initial number of buckets is too small, and file grows,
performance will degrade due to too many overflows.
If space is allocated for anticipated growth, a
significant amount of space will be wasted initially (and
buckets will be underfull).
If database shrinks, again space will be wasted.
One solution: periodic re-organization of the file with a new
hash function
Expensive, disrupts normal operations
Better solution: allow the number of buckets to be modified
dynamically.
Dynamic Hashing
Good for database that grows and shrinks in size
Allows the hash function to be modified dynamically
Extendable hashing – one form of dynamic hashing
Hash function generates values over a large range —
typically b-bit integers, with b = 32.
At any time use only a prefix of the hash function to
index into a table of bucket addresses.
Let the length of the prefix be i bits, 0 ≤ i ≤ 32.
Bucket address table size = 2^i. Initially i = 0
Value of i grows and shrinks as the size of the
database grows and shrinks.
Multiple entries in the bucket address table may point
to a bucket (why?)
Thus, actual number of buckets is ≤ 2^i
The number of buckets also changes dynamically
due to coalescing and splitting of buckets.
General Extendable Hash Structure
Use of Extendable Hash Structure
Each bucket j stores a value ij
All the entries that point to the same bucket have the same
values on the first ij bits.
To locate the bucket containing search-key Kj:
1. Compute h(Kj) = X
2. Use the first i high order bits of X as a displacement into bucket
address table, and follow the pointer to appropriate bucket
To insert a record with search-key value Kj
follow same procedure as look-up and locate the bucket, say j.
If there is room in the bucket j insert record in the bucket.
Else the bucket must be split and insertion re-attempted (next
slide.)
Overflow buckets used instead in some cases (will see
shortly)
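Steps 1 and 2 of the lookup can be sketched as follows (the hash values, prefix length, and table contents are illustrative):

```python
# Use the first i high-order bits of a 32-bit hash value as the index
# into the bucket address table.
def high_bits(x, i, b=32):
    return x >> (b - i) if i > 0 else 0

table_i = 2                               # current prefix length i
bucket_table = ["B0", "B0", "B1", "B2"]   # 2^i entries; two share bucket B0

def locate(h):
    return bucket_table[high_bits(h, table_i)]

print(locate(0b01 << 30))  # B0 -- prefixes 00 and 01 share one bucket
print(locate(0b11 << 30))  # B2
```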
Insertion in Extendable Hash Structure
(Cont)
To split bucket j when inserting a record with search-key value K_j:
If i > i_j (more than one pointer to bucket j)
allocate a new bucket z, and set i_j = i_z = (i_j + 1)
Update the second half of the bucket address table entries
originally pointing to j, to point to z
remove each record in bucket j and reinsert (in j or z)
recompute new bucket for K_j and insert record in the
bucket (further splitting is required if the bucket is still full)
If i = i_j (only one pointer to bucket j)
If i reaches some limit b, or too many splits have happened
in this insertion, create an overflow bucket
Else
increment i and double the size of the bucket address
table.
replace each entry in the table by two entries that point
to the same bucket.
recompute new bucket address table entry for K_j
Now i > i_j so use the first case above.
Deletion in Extendable Hash Structure
To delete a key value,
locate it in its bucket and remove it.
The bucket itself can be removed if it becomes empty
(with appropriate updates to the bucket address table).
Coalescing of buckets can be done (a bucket can coalesce only
with a “buddy” bucket having the same value of i_j and the
same i_j − 1 prefix, if such a bucket is present)
Decreasing bucket address table size is also possible
Note: decreasing the bucket address table size is an
expensive operation and should be done only if the
number of buckets becomes much smaller than the
size of the table
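The buddy condition above can be expressed as a small predicate. The function name and signature are hypothetical, and the "combined records must fit in one bucket" check is an added common-sense assumption, not stated on the slide.

```python
def can_coalesce(ij1, prefix1, n1, ij2, prefix2, n2, capacity):
    # Buddies have the same i_j and agree on the first i_j - 1 bits,
    # i.e. their i_j-bit prefixes differ only in the last bit.
    # Requiring n1 + n2 <= capacity is an added assumption so the
    # merged bucket does not immediately overflow.
    return (ij1 == ij2
            and prefix1 >> 1 == prefix2 >> 1
            and prefix1 != prefix2
            and n1 + n2 <= capacity)
```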
Use of Extendable Hash Structure: Example
Example (Cont.)
Hash structure after insertion of one Brighton and two
Downtown records
Example (Cont.)
Hash structure after insertion of Mianus record
Example (Cont.)
Example (Cont.)
Hash structure after insertion of Redwood and Round
Hill records
Extendable Hashing vs. Other Schemes
Benefits of extendable hashing:
Hash performance does not degrade with growth of file
Minimal space overhead
Disadvantages of extendable hashing
Extra level of indirection to find desired record
Bucket address table may itself become very big (larger
than memory)
Cannot allocate very large contiguous areas on disk
either
Solution: B+-tree file organization to store bucket
address table
Changing size of bucket address table is an expensive
operation
Linear hashing is an alternative mechanism
Allows incremental growth of its directory (equivalent to
bucket address table)
At the cost of more bucket overflows
Comparison of Ordered Indexing and Hashing
Bitmap Indices
Bitmap indices are a special type of index designed for
efficient querying on multiple keys
Records in a relation are assumed to be numbered
sequentially from, say, 0
Given a number n it must be easy to retrieve record n
Particularly easy if records are of fixed size
Applicable on attributes that take on a relatively small
number of distinct values
E.g. gender, country, state, …
E.g. income-level (income broken up into a small
number of levels such as 0–9999, 10000–19999,
20000–50000, 50000–infinity)
A bitmap is simply an array of bits
Bitmap Indices (Cont.)
In its simplest form a bitmap index on an attribute has a
bitmap for each value of the attribute
Bitmap has as many bits as records
In a bitmap for value v, the bit for a record is 1 if the
record has the value v for the attribute, and is 0
otherwise
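The simple form described above can be sketched directly: one bitmap per distinct attribute value, with bit n set iff record n has that value (records numbered from 0). The function name is an illustrative assumption.

```python
def build_bitmap_index(records, attr):
    # One bitmap (list of 0/1 bits, one per record) for each
    # distinct value of the attribute.
    values = {rec[attr] for rec in records}
    return {v: [1 if rec[attr] == v else 0 for rec in records]
            for v in values}
```

For example, on five records with gender values m, f, f, m, f the bitmap for m is 10010 and the bitmap for f is 01101.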
Bitmap Indices (Cont.)
Bitmap indices are useful for queries on multiple attributes
not particularly useful for single attribute queries
Queries are answered using bitmap operations
Intersection (and)
Union (or)
Complementation (not)
The and and or operations take two bitmaps of the same size
and apply the operation on corresponding bits to get the
result bitmap; not takes a single bitmap and flips each bit
E.g. 100110 AND 110011 = 100010
100110 OR 110011 = 110111
NOT 100110 = 011001
Males with income level L1: 10010 AND 10100 = 10000
Can then retrieve required tuples.
Counting number of matching tuples is even faster
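When bitmaps are packed into machine words (here, Python integers), these operations map directly onto bitwise operators. The sketch below reproduces the slide's examples; the fixed widths and masks are the only added assumptions.

```python
# 6-bit bitmaps from the slide
WIDTH = 6
MASK = (1 << WIDTH) - 1

a = 0b100110
b = 0b110011

assert a & b == 0b100010           # intersection (and)
assert a | b == 0b110111           # union (or)
assert (~a) & MASK == 0b011001     # complementation (not), within 6 bits

# Males with income level L1 (5-bit bitmaps from the slide)
males = 0b10010
level_L1 = 0b10100
assert males & level_L1 == 0b10000

# Counting matching tuples without retrieving them
assert bin(males & level_L1).count("1") == 1
```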
Bitmap Indices (Cont.)
Bitmap indices are generally very small compared with relation
size
E.g. if a record is 100 bytes, the space for a single bitmap
is 1/800 of the space used by the relation.
If the number of distinct attribute values is 8, the bitmap
index is only 1% of the relation size
Deletion needs to be handled properly
Existence bitmap to note if there is a valid record at a
record location
Needed for complementation
not(A=v): (NOT bitmap-A-v) AND ExistenceBitmap
Should keep bitmaps for all values, even null value
To correctly handle SQL null semantics for NOT(A=v):
intersect above result with (NOT bitmap-A-Null)
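The existence-bitmap recipe above can be sketched as follows; the bitmap values are made up for illustration (a 5-record relation where record 2 has been deleted and record 0 has a null A).

```python
WIDTH = 5
MASK = (1 << WIDTH) - 1

# Hypothetical bitmaps over 5 record slots (leftmost bit = record 0)
bitmap_A_v = 0b10010       # records where A = v
bitmap_A_null = 0b00001    # records where A is null
existence = 0b11011        # valid (non-deleted) record slots

# not(A = v): complement the value bitmap, keep only valid records...
not_A_v = (~bitmap_A_v & MASK) & existence
# ...then drop nulls, which under SQL semantics satisfy neither
# A = v nor its negation
not_A_v &= ~bitmap_A_null & MASK
```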
Efficient Implementation of Bitmap Operations
Bitmaps are packed into words; a single word and (a basic CPU
instruction) computes the and of 32 or 64 bits at once
E.g. 1-million-bit bitmaps can be and-ed with just 31,250
instructions
Counting number of 1s can be done fast by a trick:
Use each byte to index into a precomputed array of 256
elements, each storing the count of 1s in the binary
representation of that byte value
Can use pairs of bytes to speed up further at a higher
memory cost
Add up the retrieved counts
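The byte-lookup counting trick above can be sketched as (names are illustrative):

```python
# Precomputed count of 1s for every byte value 0..255
POPCOUNT = [bin(b).count("1") for b in range(256)]


def count_ones(bitmap_bytes):
    # Look up each byte of the packed bitmap and add up the counts
    return sum(POPCOUNT[b] for b in bitmap_bytes)
```

A pair-of-bytes variant would use a 65,536-entry table instead, trading memory for fewer lookups.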
Bitmaps can be used instead of Tuple-ID lists at leaf levels of
B+-trees, for values that have a large number of matching records
Worthwhile if > 1/64 of the records have that value, assuming
a tuple-id is 64 bits
Above technique merges benefits of bitmap and B+-tree
indices
Index Definition in SQL
Create an index
create index <index-name> on <relation-name> (<attribute-list>)
E.g.: create index b-index on branch(branch_name)
Use create unique index to indirectly specify and
enforce the condition that the search key is a candidate
key.
Not really required if SQL unique integrity constraint
is supported
To drop an index
drop index <index-name>
Most database systems allow specification of type of
index, and clustering.