Dbms PPT For Chapter 7
TERM 2008-09
B. Tech II/IT
II Semester
INDEX
UNIT-8 PPT SLIDES
S.No  Module as per Session planner  Lecture No  PPT Slide No
----------------------------------------------------------------
1.                                   L1          L1-1 to L1-4
2.                                   L2          L2-1 to L2-7
3.                                   L3          L3-1 to L3-5
4.                                   L4          L4-1 to L4-2
5.                                   L5          L5-1 to L5-4
6.                                   L6          L6-1 to L6-5
7.                                   L7          L7-1 to L7-7
8.    B+ tree                        L8          L8-1 to L8-9
Slide No:L1-1
Slide No:L1-2
Index Classification
Primary vs. secondary: If the search key contains the primary key, the index is called a primary index.
Unique index: The search key contains a candidate key.
Clustered vs. unclustered: If the order of data records is the same as, or 'close to', the order of data entries, the index is called a clustered index.
Alternative 1 implies clustered; in practice, clustered also implies Alternative 1 (since sorted files are rare).
A file can be clustered on at most one search key.
The cost of retrieving data records through an index varies greatly based on whether the index is clustered or not!
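The cost difference can be illustrated with a small simulation. This is a minimal sketch, not a model of any real system: the file size, the page capacity of 10 records, and the helper name `pages_touched` are all assumptions made for illustration. It counts how many distinct data pages hold the records matching a range of keys when the file order follows the index order (clustered) and when it does not (unclustered).

```python
import random

def pages_touched(record_order, wanted, records_per_page):
    """Count distinct data pages that hold the wanted keys,
    given the physical order of records in the data file."""
    page_of = {key: pos // records_per_page
               for pos, key in enumerate(record_order)}
    return len({page_of[k] for k in wanted})

keys = list(range(1000))

# Clustered: the data file is stored in the same order as the data entries.
clustered_file = keys

# Unclustered: the data file is in an unrelated (here, shuffled) order.
unclustered_file = keys[:]
random.Random(0).shuffle(unclustered_file)

wanted = range(100, 200)  # a range selection matching 100 records
clustered_cost = pages_touched(clustered_file, wanted, 10)
unclustered_cost = pages_touched(unclustered_file, wanted, 10)
print(clustered_cost, unclustered_cost)
```

With the clustered file, the 100 matching records sit on 10 adjacent pages; with the shuffled file they are spread over many more pages.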
Slide No:L1-3
[Figure: CLUSTERED vs. UNCLUSTERED index. In both cases the data entries (index file) point to the data records (data file).]
Slide No:L1-4
Indexes
Slide No:L2-1
B+ Tree Indexes
[Figure: non-leaf pages above a level of leaf pages (sorted by search key).]
Index entry layout: P0 | K1 | P1 | K2 | P2 | ... | Km | Pm
Slide No:L2-2
Example B+ Tree
Note how data entries in leaf level are sorted.
Root: [17]
  Entries <= 17: index page [5 | 13]
  Entries > 17:  index page [27 | 30]
Leaf level: [2* 3*] [5* 7* 8*] [14* 16*] [22* 24*] [27* 29*]
Hash-Based Indexes
Good for equality selections.
Index is a collection of buckets.
Bucket = primary page plus zero or more
overflow pages.
Buckets contain data entries.
Hashing function h: h(r) = bucket in which (data
entry for) record r belongs. h looks at the search
key fields of r.
No need for index entries in this scheme.
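The bucket scheme above can be sketched as follows. This is an illustrative toy under stated assumptions, not a description of a particular DBMS: the class name `HashIndex`, the bucket count, and the page capacity of two entries are hypothetical, and Python's built-in `hash` stands in for the hashing function h.

```python
class HashIndex:
    """Toy static hash index: each bucket is a list of pages,
    the first being the primary page and the rest overflow pages."""

    def __init__(self, num_buckets=4, page_capacity=2):
        self.num_buckets = num_buckets
        self.page_capacity = page_capacity
        # Every bucket starts with one empty primary page.
        self.buckets = [[[]] for _ in range(num_buckets)]

    def _h(self, key):
        # h looks only at the search key and returns a bucket number.
        return hash(key) % self.num_buckets

    def insert(self, key, rid):
        pages = self.buckets[self._h(key)]
        if len(pages[-1]) == self.page_capacity:
            pages.append([])            # chain a new overflow page
        pages[-1].append((key, rid))    # data entry <key, rid>

    def lookup(self, key):
        # Equality selection: only one bucket is examined.
        return [rid for page in self.buckets[self._h(key)]
                for k, rid in page if k == key]

    def overflow_pages(self, key):
        return len(self.buckets[self._h(key)]) - 1

idx = HashIndex()
for rid, age in enumerate([25, 30, 25, 41, 25]):
    idx.insert(age, rid)
print(idx.lookup(25), idx.overflow_pages(25))
```

A range selection gets no help from this structure, since keys that are close in value land in unrelated buckets.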
Slide No:L2-4
Slide No:L2-6
Slide No:L3-1
Slide No:L3-2
Operations to Compare
Slide No:L3-3
Slide No:L3-4
Assumptions (contd.)
Scans:
Leaf levels of a tree-index are chained.
Index data-entries plus actual file
scanned for unclustered indexes.
Range searches:
We use tree indexes to restrict the set of
data records fetched, but ignore hash
indexes.
Slide No:L3-5
Cost of Operations
                         (a) Scan     (b) Equality         (c) Range                              (d) Insert   (e) Delete
(1) Heap                 BD           0.5BD                BD                                     2D           Search + D
(2) Sorted               BD           D log_2 B            D(log_2 B + # pgs with match recs)     Search + BD  Search + BD
(3) Clustered            1.5BD        D log_F 1.5B         D(log_F 1.5B + # pgs w. match recs)    Search + D   Search + D
(4) Unclust. Tree index  BD(R+0.15)   D(1 + log_F 0.15B)   D(log_F 0.15B + # pgs w. match recs)   Search + 2D  Search + 2D
(5) Unclust. Hash index  BD(R+0.125)  2D                   BD                                     Search + 2D  Search + 2D
Slide No:L4-1
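To get a feel for the scan and equality formulas, they can be evaluated for concrete numbers. The sketch below assumes the usual meaning of the symbols, which is not spelled out in the text here: B = number of data pages, R = records per page, D = average time to read or write one page, F = tree fanout. The function name `costs` and the sample values are hypothetical.

```python
import math

def costs(B, R, D, F):
    """Scan and equality-search costs per file organization.
    Assumed symbols: B data pages, R records/page, D time per page I/O,
    F tree fanout."""
    return {
        "heap":             {"scan": B * D,
                             "equality": 0.5 * B * D},
        "sorted":           {"scan": B * D,
                             "equality": D * math.log2(B)},
        "clustered":        {"scan": 1.5 * B * D,
                             "equality": D * math.log(1.5 * B, F)},
        "unclustered_tree": {"scan": B * D * (R + 0.15),
                             "equality": D * (1 + math.log(0.15 * B, F))},
        "unclustered_hash": {"scan": B * D * (R + 0.125),
                             "equality": 2 * D},
    }

c = costs(B=1024, R=100, D=1.0, F=100)
for org, ops in c.items():
    print(f"{org:17s} scan={ops['scan']:10.1f} equality={ops['equality']:6.2f}")
```

For these sample values the hash index wins on equality search, while scanning through either unclustered index costs roughly R times more than scanning the heap file.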
Slide No:L4-2
Choice of Indexes
Slide No:L5-1
Slide No:L5-2
Slide No:L5-3
SELECT E.dno
FROM Emp E
WHERE E.hobby = 'Stamp'
Slide No:L5-4
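[Figure: composite search keys. Data entries of four indexes over the same data records.]
<age, sal>: 11,80  12,10  12,20  13,75
<sal, age>: 10,12  20,12  75,13  80,11
<age>: 11  12  12  13
<sal>: 10  20  75  80 (data entries sorted by <sal>)
Data records sorted by name (name, age, sal): cal 11 80; joe 12 20; sue 13 75; a fourth record has age 12, sal 10.
Slide No:L6-1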
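The effect of the field order in a composite search key can be checked with the (age, sal) values from the figure. This is a small sketch; the variable names are illustrative only, and Python tuples are used because they compare field by field, like composite keys.

```python
import bisect

# (age, sal) pairs taken from the figure's data entries.
records = [(11, 80), (12, 10), (12, 20), (13, 75)]

by_age_sal = sorted(records)                               # key <age, sal>
by_sal_age = sorted((sal, age) for age, sal in records)    # key <sal, age>
by_age = sorted(age for age, _ in records)                 # key <age>
by_sal = sorted(sal for _, sal in records)                 # key <sal>

# "age = 12 AND sal > 15" maps to one contiguous run in <age, sal> order.
lo = bisect.bisect_right(by_age_sal, (12, 15))
hi = bisect.bisect_right(by_age_sal, (12, float("inf")))
matches = by_age_sal[lo:hi]

print(by_age_sal)
print(by_sal_age)
print(matches)
```

The same condition on the <sal, age> ordering would not be contiguous in general, because entries are grouped by sal first.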
Slide No:L6-2
Index-Only Plans
A number of queries can be answered without retrieving any tuples from one or more of the relations involved if a suitable index is available.
Index on <E.dno>:
SELECT E.dno, COUNT(*)
FROM Emp E
GROUP BY E.dno
Tree index on <E.dno, E.sal>:
SELECT E.dno, MIN(E.sal)
FROM Emp E
GROUP BY E.dno
SELECT AVG(E.sal)
FROM Emp E
Index on <E.age, E.sal>
or
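The first two queries can be answered from data entries alone, which the following sketch illustrates. The index contents, rids, and function names are made up for the example; only the query shapes come from the slide.

```python
from collections import Counter

# Hypothetical data entries of an index on <E.dno>: (dno, rid), sorted by dno.
dno_index = [(10, "r3"), (10, "r7"), (20, "r1"),
             (30, "r2"), (30, "r5"), (30, "r9")]

def count_by_dno(index_entries):
    """SELECT E.dno, COUNT(*) ... GROUP BY E.dno, reading only the index."""
    return dict(Counter(dno for dno, _rid in index_entries))

# Hypothetical data entries of a tree index on <E.dno, E.sal>, in key order.
dno_sal_index = [(10, 3000), (10, 4500), (20, 5200),
                 (30, 2800), (30, 3100), (30, 6000)]

def min_sal_by_dno(index_entries):
    """SELECT E.dno, MIN(E.sal) ... GROUP BY E.dno, reading only the index.
    Entries are sorted by (dno, sal), so the first entry seen for each
    dno carries that department's minimum salary."""
    result = {}
    for dno, sal in index_entries:
        result.setdefault(dno, sal)
    return result

print(count_by_dno(dno_index))
print(min_sal_by_dno(dno_sal_index))
```

Neither function dereferences a rid, which is the point of an index-only plan: the Emp records themselves are never fetched.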
Summary
Many alternative file organizations exist, each appropriate in some
situation.
If selection queries are frequent, sorting the file or building an
index is important.
Hash-based indexes only good for equality search.
Sorted files and tree-based indexes best for range
search; also good for equality search. (Files rarely
kept sorted in practice; B+ tree index is better.)
Index is a collection of data entries plus a way to quickly find
entries with given key values.
Slide No:L6-4
Summary (Contd.)
Data entries can be actual data records, <key, rid> pairs, or
<key, rid-list> pairs.
Choice orthogonal to indexing technique
used to locate data entries with a given key
value.
Can have several indexes on a given file of data records,
each with a different search key.
Indexes can be classified as clustered vs. unclustered,
primary vs. secondary, and dense vs. sparse. Differences
have important consequences for utility/performance.
Slide No:L6-5
Introduction
Slide No:L7-1
Range Searches
"Find all students with gpa > 3.0"
If data is in a sorted file, do binary search to find the first such student, then scan to find the others.
Cost of binary search can be quite high.
Simple idea: Create an 'index' file.
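The index-file idea can be sketched in a few lines. This is an assumption-laden toy: pages are modeled as short sorted lists of gpa values, the index holds the first key of each page, and the names `build_index` and `range_search` are hypothetical.

```python
import bisect

def build_index(pages):
    """One index entry per data page: the first key stored on that page."""
    return [page[0] for page in pages]

def range_search(pages, index_keys, low):
    """Return all keys > low from a data file made of sorted pages."""
    # Binary search runs over the small index file, not the data file.
    # Step back one page: the page whose first key is <= low may still
    # contain qualifying keys.
    start = max(bisect.bisect_right(index_keys, low) - 1, 0)
    result = []
    for page in pages[start:]:          # sequential scan from there on
        result.extend(k for k in page if k > low)
    return result

pages = [[2.1, 2.5], [2.8, 3.0], [3.2, 3.6], [3.9, 4.0]]
index_keys = build_index(pages)
print(range_search(pages, index_keys, 3.0))
```

Because the index file has one entry per page rather than one per record, the binary search touches far fewer pages than a binary search over the data file itself.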
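[Figure: index file with entries k1, k2, ..., kN, one per page of the data file (Page 1, Page 2, Page 3, ..., Page N).]
Slide No:L7-2
ISAM
Index entry layout: P0 | K1 | P1 | K2 | P2 | ... | Km | Pm
[Figure: ISAM structure. The leaf pages consist of primary pages, with overflow pages chained to them.]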
Slide No:L7-3
Comments on ISAM
File creation: Leaf (data) pages allocated sequentially, sorted by search key; then index pages allocated, then space for overflow pages.
Index entries: <search key value, page id>; they 'direct' search for data entries, which are in leaf pages.
Search: Start at root; use key comparisons to go to leaf. Cost is proportional to log_F N; F = # entries/index pg, N = # leaf pgs.
Insert: Find the leaf the data entry belongs to, and put it there.
Delete: Find and remove from leaf; if this empties an overflow page, de-allocate it.
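The operations above can be sketched with a toy structure. Several simplifications are assumptions of this example: the index is flattened to a single sorted list of separator keys instead of a tree of index pages, each leaf's overflow chain is one unbounded list, and the class name `ISAM` is illustrative. The keys are those of the ISAM example in these slides, with two entries per page.

```python
import bisect

class ISAM:
    """Toy ISAM file: a static index over primary leaf pages,
    each with an overflow list."""

    def __init__(self, sorted_keys, page_capacity=2):
        self.cap = page_capacity
        # File creation: primary leaf pages are filled sequentially.
        self.primary = [sorted_keys[i:i + page_capacity]
                        for i in range(0, len(sorted_keys), page_capacity)]
        self.overflow = [[] for _ in self.primary]
        # Static index: first key of each leaf except the leftmost.
        # It is never modified by inserts or deletes.
        self.index = [page[0] for page in self.primary[1:]]

    def _leaf(self, key):
        # Key comparisons direct the search to one leaf.
        return bisect.bisect_right(self.index, key)

    def search(self, key):
        i = self._leaf(key)
        return key in self.primary[i] or key in self.overflow[i]

    def insert(self, key):
        i = self._leaf(key)
        if len(self.primary[i]) < self.cap:
            bisect.insort(self.primary[i], key)
        else:
            self.overflow[i].append(key)   # primary page full

    def delete(self, key):
        i = self._leaf(key)
        for page in (self.overflow[i], self.primary[i]):
            if key in page:
                page.remove(key)
                return True
        return False

t = ISAM([10, 15, 20, 27, 33, 37, 40, 46, 51, 55, 63, 97])
for k in (23, 48, 41, 42):
    t.insert(k)
for k in (42, 51, 97):
    t.delete(k)
print(t.index, t.primary, t.overflow)
```

After the deletions, 51 still appears in the index even though 51* is gone from the leaf level, which reflects the static nature of the ISAM index.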
Slide No:L7-4
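[Figure: file layout. Data pages, then index pages, then overflow pages.]
[Figure: example ISAM tree.]
Root: [40]
Index pages: [20 | 33]  [51 | 63]
Leaf pages: [10* 15*] [20* 27*] [33* 37*] [40* 46*] [51* 55*] [63* 97*]
Slide No:L7-5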
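[Figure: the ISAM tree with 23*, 48*, 41* and 42* placed in overflow pages.]
Root: [40]
Index pages: [20 | 33]  [51 | 63]
Primary leaf pages: [10* 15*] [20* 27*] [33* 37*] [40* 46*] [51* 55*] [63* 97*]
Overflow pages: [23*] chained to [20* 27*]; [48* 41*] and then [42*] chained to [40* 46*]
Slide No:L7-6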
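[Figure: the ISAM tree without 42*, 51* and 97*. Key 51 is still present in the index.]
Root: [40]
Index pages: [20 | 33]  [51 | 63]
Primary leaf pages: [10* 15*] [20* 27*] [33* 37*] [40* 46*] [55*] [63*]
Overflow pages: [23*] chained to [20* 27*]; [48* 41*] chained to [40* 46*]
Slide No:L7-7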
Data Entries
("Sequence set")
Slide No:L8-1
Example B+ Tree
Search begins at root, and key comparisons direct it
to a leaf (as in ISAM).
Search for 5*, 15*, all data entries >= 24* ...
Root: [13 | 17 | 24 | 30]
Leaf pages (leftmost two): [2* 3* 5* 7*] [14* 16*]
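The three searches mentioned on this slide can be traced in code. This sketch assumes a two-level tree with the root keys 13, 17, 24, 30; the contents of the last three leaves are illustrative assumptions, since only the first two leaves can be read from the slide text. The function names are hypothetical.

```python
import bisect

ROOT_KEYS = [13, 17, 24, 30]
LEAVES = [
    [2, 3, 5, 7],
    [14, 16],
    [19, 20, 22],      # assumed contents
    [24, 27, 29],      # assumed contents
    [33, 34, 38, 39],  # assumed contents
]

def find_leaf(key):
    # Child i holds keys k with ROOT_KEYS[i-1] <= k < ROOT_KEYS[i].
    return bisect.bisect_right(ROOT_KEYS, key)

def search(key):
    """Equality search: one root-to-leaf path, then look inside the leaf."""
    return key in LEAVES[find_leaf(key)]

def range_from(low):
    """All data entries >= low: locate the first leaf, then follow
    the leaf chain (modeled here by list order) to the right."""
    i = find_leaf(low)
    out = [k for k in LEAVES[i] if k >= low]
    for leaf in LEAVES[i + 1:]:
        out.extend(leaf)
    return out

print(search(5), search(15), range_from(24))
```

The search for 15* ends in the leaf holding 14* and 16* and fails there, without looking at any other leaf.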
Slide No:L8-2
B+ Trees in Practice
Typical order: 100. Typical fill-factor: 67%.
Average fanout = 133
Typical capacities:
Height 4: 133^4 = 312,900,721 records
Height 3: 133^3 = 2,352,637 records
Can often hold top levels in buffer pool:
Level 1 = 1 page = 8 Kbytes
Level 2 = 133 pages = about 1 Mbyte
Level 3 = 17,689 pages = about 138 MBytes
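The capacity arithmetic is easy to reproduce. The sketch below assumes a fanout of 133 and 8 Kbyte pages, as stated on the slide; the function names are illustrative.

```python
FANOUT = 133    # average fanout from the slide
PAGE_KB = 8     # page size in Kbytes from the slide

def capacity(height):
    """Number of records reachable through a tree of the given height."""
    return FANOUT ** height

def level_pages(level):
    """Number of pages at a level, with the root counted as level 1."""
    return FANOUT ** (level - 1)

def level_mbytes(level):
    return level_pages(level) * PAGE_KB / 1024

for h in (3, 4):
    print(f"height {h}: {capacity(h):,} records")
for lvl in (1, 2, 3):
    print(f"level {lvl}: {level_pages(lvl):,} pages, "
          f"{level_mbytes(lvl):.2f} MBytes")
```

The top three levels together need well under 200 MBytes, which is why they can often be kept in the buffer pool.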
Slide No:L8-3
Slide No:L8-4
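Observe how minimum occupancy is guaranteed in both leaf and index page splits.
Note the difference between copy-up and push-up; be sure you understand the reasons for this.
[Figure: a leaf split producing [2* 3*] and [5* 7* 8*], with 5 copied up; an index split with 17 pushed up, leaving 13 on the left and 24, 30 on the right.]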
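The difference between copy-up and push-up can be made concrete with two small functions. This is a sketch under assumptions: the tree has order 2 (at least two entries per page), the full leaf 2* 3* 5* 7* receives 8*, and the index page holding 13, 17, 24, 30 receives the new key 5. The function names are hypothetical.

```python
def split_leaf(entries):
    """Split an overfull leaf. The middle key is COPIED up: it goes into
    the parent and also stays in the right leaf, because every data
    entry must remain at the leaf level."""
    mid = len(entries) // 2
    left, right = entries[:mid], entries[mid:]
    return left, right[0], right

def split_index(keys):
    """Split an overfull index page. The middle key is PUSHED up: it
    moves to the parent and appears in neither half, because index
    keys only direct the search."""
    mid = len(keys) // 2
    return keys[:mid], keys[mid], keys[mid + 1:]

# Leaf 2* 3* 5* 7* plus the new entry 8* overflows and splits.
left_leaf, copied_up, right_leaf = split_leaf([2, 3, 5, 7, 8])

# The parent index page 13 17 24 30 plus the copied-up key overflows too.
left_idx, pushed_up, right_idx = split_index(sorted([13, 17, 24, 30, copied_up]))

print(left_leaf, copied_up, right_leaf)
print(left_idx, pushed_up, right_idx)
```

Both splits leave every resulting page with at least two entries, which is the minimum occupancy for this order.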
Slide No:L8-5
Root: [17]
Index pages: [5 | 13]  [24 | 30]
Leaf pages (leftmost three): [2* 3*] [5* 7* 8*] [14* 16*]
Slide No:L8-7
Root: [17]
Index pages: [5 | 13]  [27 | 30]
Leaf pages: [2* 3*] [5* 7* 8*] [14* 16*] [22* 24*] [27* 29*]
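Must merge.
Observe 'toss' of index entry (on right), and 'pull down' of index entry (below).
[Figure, right: index page [30] over leaf pages [22* 27* 29*] [33* 34* 38* 39*].]
[Figure, below:]
Root: [5 | 13 | 17 | 30]
Leaf pages (leftmost three): [2* 3*] [5* 7* 8*] [14* 16*]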
Slide No:L8-9