03 UW Indexing

Ullman et al.: Database System Principles

Notes 4: Indexing

Chapter 4

Indexing & Hashing


[Figure: value → ? → record]

Topics

• Conventional indexes
• B-trees
• Hashing schemes

• A single-level index is an auxiliary file that makes it more efficient to search for a record in the data file.
• The index is usually specified on one field of the file (although it could be specified on several fields).
• One form of an index is a file of entries <field value, pointer to record>, which is ordered by field value.
• The index is called an access path on the field.

• The index file usually occupies considerably fewer disk blocks than the data file because its entries are much smaller.
• A binary search on the index yields a pointer to the file record.
• Indexes can also be characterized as dense or sparse:
– A dense index has an index entry for every search key value (usually every record) in the data file.
– A sparse (or nondense) index, on the other hand, has index entries for only some of the search values (typically one entry per data file block).
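A minimal Python sketch of the two kinds of lookup, assuming a toy file held in memory as a list of blocks with two records each (the names `BLOCKS`, `dense_lookup` and `sparse_lookup` are hypothetical). The dense lookup can report a missing key from the index alone; the sparse lookup has to read the one candidate block.

```python
from bisect import bisect_left, bisect_right

# Hypothetical sequential file: sorted keys, two records per block (as in the figures).
BLOCKS = [[10, 20], [30, 40], [50, 60], [70, 80], [90, 100]]

# Dense index: one entry per record; the pointer is (block number, slot).
DENSE_KEYS = [key for block in BLOCKS for key in block]
DENSE_PTRS = [(b, s) for b, block in enumerate(BLOCKS) for s in range(len(block))]

# Sparse index: one entry per block, holding the block's first key.
SPARSE_KEYS = [block[0] for block in BLOCKS]


def dense_lookup(key):
    """Binary-search the dense index. A miss is detected without reading the file."""
    i = bisect_left(DENSE_KEYS, key)
    if i < len(DENSE_KEYS) and DENSE_KEYS[i] == key:
        return DENSE_PTRS[i]
    return None


def sparse_lookup(key):
    """Find the last block whose first key is <= key, then search inside that block."""
    b = bisect_right(SPARSE_KEYS, key) - 1
    if b < 0:
        return None                      # smaller than every key in the file
    block = BLOCKS[b]                    # one data-block read
    return (b, block.index(key)) if key in block else None
```

The sparse index here needs only five entries for ten records, which is the space saving the next slides trade against the dense index's ability to answer "not present" without touching data blocks.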
Sequential File
[Figure: a sequential data file holding records with keys 10, 20, 30, …, 100 in key order.]

Dense Index / Sequential File
[Figure: a dense index with one entry per record (10, 20, 30, …), each pointing to the record with that key in the sequential file.]

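Sparse Index / Sequential File
[Figure: a sparse index with one entry per data block (10, 30, 50, 70, 90, …), each pointing to the first record of its block; the blocks hold two records each (10, 20 | 30, 40 | 50, 60 | …).]
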
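Sparse 2nd level / Sequential File
[Figure: a two-level index. The second-level sparse index (10, 90, 170, 250, 330, 410, 490, 570) has one entry per block of the first-level sparse index (10, 30, 50, 70 | 90, 110, 130, 150 | 170, …), which in turn has one entry per data block.]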
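To illustrate the two-level search, here is a hypothetical sketch that rebuilds the figure's shape: 2 records per data block and 4 entries per index block, so the second level comes out as 10, 90, 170, 250. Each level is searched by binary search and only one block per level is touched; the block sizes and all helper names are assumptions.

```python
from bisect import bisect_right

RECORDS_PER_BLOCK = 2  # assumption, matching the figure
ENTRIES_PER_BLOCK = 4  # assumption, matching the figure


def chunk(items, size):
    return [items[i:i + size] for i in range(0, len(items), size)]


def last_leq(entries, key):
    """Pointer of the last (key, pointer) entry whose key is <= key, or None."""
    i = bisect_right([k for k, _ in entries], key) - 1
    return entries[i][1] if i >= 0 else None


# Data file: keys 10, 20, ..., 320 stored two per block.
data_blocks = chunk(list(range(10, 330, 10)), RECORDS_PER_BLOCK)
# First level: one (first_key, data_block_no) entry per data block, stored four per block.
level1_blocks = chunk([(blk[0], b) for b, blk in enumerate(data_blocks)], ENTRIES_PER_BLOCK)
# Second level: one (first_key, level1_block_no) entry per first-level block.
level2 = [(blk[0][0], b) for b, blk in enumerate(level1_blocks)]


def lookup(key):
    """Return (data_block_no, slot) or None, touching one block per level."""
    ib = last_leq(level2, key)
    if ib is None:
        return None
    db = last_leq(level1_blocks[ib], key)
    block = data_blocks[db]
    return (db, block.index(key)) if key in block else None
```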
Notes on pointers:
(1) A block pointer (BP), as used in a sparse index, can be smaller than a record pointer (RP).

Sparse vs. Dense Tradeoff
• Sparse: less index space per record, so more of the index can be kept in memory
• Dense: can tell whether a record with a given key exists without accessing the file
(Later:
– sparse better for insertions
– dense needed for secondary indexes)

Terms
• Index sequential file
• Search key (≠ primary key)
• Primary index (on ordering field)
• Secondary index (on non-ordering field)
• Dense index (all search key values appear in the index)
• Sparse index
• Multi-level index

Next:

• Duplicate keys

• Deletion/Insertion

• Secondary indexes

Duplicate keys
[Figure: a sequential file with duplicate search keys, two records per block: 10, 10 | 10, 20 | 20, 30 | 30, 30 | 40, 45.]
Duplicate keys
Dense index, one way to implement?
[Figure: one index entry per record (10, 10, 10, 20, 20, 30, 30, 30, …), each pointing to its own record.]
Duplicate keys
Dense index, better way?
[Figure: one index entry per distinct key (10, 20, 30, 40, …), each pointing to the first record with that key.]
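Duplicate keys
Sparse index, one way? (see previous page)
[Figure: one index entry per data block, holding the block’s first key (10, 10, 20, 30, 40) over blocks 10, 10 | 10, 20 | 20, 30 | 30, 30 | 40, 45.]
Careful if looking for 20 or 30! A search for 20 that starts at the block whose index entry is 20 misses the 20 stored at the end of the previous block; the same happens for 30.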
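Duplicate keys
Sparse index, another way?
– place first new key from block
[Figure: each index entry holds the first key in its block that did not appear in the previous block: 10, 20, 30, …; block 30, 30 contains no new key, and the figure asks whether its entry should be 40.]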
Summary: Duplicate values, primary index
• Index may point to first instance of each value only
[Figure: index entries a, b, … each point to the first of the file’s records with that value (a, a, …, b, …).]
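A small sketch of this idea, assuming an in-memory sorted file with the duplicate keys from the earlier figures (`FILE`, `INDEX_KEYS`, `INDEX_PTRS` and `find_all` are made-up names). The index keeps one pointer per distinct value, and the remaining duplicates are found by scanning forward, which only works because the file is sorted on the key.

```python
from bisect import bisect_left

# Sorted file with duplicate keys (blocks flattened into one list for brevity).
FILE = [10, 10, 10, 20, 20, 30, 30, 30, 40, 45]

# Index: one entry per distinct key, pointing at its first occurrence.
INDEX_KEYS, INDEX_PTRS = [], []
for pos, key in enumerate(FILE):
    if not INDEX_KEYS or INDEX_KEYS[-1] != key:
        INDEX_KEYS.append(key)
        INDEX_PTRS.append(pos)


def find_all(key):
    """Positions of all records with `key`: jump to the first instance via the
    index, then scan forward while the key still matches."""
    i = bisect_left(INDEX_KEYS, key)
    if i == len(INDEX_KEYS) or INDEX_KEYS[i] != key:
        return []
    start = end = INDEX_PTRS[i]
    while end < len(FILE) and FILE[end] == key:
        end += 1
    return list(range(start, end))
```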
Deletion from sparse index
– delete record 40
[Figure: sparse index 10, 30, 50, 70, 90, 110, 130, 150 over data blocks 10, 20 | 30, 40 | 50, 60 | 70, 80. Record 40 is not the first record of its block, so only the data block changes; the index is untouched.]

Deletion from sparse index
– delete record 30
[Figure: record 30 is removed, record 40 moves up to the front of its block, and the index entry 30 is replaced by 40.]

Deletion from sparse index
– delete records 30 & 40
[Figure: with both records gone the block is empty, so its index entry 30 is removed and the entries 50, 70, … shift up.]
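The three sparse-index deletion cases can be sketched as follows. This is a hypothetical in-memory model (blocks as Python lists kept packed at the front, the sparse index as `(first_key, block_no)` pairs, and a made-up `delete_record` function); it leaves an emptied block in place as free space and only drops its index entry.

```python
from bisect import bisect_right


def delete_record(blocks, index, key):
    """Delete `key` from a sorted, blocked file that has a sparse index.

    blocks: list of blocks, each a sorted list of keys packed at the front.
    index:  sorted list of (first_key, block_no) pairs, one per non-empty block.
    Returns True if the record was found and deleted."""
    i = bisect_right([k for k, _ in index], key) - 1
    if i < 0:
        return False
    b = index[i][1]
    block = blocks[b]
    if key not in block:
        return False
    block.remove(key)                 # remaining records stay packed
    if not block:
        del index[i]                  # empty block: drop its entry; later entries shift up
    else:
        index[i] = (block[0], b)      # changes only if the first record was deleted
    return True
```

Deleting 40 leaves the index alone, deleting 30 rewrites one entry to 40, and deleting both removes the entry, mirroring the three examples.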
Deletion from dense index
– delete record 30
[Figure: dense index 10, 20, 30, 40, 50, 60, 70, 80 over data blocks 10, 20 | 30, 40 | 50, 60 | 70, 80. Record 30 is removed, record 40 moves up within its block, and the index entry 30 is deleted, with 40 and the following entries shifting up.]

Insertion, sparse index case
– insert record 34
[Figure: sparse index 10, 30, 40, 60 over data blocks 10, 20 | 30, (free) | 40, 50 | 60, (free). Record 34 goes into the free slot after 30.]
• Our lucky day! We have free space where we need it!

Insertion, sparse index case
– insert record 15
[Figure: block 10, 20 is full, so 15 is placed after 10 and 20 is pushed into the next block: the blocks become 10, 15 | 20, 30 | 40, 50 | 60, and the index entry 30 becomes 20.]
• Illustrated: Immediate reorganization
• Variation:
– insert new block (chained file)
– update index

Insertion, sparse index case
– insert record 25
[Figure: block 10, 20 is full, so 25 goes into an overflow block chained to it (reorganize later...).]
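A rough sketch of the overflow variant, assuming a block capacity of two records and the starting file from the examples (blocks 10, 20 | 30 | 40, 50 | 60 with sparse index 10, 30, 40, 60). `Block`, `insert_record` and `contains` are illustrative names; immediate reorganization, as in the record-15 example, is not implemented here.

```python
from bisect import bisect_right, insort

CAPACITY = 2  # records per block; an assumption matching the figures


class Block:
    def __init__(self, keys):
        self.keys = sorted(keys)
        self.overflow = None          # next block in this block's overflow chain


def insert_record(blocks, index_keys, key):
    """Use free space in the target block if there is any; otherwise append the
    record to the block's overflow chain (reorganize later)."""
    i = max(bisect_right(index_keys, key) - 1, 0)
    blk = blocks[i]
    while len(blk.keys) >= CAPACITY:  # walk the chain to the first non-full block
        if blk.overflow is None:
            blk.overflow = Block([])
        blk = blk.overflow
    insort(blk.keys, key)
    index_keys[i] = blocks[i].keys[0]  # keep the entry equal to the main block's first key


def contains(blocks, index_keys, key):
    """Search the target block and then its overflow chain."""
    i = max(bisect_right(index_keys, key) - 1, 0)
    blk = blocks[i]
    while blk is not None:
        if key in blk.keys:
            return True
        blk = blk.overflow
    return False
```

Inserting 34 lands in the free slot of block 30, while 25 ends up in an overflow block hanging off block 10, 20; the sparse index itself does not change in either case.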
Insertion, dense index case

• Similar
• Often more expensive, since every new record also needs its own index entry . . .

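Secondary indexes
• Sparse index
[Figure: a data file that is not sorted on the indexed field (blocks 30, 50 | 20, 70 | 80, 40 | 100, 10 | 90, 60) with a sparse index holding each block’s first key: 30, 20, 80, 100, 90, ….]
Does not make sense! Because the file is not sorted on this field, a block’s first key says nothing about which other keys the block holds.
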
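Secondary indexes
• Dense index
[Figure: a dense index on the field, kept sorted (10, 20, 30, 40, 50, 60, 70, 80, …), with one entry per record pointing into the unsorted file 30, 50 | 20, 70 | 80, 40 | 100, 10 | 90, 60.]
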
Secondary indexes
• Dense index
[Figure: the sorted dense index (10, 20, 30, 40 | 50, 60, 70, 80 | 90, …) with a sparse higher level (10, 50, 90, …) built on top of it, one entry per dense-index block.]

With secondary indexes:
• Lowest level is dense
• Other levels are sparse

Also: Pointers are record pointers (not block pointers)

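Duplicate values & secondary indexes
one option...
[Figure: a dense index with one entry per record over an unsorted file containing duplicate keys, so a value such as 40 appears in the index once for every record that has it.]
Problem: excess overhead!
• disk space
• search time
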
Duplicate values & secondary indexes
another option...
[Figure: one index entry per distinct value (10, 20, 30, 40), each followed by pointers to all records with that value.]
Problem: variable size records in index!
You can specify it in your SQL statement: CREATE INDEX.

Duplicate values & secondary indexes
[Figure: one index entry per distinct value (10, 20, 30, 40, 50, 60, …) pointing to the first record with that value; records with the same value are chained to each other.]
Another idea: Chain records with same key?
Problems:
• Need to add fields to records
• Need to follow chain to know records

Duplicate values & secondary indexes
[Figure: the index entries (10, 20, 30, 40, 50, 60, …) point into buckets, and each bucket holds the record pointers for one value.]
Pointers can be stored in separate blocks (buckets)
Why “bucket” idea is useful

Indexes                 Records
Name: primary           EMP (name, dept, floor, ...)
Dept: secondary
Floor: secondary

See the following query

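Query: Get employees in (Toy Dept) ∧ (2nd floor)
[Figure: the Dept. index and the Floor index beside the EMP file; the Toy entry and the 2nd entry each point to a bucket of EMP record pointers.]
→ Intersect the Toy bucket and the 2nd-floor bucket to get the set of matching EMPs.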
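As a sketch of the bucket approach, assume a tiny hypothetical EMP file in memory, with record pointers modelled as list positions and each bucket as a Python set (`EMP` and `build_buckets` are invented for illustration). The query is answered by intersecting two buckets before any EMP record is fetched.

```python
# Hypothetical EMP file: record pointer (position) -> (name, dept, floor).
EMP = [
    ("Ann", "Toy", 2),
    ("Bob", "Shoe", 2),
    ("Cy", "Toy", 1),
    ("Di", "Toy", 2),
    ("Ed", "Shoe", 1),
]


def build_buckets(records, field):
    """Secondary index with buckets: each distinct value maps to a bucket
    (here a set) of record pointers."""
    buckets = {}
    for rp, rec in enumerate(records):
        buckets.setdefault(rec[field], set()).add(rp)
    return buckets


dept_index = build_buckets(EMP, 1)
floor_index = build_buckets(EMP, 2)

# (Toy Dept) AND (2nd floor): intersect the two buckets, then fetch only the matches.
matches = dept_index["Toy"] & floor_index[2]
print(sorted(EMP[rp][0] for rp in matches))  # ['Ann', 'Di']
```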


Summary so far

• Conventional index
– Basic Ideas: sparse, dense, multi-level…
– Duplicate Keys
– Deletion/Insertion
– Secondary indexes

Conventional indexes
Advantages:
- Simple
- Index is a sequential file, good for scans
Disadvantages:
- Inserts are expensive, and/or
- We lose sequentiality & balance
Example Index (sequential)
[Figure: a sequential index (10, 20, 30, 33, 40, 50, 60, 70, 80, 90) with continuous free space; further keys (31, 32, 34, 35, 36, 38, 39) are kept in an overflow area that is not sequential.]

• Clustering Index (it’s a primary index)
– Defined on an ordered data file
– The data file is ordered on a non-key field.
– Includes one index entry for each distinct value of the field; the index entry points to the first data block that contains records with that field value.
– It is an example of a nondense index. Insertion and deletion are relatively straightforward with a clustering index.

• Clustering Index

• Clustering Index version 2

Outline:

• Conventional indexes
• B-Trees ← NEXT
• Hashing schemes

• NEXT: Another type of index
– Give up on sequentiality of index
– Try to get “balance”

B+Tree Example (n=3)
Root: [100]
Non-leaf nodes: [30]   [120 | 150 | 180]
Leaves (linked by sequence pointers): [3 | 5 | 11] [30 | 35] [100 | 101 | 110] [120 | 130] [150 | 156 | 179] [180 | 200]
Lookup in a B+ tree

Useful for range queries too

SELECT * FROM R WHERE R.o >= a AND R.o <= b;
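A minimal sketch of lookup and range search, assuming the keys of the B+Tree Example (n=3) with record pointers omitted; `Leaf`, `Inner`, `find_leaf` and `range_query` are hypothetical names. The range query descends once to the leaf for `a` and then follows sequence pointers, which is what makes a query like the SQL above cheap.

```python
from bisect import bisect_right


class Leaf:
    def __init__(self, keys):
        self.keys = keys          # record pointers omitted in this sketch
        self.next = None          # sequence pointer to the next leaf


class Inner:
    def __init__(self, keys, children):
        self.keys = keys
        self.children = children  # len(keys) + 1 child nodes


def find_leaf(node, key):
    """Descend from the root; child i covers keys[i-1] <= key < keys[i]."""
    while isinstance(node, Inner):
        node = node.children[bisect_right(node.keys, key)]
    return node


def lookup(root, key):
    return key in find_leaf(root, key).keys


def range_query(root, a, b):
    """All keys k with a <= k <= b, found by following sequence pointers."""
    out, leaf = [], find_leaf(root, a)
    while leaf is not None:
        for k in leaf.keys:
            if k > b:
                return out
            if k >= a:
                out.append(k)
        leaf = leaf.next
    return out


# Keys of the B+Tree Example (n=3).
leaves = [Leaf([3, 5, 11]), Leaf([30, 35]), Leaf([100, 101, 110]),
          Leaf([120, 130]), Leaf([150, 156, 179]), Leaf([180, 200])]
for left, right in zip(leaves, leaves[1:]):
    left.next = right
root = Inner([100], [Inner([30], leaves[:2]), Inner([120, 150, 180], leaves[2:])])
```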

Sample non-leaf
[57 | 81 | 95]
Pointers: to keys $k < 57$ | to keys $57 \le k < 81$ | to keys $81 \le k < 95$ | to keys $k \ge 95$

Sample leaf node:
(from non-leaf node) → [57 | 81 | 95]
– to record with key 57
– to record with key 81
– to record with key 95
– to next leaf in sequence
In textbook’s notation (n=3):
[Figure: a leaf holding keys 30 and 35, each with a pointer to its record, and a non-leaf holding the single key 30.]

Size of nodes: n+1 pointers (fixed)
n keys

Don’t want nodes to be too empty
• Use at least
Non-leaf: $\lceil (n+1)/2 \rceil$ pointers
Leaf: $\lfloor (n+1)/2 \rfloor$ pointers to data

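[Figure: full and minimum nodes for n=3. Non-leaf: full [120 | 150 | 180], minimum [30]. Leaf: full [3 | 5 | 11], minimum [30 | 35]. A note in the figure marks a pointer that “counts even if null”.]
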
B+tree rules (tree of order n)
(1) All leaves at same lowest level (balanced tree)
(2) Pointers in leaves point to records, except for the “sequence pointer” (to next leaf)

(3) Number of pointers/keys for B+tree

                        Max ptrs   Max keys   Min ptrs→data                Min keys
Non-leaf (non-root)     n+1        n          $\lceil (n+1)/2 \rceil$      $\lceil (n+1)/2 \rceil - 1$
Leaf (non-root)         n+1        n          $\lfloor (n+1)/2 \rfloor$    $\lfloor (n+1)/2 \rfloor$
Root                    n+1        n          1                            1
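The table can be turned into a small helper for checking node sizes. This is an illustrative sketch (`bplus_limits` is a made-up name) that relies on the integer identities ceil((n+1)/2) = `(n + 2) // 2` and floor((n+1)/2) = `(n + 1) // 2`.

```python
def bplus_limits(n):
    """Per node type: (max ptrs, max keys, min ptrs, min keys) for a B+tree of order n.
    For leaves, the minimum counts pointers to data."""
    ceil_half = (n + 2) // 2    # ceil((n+1)/2)
    floor_half = (n + 1) // 2   # floor((n+1)/2)
    return {
        "non-leaf": (n + 1, n, ceil_half, ceil_half - 1),
        "leaf": (n + 1, n, floor_half, floor_half),
        "root": (n + 1, n, 1, 1),
    }
```

For n=3 this gives a minimum non-leaf with 2 pointers and 1 key and a minimum leaf with 2 keys, the same as the minimum nodes [30] and [30 | 35] for n=3; for n=4 a non-leaf needs at least 3 pointers, which is the bound that matters in the deletion examples.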
Insert into B+tree

(a) simple case
– space available in leaf
(b) leaf overflow
(c) non-leaf overflow
(d) new root

(a) Insert key = 32 (n=3)
[Figure: root [100]; its left child [30] has leaves [3 | 5 | 11] and [30 | 31]. Leaf [30 | 31] has a free slot, so 32 is simply added: [30 | 31 | 32].]

(b) Insert key = 7 (n=3)
[Figure: leaf [3 | 5 | 11] is full, so it splits into [3 | 5] and [7 | 11]; the new leaf’s first key, 7, is inserted into the parent, which becomes [7 | 30].]

(c) Insert key = 160 (n=3)
[Figure: root [100]; its right child [120 | 150 | 180] has leaves …, [150 | 156 | 179], [180 | 200]. Leaf [150 | 156 | 179] splits into [150 | 156] and [160 | 179]. The parent would then need keys 120, 150, 160, 180, one too many, so it splits into [120 | 150] and [180], and the middle key 160 moves up: the root becomes [100 | 160].]

(d) New root, insert 45 (n=3)
[Figure: root [10 | 20 | 30] over leaves [1 | 2 | 3], [10 | 12], [20 | 25], [30 | 32 | 40]. Leaf [30 | 32 | 40] splits into [30 | 32] and [40 | 45]. The root would then need keys 10, 20, 30, 40, so it splits into [10 | 20] and [40], and 30 moves up into a new root [30].]
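The four insertion cases fit into one recursive routine. The sketch below is hypothetical and simplified: keys only (no record pointers, no sequence pointers), order `N = 3`, leaves split so the left half keeps ceil((n+1)/2) keys, and a full non-leaf pushes its middle key up. With these choices it reproduces the splits in the insert-7, insert-160 and new-root examples.

```python
from bisect import bisect_right, insort

N = 3  # order n: at most N keys per node, as in the insertion examples


class Node:
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children  # None for a leaf; otherwise len(keys) + 1 nodes


def _insert(node, key):
    """Insert below `node`; return (separator, new_right_sibling) if `node` split."""
    if node.children is None:                   # leaf
        insort(node.keys, key)
        if len(node.keys) <= N:
            return None                         # (a) space available in leaf
        mid = (N + 2) // 2                      # left keeps ceil((n+1)/2) keys
        right = Node(node.keys[mid:])
        node.keys = node.keys[:mid]
        return right.keys[0], right             # (b) leaf overflow: copy first key up
    i = bisect_right(node.keys, key)
    split = _insert(node.children[i], key)
    if split is None:
        return None
    sep, new_child = split
    node.keys.insert(i, sep)
    node.children.insert(i + 1, new_child)
    if len(node.keys) <= N:
        return None
    mid = len(node.keys) // 2                   # (c) non-leaf overflow:
    up = node.keys[mid]                         # the middle key moves up, it is not copied
    right = Node(node.keys[mid + 1:], node.children[mid + 1:])
    node.keys, node.children = node.keys[:mid], node.children[:mid + 1]
    return up, right


def insert(root, key):
    """Insert `key` and return the (possibly new) root."""
    split = _insert(root, key)
    if split is None:
        return root
    sep, right = split
    return Node([sep], [root, right])           # (d) new root
```

Note the asymmetry: a leaf split copies its separator into the parent (the key must stay in a leaf, where the record pointer lives), while a non-leaf split moves the middle key up.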
Deletion from B+tree

(a) Simple case - no example
(b) Coalesce with neighbor (sibling)
(c) Re-distribute keys
(d) Cases (b) or (c) at non-leaf

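(b) Coalesce with sibling (n=4)
– Delete 50
[Figure: leaves [10 | 20 | 30] and [40 | 50], separated by key 40 in the parent. After deleting 50, leaf [40] is below the minimum of $\lfloor (n+1)/2 \rfloor = 2$ keys; both leaves fit in one node, so they are merged into [10 | 20 | 30 | 40] and key 40 is removed from the parent.]
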
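(c) Redistribute keys (n=4)
– Delete 50
[Figure: leaves [10 | 20 | 30 | 35] and [40 | 50], separated by key 40 in the parent. After deleting 50, leaf [40] is too empty and the two leaves do not fit in one node, so 35 moves over from the sibling: the leaves become [10 | 20 | 30] and [35 | 40], and the parent key 40 becomes 35.]
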
(d) Non-leaf coalesce (n=4)
– Delete 37
[Figure: root [25] over non-leaves [10 | 20] and [30 | 40]; leaves [1 | 3], [10 | 14], [20 | 22], [25 | 26], [30 | 37], [40 | 45]. Deleting 37 leaves [30] too empty, so it is merged with [40 | 45] into [30 | 40 | 45] and key 40 is removed from the parent. That parent, now [30] with only two pointers, is below the non-leaf minimum of $\lceil (n+1)/2 \rceil = 3$, so it is merged with its sibling [10 | 20], pulling down the root key 25; the result [10 | 20 | 25 | 30] becomes the new root.]
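A sketch of the leaf-level repair, under stated assumptions: order `N = 4`, leaves modelled as key lists under one parent, and the policy "coalesce if the two leaves fit in one node, otherwise borrow one key", with the right sibling tried first. These choices match examples (b), (c) and the leaf step of (d); the non-leaf step of (d) is analogous but pulls the separator key down from the parent and is not shown. `fix_leaf_underflow` is a hypothetical name.

```python
N = 4                       # order used in the deletion examples
MIN_LEAF = (N + 1) // 2     # floor((n+1)/2) keys in a non-root leaf


def fix_leaf_underflow(parent_keys, leaves, i):
    """Repair leaves[i] after a deletion. `leaves` are the sorted key lists under
    one parent and parent_keys[j] separates leaves[j] from leaves[j + 1].
    Mutates both lists and reports which case applied."""
    if len(leaves[i]) >= MIN_LEAF or len(leaves) == 1:
        return "ok"                         # a lone leaf (e.g. the root) has no sibling
    # Pair the short leaf with its right sibling if it has one, else its left one.
    left = i if i + 1 < len(leaves) else i - 1
    right = left + 1
    if len(leaves[left]) + len(leaves[right]) <= N:
        # (b) Coalesce: everything fits in one node.
        leaves[left].extend(leaves[right])
        del leaves[right]
        del parent_keys[left]               # the separator for the removed leaf goes away
        return "coalesced"
    # (c) Redistribute: move one key from the sibling into the short leaf.
    if left == i:
        leaves[left].append(leaves[right].pop(0))
    else:
        leaves[right].insert(0, leaves[left].pop())
    parent_keys[left] = leaves[right][0]    # new separator: first key of the right leaf
    return "redistributed"
```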
B+tree deletions in practice
– Often, coalescing is not implemented
– Too hard and not worth it!

• Speaking of buffering…
Is LRU a good policy for B+tree buffers?
→ Of course not!
→ Should try to keep the root in memory at all times
(and perhaps some nodes from the second level)
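One way to act on this is an LRU buffer that never evicts pinned pages. The sketch below is hypothetical (the `BufferPool` class and its `read_page` callback are not from the notes); it pins the root and applies LRU only to the remaining frames.

```python
from collections import OrderedDict


class BufferPool:
    """LRU buffer for tree nodes in which pinned pages (e.g. the root) are never evicted."""

    def __init__(self, capacity, read_page, pinned=()):
        self.capacity = capacity              # total frames, pinned ones included
        self.read_page = read_page            # callback that fetches a page from disk
        self.pinned = {pid: read_page(pid) for pid in pinned}
        self.disk_reads = len(self.pinned)
        self.lru = OrderedDict()              # unpinned pages, least recently used first

    def get(self, pid):
        if pid in self.pinned:
            return self.pinned[pid]
        if pid in self.lru:
            self.lru.move_to_end(pid)         # now the most recently used
            return self.lru[pid]
        page = self.read_page(pid)
        self.disk_reads += 1
        self.lru[pid] = page
        if len(self.pinned) + len(self.lru) > self.capacity:
            self.lru.popitem(last=False)      # evict the least recently used unpinned page
        return page
```

With two frames and the access pattern 0, 1, 2, 0 (page 0 being the root), this pool performs 3 disk reads. A plain LRU pool of the same size would evict page 0 when page 2 arrives and read it again, for 4 reads.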
Variation on B+tree: B-tree (no +)
• Idea:
– Avoid duplicate keys
– Have record pointers in non-leaf nodes

[K1 | P1 | K2 | P2 | K3 | P3]
P1, P2, P3: to record with K1, K2, K3
Tree pointers: to keys $x < K_1$ | $K_1 < x < K_2$ | $K_2 < x < K_3$ | $x > K_3$
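A hypothetical sketch of B-tree search over the keys of the B-tree example (n=2), with record pointers modelled as strings. Unlike the B+tree lookup, the search returns as soon as it meets the key in any node, so 65 is found at the root while 100 needs a trip to a leaf. `BNode`, `search` and `leaf` are invented names.

```python
from bisect import bisect_left


class BNode:
    def __init__(self, keys, children=None):
        self.keys = keys                            # K1 < K2 < ...
        self.records = [f"rec{k}" for k in keys]    # stand-in record pointers P1, P2, ...
        self.children = children                    # len(keys) + 1 subtrees, or None for a leaf


def search(node, key):
    """Return (record pointer or None, depth at which the search stopped)."""
    depth = 0
    while True:
        i = bisect_left(node.keys, key)
        if i < len(node.keys) and node.keys[i] == key:
            return node.records[i], depth           # found in this node, leaf or not
        if node.children is None:
            return None, depth                      # leaf reached: key absent
        node = node.children[i]                     # subtree for keys[i-1] < key < keys[i]
        depth += 1


def leaf(*keys):
    return BNode(list(keys))


# Keys of the B-tree example (n=2).
root = BNode([65, 125], [
    BNode([25, 45], [leaf(10, 20), leaf(30, 40), leaf(50, 60)]),
    BNode([85, 105], [leaf(70, 80), leaf(90, 100), leaf(110, 120)]),
    BNode([145, 165], [leaf(130, 140), leaf(150, 160), leaf(170, 180)]),
])
```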
B-tree example (n=2)
Root: [65 | 125]
Non-leaf nodes: [25 | 45]   [85 | 105]   [145 | 165]
Leaves: [10 | 20] [30 | 40] [50 | 60] [70 | 80] [90 | 100] [110 | 120] [130 | 140] [150 | 160] [170 | 180]
• Sequence pointers not useful now! (but keep space for simplicity)

Note on inserts
• Say we insert a record with key = 25 (n=3) into the full leaf [10 | 20 | 30]
• Afterwards: the leaf splits into [10] and [25 | 30], and 20 moves up into the parent together with its record pointer
So, for B-trees:

                        MAX                            MIN
                        Tree ptrs  Rec ptrs  Keys      Tree ptrs                  Rec ptrs                       Keys
Non-leaf (non-root)     n+1        n         n         $\lceil (n+1)/2 \rceil$    $\lceil (n+1)/2 \rceil - 1$    $\lceil (n+1)/2 \rceil - 1$
Leaf (non-root)         1          n         n         1                          $\lfloor n/2 \rfloor$          $\lfloor n/2 \rfloor$
Root (non-leaf)         n+1        n         n         2                          1                              1
Root (leaf)             1          n         n         1                          1                              1

Tradeoffs:
+ B-trees have faster lookup than B+trees (a search can stop at a non-leaf node).
– In a B-tree, non-leaf and leaf nodes have different sizes (the number of pointers differs).
– In a B-tree, deletion is more complicated.
⇒ B+trees preferred!

Outline/summary
• Conventional Indexes
• Sparse vs. dense
• Primary vs. secondary
• B trees
• B+trees vs. B-trees
• B+trees vs. indexed sequential
• Hashing schemes --> Next

