03 UW Indexing

Uploaded by

selezeno4ka1337

Ullman et al.: Database System Principles

Notes 4: Indexing

Chapter 4

Indexing & Hashing

[Figure: an index takes a search value and leads to the record with that value]

Topics

• Conventional indexes
• B-trees
• Hashing schemes

• A single-level index is an auxiliary file that
makes it more efficient to search for a record
in the data file.

• The index is usually specified on one field of
the file (although it could be specified on
several fields).

• One form of an index is a file of entries
<field value, pointer to record>,
which is ordered by field value.

• The index is called an access path on the field.

• The index file usually occupies considerably
fewer disk blocks than the data file because its
entries are much smaller.
• A binary search on the index yields a pointer
to the file record.

• Indexes can also be characterized as dense or
sparse:
– A dense index has an index entry for every
search key value (usually every record) in the
data file.
– A sparse (or nondense) index, on the other
hand, has index entries for only some of the
search values
(typically one entry per data file block).
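
The two kinds of lookup can be sketched in Python. This is a minimal sketch with assumed simplifications: the data file is an in-memory list of sorted blocks holding two records each, a record is represented only by its key, and a "pointer" is a (block number, slot) pair. All names are hypothetical.

```python
from bisect import bisect_left, bisect_right

# Hypothetical data file: sorted on the search key, two records per block.
BLOCKS = [[10, 20], [30, 40], [50, 60], [70, 80], [90, 100]]

def build_dense(blocks):
    """One (key, (block_no, slot)) entry per record."""
    return [(key, (b, s)) for b, block in enumerate(blocks)
            for s, key in enumerate(block)]

def build_sparse(blocks):
    """One (first key of block, block_no) entry per block."""
    return [(block[0], b) for b, block in enumerate(blocks)]

def dense_lookup(index, key):
    """Binary search on the index alone; the data file is never read."""
    keys = [k for k, _ in index]  # rebuilt per call for brevity
    i = bisect_left(keys, key)
    if i < len(keys) and keys[i] == key:
        return index[i][1]
    return None  # absent: known without touching the data file

def sparse_lookup(index, blocks, key):
    """Find the last entry whose first key is <= key, then read that block."""
    keys = [k for k, _ in index]
    i = bisect_right(keys, key) - 1
    if i < 0:
        return None
    block_no = index[i][1]
    block = blocks[block_no]  # the block must be read to see if key exists
    return (block_no, block.index(key)) if key in block else None

dense, sparse = build_dense(BLOCKS), build_sparse(BLOCKS)
print(len(dense), len(sparse))            # 10 5
print(dense_lookup(dense, 40))            # (1, 1)
print(sparse_lookup(sparse, BLOCKS, 40))  # (1, 1)
```

Here the sparse index needs one entry per block (5) instead of one per record (10). However, it cannot decide whether 45 exists without reading block 1, which reflects the sparse vs. dense tradeoff discussed in these notes.
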

Sequential File
[Figure: a sequential file with records 10, 20, 30, ..., 100 in key order, two records per block]

Dense Index / Sequential File
[Figure: a dense index with one entry per record (10, 20, 30, 40, ...), each pointing to that record in the sequential file]

Sparse Index / Sequential File
[Figure: a sparse index with one entry per data block (10, 30, 50, 70, 90, ...), each pointing to the first record of its block]

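Sparse 2nd Level / Sequential File
[Figure: a second-level sparse index (10, 90, 170, 250, 330, ...) with one entry per block of the first-level sparse index (10, 30, 50, 70, ...), which in turn points to the blocks of the sequential file]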
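
A two-level lookup in the spirit of the figure can be sketched as follows. The sketch assumes keys 10 to 240 in steps of 10, two records per data block, and four entries per index block, so the second level comes out as 10, 90, 170, matching the start of the figure. All names are illustrative. Each level is binary-searched, and only one block per level is visited.

```python
from bisect import bisect_right

def floor_entry(entries, key):
    """Value of the last (k, v) entry with k <= key, or None (binary search)."""
    keys = [k for k, _ in entries]
    i = bisect_right(keys, key) - 1
    return entries[i][1] if i >= 0 else None

def chunk(items, size):
    """Split a list into consecutive blocks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]

# Hypothetical layout: keys 10..240, 2 records per data block,
# 4 entries per first-level index block.
data_blocks = chunk(list(range(10, 250, 10)), 2)
level1 = chunk([(blk[0], b) for b, blk in enumerate(data_blocks)], 4)
# Second level: one entry per first-level index block (sparse again).
level2 = [(blk[0][0], b) for b, blk in enumerate(level1)]

def lookup(key):
    """Level 2 -> one level-1 block -> one data block."""
    l1_block = floor_entry(level2, key)
    if l1_block is None:
        return None
    data_block = floor_entry(level1[l1_block], key)
    if data_block is None or key not in data_blocks[data_block]:
        return None
    return data_block

print(level2)       # [(10, 0), (90, 1), (170, 2)]
print(lookup(130))  # 6
```
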
Notes on pointers:

(1) Block pointer (sparse index) can be
smaller than record pointer

Sparse vs. Dense Tradeoff

• Sparse: less index space per record,
so more of the index can be kept in memory
• Dense: can tell if any record exists
without accessing the file

(Later:
– sparse better for insertions
– dense needed for secondary indexes)

Terms
• Index sequential file
• Search key (≠ primary key)
• Primary index (on ordering field)
• Secondary index (on non-ordering field)
• Dense index (all search key values in)
• Sparse index
• Multi-level index

Next:

• Duplicate keys

• Deletion/Insertion

• Secondary indexes

Duplicate keys
[Figure: a sequential file with duplicate keys 10, 10, 10, 20, 20, 30, 30, 30, 40, 45, stored two records per block]

Duplicate keys
Dense index, one way to implement?
[Figure: a dense index with one entry per record, duplicates included (10, 10, 10, 20, 20, 30, 30, 30, ...), each entry pointing to its own record]

Duplicate keys
Dense index, better way?
[Figure: a dense index with one entry per distinct key (10, 20, 30, 40, ...), each pointing to the first record with that key]

Duplicate keys
Sparse index, one way?
[Figure: a sparse index holding the first key of each block (10, 10, 20, 30, ...)]
Careful if looking for 20 or 30!

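Duplicate keys
Sparse index, another way?
– place first new key from block
[Figure: a sparse index holding, for each block, the first key that did not already appear in an earlier block (10, 20, 30, ...); for a block with no new key, the slide asks: should this entry be 40?]
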
Summary: duplicate values, primary index
• Index may point to first instance of
each value only
[Figure: index entries a, b, ... each pointing to the first file record with that value]

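
An index that points only to the first instance of each value can be sketched like this. As an assumed simplification, the sorted file is flattened to one list and a record "pointer" is its position. The function names are hypothetical. Because the file is sorted, a lookup jumps to the first instance and then scans forward over the duplicates.

```python
from bisect import bisect_left

# Hypothetical sorted file with duplicate search-key values,
# flattened to one list; a record "pointer" is its position.
FILE = [10, 10, 10, 20, 20, 30, 30, 30, 40, 45]

def build_first_instance_index(records):
    """One (value, position of first record with that value) entry per distinct value."""
    index = []
    for pos, key in enumerate(records):
        if not index or index[-1][0] != key:
            index.append((key, pos))
    return index

def find_all(index, records, key):
    """Binary-search the index, then scan forward over the duplicates."""
    keys = [k for k, _ in index]
    i = bisect_left(keys, key)
    if i == len(keys) or keys[i] != key:
        return []
    pos, matches = index[i][1], []
    # The file is sorted, so all records with `key` follow the first one contiguously.
    while pos < len(records) and records[pos] == key:
        matches.append(pos)
        pos += 1
    return matches

idx = build_first_instance_index(FILE)
print(idx)                      # [(10, 0), (20, 3), (30, 5), (40, 8), (45, 9)]
print(find_all(idx, FILE, 30))  # [5, 6, 7]
```
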
Deletion from sparse index
– delete record 40
[Figure: a sparse index (10, 30, 50, 70, 90, ...) over blocks 10 20 | 30 40 | 50 60 | 70 80 | ...; record 40 is removed from its block, and since it was not the first record of the block, the index is unchanged]

Deletion from sparse index
– delete record 30
[Figure: a sparse index (10, 30, 50, 70, ...) over blocks 10 20 | 30 40 | 50 60 | 70 80 | ...; after record 30 is deleted, 40 becomes the first record of its block, so the index entry 30 is replaced by 40]

Deletion from sparse index
– delete records 30 & 40
[Figure: a sparse index (10, 30, 50, 70, ...) over blocks 10 20 | 30 40 | 50 60 | 70 80 | ...; after both 30 and 40 are deleted, the second block is empty, so its index entry is removed and the later entries (50, 70, 90, ...) shift up]
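
The three sparse-index deletion cases (record not first in its block, first record deleted, block emptied) can be sketched as follows. This sketch assumes blocks are in-memory sorted lists and index entries are mutable [first_key, block_no] pairs. It also assumes that an emptied block is simply left in place rather than freed. `delete_record` and `example` are hypothetical names.

```python
def delete_record(blocks, index, key):
    """Delete `key` from a sorted file and keep its sparse index consistent.

    blocks: list of sorted key lists (one list per block);
    index:  list of [first_key, block_no] entries, one per non-empty block.
    """
    # Locate the record (a linear scan for brevity; a real system would
    # use the index to find the block).
    for b, block in enumerate(blocks):
        if key in block:
            block.remove(key)
            break
    else:
        return  # key not present: nothing to do
    for i, (first_key, block_no) in enumerate(index):
        if block_no == b:
            if not block:
                del index[i]            # block emptied: drop its entry
            elif first_key != block[0]:
                index[i][0] = block[0]  # first record deleted: new first key
            break

def example():
    """Blocks and index from the slides: 10 20 | 30 40 | 50 60 | 70 80."""
    blocks = [[10, 20], [30, 40], [50, 60], [70, 80]]
    index = [[10, 0], [30, 1], [50, 2], [70, 3]]
    return blocks, index

blocks, index = example()
delete_record(blocks, index, 30)
delete_record(blocks, index, 40)
print(index)  # [[10, 0], [50, 2], [70, 3]]
```
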
Deletion from dense index
– delete record 30
[Figure: a dense index (10, 20, 30, 40, 50, ...) over blocks 10 20 | 30 40 | 50 60 | 70 80; record 30 is deleted, 40 moves up within its block, and the index entry 30 is removed, with 40 and the later entries shifting up]

Insertion, sparse index case
– insert record 34
[Figure: a sparse index (10, 30, 40, 60) over blocks 10 20 | 30 – | 40 50 | 60 –; record 34 goes into the free slot after 30, and the index does not change]
• Our lucky day!
We have free space
where we need it!

Insertion, sparse index case
– insert record 15
[Figure: blocks 10 20 | 30 – | 40 50 | 60 – with index (10, 30, 40, 60); 15 belongs in the full first block, so 20 is pushed into the second block, giving 10 15 | 20 30 | 40 50 | 60 –, and the index entry 30 becomes 20]
• Illustrated: immediate reorganization
• Variation:
– insert new block (chained file)
– update index

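Insertion, sparse index case
– insert record 25
[Figure: blocks 10 20 | 30 – | 40 50 | 60 – with index (10, 30, 40, 60); 25 belongs in the full first block, so it goes into an overflow block chained to that block]
– overflow blocks
(reorganize later...)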
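
The overflow-block variant can be sketched as follows. This minimal sketch assumes blocks hold two records, a record is just its key, and the sparse index is a plain list of lower-bound keys. `Block` and `insert` are hypothetical names. Immediate reorganization, the other option shown, is not implemented here.

```python
from bisect import bisect_right, insort

CAPACITY = 2  # hypothetical: records per block

class Block:
    def __init__(self, keys):
        self.keys = sorted(keys)
        self.overflow = None  # chained overflow block, if any

def insert(blocks, index_keys, key):
    """Insert `key` into the block the sparse index selects; if that block
    is full, put it in the block's overflow chain (reorganize later).
    index_keys[i] is a lower bound on the keys in blocks[i] and its chain,
    so the index only changes when `key` is below every entry."""
    b = bisect_right(index_keys, key) - 1
    if b < 0:
        b = 0
        index_keys[0] = key  # new smallest key now belongs to block 0's chain
    block = blocks[b]
    while len(block.keys) >= CAPACITY:
        if block.overflow is None:
            block.overflow = Block([])
        block = block.overflow
    insort(block.keys, key)

blocks = [Block([10, 20]), Block([30]), Block([40, 50]), Block([60])]
idx = [10, 30, 40, 60]
insert(blocks, idx, 34)   # free slot in the block starting with 30
insert(blocks, idx, 25)   # first block is full: goes to its overflow block
print(blocks[1].keys, blocks[0].overflow.keys)  # [30, 34] [25]
```
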
Insertion, dense index case

• Similar
• Often more expensive . . .

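Secondary indexes
• Sparse index
[Figure: a file that is not sorted on the indexed field (sequence field: 30 50 | 20 70 | 80 40 | 100 10 | 90 60), with a sparse index holding the first value of each block (30, 20, 80, 100, 90, ...)]
Does not make sense!
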
Secondary indexes
• Dense index
[Figure: a dense index in key order (10, 20, 30, ..., 100), each entry pointing to its record in the unsorted file 30 50 | 20 70 | 80 40 | 100 10 | 90 60]

Secondary indexes
• Dense index
[Figure: a dense lowest level (10, 20, 30, 40 | 50, 60, 70, 80 | 90, 100, ...) pointing into the unsorted file, with a sparse high level (10, 50, 90, ...) on top of it]

With secondary indexes:
• Lowest level is dense
• Other levels are sparse

Also: Pointers are record pointers
(not block pointers)

Duplicate values & secondary indexes
one option...
[Figure: a dense index that repeats a value once per record (10, 10, 10, 20, 20, ...), each entry pointing to one record of the unsorted file]
Problem: excess overhead!
• disk space
• search time

Duplicate values & secondary indexes
another option...
[Figure: one index entry per distinct value (10, 20, 30, 40, ...), each followed by pointers to all records with that value]
Problem: variable size records in index!

An index can be specified in SQL with a CREATE INDEX statement.
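
Here is a small illustration using Python's built-in `sqlite3` module. `CREATE INDEX` builds a secondary index on a non-key column, and `EXPLAIN QUERY PLAN` shows whether SQLite uses that index for a query. The `emp` table, its rows, and the `emp_dept` index are hypothetical. How SQLite stores duplicate keys internally is not shown here.

```python
import sqlite3

# In-memory database; table, column and index names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (name TEXT PRIMARY KEY, dept TEXT, floor INTEGER)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?, ?)",
    [("Ann", "Toy", 2), ("Bob", "Toy", 1), ("Cal", "Shoe", 2), ("Dee", "Toy", 2)],
)

# A secondary index on a non-key column; several rows share each dept value.
conn.execute("CREATE INDEX emp_dept ON emp (dept)")

query = "SELECT name FROM emp WHERE dept = ?"
plan = conn.execute("EXPLAIN QUERY PLAN " + query, ("Toy",)).fetchall()
plan_text = " ".join(str(row[-1]) for row in plan)  # last column is the description
toy_names = sorted(name for (name,) in conn.execute(query, ("Toy",)))

print(plan_text)
print(toy_names)  # ['Ann', 'Bob', 'Dee']
```
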
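Duplicate values & secondary indexes
[Figure: one index entry per distinct value (10, 20, 30, 40, 50, 60, ...), each pointing to the first record with that value; records with the same value are chained together]
Another idea: chain records with same key?
Problems:
• Need to add fields to records
• Need to follow chain to know records
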
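Duplicate values & secondary indexes
[Figure: one index entry per distinct value (10, 20, 30, 40, 50, 60, ...), each pointing to a bucket of record pointers; the buckets point into the file]
Pointers can be stored in separate blocks: buckets
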
Why the “bucket” idea is useful

Indexes             Records
Name: primary       EMP (name, dept, floor, ...)
Dept: secondary
Floor: secondary

See the following query

Query: Get employees in
(Toy Dept) ∧ (2nd floor)
[Figure: the Dept. index entry Toy and the Floor index entry 2nd, each pointing to a bucket of pointers into EMP]
→ Intersect the Toy bucket and the 2nd floor bucket to get the set of matching EMPs
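
The bucket intersection can be sketched with Python sets, where each bucket is the set of record pointers for one index value. The EMP rows, the record ids used as pointers, and the helper `build_buckets` are made up for illustration.

```python
# Hypothetical EMP records keyed by record id; the id stands in for a record pointer.
EMP = {
    1: ("Ann", "Toy", 2),
    2: ("Bob", "Toy", 1),
    3: ("Cal", "Shoe", 2),
    4: ("Dee", "Toy", 2),
    5: ("Eve", "Shoe", 1),
}
NAME, DEPT, FLOOR = 0, 1, 2

def build_buckets(records, field):
    """Secondary index: field value -> bucket (set of record pointers)."""
    buckets = {}
    for rid, rec in records.items():
        buckets.setdefault(rec[field], set()).add(rid)
    return buckets

dept_index = build_buckets(EMP, DEPT)
floor_index = build_buckets(EMP, FLOOR)

# Intersect the two buckets first; only the matching records are then fetched.
matching = dept_index.get("Toy", set()) & floor_index.get(2, set())
names = sorted(EMP[rid][NAME] for rid in matching)
print(names)  # ['Ann', 'Dee']
```

Only the pointer sets are compared. EMP records outside the intersection are never read.
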

Summary so far

• Conventional index
– Basic ideas: sparse, dense, multi-level…
– Duplicate keys
– Deletion/Insertion
– Secondary indexes

Conventional indexes
Advantage:
- Simple
- Index is sequential file,
good for scans
Disadvantage:
- Inserts expensive, and/or
- Lose sequentiality & balance

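Example Index (sequential)
[Figure: index entries 10, 20, 30, 33 | 40, 50, 60 | 70, 80, 90 stored sequentially in continuous space with free slots; later entries 39, 31, 35, 36, 32, 38, 34 sit in an overflow area that is not sequential]
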
• Clustering Index (it’s a primary index)
– Defined on an ordered data file
– The data file is ordered on a non-key field.
– Includes one index entry for each distinct value
of the field; the index entry points to the first
data block that contains records with that field
value.
– It is an example of a nondense index; insertion
and deletion are relatively straightforward with
a clustering index.

• Clustering Index

• Clustering Index version 2

Outline:

• Conventional indexes
• B-trees ← NEXT
• Hashing schemes

• NEXT: Another type of index
– Give up on sequentiality of index
– Try to get “balance”

B+Tree Example (n=3)
[Figure: root (100); interior nodes (30) and (120, 150, 180); leaves 3 5 11 | 30 35 | 100 101 110 | 120 130 | 150 156 179 | 180 200]

Lookup in a B+ tree

Useful for range queries too

SELECT * FROM R WHERE R.o >= a AND R.o <= b;
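
Lookup and the range query above can be sketched in Python on the example tree (n=3). Record pointers are left out, so each leaf holds only keys. The class and function names are hypothetical. A range query descends once to the leaf that would hold `a` (here `lo`) and then follows the sequence pointers until it passes `b` (here `hi`).

```python
from bisect import bisect_right

class Leaf:
    def __init__(self, keys):
        self.keys = keys   # each key stands in for (key, record pointer)
        self.next = None   # sequence pointer to the next leaf

class Inner:
    def __init__(self, keys, children):
        self.keys = keys
        self.children = children  # len(children) == len(keys) + 1

def find_leaf(node, key):
    """Descend from the root; child i covers keys[i-1] <= k < keys[i]."""
    while isinstance(node, Inner):
        node = node.children[bisect_right(node.keys, key)]
    return node

def lookup(root, key):
    return key in find_leaf(root, key).keys

def range_query(root, lo, hi):
    """Keys k with lo <= k <= hi: one descent, then follow sequence pointers."""
    leaf, out = find_leaf(root, lo), []
    while leaf is not None:
        for k in leaf.keys:
            if k > hi:
                return out
            if k >= lo:
                out.append(k)
        leaf = leaf.next
    return out

# The n=3 example tree.
leaves = [Leaf([3, 5, 11]), Leaf([30, 35]), Leaf([100, 101, 110]),
          Leaf([120, 130]), Leaf([150, 156, 179]), Leaf([180, 200])]
for left, right in zip(leaves, leaves[1:]):
    left.next = right
root = Inner([100], [Inner([30], leaves[:2]), Inner([120, 150, 180], leaves[2:])])

print(lookup(root, 156))           # True
print(range_query(root, 31, 120))  # [35, 100, 101, 110, 120]
```
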

Sample non-leaf
[Figure: a non-leaf node with keys 57, 81, 95 and four pointers]
– to keys $k < 57$
– to keys $57 \le k < 81$
– to keys $81 \le k < 95$
– to keys $k \ge 95$

Sample leaf node:
[Figure: a leaf node with keys 57, 81, 95, reached from a non-leaf node]
– to record with key 57
– to record with key 81
– to record with key 95
– sequence pointer: to next leaf in sequence

In textbook’s notation (n=3):
[Figure: a leaf with keys 30 and 35, each pointing to its record; a non-leaf node with key 30]

Size of nodes (fixed): n+1 pointers, n keys

Don’t want nodes to be too empty
• Use at least
– Non-leaf: $\lceil (n+1)/2 \rceil$ pointers
– Leaf: $\lfloor (n+1)/2 \rfloor$ pointers to data

[Figure (n=3): full and minimum nodes; non-leaf: full node (120, 150, 180), min. node (30); leaf: full node (3, 5, 11), min. node (30, 35); annotation: “counts even if null”]

B+tree rules (tree of order n)
(1) All leaves at same lowest level
(balanced tree)
(2) Pointers in leaves point to
records except for
“sequence pointer”
(to next leaf)

(3) Number of pointers/keys for B+tree

                       Max    Max    Min                         Min
                       ptrs   keys   ptrs→data                   keys
Non-leaf (non-root)    n+1    n      $\lceil (n+1)/2 \rceil$     $\lceil (n+1)/2 \rceil - 1$
Leaf (non-root)        n+1    n      $\lfloor (n+1)/2 \rfloor$   $\lfloor (n+1)/2 \rfloor$
Root                   n+1    n      1                           1

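
The table can be turned into a small helper. It uses integer arithmetic for the ceiling and floor: $\lceil (n+1)/2 \rceil$ is `(n + 2) // 2`, and $\lfloor (n+1)/2 \rfloor$ is `(n + 1) // 2`. The function name and the tuple layout are illustrative choices, not part of the notes.

```python
def bplus_limits(n):
    """(max ptrs, max keys, min ptrs, min keys) per node kind, as in the table,
    for a B+tree of order n."""
    ceil_half = (n + 2) // 2   # ceil((n+1)/2)
    floor_half = (n + 1) // 2  # floor((n+1)/2)
    return {
        "non-leaf": (n + 1, n, ceil_half, ceil_half - 1),
        "leaf": (n + 1, n, floor_half, floor_half),
        "root": (n + 1, n, 1, 1),
    }

for n in (3, 4):
    print(n, bplus_limits(n))
```

For n = 4, this gives a minimum of 2 keys per non-root leaf and 3 pointers per non-root non-leaf node. That is why the n = 4 deletion examples treat a leaf with a single key as underfull.
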
Insert into B+tree

(a) simple case

– space available in leaf
(b) leaf overflow
(c) non-leaf overflow
(d) new root

(a) Insert key = 32 (n=3)
[Figure: root (100) over an interior node (30) with leaves 3 5 11 and 30 31; the leaf 30 31 has a free slot, so 32 is simply added, giving 30 31 32]

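(b) Insert key = 7 (n=3)
[Figure: root (100) over an interior node (30) with leaves 3 5 11 and 30 31; the leaf 3 5 11 is full, so it splits into 3 5 and 7 11, and the first key of the new leaf, 7, is inserted into the parent, which becomes (7, 30)]
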
(c) Insert key = 160 (n=3)
[Figure: root (100) with interior node (120, 150, 180) over leaves ... 150 156 179 | 180 200; the leaf 150 156 179 splits into 150 156 and 160 179; inserting 160 into the full parent splits it into (120, 150) and (180), and 160 moves up into the root, which becomes (100, 160)]

(d) New root, insert 45 (n=3)
[Figure: root (10, 20, 30) over leaves 1 2 3 | 10 12 | 20 25 | 30 32 40; inserting 45 splits the last leaf into 30 32 and 40 45; inserting 40 into the full root splits it into (10, 20) and (40), and 30 goes up into a new root]
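
The four insertion cases can be sketched in Python. The sketch assumes order N = 3, stores only keys (no record pointers), and omits the leaf sequence pointers, which a full version would relink on every leaf split. `Node`, `insert`, and `leaf_keys` are hypothetical names. A leaf split keeps $\lceil (N+1)/2 \rceil$ keys on the left and copies the new leaf's first key up. A non-leaf split moves its middle key up instead of copying it.

```python
from bisect import bisect_right, insort

N = 3  # hypothetical order: at most N keys per node

class Node:
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children  # None for a leaf

    @property
    def is_leaf(self):
        return self.children is None

def _insert(node, key):
    """Insert key below node; return (separator, new right node) if node split."""
    if node.is_leaf:
        insort(node.keys, key)
        if len(node.keys) <= N:
            return None                  # (a) room in the leaf
        mid = (len(node.keys) + 1) // 2  # left keeps ceil((N+1)/2) keys
        right = Node(node.keys[mid:])
        node.keys = node.keys[:mid]
        return right.keys[0], right      # (b) copy first key of new leaf up
    i = bisect_right(node.keys, key)
    split = _insert(node.children[i], key)
    if split is None:
        return None
    sep, right_child = split
    node.keys.insert(i, sep)
    node.children.insert(i + 1, right_child)
    if len(node.keys) <= N:
        return None
    mid = len(node.keys) // 2            # (c) middle key moves up, not copied
    up = node.keys[mid]
    right = Node(node.keys[mid + 1:], node.children[mid + 1:])
    node.keys, node.children = node.keys[:mid], node.children[:mid + 1]
    return up, right

def insert(root, key):
    split = _insert(root, key)
    if split is None:
        return root
    sep, right = split
    return Node([sep], [root, right])    # (d) new root

def leaf_keys(node):
    """Keys of every leaf, left to right (used to inspect the tree)."""
    if node.is_leaf:
        return [node.keys]
    return [keys for child in node.children for keys in leaf_keys(child)]

# Case (d) from the slides.
root = Node([10, 20, 30], [Node([1, 2, 3]), Node([10, 12]),
                           Node([20, 25]), Node([30, 32, 40])])
root = insert(root, 45)
print(root.keys, [c.keys for c in root.children])  # [30] [[10, 20], [40]]
```
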
Deletion from B+tree

(a) Simple case - no example

(b) Coalesce with neighbor (sibling)
(c) Re-distribute keys
(d) Cases (b) or (c) at non-leaf

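(b) Coalesce with sibling (n=4)
– Delete 50
[Figure: parent (10, 40, 100) with leaves 10 20 30 and 40 50; after 50 is deleted, the leaf (40) is below the minimum of 2 keys, so it is coalesced with its sibling into 10 20 30 40, and key 40 is removed from the parent, leaving (10, 100)]
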
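(c) Redistribute keys (n=4)
– Delete 50
[Figure: parent (10, 40, 100) with leaves 10 20 30 35 and 40 50; after 50 is deleted, the leaf (40) is underfull, but its sibling is full, so 35 moves over, giving 10 20 30 and 35 40, and the parent key 40 becomes 35]
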
(d) Non-leaf coalesce (n=4)
– Delete 37
[Figure: root (25) over interior nodes (10, 20) and (30, 40), with leaves 1 3 | 10 14 | 20 22 | 25 26 | 30 37 | 40 45; after 37 is deleted, the leaf (30) is coalesced with its sibling into 25 26 30, and key 30 is removed from its parent; that parent, now (40), is too small as well, so it is coalesced with its sibling (10, 20), pulling 25 down from the root; the result, (10, 20, 25, 40), becomes the new root]
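
Cases (b) and (c) at the leaf level can be sketched as one decision. If the underfull leaf and its sibling fit together in one node, they are coalesced; otherwise a key is borrowed. This sketch assumes n = 4, leaves given as plain sorted key lists, and a left sibling under the same parent. It moves only one key when redistributing and does not handle case (d) or shrinking the root. `fix_leaf_underflow` is a hypothetical name.

```python
N = 4                          # hypothetical order, as in the deletion examples
MIN_LEAF_KEYS = (N + 1) // 2   # floor((n+1)/2)

def fix_leaf_underflow(parent_keys, parent_children, i):
    """Child i of a parent (leaves given as sorted key lists) has too few keys.
    Merge it with, or borrow from, its left sibling (i must be >= 1).
    Modifies the parent's key and child lists in place."""
    left, leaf = parent_children[i - 1], parent_children[i]
    if len(left) + len(leaf) <= N:
        # (b) coalesce: everything fits in one leaf; drop separator and child
        left.extend(leaf)
        del parent_keys[i - 1]
        del parent_children[i]
        return "coalesced"
    # (c) redistribute: move the sibling's largest key over, fix the separator
    leaf.insert(0, left.pop())
    parent_keys[i - 1] = leaf[0]
    return "redistributed"

# Example (c): after deleting 50, the leaf (40) is underfull.
keys, children = [10, 40, 100], [[1, 5], [10, 20, 30, 35], [40], [100, 110]]
print(fix_leaf_underflow(keys, children, 2), keys)  # redistributed [10, 35, 100]
```
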
B+tree deletions in practice
– Often, coalescing is not implemented
– Too hard and not worth it!

• Speaking of buffering…
Is LRU a good policy for B+tree buffers?
→ Of course not!
→ Should try to keep the root in memory
at all times
(and perhaps some nodes from the second level)
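
The suggested policy can be sketched as a tiny buffer pool. The root stays permanently in memory, and the other index pages are managed by LRU. The class name, the `fetch` callback that stands in for a disk read, and the capacity are all hypothetical. A real buffer manager would also handle dirty pages and concurrency.

```python
from collections import OrderedDict

class PinnedLRUBuffer:
    """LRU buffer for index pages, except that pinned pages (e.g. the
    B+tree root) are never evicted."""

    def __init__(self, capacity, fetch):
        self.capacity = capacity   # max number of unpinned pages held
        self.fetch = fetch         # page_id -> page (simulates a disk read)
        self.pinned = {}
        self.lru = OrderedDict()
        self.disk_reads = 0

    def pin(self, page_id):
        self.pinned[page_id] = self._read(page_id)

    def get(self, page_id):
        if page_id in self.pinned:
            return self.pinned[page_id]
        if page_id in self.lru:
            self.lru.move_to_end(page_id)   # mark as most recently used
            return self.lru[page_id]
        page = self._read(page_id)
        self.lru[page_id] = page
        if len(self.lru) > self.capacity:
            self.lru.popitem(last=False)    # evict least recently used page
        return page

    def _read(self, page_id):
        self.disk_reads += 1
        return self.fetch(page_id)

buf = PinnedLRUBuffer(2, lambda pid: "page " + pid)
buf.pin("root")
for leaf in ["L1", "L2", "L3", "L4"]:
    buf.get("root")
    buf.get(leaf)
print(buf.disk_reads, list(buf.lru))  # 5 ['L3', 'L4']
```
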
Variation on B+tree: B-tree (no +)
• Idea:
– Avoid duplicate keys
– Have record pointers in non-leaf nodes

[Figure: B-tree non-leaf node $K_1\,P_1\,K_2\,P_2\,K_3\,P_3$]
– $P_1, P_2, P_3$: to the records with keys $K_1, K_2, K_3$
– tree pointers: to keys $x < K_1$, $K_1 < x < K_2$, $K_2 < x < K_3$, $x > K_3$

B-tree example (n=2)
[Figure: root (65, 125); interior nodes (25, 45), (85, 105), (145, 165); leaves 10 20 | 30 40 | 50 60 | 70 80 | 90 100 | 110 120 | 130 140 | 150 160 | 170 180]
• Sequence pointers not useful now!
(but keep space for simplicity)
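
A B-tree lookup can stop as soon as it meets the key, even in a non-leaf node, because non-leaf nodes carry record pointers too. The sketch below searches the n=2 example tree and reports the level where each key is found. Record pointers are omitted, and `BNode`, `search_depth`, and `leaf` are hypothetical names.

```python
from bisect import bisect_left

class BNode:
    def __init__(self, keys, children=None):
        self.keys = keys          # each key stands in for (key, record pointer)
        self.children = children  # None for a leaf

def search_depth(node, key):
    """Return the level (root = 1) where `key` is found, or None if absent."""
    depth = 1
    while True:
        i = bisect_left(node.keys, key)
        if i < len(node.keys) and node.keys[i] == key:
            return depth          # found here, possibly in a non-leaf node
        if node.children is None:
            return None           # reached a leaf without finding it
        node = node.children[i]   # children[i] covers keys[i-1] < x < keys[i]
        depth += 1

def leaf(*keys):
    return BNode(list(keys))

# The n=2 example tree.
root = BNode([65, 125], [
    BNode([25, 45], [leaf(10, 20), leaf(30, 40), leaf(50, 60)]),
    BNode([85, 105], [leaf(70, 80), leaf(90, 100), leaf(110, 120)]),
    BNode([145, 165], [leaf(130, 140), leaf(150, 160), leaf(170, 180)]),
])

print(search_depth(root, 65), search_depth(root, 105), search_depth(root, 100))  # 1 2 3
```

In a B+tree, where every key also appears in a leaf, each of these lookups would have to go down to the leaf level.
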
Note on inserts
• Say we insert record with key = 25 (n=3)
into the leaf 10 20 30
• Afterwards: [Figure: 20 moves up, with its record pointer, into the parent node (20, –, –); the leaf splits into 10 and 25 30]

So, for B-trees:

                       MAX                 MIN
                       Tree  Rec   Keys    Tree                       Rec                          Keys
                       Ptrs  Ptrs          Ptrs                       Ptrs
Non-leaf, non-root     n+1   n     n       $\lceil (n+1)/2 \rceil$    $\lceil (n+1)/2 \rceil - 1$  $\lceil (n+1)/2 \rceil - 1$
Leaf, non-root         1     n     n       1                          $\lfloor n/2 \rfloor$        $\lfloor n/2 \rfloor$
Root, non-leaf         n+1   n     n       2                          1                            1
Root, leaf             1     n     n       1                          1                            1

Tradeoffs:
• B-trees have faster lookup than B+trees
(a key can be found in a non-leaf node)
• But in a B-tree, non-leaf and leaf nodes have
different sizes (they hold different numbers of pointers)
• And in a B-tree, deletion is more complicated
→ B+trees preferred!

Outline/summary
• Conventional Indexes
• Sparse vs. dense
• Primary vs. secondary
• B trees
• B+trees vs. B-trees
• B+trees vs. indexed sequential
• Hashing schemes --> Next
