
Hashing

Dr. Nehal Nabil Hassan Mostafa


Indexing vs. Hashing

• Indexing: any search in a table never gives O(1); time complexity = O(log n).
• Hashing: O(1) -> get what I want in one step (1 disk access).

What is the motivation?

The motivation is improving search time:
• Sequential search -> O(n)
• Binary search -> O(log n)
• Tree -> O(log_k(n))
• Hashing -> O(1)
Hashing function
• Hashing function: a function h(k) that transforms a key (from the key
space) into an address (from a pre-specified address space).
• Hash tables can be searched for an item in O(1) time, using a hash
function to form an address from the key.
Example:
• U: set of available keys
• H(k): U -> {0, …, 999}

Key      ASCII     Product          H(k) = Product mod 1000
Ball     66, 65    66 x 65 = 4290   290
Lowell   76, 79    76 x 79 = 6004   4
Tree     84, 82    84 x 82 = 6888   888
Marwa    70, 65    70 x 65 = 4550   550
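As a minimal Python sketch of this scheme (the function name and the idea of
passing the two ASCII codes directly are illustrative assumptions, not from
the slides):

```python
def h(code1: int, code2: int, table_size: int = 1000) -> int:
    """The slide's toy hash: multiply the two ASCII codes of the key,
    then take the product mod the table size (the address space)."""
    return (code1 * code2) % table_size

print(h(66, 65))   # Ball   -> 290
print(h(76, 79))   # Lowell -> 4
print(h(84, 82))   # Tree   -> 888
print(h(70, 65))   # Marwa  -> 550
```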

Note: there is no relation between the key and its location.

If two different keys map to the same address, the keys are called synonyms,
the address is called the home address, and this situation is called a
collision.
So…
• Indexing: O(log n); table driven; there is a relation between key and
address; an address can hold only one key (no collisions).
• Hashing: O(1); computation driven; there is no relation between key and
address; more than one key can be assigned to the same address (collision) –
a disadvantage.
To avoid collisions:
• Use extra memory to increase the address space.
• Put more than one record at the same address, using buckets.
• Use a good hashing function (to spread the records among the addresses).
Predicted record distribution:
If
• N: number of available addresses
• R: number of records
• P(x): the probability that a given address has exactly x records assigned
to it,
then (the Poisson function):
• P(x) = ((R/N)^x * e^(-R/N)) / x!
• Packing density = R/N
Problem:
Given 1000 addresses and 1000 records, find:
1. What is the packing density?
2. P(0), P(1) and P(2).
3. The number of addresses which have 0, 1 and 2 records.
Solution:
1. Packing density = R/N = 1000/1000 = 1
2. P(x) = (1^x * e^(-1)) / x!
   P(0) = (1^0 * e^(-1)) / 0! = e^(-1) = 0.368
   P(1) = (1^1 * e^(-1)) / 1! = e^(-1) = 0.368
   P(2) = (1^2 * e^(-1)) / 2! = (1/2) e^(-1) = 0.184
3. Number of addresses expected to have 0 records = 1000 * 0.368 = 368 (unused addresses)
   Number of addresses expected to have 1 record  = 1000 * 0.368 = 368 (no synonyms)
   Number of addresses expected to have 2 records = 1000 * 0.184 = 184
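A small Python sketch of this calculation (the function and variable names
are illustrative; the formula is the P(x) defined above):

```python
import math

def p(x: int, density: float) -> float:
    """P(x) = ((R/N)^x * e^(-R/N)) / x! -- the probability that a
    given address has exactly x records assigned to it."""
    return (density ** x) * math.exp(-density) / math.factorial(x)

N, R = 1000, 1000
density = R / N                          # packing density = 1
for x in range(3):
    prob = p(x, density)
    print(f"P({x}) = {prob:.3f} -> {N * prob:.0f} addresses")
# P(0) = 0.368 -> 368 unused addresses
# P(1) = 0.368 -> 368 addresses with no synonyms
# P(2) = 0.184 -> 184 addresses with exactly 2 records
```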
Notes:
• Any extra record at an address is considered an overflow.
• p(collision) = 1 - p(0) - p(1).
• Number of addresses with collisions = N * p(collision).
• Number of overflow records = N * [1*p(2) + 2*p(3) + 3*p(4) + …], since
every record beyond the first at an address overflows.

Difference between collisions and overflow

A collision occurs when two keys hash to the same home address; an overflow
occurs when a record cannot be stored at its home address because that
address is already full.
Problem:
• Store 1000 records in 1000 spaces, and store 500 records in 1000 spaces.
For both cases, try to answer:
• How many addresses are used?
• How many addresses have no synonyms, i.e. have 1 record only?
• How many addresses contain 2 records or more?
• How many overflow records are expected?
Solution: the overflow records are the number of records beyond the first
record at each address.

Percentage of overflow = [number of overflow records / total number of
records] * 100

The percentage of overflow records grows with the packing density
(packing density ∝ percentage of overflow records).
Collision Resolution (3 ways):
• Progressive overflow (linear probing)
• Double hashing
• Chained progressive overflow
Progressive overflow (linear probing):
• Insertion:
Key -> hashing fn -> home address
If the home address is free:
    add the record there.
Else:
    scan forward and save it at the first next empty address (wrapping
    around).
If we come back to the starting point, there is no more space to save
records.
• Searching:
Key -> hashing fn -> home address
Keep searching forward until we:
• find the record, or
• come back to the starting point -> the record is not in the file, or
• find an empty space -> the record is not in the file.
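The insertion and search procedures above can be sketched in Python as
follows (a minimal in-memory version; the class name and the use of Python's
built-in hash() as a stand-in hashing function are assumptions):

```python
class LinearProbingTable:
    def __init__(self, size: int):
        self.size = size
        self.slots = [None] * size          # the "file" of addresses

    def _home(self, key: str) -> int:
        return hash(key) % self.size        # stand-in for the slide's h(k)

    def insert(self, key: str) -> bool:
        home = self._home(key)
        for i in range(self.size):          # at most one full sweep
            addr = (home + i) % self.size   # wrap around the table
            if self.slots[addr] is None:    # first next empty address
                self.slots[addr] = key
                return True
        return False                        # back at the start: no more space

    def search(self, key: str):
        home = self._home(key)
        for i in range(self.size):
            addr = (home + i) % self.size
            if self.slots[addr] is None:    # empty space: not in the file
                return None
            if self.slots[addr] == key:     # found the record
                return addr
        return None                         # back at the start: not in the file
```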
Example:
Memory size = 23.
• To search for Evans, go to address 20; if not found, go to 21, and so on,
until you either find it (stop) or find an empty space (stop: it isn't in
the file).
• Advantage: simplicity.
• Disadvantage: if the number of collisions is large, searching takes longer
(i.e., the search length increases).
• Search length: the number of accesses needed to retrieve a record.
• Average search length = [∑ search lengths of all records] / [number of
records]
Example cont.

Average search length = 11/5 = 2.2 (the search lengths of the 5 records sum
to 11)


Double hashing:
• Each key is given a second value (its step), listed in a table. After
forming this table, we start adding records: when the home address is not
free, we add the key's step value to the address (mod memory size) and try
again until a free slot is found.
Example:
Using the same data as the last example (memory size 23):
• When I tried to add Adams, I found 21 not free, so add its value (2):
address -> (21 + 2) mod 23 = 0.
• When I tried to add Evans, I found 20 not free, so add its value (5):
address -> (20 + 5) mod 23 = 2.
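A minimal sketch of this insertion rule in Python (the step values per key
are taken as given, as in the slide's table):

```python
def double_hash_insert(slots, key, home, step):
    """Insert `key` whose home address is `home`; on a collision,
    repeatedly add the key's step value, mod the memory size."""
    size = len(slots)
    addr = home
    for _ in range(size):
        if slots[addr] is None:
            slots[addr] = key
            return addr
        addr = (addr + step) % size   # e.g. Adams: (21 + 2) % 23 = 0
    return None                       # no free slot reached

slots = [None] * 23                   # memory size 23, as in the example
```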
Chained progressive overflow
• When adding at address x and x is not free, we store the record at the
nearest free space and make x point to that address.
• Example: when searching, we don't move sequentially as in progressive
overflow; instead we go directly to the next place where the record may be
found. If we reach a link of -1, the record is not in this file.
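A minimal sketch of the chained search in Python (the parallel slot and link
arrays are an assumption about how the slide's pointers are stored):

```python
def chained_search(slots, links, key, home):
    """Follow the chain of links from the home address; a link of -1
    means the end of the chain (the record is not in this file)."""
    addr = home
    while addr != -1:
        if slots[addr] == key:
            return addr          # found the record
        addr = links[addr]       # jump directly to the next candidate
    return None                  # reached -1: not in this file
```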
Bucket:
• Bucket: a block of records corresponding to one address in the hash table.
Problem:
1) Store 750 records at 500 addresses with bucket size = 2.
2) Store 750 records at 1000 addresses with bucket size = 1.
3) Find the four requirements of the previous problem for each case.
Implementation Issues:
Record structure:
• Should contain (counter + empty …)
• Determine bucket size

Implementation Issues:
Deletion:
Extendible Hashing

• Situation: Bucket (primary page) becomes full.


• Want to avoid overflow pages
• Add more buckets (i.e., increase “N”)?
• Okay, but need a new hash function!
• Doubling # of buckets makes this easier
• Say N values are powers of 2: how do we do “mod N”? (see the snippet
after this list)
• What happens to the hash function when we double “N”?
• Problems with Doubling
• Don’t want to have to double the size of the file.
• Don’t want to have to move all the data.
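The power-of-two point can be seen in a couple of lines of Python: “mod N”
just keeps the low-order bits, so doubling N means looking at one more bit
(a tiny illustrative check, not from the slides):

```python
N = 8                                   # number of buckets, a power of 2
h = 0b10101                             # some hash value (21)
assert h % N == h & (N - 1)             # "mod N" keeps the 3 low-order bits
assert h % (2 * N) == h & (2 * N - 1)   # doubling N: one more low bit
```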
Extendible Hashing (cont)
• Idea: Add a level of indirection!

• Use directory of pointers to buckets,


• Double # of buckets by doubling the directory
• Directory much smaller than file, so doubling it is much cheaper.

• Split only the bucket that just overflowed!


• No overflow pages!
• Trick lies in how hash function is adjusted!
How it Works
• The directory is an array of size 4, so 2 bits are needed.
• The bucket for record r has the directory entry whose index = the `global
depth' least significant bits of h(r):
– If h(r) = 5 = binary 101, it is in the bucket pointed to by 01.
– If h(r) = 7 = binary 111, it is in the bucket pointed to by 11.
[Figure: a directory with GLOBAL DEPTH 2 and entries 00, 01, 10, 11.
Bucket A (LOCAL DEPTH 2) holds 4* 12* 32* 16*; Bucket B (LOCAL DEPTH 1,
pointed to by both 01 and 11) holds 1* 5* 7* 13*; Bucket C (LOCAL DEPTH 2)
holds 10*.]
Handling Inserts

• Find bucket where record belongs.


• If there’s room, put it there.
• Else, if bucket is full, split it:
• increment local depth of original page
• allocate new page with new local depth
• re-distribute records from original page.
• add entry for the new page to the directory
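Here is a minimal Python sketch of this insert procedure (bucket capacity 4,
the list-based bucket representation, and the retry-after-split strategy are
illustrative assumptions, not the slides' implementation):

```python
class ExtendibleHash:
    BUCKET_SIZE = 4

    def __init__(self):
        self.global_depth = 1
        # A bucket is a 2-item list: [local_depth, keys].
        self.directory = [[1, []], [1, []]]

    def _index(self, h: int) -> int:
        # Entry index = `global depth' least significant bits of h(r).
        return h & ((1 << self.global_depth) - 1)

    def insert(self, h: int):
        bucket = self.directory[self._index(h)]
        depth, keys = bucket
        if len(keys) < self.BUCKET_SIZE:
            keys.append(h)                        # room: put it there
            return
        if depth == self.global_depth:            # split needs one more bit:
            self.directory += self.directory      # double the directory
            self.global_depth += 1                # (just copy the pointers)
        bucket[0] = depth + 1                     # increment local depth
        image = [depth + 1, []]                   # allocate the `split image'
        old = keys[:]
        keys.clear()
        for k in old:                             # re-distribute records by
            if (k >> depth) & 1:                  # the newly examined bit
                image[1].append(k)
            else:
                keys.append(k)
        for i, b in enumerate(self.directory):    # point the matching
            if b is bucket and (i >> depth) & 1:  # directory entries at
                self.directory[i] = image         # the new page
        self.insert(h)                            # retry the insert

# Inserting 4, 12, 32, 16 and then 20 yields a bucket {32, 16} and its
# split image {4, 12, 20}, as in the doubling example below.
```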
Example: Insert 21, 19, 15
• 21 = 10101
• 19 = 10011
• 15 = 01111
(We denote key r by h(r).) All three values end in 1, so they go to Bucket B,
which fills up and splits.
[Figure: GLOBAL DEPTH 2 directory with 00 -> Bucket A, 01 -> Bucket B,
10 -> Bucket C, 11 -> Bucket D (the data pages). Bucket A (LOCAL DEPTH 2):
4* 12* 32* 16*; Bucket B (LOCAL DEPTH 2): 1* 5* 21* 13*; Bucket C (LOCAL
DEPTH 2): 10*; Bucket D (LOCAL DEPTH 2): 7* 19* 15*.]


Insert h(r) = 20 (Causes Doubling)
20 ends in 00, so it belongs in Bucket A, which is full; since A's local
depth equals the global depth, the directory doubles.
[Figure, before: GLOBAL DEPTH 2, directory entries 00-11; Bucket A (LOCAL
DEPTH 2): 4* 12* 32* 16*; Bucket B: 1* 5* 21* 13*; Bucket C: 10*; Bucket D:
15* 7* 19*.
After: GLOBAL DEPTH 3, directory entries 000-111; Bucket A (LOCAL DEPTH 3):
32* 16*; Bucket A2 (LOCAL DEPTH 3, the `split image' of Bucket A): 4* 12*
20*; Buckets B, C and D unchanged (LOCAL DEPTH 2).]
Points to Note
• 20 = binary 10100. The last 2 bits (00) tell us r belongs in either A or
A2; the last 3 bits are needed to tell which.
• Global depth of directory: max # of bits needed to tell which bucket an
entry belongs to.
• Local depth of a bucket: # of bits used to determine if an entry belongs
to this bucket.
• When does a split cause directory doubling? When, before the insert, the
local depth of the bucket = the global depth. The insert causes the local
depth to become > the global depth; the directory is doubled by copying it
over and `fixing' the pointer to the split image page.
Comments on Extendible Hashing
Delete:
• If removal of a data entry makes a bucket empty, it can be merged with its
`split image'.
• If each directory element points to the same bucket as its split image, we
can halve the directory.
Summary
• Hash-based indexes: best for equality searches; they cannot support range
searches.
• Static hashing can have long overflow chains.
• Extendible hashing avoids overflow pages by splitting a full bucket when a
new data entry is to be added to it. (Duplicates may require overflow
pages.)
• A directory keeps track of the buckets and doubles periodically.
• The directory can get large with skewed data; additional I/O if it does
not fit in main memory.
