
Hashing

Dr. Nehal Nabil Hassan Mostafa


Indexing vs. Hashing

• Indexing: any search in a table never gives O(1); time complexity = O(log n).
• Hashing: O(1) -> get what I want in one step (1 disk access).

What is the motivation?

The motivation is improving search time:
• Sequential search -> O(n)
• Binary search -> O(log n)
• Tree -> O(log_k(n))
• Hashing -> O(1)
Hashing function
• Hashing function: a function h(k) that transforms a key (from the key
space) into an address (from a pre-specified address space).
• Hash tables can be searched for an item in O(1) time, using a hash
function to form an address from the key.
Example:
• U: set of available keys
• H(k): U -> {0, …, 999}

Key      ASCII     Product          H(k) = Product mod 1000
Ball     66, 65    66 x 65 = 4290   290
Lowell   76, 79    76 x 79 = 6004   4
Tree     84, 82    84 x 82 = 6888   888
Marwa    70, 65    70 x 65 = 4550   550
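As a minimal Python sketch of this scheme (the function name and the idea of
passing the two ASCII codes directly are illustrative assumptions, not from
the slides):

```python
def h(code1: int, code2: int, table_size: int = 1000) -> int:
    """The slide's toy hash: multiply the two ASCII codes of the key,
    then take the product mod the table size (the address space)."""
    return (code1 * code2) % table_size

print(h(66, 65))   # Ball   -> 290
print(h(76, 79))   # Lowell -> 4
print(h(84, 82))   # Tree   -> 888
print(h(70, 65))   # Marwa  -> 550
```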

Note: there is no relation between the key and its location.

If two different keys map to the same address, the keys are called synonyms,
the address is called the home address, and this situation is called a
collision.
So…
• Indexing: O(log n); table driven; there is a relation between key and
address; an address can hold only one key (no collisions).
• Hashing: O(1); computation driven; there is no relation between key and
address; more than one key can be assigned to the same address (collision) –
a disadvantage.
To avoid collisions:
• Use extra memory to increase the address space.
• Put more than one record at the same address, using buckets.
• Use a good hashing function (to spread the records among the addresses).
Predicted record distribution:
If
• N: number of available addresses
• R: number of records
• P(x): the probability that a given address has exactly x records assigned
to it,
then (the Poisson function):
• P(x) = ((R/N)^x * e^(-R/N)) / x!
• Packing density = R/N
Problem:
Given 1000 addresses and 1000 records, find:
1. What is the packing density?
2. P(0), P(1) and P(2).
3. The number of addresses which have 0, 1 and 2 records.
Solution:
1. Packing density = R/N = 1000/1000 = 1
2. P(x) = (1^x * e^(-1)) / x!
   P(0) = (1^0 * e^(-1)) / 0! = e^(-1) = 0.368
   P(1) = (1^1 * e^(-1)) / 1! = e^(-1) = 0.368
   P(2) = (1^2 * e^(-1)) / 2! = (1/2) e^(-1) = 0.184
3. Number of addresses expected to have 0 records = 1000 * 0.368 = 368 (unused addresses)
   Number of addresses expected to have 1 record  = 1000 * 0.368 = 368 (no synonyms)
   Number of addresses expected to have 2 records = 1000 * 0.184 = 184
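A small Python sketch of this calculation (the function and variable names
are illustrative; the formula is the P(x) defined above):

```python
import math

def p(x: int, density: float) -> float:
    """P(x) = ((R/N)^x * e^(-R/N)) / x! -- the probability that a
    given address has exactly x records assigned to it."""
    return (density ** x) * math.exp(-density) / math.factorial(x)

N, R = 1000, 1000
density = R / N                          # packing density = 1
for x in range(3):
    prob = p(x, density)
    print(f"P({x}) = {prob:.3f} -> {N * prob:.0f} addresses")
# P(0) = 0.368 -> 368 unused addresses
# P(1) = 0.368 -> 368 addresses with no synonyms
# P(2) = 0.184 -> 184 addresses with exactly 2 records
```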
Notes:
• Any extra record at an address is considered an overflow.
• p(collision) = 1 - p(0) - p(1).
• Number of addresses with collisions = N * p(collision).
• Number of overflow records = N * [1*p(2) + 2*p(3) + 3*p(4) + …], since
every record beyond the first at an address overflows.

Difference between collisions and overflow

A collision occurs when two keys hash to the same home address; an overflow
occurs when a record cannot be stored at its home address because that
address is already full.
Problem:
• Store 1000 records in 1000 spaces, and store 500 records in 1000 spaces.
For both cases, try to answer:
• How many addresses are used?
• How many addresses have no synonyms, i.e. have 1 record only?
• How many addresses contain 2 records or more?
• How many overflow records are expected?
Solution: the overflow records are the number of records beyond the first
record at each address.

Percentage of overflow = [number of overflow records / total number of
records] * 100

The percentage of overflow records grows with the packing density
(packing density ∝ percentage of overflow records).
Collision Resolution (3 ways):
• Progressive overflow (linear probing)
• Double hashing
• Chained progressive overflow
Progressive overflow (linear probing):
• Insertion:
Key -> hashing fn -> home address
If the home address is free:
    add the record there.
Else:
    scan forward and save it at the first next empty address (wrapping
    around).
If we come back to the starting point, there is no more space to save
records.
• Searching:
Key -> hashing fn -> home address
Keep searching forward until we:
• find the record, or
• come back to the starting point -> the record is not in the file, or
• find an empty space -> the record is not in the file.
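The insertion and search procedures above can be sketched in Python as
follows (a minimal in-memory version; the class name and the use of Python's
built-in hash() as a stand-in hashing function are assumptions):

```python
class LinearProbingTable:
    def __init__(self, size: int):
        self.size = size
        self.slots = [None] * size          # the "file" of addresses

    def _home(self, key: str) -> int:
        return hash(key) % self.size        # stand-in for the slide's h(k)

    def insert(self, key: str) -> bool:
        home = self._home(key)
        for i in range(self.size):          # at most one full sweep
            addr = (home + i) % self.size   # wrap around the table
            if self.slots[addr] is None:    # first next empty address
                self.slots[addr] = key
                return True
        return False                        # back at the start: no more space

    def search(self, key: str):
        home = self._home(key)
        for i in range(self.size):
            addr = (home + i) % self.size
            if self.slots[addr] is None:    # empty space: not in the file
                return None
            if self.slots[addr] == key:     # found the record
                return addr
        return None                         # back at the start: not in the file
```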
Example:
Memory size = 23.
• To search for Evans, go to address 20; if not found, go to 21, and so on,
until you either find it (stop) or find an empty space (stop: it isn't in
the file).
• Advantage: simplicity.
• Disadvantage: if the number of collisions is large, searching takes longer
(i.e., the search length increases).
• Search length: the number of accesses needed to retrieve a record.
• Average search length = [∑ search lengths of all records] / [number of
records]
Example cont.

Average search length = 11/5 = 2.2 (the search lengths of the 5 records sum
to 11)


Double hashing:
• Each key is given a second value (its step), listed in a table. After
forming this table, we start adding records: when the home address is not
free, we add the key's step value to the address (mod memory size) and try
again until a free slot is found.
Example:
Using the same data as the last example (memory size 23):
• When I tried to add Adams, I found 21 not free, so add its value (2):
address -> (21 + 2) mod 23 = 0.
• When I tried to add Evans, I found 20 not free, so add its value (5):
address -> (20 + 5) mod 23 = 2.
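A minimal sketch of this insertion rule in Python (the step values per key
are taken as given, as in the slide's table):

```python
def double_hash_insert(slots, key, home, step):
    """Insert `key` whose home address is `home`; on a collision,
    repeatedly add the key's step value, mod the memory size."""
    size = len(slots)
    addr = home
    for _ in range(size):
        if slots[addr] is None:
            slots[addr] = key
            return addr
        addr = (addr + step) % size   # e.g. Adams: (21 + 2) % 23 = 0
    return None                       # no free slot reached

slots = [None] * 23                   # memory size 23, as in the example
```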
Chained progressive overflow
• When adding at address x and x is not free, we store the record at the
nearest free space and make x point to that address.
• Example: when searching, we don't move sequentially as in progressive
overflow; instead we go directly to the next place where the record may be
found. If we reach a link of -1, the record is not in this file.
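A minimal sketch of the chained search in Python (the parallel slot and link
arrays are an assumption about how the slide's pointers are stored):

```python
def chained_search(slots, links, key, home):
    """Follow the chain of links from the home address; a link of -1
    means the end of the chain (the record is not in this file)."""
    addr = home
    while addr != -1:
        if slots[addr] == key:
            return addr          # found the record
        addr = links[addr]       # jump directly to the next candidate
    return None                  # reached -1: not in this file
```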
Bucket:
• Bucket: a block of records corresponding to one address in the hash table.
Problem:
1) Store 750 records at 500 addresses with bucket size = 2.
2) Store 750 records at 1000 addresses with bucket size = 1.
3) Find the four requirements of the previous problem for each case.
Implementation Issues:
Record structure:
• Should contain (counter + empty …)
• Determine bucket size

Implementation Issues:
Deletion:
Extendible Hashing

• Situation: Bucket (primary page) becomes full.


• Want to avoid overflow pages
• Add more buckets (i.e., increase “N”)?
• Okay, but need a new hash function!
• Doubling # of buckets makes this easier
• Say N values are powers of 2: how do we do “mod N”? (see the snippet
after this list)
• What happens to the hash function when we double “N”?
• Problems with Doubling
• Don’t want to have to double the size of the file.
• Don’t want to have to move all the data.
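The power-of-two point can be seen in a couple of lines of Python: “mod N”
just keeps the low-order bits, so doubling N means looking at one more bit
(a tiny illustrative check, not from the slides):

```python
N = 8                                   # number of buckets, a power of 2
h = 0b10101                             # some hash value (21)
assert h % N == h & (N - 1)             # "mod N" keeps the 3 low-order bits
assert h % (2 * N) == h & (2 * N - 1)   # doubling N: one more low bit
```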
Extendible Hashing (cont)
• Idea: Add a level of indirection!

• Use directory of pointers to buckets,


• Double # of buckets by doubling the directory
• Directory much smaller than file, so doubling it is much cheaper.

• Split only the bucket that just overflowed!


• No overflow pages!
• Trick lies in how hash function is adjusted!
How it Works
• The directory is an array of size 4, so 2 bits are needed.
• The bucket for record r has the directory entry whose index = the `global
depth' least significant bits of h(r):
– If h(r) = 5 = binary 101, it is in the bucket pointed to by 01.
– If h(r) = 7 = binary 111, it is in the bucket pointed to by 11.
[Figure: a directory with GLOBAL DEPTH 2 and entries 00, 01, 10, 11.
Bucket A (LOCAL DEPTH 2) holds 4* 12* 32* 16*; Bucket B (LOCAL DEPTH 1,
pointed to by both 01 and 11) holds 1* 5* 7* 13*; Bucket C (LOCAL DEPTH 2)
holds 10*.]
Handling Inserts

• Find bucket where record belongs.


• If there’s room, put it there.
• Else, if bucket is full, split it:
• increment local depth of original page
• allocate new page with new local depth
• re-distribute records from original page.
• add entry for the new page to the directory
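Here is a minimal Python sketch of this insert procedure (bucket capacity 4,
the list-based bucket representation, and the retry-after-split strategy are
illustrative assumptions, not the slides' implementation):

```python
class ExtendibleHash:
    BUCKET_SIZE = 4

    def __init__(self):
        self.global_depth = 1
        # A bucket is a 2-item list: [local_depth, keys].
        self.directory = [[1, []], [1, []]]

    def _index(self, h: int) -> int:
        # Entry index = `global depth' least significant bits of h(r).
        return h & ((1 << self.global_depth) - 1)

    def insert(self, h: int):
        bucket = self.directory[self._index(h)]
        depth, keys = bucket
        if len(keys) < self.BUCKET_SIZE:
            keys.append(h)                        # room: put it there
            return
        if depth == self.global_depth:            # split needs one more bit:
            self.directory += self.directory      # double the directory
            self.global_depth += 1                # (just copy the pointers)
        bucket[0] = depth + 1                     # increment local depth
        image = [depth + 1, []]                   # allocate the `split image'
        old = keys[:]
        keys.clear()
        for k in old:                             # re-distribute records by
            if (k >> depth) & 1:                  # the newly examined bit
                image[1].append(k)
            else:
                keys.append(k)
        for i, b in enumerate(self.directory):    # point the matching
            if b is bucket and (i >> depth) & 1:  # directory entries at
                self.directory[i] = image         # the new page
        self.insert(h)                            # retry the insert

# Inserting 4, 12, 32, 16 and then 20 yields a bucket {32, 16} and its
# split image {4, 12, 20}, as in the doubling example below.
```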
Example: Insert 21, 19, 15
• 21 = 10101
• 19 = 10011
• 15 = 01111
(We denote key r by h(r).) All three values end in 1, so they go to Bucket B,
which fills up and splits.
[Figure: GLOBAL DEPTH 2 directory with 00 -> Bucket A, 01 -> Bucket B,
10 -> Bucket C, 11 -> Bucket D (the data pages). Bucket A (LOCAL DEPTH 2):
4* 12* 32* 16*; Bucket B (LOCAL DEPTH 2): 1* 5* 21* 13*; Bucket C (LOCAL
DEPTH 2): 10*; Bucket D (LOCAL DEPTH 2): 7* 19* 15*.]


Insert h(r) = 20 (Causes Doubling)
20 ends in 00, so it belongs in Bucket A, which is full; since A's local
depth equals the global depth, the directory doubles.
[Figure, before: GLOBAL DEPTH 2, directory entries 00-11; Bucket A (LOCAL
DEPTH 2): 4* 12* 32* 16*; Bucket B: 1* 5* 21* 13*; Bucket C: 10*; Bucket D:
15* 7* 19*.
After: GLOBAL DEPTH 3, directory entries 000-111; Bucket A (LOCAL DEPTH 3):
32* 16*; Bucket A2 (LOCAL DEPTH 3, the `split image' of Bucket A): 4* 12*
20*; Buckets B, C and D unchanged (LOCAL DEPTH 2).]
Points to Note
• 20 = binary 10100. The last 2 bits (00) tell us r belongs in either A or
A2; the last 3 bits are needed to tell which.
• Global depth of directory: max # of bits needed to tell which bucket an
entry belongs to.
• Local depth of a bucket: # of bits used to determine if an entry belongs
to this bucket.
• When does a split cause directory doubling? When, before the insert, the
local depth of the bucket = the global depth. The insert causes the local
depth to become > the global depth; the directory is doubled by copying it
over and `fixing' the pointer to the split image page.
Comments on Extendible Hashing
Delete:
• If removal of a data entry makes a bucket empty, it can be merged with its
`split image'.
• If each directory element points to the same bucket as its split image, we
can halve the directory.
Summary
• Hash-based indexes: best for equality searches; they cannot support range
searches.
• Static hashing can have long overflow chains.
• Extendible hashing avoids overflow pages by splitting a full bucket when a
new data entry is to be added to it. (Duplicates may require overflow
pages.)
• A directory keeps track of the buckets and doubles periodically.
• The directory can get large with skewed data; additional I/O if it does
not fit in main memory.
