Hashing Part1

EE 232 Data Structures & Algorithms
Session-18
Chapter 5
Hashing
Hashing 5-1
Motivation
• Let us assume that we want to search for a particular item in a database of 20,000,000 data items
  - How long would a successful search take?
  - How long would an unsuccessful search take?
• It depends on the data structure
Motivation…
• If the data structure is a linked list,
  - the search time is O(N)
• If the data structure is a binary search tree,
  - the average running time is O(log N)
  - log₂ 20,000,000 ≈ 24
• Can we do even better than O(log N)?
  → the hash table ADT
Chapter 5: Hashing
5.1 General Idea
5.2 Hash Function
5.3 Separate Chaining
5.4 Open Addressing
5.5 Rehashing
5.6 Extendible Hashing
Chapter 5: Hashing
Our goals: we will
• See several methods of implementing the hash table
• Compare these methods analytically
• Show numerous applications of hashing
• Compare hash tables with binary search trees
First some terminology
• The hash table ADT is a data structure that supports only a subset of the operations allowed by binary search trees
  - The implementation of a hash table is called hashing
  - Hashing is a technique used for performing insertions, deletions, and finds in constant average time
General Idea
• The general idea behind hashing is to map each data item directly to an address in memory using some function
  - key → hash function → index into an array
• Components of hashing
  - A hash table: an array of some fixed size m
  - A hash function h(k) that maps a search key k to some location in the range [0 … m-1]:
    h(k): S → {0, 1, …, m-1}, where S is the set of possible keys
General Idea…
[Figure: an array with slots 0 … m-1. The data item (Name: Salman Arain; University: NFC-IEFR; Office: First Floor; Mobile Number; Email; etc.) is stored in slot 1, because h(Salman) = 1.]
Here we are using a hash function that accepts a name as the key and returns the index 1.
General Idea…
• Desired properties of h(k)
  - simple to compute
  - uniform distribution of keys over {0, 1, …, m-1}
• When h(k1) = h(k2) for two distinct keys k1 ≠ k2, we have a collision
Copyright © Kashif Javed
General Idea…
[Figure: the array with slots 0 … m-1. Slot 1 already holds the data item for Salman Arain (h(Salman) = 1). A second data item (Name: Hassan Hamid; University: UET; Office: room 8 EED; Mobile Number; Email; etc.) also maps to slot 1, since h(Hamid) = 1.]
A collision has occurred.
General Idea…
• Two important topics in hashing
  - How to select a hash function
  - How to resolve collisions
General Idea…
• Hashing revisited
  - A hash table data structure is an array
  - Each data element contains a key
  - Each key is mapped to some number in the range 0 to TableSize-1 with the help of a hash function
    The hash function should be efficient to compute and should, as far as possible, map different keys to different numbers
  - The key and the hash function are used both to insert the data into the table and to find that data later
General Idea…
• Example
  - PTCL is a large telephone company, and they want to maintain a database that provides caller ID capability:
    given a phone number, return the caller's name
    phone numbers range from 0 to r = 10^7 - 1
    we want to do this as efficiently as possible
General Idea…
• Solution 1
  - an array indexed by key (the phone number)
    takes O(1) time,
    O(r) space: a huge amount of wasted space
[Figure: an array with one slot per possible phone number; slot 6829227 holds Umer Hamid, slot 6829229 holds Hassan Hamid, and the other slots shown are (null).]
General Idea…
• Solution 2
  - a linked list
    takes O(n) time, where n is the number of stored entries,
    O(n) space (only as much space as is needed)
[Figure: a linked list of two entries: (6829227, Umer Hamid) → (6829229, Hassan Hamid).]
General Idea…
• Solution 3
  - a hash table
    O(1) expected time, O(n + m) space, where m is the table size
  - Like an array, but we come up with a function that maps the large range of keys into one that we can manage
    e.g. take the original key modulo the (relatively small) size of the array, and use that as an index
    6829229 mod 5 = 4
[Figure: a table of size 5 with slots 0-4; slots 0-3 are (null) and slot 4 holds Hassan Hamid.]
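A minimal sketch of Solution 3 in C, assuming the slide's tiny table of size 5 and 7-digit phone numbers stored as `unsigned long` keys. The names `PHONE_TABLE_SIZE`, `phone_slot`, `phone_put`, and `phone_get` are hypothetical. Collisions are deliberately ignored here; resolving them is the subject of the rest of the chapter.

```c
#include <stddef.h>

#define PHONE_TABLE_SIZE 5  /* the slide's deliberately tiny table */

/* Map a phone number (0 .. 10^7 - 1) onto a slot 0 .. PHONE_TABLE_SIZE-1 */
static unsigned phone_slot(unsigned long phone)
{
    return (unsigned) (phone % PHONE_TABLE_SIZE);
}

/* Store a caller's name in the slot chosen by the hash function.
   No collision handling: a later number with the same slot overwrites the earlier one. */
static void phone_put(const char *table[], unsigned long phone, const char *name)
{
    table[phone_slot(phone)] = name;
}

/* Return the name in the number's slot, or NULL if the slot is empty.
   Keys are not stored, so this sketch cannot tell two colliding numbers apart. */
static const char *phone_get(const char *table[], unsigned long phone)
{
    return table[phone_slot(phone)];
}
```

Storing "Hassan Hamid" under 6829229 fills slot 4, while 6829227 maps to the still-empty slot 2.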
Hash Function
• A simple hash function
  - If input keys (k) are integers
  - hash function h(k) = k mod m, where m is the table size
• Example
  Suppose m = 10,
  k = 10, 20, 30, 40
  h(k) = 0, 0, 0, 0
  A bad choice if the keys end in zeros: every such key lands in slot 0
Hash Function…
• Another simple hash function
  - If input keys (k) are integers
  - hash function h(k) = k mod m, where m is the table size and is a prime number
• Example
  Suppose m = 11,
  k = 10, 20, 30, 40
  h(k) = 10, 9, 8, 7
  Distributes the keys more uniformly
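To compare the two choices of m numerically, here is a small sketch (the function name `slots_used` is hypothetical) that counts how many distinct slots a set of integer keys occupies under h(k) = k mod m.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Count how many distinct slots the keys occupy under h(k) = k mod m.
   Returns 0 if m is 0 or if memory cannot be allocated. */
static size_t slots_used(const unsigned *keys, size_t n, unsigned m)
{
    if (m == 0)
        return 0;
    bool *seen = calloc(m, sizeof *seen);   /* one "occupied" flag per slot */
    if (seen == NULL)
        return 0;

    size_t used = 0;
    for (size_t i = 0; i < n; i++) {
        unsigned slot = keys[i] % m;
        if (!seen[slot]) {
            seen[slot] = true;
            used++;
        }
    }
    free(seen);
    return used;
}
```

For the keys 10, 20, 30, 40, m = 10 puts all four in one slot, while the prime m = 11 gives each key its own slot.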
Hash Function…
• A simple hash function
  - If the keys are strings, then the hash function can be some function of the characters in the strings
  - One possibility is simply to add the ASCII values of the characters:
$$h(str) = \left( \sum_{i=0}^{\text{length}-1} str[i] \right) \bmod m$$
• Example
  h(ABC) = (65 + 66 + 67) mod m
Hash Function…
• Programming details

typedef unsigned int Index;

/* Add up the character codes of Key and reduce the sum modulo TableSize */
Index
Hash1( const char *Key, int TableSize )
{
    unsigned int HashVal = 0;

    while( *Key != '\0' )
        HashVal += ( unsigned char ) *Key++;   /* treat each char as a non-negative code */

    return HashVal % TableSize;
}
Hash Function…
• Problem
  - If the table size is large, the function does not distribute the keys well
  - TableSize = 10,007 (a prime number)
  - Keys are at most 8 characters long
  - An ASCII character has a value of at most 127 (2^7 - 1)
  - So the hash function can only take values from 0 to 127 × 8 = 1016
  - The table has about 10K slots, but only the first 1,017 (roughly 10%) can ever be used
Hash Function…
• Another hash function
  - If the keys are strings, convert the string into a number in some arbitrary base b:
$$h(str) = \left( \sum_{i=0}^{\text{length}-1} str[i] \cdot b^{i} \right) \bmod m$$
• Example
$$h(\text{ABC}) = \left( 65\,b^{0} + 66\,b^{1} + 67\,b^{2} \right) \bmod m$$
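A minimal C sketch of the base-b formula above, assuming characters are read as `unsigned char` codes; the name `PolyHash` is hypothetical. It evaluates the sum with Horner's rule, walking the string from its last character so that str[i] ends up multiplied by b^i exactly as in the formula, and it reduces modulo m at every step so the intermediate value stays small.

```c
#include <string.h>

/* h(str) = ( sum_{i=0}^{len-1} str[i] * b^i ) mod m, computed with Horner's rule.
   Iterating from the last character yields str[0]*b^0 + str[1]*b^1 + ...
   Assumes m > 0 and that b and m fit in 32 bits, so h * b cannot overflow. */
static unsigned long PolyHash(const char *str, unsigned long b, unsigned long m)
{
    unsigned long long h = 0;
    size_t i = strlen(str);

    while (i > 0) {
        i--;
        h = (h * b + (unsigned char) str[i]) % m;   /* reduce at each step */
    }
    return (unsigned long) h;
}
```

With b = 27 and m = 10,007, "ABC" hashes to 50,690 mod 10,007 = 655, the same value Hash2 below computes for a 3-character key. Unlike the plain ASCII sum, it separates anagrams: "CBA" hashes to 9,206.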
Hash Function…
• Examines only the first three characters of the input
• The value 27 represents the number of letters in the English alphabet, plus the blank (729 = 27^2)

/* Base-27 hash of the first three characters.
   Key must have at least two characters before its terminating '\0'. */
Index
Hash2( const char *Key, int TableSize )
{
    return ( Key[ 0 ] + 27 * Key[ 1 ] + 729 * Key[ 2 ] ) % TableSize;
}
Hash Function…
• Rule of Thumb
  - Hash functions should try to achieve uniform, full coverage of the hash table while minimizing collisions
  - Since this is usually impossible and collisions will almost always occur, an important design consideration is how to resolve collisions
Separate Chaining
• How do we deal with two keys that hash to the same spot in the array?
• Use chaining
  - All data items that hash to the same number are kept in a linked list
    Set up an array of lists, indexed by hash value, where each list holds the items with that hash value
Separate Chaining…
• Example
[Figure: an array with slots 0 … m-1. The data items for Kashif Javed (University: UET; Office: room 4 EED; Mobile Number; Email; etc.) and Hassan Hamid (University: UET; Office: room 8 EED; Mobile Number; Email; etc.) hash to the same slot.]
The two entries are now stored in a linked list.
• Example
  - Here the size of the hash table = 10
  - Keys are the first ten perfect squares: 0, 1, 4, 9, 16, 25, 36, 49, 64, and 81
  - The hash function is h(k) = k mod 10
[Figure: A separate chaining hash table]
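A quick way to check the picture: this sketch (the helper `ChainLengths` is hypothetical) counts how many keys land in each list when integer keys are hashed with h(k) = k mod m.

```c
#include <stddef.h>

/* For each slot 0..m-1, count how many keys hash there under h(k) = k mod m.
   lengths must have room for m entries. */
static void ChainLengths(const unsigned *keys, size_t n, unsigned m, size_t *lengths)
{
    for (unsigned s = 0; s < m; s++)
        lengths[s] = 0;
    for (size_t i = 0; i < n; i++)
        lengths[keys[i] % m]++;
}
```

For the ten perfect squares, lists 1, 4, 6, and 9 get two keys each, lists 0 and 5 get one, and lists 2, 3, 7, and 8 stay empty, because a perfect square can only end in 0, 1, 4, 5, 6, or 9.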
• To find an element
  - use the hash function to look up its position in the table
  - search for the element in the linked list of the hashed slot
• To insert an element
  - compute h(k) to determine which list to traverse
  - if T[h(k)] contains a null pointer, initialize this entry to point to a linked list that contains k alone
  - if T[h(k)] is a non-empty list, add k at the beginning of this list
• To delete an element
  - compute h(k), then search for k within the list at T[h(k)]
  - delete k if it is found
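The find, insert, and delete steps above can be sketched as a small, self-contained C table of `int` keys with h(k) = k mod m. This is an illustrative sketch, not the textbook code that follows: it uses plain head pointers (a NULL head is an empty list) instead of header nodes, and the names `ChainTable` and `ct_*` are hypothetical. Deletion is included because the steps above describe it.

```c
#include <stdbool.h>
#include <stdlib.h>

struct ChainNode {
    int key;
    struct ChainNode *next;
};

typedef struct {
    size_t m;                   /* number of lists (table size) */
    struct ChainNode **heads;   /* heads[i] == NULL means list i is empty */
} ChainTable;

static size_t ct_hash(const ChainTable *t, int key)
{
    long r = key % (long) t->m;                 /* C's % is negative for negative keys */
    return (size_t) (r < 0 ? r + (long) t->m : r);
}

static bool ct_init(ChainTable *t, size_t m)
{
    t->m = m;
    t->heads = calloc(m, sizeof *t->heads);     /* all lists start empty */
    return t->heads != NULL;
}

/* Find: hash to a slot, then walk that slot's list */
static bool ct_find(const ChainTable *t, int key)
{
    for (const struct ChainNode *p = t->heads[ct_hash(t, key)]; p != NULL; p = p->next)
        if (p->key == key)
            return true;
    return false;
}

/* Insert: add the key at the front of its list (duplicates are skipped) */
static bool ct_insert(ChainTable *t, int key)
{
    if (ct_find(t, key))
        return true;
    struct ChainNode *n = malloc(sizeof *n);
    if (n == NULL)
        return false;
    size_t s = ct_hash(t, key);
    n->key = key;
    n->next = t->heads[s];
    t->heads[s] = n;
    return true;
}

/* Delete: search the list at T[h(k)] and unlink k if it is found */
static bool ct_delete(ChainTable *t, int key)
{
    struct ChainNode **link = &t->heads[ct_hash(t, key)];
    while (*link != NULL && (*link)->key != key)
        link = &(*link)->next;
    if (*link == NULL)
        return false;                           /* not found */
    struct ChainNode *victim = *link;
    *link = victim->next;
    free(victim);
    return true;
}

static void ct_destroy(ChainTable *t)
{
    for (size_t i = 0; i < t->m; i++) {
        struct ChainNode *p = t->heads[i];
        while (p != NULL) {
            struct ChainNode *next = p->next;
            free(p);
            p = next;
        }
    }
    free(t->heads);
    t->heads = NULL;
}
```

Inserting the ten perfect squares into a table of size 10 leaves 81 at the front of list 1 (it was inserted after 1), and deleting 16 removes it from list 6 without disturbing 36.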
• Programming Details

#ifndef _HashSep_H
#define _HashSep_H

typedef int ElementType;   /* key type assumed for these examples */

struct ListNode;
typedef struct ListNode *Position;
struct HashTbl;
typedef struct HashTbl *HashTable;

HashTable InitializeTable( int TableSize );
void DestroyTable( HashTable H );
Position Find( ElementType Key, HashTable H );
void Insert( ElementType Key, HashTable H );
ElementType Retrieve( Position P );

#endif  /* _HashSep_H */

/* In the implementation file: */
struct ListNode
{
    ElementType Element;
    Position Next;
};
typedef Position List;

/* TheLists will be an array of lists, allocated later.
   The lists use header nodes (for simplicity). */
struct HashTbl
{
    int TableSize;
    List *TheLists;
};
#include <stdlib.h>   /* malloc */

/* MinTableSize, Error, FatalError, NextPrime and Hash are assumed
   to be provided elsewhere, as in the textbook's support code. */
HashTable
InitializeTable( int TableSize )
{
    HashTable H;
    int i;

    if( TableSize < MinTableSize )
    {
        Error( "Table size too small" );
        return NULL;
    }

    /* Allocate table */
    H = malloc( sizeof( struct HashTbl ) );
    if( H == NULL )
        FatalError( "Out of space!!!" );

    /* Round the size up to a prime so that k mod TableSize spreads keys well */
    H->TableSize = NextPrime( TableSize );

    /* Allocate array of lists */
    H->TheLists = malloc( sizeof( List ) * H->TableSize );
    if( H->TheLists == NULL )
        FatalError( "Out of space!!!" );

    /* Allocate one (empty) header node per list */
    for( i = 0; i < H->TableSize; i++ )
    {
        H->TheLists[ i ] = malloc( sizeof( struct ListNode ) );
        if( H->TheLists[ i ] == NULL )
            FatalError( "Out of space!!!" );
        else
            H->TheLists[ i ]->Next = NULL;
    }

    return H;
}
/* Return the node holding Key, or NULL if Key is absent */
Position
Find( ElementType Key, HashTable H )
{
    Position P;
    List L;

    L = H->TheLists[ Hash( Key, H->TableSize ) ];   /* header of the right list */
    P = L->Next;                                    /* skip the header node */
    while( P != NULL && P->Element != Key )         /* != assumes an integer ElementType */
        P = P->Next;
    return P;
}
• Programming Details

/* Insert Key at the front of its list, unless it is already present */
void
Insert( ElementType Key, HashTable H )
{
    Position Pos, NewCell;
    List L;

    Pos = Find( Key, H );
    if( Pos == NULL )   /* Key is not found */
    {
        NewCell = malloc( sizeof( struct ListNode ) );
        if( NewCell == NULL )
            FatalError( "Out of space!!!" );
        else
        {
            L = H->TheLists[ Hash( Key, H->TableSize ) ];
            NewCell->Next = L->Next;   /* splice in right after the header */
            NewCell->Element = Key;
            L->Next = NewCell;
        }
    }
}
• Analysing the performance of a separate chaining hash table
  - As we increase the number of elements N in the hash table, more and more items are stored in the linked lists, slowing everything down
  - Increasing the table size TableSize, on the other hand, lets us hold more data efficiently
  - It turns out that the ratio λ = N / TableSize is the important quantity to analyze
    This is called the load factor
• Analysing the performance of a separate chaining hash table…
  - Time to perform a search = constant time to evaluate the hash function + time to traverse the list
  - For separate chaining, the average length of a linked list is λ
  - Thus, an unsuccessful search traverses λ links on average
  - A successful search traverses about 1 + (λ/2) links
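The estimates above are easy to tabulate; this sketch (all function names are hypothetical) evaluates the load factor and the expected number of links traversed for both kinds of search.

```c
/* Load factor: average number of elements per list, λ = N / TableSize */
static double LoadFactor(double n, double table_size)
{
    return n / table_size;
}

/* Expected links traversed by an unsuccessful search (separate chaining): λ */
static double ChainUnsuccessful(double lambda)
{
    return lambda;
}

/* Expected links traversed by a successful search (separate chaining): about 1 + λ/2 */
static double ChainSuccessful(double lambda)
{
    return 1.0 + lambda / 2.0;
}
```

With N = TableSize (λ = 1), an unsuccessful search examines about one node and a successful search about one and a half.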
• Analysing the performance of a separate chaining hash table…
  - Thus, lowering the load factor is a good thing from the time point of view
  - From the space point of view, lowering the load factor means increasing the table size
    This can lead to a lot of wasted space
  - A reasonable compromise is λ ≈ 1
    search times will then be roughly O(1)
Open Addressing
• Separate chaining has the disadvantage of using linked lists, which slows the algorithm down because of the time required to allocate new cells
• Open addressing
  - relocate the key k to be inserted if it collides with an existing key
    That is, we store k at an entry different from T[h(k)]
Open Addressing…
• Open addressing hashing resolves collisions by trying alternative slots in the hash table until an empty cell is found
  - cells h_0(X), h_1(X), h_2(X), … are tried in succession,
    where h_i(X) = (Hash(X) + F(i)) mod TableSize, with F(0) = 0
  - The function F is the collision resolution strategy
Open Addressing…
• Linear Probing
  - F(i) is a linear function of i, i.e. F(i) = i
    h_0(X) = (Hash(X) + 0) mod TableSize
    h_1(X) = (Hash(X) + 1) mod TableSize
    h_2(X) = (Hash(X) + 2) mod TableSize
    …
    cells are probed sequentially (with wraparound) in search of an empty cell
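A sketch of the probe formula h_i(X) = (Hash(X) + F(i)) mod TableSize in C, with the collision resolution strategy F passed in as a function pointer. The names `ResolutionFn`, `LinearF`, and `ProbeCell` are hypothetical, and only linear probing (F(i) = i) is shown.

```c
/* Collision resolution strategy; F(0) must be 0 */
typedef unsigned (*ResolutionFn)(unsigned i);

/* Linear probing: F(i) = i */
static unsigned LinearF(unsigned i)
{
    return i;
}

/* i-th cell to try for a key whose home cell is hash_value:
   h_i(X) = (Hash(X) + F(i)) mod TableSize, which wraps past the end of the table */
static unsigned ProbeCell(unsigned hash_value, unsigned i, unsigned table_size, ResolutionFn F)
{
    return (hash_value + F(i)) % table_size;
}
```

For a home cell of 8 in a table of size 10, probes 0 through 6 visit cells 8, 9, 0, 1, 2, 3, 4, which is the sequence followed for 68 in the example below.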
Open Addressing…
• *Example
  - suppose that our hash function converts a 2-digit integer into a single digit by taking the least-significant digit, i.e. h(k) = k mod 10
*http://www.ece.uwaterloo.ca/~ece250/

Open Addressing…
• *Insertions
  - Insert the numbers 81, 70, 97, 60, 51, 38, 89, 68, 24 into the initially empty hash table:
bin:   0   1   2   3   4   5   6   7   8   9
key:   -   -   -   -   -   -   -   -   -   -

Open Addressing…
• *Insertions…
  - We can easily insert 81, 70, and 97 into their corresponding bins:
bin:   0   1   2   3   4   5   6   7   8   9
key:  70  81   -   -   -   -   -  97   -   -

Open Addressing…
• *Insertions…
  - Inserting 60 causes a collision in bin 0; therefore, we check:
    bin 1 (also full), and
    bin 2 (empty)
bin:   0   1   2   3   4   5   6   7   8   9
key:  70  81  60   -   -   -   -  97   -   -

Open Addressing…
• *Insertions…
  - Inserting 51 also causes a collision, this time in bin 1; therefore, we check:
    bin 2 (also full), and
    bin 3 (empty)
bin:   0   1   2   3   4   5   6   7   8   9
key:  70  81  60  51   -   -   -  97   -   -

Open Addressing…
• *Insertions…
  - 38 and 89 can be placed into bins 8 and 9, respectively, without collisions
bin:   0   1   2   3   4   5   6   7   8   9
key:  70  81  60  51   -   -   -  97  38  89

Open Addressing…
• *Insertions…
  - Inserting 68 causes a collision in bin 8, and therefore we check bins:
    9, 0, 1, 2, 3, and finally 4, which is empty
    insert 68 into bin 4
bin:   0   1   2   3   4   5   6   7   8   9
key:  70  81  60  51  68   -   -  97  38  89

Open Addressing…
• *Insertions…
  - Inserting 24 causes a collision in bin 4; however, the next bin (5) is empty
bin:   0   1   2   3   4   5   6   7   8   9
key:  70  81  60  51  68  24   -  97  38  89
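The insertion walkthrough can be replayed with a short linear-probing sketch. Assumptions not taken from the slides: keys are non-negative `int`s, -1 marks an empty bin, h(k) = k mod m, and `LpInsert` is a hypothetical name that does not check for duplicate keys.

```c
#define LP_EMPTY (-1)   /* marker for an empty bin; keys are assumed non-negative */

/* Insert key with linear probing: start at bin key mod m and step forward
   (wrapping around) until an empty bin is found.
   Returns the bin used, or -1 if the table is full. */
static int LpInsert(int *table, int m, int key)
{
    int home = key % m;
    for (int i = 0; i < m; i++) {
        int bin = (home + i) % m;
        if (table[bin] == LP_EMPTY) {
            table[bin] = key;
            return bin;
        }
    }
    return -1;   /* every bin is occupied */
}
```

Inserting 81, 70, 97, 60, 51, 38, 89, 68, 24 into an empty table of size 10 places them in bins 1, 0, 7, 2, 3, 8, 9, 4, and 5, matching the final table above.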

Open Addressing…
• *Searching
  - Testing for membership is similar to insertion
  - Start at the appropriate bin, and continue searching forward until either:
    the item is found, or
    an empty bin is found

Open Addressing…
• *Searching…
  - Searching for 68, we first examine bin 8, then 9, 0, 1, 2, 3, and 4, finding 68 in bin 4
  - Searching for 23, we examine bins 3, 4, and 5; bin 6 is empty, so 23 is not in the table
bin:   0   1   2   3   4   5   6   7   8   9
key:  70  81  60  51  68  24   -  97  38  89
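Searching follows the same probe sequence as insertion. This sketch assumes non-negative `int` keys, -1 marking an empty bin, and h(k) = k mod m; the name `LpFind` is hypothetical. It also reports how many bins were examined.

```c
#define LP_EMPTY (-1)   /* marker for an empty bin; keys are assumed non-negative */

/* Search for key with linear probing, starting at bin key mod m.
   Stops at the key, at an empty bin, or after m probes.
   Returns the bin holding key, or -1 if absent; *probes receives the bins examined. */
static int LpFind(const int *table, int m, int key, int *probes)
{
    int home = key % m;
    *probes = 0;
    for (int i = 0; i < m; i++) {
        int bin = (home + i) % m;
        (*probes)++;
        if (table[bin] == key)
            return bin;
        if (table[bin] == LP_EMPTY)
            return -1;
    }
    return -1;
}
```

On the table above, the search for 68 takes 7 probes (bins 8, 9, 0, 1, 2, 3, 4) and the search for 23 gives up after 4 probes (bins 3, 4, 5, 6).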

Open Addressing…
• *Removing
  - We cannot simply remove elements from the hash table
  - For example, if we delete 89 by just emptying bin 9, we can no longer find 68: a search for 68 starts at bin 8 and stops at the now-empty bin 9
bin:   0   1   2   3   4   5   6   7   8   9
key:  70  81  60  51  68  24   -  97  38  89

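Open Addressing…
• *Removing…
  - However, we cannot simply move all entries up to fill the gap
  - Moving 70 to bin 9 would make it impossible to find 70 (the second row shows every entry in bins 0-5 moved back one bin, with 70 wrapping around to bin 9)
bin:      0   1   2   3   4   5   6   7   8   9
before:  70  81  60  51  68  24   -  97  38  89
after:   81  60  51  68  24   -   -  97  38  70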

Open Addressing…
• *Removing…
  - Instead, we must probe forward from the emptied bin, moving an entry back into it only if this does not place the entry before its home bin (the bin where a search for it starts)
  - For example, we remove 89
bin:   0   1   2   3   4   5   6   7   8   9
key:  70  81  60  51  68  24   -  97  38   -

Open Addressing…
• *Removing…
  - We probe forward until we find an entry that can be moved into bin 9
  - We cannot move 70, 81, 60, or 51 (their home bins are 0 and 1), but we can move 68 (home bin 8)
bin:   0   1   2   3   4   5   6   7   8   9
key:  70  81  60  51   -  24   -  97  38  68

Open Addressing…
• *Removing…
  - Next, we search forward again and note that 24 (home bin 4) can be moved back into the new gap at bin 4
  - The next bin (6) is already empty, and therefore we are finished
bin:   0   1   2   3   4   5   6   7   8   9
key:  70  81  60  51  24   -   -  97  38  68

Open Addressing…
• *Removing…
  - Suppose we now remove 60
bin:   0   1   2   3   4   5   6   7   8   9
key:  70  81  60  51  24   -   -  97  38  68

Open Addressing…
• *Removing…
  - We find 60 in bin 2, and therefore we remove it
  - We search forward and find that we can move 51 (home bin 1) into bin 2
bin:   0   1   2   3   4   5   6   7   8   9
key:  70  81  51   -  24   -   -  97  38  68

Open Addressing…
• *Removing…
  - We cannot move 24 into bin 3, because its home bin is 4
  - The next bin (5) is empty; therefore, we are finished
bin:   0   1   2   3   4   5   6   7   8   9
key:  70  81  51   -  24   -   -  97  38  68
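The removal procedure can be sketched as follows, assuming non-negative `int` keys, -1 marking an empty bin, and h(k) = k mod m; `LpRemove` and `InCyclicRange` are hypothetical names. After emptying the key's bin, it walks forward through the cluster and moves an entry back into the gap only when the gap lies cyclically between the entry's home bin and its current bin, which is the "not before its home bin" rule above. The walk ends at the first empty bin.

```c
#include <stdbool.h>

#define LP_EMPTY (-1)   /* marker for an empty bin; keys are assumed non-negative */

/* true if bin x lies cyclically in the half-open range (lo, hi] */
static bool InCyclicRange(int lo, int x, int hi)
{
    if (lo <= hi)
        return lo < x && x <= hi;
    return lo < x || x <= hi;   /* the range wraps past the end of the table */
}

/* Remove key from a linear-probing table without leaving a gap that breaks searches.
   Returns true if the key was found and removed. */
static bool LpRemove(int *table, int m, int key)
{
    int hole = key % m;
    int steps = 0;

    /* 1. find the key, giving up at an empty bin or after m probes */
    while (table[hole] != key) {
        if (table[hole] == LP_EMPTY || ++steps == m)
            return false;
        hole = (hole + 1) % m;
    }
    table[hole] = LP_EMPTY;

    /* 2. probe forward through the cluster, refilling the gap where allowed */
    for (int j = (hole + 1) % m; table[j] != LP_EMPTY; j = (j + 1) % m) {
        int home = table[j] % m;
        /* home in (hole, j] means the gap lies before this entry's home bin: leave it */
        if (!InCyclicRange(hole, home, j)) {
            table[hole] = table[j];
            table[j] = LP_EMPTY;
            hole = j;
        }
    }
    return true;
}
```

Applied to the example table, removing 89 moves 68 into bin 9 and then 24 into bin 4; removing 60 afterwards moves only 51, reproducing the tables above.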

Open Addressing…
• *Primary Clustering
  - We have already observed the following phenomenon:
    as we insert more elements into the hash table, the contiguous regions of occupied bins (clusters) get larger
    any key that hashes into a cluster will require several attempts to resolve the collision
  - This results in longer search times

Open Addressing…
• *Primary Clustering…
  - Consider inserting the following entries: 81, 70, 97, 63, 76, 38, 85, 68, 21, 9, 55, 73, 57, 60, 72, 74, 85, 16, 61, 7, 49
  - Use the number modulo 25 to determine which bin it should occupy
  - The first five do not cause any collisions
bin:  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
key:  - 76  -  -  -  - 81  -  -  -  -  -  - 63  -  -  -  -  -  - 70  - 97  -  -

Open Addressing…
  - Inserting 38 causes a collision in bin 13, so 38 goes into bin 14
  - The next seven do not cause any further collisions
bin:  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
key:  - 76  -  -  - 55 81 57  -  9 85  -  - 63 38  -  -  - 68  - 70 21 97 73  -

Open Addressing…
  - The next four insertions cause collisions:
    60 (bin 10)
    72 (bin 22)
    74 (bin 24)
    85 (bin 10)
    they end up in bins 11, 24, 0, and 12, respectively
  - We can safely insert 16 into bin 16
bin:  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
key: 74 76  -  -  - 55 81 57  -  9 85 60 85 63 38  - 16  - 68  - 70 21 97 73 72

Open Addressing…
  - The remaining insertions all cause collisions:
    61 (bin 11)
    7 (bin 7)
    49 (bin 24)
    they end up in bins 15, 8, and 2, respectively
bin:  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
key: 74 76 49  -  - 55 81 57  7  9 85 60 85 63 38 61 16  - 68  - 70 21 97 73 72
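To quantify the clustering, this sketch (the name `LongestCluster` is hypothetical, with -1 assumed to mark an empty bin) measures the longest run of consecutive occupied bins, counting runs that wrap from the last bin around to bin 0.

```c
#define LP_EMPTY (-1)   /* marker for an empty bin */

/* Length of the longest run of occupied bins, allowing runs to wrap around.
   Returns m if the table is completely full. */
static int LongestCluster(const int *table, int m)
{
    int start = -1;
    for (int i = 0; i < m; i++) {          /* find some empty bin to start from */
        if (table[i] == LP_EMPTY) {
            start = i;
            break;
        }
    }
    if (start < 0)
        return m;                           /* no empty bin: one cluster covers everything */

    int best = 0, run = 0;
    for (int k = 1; k <= m; k++) {          /* visit every bin once, ending back at start */
        int bin = (start + k) % m;
        if (table[bin] == LP_EMPTY)
            run = 0;
        else if (++run > best)
            best = run;
    }
    return best;
}
```

For the final table above it returns 12 (bins 5 through 16); the wrapping cluster from bin 20 to bin 2 has length 8, even though only 21 of the 25 bins are occupied.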

Open Addressing…
• Asymptotic Performance
  - Primary clustering affects the number of probes required to perform insertions, searches, and deletions
  - The average number of probes for a successful search can be estimated as
$$\text{Number of probes} \approx \frac{1}{2}\left(1 + \frac{1}{1-\lambda}\right)$$
    where λ is the load factor, i.e. the fraction of the table that is occupied
Open Addressing…
• Asymptotic Performance…
  - The number of probes for an unsuccessful search or for an insertion is higher:
$$\text{Number of probes} \approx \frac{1}{2}\left(1 + \frac{1}{(1-\lambda)^{2}}\right)$$
    if λ = 0.75, 8.5 probes are expected
    if λ = 0.9, about 50 probes are expected, which is unreasonable
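The two linear-probing estimates above can be evaluated directly; this sketch (function names are hypothetical) assumes 0 ≤ λ < 1.

```c
/* Expected probes for a successful search with linear probing: ½(1 + 1/(1 - λ)) */
static double LinearProbeSuccessful(double lambda)
{
    return 0.5 * (1.0 + 1.0 / (1.0 - lambda));
}

/* Expected probes for an unsuccessful search or insertion: ½(1 + 1/(1 - λ)²) */
static double LinearProbeUnsuccessful(double lambda)
{
    double d = 1.0 - lambda;
    return 0.5 * (1.0 + 1.0 / (d * d));
}
```

At λ = 0.5 the estimates are 1.5 and 2.5 probes; at λ = 0.75 they grow to 2.5 and 8.5, and at λ = 0.9 the unsuccessful case reaches 50.5.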
Open Addressing…
• *The number of required probes increases rapidly as the load factor λ approaches 1
[Figure: number of required probes versus load factor λ]

Open Addressing…
• *Primary clustering occurs with linear probing because every probe sequence follows the same linear pattern:
  - if a bin is inside a cluster, then the next bin must either
    also be in that cluster, or
    expand the cluster
• Instead of searching forward in a linear fashion, consider searching forward using a quadratic function

In Next Class
• Open addressing with quadratic probing
• Rehashing and extendible hashing
