Data Structure Using 'C' Hashing: Department of CSE & IT C.V. Raman College of Engineering Bhubaneswar
Hashing
Folding Method
Modular Method
Double Hashing
Storage Management
Garbage Collection
Dynamic memory management
Buddy System
Application
Hashing is a technique for performing insertion, deletion and find operations in
almost constant time on average.
Compared with linear or binary search, hashing avoids scanning or repeatedly
halving the collection: the position of an item is computed directly from its key.
It is the process of mapping a large number of data items to a smaller table with
the help of a hashing function.
The advantage of this searching method is its efficiency in handling a vast number
of data items in a given collection.
The result of this hashing process is a hash data structure that can store or
retrieve data items in an average time that does not depend on the collection size.
A table is one of the important data structures used for information
retrieval.
Let n distinct records with keys K1, K2, ..., Kn be stored in a file, and
suppose we want to retrieve the record with a given key value K.
One way is to start from the first record and compare K with its key
value; if they match then stop, otherwise proceed to the next record. The
searching time is directly proportional to the number of records in the
file.
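The sequential scan described above can be sketched in C as follows. The record layout (`struct record` with an integer key and a string payload) and the function name `linear_search` are assumptions made for illustration only; they do not come from the slides.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record layout: an integer key plus some payload. */
struct record {
    int key;
    const char *data;
};

/* Scan the records in order; return the index of the first record whose
 * key equals k, or -1 if no record matches. Needs up to n comparisons. */
int linear_search(const struct record *file, size_t n, int k)
{
    assert(file != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (file[i].key == k)
            return (int)i;
    }
    return -1;
}
```

In the worst case every one of the n records is compared with K, which is the cost that an access table is meant to avoid.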
Another way is to use an access table, where the searching time is
significantly reduced.
Assume a function f which, when applied to K, returns an index i, that is,
f(K) = i. The ith entry of the access table gives the location of the record
with key value K.
One type of such access table is known as Hash table.
The hash table is an array which contains key values with pointers to the
corresponding records.
The basic idea is to place a key inside the hash table at a location (index)
calculated from the key value itself.
This correspondence between a key value and an index in the hash table is
known as hashing. Ideally it is one to one; when two different keys map to
the same index, a collision occurs.
Example application: a search engine
looks for all documents containing a given word.
A hash table, or a hash map, is a data structure that associates keys (names)
with values (attributes).
• Look-Up Table
• Dictionary
• Cache
• Extended Array
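[Figure: a hash function h maps the actual keys K = {k1, k2, k3, k4, k5}, a subset of the universe of keys U, to slots of a table whose last index is m-1; k2 and k5 collide, since h(k2) = h(k5).]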
1. Truncation Method
2. Mid Square Method
3. Folding Method
4. Modular Method
Ex:1 If a hash table holds at most 999 entries (999 different key indexes),
a hash function may be defined that selects some digits of the key. For example,
from the eight-digit integer 12345678, the first, second and fifth digits from the
right (8, 7 and 4) may be combined as 478, which is the position in the hash lookup
table where this element will be inserted. Any other combination of digits may be used.
Ex:2 If students have a 9-digit identification number, take the last 3 digits as the
table position,
e.g. 925371622 becomes 622.
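Both truncation examples can be checked with a small C sketch. The function names (`digit_from_right`, `truncate_digits_5_2_1`, `truncate_last3`) are hypothetical, and the order in which the selected digits are combined (fifth, second, first from the right) is simply the one that reproduces 478 in Ex:1.

```c
#include <assert.h>

/* Return the digit at 1-based position pos counted from the right
 * (pos = 1 is the units digit). */
unsigned digit_from_right(unsigned long key, unsigned pos)
{
    assert(pos >= 1);
    while (--pos > 0)
        key /= 10;
    return (unsigned)(key % 10);
}

/* Ex:1: build an index from the fifth, second and first digits
 * counted from the right. */
unsigned truncate_digits_5_2_1(unsigned long key)
{
    return digit_from_right(key, 5) * 100
         + digit_from_right(key, 2) * 10
         + digit_from_right(key, 1);
}

/* Ex:2: keep only the last three digits of the key. */
unsigned truncate_last3(unsigned long key)
{
    return (unsigned)(key % 1000);
}
```

With these definitions, `truncate_digits_5_2_1(12345678UL)` evaluates to 478 and `truncate_last3(925371622UL)` to 622, matching the two examples.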
4. Modular Method
To map a given key to a position in the hash table, the mod operation is
applied to the key.
The key is divided by an integer, usually taken to be the size of the hash
table, and the remainder is used as the hash key, i.e. the address at which
that element is placed in the lookup table.
Idea:
Map a key k into one of the m slots by taking the remainder of k divided
by m
h(k) = k mod m
Advantage:
fast, requires only one operation
Disadvantage:
Certain values of m are bad, e.g.,
power of 2 (h(k) is then just the low-order bits of k)
non-prime numbers
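A minimal C sketch of h(k) = k mod m is given below. The function name `hash_mod` and the sample table sizes 8 and 13 used in the note that follows are illustrative assumptions.

```c
#include <assert.h>

/* Division-method hash: h(k) = k mod m, for a table of m slots. */
unsigned hash_mod(unsigned long k, unsigned m)
{
    assert(m > 0);   /* a table needs at least one slot */
    return (unsigned)(k % m);
}
```

To see why a power of 2 is a poor choice, take m = 8: the keys 8, 16 and 24 all hash to slot 0, because only their three low-order bits are used. With the prime m = 13 the same keys go to slots 8, 3 and 11.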
Separate chaining: the idea is to keep a linked list of all elements that
hash to the same value.
The array elements are pointers to the first nodes of the lists.
A new item is inserted at the front of its list.
Advantages:
Better space utilization for large items.
Simple collision handling: searching linked list.
Overflow: we can store more items than the hash table size.
Deletion is quick and easy: deletion from the linked list.
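Example (chains listed from front to back; consistent with h(k) = k mod 10):
Slot  Chain
1     81 -> 1
2     (empty)
4     64 -> 4
5     25
6     36 -> 16
7     (empty)
9     49 -> 9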
Slot j contains a pointer to the head of the list of all elements that
hash to j
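The list-per-slot scheme described above might be sketched in C as follows. The table size of 10, the hash h(k) = k mod 10, and all identifiers (`chained_table`, `chain_insert`, `chain_find`, `chain_delete`) are assumptions chosen for illustration; the sketch handles non-negative integer keys only and does not check for duplicates.

```c
#include <assert.h>
#include <stdlib.h>

#define TABLE_SIZE 10   /* assumed table size, i.e. h(k) = k mod 10 */

struct node {
    int key;
    struct node *next;
};

struct chained_table {
    struct node *slot[TABLE_SIZE];   /* slot[j] points to the head of list j */
};

unsigned chain_hash(int key)
{
    assert(key >= 0);                /* sketch: non-negative keys only */
    return (unsigned)key % TABLE_SIZE;
}

void chain_init(struct chained_table *t)
{
    for (int j = 0; j < TABLE_SIZE; j++)
        t->slot[j] = NULL;
}

/* Insert at the front of the list; returns 0 on success, -1 if out of memory. */
int chain_insert(struct chained_table *t, int key)
{
    struct node *n = malloc(sizeof *n);
    if (n == NULL)
        return -1;
    unsigned j = chain_hash(key);
    n->key = key;
    n->next = t->slot[j];
    t->slot[j] = n;
    return 0;
}

/* Walk the list in slot h(key); return the matching node or NULL. */
struct node *chain_find(const struct chained_table *t, int key)
{
    for (struct node *p = t->slot[chain_hash(key)]; p != NULL; p = p->next)
        if (p->key == key)
            return p;
    return NULL;
}

/* Unlink and free the first node holding key; returns 1 if removed, 0 if absent. */
int chain_delete(struct chained_table *t, int key)
{
    struct node **pp = &t->slot[chain_hash(key)];
    while (*pp != NULL) {
        if ((*pp)->key == key) {
            struct node *dead = *pp;
            *pp = dead->next;
            free(dead);
            return 1;
        }
        pp = &(*pp)->next;
    }
    return 0;
}
```

Because insertion is at the front, inserting 1 and later 81 leaves slot 1 holding the chain 81 -> 1. The table can hold more keys than it has slots, at the cost of longer lists to search.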
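Closed Hashing (Open Addressing)
Basic idea: all keys are stored in the table itself; when a slot is occupied,
a probe sequence of alternative slots is examined.
Probe sequence for a key k:
<h(k,0), h(k,1), ..., h(k,m-1)>
Example
<1, 5, 9>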
Common Open Addressing Methods
Linear probing
Quadratic probing
Double hashing
Linear probing: three cases can occur when a position in the table is examined:
(1) Position in table is occupied with an element of equal key
(2) Position in table is empty
(3) Position in table is occupied with a different element
Case 3: probe the next higher index until the element is found or an
empty position is found.
The process wraps around to the beginning of the table.
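The three cases can be seen directly in a C sketch of linear probing. The table size of 11, the marker `LP_EMPTY`, the hash h(k) = k mod 11 and the function names are all assumptions made for this illustration; keys are taken to be non-negative integers.

```c
#include <assert.h>

#define LP_SIZE  11      /* assumed table size m */
#define LP_EMPTY (-1)    /* assumed marker for an unused slot; keys are >= 0 */

void lp_init(int table[LP_SIZE])
{
    for (int j = 0; j < LP_SIZE; j++)
        table[j] = LP_EMPTY;
}

/* Probe h(k), h(k)+1, ... with wrap-around. Returns the slot index
 * holding the key after the call, or -1 if the table is full. */
int lp_insert(int table[LP_SIZE], int key)
{
    assert(key >= 0);
    for (int i = 0; i < LP_SIZE; i++) {
        int j = (key % LP_SIZE + i) % LP_SIZE;
        if (table[j] == LP_EMPTY || table[j] == key) {
            table[j] = key;
            return j;
        }
    }
    return -1;
}

/* Returns the slot index of key, or -1 when the key is absent. */
int lp_search(const int table[LP_SIZE], int key)
{
    assert(key >= 0);
    for (int i = 0; i < LP_SIZE; i++) {
        int j = (key % LP_SIZE + i) % LP_SIZE;
        if (table[j] == key)
            return j;            /* case 1: equal key found */
        if (table[j] == LP_EMPTY)
            return -1;           /* case 2: empty slot, key is absent */
        /* case 3: different key, probe the next higher index */
    }
    return -1;                   /* every slot examined */
}
```

For example, keys 5, 16 and 27 all hash to slot 5, so they end up in slots 5, 6 and 7; a key that hashes to slot 10 when that slot is taken wraps around to slot 0.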
Deleting a key from an open-addressing table
Problems
Cannot simply mark the slot as empty
It would become impossible to retrieve keys inserted after that slot
was occupied
Solution
Mark the slot with a sentinel value DELETED
The deleted slot can later be used for insertion
Searching will be able to find all the keys
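A minimal C sketch of deletion with a DELETED sentinel is shown below. It assumes linear probing, a table of 11 slots, non-negative integer keys and the hypothetical names `OA_EMPTY`, `OA_DELETED`, `oa_search`, `oa_insert` and `oa_delete`; none of these come from the slides.

```c
#include <assert.h>

#define OA_SIZE    11     /* assumed table size m */
#define OA_EMPTY   (-1)   /* slot has never been used */
#define OA_DELETED (-2)   /* sentinel: slot held a key that was removed */

void oa_init(int t[OA_SIZE])
{
    for (int j = 0; j < OA_SIZE; j++)
        t[j] = OA_EMPTY;
}

/* Search skips DELETED slots and stops only at a never-used slot. */
int oa_search(const int t[OA_SIZE], int key)
{
    assert(key >= 0);
    for (int i = 0; i < OA_SIZE; i++) {
        int j = (key % OA_SIZE + i) % OA_SIZE;
        if (t[j] == key)
            return j;
        if (t[j] == OA_EMPTY)
            return -1;
    }
    return -1;
}

/* Insert may reuse the first EMPTY or DELETED slot on the probe path.
 * Returns the slot index, or -1 if the table is full. */
int oa_insert(int t[OA_SIZE], int key)
{
    int found = oa_search(t, key);
    if (found >= 0)
        return found;            /* key already present */
    for (int i = 0; i < OA_SIZE; i++) {
        int j = (key % OA_SIZE + i) % OA_SIZE;
        if (t[j] == OA_EMPTY || t[j] == OA_DELETED) {
            t[j] = key;
            return j;
        }
    }
    return -1;
}

/* Replace the key by the sentinel; returns 1 if removed, 0 if absent. */
int oa_delete(int t[OA_SIZE], int key)
{
    int j = oa_search(t, key);
    if (j < 0)
        return 0;
    t[j] = OA_DELETED;
    return 1;
}
```

If 5, 16 and 27 occupy slots 5, 6 and 7 and 16 is then deleted, slot 6 becomes DELETED rather than empty, so a search for 27 still continues past it to slot 7. A later insertion of 38, which also hashes to slot 5, reuses slot 6.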
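Double hashing example (table of 13 slots):
h1(k) = k mod 13
h2(k) = 1 + (k mod 11)
h(k, i) = (h1(k) + i h2(k)) mod 13, i = 0, 1, 2, ...
Insert key 14:
h(14, 0) = h1(14) = 14 mod 13 = 1 (slot 1 is occupied by 79)
h(14, 1) = (h1(14) + h2(14)) mod 13 = (1 + 4) mod 13 = 5 (slot 5 is occupied by 98)
h(14, 2) = (h1(14) + 2 h2(14)) mod 13 = (1 + 8) mod 13 = 9 (slot 9 is free, so 14 is stored there)
Table after the insertion (slot: key; unlisted slots are empty):
1: 79, 4: 69, 5: 98, 7: 72, 9: 14, 11: 50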
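The worked example can be reproduced with a short C sketch. The two hash functions are the ones given above; the identifiers (`dh_h1`, `dh_h2`, `dh_probe`, `dh_insert`) and the empty-slot marker are assumptions. The order in which the earlier keys were inserted is not stated in the slides; the order 79, 69, 72, 98, 50 is assumed here because it yields the table shown.

```c
#include <assert.h>

#define DH_SIZE  13      /* table size from the example */
#define DH_EMPTY (-1)    /* assumed marker for an unused slot; keys are >= 0 */

int dh_h1(int k) { return k % 13; }
int dh_h2(int k) { return 1 + (k % 11); }

/* h(k, i) = (h1(k) + i * h2(k)) mod 13 */
int dh_probe(int k, int i)
{
    return (dh_h1(k) + i * dh_h2(k)) % DH_SIZE;
}

void dh_init(int t[DH_SIZE])
{
    for (int j = 0; j < DH_SIZE; j++)
        t[j] = DH_EMPTY;
}

/* Try h(k,0), h(k,1), ... until a free slot is found.
 * Returns the slot index used, or -1 if no free slot was reached. */
int dh_insert(int t[DH_SIZE], int key)
{
    assert(key >= 0);
    for (int i = 0; i < DH_SIZE; i++) {
        int j = dh_probe(key, i);
        if (t[j] == DH_EMPTY) {
            t[j] = key;
            return j;
        }
    }
    return -1;
}
```

Inserting 79, 69, 72, 98 and 50 in that order places them in slots 1, 4, 7, 5 and 11; inserting 14 afterwards probes slots 1 and 5, finds both occupied, and settles in slot 9. Since h2(k) is never 0 and 13 is prime, the probe sequence visits every slot.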
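α (load factor) = n/m, the number of stored keys divided by the number of slots.
Assuming uniform hashing, the expected number of probes in an unsuccessful search
is at most Σ_{k=0}^{∞} α^k = 1/(1 - α).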
Static Allocation:
Allocation is done during translation and remains fixed throughout execution.
It does not allow recursive subprograms.
Dynamic Allocation: (Heap Storage Management)
Allocation Policies:
First fit chooses the first block in the free list big enough to satisfy the
request, and splits it.
Next fit is like first fit, except that the search for a fitting block starts
where the last one stopped, instead of at the beginning of the free list.
Best fit chooses the smallest block that is at least as big as the requested size.
Worst fit chooses the biggest block, with the aim of avoiding the creation of too
many small fragments, but it doesn’t work well in practice.
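The four policies differ only in which free block they select, which the following C sketch tries to make concrete. Representing the free list as a plain array of block sizes is a simplifying assumption, as are the function names and the `rover` parameter; splitting the chosen block is left out.

```c
#include <assert.h>
#include <stddef.h>

/* First fit: index of the first block with size >= request, or -1. */
int first_fit(const size_t *free_sizes, size_t n, size_t request)
{
    assert(free_sizes != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (free_sizes[i] >= request)
            return (int)i;
    return -1;
}

/* Next fit: like first fit, but the scan starts at *rover (where the
 * previous search stopped) and wraps around the list once. */
int next_fit(const size_t *free_sizes, size_t n, size_t request, size_t *rover)
{
    assert(n == 0 || *rover < n);
    for (size_t step = 0; step < n; step++) {
        size_t i = (*rover + step) % n;
        if (free_sizes[i] >= request) {
            *rover = i;
            return (int)i;
        }
    }
    return -1;
}

/* Best fit: index of the smallest block that is still large enough, or -1. */
int best_fit(const size_t *free_sizes, size_t n, size_t request)
{
    int best = -1;
    for (size_t i = 0; i < n; i++)
        if (free_sizes[i] >= request &&
            (best < 0 || free_sizes[i] < free_sizes[best]))
            best = (int)i;
    return best;
}

/* Worst fit: index of the largest block that is large enough, or -1. */
int worst_fit(const size_t *free_sizes, size_t n, size_t request)
{
    int worst = -1;
    for (size_t i = 0; i < n; i++)
        if (free_sizes[i] >= request &&
            (worst < 0 || free_sizes[i] > free_sizes[worst]))
            worst = (int)i;
    return worst;
}
```

With free blocks of sizes 100, 500, 200, 300 and 600 and a request of 212, first fit picks the 500 block, best fit the 300 block and worst fit the 600 block. First fit and next fit can stop at the first match, whereas best fit and worst fit as written examine the whole list.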
Buddy System
• Advantage:
– Both acquire( ) and release( ) operations are fast.
• Disadvantages:
– Only blocks whose size is a power of 2 can be allocated, which causes
internal fragmentation if the memory request is not a power of 2.
– When a block is released, its buddy may not be free, so the two cannot be
merged, resulting in external fragmentation.
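The power-of-2 rounding and the notion of a buddy can be illustrated with a few helper functions in C. This is only a sketch of the arithmetic, not an allocator: the names `round_up_pow2`, `internal_waste` and `buddy_of` are hypothetical, and offsets are assumed to be measured from the start of a managed region whose size is itself a power of 2. Under that assumption, the buddy of a block is found by flipping the bit of the offset that corresponds to the block size.

```c
#include <assert.h>
#include <stddef.h>

/* Smallest power of two that is >= n, for n >= 1. */
size_t round_up_pow2(size_t n)
{
    assert(n >= 1);
    assert(n <= ((size_t)-1 >> 1) + 1);   /* result must fit in size_t */
    size_t p = 1;
    while (p < n)
        p <<= 1;
    return p;
}

/* Bytes lost to internal fragmentation when request is rounded up. */
size_t internal_waste(size_t request)
{
    return round_up_pow2(request) - request;
}

/* Offset of the buddy of the block that starts at offset and has size
 * block_size (a power of two); offset must be a multiple of block_size. */
size_t buddy_of(size_t offset, size_t block_size)
{
    assert(block_size != 0 && (block_size & (block_size - 1)) == 0);
    assert(offset % block_size == 0);
    return offset ^ block_size;
}
```

A request for 100 bytes is served from a 128-byte block, wasting 28 bytes inside it. The 64-byte blocks at offsets 0 and 64 are buddies of each other; if the block at offset 64 is still in use when the block at offset 0 is released, the two cannot be merged into one 128-byte block.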