Hashing
Hashing is a technique used in data structures to store and retrieve data efficiently, in a way that
allows for quick access.
Hashing means finding the address where the data is to be stored, and later located, from a key with
the help of an algorithmic function called a hash function.
A hash table is an array-based data structure used to store <key, information> pairs (records).
A hash table can be used for quick insertion and searching.
To store a record in a hash table, the hash function is applied to the key of the record
being stored, returning an index within the range of the hash table.
The resulting address is used as the basis for storing and retrieving the record, and this address is
called the home address of the record.
The item is then stored in the table at that index position.
For storing a record:
Key -> generate array index -> store the record at that array index
The array index is generated by the hash function, which converts keys into
array indexes:
Key -> hash function -> array index
Hash Table
A hash table is one of the most important data structures. It
uses a special function, known as a hash function, that maps a
given key to an index so that elements can be accessed faster.
A hash table is a data structure that stores information, and
each piece of information has two main components:
a key and a value.
For example, suppose the key is John and the value is his
phone number. We pass the key to the hash function as
shown below:
Hash(key) = index;
When we pass the key to the hash function, it returns the
index.
Hash(john) = 3;
The above example stores john at index 3.
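The idea of Hash(key) = index can be sketched in a few lines of Python. This is only an illustrative sketch: the function toy_hash, the table size 107 and the phone number are assumptions that do not come from the slides. The size was picked so that "john" happens to land on index 3, as in the example above, and collisions are not handled here.

```python
TABLE_SIZE = 107  # hypothetical size, chosen so that "john" maps to index 3


def toy_hash(key: str) -> int:
    """Toy hash function (an assumption): sum of character codes mod table size."""
    return sum(ord(ch) for ch in key) % TABLE_SIZE


table = [None] * TABLE_SIZE  # the hash table is just an array of slots


def put(key: str, value: str) -> None:
    """Store the (key, value) pair at the index given by the hash function."""
    table[toy_hash(key)] = (key, value)  # no collision handling in this sketch


def get(key: str):
    """Return the value stored for key, or None if the key is absent."""
    entry = table[toy_hash(key)]
    if entry is not None and entry[0] == key:
        return entry[1]
    return None


put("john", "555-0100")   # made-up phone number
print(toy_hash("john"))   # 3
print(get("john"))        # 555-0100
```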
ADVANTAGES OF HASH TABLE:
Hash provides better synchronization than other data structures.
For lookups by key, hash tables are on average more efficient than search trees
and many other data structures.
Hash tables have high performance when looking up data, inserting, and
deleting existing values.
Hash provides constant time for searching, insertion and deletion
operations on average.
Hash tables are space-efficient.
The average time for these operations stays constant regardless of the
number of items in the table, as long as collisions are kept low; with
many collisions the time can grow with the number of items.
Most hash table implementations can automatically resize themselves.
Hash tables are easy to use. They perform very well even when
working with large datasets.
Hash tables offer high-speed data retrieval and manipulation.
DISADVANTAGES OF HASH TABLE:
Hash is inefficient when there are many collisions.
Hash collisions practically cannot be avoided for a large set of
possible keys.
Some hash table implementations do not allow null keys or values.
A fixed-size hash table has a limited capacity and will eventually
fill up.
Hash tables can be complex to implement.
Hash tables do not maintain the order of elements, which
makes it difficult to retrieve elements in a specific order.
OPERATIONS OF HASHTABLE
The three primary operations of a hash table are search,
insert, and delete.
Insertion – this operation is used to add an element to the
hash table
Searching – this operation is used to search for elements
Low collision probability: A good hash function should minimize the duplication of
output values, also known as collisions.
Data integrity: A good hash function should make it difficult for attackers to predict the
output based on known inputs.
Effective searching: A good hash function should minimize the number of comparisons
while performing a search.
Efficient data retrieval: A good hash function should allow for quick access to elements,
with constant-time complexity on average.
Collision resolution: A good hash function should have a collision resolution method
that uses an auxiliary data structure or systematic probing of the table.
Example 3: k = 1276, M = 11
h(1276) = 1276 mod 11
= 0. The value is stored at index 0.
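The computation in this example, h(k) = k mod M, is short enough to check directly. The function name division_hash below is an assumption made for this sketch.

```python
def division_hash(k: int, m: int) -> int:
    """Hash by taking the remainder of the key k divided by the table size m."""
    return k % m


print(division_hash(1276, 11))  # 0, because 1276 = 11 * 116
```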
2. Mid square method: a very good hashing method.
We square the key and then take some digits from the middle of the squared number
as the address.
Square the value of the key k, i.e. k².
Extract the middle r digits as the hash value.
Example: let us take 4-digit numbers as keys
1337 1273 1391 1026
Now take the squares of these keys
1787569 1620529 1934881 1052676
Now take the 4th and 5th digits of each square; these will be the addresses of the keys (two digits give
addresses from 00 to 99, which fit in the table of size 1000)
75 05 48 26
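The mid square example can be checked with a short sketch. The addresses 75, 05, 48 and 26 are the 4th and 5th digits of each 7-digit square, and the first key, 1337, is inferred from the first square (1337² = 1787569). The function name mid_square_hash and the rule used to locate the "middle" when the digit count is odd are assumptions made for this sketch; other conventions exist.

```python
def mid_square_hash(k: int, r: int = 2) -> int:
    """Square the key and return r digits taken from the middle of the square."""
    squared = str(k * k)
    # Assumed convention: when the middle is not exact, lean to the right,
    # so a 7-digit square with r = 2 yields its 4th and 5th digits.
    start = (len(squared) - r + 1) // 2
    return int(squared[start:start + r])


for key in (1337, 1273, 1391, 1026):
    print(key, key * key, mid_square_hash(key))
# 1337 1787569 75
# 1273 1620529 5
# 1391 1934881 48
# 1026 1052676 26
```

The address 05 is printed as 5 because the function returns an integer.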
3. Digit Folding Method: one of the easiest ways to compute a hash address is to break the key into
pieces, add them, and use the sum as the hash address.
This method involves two steps:
Divide the key value k into a number of parts, i.e. k1, k2, k3, ..., kn, where each part has the same
number of digits except for the last part, which can have fewer digits than the other parts.
Add the individual parts. The hash value is obtained by ignoring the last carry, if any.
Example: let us consider the 5-digit key 12345
k = 12345
k1 = 12, k2 = 34, k3 = 5
s = k1 + k2 + k3
= 12 + 34 + 5
= 51
h(k) = 51
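The two folding steps can be sketched as follows. The function name fold_hash and the default part length of 2 digits are assumptions for this sketch; "ignoring the last carry" is implemented by keeping only the lowest part_len digits of the sum.

```python
def fold_hash(k: int, part_len: int = 2) -> int:
    """Split the key into parts of part_len digits, add them, drop any carry."""
    digits = str(k)
    # Cut from the left; the last part may have fewer digits than the others.
    parts = [int(digits[i:i + part_len]) for i in range(0, len(digits), part_len)]
    # Keeping only part_len digits of the sum discards the final carry.
    return sum(parts) % (10 ** part_len)


print(fold_hash(12345))   # 51  (12 + 34 + 5)
print(fold_hash(987654))  # 28  (98 + 76 + 54 = 228, carry 2 is ignored)
```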
4. Multiplication method: This method involves the following steps:
Step 1: Choose a constant A such that 0 < A < 1, multiply the key k by A,
and extract the fractional part of kA (kA mod 1 is the fractional part).
Step 2: Multiply the result of the above step by the size of the hash table, M, and take the floor of
the result.
Formula:
h(k) = floor(M (kA mod 1))
Here, M is the size of the hash table, k is the key value, and A is a constant value.
"kA mod 1" means the fractional part of kA, that is, kA - ⌊kA⌋.
Example:
k = 12345 ,A = 0.357840 ,M = 100
h(12345) = floor[ 100 (12345*0.357840 mod 1)]
= floor[ 100 (4417.5348 mod 1) ]
= floor[ 100 (0.5348) ]
= floor[ 53.48 ]
= 53
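The worked example can be reproduced with a small sketch of h(k) = floor(M (kA mod 1)). The function name multiplication_hash is an assumption; the values of k, A and M are the ones used in the example.

```python
import math


def multiplication_hash(k: int, a: float, m: int) -> int:
    """Return floor(m * frac(k * a)), where frac is the fractional part."""
    fractional = (k * a) % 1.0  # "kA mod 1"
    return math.floor(m * fractional)


print(multiplication_hash(12345, 0.357840, 100))  # 53
```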
Collision Resolution Strategies /Techniques:
In hashing, hash functions are used to generate hash values. The hash
value is used as the index of the key in the hash table.
The hash function may return the same hash value for two or more keys.
When two or more keys have the same hash value, a collision happens.
To handle this collision, we use Collision Resolution Techniques.
Collision Resolution Techniques:
Separate Chaining: It is also called Open Hashing.
Open Addressing: It is also called Closed Hashing.
PROBING
Linear probing: Just like the name suggests, this method searches for empty slots
linearly, starting from the position where the collision occurred and moving forward.
If the end of the list is reached and no empty slot is found, the probing continues
from the beginning of the list.
Quadratic probing: This method uses quadratic polynomial expressions to find the
next available free slot.
1) Simple to implement.
2) Hash table never fills up, we can always add more elements to the chain.
inserted or deleted.
Disadvantages:
1) Cache performance of chaining is not good as keys are stored using a linked list. Open
addressing provides better cache performance as everything is stored in the same table.
1. Unlike separate chaining, all the keys are stored inside the hash
table.
1. Linear Probing
2. Quadratic Probing
3. Double Hashing
Operations in Open Addressing
1. Insert Operation-
• Hash function is used to compute the hash value for a key to be inserted.
• Hash value is then used as an index to store the key in the hash table.
In case of collision,
• Using the hash value, that bucket of the hash table is checked.
Linear Probing-
1. When collision occurs, we linearly probe for the next bucket.
2. We keep probing until an empty bucket is found.
Advantage-It is easy to compute.
Disadvantage-
1.The main problem with linear probing is clustering.
2.Many consecutive elements form groups.
3.Then, it takes time to search an element or to find an empty bucket.
Open Addressing Techniques
1. Linear Probing -
Let us consider a simple hash function as “key mod 7” and a sequence of
keys as 50, 700, 76, 85, 92, 73, 101.
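Inserting this key sequence with linear probing can be traced with a minimal sketch. The function name linear_probe_insert is an assumption; the hash function "key mod 7" and the keys are taken from the example, and the table size of 7 follows from the hash function.

```python
def linear_probe_insert(table: list, key: int) -> int:
    """Insert key with linear probing and return the slot that was used."""
    m = len(table)
    home = key % m                      # hash function "key mod 7" when m = 7
    for i in range(m):
        slot = (home + i) % m           # move forward, wrapping around at the end
        if table[slot] is None:
            table[slot] = key
            return slot
    raise RuntimeError("hash table is full")


table = [None] * 7
for key in [50, 700, 76, 85, 92, 73, 101]:
    print(key, "->", linear_probe_insert(table, key))
print(table)  # [700, 50, 85, 92, 73, 101, 76]
```

Keys 85 and 92 both hash to slot 1 and are pushed to slots 2 and 3; 73 and 101 hash to slot 3 and end up in slots 4 and 5, which shows how clusters form.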
2. Quadratic Probing: When a collision occurs, we probe the bucket i² positions past the home
bucket in the i-th iteration, i.e. bucket (hash(x) + i²) mod table size.
We keep probing until an empty bucket is found.
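Quadratic probing can be sketched in the same way. Reusing the hash function "key mod 7" and the key sequence 50, 700, 76, 85, 92, 73, 101 from the linear probing example is an assumption made for illustration, as is the function name quadratic_probe_insert. The number of attempts is bounded because quadratic probing is not guaranteed to reach every bucket.

```python
def quadratic_probe_insert(table: list, key: int) -> int:
    """Insert key with quadratic probing and return the slot that was used."""
    m = len(table)
    home = key % m
    for i in range(m):                  # bounded number of attempts
        slot = (home + i * i) % m       # i = 0 is the home bucket itself
        if table[slot] is None:
            table[slot] = key
            return slot
    raise RuntimeError("no empty bucket found on the probe sequence")


table = [None] * 7
for key in [50, 700, 76, 85, 92, 73, 101]:
    quadratic_probe_insert(table, key)
print(table)  # [700, 50, 85, 73, 101, 92, 76]
```

Compared with linear probing, key 92 jumps from slot 1 to slot 5 (1 + 2²) instead of extending the cluster at slots 1 to 3.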
3. Double Hashing:
Double hashing is a collision-resolving technique in open-addressed hash
tables. It uses the idea of applying a second hash function to the key when
a collision occurs.
We use another hash function hash2(x) and look for the bucket i * hash2(x) positions
past the home bucket in the i-th iteration.
It requires more computation time, as two hash functions need to be computed.
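A minimal sketch of double hashing follows. The slides do not define hash2, so the choice hash2(x) = 1 + (x mod (m - 1)) is an assumption (it never returns 0, so the probe always moves). The first hash function "key mod 7", the key sequence and the function names are likewise assumptions reused for illustration.

```python
def hash2(key: int, m: int) -> int:
    """Hypothetical second hash function; always in the range 1 .. m - 1."""
    return 1 + (key % (m - 1))


def double_hash_insert(table: list, key: int) -> int:
    """Insert key using double hashing and return the slot that was used."""
    m = len(table)
    home = key % m                      # first hash function
    step = hash2(key, m)                # step size from the second hash function
    for i in range(m):
        slot = (home + i * step) % m    # i-th probe: home + i * hash2(key)
        if table[slot] is None:
            table[slot] = key
            return slot
    raise RuntimeError("no empty bucket found on the probe sequence")


table = [None] * 7
for key in [50, 700, 76, 85, 92, 73, 101]:
    double_hash_insert(table, key)
print(table)  # [700, 50, 101, 85, 92, 73, 76]
```

Because the table size 7 is prime and the step is between 1 and 6, each probe sequence in this sketch visits every bucket.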
Separate chaining vs Open Addressing Techniques
100 entry because last two bits of both the entry are 00.
4. Keys 1 and 3 are still in B2. The record in B2 is pointed to by the 010 and
110 entries because the last two bits of both entries are 10.
5. Key 7 is still in B3. The record in B3 is pointed to by the 111 and 011
entries because the last two bits of both entries are 11.
Extensible/Extendible Hashing
Advantages of Extensible Hashing
1. In this method, the performance does not decrease as the data
grows in the system. It simply increases the size of memory to
accommodate the data.
2. In this method, memory is well utilized as it grows and shrinks
with the data. No memory is left lying unused.
3. This method is good for a dynamic database where data grows
and shrinks frequently.
Disadvantages of Extensible Hashing:
1. In this method, if the data size increases then the bucket size
also increases.
2. In this case, the bucket overflow situation will also occur, but it
might take longer to reach this situation than with static hashing.
Skip List
A skip list is a probabilistic data structure.
The skip list is used to store a sorted list of elements or data with a linked list.
It allows the elements to be traversed efficiently: in one single
step, it skips several elements of the list, which is why it is known as a skip
list.
The skip list is an extended version of the linked list.
It allows the user to search, remove, and insert the element very quickly.
It consists of a base list that includes a set of elements which maintains the link
hierarchy of the subsequent elements.
Skip list structure:
It is built in two layers:
The lowest layer
The top layer
The lowest layer of the skip list is a common sorted linked list.
The top layers of the skip list are like an "express line" where elements are
skipped.
Let's take an example to understand the working of the skip list. In this example, we have 14
nodes, such that these nodes are divided into two layers, as shown in the diagram.
The lower layer is a common line that links all nodes, and the top layer is an express line that links
only the main nodes, as you can see in the diagram.
Suppose you want to find 47 in this example. You start the search from the first node of the
express line and keep moving along the express line until you find a node that is equal to 47 or
greater than 47.
You can see in the example that 47 does not exist in the express line, so you stop at the last node
that is less than 47, which is 40. Now, you go down to the normal line from 40 and search for 47
there, as shown in the diagram.
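The two-layer search described above can be sketched as follows. The diagram's actual node values are not available in this text, so the 14 values on the normal line and the choice of "main" nodes on the express line are hypothetical; only 40 and 47 come from the example. The class and function names are also assumptions.

```python
class Node:
    def __init__(self, value):
        self.value = value
        self.next = None   # next node on the same line
        self.down = None   # same value on the normal line (express nodes only)


def build_line(values):
    """Build a singly linked line; return its head and a value -> node map."""
    nodes = {v: Node(v) for v in values}
    for a, b in zip(values, values[1:]):
        nodes[a].next = nodes[b]
    return nodes[values[0]], nodes


# Hypothetical data: 14 sorted values, a few of them also on the express line.
normal_values = [3, 7, 12, 18, 25, 31, 40, 43, 47, 52, 60, 66, 71, 80]
express_values = [3, 25, 40, 60, 80]

normal_head, normal_nodes = build_line(normal_values)
express_head, express_nodes = build_line(express_values)
for v in express_values:
    express_nodes[v].down = normal_nodes[v]


def search(target):
    """Return (found, path); path lists the nodes visited on each line."""
    node = express_head
    path = [("express", node.value)]
    # Stay on the express line while the next node does not overshoot.
    while node.next is not None and node.next.value <= target:
        node = node.next
        path.append(("express", node.value))
    node = node.down  # drop to the normal line at the same value
    while node.next is not None and node.next.value <= target:
        node = node.next
        path.append(("normal", node.value))
    return node.value == target, path


print(search(47))
# (True, [('express', 3), ('express', 25), ('express', 40), ('normal', 43), ('normal', 47)])
```

With these made-up values the search touches 5 nodes instead of the 9 that a walk along the normal line alone would need.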
Skip List Basic Operations: The skip list supports the following operations.
1. Insertion operation: It is used to add a new node at a particular location in the
skip list.
2. Deletion operation: It is used to delete a node from the skip list.
3. Search operation: It is used to search for a particular node in the
skip list.
Example 1: Create a skip list by inserting the following keys into an empty
skip list.
2. 29 with level 1.
3. 22 with level 4.
4. 9 with level 3.
5. 17 with level 1.
6. 4 with level 2.
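Insertion with given levels can be sketched as below. Only the five keys listed above (items 2 to 6) are inserted, since the first item of the list is missing here and is not guessed. The class names, the maximum level of 4 and the helper level_keys are assumptions for this sketch; in a real skip list the level of each new node is normally chosen at random rather than given.

```python
MAX_LEVEL = 4  # assumed maximum; the highest level in the example is 4


class SkipNode:
    def __init__(self, key, level):
        self.key = key
        self.forward = [None] * level  # forward[i] is the next node on level i + 1


class SkipList:
    def __init__(self):
        self.head = SkipNode(None, MAX_LEVEL)  # sentinel present on every level

    def insert(self, key, level):
        """Insert key as a node that takes part in levels 1 .. level."""
        update = [self.head] * MAX_LEVEL
        node = self.head
        # From the top level down, find the last node before key on each level.
        for i in range(MAX_LEVEL - 1, -1, -1):
            while node.forward[i] is not None and node.forward[i].key < key:
                node = node.forward[i]
            update[i] = node
        new_node = SkipNode(key, level)
        for i in range(level):  # splice the new node into its levels
            new_node.forward[i] = update[i].forward[i]
            update[i].forward[i] = new_node

    def level_keys(self, level):
        """Return the keys linked on the given level, in order."""
        keys, node = [], self.head.forward[level - 1]
        while node is not None:
            keys.append(node.key)
            node = node.forward[level - 1]
        return keys


skip_list = SkipList()
for key, level in [(29, 1), (22, 4), (9, 3), (17, 1), (4, 2)]:
    skip_list.insert(key, level)

for level in range(MAX_LEVEL, 0, -1):
    print("level", level, skip_list.level_keys(level))
# level 4 [22]
# level 3 [9, 22]
# level 2 [4, 9, 22]
# level 1 [4, 9, 17, 22, 29]
```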