Assignment 6
Hashing is the process of converting data (such as a string, a number, or even a file) into a fixed-size value, called a "hash",
using a mathematical algorithm known as a hash function. The same input always produces the same hash, and a good hash
function rarely maps two different inputs to the same value, which makes hashing useful for data retrieval, cryptography, and
verifying data integrity. In hash-based data structures like hash tables, hashing allows for fast data access and search operations.
Advantages of Hashing:
1. Efficient Data Retrieval: Hashing enables fast data access by computing a table index directly from each key. This
allows average-case constant-time complexity, O(1), for search, insert, and delete operations, which is efficient in
scenarios involving large data sets.
2. Data Integrity and Security: Hashing is widely used in security for password storage and data verification. With a
cryptographic hash function, even a minor change in the input results in a completely different hash, so unauthorized
changes to data can be detected by comparing hashes.
3. Space Efficiency: A hash table stores a large amount of data in a simple array of slots, without the complex tree or
list structures that other approaches need, so fast access comes at a modest memory cost.
These qualities help ensure both performance and security in hashing applications.
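The integrity point above can be illustrated with a minimal sketch. It assumes SHA-256 from Python's standard hashlib module as the hash function; the helper name sha256_hex and the two sample messages are made up for illustration and are not part of the assignment.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()


# Two hypothetical messages that differ in a single character.
original = sha256_hex(b"transfer 100 to alice")
tampered = sha256_hex(b"transfer 900 to alice")

print(len(original), len(tampered))  # both digests have the same fixed length
print(original == tampered)          # False: the small change altered the digest
```

Whatever the input size, the digest has the same length, and comparing the stored digest with a freshly computed one is enough to notice that the data changed.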
Q3. Explain hash clash and its resolving techniques.
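A hash clash (or hash collision) occurs when two different keys produce the same hash value, resulting in both keys being
mapped to the same index in a hash table. Collisions are inevitable in hashing because the set of possible keys is much larger
than the limited number of indices in the hash table. Resolving collisions effectively is crucial to maintain the efficiency of
hash-based data structures. The two main families of resolving techniques are chaining, where each slot holds a list of keys,
and open addressing, where a probing scheme (linear probing, quadratic probing, or double hashing) searches for another empty slot.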
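As a small illustration of a clash, the sketch below groups a few keys by their hash value. It assumes the division-method hash h(k) = k mod 10 and a handful of sample keys chosen only for this example.

```python
def h(key: int, table_size: int = 10) -> int:
    """Division-method hash: the remainder of key divided by the table size."""
    return key % table_size


keys = [12, 2, 13, 23]

# Group the keys by the index they hash to.
buckets = {}
for k in keys:
    buckets.setdefault(h(k), []).append(k)

# Any index that received more than one key is a collision.
colliding = {index: ks for index, ks in buckets.items() if len(ks) > 1}
print(colliding)  # {2: [12, 2], 3: [13, 23]}
```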
Q4. Define Hash Clash. Explain Primary Clustering, secondary clustering, rehashing and double hashing.
Hash Clash (Hash Collision) is a situation in hashing where two different keys produce the same hash value and are
mapped to the same index in a hash table. Collisions are common because a hash function maps a large set of possible keys
to a smaller set of indices.
Types of Clustering
1. Primary Clustering:
o Primary clustering occurs in hashing when the chosen collision resolution method places colliding elements in
contiguous slots, so that any key hashing into a run of occupied slots extends that same run.
o This commonly happens in linear probing, where, after a collision, the algorithm checks the next
sequential slot (index + 1) until it finds an empty space.
o Problem: Primary clustering creates "clusters" or groups of occupied slots that get larger as more
elements are inserted, increasing the likelihood of collisions for future inserts in nearby slots.
2. Secondary Clustering:
o Secondary clustering happens when keys that hash to the same initial position follow the same probing
sequence, leading to repeated clustering patterns for keys with similar hashes.
o This can occur in quadratic probing, where, despite a non-linear search pattern, keys with the same
hash still end up in predictable positions.
o Problem: It can still lead to clusters in specific areas of the hash table, though not as severely as primary
clustering.
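The difference between the two kinds of clustering is easier to see from the probe sequences themselves. The following is a minimal sketch, assuming a table of size 11 and the usual textbook probe formulas (home + i) mod m for linear probing and (home + i*i) mod m for quadratic probing; the function names are illustrative only.

```python
def linear_probes(home: int, m: int, attempts: int) -> list:
    """Slots visited by linear probing: home, home+1, home+2, ... (mod m)."""
    return [(home + i) % m for i in range(attempts)]


def quadratic_probes(home: int, m: int, attempts: int) -> list:
    """Slots visited by quadratic probing: home, home+1, home+4, ... (mod m)."""
    return [(home + i * i) % m for i in range(attempts)]


M = 11  # assumed table size

# Linear probing walks through adjacent slots, which is how runs of
# occupied slots (primary clusters) build up.
print(linear_probes(3, M, 5))     # [3, 4, 5, 6, 7]

# Quadratic probing jumps further each time, but every key whose home slot
# is 3 follows exactly this sequence (secondary clustering).
print(quadratic_probes(3, M, 5))  # [3, 4, 7, 1, 8]
```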
1. Rehashing:
o Rehashing involves creating a new, larger hash table and rehashing all the existing elements into it when
the current table becomes too full or collisions are too frequent.
o How It Works: The hash function is adjusted to the new table size (for example, the modulus changes), and each
element in the old table is reinserted into the new one according to the new hash function. This reduces the load
factor, minimizing collisions.
o Advantages: Reduces clustering and improves access time.
o Disadvantages: Computationally expensive as all elements need to be rehashed.
2. Double Hashing:
o Double hashing uses a second hash function to calculate the interval for probing, reducing clustering
issues by introducing more randomness in slot selection after a collision.
o How It Works: The formula for finding the next slot after a collision is (hash1(key) + i * hash2(key)) mod m,
where i is the number of attempts (0, 1, 2, ...), m is the table size, and hash2(key) is the second hash function,
which must never evaluate to 0.
o Advantages: Significantly reduces both primary and secondary clustering by spreading out keys more
effectively.
o Disadvantages: Slightly more complex to implement, and finding a suitable second hash function can be
challenging.
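Both techniques can be sketched in a few lines. The code below is only an illustration under stated assumptions: the table size 11, the second hash function 1 + (key mod (m - 1)) (a common textbook choice that is never 0), the use of Python lists as chains, and all function names are assumptions made for this example rather than part of the question.

```python
def h1(key: int, m: int) -> int:
    """Primary hash: the home slot."""
    return key % m


def h2(key: int, m: int) -> int:
    """Secondary hash (assumed form): the step size, always at least 1."""
    return 1 + key % (m - 1)


def double_hash_probes(key: int, m: int, attempts: int) -> list:
    """Slots visited by double hashing: (h1 + i * h2) mod m for i = 0, 1, ..."""
    return [(h1(key, m) + i * h2(key, m)) % m for i in range(attempts)]


def rehash(old_table: list, new_size: int) -> list:
    """Reinsert every key from a chained table into a new table of new_size."""
    new_table = [[] for _ in range(new_size)]
    for chain in old_table:
        for key in chain:
            new_table[key % new_size].append(key)
    return new_table


M = 11
# Keys 3 and 14 share home slot 3 but use different step sizes,
# so their probe sequences separate after the first attempt.
print(double_hash_probes(3, M, 4))   # [3, 7, 0, 4]
print(double_hash_probes(14, M, 4))  # [3, 8, 2, 7]

# Rehashing a crowded table of size 5 into a table of size 11.
small = [[5, 10, 15], [1, 21], [2], [3], [4]]
bigger = rehash(small, 11)
print(max(len(chain) for chain in small))   # 3
print(max(len(chain) for chain in bigger))  # 2
```

In this run the longest chain shrinks from 3 to 2 after rehashing, at the cost of reinserting every key.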
Q5. What is a collision? Explain collision resolution techniques with suitable examples.
A collision in hashing occurs when two distinct keys produce the same hash value, leading them to map to the same index in a
hash table. Since each slot of an open-addressing hash table can store only one item, this conflict needs to be resolved to
maintain data integrity and retrieval efficiency.
Let's insert the keys 12, 18, 13, 2, 3, 23, 5, and 15 into a hash table of length 10 using open addressing with linear probing
and the hash function h(k) = k mod 10. The table starts empty, and if a collision occurs, we use linear probing
to find the next available slot.
1. Insert 12:
o h(12) = 12 mod 10 = 2
o Slot 2 is empty, so insert 12 at index 2.
2. Insert 18:
o h(18) = 18 mod 10 = 8
o Slot 8 is empty, so insert 18 at index 8.
3. Insert 13:
o h(13) = 13 mod 10 = 3
o Slot 3 is empty, so insert 13 at index 3.
4. Insert 2:
o h(2) = 2 mod 10 = 2
o Slot 2 is already occupied by 12, so we use linear probing to find the next available slot.
o Slot 3 is also occupied, so we check slot 4, which is empty.
o Insert 2 at index 4.
5. Insert 3:
o h(3) = 3 mod 10 = 3
o Slot 3 is occupied by 13, so we use linear probing.
o Slot 4 is occupied by 2, so we check slot 5, which is empty.
o Insert 3 at index 5.
6. Insert 23:
o h(23) = 23 mod 10 = 3
o Slot 3 is occupied by 13. Using linear probing, we check slots 4 and 5, which are occupied, and then slot 6, which is empty.
o Insert 23 at index 6.
7. Insert 5:
o h(5) = 5 mod 10 = 5
o Slot 5 is occupied by 3, so we use linear probing.
o Slot 6 is occupied, so we check slot 7, which is empty.
o Insert 5 at index 7.
8. Insert 15:
o h(15) = 15 mod 10 = 5
o Slot 5 is occupied by 3. Using linear probing, we check slots 6, 7, 8, and finally slot 9, which is empty.
o Insert 15 at index 9.
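The insertions above can be replayed with a short sketch. It assumes the table is a Python list of length 10 with None marking an empty slot; the function name linear_probe_insert is only illustrative.

```python
def linear_probe_insert(table: list, key: int) -> int:
    """Insert key using h(k) = k mod len(table) and linear probing.

    Returns the index where the key was stored.
    """
    m = len(table)
    home = key % m
    for i in range(m):
        slot = (home + i) % m  # wrap around at the end of the table
        if table[slot] is None:
            table[slot] = key
            return slot
    raise RuntimeError("hash table is full")


table = [None] * 10
for key in [12, 18, 13, 2, 3, 23, 5, 15]:
    linear_probe_insert(table, key)

print(table)
# [None, None, 12, 13, 2, 3, 23, 5, 18, 15]
```

The printed list is the final table: slots 0 and 1 stay empty, and slots 2 through 9 form one long run of occupied slots, which is the primary clustering effect of linear probing.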
To construct a hash table that uses chaining to resolve collisions, we'll use the simplest hash function, which is:
h(k) = k mod 5
This means each integer will be placed in a slot based on the remainder when divided by 5. Since collisions are resolved by
chaining, each slot in the table will contain a linked list to store multiple values if needed.
Let's go through each integer and insert it into the appropriate slot in the hash table.
Given integers: 1, 2, 3, 4, 5, 10, 21, 22, 33, 34, 15, 32, 31, 48, 49, 50
Step-by-Step Insertion
1. h(1) = 1 mod 5 = 1
o Insert 1 in slot 1.
2. h(2) = 2 mod 5 = 2
o Insert 2 in slot 2.
3. h(3) = 3 mod 5 = 3
o Insert 3 in slot 3.
4. h(4) = 4 mod 5 = 4
o Insert 4 in slot 4.
5. h(5) = 5 mod 5 = 0
o Insert 5 in slot 0.
6. h(10) = 10 mod 5 = 0
o Slot 0 already has 5, so add 10 to the chain at slot 0.
7. h(21) = 21 mod 5 = 1
o Slot 1 already has 1, so add 21 to the chain at slot 1.
8. h(22) = 22 mod 5 = 2
o Slot 2 already has 2, so add 22 to the chain at slot 2.
9. h(33) = 33 mod 5 = 3
o Slot 3 already has 3, so add 33 to the chain at slot 3.
10. h(34) = 34 mod 5 = 4
o Slot 4 already has 4, so add 34 to the chain at slot 4.
11. h(15) = 15 mod 5 = 0
o Slot 0 has a chain of [5, 10], so add 15 to the chain at slot 0.
12. h(32) = 32 mod 5 = 2
o Slot 2 has a chain of [2, 22], so add 32 to the chain at slot 2.
13. h(31) = 31 mod 5 = 1
o Slot 1 has a chain of [1, 21], so add 31 to the chain at slot 1.
14. h(48) = 48 mod 5 = 3
o Slot 3 has a chain of [3, 33], so add 48 to the chain at slot 3.
15. h(49) = 49 mod 5 = 4
o Slot 4 has a chain of [4, 34], so add 49 to the chain at slot 4.
16. h(50) = 50 mod 5 = 0
o Slot 0 has a chain of [5, 10, 15], so add 50 to the chain at slot 0.
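The step-by-step insertion above can be checked with a minimal sketch. As a simplifying assumption, each chain is represented by a Python list instead of a linked list, with new keys appended at the end of the chain; the function name build_chained_table is illustrative only.

```python
def build_chained_table(keys: list, size: int = 5) -> list:
    """Build a chained hash table using h(k) = k mod size."""
    table = [[] for _ in range(size)]  # one empty chain per slot
    for key in keys:
        table[key % size].append(key)  # append to the chain at the key's slot
    return table


keys = [1, 2, 3, 4, 5, 10, 21, 22, 33, 34, 15, 32, 31, 48, 49, 50]
table = build_chained_table(keys)

for slot, chain in enumerate(table):
    print(slot, chain)
# 0 [5, 10, 15, 50]
# 1 [1, 21, 31]
# 2 [2, 22, 32]
# 3 [3, 33, 48]
# 4 [4, 34, 49]
```

Every key is stored in the slot its hash selects, so no probing is needed; the cost of a lookup is the length of one chain.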