
UNIT-5

TREES
INTRODUCTION TO TREES:

Tree - Terminology
In a linear data structure, data is organized in sequential order, whereas in a non-linear data structure, data is organized in hierarchical order. A tree is a very popular non-linear data structure used in a wide range of applications. A tree data structure can be defined as follows...

A tree is a non-linear data structure which organizes data in a hierarchical structure, and this is a recursive definition.

A tree data structure can also be defined as follows...

A tree data structure is a collection of data (nodes) organized recursively in a hierarchical structure.

In a tree data structure, every individual element is called a node. A node stores the actual data of that element along with links to the nodes below it in the hierarchical structure.

In a tree data structure with N nodes, there can be a maximum of N-1 links (edges).

Example
Terminology
In a tree data structure, we use the following terminology...

1. Root
In a tree data structure, the first node is called the root node. Every tree must have a root node; it is the origin of the tree. A tree can never have more than one root node.

2. Edge
In a tree data structure, the connecting link between any two nodes is called an EDGE. A tree with 'N' nodes has exactly 'N-1' edges.

3. Parent
In a tree data structure, a node which is the predecessor of another node is called its PARENT node. In simple words, a node which has a branch to one or more other nodes is a parent node. A parent node can also be defined as "the node which has a child / children".

4. Child
In a tree data structure, a node which is the descendant of another node is called its CHILD node. In simple words, a node which has a link from a parent node is a child node. In a tree, a parent node can have any number of child nodes, and every node except the root is a child node.

5. Siblings
In a tree data structure, nodes which belong to same Parent are called as SIBLINGS. In simple
words, the nodes with the same parent are called Sibling nodes.
6. Leaf
In a tree data structure, the node which does not have a child is called as LEAF Node. In simple
words, a leaf is a node with no child.

In a tree data structure, the leaf nodes are also called as External Nodes. External node is
also a node with no child. In a tree, leaf node is also called as 'Terminal' node.

7. Internal Nodes
In a tree data structure, the node which has atleast one child is called as INTERNAL Node. In
simple words, an internal node is a node with atleast one child.

In a tree data structure, nodes other than leaf nodes are called as Internal Nodes. The root
node is also said to be Internal Node if the tree has more than one node. Internal nodes
are also called as 'Non-Terminal' nodes.
8. Degree
In a tree data structure, the total number of children of a node is called the DEGREE of that node. The highest degree among all the nodes in a tree is called the 'Degree of the Tree'.

9. Level
In a tree data structure, the root node is said to be at Level 0, the children of the root are at Level 1, the children of the Level-1 nodes are at Level 2, and so on. In simple words, each step from top to bottom is a level, and the level count starts at 0 and increases by one at each step.
10. Height
In a tree data structure, the number of edges on the longest path from a node down to a leaf is called the HEIGHT of that node. The height of the root node is the height of the tree. The height of every leaf node is 0.

11. Depth
In a tree data structure, the number of edges from the root node to a particular node is called the DEPTH of that node. The largest depth of any leaf node is the depth of the tree. The depth of the root node is 0.
12. Path
In a tree data structure, the sequence of nodes and edges from one node to another is called the PATH between those two nodes. The length of a path is the total number of nodes in it. In the example below, the path A - B - E - J has length 4.

13. Sub Tree
In a tree data structure, each child of a node is the root of a subtree, recursively. Every child node forms a subtree of its parent node.
Tree Representations
A tree data structure can be represented in two methods. Those methods are as follows...

1. List Representation

2. Left Child - Right Sibling Representation

Consider the following tree...

1. List Representation
In this representation, we use two types of nodes: one for representing a node with data, called a 'data node', and another for representing only references, called a 'reference node'. We start with a 'data node' for the root node of the tree. It is then linked to its children through 'reference nodes', and this process repeats for all the nodes in the tree.

The above example tree can be represented using List representation as follows...

2. Left Child - Right Sibling Representation

In this representation, we use a list with one type of node which consists of three fields, namely a data field, a left child reference field and a right sibling reference field. The data field stores the actual value of a node, the left child reference field stores the address of the leftmost child, and the right sibling reference field stores the address of the next sibling node. Graphical representation of that node is as follows...

In this representation, every node's data field stores the actual value of that node. If a node has a left child, the left child reference field stores its address; otherwise it stores NULL. If a node has a right sibling, the right sibling reference field stores its address; otherwise it stores NULL.

The above example tree can be represented using Left Child - Right Sibling representation as

follows...
Binary Tree Data Structure
In a normal tree, every node can have any number of children. A binary tree is a special type of tree data structure in which every node can have a maximum of 2 children: one is known as the left child and the other as the right child.

A tree in which every node can have a maximum of two children is called a Binary Tree.

In a binary tree, every node can have either 0, 1 or 2 children, but not more than 2.
Example

There are different types of binary trees and they are...

1. Strictly Binary Tree

In a binary tree, every node can have a maximum of two children, but in a strictly binary tree, every node must have exactly two children or none. That means every internal node has exactly two children. A strictly binary tree can be defined as follows...

A binary tree in which every node has either two or zero children is called a Strictly Binary Tree.

A strictly binary tree is also called a Full Binary Tree, Proper Binary Tree or 2-Tree.
Strictly binary trees are used to represent mathematical expressions.

Example

2. Complete Binary Tree

In a binary tree, every node can have a maximum of two children, and in a strictly binary tree every node has exactly two children or none. In a complete binary tree, in addition, all leaf nodes are at the same level, so every level l contains exactly 2^l nodes. For example, level 2 must have 2^2 = 4 nodes and level 3 must have 2^3 = 8 nodes.

A binary tree in which every internal node has exactly two children and all leaf nodes are at the same level is called a Complete Binary Tree.

A complete binary tree in this sense is also called a Perfect Binary Tree.


3. Extended Binary Tree
A binary tree can be converted into a full binary tree by adding dummy nodes to existing nodes wherever required.

The full binary tree obtained by adding dummy nodes to a binary tree is called an Extended Binary Tree.

In the above figure, a normal binary tree is converted into a full binary tree by adding dummy nodes (in pink colour).

Binary Tree Representations


A binary tree data structure is represented using two methods. Those methods are as

follows...

1. Array Representation
2. Linked List Representation

Consider the following binary tree...

1. Array Representation of Binary Tree

In the array representation of a binary tree, we use a one-dimensional array (1-D array) to represent the binary tree.

Consider the above example of a binary tree; it is represented as follows...

To represent a binary tree of depth 'n' using the array representation, we need a one-dimensional array with a maximum size of 2^(n+1) - 1, since levels 0 through n can hold at most 2^0 + 2^1 + ... + 2^n nodes.

2. Linked List Representation of Binary Tree

We use a linked structure with two links per node to represent a binary tree. Every node consists of three fields: the first field stores the left child address, the second stores the actual data, and the third stores the right child address.

In this linked list representation, a node has the following structure...


The above example of the binary tree represented using Linked list representation is shown

as follows...

Binary Tree Traversals


When we want to display a binary tree, we need to follow some order in which all the nodes of the binary tree are displayed. In any binary tree, the displaying order of nodes depends on the traversal method.

The displaying (or visiting) order of nodes in a binary tree is called a Binary Tree Traversal.

There are three types of binary tree traversals.


1. In - Order Traversal

2. Pre - Order Traversal

3. Post - Order Traversal

Consider the following binary tree...

1. In - Order Traversal ( leftChild - root - rightChild )

In In-Order traversal, the root node is visited between the left child and right child. In this

traversal, the left child node is visited first, then the root node is visited and later we go for

visiting the right child node. This in-order traversal is applicable for every root node of all

subtrees in the tree. This is performed recursively for all nodes in the tree.

In the above example of a binary tree, first we try to visit the left child of root node 'A', but A's left child 'B' is itself the root of the left subtree. So we try to visit its (B's) left child 'D', and again D is the root of a subtree with nodes D, I and J. So we try to visit its left child 'I', and it is the leftmost child. So first we visit 'I', then its root node 'D', and later D's right child 'J'. With this we have completed the left part of node B. Then we visit 'B' and next B's right child 'F'. With this we have completed the left part of node A, so we visit root node 'A'. Then we go for the right part of node A. On the right of A there is again a subtree with root C, so we go for the left child of C, which is again a subtree with root G. G does not have a left child, so we visit 'G' and then G's right child 'K'. With this we have completed the left part of node C. Then we visit root node 'C' and finally C's right child 'H', which is the rightmost child in the tree. So we stop the process.

That means here we have visited in the order of I - D - J - B - F - A - G - K - C - H using In-Order

Traversal.

In-Order Traversal for above example of binary tree is

I-D-J-B-F-A-G-K-C–H

2. Pre - Order Traversal ( root - leftChild - rightChild )

In Pre-Order traversal, the root node is visited before the left child and right child nodes. In

this traversal, the root node is visited first, then its left child and later its right child. This pre-

order traversal is applicable for every root node of all subtrees in the tree.

In the above example of binary tree, first we visit root node 'A' then visit its left child 'B' which

is a root for D and F. So we visit B's left child 'D' and again D is a root for I and J. So we visit

D's left child 'I' which is the leftmost child. So next we go for visiting D's right child 'J'. With this

we have completed root, left and right parts of node D and root, left parts of node B. Next

visit B's right child 'F'. With this we have completed root and left parts of node A. So we go for

A's right child 'C' which is a root node for G and H. After visiting C, we go for its left

child 'G' which is a root for node K. So next we visit left of G, but it does not have left child so

we go for G's right child 'K'. With this, we have completed node C's root and left parts. Next

visit C's right child 'H' which is the rightmost child in the tree. So we stop the process.

That means here we have visited in the order of A-B-D-I-J-F-C-G-K-H using Pre-Order

Traversal.

Pre-Order Traversal for above example binary tree is


A-B-D-I-J-F-C-G-K-H

3. Post - Order Traversal ( leftChild - rightChild - root )

In Post-Order traversal, the root node is visited after the left child and right child. In this traversal, the left child is visited first, then the right child, and finally the root node. This is performed recursively for every subtree until all nodes are visited.

Here we have visited in the order of I - J - D - F - B - K - G - H - C - A using Post-Order Traversal.

Post-Order Traversal for above example binary tree is

I-J-D-F-B-K-G-H-C-A
Program to Create Binary Tree and display using In-Order
Traversal - C Programming
#include <stdio.h>
#include <stdlib.h>

struct Node{
    int data;
    struct Node *left;
    struct Node *right;
};

struct Node *root = NULL;
int count = 0;

struct Node* insert(struct Node*, int);
void display(struct Node*);

int main(){
    int choice, value;
    printf("\n----- Binary Tree -----\n");
    while(1){
        printf("\n***** MENU *****\n");
        printf("1. Insert\n2. Display\n3. Exit");
        printf("\nEnter your choice: ");
        scanf("%d", &choice);
        switch(choice){
            case 1: printf("\nEnter the value to be inserted: ");
                    scanf("%d", &value);
                    root = insert(root, value);
                    break;
            case 2: display(root); break;
            case 3: exit(0);
            default: printf("\nPlease select a correct operation!!!\n");
        }
    }
    return 0;
}

// Inserts a new node; recursion alternates between left and right subtrees
struct Node* insert(struct Node *root, int value){
    struct Node *newNode;
    newNode = (struct Node*)malloc(sizeof(struct Node));
    newNode->data = value;
    if(root == NULL){
        newNode->left = newNode->right = NULL;
        root = newNode;
        count++;
    }
    else{
        if(count % 2 != 0)
            root->left = insert(root->left, value);
        else
            root->right = insert(root->right, value);
    }
    return root;
}

// display is performed using In-Order Traversal
void display(struct Node *root)
{
    if(root != NULL){
        display(root->left);
        printf("%d\t", root->data);
        display(root->right);
    }
}

Output
INTRODUCTION TO HASHING AND HASH FUNCTIONS:
HASHING:
Hashing is defined as follows...

Hashing is the process of indexing and retrieving an element (data) in a data structure using a hash key, so as to provide a faster way of finding the element.

A hash function is a function which takes a piece of data (i.e. a key) as input and produces an integer (i.e. a hash value) as output, which maps the data to a particular index in the hash table.

Basic concept of hashing and hash table is shown in the following figure...

(or)

What is Hashing?
Hashing is a technique in which we make use of hash functions to calculate the address at which a value is stored in memory. The benefit of using this technique is that lookups have a time complexity of O(1). We calculate this address and store the value in the hash table. Note that this simple scheme assumes the keys are distinct.

For example, let the hash function be h(k) = k mod 10 where k represents
the keys. Let the keys be: 12, 11, 14, 17, 16, 15, 18 and 13. The empty
table will look as follows having a table size of 10 and index starting from 0.
To insert values into the hash table, we make use of the hash function.
h(12) = 12 mod 10 = 2
h(11) = 11 mod 10 = 1
h(14) = 14 mod 10 = 4
h(17) = 17 mod 10 = 7
h(16) = 16 mod 10 = 6
h(15) = 15 mod 10 = 5
h(18) = 18 mod 10 = 8
h(13) = 13 mod 10 = 3
The output of h(k) is the index at which we need to place the particular
value in the hash table. Thus, after inserting values, the hash table will be:
What is a Hash Function?
A hash function is a function that maps keys to one of the values in the
hash table. Hash functions return the location/address where we can store
the particular key. We input the key to the hash function and the output is
an address. The following diagram depicts the way a hash function
operates:

For example, A possible hash function could be

h(k) = k mod m
Where h(k) denotes the hash function, m denotes the hash table size and k
denotes the keys or values.
Characteristics of an Ideal Hash
Function:
An ideal hash function has the following properties:

• Easy to compute
• For distinct keys, there must be distinct outputs.
• Uniformly distributed keys
Ways to Calculate Hash Functions:
There are four ways through which we can calculate hash functions:

1. Division Method
2. Mid Square Method
3. Folding Method
4. Multiplication Method

1. Division Method:
Say that we have a Hash Table of size 'S', and we want to store a (key, value) pair in the
Hash Table. The Hash Function, according to the Division method, would be:

1. H(key) = key mod M


o Here M is an integer value used for calculating the Hash value, and M should be greater
than S. Sometimes, S is used as M.
o This is the simplest and easiest method to obtain a Hash value.
o The best practice is using this method when M is a prime number, as we can distribute
all the keys uniformly.
o It is also fast as it requires only one computation - modulus.

Let us now take an example to understand the cons of this method:

Size of the Hash Table = 5 (M, S)

Key: Value pairs: {10: "Sudha", 11: "Venkat", 12: "Jeevani"}

For every pair:

1. {10: "Sudha"}
Key mod M = 10 mod 5 = 0
2. {11: "Venkat"}
Key mod M = 11 mod 5 = 1
3. {12: "Jeevani"}
Key mod M = 12 mod 5 = 2

Observe that the hash values are consecutive. This is the disadvantage of this type
of hash function: consecutive keys get consecutive indexes, which produces clustering
and a poor distribution of keys. So the size of the hash table must be chosen
carefully, ideally a prime number.

A simple program to demonstrate the mechanism of the division method:

#include <stdio.h>
int main()
{
    int size, i, indexes[3];
    int keys[3] = {10, 11, 12};
    printf("Enter the size of the Hash Table: ");
    scanf("%d", &size);
    int M = size;
    for(i = 0; i < 3; i++)
    {
        indexes[i] = (keys[i] % M);
    }
    printf("\nThe indexes of the values in the Hash Table: ");
    for(i = 0; i < 3; i++)
    {
        printf("%d ", indexes[i]);
    }
    return 0;
}

Output:

Enter the size of the Hash Table: 5


The indexes of the values in the Hash Table: 0 1 2

2. Mid Square Method:


It is a two-step process of computing the Hash value. Given a {key: value} pair, the
Hash Function would be calculated by:
1. Square the key -> key * key
2. Choose some digits from the middle of the number to obtain the Hash value.

We should choose the number of digits to extract based on the size of the Hash Table.
Suppose the Hash Table size is 100; indexes will range from 0 to 99. Hence, we should
select 2 digits from the middle.

Suppose the size of the Hash Table is 10 and the key: value pairs are:

{10: "Sudha", 11: "Venkat", 12: "Jeevani"}

Number of digits to be selected: indexes range from 0 to 9, so 1 digit.

H(10) = 10 * 10 = 100 → middle digit = 0

H(11) = 11 * 11 = 121 → middle digit = 2

H(12) = 12 * 12 = 144 → middle digit = 4

o All the digits in the key contribute to the index, which improves the distribution of keys.
o If the key is a large value, squaring it increases the value even further; this is the main drawback.
o Collisions might still occur, but we can try to reduce or handle them.
o With huge numbers we also need to take care of overflow. For example, squaring a 6-digit key gives a 12-digit number, which exceeds the range of ordinary integers; we can use a long integer type or a string multiplication technique instead.

3. Folding Method
Given a {key: value} pair and a table of size 100 (indexes 0 - 99), the key is broken
into segments of 2 digits each; the last segment may have fewer digits. The Hash
Function is then:

H(x) = (sum of the segments) mod (size of the Hash Table)

o A carry that makes the sum longer than the segment size can be ignored in calculating the Hash value.

For example, if "k" is a 10-digit key and the size of the table is 100 (0 - 99), k is divided
into:
sum = (k1k2) + (k3k4) + (k5k6) + (k7k8) + (k9k10)

Now, H(x) = sum % 100


Let us now take an example:

The {key: value} pairs: {1234: "Sudha", 5678: "Venkat"}

Size of the table: 100 (0 - 99)

For {1234: "Sudha"}:

1234 = 12 + 34 = 46

46 % 100 = 46

For {5678: "Venkat"}:

5678 = 56 + 78 = 134

134 % 100 = 34
4. Multiplication Method
Unlike the three methods above, this method has more steps involved:

1. Choose a constant A between 0 and 1.
2. Multiply the key by A.
3. Take the fractional part of the product and multiply it by the table size.
4. The Hash value is the floor (integer part only) of the result.

So, the Hash Function under this method is:

H(x) = floor(size * ((key * A) mod 1))

For example:

{Key: value} pairs: {1234: "Sudha", 5678: "Venkat"}

Size of the table: 100

A = 0.56

For {1234: "Sudha"}:

H(1234) = floor(size(1234*0.56 mod 1))

= floor(100 * 0.04)

= floor(4) = 4

For {5678: "Venkat"}:

H(5678) = floor(size * ((5678 * 0.56) mod 1))

= floor(100 * 0.68)

= floor(68) = 68
o It is considered best practice to use the multiplication method when the Hash Table
size is a power of 2 as it makes the access and all the operations faster.

Basic Operations on hash function:


The basic operations that we can perform on a hash function are:
1. Search: This operation helps to search an element from a hash table.
2. Insert: Insert operation is used to insert values into the hash table.
3. Delete: It is used to delete values from the hash table.

Collision in Hashing:
Collision is said to occur when two keys generate the same value.
Whenever there is more than one key that point/map to the same slot in the
hash table, the phenomenon is called a collision. Thus, it becomes very
important to choose a good hash function so that the hash function does
not generate the same index for multiple keys in the hash table.

For example, let keys be 10, 12, 23, 42, 51 and let the hash function
be h(k) = k mod 10.
Then, h(12) = 12 mod 10 = 2
And, h(42) = 42 mod 10 = 2
Thus, both the keys 12 and 42 are generating the same index. So which
value will we store in that particular index as we can store only one of
them? This problem is solved by collision resolution techniques.

Collision Resolution Techniques:


Collision resolution techniques come into use when a collision has occurred
and we need to fix it. Broadly speaking, collision resolution techniques are
divided into two categories:
1. Open addressing
2. Separate chaining

Open addressing also has various further categories used to resolve


collision problems.

Open Addressing:
It is one of the collision resolution techniques. In this technique, we store all
the keys in the table itself. Therefore, the size of the table has to be greater
than or equal to the total number of keys present in the hash function.
Open addressing comprises three different categories: Linear probing,
quadratic probing and double hashing.

1. Linear Probing:
In linear probing, if the allotted slot is already filled, we look for the next empty
slot linearly. Let H denote the new hash function formed after linear probing. Then,
H(k, i) = [h(k) + i] mod m
Here, i is the collision number of the key, where i = 0, 1, 2, 3, 4...
and h(k) is the original hash function.
In simple words, if there is a collision at H(k, 3), i.e. some value is already
present at that position, then by linear probing we check H(k, 4); if it is empty,
we place our value at H(k, 4). If H(k, 4) is not empty, H(k, 5) is checked, and so
on until an empty slot is found.

Primary clustering:
Linear probing faces the problem of primary clustering. In primary
clustering, we need to traverse the whole cluster every time we wish to
insert a new value in case of collision.

For example, let the hash function be h(k) = k mod 12 and let the keys be
31, 26, 43, 27, 34, 46, 14, 58, 13, 17, 22. Then,
h(31) = 7
h(26) = 2
h(43) = 7
h(27) = 3
h(34) = 10
h(46) = 10
h(14) = 2
h(58) = 10
h(13) = 1
h(17) = 5

Thus, the collision occurs at 43, 46, 14, 58, 22.


Therefore, H(43, 1) = [h(43) + 1] mod 12 = 8mod12 = 8

H(46, 1) = [h(46)+1]mod12 = 11mod12 = 11

H(14, 1) = [h(14)+1]mod12 = 3mod12 = 3


But, index 3 is already filled, therefore, we will solve this for i=2.
H(14, 2) = h(14)+2 mod12 = 4

H(58, 1) = [h(58)+1]mod12 = 11
But 11th index already has a value, so perform hashing for i=2
H(58, 2) = [h(58)+2]mod12 = 0

Next, h(22) has a collision. But slots of index 10, 11, 0, 1, 2, 3, 4, 5 are
already filled. This is a primary cluster.
In order to find a free slot, we need to get through this group of clusters.
This increases the number of comparisons. In the worst case, the number
of comparisons could be m-1 where m = size of the hash table.
To resolve this problem, quadratic probing was introduced.

2. Quadratic Probing:
In quadratic probing, the collision number i is squared, so the probe step grows
quadratically. Thus, in quadratic probing, the hash function formed is:

H(k, i) = [h(k) + i^2] mod m

However, quadratic probing has a drawback: there are chances of
secondary clustering.

Secondary clustering:
Let us understand the concept of secondary clustering with the help of an
example.
Let keys = 24, 17, 32, 2, 13, 50, 30, 61 and let m = 11 for h(k) = k mod m.
Then,
h(24) = 2
h(17) = 6
h(32) = 10
h(2) = 2
h(13) = 2
h(50) = 6
h(30) = 8
h(61) = 6

Here, 3 keys are mapping to the same location: 24, 2, 13. Suppose the 2nd
index is already filled, then a collision occurs for all three keys.
To find the next possible value, we will compute H(k) as follows:
H(24, 1) = 3
H(2, 1) = 3
H(13, 1) = 3

If 3rd index is already filled, then we will look for the next possible value.
H(24, 2) = 6
H(2, 2) = 6
H(13, 2) = 6

Similarly, if we check for consequent values, they will produce the same
index.

From here, we can conclude that keys mapped to the same memory location follow the
same collision resolution path. In this example, the number of free slots is 5 and
the number of filled slots is 6, but every probe keeps reaching those 6 filled slots.
Due to this, we can't utilize the table effectively, as we can't reach the free slots.
3. Double Hashing:
As the name suggests, in double hashing, we use two hash functions
instead of one. If a collision occurs due to one hash function we can make
use of the other hash function to find the next free slot.

Let H(k) represent the new hash function formed. Then, its value will be:

H(k, i) = [h(k) + i* h’(k)] mod m


Here, h(k) = primary hash function
And, h’(k) = secondary hash function

Separate Chaining:
In separate chaining, we store all the values with the same index with the
help of a linked list. This technique functions by maintaining a list of nodes.

For example, let the keys be 100, 200, 25, 125, 76, 86, 96 and let m = 10.
Given, h(k) = k mod 10
Then, h(100) = 100 mod 10 = 0
h(200) = 200 mod 10 = 0
h(25) = 25 mod 10 = 5
h(125) = 125 mod 10 = 5
h(76) = 76 mod 10 = 6
h(86) = 86 mod 10 = 6
h(96) = 96 mod 10 = 6

Then, the values will be as follows and the linked list will be maintained.
Drawback of Separate chaining: The use of a linked list requires extra
pointers. For n keys, there will be n extra pointers. If the table size is m,
then the number of pointers will be (m+n).

HASH TABLES:
Hash Table is defined as follows...

Hash table is just an array which maps a key (data) into the data structure with the
help of hash function such that insertion, deletion and search operations are
performed with constant time complexity (i.e. O(1)).

Hash tables are used to perform insertion, deletion and search operations very quickly in a
data structure. Using hash table concept, insertion, deletion, and search operations are
accomplished in constant time complexity. Generally, every hash table makes use of a
function called hash function to map the data into the hash table.
// Implementing a hash table in C

#include <stdio.h>
#include <stdlib.h>

struct set
{
    int key;
    int data;
};

struct set *array;
int capacity = 10;
int size = 0;

int hashFunction(int key)
{
    return (key % capacity);
}

int checkPrime(int n)
{
    int i;
    if (n == 1 || n == 0)
        return 0;
    for (i = 2; i < n / 2; i++)
    {
        if (n % i == 0)
            return 0;
    }
    return 1;
}

int getPrime(int n)
{
    if (n % 2 == 0)
        n++;
    while (!checkPrime(n))
        n += 2;
    return n;
}

void init_array()
{
    capacity = getPrime(capacity);
    array = (struct set *)malloc(capacity * sizeof(struct set));
    for (int i = 0; i < capacity; i++)
    {
        array[i].key = 0;
        array[i].data = 0;
    }
}

void insert(int key, int data)
{
    int index = hashFunction(key);
    if (array[index].data == 0)
    {
        array[index].key = key;
        array[index].data = data;
        size++;
        printf("\n Key (%d) has been inserted \n", key);
    }
    else if (array[index].key == key)
    {
        array[index].data = data;
    }
    else
    {
        printf("\n Collision occurred \n");
    }
}

void remove_element(int key)
{
    int index = hashFunction(key);
    if (array[index].data == 0)
    {
        printf("\n This key does not exist \n");
    }
    else
    {
        array[index].key = 0;
        array[index].data = 0;
        size--;
        printf("\n Key (%d) has been removed \n", key);
    }
}

void display()
{
    int i;
    for (i = 0; i < capacity; i++)
    {
        if (array[i].data == 0)
            printf("\n array[%d]: / ", i);
        else
            printf("\n key: %d array[%d]: %d \t", array[i].key, i, array[i].data);
    }
}

int size_of_hashtable()
{
    return size;
}

int main()
{
    int choice, key, data, n;
    int c = 0;
    init_array();
    do
    {
        printf("1.Insert item in the Hash Table"
               "\n2.Remove item from the Hash Table"
               "\n3.Check the size of Hash Table"
               "\n4.Display a Hash Table"
               "\n\n Please enter your choice: ");
        scanf("%d", &choice);
        switch (choice)
        {
        case 1:
            printf("Enter key: ");
            scanf("%d", &key);
            printf("Enter data: ");
            scanf("%d", &data);
            insert(key, data);
            break;
        case 2:
            printf("Enter the key to delete: ");
            scanf("%d", &key);
            remove_element(key);
            break;
        case 3:
            n = size_of_hashtable();
            printf("Size of Hash Table is: %d\n", n);
            break;
        case 4:
            display();
            break;
        default:
            printf("Invalid Input\n");
        }
        printf("\nDo you want to continue (press 1 for yes): ");
        scanf("%d", &c);
    } while (c == 1);
    return 0;
}

Basic Operations
1. Insert (key, value): Adds a new key-value pair to the hashtable.
2. Search (key): Retrieves the value associated with the given key.
3. Delete (key): Removes the key-value pair from the hashtable.
Insert Operation
Example: Insert the key-value pair ("apple", 50)

1. Compute the Hash: Use the hash function on the key to compute an index.
2. Store the Value: Place the value at the computed index in the array.

Diagram:

Search Operation
Example: Search for the key "apple"

1. Compute the Hash: Apply the hash function to the key to find the index.
2. Retrieve the Value: Look up the value at the computed index.

Diagram:
Delete Operation
Example: Delete the key "apple"

1. Compute the Hash: Apply the hash function to the key to find the index.
2. Remove the Entry: Remove the key-value pair at the computed index.

Diagram:

Applications of Hashing in Unique Identifier Generation
Hashing algorithms are widely used in generating unique identifiers due to their
efficiency, consistency, and ability to handle large data sets. Here are some key
applications of hashing in unique identifier generation:

1. UUID Generation:

• Universally Unique Identifiers (UUIDs): Hash functions are used to
generate UUIDs, which are 128-bit values designed to be unique across
time and space. This is particularly useful in distributed systems where
uniqueness is crucial to avoid identifier collisions.

2. Database Indexing:

• Primary Keys: Hash functions can generate unique primary keys for
database records. This ensures efficient data retrieval and minimizes the
chances of collisions.
• Indexing: Hash-based indexing techniques, like hash tables, provide
quick data access by mapping keys to data locations.

3. Version Control Systems:

• Commit Hashes: Systems like Git use SHA-1 (and more recently
SHA-256) to create unique hashes for commits, ensuring each state of
the codebase is uniquely identified and can be referenced or reverted.

4. Cryptographic Applications:

• Digital Signatures: Hash functions create unique identifiers for data
that can be signed and verified, ensuring data integrity and
authenticity.
• Tokenization: In security applications, sensitive data can be hashed to
create unique tokens that protect the original data while ensuring each
token is unique.

5. Content Addressable Storage:

• Data Deduplication: Hashing is used to create unique fingerprints for
data blocks or files. Identical data blocks generate the same hash,
allowing systems to store a single copy and save storage space.
• Content Delivery Networks (CDNs): Files are identified and retrieved
based on their hash, ensuring efficient and accurate content
distribution.
6. Distributed Hash Tables (DHTs):

• Peer-to-Peer Networks: Hashing enables efficient lookup and storage
of data across nodes in a peer-to-peer network by mapping keys to
node addresses. Examples include systems like BitTorrent and
Kademlia.

7. Authentication Systems:

• Password Hashing: User passwords are hashed before storage. During
authentication, the input password is hashed and compared to the
stored hash to verify user identity.
• Session Identifiers: Hash functions generate unique session tokens to
track user sessions securely.

8. URL Shortening Services:

• Shortened URLs: Hash functions can create unique short codes for
long URLs. These codes serve as unique identifiers that redirect to the
original URLs.

9. Blockchain and Cryptocurrencies:

• Block Hashes: Each block in a blockchain is uniquely identified by a
hash of its contents. This ensures the integrity and immutability of the
blockchain.
• Transaction IDs: Individual transactions within a blockchain are hashed
to generate unique transaction identifiers.

10. Data Integrity and Verification:

• Checksums: Hashes are used to generate checksums that verify the
integrity of files and data transfers. Any alteration in the data results in
a different hash, signaling potential corruption or tampering.

CACHING:
Caching is a technique used in data structures to store frequently accessed data in a
way that allows for faster retrieval. The primary goal of caching is to improve
performance by reducing the time required to access data. Here are some common
applications and implementations of caching in various data structures:
1. Hash Tables
• Key-Value Storage: Hash tables are used to implement caches where the key is the
identifier for the cached data, and the value is the data itself. Hash functions map
keys to indices in an array, providing average O(1) time complexity for lookups.
• Example: A web browser cache uses a hash table to store the mapping of URLs to
web page data.
2. LRU (Least Recently Used) Cache
• Eviction Policy: LRU cache evicts the least recently accessed item when the cache
reaches its capacity. It can be implemented using a combination of a doubly linked
list and a hash map.
• Data Structures Used:
• Doubly Linked List: Maintains the order of access, with the most recently
accessed item at the front and the least recently accessed item at the back.
• Hash Map: Provides O(1) access to the nodes in the linked list.
• Example: CPU caches often use the LRU policy to manage a limited number of cache
lines.
3. LFU (Least Frequently Used) Cache
• Eviction Policy: LFU cache evicts the least frequently accessed items. It can be
implemented using a min-heap or a combination of a hash map and a frequency list.
• Data Structures Used:
• Hash Map: Maps keys to their corresponding values and frequency counts.
• Min-Heap: Tracks the least frequently accessed items.
• Example: Some database systems use LFU caches to manage frequently accessed
query results.
4. MRU (Most Recently Used) Cache
• Eviction Policy: MRU cache evicts the most recently accessed item. This policy is less
common but can be useful in specific scenarios where recent accesses are less likely
to be needed again soon.
• Data Structures Used: Similar to LRU, it can use a doubly linked list and a hash map
but with a different eviction strategy.
• Example: In certain types of multimedia applications, MRU caching might be
beneficial.
5. Two-Level Cache
• Combined Policies: This approach uses two different caching strategies, typically
combining LRU and LFU, to balance the benefits of both.
• Data Structures Used:
• Two Separate Caches: Each implemented with appropriate data structures
(e.g., hash maps, linked lists) for LRU and LFU policies.
• Example: Operating systems often use a two-level cache to manage virtual memory
pages.
6. ARC (Adaptive Replacement Cache)
• Adaptive Policy: ARC dynamically adjusts between LRU and LFU policies based on
the workload.
• Data Structures Used:
• Two LRU Lists: One for recent accesses and one for frequent accesses.
• Hash Map: Tracks the presence of items in the cache.
• Example: Some advanced file systems and database systems implement ARC for
efficient caching.
7. Priority Queue Cache
• Priority-Based Eviction: Items in the cache are evicted based on their priority, which
can be determined by various factors like access time, frequency, or cost.
• Data Structures Used:
• Heap/Priority Queue: Manages the priority of items.
• Hash Map: Provides fast access to cache items.
• Example: Network routing caches may use priority-based eviction to manage routing
information.
8. Bloom Filter Cache
• Probabilistic Data Structure: A Bloom filter is used to quickly check if an item is
possibly in the cache, reducing unnecessary lookups to slower storage layers.
• Data Structures Used:
• Bloom Filter: Provides fast, probabilistic membership testing.
• Hash Map: Stores the actual cached items.
• Example: Web proxy servers use Bloom filters to cache URLs and reduce redundant
requests to the origin server.
