Unit 5
TREES
INTRODUCTION TO TREES:
Tree - Terminology
In a linear data structure, data is organized in sequential order, while in a non-linear data structure, data is organized in random order. A tree is a very popular non-linear data structure which organizes data in a hierarchical structure.
In a tree data structure, every individual element is called as Node. A node in a tree data
structure stores the actual data of that particular element and links to its child elements in the
hierarchical structure.
In a tree data structure, if we have N number of nodes then we can have a maximum of N-
1 number of links.
Example
Terminology
In a tree data structure, we use the following terminology...
1. Root
In a tree data structure, the first node is called as Root Node. Every tree must have a root
node. We can say that the root node is the origin of the tree data structure. In any tree, there
must be only one root node. We never have multiple root nodes in a tree.
2. Edge
In a tree data structure, the connecting link between any two nodes is called as EDGE. In a
tree with 'N' number of nodes there will be a maximum of 'N-1' number of edges.
3. Parent
In a tree data structure, the node which is a predecessor of any node is called as PARENT
NODE. In simple words, the node which has a branch from it to any other node is called a
parent node. Parent node can also be defined as "The node which has child / children".
4. Child
In a tree data structure, the node which is descendant of any node is called as CHILD Node.
In simple words, the node which has a link from its parent node is called as child node. In a
tree, any parent node can have any number of child nodes. In a tree, all the nodes except root
are child nodes.
5. Siblings
In a tree data structure, nodes which belong to same Parent are called as SIBLINGS. In simple
words, the nodes with the same parent are called Sibling nodes.
6. Leaf
In a tree data structure, the node which does not have a child is called as LEAF Node. In simple
words, a leaf is a node with no child.
In a tree data structure, the leaf nodes are also called as External Nodes. External node is
also a node with no child. In a tree, leaf node is also called as 'Terminal' node.
7. Internal Nodes
In a tree data structure, the node which has at least one child is called as INTERNAL Node. In
simple words, an internal node is a node with at least one child.
In a tree data structure, nodes other than leaf nodes are called as Internal Nodes. The root
node is also said to be Internal Node if the tree has more than one node. Internal nodes
are also called as 'Non-Terminal' nodes.
8. Degree
In a tree data structure, the total number of children of a node is called as DEGREE of that
Node. In simple words, the Degree of a node is the total number of children it has. The highest
degree of a node among all the nodes in a tree is called as 'Degree of Tree'.
9. Level
In a tree data structure, the root node is said to be at Level 0 and the children of root node
are at Level 1 and the children of the nodes which are at Level 1 will be at Level 2 and so on...
In simple words, in a tree each step from top to bottom is called as a Level and the Level count
starts with '0' and incremented by one at each level (Step).
10. Height
In a tree data structure, the total number of edges from leaf node to a particular node in the
longest path is called as HEIGHT of that Node. In a tree, height of the root node is said to
be height of the tree. In a tree, height of all leaf nodes is '0'.
11. Depth
In a tree data structure, the total number of edges from root node to a particular node is
called as DEPTH of that Node. In a tree, the total number of edges from root node to a leaf
node in the longest path is said to be Depth of the tree. In simple words, the highest depth
of any leaf node in a tree is said to be depth of that tree. In a tree, depth of the root node is
'0'.
12. Path
In a tree data structure, the sequence of Nodes and Edges from one node to another node is
called as PATH between those two Nodes. Length of a Path is the total number of nodes in that
path. In the below example the path A - B - E - J has length 4.
1. List Representation
In this representation, we use two types of nodes one for representing the node with data
called 'data node' and another for representing only references called 'reference node'. We
start with a 'data node' from the root node in the tree. Then it is linked to an internal node
through a 'reference node' which is further linked to any other node directly. This process
repeats for all the nodes in the tree.
The above example tree can be represented using List representation as follows...
2. Left Child - Right Sibling Representation
In this representation, we use a list with one type of node which consists of three fields namely
Data field, Left child reference field and Right sibling reference field. Data field stores the
actual value of a node, left reference field stores the address of the left child and right
reference field stores the address of the right sibling node. Graphical representation of that
node is as follows...
In this representation, every node's data field stores the actual value of that node. If that node
has a left child, then the left reference field stores the address of that left child node, otherwise it
stores NULL. If that node has a right sibling, then the right reference field stores the address of
that right sibling node, otherwise it stores NULL.
The above example tree can be represented using Left Child - Right Sibling representation as
follows...
Binary Tree Data Structure
In a normal tree, every node can have any number of children. A binary tree is a special type
of tree data structure in which every node can have a maximum of 2 children. One is known
as the left child and the other is known as the right child.
A tree in which every node can have a maximum of two children is called Binary Tree.
In a binary tree, every node can have either 0 children or 1 child or 2 children but not more
than 2 children.
Example
A binary tree in which every node has either two or zero children is called Strictly
Binary Tree.
Strictly binary tree is also called as Full Binary Tree or Proper Binary Tree or 2-Tree
Strictly binary tree data structure is used to represent mathematical expressions.
Example
A binary tree in which every internal node has exactly two children and all leaf nodes
are at same level is called Complete Binary Tree.
The full binary tree obtained by adding dummy nodes to a binary tree is called as
Extended Binary Tree.
In above figure, a normal binary tree is converted into full binary tree by adding dummy nodes
(In pink colour).
A binary tree data structure is represented using two methods as follows...
1. Array Representation
2. Linked List Representation
To represent a binary tree of depth 'n' using array representation, we need a one-dimensional
array with a maximum size of 2^(n+1) - 1.
We use a double linked list to represent a binary tree. In a double linked list, every node
consists of three fields. First field for storing left child address, second for storing actual data
and third for storing right child address. Graphical representation of that node is as follows...
When a binary tree is displayed, every node of that binary tree must be displayed. In any
binary tree, the displaying order of nodes depends on the traversal method.
Displaying (or) visiting order of nodes in a binary tree is called as Binary Tree
Traversal. There are three types of binary tree traversals: In-Order, Pre-Order and
Post-Order traversal.
1. In-Order Traversal
In In-Order traversal, the root node is visited between the left child and right child. In this
traversal, the left child node is visited first, then the root node is visited and later we go for
visiting the right child node. This in-order traversal is applicable for every root node of all
subtrees in the tree. This is performed recursively for all nodes in the tree.
In the above example of a binary tree, first we try to visit left child of root node 'A', but A's left
child 'B' is a root node for left subtree. so we try to visit its (B's) left child 'D' and again D is a
root for subtree with nodes D, I and J. So we try to visit its left child 'I' and it is the leftmost
child. So first we visit 'I' then go for its root node 'D' and later we visit D's right child 'J'. With
this we have completed the left part of node B. Then visit 'B' and next B's right child 'F' is
visited. With this we have completed left part of node A. Then visit root node 'A'. With this we
have completed left and root parts of node A. Then we go for the right part of the node A. In
right of A again there is a subtree with root C. So go for left child of C and again it is a subtree
with root G. But G does not have left part so we visit 'G' and then visit G's right child K. With
this we have completed the left part of node C. Then visit root node 'C' and next visit C's right
child 'H' which is the rightmost child in the tree. So we stop the process.
That means here we have visited in the order of I-D-J-B-F-A-G-K-C-H using In-Order
Traversal.
2. Pre-Order Traversal
In Pre-Order traversal, the root node is visited before the left child and right child nodes. In
this traversal, the root node is visited first, then its left child and later its right child. This pre-
order traversal is applicable for every root node of all subtrees in the tree.
In the above example of binary tree, first we visit root node 'A' then visit its left child 'B' which
is a root for D and F. So we visit B's left child 'D' and again D is a root for I and J. So we visit
D's left child 'I' which is the leftmost child. So next we go for visiting D's right child 'J'. With this
we have completed root, left and right parts of node D and root, left parts of node B. Next
visit B's right child 'F'. With this we have completed root and left parts of node A. So we go for
A's right child 'C' which is a root node for G and H. After visiting C, we go for its left
child 'G' which is a root for node K. So next we visit left of G, but it does not have left child so
we go for G's right child 'K'. With this, we have completed node C's root and left parts. Next
visit C's right child 'H' which is the rightmost child in the tree. So we stop the process.
That means here we have visited in the order of A-B-D-I-J-F-C-G-K-H using Pre-Order
Traversal.
3. Post-Order Traversal
In Post-Order traversal, the root node is visited after the left child and right child. In this traversal,
the left child node is visited first, then its right child and then its root node. This is recursively
performed for all nodes in the tree.
That means here we have visited in the order of I-J-D-F-B-K-G-H-C-A using Post-Order
Traversal.
Program to Create Binary Tree and display using In-Order
Traversal - C Programming
#include <stdio.h>
#include <stdlib.h>

struct Node{
    int data;
    struct Node *left;
    struct Node *right;
};

/* Inserts a value into the tree (binary search tree ordering)
   and returns the (possibly new) root */
struct Node* insert(struct Node *root, int value){
    if(root == NULL){
        struct Node *newNode = (struct Node*)malloc(sizeof(struct Node));
        newNode->data = value;
        newNode->left = newNode->right = NULL;
        return newNode;
    }
    if(value < root->data)
        root->left = insert(root->left, value);
    else
        root->right = insert(root->right, value);
    return root;
}

/* Displays the tree using In-Order traversal (Left - Root - Right) */
void display(struct Node *root){
    if(root != NULL){
        display(root->left);
        printf("%d ", root->data);
        display(root->right);
    }
}

int main(){
    struct Node *root = NULL;
    int choice, value;
    printf("\n----- Binary Tree -----\n");
    while(1){
        printf("\n***** MENU *****\n");
        printf("1. Insert\n2. Display\n3. Exit");
        printf("\nEnter your choice: ");
        scanf("%d", &choice);
        switch(choice){
            case 1: printf("\nEnter the value to be inserted: ");
                    scanf("%d", &value);
                    root = insert(root, value);
                    break;
            case 2: display(root); break;
            case 3: exit(0);
            default: printf("\nPlease select correct operations!!!\n");
        }
    }
    return 0;
}
Output
INTRODUCTION TO HASHING AND HASH FUNCTIONS:
HASHING:
Hashing makes use of a hash function, which is defined as follows...
Hash function is a function which takes a piece of data (i.e. key) as input and produces an
integer (i.e. hash value) as output which maps the data to a particular index in the hash table.
Basic concept of hashing and hash table is shown in the following figure...
(or)
What is Hashing?
Hashing is a technique in which we make use of hash functions to search
or calculate the address at which the value is present in the memory. The
benefit of using this technique is that it has a time complexity of O(1). We
calculate this address and store it in the hash table. We need to take care
that the concept of hashing is applicable on distinct keys.
For example, let the hash function be h(k) = k mod 10 where k represents
the keys. Let the keys be: 12, 11, 14, 17, 16, 15, 18 and 13. The empty
table will look as follows having a table size of 10 and index starting from 0.
To insert values into the hash table, we make use of the hash function.
h(12) = 12mod10 = 2
h(11) = 11mod10 = 1
h(14) = 14mod10 = 4
h(17) = 17mod10 = 7
h(16) = 16mod10 = 6
h(15) = 15mod10 = 5
h(18) = 18mod10 = 8
h(13) = 13mod10 = 3
The output of h(k) is the index at which we need to place the particular
value in the hash table. Thus, after inserting values, the hash table will be:
What is a Hash Function?
A hash function is a function that maps keys to one of the values in the
hash table. Hash functions return the location/address where we can store
the particular key. We input the key to the hash function and the output is
an address. The following diagram depicts the way a hash function
operates:
h(k) = k mod m
Where h(k) denotes the hash function, m denotes the hash table size and k
denotes the keys or values.
Characteristics of an Ideal Hash
Function:
An ideal hash function has the following properties:
• Easy to compute
• For distinct keys, there must be distinct outputs.
• Uniformly distributed keys
Ways to Calculate Hash Functions:
There are four ways through which we can calculate hash functions:
1. Division Method
2. Mid Square Method
3. Folding Method
4. Multiplication Method
1. Division Method:
Say that we have a Hash Table of size 'S', and we want to store a (key, value) pair in the
Hash Table. The Hash Function, according to the Division method, would be:
H(key) = key mod M
where M is the size of the Hash Table (M = 5 in the example below). Choosing M as a prime
number gives a more uniform distribution.
1. {10: "Sudha"}
Key mod M = 10 mod 5 = 0
2. {11: "Venkat"}
Key mod M = 11 mod 5 = 1
3. {12: "Jeevani"}
Key mod M = 12 mod 5 = 2
Observe that the Hash values were consecutive. This is the disadvantage of this type
of Hash Function: we get consecutive indexes for consecutive keys, which causes
clustering and poor performance. We also need to analyze the consequences carefully
while choosing the Hash Table size.
2. Mid Square Method:
In the Mid Square method, we square the key and use its middle digit(s) as the index.
We should choose the number of digits to extract based on the size of the Hash Table.
Suppose the Hash Table size is 100; indexes will range from 0 to 99. Hence, we should
select 2 digits from the middle.
Suppose the size of the Hash Table is 10 and the keys are 10, 11 and 12. Since indexes
range from 0 to 9, we select 1 digit from the middle of the square:
H(10) = 10 * 10 = 100 → middle digit 0
H(11) = 11 * 11 = 121 → middle digit 2
H(12) = 12 * 12 = 144 → middle digit 4
o All the digits in the key are utilized to contribute to the index, thus increasing the
performance of the Data Structure.
o If the key is a large value, squaring it further increases the value, which is considered
the con.
o Collisions might occur, too, but we can try to reduce or handle them.
o Another important point here is that, with huge numbers, we need to take care of
overflow conditions. Suppose we take a 6-digit key; when we square it, we get a 12-digit
number that exceeds the range of defined integers. We can use the long int type
or a string multiplication technique.
3. Folding Method
Given a {key: value} pair and a table size of 100 (0 - 99 indexes), the key is broken
down into segments of 2 digits each, except possibly the last segment, which can have
fewer digits. The segments are added together, and the sum (taken mod the table size
if it overflows the index range) gives the Hash value.
Suppose "k" is a 10-digit key and the size of the table is 100 (0 - 99); k is divided
into:
sum = (k1k2) + (k3k4) + (k5k6) + (k7k8) + (k9k10)
1234 = 12 + 34 = 46
46 % 100 = 46
5678 = 56 + 78 = 134
134 % 100 = 34
4. Multiplication method
Unlike the three methods above, this method has more steps involved:
1. Choose a constant A such that 0 < A < 1.
2. Multiply the key by A and take the fractional part: k * A mod 1.
3. Multiply the fractional part by the table size m and take the floor.
Thus, the Hash Function is: H(k) = floor(m * (k * A mod 1))
For example, with A = 0.56:
if k * A mod 1 = 0.04 and m = 100, then H(k) = floor(100 * 0.04) = floor(4) = 4
if k * A mod 1 = 0.68 and m = 99, then H(k) = floor(99 * 0.68) = floor(67.32) = 67
o It is considered best practice to use the multiplication method when the Hash Table
size is a power of 2 as it makes the access and all the operations faster.
Collision in Hashing:
Collision is said to occur when two keys generate the same value. Whenever more
than one key maps to the same slot in the hash table, the phenomenon is called a
collision. Thus, it becomes very important to choose a good hash function so that it
does not generate the same index for multiple keys in the hash table.
For example, let keys be 10, 12, 23, 42, 51 and let the hash function
be h(k) = k mod 10.
Then, h(12) = 12mod10 = 2
And, h(42) = 42mod10 = 2
Thus, both the keys 12 and 42 are generating the same index. So which
value will we store in that particular index as we can store only one of
them? This problem is solved by collision resolution techniques.
Open Addressing:
It is one of the collision resolution techniques. In this technique, we store all
the keys in the table itself. Therefore, the size of the table has to be greater
than or equal to the total number of keys present in the hash function.
Open addressing comprises three different categories: Linear probing,
quadratic probing and double hashing.
1. Linear Probing:
In linear probing, we look for the next empty slot linearly if the allotted slot
is filled already. Let H denote the new hash function that will form after
linear probing. Then,
H(k, i) = [ h(k) + i] mod m
Here, i is the collision number of the key where i = 0, 1, 2, 3, 4…..
And h(k) is the original hash function.
In simple words, if there is a collision at H(k, 3), i.e. some value is already
present at that position, then, by linear probing, we will check H(k, 4) and if
it is empty, we will place our value at H(k, 4). If H(k, 4) is not empty,
H(k, 5) will be checked next, and so on until we have found an empty place.
Primary clustering:
Linear probing faces the problem of primary clustering. In primary
clustering, we need to traverse the whole cluster every time we wish to
insert a new value in case of collision.
For example, let the hash function be h(k) = k mod 12 and let the keys be
31, 26, 43, 27, 34, 46, 14, 58, 13, 17, 22. Then,
h(31) = 7
h(26) = 2
h(43) = 7
h(27) = 3
h(34) = 10
h(46) = 10
h(14) = 2
h(58) = 10
h(13) = 1
h(17) = 5
Since index 10 is already occupied (by 34), 58 collides, so we probe linearly:
H(58, 1) = [h(58)+1] mod 12 = 11
But the 11th index already has a value, so perform hashing for i=2
H(58, 2) = [h(58)+2] mod 12 = 0
Next, h(22) has a collision. But slots of index 10, 11, 0, 1, 2, 3, 4, 5 are
already filled. This is a primary cluster.
In order to find a free slot, we need to get through this group of clusters.
This increases the number of comparisons. In the worst case, the number
of comparisons could be m-1 where m = size of the hash table.
To resolve this problem, quadratic probing was introduced.
2. Quadratic Probing:
In quadratic probing, the degree of i is 2, where i is the collision number.
Thus, in quadratic probing, the hash function formed is:
H(k, i) = [h(k) + i²] mod m
Secondary clustering:
Let us understand the concept of secondary clustering with the help of an
example.
Let keys = 24, 17, 32, 2, 13, 50, 30, 61 and let m = 11 for h(k) = k mod m.
Then,
h(24) = 2
h(17) = 6
h(32) = 10
h(2) = 2
h(13) = 2
h(50) = 6
h(30) = 8
h(61) = 6
Here, 3 keys are mapping to the same location: 24, 2, 13. Suppose the 2nd
index is already filled, then a collision occurs for all three keys.
To find the next possible value, we will compute H(k) as follows:
H(24, 1) = 3
H(2, 1) = 3
H(13, 1) = 3
If 3rd index is already filled, then we will look for the next possible value.
H(24, 2) = 6
H(2, 2) = 6
H(13, 2) = 6
Similarly, if we check for consequent values, they will produce the same
index.
From here, we can conclude that the keys that are mapped to the same
memory location follow the same collision resolution path. In this example,
the number of free slots is 5 and the number of filled slots is 6. But every
time we keep reaching those same 6 filled slots. Due to this, we can't utilize the
table effectively, as we can't reach the free slots.
3. Double Hashing:
As the name suggests, in double hashing, we use two hash functions
instead of one. If a collision occurs due to one hash function we can make
use of the other hash function to find the next free slot.
Let H(k, i) represent the new hash function formed. Then, its value will be:
H(k, i) = [h1(k) + i * h2(k)] mod m
where h1 and h2 are two different hash functions and i is the collision number.
Separate Chaining:
In separate chaining, we store all the values with the same index with the
help of a linked list. This technique functions by maintaining a list of nodes.
For example, let the keys be 100, 200, 25, 125, 76, 86, 96 and let m = 10.
Given, h(k) = k mod 10
Then, h(100) = 100 mod 10 = 0
h(200) = 200 mod 10 = 0
h(25) = 25 mod 10 = 5
h(125) = 125 mod 10 = 5
h(76) = 76 mod 10 = 6
h(86) = 86 mod 10 = 6
h(96) = 96 mod 10 = 6
Then, the values will be as follows and the linked list will be maintained.
Drawback of Separate chaining: The use of a linked list requires extra
pointers. For n keys, there will be n extra pointers. If the table size is m,
then the number of pointers will be (m+n).
HASH TABLES:
Hash Table is defined as follows...
Hash table is just an array which maps a key (data) into the data structure with the
help of hash function such that insertion, deletion and search operations are
performed with constant time complexity (i.e. O(1)).
Hash tables are used to perform insertion, deletion and search operations very quickly in a
data structure. Using hash table concept, insertion, deletion, and search operations are
accomplished in constant time complexity. Generally, every hash table makes use of a
function called hash function to map the data into the hash table.
// Implementing hash table in C
#include <stdio.h>
#include <stdlib.h>
struct set
{
    int key;
    int data;
};

struct set *array;
int capacity = 10;
int size = 0;

/* Returns 1 if n is prime, otherwise 0 */
int checkPrime(int n)
{
    int i;
    if (n == 1 || n == 0)
        return 0;
    for (i = 2; i * i <= n; i++)
    {
        if (n % i == 0)
            return 0;
    }
    return 1;
}

/* Returns the smallest prime number >= n */
int getPrime(int n)
{
    if (n % 2 == 0)
        n++;
    while (!checkPrime(n))
        n += 2;
    return n;
}

int hashFunction(int key)
{
    return key % capacity;
}

void init_array()
{
    int i;
    capacity = getPrime(capacity);
    array = (struct set *)malloc(capacity * sizeof(struct set));
    for (i = 0; i < capacity; i++)
    {
        array[i].key = 0;
        array[i].data = 0;
    }
}

void insert(int key, int data)
{
    int index = hashFunction(key);
    if (array[index].data == 0)
    {
        array[index].key = key;
        array[index].data = data;
        size++;
        printf("\n Key (%d) has been inserted\n", key);
    }
    else if (array[index].key == key)
    {
        array[index].data = data;   /* same key: update the value */
    }
    else
    {
        printf("\n Collision occurred\n");
    }
}

void remove_element(int key)
{
    int index = hashFunction(key);
    if (array[index].data == 0)
    {
        printf("\n This key does not exist\n");
    }
    else
    {
        array[index].key = 0;
        array[index].data = 0;
        size--;
        printf("\n Key (%d) has been removed\n", key);
    }
}

void display()
{
    int i;
    for (i = 0; i < capacity; i++)
    {
        if (array[i].data == 0)
            printf("\n array[%d]: /", i);
        else
            printf("\n key: %d array[%d]: %d", array[i].key, i, array[i].data);
    }
    printf("\n");
}

int size_of_hashtable()
{
    return size;
}

int main()
{
    int key, data, choice, n;
    int c = 0;
    init_array();
    do
    {
        printf("\n1. Insert\n2. Delete\n3. Size\n4. Display\nEnter your choice: ");
        scanf("%d", &choice);
        switch (choice)
        {
        case 1:
            printf("Enter key and data: ");
            scanf("%d", &key);
            scanf("%d", &data);
            insert(key, data);
            break;
        case 2:
            printf("Enter the key to delete: ");
            scanf("%d", &key);
            remove_element(key);
            break;
        case 3:
            n = size_of_hashtable();
            printf("Size of hash table is %d\n", n);
            break;
        case 4:
            display();
            break;
        default:
            printf("Invalid Input\n");
        }
        printf("\nContinue? (enter 1 to continue): ");
        scanf("%d", &c);
    } while (c == 1);
    return 0;
}
Basic Operations
1. Insert (key, value): Adds a new key-value pair to the hashtable.
2. Search (key): Retrieves the value associated with the given key.
3. Delete (key): Removes the key-value pair from the hashtable.
Insert Operation
Example: Insert the key-value pair ("apple", 50)
1. Compute the Hash: Use the hash function on the key to compute an index.
2. Store the Value: Place the value at the computed index in the array.
Diagram:
Search Operation
Example: Search for the key "apple"
1. Compute the Hash: Apply the hash function to the key to find the index.
2. Retrieve the Value: Look up the value at the computed index.
Diagram:
Delete Operation
Example: Delete the key "apple"
1. Compute the Hash: Apply the hash function to the key to find the index.
2. Remove the Entry: Remove the key-value pair at the computed index.
Diagram:
Applications of Hash Functions:
1. UUID Generation: Hash functions can be used to generate unique identifiers for
records, sessions and transactions.
2. Database Indexing:
• Primary Keys: Hash functions can generate unique primary keys for
database records. This ensures efficient data retrieval and minimizes the
chances of collisions.
• Indexing: Hash-based indexing techniques, like hash tables, provide
quick data access by mapping keys to data locations.
3. Version Control:
• Commit Hashes: Systems like Git use SHA-1 (and more recently SHA-
256) to create unique hashes for commits, ensuring each state of the
codebase is uniquely identified and can be referenced or reverted.
4. Cryptographic Applications: Cryptographic hash functions such as SHA-256 are
used for digital signatures, message integrity checks and secure password storage.
5. Authentication Systems: Instead of storing passwords in plain text, systems store
password hashes, so the original password is never kept on the server.
6. URL Shortening:
• Shortened URLs: Hash functions can create unique short codes for
long URLs. These codes serve as unique identifiers that redirect to the
original URLs.
CACHING:
Caching is a technique used in data structures to store frequently accessed data in a
way that allows for faster retrieval. The primary goal of caching is to improve
performance by reducing the time required to access data. Here are some common
applications and implementations of caching in various data structures:
1. Hash Tables
• Key-Value Storage: Hash tables are used to implement caches where the key is the
identifier for the cached data, and the value is the data itself. Hash functions map
keys to indices in an array, providing average O(1) time complexity for lookups.
• Example: A web browser cache uses a hash table to store the mapping of URLs to
web page data.
2. LRU (Least Recently Used) Cache
• Eviction Policy: LRU cache evicts the least recently accessed item when the cache
reaches its capacity. It can be implemented using a combination of a doubly linked
list and a hash map.
• Data Structures Used:
• Doubly Linked List: Maintains the order of access, with the most recently
accessed item at the front and the least recently accessed item at the back.
• Hash Map: Provides O(1) access to the nodes in the linked list.
• Example: CPU caches often use the LRU policy to manage a limited number of cache
lines.
3. LFU (Least Frequently Used) Cache
• Eviction Policy: LFU cache evicts the least frequently accessed items. It can be
implemented using a min-heap or a combination of a hash map and a frequency list.
• Data Structures Used:
• Hash Map: Maps keys to their corresponding values and frequency counts.
• Min-Heap: Tracks the least frequently accessed items.
• Example: Some database systems use LFU caches to manage frequently accessed
query results.
4. MRU (Most Recently Used) Cache
• Eviction Policy: MRU cache evicts the most recently accessed item. This policy is less
common but can be useful in specific scenarios where recent accesses are less likely
to be needed again soon.
• Data Structures Used: Similar to LRU, it can use a doubly linked list and a hash map
but with a different eviction strategy.
• Example: In certain types of multimedia applications, MRU caching might be
beneficial.
5. Two-Level Cache
• Combined Policies: This approach uses two different caching strategies, typically
combining LRU and LFU, to balance the benefits of both.
• Data Structures Used:
• Two Separate Caches: Each implemented with appropriate data structures
(e.g., hash maps, linked lists) for LRU and LFU policies.
• Example: Operating systems often use a two-level cache to manage virtual memory
pages.
6. ARC (Adaptive Replacement Cache)
• Adaptive Policy: ARC dynamically adjusts between LRU and LFU policies based on
the workload.
• Data Structures Used:
• Two LRU Lists: One for recent accesses and one for frequent accesses.
• Hash Map: Tracks the presence of items in the cache.
• Example: Some advanced file systems and database systems implement ARC for
efficient caching.
7. Priority Queue Cache
• Priority-Based Eviction: Items in the cache are evicted based on their priority, which
can be determined by various factors like access time, frequency, or cost.
• Data Structures Used:
• Heap/Priority Queue: Manages the priority of items.
• Hash Map: Provides fast access to cache items.
• Example: Network routing caches may use priority-based eviction to manage routing
information.
8. Bloom Filter Cache
• Probabilistic Data Structure: A Bloom filter is used to quickly check if an item is
possibly in the cache, reducing unnecessary lookups to slower storage layers.
• Data Structures Used:
• Bloom Filter: Provides fast, probabilistic membership testing.
• Hash Map: Stores the actual cached items.
• Example: Web proxy servers use Bloom filters to cache URLs and reduce redundant
requests to the origin server.