
ADVANCED DATA STRUCTURES UNIT - I CSE

DICTIONARIES

A dictionary is a container of elements from a totally ordered universe that supports the
basic operations of inserting/deleting elements and searching for a given element.
In this chapter, first, we introduce the abstract data type Set which includes dictionaries,
priority queues, etc. as subclasses.

Sets:
The set is the most fundamental data model of mathematics.
A set is a collection of well-defined elements; the members of a set are all distinct.
There are special operations commonly performed on sets, such as union,
intersection, and difference.

1. The union of two sets S and T, denoted S ∪ T, is the set containing the
elements that are in S or T, or both.
2. The intersection of sets S and T, written S ∩ T, is the set containing the
elements that are in both S and T.
3. The difference of sets S and T, denoted S − T, is the set containing those
elements that are in S but not in T.

For example:
Let S be the set {1, 2, 3} and T the set {3, 4, 5}. Then
S ∪ T = {1, 2, 3, 4, 5}, S ∩ T = {3}, and S − T = {1, 2}
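
These operations map directly onto Python's built-in set type; a quick sketch of the example above:

    S = {1, 2, 3}
    T = {3, 4, 5}
    print(S | T)   # union: {1, 2, 3, 4, 5}
    print(S & T)   # intersection: {3}
    print(S - T)   # difference: {1, 2}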

Set implementation:
Possible data structures include
• Bit Vector
• Array
• Linked List
  o Unsorted
  o Sorted

Dictionaries:
A dictionary is a dynamic set ADT with the operations:
• Search(S, k) – an access operation that returns a pointer x to an element where x.key = k
• Insert(S, x) – a manipulation operation that adds the element pointed to by x to S


• Delete(S, x) – a manipulation operation that removes the element pointed to by x from S
Dictionaries store elements so that they can be located quickly using keys.
A dictionary is useful in implementing symbol tables, text retrieval systems, database
systems, page mapping tables, etc.

Implementation:
1. Fixed Length arrays
2. Linked lists: sorted, unsorted, skip-lists
3. Hash Tables: open, closed
4. Trees
• Binary Search Trees (BSTs)
• Balanced BSTs
  o AVL Trees
  o Red-Black Trees
• Splay Trees
• Multiway Search Trees
  o 2-3 Trees
  o B Trees
• Tries
Let n be the number of elements in a dictionary D. The following is a summary of the
performance of some basic implementation methods:

[Table: worst-case complexity of search, insert, and delete for each implementation; the row and column labels were lost, but every recovered entry is O(n) except a single O(1).]

Among these, the sorted list has the best average case performance.
In this chapter, we discuss two data structures for dictionaries, namely Hash Tables and Skip
Lists.


HASHING
Division Method:

One common method of determining a hash key is the division method of hashing.

The formula that will be used is:

h(key) = key % (number of slots in the table)

i.e. h(key) = key mod array size

For example:

Consider a table with 8 slots. i.e. array size 8.

Hash key = key % table size

The key values are 36, 18, 72, 43, 6, 42
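
As a quick check, a minimal Python sketch of the division method applied to these keys:

    table_size = 8
    keys = [36, 18, 72, 43, 6, 42]
    for key in keys:
        print(key, '->', key % table_size)
    # 36 -> 4, 18 -> 2, 72 -> 0, 43 -> 3, 6 -> 6, 42 -> 2 (collides with 18)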

The division method is generally a reasonable strategy, unless the key happens to have
some undesirable properties.

Note: if the table size is 10 and all of the keys end in zero, then every key hashes to the same slot.

In the above example, 42 mod 8 => 2, but that position in the hash table is already filled. This is
known as a collision, i.e. two or more record keys map to the same array index.

In this case, the choice of hash function and table size needs to be carefully considered. The
best table sizes are prime numbers.


Multiplication method:

The simplest situation is when the keys are floating-point numbers known to lie in a fixed
range.

For example:

If the keys are numbers that are greater than 0 and less than 1, we can just multiply by m
(the table size) and take the floor to get an address between 0 and m−1.

Algorithm:

1. Choose constant A in the range 0 < A < 1.


2. Multiply key k by A.
3. Extract the fractional part of k*A
4. Multiply the fractional part by number of slots, m.
5. Take the floor of the result.

Mathematically

h( k ) = ⌊m · (k·A mod 1)⌋

where k·A mod 1 = k·A − ⌊k·A⌋ = fractional part of k·A

• Disadvantage: slower than the division method.
• Advantage: the value of m is not critical.
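
A minimal Python sketch of the algorithm above; the default constant A = (√5 − 1)/2 ≈ 0.618 is a common textbook choice, not something fixed by the method:

    import math

    def mult_hash(k, m, A=0.6180339887):
        # A is a constant in (0, 1)
        frac = (k * A) % 1.0          # fractional part of k * A
        return math.floor(m * frac)   # floor of m times the fractional part

    print(mult_hash(21, 8, A=13/32))  # 4, matching the worked example below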

Example:

m = 8 (implies m = 2^3, p = 3)

w = 5

k = 21

Must have 0 < s < 2^5; choose s = 13 → A = 13/32.

h(k) = ⌊m · (k·A mod 1)⌋

h(21) = ⌊8 · (21 · 13/32 mod 1)⌋ = 4

k·A = 21 · 13/32 = 273/32 = 8 + 17/32 → k·A mod 1 = 17/32

→ m · (k·A mod 1) = 8 · 17/32 = 17/4 = 4 + 1/4 → ⌊m · (k·A mod 1)⌋ = 4

So h(21) = 4.


Example:
m = 8 (implies m = 2^3, p = 3)
w = 5
k = 21
s = 13
k·s = 21 · 13 = 273 = 8 · 2^5 + 17 = r1 · 2^w + r0
r1 = 8
r0 = 17 = 10001₂
Written in w = 5 bits, r0 = 10001₂. The p = 3 most significant bits of r0 are 100₂ or 4₁₀,
so h(21) = 4.

Exercise Example:

m = 4 (implies m = 2^2, p = 2)
w = 3
k = 12
s = 5 (0 < s < 2^w = 2^3 = 8)
k·s = 12 · 5 = ? = ? · 2^3 + ? = r1 · 2^w + r0
r1 = ?
r0 = ? = ?₂
• Written in w = 3 bits, r0 = ?₂
• The p = 2 most significant bits of r0 are ?₂ or ?₁₀, so h(12) = ?.


Universal method:

Hashing is a fun idea that has lots of unexpected uses. Here, we look at a novel type of hash
function that makes it easy to create a family of universal hash functions. The method is
based on a random binary matrix and is very simple to implement.

The idea is very simple. Suppose you have an input data item of m bits and you want a hash
function that produces n bits; then first generate a random binary matrix M of order n×m.

The hash function is h(x) = Mx, where x is the key viewed as a binary column vector.

For example, suppose you have the key 11, whose binary form is 1011: a four-bit input
value (m = 4). Suppose we want a three-bit hash value (n = 3) as output.

Then generate a random matrix, say:

        [ 0 1 0 0 ]
    M = [ 1 0 1 1 ]
        [ 1 1 0 1 ]

If the data value is 1011, the hash value is computed (with mod-2 arithmetic) as:

                [ 0 1 0 0 ]   [1]   [0]
    h(x) = Mx = [ 1 0 1 1 ] · [0] = [1]
                [ 1 1 0 1 ]   [1]   [0]
                              [1]
There are a number of other ways to look at the way the arithmetic is done that suggest
different ways of implementing the algorithm.

The first is to notice that what you are doing is ANDing each row with the data column
vector. Taking the second row as an example: (1 0 1 1) AND (1 0 1 1) = (1 0 1 1); then you
add up the bits in the result modulo 2: 1 + 0 + 1 + 1 = 3 ≡ 1 (mod 2).

The resulting index is 010₂, which is 2 in decimal.
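
A minimal Python sketch of this matrix method; the function names are illustrative, and the arithmetic is done mod 2 as described above:

    import random

    def random_binary_matrix(n, m):
        # n x m random binary matrix: one row per output bit
        return [[random.randint(0, 1) for _ in range(m)] for _ in range(n)]

    def matrix_hash(M, x_bits):
        # h(x) = Mx over mod-2 arithmetic: AND each row with x, then sum the bits mod 2
        return [sum(r & b for r, b in zip(row, x_bits)) % 2 for row in M]

    M = [[0, 1, 0, 0],
         [1, 0, 1, 1],
         [1, 1, 0, 1]]
    print(matrix_hash(M, [1, 0, 1, 1]))   # [0, 1, 0] -> index 2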


Hashing gives an alternative approach that is often the fastest and most convenient way of
solving problems in areas such as AI search programs, cryptography, networks, and
complexity theory.


Collision Resolution Techniques:

In general, a hashing function can map several keys into the same address, which leads to a
collision. The colliding records must be stored and accessed as determined by a
collision-resolution technique.
There are two broad classes of such techniques:

Open Hashing (also called separate chaining) and


Closed Hashing (also called open addressing)

The difference between the two has to do with whether collisions are stored outside the
table (open hashing), or whether a collision results in storing one of the records at another
slot in the table (closed hashing).

The particular hashing method that one uses depends on many factors. One important
factor is the ratio of the number of keys in the table to the number of hash addresses. It is
called the load factor and is given by: load factor (α) = n/m,
where n is the number of keys in the table and m is the number of hash addresses (the table size).

Open Hashing:

The simplest form of open hashing defines each slot in the hash table to be the head of a
linked list. All records that hash to a particular slot are placed on that slot’s linked list.

The below figure illustrates a hash table where each slot stores one record and a link pointer
to the rest of the list.

Consider the same example of division method:


Any keys that hash to the same index are simply added to that slot's linked list; there is no
need to search for empty cells in the array. This method is called separate chaining.
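
A minimal Python sketch of separate chaining, using a Python list for each slot's chain (a stand-in for the linked list described above):

    class ChainedHashTable:
        # each slot is the head of a chain; colliding keys are appended to it
        def __init__(self, size=8):
            self.slots = [[] for _ in range(size)]

        def insert(self, key):
            self.slots[key % len(self.slots)].append(key)

        def search(self, key):
            return key in self.slots[key % len(self.slots)]

    table = ChainedHashTable()
    for k in [36, 18, 72, 43, 6, 42]:   # 18 and 42 both chain in slot 2
        table.insert(k)
    print(table.search(42))             # True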

Closed Hashing (Open addressing):

It resolves collisions in the prime area – that is, the area that contains all of the home addresses.
i.e. when a data item cannot be placed at the index calculated by the hash function, another
location in the array is sought.

There are different methods of open addressing, which vary in the method used to find the
next vacant cell.
They are (i) Linear probing
(ii) Quadratic probing
(iii) Pseudo random probing
(iv) Double hashing
(v) Key – offset

Hashing with Linear probe:

We resolve the collision by adding 1 to the current address, assuming that the table is not full.

We apply the division method of hashing.

Consider the example:

Add the keys 10, 5, and 15 to the above table

h(k) = k mod table size
10 mod 8 = 2, a collision; so add 1 to the address and check whether that cell is empty or filled.
If it is filled, apply the same step again. In this way key 10 is placed in the cell at index 5.

If the physical end of the table is reached during the linear search, the search will wrap around
to the beginning of the table and continue from there.

If an empty slot is not found before reaching the point of collision, the table is full.


A problem with the linear probe method is that it is possible for blocks of data to form when
collisions are resolved. This is known as primary clustering.
This means that any key that hashes into the cluster will require several attempts to resolve
the collision.

Linear probes have two advantages: first, they are quite simple to implement; second, data
tend to remain near their home address.
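
A minimal Python sketch of hashing with a linear probe; assuming the table holds the earlier keys 36, 18, 72, 43 and 6, inserting 10 probes past slots 2, 3 and 4 and lands in slot 5, as described above:

    def linear_probe_insert(table, key):
        m = len(table)
        home = key % m
        for i in range(m):
            slot = (home + i) % m        # wrap around at the physical end
            if table[slot] is None:
                table[slot] = key
                return slot
        raise RuntimeError('table is full')

    table = [None] * 8
    for k in [36, 18, 72, 43, 6]:
        linear_probe_insert(table, k)
    print(linear_probe_insert(table, 10))   # 5: slot 2 was taken, probed 3, 4, then 5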

Exercise Example:
Insert the nodes 89, 18, 49, 58, and 69 into a hash table that holds 10 items using the
division method.

Hashing with Quadratic probe:

In this probe, rather than always moving one spot, we move i² spots from the point of
collision, where i is the number of attempts to resolve the collision.

In a linear probe, if the primary hash index is x, subsequent probes go to x+1, x+2, x+3 and so
on. In quadratic probing, probes go to x+1, x+4, x+9, and so on; the distance from the initial
probe is the square of the step number: x+1², x+2², x+3², x+4² and so on.

i.e., at first it picks the adjacent cell, if that is occupied, it tries 4 cells away, if that is
occupied it tries 9 cells away, and so on. It eliminates the primary clustering problem with
linear probe.
Consider the above exercise problem, keys 89, 18, 49, 58, 69 with table size 10.

Here each key that hashes to the same location will require a longer probe. This phenomenon is
called secondary clustering. It is not a serious problem, but quadratic probing is not often
used because there is a slightly better solution.
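
A minimal Python sketch of quadratic probing with the exercise keys (89, 18, 49, 58, 69) and table size 10:

    def quadratic_probe_insert(table, key):
        m = len(table)
        home = key % m
        for i in range(m):
            slot = (home + i * i) % m    # probe x, x+1, x+4, x+9, ...
            if table[slot] is None:
                table[slot] = key
                return slot
        raise RuntimeError('no empty slot found within m probes')

    table = [None] * 10
    for k in [89, 18, 49, 58, 69]:
        quadratic_probe_insert(table, k)
    print(table)   # 49 lands in slot 0, 58 in slot 2, 69 in slot 3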


Hashing with Pseudo – random probing:

This method uses pseudo – random number to resolve the collision i.e. this probe function
would select the next position on the probe sequence at random from among the unvisited
slots that is the probe sequence should be a random permutation of hash table positions.

Unfortunately, we can’t actually select the next position in the probe sequence at random,
because we would not be able to duplicate this same probe sequence when searching for
the key.

In this probing, the ith slot in the probe sequence is (h(key) + rᵢ) mod M, where rᵢ is the ith
value in a random permutation of the numbers from 1 to M−1.
All insertions and searches use the same sequence of random numbers.

Consider the same example of division method:

36 % 8 = 4
18 % 8 = 2
72 % 8 = 0
43 % 8 = 3
6 % 8 = 6

Now insert 60: 60 % 8 = 4, which is a collision.

The pseudo-random permutation to use is: 0 6 5 2 3 4 1 7

For collision resolution:

Current slot = (4 + 0) % 8 = 4; searching slot 4, it is occupied

Again, current slot = (4 + 6) % 8 = 2; occupied

Current slot = (2 + 5) % 8 = 1; it is empty; key 60 occupies slot 1

Pseudo-random numbers are a relatively simple solution, but they have one significant
limitation: all keys follow the same collision-resolution path through the list. Because of this,
pseudo-random collision resolution can create significant secondary clustering.


Double Hashing:

Double hashing uses the idea of applying a second hash function to the key when a collision
occurs. The result of the second hash function will be the number of positions from the
point of collision to insert.

There are a couple of requirements for the second function:

• It must never evaluate to 0
• It must make sure that all cells can be probed

A popular second hash function is: Hash2(key) = R − (key % R),
where R is a prime number that is smaller than the size of the table.

Table size = 10
Hash1(key) = key % 10
Hash2(key) = 7 − (key % 7)
because 7 is a prime number smaller than the size of the table.

Insert keys: 89, 18, 49, 58, 69

Hash1(89) = 89 % 10 = 9
Hash1(18) = 18 % 10 = 8

Hash1(49) = 49 % 10 = 9; it's a collision
Hash2(49) = 7 − (49 % 7) = 7; probe 7 positions from location 9

Hash1(58) = 58 % 10 = 8; it's a collision
Hash2(58) = 7 − (58 % 7) = 5; probe 5 positions from location 8
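
A minimal Python sketch of double hashing with the same keys and hash functions:

    def double_hash_insert(table, key, R=7):
        # R: a prime smaller than the table size; the step is never 0
        m = len(table)
        step = R - (key % R)
        slot = key % m
        for _ in range(m):
            if table[slot] is None:
                table[slot] = key
                return slot
            slot = (slot + step) % m     # move 'step' positions on each collision
        raise RuntimeError('no empty slot found')

    table = [None] * 10
    for k in [89, 18, 49, 58, 69]:
        double_hash_insert(table, k)
    print(table)   # 49 lands in slot 6, 58 in slot 3, 69 in slot 0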

NOTE:

• Linear probing → m distinct probe sequences, primary clustering
• Quadratic probing → m distinct probe sequences, no primary but secondary clustering
• Double hashing → m² distinct probe sequences, no primary or secondary clustering


Key – offset:

Key offset is a double-hashing method that produces different collision paths for different keys.
Whereas the pseudo-random number generator produces a new address as a function of the
previous address only, key offset calculates the new address as a function of the old address
and the key.

One of the simplest versions simply adds the quotient of key divided by the list size to the
address to determine the next collision resolution address, as shown below

Offset = key / list size
Address = (offset + old address) modulo list size

For example:

The key is 166702 and the list size is 307; the modulo-division hashing method generates an
address of 1. It's a collision because that address already holds the key 070918.

Using key offset to calculate the next address, we get 237 as shown below

Offset = 166702 / 307 = 543
Address = (543 + 1) modulo 307 = 237

If 237 were also a collision, we would repeat the process to locate the next address, as
shown below

Offset = 166702 / 307 = 543
Address = (543 + 237) modulo 307 = 166

If it is free, then place the key in this address.


Skip Lists:

A skip list is a type of data structure that can be used as an alternative to balanced (binary)
trees or B-trees. Compared to a binary tree, skip lists allow quick search, insertion and
deletion of elements with simple algorithms. This is achieved by using probabilistic
balancing rather than the strictly enforced balancing done in B-trees.

Skip lists are also significantly faster than equivalent algorithms for B-Trees.

A skip list is basically a linked list with additional pointers that allow intermediate nodes to
be skipped, hence the name skip list.

In a simple linked list of n elements, a search requires n comparisons in the worst case.

For example:

If a second pointer pointing two nodes ahead is added to every node, the number of
comparisons goes down to n/2+1 in the worst case.

Consider a sorted list where every other node has an additional pointer to the node two
ahead of it in the list.

Here, every other node has an additional pointer.

Next, every second node has a pointer two ahead of it


In the list of the above figure, every second node has a pointer two ahead of it, and every
fourth node has a pointer four ahead of it. Here we need to examine no more than ⌈n/4⌉ + 2
nodes.

In the below figure, every (2^i)th node has a pointer 2^i nodes ahead (i = 1, 2, ...); the number
of nodes to be examined can then be reduced to ⌈log₂ n⌉ while only doubling the number of
pointers.

Here, every (2^i)th node has a pointer to a node 2^i nodes ahead (i = 1, 2, ...).

• A node that has k forward pointers is called a level k node. If every (2^i)th node has a
pointer 2^i nodes ahead, then:
  # of level 1 nodes: 50%
  # of level 2 nodes: 25%
  # of level 3 nodes: 12.5%
• Such a data structure can be used for fast searching, but insertions and deletions will
be extremely cumbersome, since the levels of nodes will have to change.
• What would happen if the levels of nodes were randomly chosen, but in the same
proportions (below figure)?
  o The level of a node is chosen randomly when the node is inserted.
  o A node's ith pointer, instead of pointing to a node that is 2^(i−1) nodes ahead,
    points to the next node of level i or higher.
  o In this case, insertions and deletions will not change the level of any node.
  o Some arrangements of levels would give poor execution times, but it can be
    shown that such arrangements are rare.
Such a linked representation is called a skip list.
• Each element is represented by a node, the level of which is chosen randomly when
the node is inserted, without regard for the number of elements in the data
structure.
• A level i node has i forward pointers, indexed 1 through i. There is no need to store
the level of a node in the node.
• MaxLevel is the maximum number of levels in a node.
  o Level of a list = MaxLevel
  o Level of empty list = 1
  o Level of header = MaxLevel


It is a skip list

Initialization:
An element NIL is allocated and given a key greater than any legal key. All levels of all lists
are terminated with NIL. A new list is initialized so that the level of list = maxlevel and all
forward pointers of the list's header point to NIL

Search:
We search for an element by traversing forward pointers that do not overshoot the node
containing the element being searched for. When no more progress can be made at the
current level of forward pointers, the search moves down to the next level. When we can
make no more progress at level 1, we must be immediately in front of the node that
contains the desired element (if it is in the list).
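
A minimal Python sketch of this search, assuming each node stores a key and a forward list of next-pointers (forward[i] being the level-(i+1) pointer):

    class Node:
        def __init__(self, key, level):
            self.key = key
            self.forward = [None] * level   # forward[i]: pointer at level i+1

    def search(header, list_level, key):
        x = header
        for i in range(list_level - 1, -1, -1):   # top level down to level 1
            while x.forward[i] is not None and x.forward[i].key < key:
                x = x.forward[i]                  # advance without overshooting
        x = x.forward[0]                          # now immediately in front of the node
        return x is not None and x.key == key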

Insertion and Deletion:


• Insertion and deletion are done through search and splice.
• update[i] contains a pointer to the rightmost node of level i or higher that is to the
left of the location of the insertion or deletion.
• If an insertion generates a node with a level greater than the previous maximum
level, we update the maximum level and initialize the appropriate portions of the update list.
• After a deletion, we check whether we have deleted the maximum-level element of
the list and, if so, decrease the maximum level of the list.
• The figure provides an example of Insert and Delete; a sketch of the Insert routine follows.
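
A Python sketch of the search-and-splice insertion, reusing the Node class from the search sketch above; the header node is assumed to have MaxLevel forward pointers, and random_level is this sketch's coin-flipping helper:

    import random

    def random_level(max_level, p=0.5):
        # each extra level appears with probability p
        lvl = 1
        while random.random() < p and lvl < max_level:
            lvl += 1
        return lvl

    def insert(header, list_level, key, max_level):
        update = [header] * max_level
        x = header
        for i in range(list_level - 1, -1, -1):
            while x.forward[i] is not None and x.forward[i].key < key:
                x = x.forward[i]
            update[i] = x                 # rightmost level-(i+1) node left of the spot
        lvl = random_level(max_level)
        node = Node(key, lvl)
        for i in range(lvl):              # splice the node in at each of its levels
            node.forward[i] = update[i].forward[i]
            update[i].forward[i] = node
        return max(list_level, lvl)       # the (possibly increased) level of the list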


Analysis of Skip lists:

In a skip list of 16 elements, we may have

• 9 elements at level 1
• 3 elements at level 2
• 3 elements at level 3
• 1 element at level 6

One important question is: where do we start our search? Analysis shows that we should
start from level L(n), where

L(n) = log₂ n

In general, if p is the probability fraction,

L(n) = log_{1/p} n

where p is the fraction of the nodes with level i pointers that also have level (i + 1)
pointers.

• However, starting at the highest level does not alter the efficiency in a significant way.
• Another important question to ask is: what should MaxLevel be? A good choice is

MaxLevel = L(N) = log_{1/p} N

where N is an upper bound on the number of elements in a skip list.

• The complexity of search, delete, and insert is dominated by the time required to
search for the appropriate element. This in turn is proportional to the length of the
search path, which is determined by the pattern in which elements with different
levels appear as we traverse the list.

• Insert and delete involve an additional cost proportional to the level of the node
being inserted or deleted.

ADVANCED DATA STRUCTURES UNIT - II CSE

BALANCED TREES
Introduction:

Tree:

A tree consists of a finite set of elements, called nodes, and a set of directed lines, called
branches, that connect the nodes.

The number of branches associated with a node is the degree of the node,
i.e. its in-degree and out-degree.

• A leaf is any node with an out-degree of zero, i.e. no successor.
• A node that is not a root or a leaf is known as an internal node.
• A node is a parent if it has a successor node; conversely, a node with a predecessor is
called a child.
• A path is a sequence of nodes in which each node is adjacent to the next one.
• The level of a node is its distance from the root.
• The height of the tree is the level of the leaf in the longest path from the root, plus 1.
• A subtree is any connected structure below the root. A subtree can itself be further
divided into subtrees.

Binary tree:

A binary tree is a tree in which no node can have more than two sub trees designated as the
left sub tree and the right sub tree.

Note: each sub tree is itself a binary tree.

Balance factor:

The balance factor of a binary tree is the difference in height between its left and right
subtrees, i.e. balance factor = H_L − H_R.

In a balanced binary tree, the heights of the subtrees differ by no more than one (the balance
factor is −1, 0, or +1), and the subtrees themselves are also balanced.


We now turn our attention to operations: search, insertion, deletion

In the design of the linear list structure, we had two choices: an array or a linked list

• The array structure provides a very efficient search algorithm, but its insertion and
deletion algorithms are very inefficient.
• The linked list structure provides efficient insertion and deletion, but its search
algorithm is very inefficient.

What we need is a structure that provides an efficient search, at the same time efficient
insertion and deletion algorithms.

The binary search tree and the AVL tree provide that structure.

Binary search tree:

A binary search tree (BST) is a binary tree with the following properties:
• All items in the left subtree are less than the root.
• All items in the right subtree are greater than or equal to the root.
• Each subtree is itself a binary search tree.

While the binary search tree is simple and easy to understand, it has one major problem:
it is not balanced.

To overcome this problem, AVL trees, which are balanced, were designed.

AVL TREES

In 1962, two Russian mathematicians, G. M. Adelson-Velskii and E. M. Landis, created the
balanced binary tree structure that is named after them – the AVL tree.

An AVL tree is a search tree in which the heights of the subtrees differ by no more than 1. It
is thus a balanced binary tree.

An AVL tree is a binary tree that either is empty or consists of two AVL subtrees, T_L and T_R,
whose heights differ by no more than 1:

| H_L − H_R | ≤ 1

where H_L is the height of the left subtree and H_R is the height of the right subtree.
The bar symbols indicate absolute value.

NOTE: An AVL tree is a height balanced binary search tree.


Consider an example: AVL tree

AVL Tree Balance factor:

The Balance factor for any node in an AVL tree must be +1, 0, -1.

We use the descriptive identifiers

• LH for left high (+1), to indicate that the left subtree is higher than the right subtree
• EH for even high (0), to indicate that the subtrees are the same height
• RH for right high (−1), to indicate that the left subtree is shorter than the right subtree

Balancing Trees:

Whenever we insert a node into a tree or delete a node from a tree, the resulting tree may
be unbalanced; we must then rebalance it.

AVL trees are balanced by rotating nodes either to the left or to the right.

Now, we are going to discuss the basic balancing algorithms. They are

1. Left of left:
A sub tree of a tree that is left high has also become left high
2. Right of right:
A sub tree of a tree that is right high has also become right high
3. Right of left:
A sub tree of a tree that is left high has become right high
4. Left of right:
A sub tree of a tree that is right high has become left high


Left of left:

When the out – of – balance condition has been created by a left high sub tree of a left high
tree, we must balance the tree by rotating the out – of – balance node to the right.

Let’s begin with a Simple case:

Complex case:

NOTE:

In the above two cases, we have single rotation right.


Right of Right:

This case is the mirror of previous case. It contains a simple left rotation.

Simple case: here, simple left rotation

Complex case: here, complex left rotation

NOTE:

In the above two cases, we have single rotation left.
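
A minimal Python sketch of the two single rotations (balance-factor bookkeeping omitted; the node fields are assumptions of this sketch):

    class TreeNode:
        def __init__(self, key, left=None, right=None):
            self.key, self.left, self.right = key, left, right

    def rotate_right(root):
        # left-of-left case: the left child becomes the new root of the subtree
        new_root = root.left
        root.left = new_root.right
        new_root.right = root
        return new_root

    def rotate_left(root):
        # right-of-right case: the mirror image
        new_root = root.right
        root.right = new_root.left
        new_root.left = root
        return new_root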


Right of Left:

The above two cases required a single rotation to balance the tree. Now we study two
out-of-balance conditions in which we need to rotate two nodes, one to the left and one
to the right, to balance the tree.

Simple case: simple double rotation right.

Here, an out – of – balance tree in which the root is left high and left sub tree is right high –
a right of left tree.

To balance this tree, we first rotate the left sub tree to the left, then we rotate the root to
the right, making the left node the new root.
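
Using the single rotations sketched earlier, this double rotation (and its mirror, used in the left-of-right case that follows) can be expressed as:

    def rotate_left_right(root):
        # right-of-left case: first rotate the left subtree to the left,
        # then rotate the root to the right, making the left node the new root
        root.left = rotate_left(root.left)
        return rotate_right(root)

    def rotate_right_left(root):
        # left-of-right case: the mirror image
        root.right = rotate_right(root.right)
        return rotate_left(root)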

Complex case: complex double rotation right.


Left of Right:

This case is also complicated

Simple case: simple double rotation

Complex case:

NOTE:

In both cases, i.e. Right of left and Left of right, we have double rotations.


Maximum Height of an AVL Tree:

What is the maximum height of an AVL tree having exactly n nodes? To answer this
question, we will pose the following question:

What is the minimum number of nodes (sparsest possible AVL tree) an AVL tree of height h
can have ?

Let F_h be an AVL tree of height h having the minimum number of nodes. F_h can be
visualized as in the figure.

Let F_l and F_r be the AVL trees that are the left subtree and right subtree, respectively,
of F_h. Then one of F_l and F_r must have height h−1 and the other height h−2.

Suppose F_l has height h−1, so that F_r has height h−2. F_l has to be an AVL tree
having the minimum number of nodes among all AVL trees of height h−1; similarly,
F_r will have the minimum number of nodes among all AVL trees of height h−2.
Thus we have

| F_h | = | F_{h−1} | + | F_{h−2} | + 1

where | F_h | denotes the number of nodes in F_h. Such trees are called Fibonacci trees
(some Fibonacci trees are shown in the figure). Note that | F_0 | = 1 and | F_1 | = 2.

Adding 1 to both sides, we get

| F_h | + 1 = (| F_{h−1} | + 1) + (| F_{h−2} | + 1)

Thus the numbers | F_h | + 1 satisfy the Fibonacci recurrence. Using the approximate formula
for the Fibonacci numbers, we get

h ≈ 1.44 log₂ | F_h |

The sparsest possible AVL tree with n nodes thus has height

h ≈ 1.44 log₂ n

and the worst-case height of an AVL tree with n nodes is about 1.44 log₂ n.


Figure 5.3: Fibonacci trees

Figure 5.4: Rotations in a binary search tree


2 – 3 TREES

The basic idea behind maintaining a search tree is to make the insertion, deletion and
searching operations efficient.

In AVL trees the searching operation is efficient. However, insertion and deletion involve
rotations, which make the operations complicated.

To eliminate this complication, a data structure called the 2–3 tree was designed.

To build a 2 – 3 tree there are certain rules that need to be followed. These rules are as
follows:

• All the non-leaf nodes in a 2–3 tree must always have two or three non-empty
child nodes that are again 2–3 trees.

• The level of all the leaf nodes must always be the same.

• If a node has two children (left and right), then it contains a single data value. The
data occurring in the left subtree of that node is less than the data of the node, and the
data occurring in the right subtree of that node is greater than the data of the node.

• If a node has three children (left, middle, right), then it contains two data
values, say i and j where i < j. The data of all the nodes in the middle subtree are
greater than i but less than j, and the data of all the nodes in the right subtree are
greater than j.

Example of 2 – 3 Trees:


Insertions in 2 – 3 Trees:

Let us now try to understand the process of inserting a value into a 2–3 tree. To insert a
value in a 2–3 tree, we first need to search for the position where the value can be inserted,
and then the value and the node in which the value is to be inserted are adjusted.

Algorithm:

• Insert the new leaf in the appropriate place.

• Repeat until all non-leaf nodes have 2 or 3 children:

  o If there is a node with 4 children, split it into two parent nodes with 2
    children each.
  o If you split the root, then add a new root.

• Adjust search values along the insertion path.

Example:

Insert 5 → 5

Insert 21 →

Insert 8 →

Insert 63 →


Insert 69 →

Insert 32 →

Insert 7, 9, 25 →


Deletions in 2 – 3 Trees:

Deletion of a value from a 2–3 tree is exactly the opposite of insertion.

In the case of insertion, the node where the data is to be inserted is split if it already contains
the maximum number of values. In the case of deletion, two nodes are merged if the node of
the value to be deleted contains the minimum number of values (i.e. only one value).

Example 1:

Consider a 2–3 tree:

Delete 47 →

Delete 63 →


Example 2:

Delete 47 →

The resultant 2–3 tree after deletion of 47:

ADVANCED DATA STRUCTURES UNIT - III CSE

PRIORITY QUEUES
A priority queue is an important data type in computer science. The major operations
supported by priority queues are Insert and Delete-min.

Insert does the obvious thing; Delete-min finds, returns, and removes the minimum element
in the priority queue.

Priority queues see extensive use in:

• Implementing schedulers in operating systems and distributed systems
• Representing event lists in discrete event simulation
• Implementing numerous graph algorithms efficiently
• Selecting the kth largest or kth smallest element in lists (the order statistics problem)
• Sorting applications

Simple Implementation:

There are several ways to implement a priority queue

Linked list: sorted and unsorted

• Performing insertions at the front takes O(1), while traversing the list requires O(N) time.

• To delete the minimum, we could insist that the list always be kept sorted: this
makes insertions expensive, O(N), and delete-mins cheap, O(1).

Another way of implementing priority queues would be to use a binary search tree.

• This gives an O(log N) average running time for both operations.

• The basic data structure we will use requires no pointers and supports both
operations in O(log N) worst-case time. The implementation we will use is known as
a binary heap.

Binary – Heaps:

Heaps (occasionally called partially ordered trees) are a very popular data structure for
implementing priority queues.

Binary heaps are referred to simply as heaps. Like binary search trees, heaps have two
properties, namely, a structure property and a heap-order property.


Structure property:

A heap is a binary tree that is completely filled, with the possible exception of the bottom
level, which is filled from left to right, such tree is known as a complete binary tree as shown
below

A binary heap is a complete binary tree with elements from a partially ordered set, such that
the element at every node is less than (or equal to) the element at its left child and the
element at its right child.

It is easy to show that a complete binary tree of height h has between 2^h and 2^(h+1) − 1 nodes.

This implies that the height of a complete binary tree is ⌊log₂ N⌋, which is clearly O(log N).

One important observation is that because a complete binary tree is so regular, it can be
represented in an array and no pointers are necessary.

Since a heap is a complete binary tree, the elements can be conveniently stored in an array.
If an element is at position i in the array, then the left child will be in position 2i, the right
child will be in position 2i + 1, and the parent is in position ⌊i/2⌋.
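
In code, the index arithmetic looks like this (a minimal sketch for a 1-based array):

    # index arithmetic for a heap stored in a 1-based array (position 0 unused)
    def left(i):   return 2 * i
    def right(i):  return 2 * i + 1
    def parent(i): return i // 2    # floor of i/2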

The only problem with this implementation is that an estimate of the maximum heap size is
required in advance, but typically this is not a problem.

Because of the heap property, the minimum element will always be present at the root of
the heap. Thus the find min operation will have worst case O(1) running time.


Heap – order property:

It is the property that allows operations to be performed quickly. Since we want to be able
to find the minimum quickly, it makes sense that the smallest element should be at the root.

If we consider that any sub tree should also be a heap, then any node should be smaller
than all of its descendants.

Applying this logic, we arrive at the heap-order property: in a heap, for every node X, the
key in the parent of X is smaller than (or equal to) the key in X, with the exception of the root
(which has no parent).

NOTE: Binary heaps were first introduced by Williams in 1964.


NOTE: A binary heap is either a min-heap or a max-heap. A min-heap supports the insert
and delete-min operations, while a max-heap supports the insert and delete-max operations.

Basic Heap Operations:

It is easy to perform the two required operations. All the work involves ensuring that the
heap order property is maintained.

Insert:

To insert an element, say x, into a heap with n elements, we first create a hole in position
(n+1) and see whether the heap property would be violated by putting x into the hole. If the
heap property is not violated, then we have found the correct position for x. Otherwise we
push up or percolate up x until the heap property is restored.
To do this we slide the element that is in the hole's parent node into the hole, thus bubbling
the hole up toward the root. We continue this process until x can be placed in the hole.

Consider the heap


We create a hole in the next available heap location. Inserting 14 in the hole would violate
the heap order property so 31 is slid down into the hole. This strategy is continued until the
correct location for 14 is found.

This general strategy is known as a percolate up. i.e. the new element is percolated up the
heap until the correct location is found.

NOTE: The worst-case complexity of insert is O(h), where h is the height of the heap. Thus
insertions are O(log n), where n is the number of elements in the heap.
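
A minimal Python sketch of insert with percolate-up; the sample values are illustrative, chosen so that inserting 14 slides 31 and then 21 down, as in the description above:

    def heap_insert(heap, x):
        # heap is a 1-based array: heap[0] is an unused placeholder
        heap.append(x)                 # create a hole in position n + 1
        i = len(heap) - 1
        while i > 1 and heap[i // 2] > x:
            heap[i] = heap[i // 2]     # slide the parent down into the hole
            i //= 2
        heap[i] = x                    # place x once the heap order holds

    heap = [None, 13, 21, 16, 24, 31, 19, 68, 65, 26, 32]
    heap_insert(heap, 14)              # 31 and 21 slide down; 14 lands at index 2
    print(heap)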

Delete min:

When the minimum is deleted, a hole is created at the root level. Since the heap now has
one less element and the heap is a complete binary tree, the element in the last position must
be relocated.

This we first do by placing the last element in the hole created at the root. This will leave the
heap property possibly violated at the root level.

We now push down or percolate down the hole at the root until the violation of the heap
property is stopped. While pushing down the hole it is important to slide it down to the smaller
of its two children (pushing up the latter). This is done so as not to create another violation
of the heap property.

Consider the previous example:

The first delete-min removes 13.


This general strategy is known as a percolate down. We use the same technique as in the
insert routine to avoid the use of swaps in this routine.

NOTE: The worst case running time of delete min is O(log n) where n is the no. of elements
in the heap.
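
A minimal Python sketch of delete-min with percolate-down, using the same 1-based array convention:

    def delete_min(heap):
        # returns the smallest element and restores the heap order
        minimum, last = heap[1], heap.pop()   # hole at the root; last element to relocate
        n = len(heap) - 1
        i = 1
        while 2 * i <= n:
            child = 2 * i
            if child < n and heap[child + 1] < heap[child]:
                child += 1                    # slide toward the smaller of the two children
            if heap[child] < last:
                heap[i] = heap[child]         # push the hole down
                i = child
            else:
                break
        if n >= 1:
            heap[i] = last
        return minimum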


Creating Heap:

The build-heap operation takes as input n elements. The problem here is to create a heap of
these elements, i.e. to place them into an initially empty heap.

• The obvious approach is to insert the n elements one at a time into an initially empty
heap. Since each insert takes O(1) average and O(log n) worst-case time, the total
running time of this algorithm would be O(n) average but O(n log n) worst case.

• Another approach, proposed by Floyd in 1964, is to use a procedure called push-down
or percolate-down repeatedly, starting with the array consisting of the given
n elements in input order.
If percolate-down(i) percolates down from node i, the following loop creates
a heap-ordered tree:

for (i = n/2; i > 0; i--)
    percolate-down(i)
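
A minimal Python sketch of Floyd's build-heap; the sample data are illustrative:

    def percolate_down(heap, i, n):
        # push the element at position i down to its correct place
        tmp = heap[i]
        while 2 * i <= n:
            child = 2 * i
            if child < n and heap[child + 1] < heap[child]:
                child += 1                 # pick the smaller child
            if heap[child] < tmp:
                heap[i] = heap[child]
                i = child
            else:
                break
        heap[i] = tmp

    def build_heap(heap):
        # heap is 1-based; percolate down from n/2 back to the root
        n = len(heap) - 1
        for i in range(n // 2, 0, -1):
            percolate_down(heap, i, n)

    data = [None, 150, 80, 40, 30, 10, 70, 110, 100, 20, 90, 60, 50, 120, 140, 130]
    build_heap(data)
    print(data[1])   # 10, the minimum, is now at the root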

Consider the following unordered tree:

Left: initial tree Right: after percolate – down (7)


Left: after percolate – down (6) Right: after percolate – down (5)

Left: after percolate – down (4) Right: after percolate – down (3)

Left: after percolate – down (2) Right: after percolate – down (1)

Each dashed line corresponds to two comparisons: one to find the smallest child and one to
compare the smaller child with the node.

Notice that there are only 10 dashed lines in the entire algorithm (there could have been an
11th – where?), corresponding to 20 comparisons.


To bound the running time of build-heap, we must bound the number of dashed lines. This
can be done by computing the sum of the heights of all the nodes in the heap, which is the
maximum number of dashed lines. What we would like to show is that this sum is O(n).

THEOREM:

For the perfect binary tree of height h containing 2^(h+1) − 1 nodes, the sum of the heights of
the nodes is 2^(h+1) − 1 − (h+1).

Proof:

It is easy to see that this tree consists of 1 node at height h, 2 nodes at height h−1, 2² nodes
at height h−2, and in general 2^i nodes at height h−i.

The sum of the heights of all the nodes is then

S = Σ 2^i (h − i), for i = 0 to h
  = h + 2(h − 1) + 4(h − 2) + 8(h − 3) + 16(h − 4) + ... + 2^(h−1) · 1

Multiplying by 2 gives the equation

2S = 2h + 4(h − 1) + 8(h − 2) + 16(h − 3) + ... + 2^h · 1

Subtracting the first equation from the second gives

S = −h + 2 + 4 + 8 + 16 + ... + 2^(h−1) + 2^h

Therefore S = (2^(h+1) − 1) − (h+1), which proves the theorem.

It is easy to see that the above is an upper bound on the sum of the heights of the nodes of a
complete binary tree. Since a complete binary tree of height h has between 2^h and 2^(h+1)
nodes, the above sum is therefore O(n), where n is the number of nodes in the heap.
Since the worst-case complexity of the heap-building algorithm is of the order of the sum of the
heights of the nodes of the heap built, the worst-case complexity of heap building is O(n).


Binomial Queues:
We know that the previous structures support merging, insertion, and delete-min all
effectively in O(log n) time per operation.

Binomial queues likewise support all three operations in O(log n) worst-case time per
operation, but insertions take constant time on average.

Binomial Queue Structure:

A binomial queue differs from all the priority queue implementations above in that it is not
one heap-ordered tree but rather a collection of heap-ordered trees, known as a forest.

Each of the heap-ordered trees is of a constrained form known as a binomial tree. There is
at most one binomial tree of every height.

• A binomial tree of height 0 is a one-node tree.

• A binomial tree B_k of height k is formed by attaching a binomial tree B_{k−1} to the root
of another binomial tree B_{k−1}.

B0 B1 B2 B3

The above diagram shows binomial trees B_0, B_1, B_2 and B_3. From the diagram we see that a
binomial tree B_k consists of a root with children B_0, B_1, B_2, ..., B_{k−1}.

NOTE: A binomial tree of height k has exactly 2^k nodes, and the number of nodes at depth d is
the binomial coefficient C(k, d).

NOTE: If we impose heap order on the binomial tree and allow at most one binomial tree of
any height we can uniquely represent a priority queue of any size by a collection of binomial
trees (forest).


For instance, a priority queue of size 13 could be represented by the forest B_3, B_2, B_0.

We might write this representation as 1101, which not only represents 13 in binary but
also indicates that B_3, B_2 and B_0 are present in the representation and B_1 is not.

As an example, a priority queue of six elements could be represented as in below figure

H1:

Figure: Binomial queue H1 with six elements

Binomial Queue operations:

Find – min:

This is implemented by scanning the roots of all the trees. Since there are at most log n
different trees, the minimum can be found in O(log n) time.

Alternatively, one can keep track of the current minimum and perform find-min in O(1)
time, if we remember to update the minimum when it changes during other operations.

Merge:

Merging two binomial queues is a conceptually easy operation, which we will describe by
example.
Consider the two binomial queues H1 and H2 with six and seven elements respectively as
shown below.

H1: with 6 elements


H2: with 7 elements

Merge the two B_1 trees (each with 2^1 = 2 nodes) in H1 and H2.

Now we are left with 1 tree of height 0 and 3 trees of height 2.

Binomial queue H3: the result of merging H1 and H2

H3: with 13 elements


The merging is performed by essentially adding the two queues together.

Let H3 be the new binomial queue.

Since H1 has no binomial tree of height 0 and H2 does, we can just use the binomial tree of
height 0 in H2 as part of H3.

Next, we add binomial trees of height 1.

Since both H1 and H2 have binomial trees of height 1, we merge them by making the larger
root a subtree of the smaller, creating a binomial tree of height 2.

Thus, H3 will not have a binomial tree of height 1 as shown in the above diagrams.

There are now three binomial trees of height 2, namely, the original trees in H1 and H2
plus the tree formed by merging the height-1 trees of H1 and H2.

We keep one binomial tree of height 2 in H3 and merge the other two, creating a binomial
tree of height 3.

Since H1 and H2 have no trees of height 3, this tree becomes part of H3 and we are finished.
The resulting binomial queue is as shown in above figure.

Since merging two binomial trees takes constant time with almost any reasonable
implementation, and there are O(log n) binomial trees, the merge takes O(log n) time in the
worst case.

To make this operation efficient, we need to keep the trees in the binomial queue sorted by
height, which is certainly a simple thing to do.

Insertion:

• Insertion is a special case of merging, since we merely create a one-node tree and
perform a merge.

• The worst-case time of this operation is likewise O(log n).

• More precisely, if the priority queue into which the element is being inserted has the
property that the smallest nonexistent binomial tree is B_i, the running time is
proportional to i + 1.


For example:
In the previous example, H3 is missing a binomial tree of height 1, so the insertion will
terminate in two steps. Since each tree in a binomial queue is present with probability 1/2,
if we define the random variable X as the number of steps in an insert operation, then

X = 1 with probability 1/2 (B_0 not present)
X = 2 with probability 1/4 (B_0 present, B_1 not present)
X = 3 with probability 1/8 (B_0 and B_1 present, B_2 not present)
and so on.

Thus the average number of steps in an insert operation is Σ i/2^i = 2.

Thus we expect an insertion to terminate in two steps on the average. Furthermore,
performing n inserts on an initially empty binomial queue will take O(n) worst-case time.

Indeed, it is possible to do this operation using only (n − 1) comparisons.

Consider an example: the binomial queues formed by inserting 1 through 7 in order.

After 1 is inserted:

After 2 is inserted:

After 3 is inserted:

After 4 is inserted:


After 5 is inserted:

After 6 is inserted:

After 7 is inserted:

If we insert 8 then

Inserting 4 shows off a bad case: we merge 4 with B_0, obtaining a new tree of height 1. We
then merge this tree with B_1, obtaining a tree of height 2, which is the new priority queue.

The next insertion after 7 is another bad case and would require three merges.


Delete min:

• A delete-min can be performed by first finding the binomial tree with the smallest
root.
• Let this tree be B_k, and let the original priority queue be H.
• Remove the binomial tree B_k from the forest of trees in H, forming the new binomial
queue H′.
• Now remove the root of B_k, creating binomial trees B_0, B_1, ..., B_{k−1}, which collectively
form the priority queue H″.
• Finish the operation by merging H′ and H″.

Consider the same example of merge operation which has H3.


H3:

The minimum root is 12 so we obtain the two priority queues H′ & H″

The binomial queue that results from merging H′ & H″ is as shown below

NOTE: The entire delete min operation takes O(log n) worst case time


Binomial Amortized Analysis:

Amortized Analysis of Merge

To merge two binomial queues, an operation similar to the addition of binary integers is
performed.
At any stage, we may have zero, one, two, or three B_k trees, depending on whether or not
the two priority queues contain a B_k tree and whether or not a B_k tree is carried over from
the previous step.
• If there is zero or one B_k tree, it is placed as a tree in the resulting binomial queue.
• If there are two, they are merged into a B_{k+1} tree and carried over.
• If there are three, one is retained and the other two are merged.

Result 1:

• A binomial queue of n elements can be built by n successive insertions in O(n) time.

• Brute-force analysis:
Define the cost of each insertion to be
  o 1 time unit + an extra unit for each linking step;
thus the total will be n units plus the total number of linking steps.
  o The 1st, 3rd, ... and each odd-numbered insertion requires no linking steps, since
    there is no B_0 present.
  o A quarter of the insertions require only one linking step: the 2nd, 6th, 10th, ...
  o One eighth of the insertions require two linking steps.

We could do all this and bound the number of linking steps by n.

The above analysis will not help when we try to analyze a sequence of operations that
include more than just insertions.

• Amortized analysis

Consider the result of an insertion.

  o If there is no B_0 tree, then the insertion costs one time unit. The result of the
    insertion is that there is now a B_0 tree, and the forest has one more tree.
  o If there is a B_0 tree but no B_1 tree, then the insertion costs 2 time units. The new
    forest will have a B_1 tree but not a B_0 tree. Thus the number of trees in the forest is
    unchanged.
  o An insertion that costs 3 time units will create a B_2 tree but destroy a B_0 and a
    B_1, yielding one less tree in the forest.
  o In general, an insertion that costs c units results in a net increase of 2 − c trees,
    since a B_{c−1} tree is created and
    all B_i trees, 0 ≤ i ≤ c − 2, are removed.


Thus expensive insertions remove trees and cheap insertions create trees.

Let t_i = the actual time taken by the ith insertion and
    c_i = the number of trees in the forest after the ith insertion.

We have

c_0 = 0
t_i + (c_i − c_{i−1}) = 2

Result 2:
• The amortized running times of Insert, Delete-min, and Merge are O(1), O(log n), and
O(log n) respectively.
To prove this result we choose:
Potential function = number of trees in the queue
• Insertion

  t_i = actual time of the ith insertion
  c_i = number of trees after the ith insertion
  a_i = t_i + (c_i − c_{i−1}) = 2 for all i

Summing over all n insertions, Σ t_i = 2n − (c_n − c_0)
• As long as (c_n − c_0) is positive, we are done.
• In any case, (c_n − c_0) is bounded by log n if we start with an empty tree.
• Merge:
Assume that the two queues to be merged have n_1 and n_2 nodes with T_1 and T_2 trees. Let
n = n_1 + n_2. The actual time to perform the merge is given by:
t_i = O(log n_1 + log n_2)
    = O(max(log n_1, log n_2))
    = O(log n)
(c_i − c_{i−1}) is at most log n, since there can be at most log n trees after the merge.
• Delete-min:
The analysis here follows the same argument as for merge.


Lazy Binomial Queues:

Binomial queues in which merging is done lazily: here, to merge two binomial queues, we
simply concatenate the two lists of binomial trees. In the resulting forest, there may be
several trees of the same size.

Because of the lazy merge, merge and insert are both worst-case O(1) time.
• Delete-min:
  o Convert the lazy binomial queue into a standard binomial queue.
  o Do delete-min as in the standard queue.

Fibonacci Heaps
A Fibonacci heap supports all basic heap operations in O(1) amortized time, with the exception
of delete-min and delete, which take O(log n) amortized time.
Fibonacci heaps generalize binomial queues by adding two new concepts:
• A different implementation of decrease-key
• Lazy merging: two heaps are merged only when it is required

It can be shown that in a Fibonacci heap any node of rank r ≥ 1 has at least F_{r+1}
descendants.

ADVANCED DATA STRUCTURES UNIT - IV CSE

GRAPHS
In this chapter, we turn our attention to a data structure – the graph – that differs from all of
the others in one major concept: each node may have multiple predecessors as well as
multiple successors.

Graphs are very useful structures. They can be used to solve complex routing problems,
such as designing and routing airlines among the airports they serve. Similarly, they can be
used to route messages over a computer network from one node to another.

Basic Concepts:
A graph is a collection of nodes, called vertices, and a collection of segments, called lines,
connecting pairs of vertices. In other words, a graph consists of two sets: a set of vertices and
a set of lines.

Graphs may be either directed or undirected.

• A directed graph, or digraph, is a graph in which each line has a direction (arrowhead)
to its successor. A line in a directed graph is known as an arc. The flow along an arc
between two vertices can follow only the indicated direction.

• An undirected graph is a graph in which there is no direction (arrowhead) on any of
the lines, which are known as edges. The flow between two vertices can go in either
direction.

A path is a sequence of vertices in which each vertex is adjacent to the next one.

For example: {A, B, C, E} is one path and {A, B, E, F} is another.

NOTE: Both directed and undirected graphs have paths.


Two vertices in a graph are said to be adjacent vertices (or neighbors) if there is a path of
length 1 connecting them.

Consider the above diagrams

In the directed graph, B is adjacent to A, whereas E is not adjacent to D; but D is adjacent to E.

In the undirected graph, E and D are adjacent, but D and F are not.

A cycle is a path that starts and ends with the same vertex.

Example:

A-B-C-A is a cycle

A loop is a special case of cycle in which a single arc begins and ends with the same vertex.
In a loop the end points of the line are the same.

Two vertices are said to be connected if there is a path between them. A graph is said to be
connected if, ignoring direction, there is a path from any vertex to any other vertex.

A directed graph is strongly connected if there is a path from each vertex to every other
vertex in the digraph.

A directed graph is weakly connected if it is connected only when the direction of the arcs is
ignored. (A connected undirected graph would always be strongly connected, so the concept
is not normally used with undirected graphs.)


A graph is a disjoint graph if it is not connected

The degree of a vertex is the number of lines incident to it.

The out-degree of a vertex in a digraph is the number of arcs leaving the vertex.

The in-degree is the number of arcs entering the vertex.

For example: for vertex B, degree = 3, in-degree = 1, out-degree = 2.

NOTE: A tree is a graph in which each vertex has only one predecessor; a graph, however, is
not necessarily a tree.

Operations on Graphs:
There are six primitive graph operations that provide the basic modules needed to maintain
a graph. They are

1. Insert a vertex
2. Delete a vertex
3. Add an edge
4. Delete an edge
5. Find a vertex
6. Traverse a graph

Vertex insertion:

Insert vertex adds a new vertex to a graph

When a vertex is inserted, it is disjoint; it is not connected to any other vertices in the list.

After it is inserted, the vertex must be connected.

The below diagram shows a graph before and after a new vertex is added


Algorithm:

Algorithm insertVertex(graph, data)
    allocate memory for new vertex
    store data in new vertex
    increment graph count
    if (empty graph)
        set graph first to new vertex
    else
        search for insertion point
        if (inserting before first vertex)
            set graph first to new vertex
        else
            insert new vertex in sequence
        end if
    end if
end insertVertex

Vertex deletion:

Delete vertex removes a vertex from the graph. When a vertex is deleted, all connecting
edges are also removed.


Algorithm:

Algorithm deleteVertex(graph, key)
    return +1 if successful
           −1 if vertex degree is not zero
           −2 if key is not found
    if (empty graph)
        return −2
    end if
    search for vertex to be deleted
    if (not found)
        return −2
    end if
    if (vertex indegree > 0 or outdegree > 0)
        return −1
    end if
    delete vertex
    decrement graph count
    return +1
end deleteVertex

Edge addition:

Add edge connects a vertex to destination vertex. If a vertex requires multiple edges, add
an edge must be called once for each adjacent vertex. To add an edge, two vertices must be
specified. If the graph is a digraph, one of the vertices must be specified as the source and
one as the destination.

The below diagram shows adding an edge {A, E} to the graph


Algorithm:

Algorithm insertArc(graph, fromkey, tokey)
    return +1 if successful
           −2 if fromkey not found
           −3 if tokey not found
    allocate memory for new arc
    search and set fromvertex
    if (fromvertex not found)
        return −2
    end if
    search and set tovertex
    if (tovertex not found)
        return −3
    end if
    increment fromvertex outdegree
    increment tovertex indegree
    set arc destination to tovertex
    if (fromvertex arc list empty)
        set fromvertex firstArc to new arc
        set new arc nextArc to null
        return +1
    end if
    find insertion point in arc list
    if (insert at beginning of arc list)
        set fromvertex firstArc to new arc
    else
        insert in arc list
    end if
    return +1
end insertArc


Edge deletion:

Delete edge removes one edge from a graph.

Below diagram shows that deleted the edge {B, E} from the graph

Algorithm:

Algorithm deleteArc(graph, fromkey, tokey)
    return +1 if successful
           −2 if fromkey not found
           −3 if tokey not found
    if (empty graph)
        return −2
    end if
    search and set fromvertex to the vertex with key equal to fromkey
    if (fromvertex not found)
        return −2
    end if
    if (fromvertex arc list null)
        return −3
    end if
    search arc list and find arc with key equal to tokey
    if (tokey not found)
        return −3
    end if
    set tovertex to arc destination
    delete arc
    decrement fromvertex outdegree
    decrement tovertex indegree
    return +1
end deleteArc


Find vertex:

Find vertex traverses a graph, looking for a specified vertex. If the vertex is found, its data are
returned; if it is not found, an error is indicated.

The below figure shows find vertex traverses the graph, looking for vertex C

Algorithm:

Algorithm retrieveVertex(graph, key, dataout)
    return +1 if successful
           −2 if key not found
    if (empty graph)
        return −2
    end if
    search for vertex
    if (vertex found)
        move locptr data to dataout
        return +1
    else
        return −2
    end if
end retrieveVertex


Graph Storage Structure:

To represent a graph, we need to store two sets. The first set represents the vertices of the
graph and the second set represents the edges or arcs. The two most common structures
used to store these sets are arrays and linked lists. Although arrays offer some simplicity,
their fixed size is a major limitation.

Adjacency Matrix:

The adjacency matrix uses a vector (one-dimensional array) for the vertices and a matrix
(two-dimensional array) to store the edges. If two vertices are adjacent (that is, if there is an
edge between them), the matrix intersection is set to 1; if there is no edge, it is set to 0.

If the graph is directed, the intersection in the adjacency matrix indicates the direction

In the diagram below, there is an arc from source vertex B to destination vertex C. In the
adjacency matrix, this arc is seen as a 1 at the intersection from B (on the left) to C (on the
top). Because there is no arc from C to B, however, the intersection from C to B is 0.


NOTE: In adjacency matrix representation, we use a vector to store the vertices and a matrix
to store the edges.

In addition to the limitation that the size of the graph must be known before the program
starts, there is another serious limitation in the adjacency matrix: only one edge can be stored
between any two vertices. Although this limitation does not prevent many graphs from
using the matrix format, some network structures require multiple lines between vertices.
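
As a small illustration, the adjacency matrix for a four-vertex digraph might be set up as follows in C; the vertices and arcs chosen here are only an example.

#include <stdio.h>

#define N 4
enum { A, B, C, D };            /* vertex vector: index stands for vertex */

int main(void) {
    int adj[N][N] = {0};        /* 0 everywhere: no arcs yet */
    adj[A][B] = 1;              /* arc A -> B */
    adj[B][C] = 1;              /* arc B -> C */
    adj[C][D] = 1;              /* arc C -> D */
    /* adj[C][B] stays 0: there is no arc from C back to B */
    printf("B -> C ? %d\n", adj[B][C]);    /* prints 1 */
    printf("C -> B ? %d\n", adj[C][B]);    /* prints 0 */
    return 0;
}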

Adjacency list:

The adjacency list uses a two-dimensional ragged array to store the edges. An adjacency
list is shown below.

The vertex list is a singly linked list of the vertices in the graph. Depending on the application, it
could also be implemented using doubly linked lists or circularly linked lists. The pointer at
the left of the list links the vertex entries. The pointer at the right in the vertex is a head
pointer to a linked list of edges from the vertex. Thus, in the non-directed graph on the left
in the figure above, there is a path from vertex B to vertices A, C, and E. To find these edges in
the adjacency list, we start at B’s vertex list entry and traverse the linked list to A, then to C,
and finally to E.

NOTE: In the adjacency list, we use a linked list to store the vertices and a two-dimensional
linked list to store the arcs.


Traverse graph:

There is always at least one application that requires all vertices in a given graph to be
visited; as we traverse the graph, we set the visited flag to on, to indicate that the data have
been processed.

That is, traversal of a graph means visiting each of its nodes exactly once. This is
accomplished by visiting the nodes in a systematic manner.

There are two standard graph traversals: depth first and breadth first. Both use the visited flag

Depth-First Traversal:

In the depth-first traversal, we process all of a vertex’s descendants before we move to an
adjacent vertex. This concept is most easily seen when the graph is a tree.

In the figure below we show the tree’s pre-order traversal processing sequence, one of the
standard depth-first traversals.

In a similar manner, the depth-first traversal of a graph starts by processing the first
vertex; we select any vertex adjacent to the first vertex and process it. This continues until
we find no unprocessed adjacent entries.

This is similar to reaching a leaf in a tree. We require a stack to complete the traversal,
processing vertices in last-in-first-out (LIFO) order.

Let’s trace a depth-first traversal through the graph in the figure below; the numbering in the
box next to a vertex indicates the processing order.

Trace of the DFS:

1. We begin by pushing the first vertex into the stack


2. We then loop, popping the stack and, after processing the vertex, pushing all of its
adjacent vertices into the stack

3. When the stack is empty, the traversal is complete

NOTE: In the depth-first traversal, all of a node’s descendants are processed before
moving to an adjacent node

Consider the graph above, and let node A be the starting vertex

1. Begin with node A push onto stack

2. While stack not equal to empty

Pop A; state A is visited

Push nodes adjacent to A to stack and make their state waiting

3. Pop X; state X is visited

Push nodes adjacent to X into stack

4. Pop H; state H is visited

Push nodes adjacent to H into the stack; G is already in waiting state, so push nodes E
and P

5. Pop P; state P is visited



The nodes adjacent to P are H, G, and E; H is already in visited state, and G and E are in
waiting state

6. Pop E; state E is visited

Push adjacent nodes, H is already visited, so push Y and M into the stack

7. Pop Y; state Y is visited

Push nodes adjacent to Y into stack, E is visited, M already in waiting state

8. Pop M; state M is visited

Push nodes adjacent to M, which is J

9. Pop J; state J is visited

No nodes remain to be processed

10. Pop G; state G is visited

Now the stack is empty

The depth-first order of the visited nodes is A X H P E Y M J G

Breadth-First Traversal:

In the breadth-first traversal of a graph, we process all adjacent vertices of a vertex before
going to the next level. We first saw the breadth-first traversal of a tree, as shown below.

This traversal starts at level 0 and then processes all the vertices in level 1 before going on
to process the vertices in level 2.

The breadth-first traversal of a graph follows the same concept: begin by picking a starting
vertex A; after processing it, process all of its adjacent vertices, and continue this process
until no unprocessed adjacent vertices remain.


The breadth-first traversal uses a queue rather than a stack. As we process each vertex, we
place all of its adjacent vertices in the queue. Then, to select the next vertex to be processed,
we delete a vertex from the queue and process it.

Trace of the BFS:

1. We begin by enqueuing vertex A in the queue

2. We then loop, dequeuing the queue and processing the vertex from the front of
the queue. After processing the vertex, we place all of its adjacent vertices into
the queue. Thus in the above diagram we dequeue vertex X, process it, and then
place vertices G and H in the queue.

3. When the queue is empty, the traversal is complete.

NOTE: In the breadth-first traversal, all adjacent vertices are processed before processing
the descendants of a vertex.

Let’s trace this logic through the graph in the figure below:


Algorithms:

Depth-First Search:

Policy: Don’t push nodes twice


// non-recursive, preorder, depth-first search
void dfs (Node v) {
if (v == null)
return;
push(v);
while (stack is not empty) {
pop(v);
if (v has not yet been visited)
mark&visit(v);
for (each w adjacent to v)
if (w has not yet been visited && not yet stacked)
push(w);
} // while
} // dfs

Breadth-First Search:

// non-recursive, preorder, breadth-first search
void bfs (Node v) {
if (v == null)
return;
enqueue(v);
while (queue is not empty) {
dequeue(v);
if (v has not yet been visited)
mark&visit(v);
for (each w adjacent to v)
if (w has not yet been visited && has not been queued)
enqueue(w);
} // while
} // bfs
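
The two routines above can be made concrete as follows; this is a runnable C sketch over an adjacency matrix, with a sample graph chosen only for illustration. It marks vertices at push/enqueue time, which enforces the same don’t-push-twice policy as the pseudocode.

#include <stdio.h>

#define N 5
int adj[N][N];                  /* adjacency matrix */
int seen[N];                    /* set when a vertex is stacked/queued */

void dfs(int v) {
    int stack[N], top = 0;
    stack[top++] = v; seen[v] = 1;
    while (top > 0) {
        int u = stack[--top];
        printf("%d ", u);       /* process the vertex */
        for (int w = N - 1; w >= 0; w--)   /* reverse order so the   */
            if (adj[u][w] && !seen[w]) {   /* smallest w pops first  */
                stack[top++] = w; seen[w] = 1;
            }
    }
}

void bfs(int v) {
    int queue[N], head = 0, tail = 0;
    queue[tail++] = v; seen[v] = 1;
    while (head < tail) {
        int u = queue[head++];
        printf("%d ", u);       /* process the vertex */
        for (int w = 0; w < N; w++)
            if (adj[u][w] && !seen[w]) {
                queue[tail++] = w; seen[w] = 1;
            }
    }
}

int main(void) {
    int e[][2] = {{0,1},{0,2},{1,3},{2,4}};   /* sample undirected edges */
    for (int i = 0; i < 4; i++)
        adj[e[i][0]][e[i][1]] = adj[e[i][1]][e[i][0]] = 1;
    dfs(0); printf("\n");                     /* prints 0 1 3 2 4 */
    for (int i = 0; i < N; i++) seen[i] = 0;  /* reset the visited flags */
    bfs(0); printf("\n");                     /* prints 0 1 2 3 4 */
    return 0;
}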


Exercise problems:


GRAPH ALGORITHMS
Trees are a special case of graphs: a tree may be defined as a connected graph
without any cycles.

A spanning tree of a graph is a subgraph that is a tree and contains all the
vertices of the graph, with no cycles.

Minimum-cost spanning tree:

A network is a graph whose lines are weighted. It is also known as a weighted graph. The
weight is an attribute of an edge. In an adjacency matrix, the weight is stored as the
intersection value. In an adjacency list, it is stored as a value in the adjacency linked list.

A minimum-cost spanning tree is a spanning tree in which the total weight of the lines is
guaranteed to be the minimum over all possible spanning trees of the graph.


Example:

Before going into the algorithms used to obtain a minimum spanning tree
of a graph, let’s manually determine a minimum-cost spanning tree.

We can start with any vertex; because the vertex list is usually key sequenced,
let’s start with A.

The example above shows a graph and one of its minimum-cost spanning trees. The
identification of a minimum-cost spanning tree involves the selection of a subset of the edges.

Applications of spanning tree:

1. Spanning trees are very important in designing efficient routing algorithms

2. Spanning trees have wide applications in many areas, such as network design


There are two popular techniques to construct a minimum-cost spanning tree from a
weighted graph. One such method is known as Prim’s algorithm and the other is
Kruskal’s algorithm.

Prim’s Algorithm:

Prim’s algorithm is implemented using the adjacency matrix of a graph. This matrix is
denoted by adjMatrix[i, j], where i and j range from 0 to n-1 for an n-node weighted
undirected graph.

We can represent a weighted graph by an adjacency matrix to store the set of edges; an
entry (i, j) in the adjacency matrix contains information on the edge that goes from
vertex i to vertex j. Each matrix entry contains the weight of the corresponding edge.

Let’s consider an example. The sequence of figures below shows the working of Prim’s
algorithm; a C sketch of the same procedure follows.
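
A minimal C sketch of Prim’s algorithm, assuming the classic O(n²) adjacency-matrix formulation and a connected graph; N, INF, and the function name are illustrative placeholders.

#define N   5
#define INF 1000000             /* stands for 'no edge' */

/* adjMatrix[i][j] = weight of edge (i, j), or INF if absent;
   returns the total weight of the minimum spanning tree */
int prim(int adjMatrix[N][N]) {
    int inTree[N] = {0};        /* vertices already in the tree */
    int dist[N];                /* cheapest edge linking i to the tree */
    int total = 0;
    for (int i = 0; i < N; i++) dist[i] = INF;
    dist[0] = 0;                /* start from vertex 0 */
    for (int added = 0; added < N; added++) {
        int u = -1;
        for (int i = 0; i < N; i++)    /* pick the cheapest fringe vertex */
            if (!inTree[i] && (u < 0 || dist[i] < dist[u]))
                u = i;
        inTree[u] = 1;
        total += dist[u];
        for (int w = 0; w < N; w++)    /* update fringe costs through u */
            if (!inTree[w] && adjMatrix[u][w] < dist[w])
                dist[w] = adjMatrix[u][w];
    }
    return total;
}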


Kruskal’s Algorithm:
Like Prim’s algorithm, Kruskal’s algorithm constructs the minimum spanning tree of a
graph by adding edges to the spanning tree one by one. At all points during its
execution, the set of edges selected by Prim’s algorithm forms exactly one tree; on the
other hand, the set of edges selected by Kruskal’s algorithm forms a forest of trees.

Let us consider a graph


The edges of the graph are arranged in increasing order of weights as shown below. Initially
the spanning tree T is empty. We select the edge with the smallest weight and include it in T.

If the selected edge creates a cycle, it is removed from T. We repeat these two steps
until the tree T contains n-1 edges (where n is the number of vertices in the graph); a C
sketch of this procedure follows.

If the tree contains fewer than n-1 edges and the edge list is empty, then no spanning tree is
possible for the graph.
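
A C sketch of Kruskal’s algorithm is given below; the simple union-find forest (capped at 100 vertices here), the Edge type, and all names are assumptions of this sketch, and the caller supplies the edge list.

#include <stdlib.h>

typedef struct { int u, v, w; } Edge;

static int parent[100];         /* union-find forest, one entry per vertex */

static int find(int x) {        /* root of x's tree, with path compression */
    return parent[x] == x ? x : (parent[x] = find(parent[x]));
}

static int cmpEdge(const void *a, const void *b) {
    return ((const Edge *)a)->w - ((const Edge *)b)->w;
}

/* Returns the MST weight, or -1 if fewer than n-1 edges can be chosen */
int kruskal(Edge *edges, int nEdges, int nVertices) {
    qsort(edges, nEdges, sizeof(Edge), cmpEdge);  /* increasing weight */
    for (int i = 0; i < nVertices; i++) parent[i] = i;
    int total = 0, used = 0;
    for (int i = 0; i < nEdges && used < nVertices - 1; i++) {
        int ru = find(edges[i].u), rv = find(edges[i].v);
        if (ru != rv) {         /* different trees: edge closes no cycle */
            parent[ru] = rv;    /* merge two trees of the forest */
            total += edges[i].w;
            used++;
        }                       /* same tree: edge would create a cycle */
    }
    return used == nVertices - 1 ? total : -1;
}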


Minimum-cost spanning tree using Kruskal’s algorithm


Shortest path Algorithm:

A minimum spanning tree gives no indication about the shortest path between two vertices;
rather, only the overall cost or weight is minimized. In real life, we are often required to find the
shortest path between two cities.

For example: one would be interested in finding the most economical route between any two
cities in a given railway network.

We are given a directed graph G in which every edge has a weight, and our problem is to find
a path from one vertex v to another vertex w such that the sum of the weights on the path
is as small as possible. We shall call such a path a shortest path.

The shortest path from vertex A to vertex E is ADCE and has a total cost of 12
compared to the cost of 20 for the edge directly from A to E and the cost of 14 for the path
ABCE

It turns out that it is just as easy to solve the more general problem of starting at one vertex,
called the source, and finding the shortest path to every other vertex, instead of to just one
destination vertex. For simplicity, we take the source to be vertex A, and our problem then
consists of finding the shortest path from vertex A to every other vertex in the graph.

Dijkstra’s Algorithm:

The solution we will show for the shortest path problem is called Dijkstra’s Algorithm, after
Edsger Dijkstra, who first described it in 1959. This algorithm is based on the adjacency
matrix representation of a graph. Somewhat surprisingly, it finds not only the shortest path
from one specified vertex to another, but the shortest paths from the specified vertex to all
the other vertices.

Dijkstra’s algorithm is often described as a greedy technique.


The algorithm works by maintaining a set S of vertices whose shortest distance from the
source is already known, together with an array d, where d[i] contains the length of the
current shortest path from the source vertex to vertex i.

Let’s apply Dijkstra’s algorithm to the given digraph to obtain the stages of the algorithm;
a C sketch of the procedure follows.
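
A C sketch of Dijkstra’s algorithm in its O(n²) adjacency-matrix form, in the spirit of the description above; N and INF are illustrative placeholders.

#define N   6
#define INF 1000000             /* stands for 'no edge' */

/* On return, d[i] holds the shortest distance from source to vertex i */
void dijkstra(int adj[N][N], int source, int d[N]) {
    int inS[N] = {0};           /* the set S of settled vertices */
    for (int i = 0; i < N; i++) d[i] = INF;
    d[source] = 0;
    for (int k = 0; k < N; k++) {
        int u = -1;
        for (int i = 0; i < N; i++)    /* closest unsettled vertex */
            if (!inS[i] && (u < 0 || d[i] < d[u]))
                u = i;
        if (d[u] == INF) break;        /* the rest are unreachable */
        inS[u] = 1;                    /* move u into S */
        for (int w = 0; w < N; w++)    /* relax the edges out of u */
            if (!inS[w] && adj[u][w] < INF && d[u] + adj[u][w] < d[w])
                d[w] = d[u] + adj[u][w];
    }
}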


Exercise problems:

1. Construct a minimum-cost spanning tree of a given graph by using Prim’s algorithm
and Kruskal’s algorithm

2. Find the minimum-cost spanning tree for the graphs below using Prim’s and
Kruskal’s algorithms


3. Find the shortest path from A to all other vertices using Dijkstra’s algorithm for the
graph as shown below

4. Find the shortest path by using Dijkstra’s algorithm on the following edge weighted
directed graph with the vertex P as the source


PATTERN MATCHING
Computers are well recognized to perform numerical computations – but they are equally
capable of processing textual data. A document generally contains textual data, or simply
text. Computers are used to edit documents, to search documents, to transmit documents
over the Internet, to display documents on monitors, and to print documents on printers.

The main concern in text processing centers on the manipulation or movement of
characters, or on searching for patterns or words.

Before proceeding to the search algorithms for text processing, we recall the operations on
strings. Representing a string as an array of characters is simple and efficient.

Pattern Matching Algorithms:

The pattern searching problem and its related variations are commonly encountered in
computing.

For example: We may wish to determine whether or not the substring “DATA” occurs in the
text: ADVANCED DATA STRUCTURES

To examine text in a computer we are essentially limited to examining it one character at a
time. We therefore need a way of deciding when a match has been made.

Two substrings can be said to match when they are equal character-by-character from the
first to the last character. It follows that the substrings match if the number of
character-for-character matches is equal to the length of both the substring sought and the
text substring.

A mismatch between two substrings must therefore mean that the two substrings are not
the same character-for-character, or in other words the number of character-for-character
matches is less than the length of the two substrings.

This section uses the following notation: for a text string T of length n and a pattern string P of
length m, we want to find whether P is a substring of T. The concept of a match is that there
is a substring of T starting at some index i that matches P, character by character, so that

T[i] = P[0], T[i+1] = P[1], . . . , T[i+m-1] = P[m-1]

For example, aligning P under T (reading _ as the blank character):

T: A D V A N C E D _ D A T A _ S T R U C T U R E S
P:                   D A T A

The pattern P matches T beginning at index i = 9.


Brute-Force Algorithm:

A brute-force algorithm solves a problem in the most simple, direct, or obvious way.

A simple method of string matching is to start the comparison of P (the string pattern) and T
(the text string) from the first character of T and the first character of P. If there is a mismatch,
the comparison restarts from the second character of T, and so on.

The running time of the brute-force pattern matching algorithm is not efficient in the worst
case; the running time of the algorithm is O(nm). The algorithm has a quadratic running
time O(n²) when m = n/2.
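
A C sketch of the brute-force matcher follows; it returns the index of the first occurrence of P in T, or -1 if there is none, and the function name is illustrative.

#include <string.h>

int bruteForceMatch(const char *T, const char *P) {
    int n = (int)strlen(T), m = (int)strlen(P);
    for (int i = 0; i + m <= n; i++) {       /* candidate start in T */
        int j = 0;
        while (j < m && T[i + j] == P[j])    /* extend the match */
            j++;
        if (j == m)                          /* all m characters matched */
            return i;
    }
    return -1;                               /* mismatch at every position */
}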

Boyer-Moore Algorithm:

In brute force, it is required to compare every character in text T in order to locate a
pattern P as a substring. The Boyer-Moore (B-M) pattern matching algorithm, by contrast,
sometimes avoids comparisons between P and a considerable fraction of the characters in T.

This algorithm scans the characters of the search pattern from right to left. If a match is not
found, then a shift is made by some number of characters. This right-to-left scan is also
called the “looking-glass heuristic”.

Algorithm:

Boyer-Moore (Text: T[0..n-1], Pattern: P[0..m-1])
{
    // set i and j to the last index of P
    i ← m-1;
    j ← m-1;
    // loop to the end of the text string
    while (i < n)
        // if both characters match
        if (P[j] = T[i])
            // reached the beginning of P
            if (j = 0)
                // found a match
                return i;
            else
                // go to the previous character
                i ← i-1;
                j ← j-1;
        else
            // skip ahead using the last-occurrence shift
            i ← i + m - min(j, 1 + last[T[i]]);
            j ← m-1;
    return -1;  // no match
}

Program Logic:

 j and l are the two values, where j is the value of the pattern index and l is the value of
the last(c) function, which gives the rightmost occurrence of a particular pattern character ‘c’

Analysis:
The computation of the last function takes O(m + |∑|) time and the actual search takes O(mn)
time. Therefore the worst-case running time of the Boyer-Moore algorithm is O(nm + |∑|).
This implies that the worst-case running time is quadratic when n = m, the same as the naïve
algorithm.
The Boyer-Moore algorithm is extremely fast on large alphabets (relative to the length of the
pattern).
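
The last(c) table itself is cheap to build; a C sketch for an ASCII alphabet is shown below, where the function name and the 256-entry table are assumptions of this sketch.

void buildLast(const char *P, int m, int last[256]) {
    for (int c = 0; c < 256; c++)
        last[c] = -1;           /* characters absent from the pattern */
    for (int j = 0; j < m; j++)
        last[(unsigned char)P[j]] = j;  /* rightmost occurrence wins */
}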
Example:

Consider a text T = X Y X Z X X Y X T Z X Y X Z X Y Y X X Y

to match against the pattern P = X Y X Z X Y

We will first build the last table using the last(c) function, where ‘c’ represents a character
from T.

Consider the pattern P = X Y X Z X Y (indices 0 to 5): last(X) = 4, last(Y) = 5, last(Z) = 3.

The remaining character in the text string is T, which is not present in the pattern, i.e. last(T) = -1

STEP 1:

X Y X Z X X Y X T Z X Y X Z X Y Y X X Y

Mismatch

X Y X Z X Y

0 1 2 3 4 5 j

We find l = last(c): last(X) = 4, j = 5

As 1 + l <= j (1 + 4 <= 5), we shift the pattern by j - l positions, i.e. by 1 position

STEP 2:

X Y X Z X X Y X T Z X Y X Z X Y Y X X Y

Mismatch

X Y X Z X Y

0 1 2 3 4 5 j

l = last(X) = 4, j = 3
As j < l + 1 (3 < 4 + 1), the pattern shifts by (l - j) positions, i.e. by 1 position


STEP 3:

X Y X Z X X Y X T Z X Y X Z X Y Y X X Y

Mismatch

X Y X Z X Y

0 1 2 3 4 5 j

l = last(X) = 4, j = 5. As l + 1 <= j (4 + 1 <= 5), shift by (j - l) positions, i.e. by 1 position

STEP 4:

X Y X Z X X Y X T Z X Y X Z X Y Y X X Y

Mismatch

X Y X Z X Y

0 1 2 3 4 5 j

l = last(T) = -1, j = 5. As l + 1 <= j (-1 + 1 <= 5), shift by (j - l) = (5 - (-1)) positions, i.e. by 6
positions

STEP 5:

X Y X Z X X Y X T Z X Y X Z X Y Y X X Y

Mismatch

X Y X Z X Y

0 1 2 3 4 5 j

Pattern will shift by 1 position

STEP 6:

X Y X Z X X Y X T Z X Y X Z X Y Y X X Y

Matched

X Y X Z X Y

0 1 2 3 4 5 j

Now the match for the given pattern is found in the given string


Knuth – Morris – Pratt Algorithm:

The basic method of the B-M algorithm can still involve repeated comparisons of characters,
which reduces the efficiency of pattern matching. Hence the K-M-P algorithm was developed,
which avoids this repeated comparison of characters.

The basic idea behind this algorithm is to build a prefix array. Sometimes this array is also
called the ∏ array. This prefix array is built using the prefix and suffix information of the
pattern; the overlapping prefix and suffix is what the K-M-P algorithm exploits.

The KMP algorithm achieves an efficiency of O(m + n), which is optimal in the worst case,
where n is the length of the text and m is the length of the pattern.

Let us first understand how to compute the prefix array for a given pattern.

Algorithm 1:

Compute-prefix (char p[0..m-1])
{
// problem description: this algorithm computes the prefix
// table for a given pattern
// input: pattern p
// output: prefix table for the given pattern

prefix-table[0] ← 0
k ← 0

for (q ← 1 to m-1) do    // m is the length of the pattern
{
    while (k > 0 AND p[k] != p[q])
        k ← prefix-table[k-1]

    if (p[k] = p[q]) then
        k ← k + 1

    prefix-table[q] ← k
}

return prefix-table
}
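
A C sketch of both the prefix-table computation and the KMP scan is given below; the fixed table size of 64 is an illustrative assumption of this sketch.

#include <string.h>

void computePrefix(const char *P, int m, int pi[]) {
    int k = 0;                  /* length of the current prefix-suffix */
    pi[0] = 0;
    for (int q = 1; q < m; q++) {
        while (k > 0 && P[k] != P[q])
            k = pi[k - 1];      /* fall back to a shorter border */
        if (P[k] == P[q])
            k++;
        pi[q] = k;
    }
}

int kmpSearch(const char *T, const char *P) {
    int n = (int)strlen(T), m = (int)strlen(P), k = 0;
    int pi[64];                 /* assumes m <= 64 for this sketch */
    computePrefix(P, m, pi);
    for (int i = 0; i < n; i++) {
        while (k > 0 && P[k] != T[i])
            k = pi[k - 1];      /* reuse the overlap, never re-scan T */
        if (P[k] == T[i])
            k++;
        if (k == m)
            return i - m + 1;   /* full match ends at position i */
    }
    return -1;
}

For the pattern "abadab", computePrefix produces 0 0 1 0 1 2, matching the worked example below.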


Example:

Suppose the given pattern is “abadab”.

The prefix or ∏ array for this pattern can be built as follows.

STEP 1:

Initially, put 0 in the 0th location of prefix array

0 1 2 3 4 5

a b a d a b

STEP 2:

Consider the substring a b of the given pattern; no prefix matches a suffix, so the entry is 0

0 1 2 3 4 5

a b a d a b

0 0

STEP 3:

Consider the substring a b a of the given pattern; the prefix a matches the suffix a, so the entry is 1


0 1 2 3 4 5

a b a d a b

0 0 1

STEP 4:

Consider the substring a b a d of the given pattern; no prefix matches a suffix, so the entry is 0

0 1 2 3 4 5

a b a d a b

0 0 1 0


STEP 5:

Consider the substring a b a d a from abadab

prefixes: a , a b , a b a , a b a d
suffixes: a , d a , a d a , b a d a

The length of the longest matching prefix-suffix is 1

0 1 2 3 4 5

a b a d a b

0 0 1 0 1

STEP 6:

Consider the full pattern a b a d a b

prefixes: a , a b , a b a , a b a d , a b a d a
suffixes: b , a b , d a b , a d a b , b a d a b

The length of the longest matching prefix-suffix is 2

0 1 2 3 4 5

a b a d a b

0 0 1 0 1 2

NOTE:

If there is more than one matching prefix-suffix, then enter the largest matching length into
the prefix table.

E.g.: string a b a b a

Prefix: a , a b , a b a , a b a b

Suffix: a , b a , a b a , b a b a

Both a (length 1) and a b a (length 3) match, so the entry made in the prefix table is the
largest matching length, 3.


Tries:

A trie (pronounced ‘try’, from retrieval) is a tree-based data structure for storing strings in
order to support fast pattern matching. The main application for tries is in information
retrieval. The trie uses the digits (characters) in the keys to organize and search the dictionary.

The example trie is as follows:

[Trie diagram: from the root, the branch for a leads through al to allot and alone, through
an to and, and through ar to are; the branch for b leads through ba to bat and bad.]

The above trie stores the words allot, alone, and, are, bat, bad. The idea is that all strings
sharing a common prefix descend from a common node. Tries are used in spell-checking
programs.

A trie is a data structure that supports pattern matching queries in time proportional to the
pattern size.

Advantages:

 In tries, keys are searched using common prefixes; hence lookup is fast. In a binary
search tree, by contrast, the lookup of a key depends on the height of the tree.

 Tries take less space when they contain a large number of short strings, as nodes
are shared between the keys
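
A C sketch of a trie over the lowercase alphabet is shown below; calloc zero-fills each node, so all child pointers start out NULL, and the names are illustrative.

#include <stdlib.h>

#define ALPHABET 26

typedef struct TrieNode {
    struct TrieNode *child[ALPHABET];
    int isWord;                 /* 1 if a key ends at this node */
} TrieNode;

TrieNode *trieInsert(TrieNode *root, const char *key) {
    if (root == NULL)
        root = calloc(1, sizeof *root);
    TrieNode *p = root;
    for (; *key != '\0'; key++) {
        int c = *key - 'a';     /* assumes keys use only a..z */
        if (p->child[c] == NULL)
            p->child[c] = calloc(1, sizeof *p->child[c]);
        p = p->child[c];        /* a shared prefix reuses existing nodes */
    }
    p->isWord = 1;
    return root;
}

Inserting "allot" and then "alone" shares the three nodes for the common prefix "alo", which is exactly the space saving described above.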


Digital Search Tree:

 A digital search tree is a binary tree in which each node contains one element
 Every element is attached as a node using its binary representation
 The bits are read from left to right
 All the keys in the left subtree of a node at level i have bit 0 at the ith position; similarly,
all the keys in the right subtree of a node at level i have bit 1 at the ith position
Example:

Consider the following stream of keys, with their binary representations, to construct a digital
search tree

A T F R C H I N

00001 10011 00101 10010 00011 01000 01001 01110

A (00001) is the root. T (10011) begins with bit 1 and becomes A’s right child; F (00101)
begins with bit 0 and becomes A’s left child. R (10010) follows right (1) then left (0) and
becomes T’s left child. C (00011) becomes F’s left child and H (01000) F’s right child;
I (01001) and N (01110) become H’s left and right children:

                 A (00001)
                /         \
        F (00101)          T (10011)
        /       \          /
  C (00011)  H (01000)  R (10010)
             /       \
       I (01001)   N (01110)

NOTE:

 Assume that there is a fixed number of bits in each key

 If we read bit 0 then move onto the left sub branch

 If we read bit 1 then move onto the right sub branch


Binary trie:

A binary trie is a binary tree that has two kinds of nodes

 Branch nodes
 Element nodes
A branch node has two children, one left and one right
An element node holds a single data member

Example:
Consider the elements 0000, 0010, 0001, 1001, 1000, and 1100. The binary trie can be built
as follows; the number in the square at each branch node is the bit number tested at that level.

[Binary trie diagram: branch nodes at levels 1 to 4 test bits 1 to 4; the element nodes at the
leaves hold, from left to right:]

0000 0001 0010 1000 1001 1100

Compressed Binary Trie:

The binary trie may contain branch nodes whose degree is one. To create the compressed
binary trie, eliminate the degree-one branch nodes. The compressed form of the binary trie
above is as follows.


Patricia:
Patricia stands for Practical Algorithm To Retrieve Information Coded In Alphanumeric.
Building a Patricia is quite simple.
In Patricia, every node has a bit index; this number is written at every node, and the trie is
searched based on it.

Let us understand the procedure of building Patricia with the help of an example

Index 4 3 2 1 0
A 0 0 0 0 1
S 1 0 0 1 1
E 0 0 1 0 1
R 1 0 0 1 0
C 0 0 0 1 1
H 0 1 0 0 0
I 0 1 0 0 1

STEP 1:

As this is the very first node, we simply create it as the root node. To obtain its bit index, we
search for the index of the leftmost 1.

4 3 2 1 0
0 0 0 0 1

The leftmost 1 is at index 0; hence the bit index of A is 0. The bit at index 0 of A has value 1,
hence the right link points back to the node itself.

STEP 2: Insert S: 1 0 0 1 1

We start searching for S in the existing trie. A’s bit index is 0, and the bit at index 0 of S is 1;
that means S should be attached as the right child of A. But before attaching the node S
we must find the bit index of S. As S is to be attached to A, A is the closest node to S;
hence compare S and A.


Find where the bits of S and A differ; this index is 4, hence the bit index of S is 4.

But now S cannot be attached as a child of A, because the bit index of S is greater than the
bit index of A’s child; hence S is moved up. The bit at index 4 of S is 1, hence the right link
points back to the node itself.

STEP 3: Insert E: 0 0 1 0 1

For inserting E in the existing trie we search from the root. The bit index at the root is 4; the
bit at index 4 of E is 0, so we move onto the left branch. At A, the bit index is 0, and the bit at
index 0 of E is 1.

Hence E could be attached as the right child of A, but before attaching E to A we must find the
bit index of E. The closest node to E is A; hence compare E and A.

As the bit index of E is 2, we cannot attach E as a child of A (since the bit index of A is 0). Hence
we traverse upwards. But as the bit index of S is 4, we must attach E as a child of S. The bit
index of E is 2 and the bit at index 2 of E is 1, hence the right link points back to the node itself.


STEP 4: Insert R: 1 0 0 1 0

We start from the root; the bit index is 4, and the bit at index 4 of R is 1, so we attach R on the
right, below S. S is the nearest neighbour of R; hence compare S and R: they first differ at
index 0, so the bit index of R is 0.

The bit at index 0 of R is 0, hence the left link points back to the node itself.

STEP 5: Insert C: 0 0 0 1 1

The search path is S - E - A. The bit at index 0 of A is 1, hence C could be attached as the right
child of A; A is the nearest node to C. Comparing C and A, they first differ at index 1, so the bit
index of C is 1.
But the bit index of A is 0 and the bit index of C is 1, hence C cannot be the child of A. We
traverse up; the bit at index 1 of C is 1, so the right link of C points back to the node itself.


STEP 6: Insert H: 0 1 0 0 0

The search traverses S - E - C - A; the nearest node to H is A. Comparing H and A, they first
differ at index 3, so the bit index of H is 3.

We cannot attach H as a child of A, so we traverse up, towards S. The bit at index 3 of H is 1,
so the right link points back to the node itself.


STEP 7: Insert I: 0 1 0 0 1

At node S, the bit index is 4; the bit at index 4 of I is 0, so move left.

At node H, the bit index is 3; the bit at index 3 of I is 1, so move right.
H and I are now the nearest nodes; comparing them, they first differ at index 0, so the bit
index of I is 0.

Multi Way Trie:


The multiway trie is an ordered prefix trie; the trie structure is built using the prefixes of the
given words. Consider the words SET and SKY.
For these words the multiway trie can be built as follows:
