280 - DS Complete-4
B+-tree
A B+-tree requires that each leaf be the same distance from the root, as in this picture,
where searching for any of the 11 values (all listed on the bottom level) will involve
loading three nodes from the disk (the root block, a second-level block, and a leaf).
In practice, d will be larger — as large, in fact, as it takes to fill a disk block. Suppose a
block is 4KB, our keys are 4-byte integers, and each reference is a 6-byte file offset.
Then we'd choose d to be the largest value so that 4 (d − 1) + 6 d ≤ 4096; solving this
inequality for d, we end up with d ≤ 410, so we'd use 410 for d. As you can see, d can
be large.
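The arithmetic above can be checked directly; a minimal sketch, assuming the block, key, and reference sizes stated in the text (`max_branching` is an illustrative name):

```c
#include <assert.h>

/* Largest d satisfying key_size*(d-1) + ref_size*d <= block_size.
 * Rearranging: d <= (block_size + key_size) / (key_size + ref_size). */
int max_branching(int block_size, int key_size, int ref_size) {
    return (block_size + key_size) / (key_size + ref_size);
}
```

With a 4KB block, 4-byte keys, and 6-byte references this yields d = 410, as computed above.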
A B+-tree maintains the following invariants:
Every node has one more reference than it has keys.
All leaves are at the same distance from the root.
For every non-leaf node N with k keys: all keys in the first child's subtree are
less than N's first key; and all keys in the ith child's subtree (2 ≤ i ≤ k) are
between the (i − 1)th key of N and the ith key of N.
The root has at least two children.
Every non-leaf, non-root node has at least floor(d / 2) children.
Each leaf contains at least floor(d / 2) keys.
Every key from the table appears in a leaf, in left-to-right sorted order.
In our examples, we'll continue to use 4 for d. Looking at our invariants, this requires
that each leaf have at least two keys, and each internal node have at least two
children (and thus at least one key).
2. Insertion algorithm
Descend to the leaf where the key fits.
1. If the node has an empty space, insert the key/reference pair into the node.
2. If the node is already full, split it into two nodes, distributing the keys evenly
between the two nodes. If the node is a leaf, take a copy of the minimum value in
the second of these two nodes and repeat this insertion algorithm to insert it into
the parent node. If the node is a non-leaf, exclude the middle value during the
split and repeat this insertion algorithm to insert this excluded value into the
parent node.
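The leaf-split step of the algorithm can be sketched as follows; `split_leaf` and its interface are illustrative names, not from the notes:

```c
#include <assert.h>
#include <string.h>

/* Split a full leaf's n keys evenly into two halves and return the
 * separator (a copy of the minimum key of the right half), which is
 * what gets inserted into the parent node. */
int split_leaf(const int keys[], int n,
               int left[], int *nl, int right[], int *nr) {
    int mid = n / 2;               /* distribute the keys evenly */
    *nl = mid;
    *nr = n - mid;
    memcpy(left, keys, mid * sizeof(int));
    memcpy(right, keys + mid, (n - mid) * sizeof(int));
    return right[0];               /* copied up into the parent */
}
```

Splitting the full leaf 10 13 15 20 gives halves {10, 13} and {15, 20}, with 15 copied up into the parent, mirroring the "copy of the minimum value in the second node" rule above.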
The worked insertion example (diagrams omitted) starts from an initial tree and inserts, in order: 20, 13, 15, 10, 11, 12.
3. Deletion algorithm
The worked deletion example (diagrams omitted) deletes, in order: 13, 15, 1.
Expression Trees:
Trees are used in many other ways in computer science; compilers and databases are
two major examples. When a compiler translates a language into machine code,
tree-like structures are used. We have already seen an example of an expression tree
representing a mathematical expression. Let's discuss expression trees further: what
their benefits are and how we can build one. The following is the figure of an
expression tree.
In the above tree, the expression on the left side is a + b * c, while on the right side we
have d * e + f * g. The figure makes it evident that the inner nodes contain operators
while the leaf nodes hold operands. A leaf node is a node whose left and right subtrees
are both null; leaves appear at the bottom level of the tree and are connected to the
inner nodes. So in trees, we have some inner nodes and some leaf nodes.
In the above diagram, every inner node (a node with a left child, a right child, or both)
holds an operator, in this case + or *, while the leaf nodes hold only operands: a, b, c,
d, e, f, g. This tree is binary because the operators are binary. As we saw when
evaluating postfix and infix expressions, a binary operator needs two operands; in an
infix expression, one operand sits on the left side of the operator and the other on the
right, as in 2 + 4 or 5 * 6. There are also unary operators, such as negation (-) or
Boolean NOT, but in this example all the operators are binary, so the tree is binary.
Note that this is not a binary search tree: in a BST, the values in a node's left subtree
are smaller than the node and the values in its right subtree are greater. Here we have
an expression tree with no sorting process involved.
An expression tree need not always be a binary tree. Suppose we have a unary
operator like negation; then we have a node containing (-) with only one leaf node
under it, meaning: just negate that operand.
Let's talk about the traversal of the expression tree. An inorder traversal can be
performed here.
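A sketch of how such a tree might be built and traversed; the node type `enode` and helper `mk` are illustrative names, not from the notes:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct enode {
    char op;              /* operator or operand character */
    struct enode *left;
    struct enode *right;
};

struct enode *mk(char op, struct enode *l, struct enode *r) {
    struct enode *n = malloc(sizeof *n);
    n->op = op;
    n->left = l;
    n->right = r;
    return n;
}

/* Inorder traversal appends the infix form of the expression to out. */
void inorder(struct enode *n, char *out) {
    if (n == NULL) return;
    inorder(n->left, out);
    strncat(out, &n->op, 1);
    inorder(n->right, out);
}
```

Building the left tree from the figure as `mk('+', mk('a', NULL, NULL), mk('*', mk('b', NULL, NULL), mk('c', NULL, NULL)))` and traversing it inorder produces "a+b*c". Note that a plain inorder traversal drops the parentheses a fully correct infix printer would need.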
Lecture-18
Binary Search Tree (BST)
A Binary Search Tree (BST) is a tree in which all the nodes follow the below-mentioned
properties −
The left sub-tree of a node has a key less than or equal to its parent node's key.
The right sub-tree of a node has a key greater than its parent node's key.
Thus, a BST divides all its sub-trees into two segments: the left sub-tree and the right
sub-tree.
Representation
BST is a collection of nodes arranged in a way where they maintain BST properties.
Each node has a key and an associated value. While searching, the desired key is
compared to the keys in BST and if found, the associated value is retrieved.
Following is a pictorial representation of BST −
We observe that the root node key (27) has all less-valued keys on the left sub-tree
and the higher valued keys on the right sub-tree.
Basic Operations
Following are the basic operations of a tree −
Search − Searches an element in a tree.
Insert − Inserts an element in a tree.
Pre-order Traversal − Traverses a tree in a pre-order manner.
In-order Traversal − Traverses a tree in an in-order manner.
Post-order Traversal − Traverses a tree in a post-order manner.
Node
Define a node holding some data and references to its left and right child nodes.
struct node {
   int data;
   struct node *leftChild;
   struct node *rightChild;
};
Search Operation
Whenever an element is to be searched, start searching from the root node. If the data
sought is less than the current node's key, search for the element in the left subtree;
otherwise, search in the right subtree. Repeat this at each node until the key is found
or a null link is reached.
Algorithm
struct node* search(int data) {
   struct node *current = root;        // start from the root node
   while(current != NULL && current->data != data) {
      if(data < current->data)
         current = current->leftChild;    // go to left subtree
      else
         current = current->rightChild;   // go to right subtree
   }
   return current;   // NULL if the key was not found
}
Insert Operation
Whenever an element is to be inserted, first locate its proper location. Start searching
from the root node, then if the data is less than the key value, search for the empty
location in the left subtree and insert the data. Otherwise, search for the empty location
in the right subtree and insert the data.
Algorithm
#include <stdlib.h>   /* for malloc */

void insert(int data) {
   struct node *tempNode = (struct node*) malloc(sizeof(struct node));
   struct node *current = root, *parent;
   tempNode->data = data;
   tempNode->leftChild = NULL;
   tempNode->rightChild = NULL;
   if(root == NULL) { root = tempNode; return; }   // tree was empty
   while(1) {
      parent = current;
      if(data < parent->data) {   // go to left of the tree
         current = current->leftChild;
         if(current == NULL) { parent->leftChild = tempNode; return; }
      } else {                    // go to right of the tree
         current = current->rightChild;
         if(current == NULL) { parent->rightChild = tempNode; return; }
      }
   }
}
Visit D, mark it as visited and put it onto the stack. Here we have nodes B and C,
which are adjacent to D, and both are unvisited. However, we shall again choose in
alphabetical order.
We choose B, mark it as visited and put it onto the stack. Here B does not have any
unvisited adjacent node, so we pop B from the stack.
As C does not have any unvisited adjacent node, we keep popping the stack until we
find a node that has one. In this case there is none, so we keep popping until the stack
is empty.
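The walkthrough above can be sketched as an iterative DFS with an explicit stack; the adjacency matrix and the vertex numbering (0 = S, 1 = A, 2 = B, 3 = C, 4 = D) are assumptions chosen to match the narrative, since the figure itself is not shown:

```c
#define MAXV 8

/* Iterative DFS over an adjacency matrix; lower-numbered (alphabetically
 * earlier) neighbours are tried first.  The visit order is written to
 * order[] and the number of visited vertices is returned. */
int dfs(int n, int adj[][MAXV], int start, int order[]) {
    int stack[MAXV], top = 0, visited[MAXV] = {0}, cnt = 0;
    visited[start] = 1;
    order[cnt++] = start;
    stack[top++] = start;
    while (top > 0) {
        int v = stack[top - 1], found = 0;
        for (int u = 0; u < n; u++) {
            if (adj[v][u] && !visited[u]) {  /* first unvisited neighbour */
                visited[u] = 1;
                order[cnt++] = u;
                stack[top++] = u;
                found = 1;
                break;
            }
        }
        if (!found) top--;   /* dead end: pop the stack */
    }
    return cnt;
}
```

With edges S-A, S-B, S-C, A-D, D-B, D-C this visits S, A, D, B, C, matching the steps described above (B is popped when it has no unvisited neighbour, then C is reached from D).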
Lecture-21
Breadth First Search
The Breadth First Search (BFS) algorithm traverses a graph in a breadthward motion
and uses a queue to remember the next vertex from which to continue the search when
a dead end occurs in any iteration.
As in the example given above, the BFS algorithm traverses from A to B to E to F first,
then to C and G, and lastly to D. It employs the following rules.
Rule 1 − Visit the adjacent unvisited vertex. Mark it as visited. Display it. Insert it
in a queue.
Rule 2 − If no adjacent vertex is found, remove the first vertex from the queue.
Rule 3 − Repeat Rule 1 and Rule 2 until the queue is empty.
We start by visiting S (the starting node) and mark it as visited.
We then look for an unvisited node adjacent to S. In this example we have three such
nodes, but alphabetically we choose A, mark it as visited and enqueue it.
From A, we have D as an unvisited adjacent node. We mark it as visited and enqueue
it.
At this stage, we are left with no unmarked (unvisited) nodes. But as per the algorithm
we keep on dequeuing in order to get all unvisited nodes. When the queue gets
emptied, the program is over.
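The three rules above can be sketched as follows; as with the DFS sketch, the graph (edges S-A, S-B, S-C, A-D, D-B, D-C, with 0 = S, 1 = A, 2 = B, 3 = C, 4 = D) is an assumption standing in for the missing figure:

```c
#define MAXV 8

/* BFS over an adjacency matrix using a simple array queue; lower-numbered
 * (alphabetically earlier) neighbours are enqueued first.  The visit order
 * is written to order[] and the number of visited vertices is returned. */
int bfs(int n, int adj[][MAXV], int start, int order[]) {
    int queue[MAXV], head = 0, tail = 0, visited[MAXV] = {0}, cnt = 0;
    visited[start] = 1;
    queue[tail++] = start;
    while (head < tail) {
        int v = queue[head++];       /* remove the first vertex (Rule 2) */
        order[cnt++] = v;
        for (int u = 0; u < n; u++)
            if (adj[v][u] && !visited[u]) {
                visited[u] = 1;      /* mark, then insert in queue (Rule 1) */
                queue[tail++] = u;
            }
    }
    return cnt;                      /* loop ends when queue is empty (Rule 3) */
}
```

On the assumed graph this visits S, A, B, C, D: all of S's neighbours are enqueued before the search moves a level deeper, which is the defining difference from DFS.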
Lecture-22
Graph representation
You can represent a graph in many ways. The two most common ways of representing
a graph are as follows:
Adjacency matrix
An adjacency matrix is a V×V binary matrix A. Element A[i][j] is 1 if there is an edge
from vertex i to vertex j; otherwise A[i][j] is 0.
Note: A binary matrix is a matrix in which each cell can hold only one of two possible
values: 0 or 1.
The adjacency matrix can also be adapted for a weighted graph: instead of storing 0
or 1 in A[i][j], the weight or cost of the edge is stored.
In an undirected graph, if A[i][j] = 1, then A[j][i] = 1. In a directed graph, if A[i][j] = 1,
then A[j][i] may or may not be 1.
The adjacency matrix provides constant-time access (O(1)) to determine whether there
is an edge between two nodes. The space complexity of the adjacency matrix is O(V²).
The adjacency matrix of the following graph is:
i/j : 1 2 3 4
 1  : 0 1 0 1
 2  : 1 0 1 0
 3  : 0 1 0 1
 4  : 1 0 1 0
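The O(1) edge test falls straight out of the matrix; a minimal sketch using the matrix above (vertices 1..4 stored at indices 0..3, `has_edge` an illustrative name):

```c
/* Adjacency matrix of the 4-vertex undirected graph shown above. */
int A[4][4] = {
    {0, 1, 0, 1},
    {1, 0, 1, 0},
    {0, 1, 0, 1},
    {1, 0, 1, 0},
};

/* O(1) edge test: just index the matrix. */
int has_edge(int i, int j) {
    return A[i - 1][j - 1];
}
```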
Adjacency list
Consider the same undirected graph as in the adjacency matrix above. The adjacency
list of the graph is as follows:
A1 → 2 → 4
A2 → 1 → 3
A3 → 2 → 4
A4 → 1 → 3
Now consider a directed graph given by its adjacency matrix. The adjacency list of the
graph is as follows:
A1 → 2
A2 → 4
A3 → 1 → 4
A4 → 2
Lecture-23
Topological Sorting:
Topological sorting for Directed Acyclic Graph (DAG) is a linear ordering of vertices
such that for every directed edge uv, vertex u comes before v in the
ordering. Topological Sorting for a graph is not possible if the graph is not a DAG.
For example, a topological sorting of the following graph is “5 4 2 3 1 0”. There can be
more than one topological sorting for a graph. For example, another topological sorting
of the following graph is “4 5 2 3 1 0”. The first vertex in a topological sorting is always
a vertex with in-degree 0 (a vertex with no incoming edges).
Algorithm to find Topological Sorting:
In DFS, we start from a vertex, we first print it and then recursively call DFS for its
adjacent vertices. In topological sorting, we use a temporary stack. We don’t print the
vertex immediately, we first recursively call topological sorting for all its adjacent
vertices, then push it to a stack. Finally, print contents of stack. Note that a vertex is
pushed to stack only when all of its adjacent vertices (and their adjacent vertices and so
on) are already in stack.
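The algorithm just described can be sketched as follows; the edge set (5→2, 5→0, 4→0, 4→1, 2→3, 3→1) is an assumption chosen so that the example reproduces the ordering “5 4 2 3 1 0” quoted above:

```c
#define MAXV 8

static void topo_visit(int n, int adj[][MAXV], int v, int visited[],
                       int stack[], int *top) {
    visited[v] = 1;
    for (int u = 0; u < n; u++)
        if (adj[v][u] && !visited[u])
            topo_visit(n, adj, u, visited, stack, top);
    stack[(*top)++] = v;   /* push only after all descendants are done */
}

/* DFS-based topological sort; result[] receives the vertices in
 * topological order (the temporary stack read from the top down). */
void topo_sort(int n, int adj[][MAXV], int result[]) {
    int visited[MAXV] = {0}, stack[MAXV], top = 0;
    for (int v = 0; v < n; v++)
        if (!visited[v])
            topo_visit(n, adj, v, visited, stack, &top);
    for (int i = 0; i < n; i++)        /* reverse the finish order */
        result[i] = stack[n - 1 - i];
}
```

A vertex is pushed only when every vertex reachable from it is already on the stack, which is exactly the invariant stated above.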
Topological Sorting vs Depth First Traversal (DFS):
In DFS, we print a vertex and then recursively call DFS for its adjacent vertices. In
topological sorting, we need to print a vertex before its adjacent vertices. For example,
in the given graph, the vertex ‘5’ should be printed before vertex ‘0’, but unlike DFS, the
vertex ‘4’ should also be printed before vertex ‘0’. So Topological sorting is different
from DFS. For example, a DFS of the shown graph is “5 2 3 1 0 4”, but it is not a
topological sorting.
Dynamic Programming
The Floyd-Warshall algorithm solves the All-Pairs Shortest Path problem: find the
shortest distances between every pair of vertices in a given edge-weighted directed
graph.
Example:
Input:
graph[][] = { {0, 5, INF, 10},
{INF, 0, 3, INF},
{INF, INF, 0, 1},
{INF, INF, INF, 0} }
which represents the following graph
        10
  (0)------->(3)
   |         /|\
  5|          |
   |          | 1
  \|/         |
  (1)------->(2)
        3
Note that the value of graph[i][j] is 0 if i is equal to j
And graph[i][j] is INF (infinite) if there is no edge from vertex i to j.
Output:
Shortest distance matrix
0 5 8 9
INF 0 3 4
INF INF 0 1
INF INF INF 0
Floyd Warshall Algorithm
We initialize the solution matrix same as the input graph matrix as a first step. Then we
update the solution matrix by considering all vertices as an intermediate vertex. The
idea is to one by one pick all vertices and update all shortest paths which include the
picked vertex as an intermediate vertex in the shortest path. When we pick vertex
number k as an intermediate vertex, we already have considered vertices {0, 1, 2, .. k-1}
as intermediate vertices. For every pair (i, j) of source and destination vertices
respectively, there are two possible cases.
1) k is not an intermediate vertex in shortest path from i to j. We keep the value of
dist[i][j] as it is.
2) k is an intermediate vertex in shortest path from i to j. We update the value of dist[i][j]
as dist[i][k] + dist[k][j].
The following figure shows the above optimal substructure property in the all-pairs
shortest path problem.
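The two cases above collapse into a single relaxation step; a minimal sketch using the example matrix from the text (INF is a large sentinel value standing in for "no edge"):

```c
#define V 4
#define INF 99999

/* Floyd-Warshall: dist[][] starts as the input graph matrix and is updated
 * in place, considering each vertex k in turn as an intermediate vertex. */
void floyd_warshall(int dist[V][V]) {
    for (int k = 0; k < V; k++)
        for (int i = 0; i < V; i++)
            for (int j = 0; j < V; j++)
                if (dist[i][k] + dist[k][j] < dist[i][j])
                    dist[i][j] = dist[i][k] + dist[k][j];  /* case 2 */
                /* else: case 1, keep dist[i][j] as it is */
}
```

Running this on the input matrix above produces the shortest-distance matrix shown in the Output section, e.g. dist[0][3] improves from 10 to 9 via the path 0 → 1 → 2 → 3.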
Lecture-24
Bubble Sort
We take an unsorted array for our example. Bubble sort takes O(n²) time, so we're
keeping the example short and precise.
Bubble sort starts with the very first two elements, comparing them to check which one
is greater.
In this case, value 33 is greater than 14, so these two are already in sorted positions.
Next, we compare 33 with 27.
We find that 27 is smaller than 33 and these two values must be swapped.
Next we compare 33 and 35. We find that both are in already sorted positions.
We then find that 10 is smaller than 35. Hence they are not in sorted positions.
We swap these values. We find that we have reached the end of the array. After one
iteration, the array should look like this −
To be precise, we are now showing how the array should look after each iteration.
After the second iteration, it should look like this −
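The passes described above can be sketched as a complete bubble sort; the array {14, 33, 27, 35, 10} matches the values used in the walkthrough:

```c
/* Bubble sort: repeatedly compare adjacent elements and swap when they
 * are out of order; the largest unsorted value "bubbles" to the end of
 * the array on each pass. */
void bubble_sort(int a[], int n) {
    for (int pass = 0; pass < n - 1; pass++)
        for (int i = 0; i < n - 1 - pass; i++)
            if (a[i] > a[i + 1]) {       /* out of order: swap */
                int tmp = a[i];
                a[i] = a[i + 1];
                a[i + 1] = tmp;
            }
}
```

After the first pass the largest value, 35, is in its final position; each later pass therefore scans one element less.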