18CS42 - Module 4


DESIGN AND ANALYSIS OF

ALGORITHMS

MODULE 4
 Topics

 Dynamic Programming:

 General method with Examples,

 Multistage Graphs.

 Transitive Closure: Warshall’s Algorithm.

 All Pairs Shortest Paths: Floyd's Algorithm,

 Knapsack problem,

 Optimal Binary Search Trees,

 Travelling Sales Person problem.

 Space-Time Tradeoffs: Sorting by Counting,

 Input Enhancement in String Matching: Horspool’s Algorithm.


Introduction to Dynamic Programming

 Dynamic programming is an algorithm design technique.

 It was invented by the U.S. mathematician Richard Bellman in the 1950s. It is a general method for optimizing multistage decision processes.

 The word “programming” in the name of this technique stands for “planning” and does not refer to

computer programming.

 Dynamic programming is a technique for solving problems with overlapping sub-problems. These sub-

problems arise from a recurrence relating a given problem’s solution to solutions of its smaller sub-
problems.
Introduction to Dynamic Programming

 Dynamic programming suggests that, rather than solving overlapping sub-problems again and again, each of the smaller sub-problems be solved only once, with the results recorded in a table from which a solution to the original problem can then be obtained.

 Ex: Fibonacci numbers. The Fibonacci numbers are the elements of the sequence 0, 1, 1, 2, 3, 5, 8, 13, 21, 34,

...,

 which can be defined by the simple recurrence,

 F(n) = F(n − 1) + F(n − 2) for n > 1 and two initial conditions

 F(0) = 0, F(1) = 1.
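
To see the payoff, here is a minimal Python sketch (illustrative, not part of the slides) contrasting the naive recursion, which recomputes the same sub-problems over and over, with a bottom-up version that computes each F(i) exactly once:

    # Naive recursion: the calls F(n-1) and F(n-2) recompute overlapping
    # sub-problems, giving exponential running time.
    def fib_naive(n):
        if n <= 1:
            return n
        return fib_naive(n - 1) + fib_naive(n - 2)

    # Dynamic programming: solve each sub-problem once and record it in a table.
    def fib_dp(n):
        if n <= 1:
            return n
        table = [0] * (n + 1)        # table[i] holds F(i)
        table[1] = 1
        for i in range(2, n + 1):
            table[i] = table[i - 1] + table[i - 2]
        return table[n]

    print(fib_dp(9))   # 34, the tenth element of the sequence above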
Introduction to Dynamic Programming

 Dynamic programming is an algorithm design method that can be used when the solution to a problem can

be viewed as the result of a sequence of decisions.


 Ex 1. Knapsack: The solution to the knapsack problem can be viewed as the result of a sequence of decisions. That is, we have to decide the values of xi, where 1 <= i <= n. First we make a decision on x1, then on x2, then on x3, and so on. An optimal sequence of decisions maximizes the objective function.

 Ex 2. To find the shortest path from vertex i to vertex j in a directed graph G, we have to decide which vertex should be the second vertex, which should be the third, which should be the fourth, and so on, until vertex j is reached. An optimal sequence of decisions is the path of least length.
Greedy Technique vs. Dynamic Programming

 Greedy Technique:

 Greedy is a simple and straightforward approach that makes the locally optimal choice at each step with the hope of finding a global optimum.

 It involves making a series of choices without considering the overall future consequences. The algorithm makes the best choice at each step based on the current information available.

 Greedy algorithms usually work well for problems where a locally optimal choice leads to a globally optimal solution.

 The greedy approach does not always guarantee an optimal solution for all problems; in some cases, it may lead to suboptimal results or incorrect solutions.

 Dynamic Programming:

 Dynamic programming is an optimization technique that breaks down a complex problem into smaller overlapping sub-problems and solves each sub-problem only once, storing the results for future use.

 It involves solving problems by combining the solutions of sub-problems, rather than solving each sub-problem independently.

 Dynamic programming ensures that each sub-problem is solved only once, avoiding redundant computations and significantly reducing the time complexity of the overall solution.

 Dynamic programming guarantees an optimal solution for problems that can be solved using this technique.
DYNAMIC PROGRAMMING
 Steps:

 Break down the complex problem into simpler sub-problems.

 Find optimal solutions to the sub-problems.

 Store the results of the sub-problems (memoization).

 Reuse the results so that the same sub-problem is not calculated more than once.

 Finally, combine the sub-problem results to solve the original complex problem.


MULTISTAGE GRAPHS
 A multistage graph G = (V, E) is a directed graph in which the vertices are partitioned into k ≥ 2 disjoint sets Vi, 1 ≤ i ≤ k. In addition, if (u, v) is an edge in E, then u ∈ Vi and v ∈ Vi+1 for some i, 1 ≤ i < k.

 The sets V1 and Vk are such that |V1| = |Vk| = 1. Let s and t, respectively, be the vertices in V1 and Vk. The vertex s is the source, and t is the sink.

 Let c(i, j) be the cost of edge (i, j). The cost of a path from s to t is the sum of the costs of the edges on the path. The multistage graph problem is to find a minimum-cost path from s to t.
MULTISTAGE GRAPHS
FORWARD APPROACH

 Let cost(i, j) be the cost of the minimum-cost path from vertex j (in stage i) to the sink t. Then, using the forward approach,

 cost(i, j) = min { c(j, l) + cost(i + 1, l) }, the minimum taken over all vertices l in Vi+1 with (j, l) ∈ E,

 where i is the stage number of vertex j and c(j, l) is the cost of edge (j, l) in the graph. The answer is cost(1, s), computed after setting cost(k, t) = 0.


Multistage graph Algorithm
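
The algorithm itself appears on the slide as a figure; in its place, a minimal Python sketch of the forward approach (the representation and names below are my own assumptions: vertices are numbered 1..n stage by stage, vertex 1 is the source s, vertex n is the sink t, and edges[j] maps each successor l of j to the edge cost c(j, l)):

    def multistage_forward(n, edges):
        INF = float('inf')
        cost = [INF] * (n + 1)    # cost[j] = min cost of a path from j to the sink
        d = [0] * (n + 1)         # d[j] = next vertex after j on that cheapest path
        cost[n] = 0
        for j in range(n - 1, 0, -1):            # from the sink side toward the source
            for l, c in edges.get(j, {}).items():
                if c + cost[l] < cost[j]:
                    cost[j] = c + cost[l]
                    d[j] = l
        path = [1]                                # recover the minimum-cost s-to-t path
        while path[-1] != n:
            path.append(d[path[-1]])
        return cost[1], path

    # A small 4-stage example: V1 = {1}, V2 = {2, 3}, V3 = {4}, V4 = {5}
    g = {1: {2: 9, 3: 7}, 2: {4: 4}, 3: {4: 2}, 4: {5: 3}}
    print(multistage_forward(5, g))   # (12, [1, 3, 4, 5])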
MULTISTAGE GRAPHS
BACKWARD APPROACH

 Let bcost(i, j) be the cost of the minimum-cost path from the source s to vertex j (in stage i). Then, using the backward approach,

 bcost(i, j) = min { bcost(i − 1, l) + c(l, j) }, the minimum taken over all vertices l in Vi−1 with (l, j) ∈ E,

 where l is a vertex in stage Vi−1, j is a vertex in stage Vi, and c(l, j) is the cost of the edge (l, j) present in the graph. The answer is bcost(k, t), computed after setting bcost(1, s) = 0.


Multistage graph Algorithm
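
Likewise, a sketch of the backward approach (same assumed numbering; pred[j] maps each predecessor l of j to the edge cost c(l, j)):

    def multistage_backward(n, pred):
        INF = float('inf')
        bcost = [INF] * (n + 1)   # bcost[j] = min cost of a path from the source to j
        bcost[1] = 0
        for j in range(2, n + 1):                # from the source side toward the sink
            for l, c in pred.get(j, {}).items():
                bcost[j] = min(bcost[j], bcost[l] + c)
        return bcost[n]

    # The same example graph as above, stored by predecessors:
    p = {2: {1: 9}, 3: {1: 7}, 4: {2: 4, 3: 2}, 5: {4: 3}}
    print(multistage_backward(5, p))   # 12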
Warshall’s and Floyd’s Algorithms
 Warshall’s algorithm for computing the transitive closure of a directed graph and

 Floyd’s algorithm for computing the all-pairs shortest-paths problem.

 Transitive closure of a directed graph: the transitive closure is the reachability matrix of the graph. Given a graph, determine, for every vertex pair (u, v), whether vertex v is reachable from vertex u. The final matrix is Boolean.

 Ex,
Transitive Closure
 DEFINITION

 The transitive closure of a directed graph with n vertices can be defined as the n × n boolean matrix T =

{ tij }, in which the element in the ith row and the jth column is 1 if there exists a nontrivial path (i.e., directed

path of a positive length) from the ith vertex to the jth vertex; otherwise, tij is 0.
Warshall’s and Floyd’s Algorithms
 Warshall’s algorithm for computing the transitive closure of a directed graph. It is called Warshall’s

algorithm after Stephen Warshall.

 Warshall’s algorithm constructs the transitive closure through a series of n × n boolean matrices R(0), . . . , R(n):

 R(0) is the adjacency matrix of the digraph.

 R(1) contains the information about paths that can use the first vertex as intermediate, and so on.

 The last matrix in the series, R(n), reflects paths that can use all n vertices of the digraph as intermediate and is the transitive closure of the digraph.


Warshall’s and Floyd’s Algorithms
 The central point of the algorithm is that we can compute all the elements of each matrix R(k) from its immediate predecessor R(k-1) in the series. The element r(k)ij in the ith row and jth column of matrix R(k) is equal to 1 if and only if there exists a directed path of positive length from the ith vertex to the jth vertex with each intermediate vertex, if any, numbered not higher than k.
Warshall’s and Floyd’s Algorithms
 The following formula is used for generating the elements of matrix R(k) from the elements of matrix R(k-1):

 r(k)ij = r(k-1)ij or ( r(k-1)ik and r(k-1)kj )

 This formula implies the following rule for generating elements of matrix R(k) from elements of matrix R(k-1):

 If an element rij is 1 in R(k-1), it remains 1 in R(k).

 If an element rij is 0 in R(k-1), it has to be changed to 1 in R(k) if and only if the element in its row i and column k and the element in its column j and row k are both 1’s in R(k-1).
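
In code, a compact Python sketch of Warshall’s algorithm (assuming the digraph is given as an adjacency matrix of 0/1 entries):

    def warshall(adj):
        n = len(adj)
        r = [row[:] for row in adj]          # R(0) is the adjacency matrix
        for k in range(n):                   # allow vertex k as an intermediate
            for i in range(n):
                for j in range(n):
                    # r[i][j] stays 1, or becomes 1 if paths i->k and k->j exist
                    r[i][j] = r[i][j] or (r[i][k] and r[k][j])
        return r                             # R(n), the transitive closure

    # Digraph with edges 0->1, 1->3, 3->0, 3->2 (0-based vertex indices)
    A = [[0, 1, 0, 0],
         [0, 0, 0, 1],
         [0, 0, 0, 0],
         [1, 0, 1, 0]]
    print(warshall(A))   # rows 0, 1, 3 become all 1s; row 2 stays all 0s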
Application of Warshall’s Algorithm
Warshall’s Algorithm
Warshall’s Algorithm
Floyd’s Algorithm
 Floyd’s Algorithm for the All-Pairs Shortest-Paths Problem

 Given a weighted connected graph (undirected or directed), the all-pairs shortest paths problem finds the

distances—i.e., the lengths of the shortest paths— from each vertex to all other vertices.

 It is called Floyd’s algorithm after its co-inventor Robert W. Floyd. It is applicable to both undirected and directed weighted graphs provided that they do not contain a cycle of negative length. (Otherwise, the distance between any two vertices in such a cycle can be made arbitrarily small by repeating the cycle enough times.)
Floyd’s Algorithm
 Floyd’s Algorithm for the All-Pairs Shortest-Paths Problem

 It has important applications to communication and transportation networks. A recent application of the all-pairs shortest-paths problem is precomputing distances for motion planning in computer games.

 The lengths of the shortest paths are recorded in an n × n matrix D called the distance matrix: the element dij in the ith row and the jth column of this matrix indicates the length of the shortest path from the ith vertex to the jth vertex.
Floyd’s Algorithm
 Floyd’s algorithm computes the distance matrix of a weighted graph with n vertices through a series of n × n matrices D(0), D(1), . . . , D(n):

 Each of these matrices contains the lengths of shortest paths:

 The element d(k)ij in the ith row and the jth column of matrix D(k) (i, j = 1, 2, . . . , n; k = 0, 1, . . . , n) is equal to the length of the shortest path among all paths from the ith vertex to the jth vertex with each intermediate vertex, if any, numbered not higher than k.

 D(0) does not allow any intermediate vertices in its paths; hence, D(0) is simply the weight matrix of the graph.

 The last matrix in the series, D(n), contains the lengths of the shortest paths among all paths that can use all n vertices as intermediate and hence is the distance matrix.
Floyd’s Algorithm
 As in Warshall’s algorithm, we can compute all the elements of each matrix D(k) from its immediate predecessor D(k-1).

 Let d(k)ij be the element in the ith row and the jth column of matrix D(k). All paths from the ith vertex to the jth vertex with intermediate vertices numbered not higher than k can be partitioned into two disjoint subsets:

 those that do not use the kth vertex vk as an intermediate vertex; for these, the shortest length is the length of the shortest path from vi to vj with intermediate vertices numbered not higher than k − 1, i.e., d(k-1)ij, and

 those that do (i.e., each such path is made up of a path from vi to vk with each intermediate vertex numbered not higher than k − 1 and a path from vk to vj with each intermediate vertex numbered not higher than k − 1).
Floyd’s Algorithm
 Taking into account the lengths of the shortest paths in both subsets leads to the following recurrence:

 d(k)ij = min { d(k-1)ij , d(k-1)ik + d(k-1)kj } for k ≥ 1, with d(0)ij = wij.
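
A matching Python sketch of Floyd’s algorithm (float('inf') marks a missing edge in the weight matrix, 0 the diagonal):

    def floyd(w):
        n = len(w)
        d = [row[:] for row in w]            # D(0) is the weight matrix
        for k in range(n):                   # allow vertex k as an intermediate
            for i in range(n):
                for j in range(n):
                    d[i][j] = min(d[i][j], d[i][k] + d[k][j])
        return d                             # D(n), the distance matrix

    INF = float('inf')
    W = [[0,   INF, 3,   INF],
         [2,   0,   INF, INF],
         [INF, 7,   0,   1],
         [6,   INF, INF, 0]]
    print(floyd(W))   # [[0,10,3,4], [2,0,5,6], [7,7,0,1], [6,16,9,0]]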
Application of Floyd’s Algorithm
Floyd’s Algorithm
Knapsack Problem and Memory Functions
 Knapsack problem using Dynamic Programming

 Definition: Given a knapsack instance with the following:

 M – capacity of the knapsack

 n – number of objects

 w – array consisting of weights w1, w2, w3, . . . , wn

 p – array consisting of profits p1, p2, p3, . . . , pn (also written v1, v2, v3, . . . , vn)

 x – array consisting of either 0 or 1:

 xi = 0 represents that the ith object has not been selected

 xi = 1 represents that the ith object has been selected


Knapsack Problem
 Approach to solve knapsack problem: To design a dynamic programming algorithm, derive a recurrence

relation that expresses a solution to an instance of the knapsack problem in terms of solutions to its smaller
sub-instances.
 Let us consider an instance defined by the first i items, 1 ≤ i ≤ n,

 with weights w1, w2, w3, …. wi

 values v1, v2, v3, … vi , and

 knapsack capacity j, 1 ≤ j ≤ W.

 Let F(i, j) be the value of an optimal solution to this instance, i.e., the value of the most valuable subset of the first i

items that fit into the knapsack of capacity j.


Knapsack Problem
 Approach to solve knapsack problem:

 Divide all the subsets of the first i items that fit the knapsack of capacity j into two categories:

1. Among the subsets that do not include the ith item, the value of an optimal subset is, by definition, F(i, j) = F(i − 1, j).

2. Among the subsets that do include the ith item (hence, j − wi ≥ 0), an optimal subset is made up of this item and an optimal subset of the first i − 1 items that fits into the knapsack of capacity j − wi.

The value of such an optimal subset is F(i, j) = vi + F(i − 1, j − wi).


Knapsack Problem
 Approach to solve knapsack problem:

 These observations lead to the following recurrence:

 F(i, j) = max { F(i − 1, j), vi + F(i − 1, j − wi) } if j − wi ≥ 0,
 F(i, j) = F(i − 1, j) if j − wi < 0,

 with initial conditions F(0, j) = 0 for j ≥ 0 and F(i, 0) = 0 for i ≥ 0.

The goal is to find F(n, W), the maximal value of a subset of the n given items that fits into the knapsack of capacity W.
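
A bottom-up Python sketch of this recurrence. The instance in the usage line is the one whose table is traced on the next slides; the specific weights (2, 1, 3, 2) and values ($12, $10, $20, $15) for W = 5 are the standard textbook instance and are assumed here:

    def knapsack(weights, values, W):
        n = len(weights)
        # F[i][j] per the recurrence; row 0 and column 0 stay 0 (initial conditions)
        F = [[0] * (W + 1) for _ in range(n + 1)]
        for i in range(1, n + 1):
            for j in range(1, W + 1):
                F[i][j] = F[i - 1][j]                     # subsets without item i
                if weights[i - 1] <= j:                   # subsets with item i
                    F[i][j] = max(F[i][j],
                                  values[i - 1] + F[i - 1][j - weights[i - 1]])
        return F

    F = knapsack([2, 1, 3, 2], [12, 10, 20, 15], 5)
    print(F[4][5])   # 37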
Knapsack Problem
Knapsack Problem
 The maximal value is F(4, 5) = $37.

 Find the composition of an optimal subset by back-tracing the computations of this entry in the table.

 Since F(4, 5) > F(3, 5), item 4 has to be included in an optimal solution { 5 − 2 = 3 remaining units of knapsack capacity }.

 The value of the remaining selection is F(3, 3). Since F(3, 3) = F(2, 3), item 3 is not in an optimal subset.

 Since F(2, 3) > F(1, 3), item 2 is a part of an optimal selection { 3 − 1 = 2 remaining units of knapsack capacity }.

 Since F(1, 2) > F(0, 2), item 1 is the final part of the optimal solution { 2 − 2 = 0 remaining units of knapsack capacity }.

 Therefore {item 1, item 2, item 4} are included in the knapsack.


Knapsack Problem
 The time efficiency and space efficiency of this algorithm are both in Θ(nW) .

 The time needed to find the composition of an optimal solution is in O(n).


Knapsack Problem and Memory Functions
 The direct top-down approach to finding a solution to such a recurrence leads to an algorithm that solves common sub-problems more than once and hence is very inefficient.

 The classic bottom-up approach fills a table with solutions to all smaller sub-problems, each of them solved only once, but solutions to some of these sub-problems are often not necessary.

 The goal is to get a method that solves only the sub-problems that are necessary and does so only once. Such a method exists; it is based on using memory functions.


Memory Functions
 Initially, all the table’s entries are initialized with a special “null” symbol to indicate that they have not yet

been calculated.

 Thereafter, whenever a new value needs to be calculated,

 the method checks the corresponding entry in the table first: if this entry is not “null,” it is simply retrieved from

the table;

 otherwise, it is computed by the recursive call whose result is then recorded in the table.
Memory Functions
 The following algorithm implements this idea for the knapsack problem. After initializing the table, the recursive function

needs to be called with i = n (the number of items) and j = W (the knapsack capacity).
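
Since the algorithm figure is not reproduced here, a Python sketch of the scheme just described (−1 plays the role of the “null” marker; row 0 and column 0 are pre-filled with 0s):

    def mf_knapsack(i, j, weights, values, F):
        if F[i][j] < 0:                              # entry not yet computed
            if j < weights[i - 1]:                   # item i cannot fit
                value = mf_knapsack(i - 1, j, weights, values, F)
            else:                                    # best of "without" and "with" item i
                value = max(mf_knapsack(i - 1, j, weights, values, F),
                            values[i - 1] +
                            mf_knapsack(i - 1, j - weights[i - 1], weights, values, F))
            F[i][j] = value                          # record the result in the table
        return F[i][j]

    weights, values, W = [2, 1, 3, 2], [12, 10, 20, 15], 5
    n = len(weights)
    F = [[0 if i == 0 or j == 0 else -1 for j in range(W + 1)] for i in range(n + 1)]
    print(mf_knapsack(n, W, weights, values, F))   # 37, computing only needed entries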
Memory Functions
 Note: The table in Figure 8.6 gives the results. Only 11 out of 20 nontrivial values (i.e., not those in row 0 or in column

0) have been computed.


THE TRAVELING SALES PERSON PROBLEM
 The traveling salesperson problem finds application in a variety of situations. Suppose we have to route a postal van to pick up mail from boxes located at n different sites. An (n + 1)-vertex graph can be used to represent the situation. One vertex represents the post office from which the postal van starts and to which it must return. Edge (i, j) is assigned a cost equal to the distance from site i to site j. The route taken by the postal van is a tour, and we are interested in finding a tour of minimum length.

 Let G = (V, E) be a directed graph with edge costs cij. The variable cij is defined such that

 cij > 0 for all i and j, and

 cij = ∞ if (i, j) ∉ E.

 Let |V| = n and n > 1. A tour of G is a directed simple cycle that includes every vertex in V.

 The cost of a tour is the sum of the costs of the edges on the tour. The traveling salesperson problem is to find a tour of minimum cost.


THE TRAVELING SALES PERSON PROBLEM
 Let g(i, S) be the length of a shortest path starting at vertex i, going through all vertices in S, and terminating at vertex 1. The function g(1, V − {1}) is the length of an optimal salesperson tour. From the principle of optimality it follows that

 g(1, V − {1}) = min over 2 ≤ k ≤ n of { c1k + g(k, V − {1, k}) }

 Generalizing (for i ∉ S),

 g(i, S) = min over j ∈ S of { cij + g(j, S − {j}) }
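
A Python sketch of this computation (a Held-Karp style evaluation of g; the bitmask encoding of S and the cost matrix in the usage lines are my own choices, with vertex 1 of the text stored as index 0):

    from functools import lru_cache

    def tsp(c):
        n = len(c)

        @lru_cache(maxsize=None)
        def g(i, S):                       # S is a bitmask of the vertices still to visit
            if S == 0:
                return c[i][0]             # g(i, {}) = c(i, 1): return to the start
            best = float('inf')
            for j in range(n):
                if S & (1 << j):           # try j as the vertex visited right after i
                    best = min(best, c[i][j] + g(j, S & ~(1 << j)))
            return best

        return g(0, (1 << n) - 2)          # g(1, V - {1}): all vertices except the start

    c = [[0, 10, 15, 20],
         [5,  0,  9, 10],
         [6, 13,  0, 12],
         [8,  8,  9,  0]]
    print(tsp(c))   # 35, the classic example tour 1 -> 2 -> 4 -> 3 -> 1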
THE TRAVELING SALES PERSON PROBLEM
THE TRAVELING SALES PERSON PROBLEM
THE TRAVELING SALES PERSON PROBLEM
OPTIMAL BINARY SEARCH TREES
 A binary search tree is one of the most important data structures in computer science. One of its principal

applications is to implement a dictionary, a set of elements with the operations of searching, insertion, and
deletion.

 Given 3 keys, 10, 20, and 30, the possible binary search trees are,

 The total number of binary search trees with n keys is equal to the nth Catalan number,

 c(n) = C(2n, n) / (n + 1) for n > 0, c(0) = 1,

 which grows to infinity as fast as 4^n / n^1.5.
OPTIMAL BINARY SEARCH TREES
 Example 1: Given four keys A, B, C, and D with search probabilities 0.1, 0.2, 0.4, and 0.3, respectively, what is the average number of comparisons to search a key in an optimal binary search tree?

 The average number of comparisons in a successful search:

 in the first of these trees it is 0.1 * 1 + 0.2 * 2 + 0.4 * 3 + 0.3 * 4 = 2.9, and

 for the second one it is 0.1 * 2 + 0.2 * 1 + 0.4 * 2 + 0.3 * 3 = 2.1.

 There are c(4) = 14 binary search trees with these keys.


OPTIMAL BINARY SEARCH TREES
 So let a1, . . . , an be distinct keys ordered from the smallest to the largest and let p1, . . . , pn be the

probabilities of searching for them. Let C(i, j) be the smallest average number of comparisons made in a
successful search in a binary search tree Ti j made up of keys ai, . . . , aj , where i, j are some integer indices, 1
≤ i ≤ j ≤ n.

 Using the dynamic programming approach, find the values of C(i, j) for all smaller instances of the problem, and then obtain the value of C(1, n).

 To derive a recurrence relation, consider all possible ways to choose a root ak among the keys ai, . . . , aj.

 For such a choice, the binary search tree contains the root key ak,

 the left sub-tree T(i, k−1) containing keys ai, . . . , ak−1, optimally arranged, and

 the right sub-tree T(k+1, j) containing keys ak+1, . . . , aj, also optimally arranged.


OPTIMAL BINARY SEARCH TREES
 If we count tree levels starting with 1, to make the comparison count for a key equal to the key’s level, the following recurrence relation is obtained:

 C(i, j) = min over i ≤ k ≤ j of { C(i, k − 1) + C(k + 1, j) } + Σ ps (sum over s = i, . . . , j), for 1 ≤ i ≤ j ≤ n,

 with C(i, i − 1) = 0 for 1 ≤ i ≤ n + 1 (an empty tree) and C(i, i) = pi for 1 ≤ i ≤ n (a single-key tree).
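
A Python sketch of this recurrence (index conventions are mine: p[i-1] holds the probability of key ai, and the zero-initialized borders supply the C(i, i − 1) = 0 entries):

    def optimal_bst(p):
        n = len(p)
        C = [[0.0] * (n + 2) for _ in range(n + 2)]   # C[i][j], 1-based indices
        R = [[0] * (n + 2) for _ in range(n + 2)]     # R[i][j] = root of tree T(i, j)
        for i in range(1, n + 1):
            C[i][i] = p[i - 1]                        # single-key tree
            R[i][i] = i
        for d in range(1, n):                         # d = j - i: growing instance size
            for i in range(1, n - d + 1):
                j = i + d
                k = min(range(i, j + 1),              # best root for keys a_i..a_j
                        key=lambda k: C[i][k - 1] + C[k + 1][j])
                C[i][j] = C[i][k - 1] + C[k + 1][j] + sum(p[i - 1:j])
                R[i][j] = k
        return C[1][n], R

    C1n, R = optimal_bst([0.1, 0.2, 0.4, 0.3])
    print(round(C1n, 1), R[1][4])   # 1.7 average comparisons; root = key 3 (i.e., C)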


OPTIMAL BINARY SEARCH TREES
OPTIMAL BINARY SEARCH TREES
OPTIMAL BINARY SEARCH TREES
 Final tables:
OPTIMAL BINARY SEARCH TREES

 The algorithm’s space efficiency is quadratic, Θ(n²); the time efficiency of this algorithm is cubic, Θ(n³).
OPTIMAL BINARY SEARCH TREES
 Construct an optimal binary search tree for the following four-key set.

 a)

 b)
Input Enhancement in String Matching: Horspool’s Algorithm
 Brute-Force String Matching
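
The brute-force matcher referenced above, as a short Python sketch (illustrative): align the pattern with every possible starting position in the text and compare left to right:

    def brute_force_match(text, pattern):
        n, m = len(text), len(pattern)
        for i in range(n - m + 1):              # candidate starting positions
            j = 0
            while j < m and pattern[j] == text[i + j]:
                j += 1                          # compare left to right
            if j == m:
                return i                        # match starts at index i
        return -1                               # no match

    print(brute_force_match("NOBODY_NOTICED_HIM", "NOT"))   # 7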
Input Enhancement in String Matching: Horspool’s Algorithm
 There are several faster algorithms:

 Knuth-Morris-Pratt algorithm

 Boyer-Moore algorithm

 Horspool’s algorithm

 These algorithms exploit the idea of input enhancement: preprocess the pattern to get some information about it, store this information in a table, and then use this information during an actual search for the pattern in a given text.

 The Knuth-Morris-Pratt algorithm compares the pattern’s characters left to right, whereas the Boyer-Moore algorithm does it right to left.
Horspool’s Algorithm
 Consider, as an example, searching for the pattern BARBER in some text:

 Compare the corresponding pairs of characters in the pattern and the text starting with the last R of the

pattern and moving right to left.


 If all the pattern’s characters match successfully, a matching substring is found. Then the search can be either stopped

altogether or continued if another occurrence of the same pattern is desired.

 If a mismatch occurs, shift the pattern to the right, making as large a shift as possible without risking the possibility of missing a matching substring in the text.

 Horspool’s algorithm determines the size of such a shift by looking at the character c of the text that is

aligned against the last character of the pattern.


Horspool’s Algorithm
 There are four possibilities that can occur:

 Case 1: If there are no c’s in the pattern (Ex: c is letter S), the pattern can safely be shifted by its entire length m.

 Case 2: If there are occurrences of character c in the pattern but it is not the last one there (Ex: c is letter B), the shift should align the rightmost occurrence of c in the pattern with the c in the text:
Horspool’s Algorithm
 There are four possibilities that can occur:

 Case 3: If c happens to be the last character in the pattern but there are no c’s among its other m − 1 characters (Ex: c is letter R in the pattern LEADER), the pattern should be shifted by the entire pattern’s length m:

 Case 4: Finally, if c happens to be the last character in the pattern and there are other c’s among its first m − 1 characters (Ex: c is letter R in the pattern REORDER), the rightmost occurrence of c among the first m − 1 characters in the pattern should be aligned with the text’s c:
Horspool’s Algorithm
 The table’s entries indicate the shift sizes computed by the formula:

 t(c) = the pattern’s length m, if c is not among the first m − 1 characters of the pattern; otherwise, t(c) = the distance from the rightmost c among the first m − 1 characters of the pattern to its last character.
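
A Python sketch of the whole algorithm: the shift table is built from the first m − 1 pattern characters per the formula above, and the search scans right to left, shifting by t(c). The sample text in the usage line is illustrative:

    def shift_table(pattern):
        m = len(pattern)
        table = {}                         # characters absent here get the default shift m
        for i in range(m - 1):             # the rightmost occurrence wins
            table[pattern[i]] = m - 1 - i  # distance to the last pattern character
        return table

    def horspool(text, pattern):
        m, n = len(pattern), len(text)
        t = shift_table(pattern)           # e.g., for BARBER: B->2, A->4, R->3, E->1
        i = m - 1                          # text index aligned with the pattern's end
        while i <= n - 1:
            k = 0                          # number of matched characters, right to left
            while k <= m - 1 and pattern[m - 1 - k] == text[i - k]:
                k += 1
            if k == m:
                return i - m + 1           # match found at this offset
            i += t.get(text[i], m)         # shift by t(c), c = text char under the
                                           # pattern's last position
        return -1

    print(horspool("JIM_SAW_ME_IN_A_BARBERSHOP", "BARBER"))   # 16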
Horspool’s Algorithm
Horspool’s Algorithm
Horspool’s Algorithm
