Data Structures and Algorithms: Unit - III Snapshots
3.0
Introduction
Dynamic programming has developed into a major paradigm of algorithm design in computer science. Richard Bellman coined the name in 1957 in the context of optimal control problems. Here "programming" refers to a series of choices, and "dynamic" conveys that each choice may depend on the current state rather than being decided ahead of time. The main feature of the method is that it can replace an exponential-time enumeration with a polynomial-time computation. This chapter focuses on the general method and develops dynamic programming solutions for problems from different application areas.
3.1
Objective
This chapter discusses the basic traversal methods and various search techniques. Techniques that involve examining every node in a given data object are referred to as traversal methods. The second category contains techniques applicable to graphs; these may not examine all vertices and so are referred to only as search methods.
3.2
Content
Figure 3.1: Graph
A multistage graph G=(V,E) is a directed graph in which the vertices are partitioned into k ≥ 2 disjoint sets Vi, 1 ≤ i ≤ k, such that every edge ⟨u,v⟩ runs from some Vi to Vi+1, with V1 = {s} and Vk = {t}; the multistage graph problem is to find a minimum-cost path from the source s to the sink t. Many problems can be formulated as multistage graph problems. Consider a resource allocation problem in which n units of resource are to be allocated to r projects. If j, 0 ≤ j ≤ n, units of the resource are allocated to project i, then the resulting net profit is N(i,j). The problem is to allocate the resource to the r projects in such a way as to maximize total net profit. This problem can be formulated as an r+1 stage graph problem as follows. Stage i, 1 ≤ i ≤ r, represents project i. There are n+1 vertices V(i,j), 0 ≤ j ≤ n, associated with stage i, 2 ≤ i ≤ r. Stages 1 and r+1 each have one vertex, V(1,0)=s and V(r+1,n)=t, respectively. Vertex V(i,j), 2 ≤ i ≤ r, represents the state in which a total of j units of resource have been allocated to projects 1, 2, ..., i-1. The edges in G are of the form ⟨V(i,j), V(i+1,l)⟩ for all j ≤ l and 1 ≤ i < r. The edge ⟨V(i,j), V(i+1,l)⟩, j ≤ l, is assigned a weight or cost of N(i, l-j) and corresponds to allocating l-j units of resource to project i, 1 ≤ i < r. In addition, G has edges of the type ⟨V(r,j), V(r+1,n)⟩. Each such edge is assigned a weight of max 0≤p≤n-j {N(r,p)}. The resulting graph for a three-project problem with n=4 is shown in Figure 3.3. It should be easy to see that an optimal allocation of resources is defined by a maximum-cost s to t path. This is easily converted into a minimum-cost problem by changing the sign of all the edge costs.
Figure 3.2: Five-stage graph

A dynamic programming formulation for a k-stage graph problem is obtained by first noticing that every s to t path is the result of a sequence of k-2 decisions. The ith decision involves determining which vertex in Vi+1, 1 ≤ i ≤ k-2, is to be on the path. It is easy to see that the principle of optimality holds. Let p(i,j) be a minimum-cost path from vertex j in Vi to vertex t, and let cost(i,j) be the cost of this path. Then, using the forward approach, we obtain
cost(i,j) = min { c(j,l) + cost(i+1,l) }    --(4.3)
           l ∈ Vi+1, ⟨j,l⟩ ∈ E
Since cost(k-1,j) = c(j,t) if ⟨j,t⟩ ∈ E and cost(k-1,j) = ∞ if ⟨j,t⟩ ∉ E, (4.3) may be solved for cost(1,s) by first computing cost(k-2,j) for all j ∈ Vk-2, then cost(k-3,j) for all j ∈ Vk-3, and so on, and finally cost(1,s). Trying this out on the graph of Figure 3.2, the cost of every vertex can be computed stage by stage, working back from the stage adjacent to t.
Note that in the calculation of cost(2,2), the values of cost(3,6), cost(3,7) and cost(3,8) are reused, and so their re-computation is avoided. A minimum-cost s to t path has a cost of 16. This path can be determined easily if one records the decision made at each state (vertex). Let d(i,j) be the value of l (where l is a node) that minimizes c(j,l) + cost(i+1,l). For the graph of Figure 3.2, the decisions obtained are
d(3,6) = 10;   d(3,7) = 10;   d(3,8) = 10;
d(2,2) = 7;    d(2,3) = 6;    d(2,4) = 8;    d(2,5) = 8;
d(1,1) = 2;
Let the minimum-cost path be s=1, v2, v3, ..., vk-1, t. It is easy to see that v2 = d(1,1) = 2, v3 = d(2, d(1,1)) = d(2,2) = 7, and v4 = d(3, d(2, d(1,1))) = d(3,7) = 10. Before writing an algorithm to solve (4.3) for a general k-stage graph, let us impose an ordering on the vertices in V. This ordering makes it easier to write the algorithm. We require that the n vertices in V are indexed 1 through n. Indices are assigned in order of stages: first s is assigned index 1, then the vertices in V2 are assigned indices, then the vertices in V3, and so on. Vertex t has index n. Hence, indices assigned to vertices in Vi+1 are bigger than those assigned to vertices in Vi. As a result of this indexing scheme, cost and d can be computed in the order n-1, n-2, ..., 1. The first subscript in cost, p, and d only identifies the stage number and is omitted in the algorithm. The resulting algorithm, in pseudocode, is FGraph (Algorithm 3.1). The complexity analysis of the function FGraph is fairly straightforward. If G is represented by its adjacency lists, then r in line 9 of Algorithm 3.1 can be found in time proportional to the degree of vertex j. Hence, if G has |E| edges, then the time for the for loop of line 7 is Θ(|V|+|E|). The time for the for loop of line 16 is Θ(k). Hence, the total time is Θ(|V|+|E|). In addition to the space needed for the input, space is needed for cost[], d[], and p[].
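Algorithm 3.1 (FGraph) is not reproduced here. As a stand-in, the following Python sketch evaluates the forward recurrence (4.3) under the indexing scheme just described; the function name, the edge-dictionary input format, and the small example at the end are illustrative assumptions rather than the text's pseudocode.

import math

def fgraph_forward(n, edges):
    # Forward-approach multistage-graph DP; a sketch of equation (4.3).
    # n: number of vertices; vertex 1 is s, vertex n is t, and vertex
    # indices increase with stage number, as required above.
    # edges: dict mapping (j, l) to the cost c(j, l) of edge <j, l>.
    adj = [[] for _ in range(n + 1)]          # adjacency lists
    for (j, l), c in edges.items():
        adj[j].append((l, c))
    cost = [math.inf] * (n + 1)               # cost[j] = min cost of a j-to-t path
    d = [0] * (n + 1)                         # d[j] = next vertex on that path
    cost[n] = 0.0
    for j in range(n - 1, 0, -1):             # stage ordering makes this valid
        for l, c in adj[j]:
            if c + cost[l] < cost[j]:
                cost[j] = c + cost[l]
                d[j] = l
    path, v = [1], 1                          # recover the path from the decisions
    while v != n:
        v = d[v]
        path.append(v)
    return cost[1], path

# A tiny hypothetical 3-stage example, not the graph of Figure 3.2:
print(fgraph_forward(4, {(1, 2): 1, (1, 3): 2, (2, 4): 3, (3, 4): 1}))   # (3, [1, 3, 4])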
The multistage graph problem can also be solved using the backward approach. Let bp(i,j) be a minimum-cost path from vertex s to a vertex j in Vi, and let bcost(i,j) be the cost of bp(i,j). From the backward approach we obtain

bcost(i,j) = min { bcost(i-1,l) + c(l,j) }    --(4.4)
            l ∈ Vi-1, ⟨l,j⟩ ∈ E

Since bcost(2,j) = c(1,j) if ⟨1,j⟩ ∈ E and bcost(2,j) = ∞ if ⟨1,j⟩ ∉ E, bcost(i,j) can be computed using (4.4) by first computing bcost for i=3, then for i=4, and so on. For the graph of Figure 3.2, we obtain
bcost(3,6) = min{ bcost(2,2)+c(2,6), bcost(2,3)+c(3,6) } = min{ 9+4, 7+2 } = 9
bcost(3,7) = 11
bcost(3,8) = 10
bcost(4,9) = 15
bcost(4,10) = 14
bcost(4,11) = 16
The corresponding pseudocode algorithm to obtain a minimum-cost s-to-t path is BGraph (Algorithm 3.2). The first subscript on bcost, p, and d is omitted for the same reasons as before. This algorithm has the same complexity as FGraph, provided G is now represented by its inverse adjacency lists.

Algorithm BGraph(G, k, n, p)
// Same function as FGraph.
{
    bcost[1] := 0.0;
    for j := 2 to n do
    { // Compute bcost[j].
        Let r be such that ⟨r, j⟩ is an edge of G and bcost[r] + c[r, j] is minimum;
        bcost[j] := bcost[r] + c[r, j];
        d[j] := r;
    }
    // Find a minimum-cost path.
    p[1] := 1; p[k] := n;
    for j := k-1 to 2 do p[j] := d[p[j+1]];
}
Algorithm 3.2: Multistage graph pseudocode corresponding to the backward approach

Both FGraph and BGraph work correctly even on a more generalized version of multistage graphs. In this generalization, the graph is permitted to have edges ⟨u,v⟩ such that u ∈ Vi, v ∈ Vj, and i < j.
Let A^k(i,j) be the length of a shortest path from i to j going through no vertex of index greater than k. Clearly, A^0(i,j) = cost(i,j), 1 ≤ i ≤ n, 1 ≤ j ≤ n. One can obtain a recurrence for A^k(i,j) using an argument similar to that used before. A shortest path from i to j going through no vertex higher than k either goes through vertex k or it does not. If it does, A^k(i,j) = A^{k-1}(i,k) + A^{k-1}(k,j). If it does not, then no intermediate vertex has index greater than k-1, and hence A^k(i,j) = A^{k-1}(i,j). Combining the two cases, we get

A^k(i,j) = min{ A^{k-1}(i,j), A^{k-1}(i,k) + A^{k-1}(k,j) },  k ≥ 1    --(4.6)
Algorithm AllPaths(cost, A, n)
// cost[1:n, 1:n] is the cost adjacency matrix of a graph with n vertices;
// A[i,j] is the cost of a shortest path from vertex i to vertex j;
// cost[i,i] = 0.0, for 1 ≤ i ≤ n.
{
    for i := 1 to n do
        for j := 1 to n do
            A[i,j] := cost[i,j];   // Copy cost into A.
    for k := 1 to n do
        for i := 1 to n do
            for j := 1 to n do
                A[i,j] := min(A[i,j], A[i,k] + A[k,j]);
}
Algorithm 3.3: Function to compute lengths of shortest paths

Example: The graph of Figure 3.4(a) has the cost matrix of Figure 3.4(b). The initial A matrix, A^(0), plus the values after the three iterations, A^(1), A^(2) and A^(3), are given in Figure 3.4.
A^(0)   1   2   3        A^(1)   1   2   3
  1     0   6   3          1     0   6   3
  2     4   0   ∞          2     4   0   7
  3    11   2   0          3    11   2   0

A^(2)   1   2   3        A^(3)   1   2   3
  1     0   6   3          1     0   5   3
  2     4   0   7          2     4   0   7
  3     6   2   0          3     6   2   0
Figure 3.4: Directed graph and associated matrices

Let M = max{ cost(i,j) | ⟨i,j⟩ ∈ E(G) }. It is easy to see that A^n(i,j) ≤ (n-1)M. From the working of AllPaths, it is clear that if ⟨i,j⟩ ∉ E(G) and i ≠ j, then one can initialize cost(i,j) to any number greater than (n-1)M (rather than the maximum allowable floating-point number). If, at termination, A(i,j) > (n-1)M, then there is no directed path from i to j in G. Even for this choice of ∞, care should be taken to avoid any floating-point overflow. The time needed by AllPaths (Algorithm 3.3) is especially easy to determine because the looping is independent of the data in the matrix A. The innermost assignment is iterated n³ times, and so the time for AllPaths is Θ(n³).
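As a cross-check of the table above, AllPaths translates directly into Python. The sketch below is 0-indexed rather than 1-indexed, uses math.inf for ∞, and feeds in the cost matrix of Figure 3.4(b) as reconstructed here; the names are illustrative only.

import math

def all_paths(cost):
    # Computes all-pairs shortest path lengths following Algorithm 3.3.
    # cost is an n x n matrix with cost[i][i] = 0 and math.inf where
    # there is no edge. Returns the matrix A of equation (4.6).
    n = len(cost)
    A = [row[:] for row in cost]              # copy cost into A
    for k in range(n):                        # allow vertex k as an intermediate
        for i in range(n):
            for j in range(n):
                A[i][j] = min(A[i][j], A[i][k] + A[k][j])
    return A

# Cost matrix of Figure 3.4(b), 0-indexed:
cost = [[0, 6, 3],
        [4, 0, math.inf],
        [11, 2, 0]]
print(all_paths(cost))    # expected: [[0, 5, 3], [4, 0, 7], [6, 2, 0]]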
The 0/1 knapsack problem considered here is: maximize Σ 1≤i≤n pi·xi subject to Σ 1≤i≤n wi·xi ≤ m and xi = 0 or 1, 1 ≤ i ≤ n.    --(4.1)
More generally, let KNAP(l, j, y) denote the problem: maximize Σ l≤i≤j pi·xi subject to Σ l≤i≤j wi·xi ≤ y, xi = 0 or 1. Then
the knapsack problem (4.1) is KNAP(1,n,m). Let y1, y2, ..., yn be an optimal sequence of 0/1 values for x1, x2, ..., xn, respectively. If y1 = 0, then y2, y3, ..., yn must constitute an optimal sequence for the problem KNAP(2,n,m). If it does not, then y1, y2, ..., yn is not an optimal sequence for KNAP(1,n,m). If y1 = 1, then y2, ..., yn must be an optimal sequence for the problem KNAP(2,n,m-w1). If it is not, then there is another 0/1 sequence z2, z3, ..., zn such that Σ 2≤i≤n wi·zi ≤ m-w1 and Σ 2≤i≤n pi·zi > Σ 2≤i≤n pi·yi. Hence, the sequence y1, z2, z3, ..., zn is a sequence for (4.1) with greater value. Again the principle of optimality applies. Let S0 be the initial problem state. Assume that n decisions di, 1 ≤ i ≤ n, have to be made. Let D1 = {r1, r2, ..., rj} be the set of possible decision values for d1, and let Si be the problem state following the choice of decision ri, 1 ≤ i ≤ j. Let Γi be an optimal sequence of decisions with respect to the problem state Si. Then, when the principle of optimality holds, an optimal sequence of decisions with respect to S0 is the best of the decision sequences ri,Γi, 1 ≤ i ≤ j. Let gj(y) be the value of an optimal solution to KNAP(j+1,n,y). Clearly, g0(m) is the value of an optimal solution to KNAP(1,n,m). The possible decisions for x1 are 0 and 1 (D1 = {0,1}). From the principle of optimality it follows that

g0(m) = max{ g1(m), g1(m-w1) + p1 }    --(4.2)
While the principle of optimality has been stated only with respect to the initial state and decision, it can be applied equally well to intermediate states and decisions. Another important feature of the dynamic programming approach is that optimal solutions to subproblems are retained so as to avoid recomputing their values. The use of these tabulated values makes it natural to recast the recursive equations into an iterative algorithm. A solution to the knapsack problem can be obtained by making a sequence of decisions on the variables x1, x2, ..., xn. A decision on variable xi involves determining which of the values 0 or 1 is to be assigned to it. Let us assume that decisions on the xi are made in the order xn, xn-1, ..., x1. Following a decision on xn, one may be in one of two possible states: the capacity remaining in the knapsack is m and no profit has accrued, or the capacity remaining is m-wn and a profit of pn has accrued. It is clear that the remaining decisions xn-1, ..., x1 must be optimal with respect to the problem state resulting from the decision on xn. Otherwise, xn, ..., x1 will not be optimal. Hence, the principle of optimality holds.
Let fi(y) be the value of an optimal solution to KNAP(1, i, y). From the principle of optimality,

fi(y) = max{ fi-1(y), fi-1(y-wi) + pi }    --(4.8)

Equation (4.8) can be solved for fn(m) by beginning with the knowledge f0(y) = 0 for all y ≥ 0 and fi(y) = -∞ for y < 0. Then f1, f2, ..., fn can be successively computed using (4.8). When the wi's are integer, we need to compute fi(y) only for integer y, 0 ≤ y ≤ m. Since fi(y) = -∞ for y < 0, these function values need not be computed explicitly. Since each fi can be computed from fi-1 in Θ(m) time, it takes Θ(mn) time to compute fn. When the wi's are real numbers, fi(y) is needed for real numbers y such that 0 ≤ y ≤ m, so fi cannot be explicitly computed for all y in this range. Even when the wi's are integer, the explicit Θ(mn) computation of fn may not be the most efficient computation, so we explore an alternative method that covers both cases. Notice that fi(y) is an ascending step function; that is, there are a finite number of y's, 0 = y1 < y2 < ... < yk, such that fi(y1) < fi(y2) < ... < fi(yk); fi(y) = -∞ for y < y1; fi(y) = fi(yk) for y ≥ yk; and fi(y) = fi(yj) for yj ≤ y < yj+1. So we need to compute only fi(yj), 1 ≤ j ≤ k. We use the ordered set S^i = {(fi(yj), yj) | 1 ≤ j ≤ k} to represent fi(y). Each member of S^i is a pair (P, W), where P = fi(yj) and W = yj. Notice that S^0 = {(0,0)}. We can compute S^{i+1} from S^i by first computing

S^i_1 = {(P, W) | (P - pi+1, W - wi+1) ∈ S^i}    --(4.9)
Now, S^{i+1} can be computed by merging the pairs in S^i and S^i_1 together. Note that if S^{i+1} contains two pairs (Pj, Wj) and (Pk, Wk) with the property that Pj ≤ Pk and Wj ≥ Wk, then the pair (Pj, Wj) can be discarded because of (4.8). Discarding or purging rules such as this one are also known as dominance rules; dominated tuples get purged. In the above, (Pk, Wk) dominates (Pj, Wj). Interestingly, the strategy we have come up with can also be derived by attempting to solve the knapsack problem via a systematic examination of the up to 2^n possibilities for x1, x2, ..., xn. Let S^i represent the possible states resulting from the 2^i decision sequences for x1, x2, ..., xi. A state refers to a pair (Pj, Wj), Wj being the total weight of objects included in the knapsack and Pj being the corresponding profit. To obtain S^{i+1}, note that the possibilities for xi+1 are xi+1 = 0 or xi+1 = 1. When xi+1 = 0, the resulting states are the same as for S^i. When xi+1 = 1, the resulting states are obtained by adding (pi+1, wi+1) to each state in S^i. Call the set of these additional states S^i_1; it is the same as in Equation (4.9). Now, S^{i+1} can be computed by merging the states in S^i and S^i_1 together.
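Before the book's DKnap pseudocode below, here is a compact Python sketch of this set-merging strategy with dominance purging; the instance at the end is purely illustrative (the text's Example 1 is not reproduced here).

def knapsack_sets(p, w, m):
    # 0/1 knapsack by merging the pair sets S^i with dominance purging.
    # p, w are profit and weight lists; m is the capacity.
    S = [(0, 0)]                              # S^0 = {(0, 0)}; pairs are (P, W)
    for pi, wi in zip(p, w):
        # S1 holds the states reachable by also taking the current object.
        S1 = [(P + pi, W + wi) for (P, W) in S if W + wi <= m]
        merged = sorted(S + S1, key=lambda s: (s[1], -s[0]))
        # Purge dominated pairs: keep (P, W) only if P exceeds every
        # profit seen at a smaller or equal weight.
        S, best = [], -1
        for P, W in merged:
            if P > best:
                S.append((P, W))
                best = P
    return max(P for P, W in S)

# Hypothetical instance, not from the text:
print(knapsack_sets(p=[1, 2, 5], w=[2, 3, 4], m=6))   # prints 6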
PW = record { float p; float w; }

Algorithm DKnap(p, w, x, n, m)
{
    // pair[] is an array of PWs.
    b[0] := 1; pair[1].p := pair[1].w := 0.0;       // S^0
    t := 1; h := 1;                                  // Start and end of S^0
    b[1] := next := 2;                               // Next free spot in pair[]
    for i := 1 to n-1 do
    { // Generate S^i.
        k := t;
        u := Largest(pair, w, t, h, i, m);
        for j := t to u do
        { // Generate S^{i-1}_1 and merge.
            pp := pair[j].p + p[i]; ww := pair[j].w + w[i];
            // (pp, ww) is the next element in S^{i-1}_1.
            while ((k ≤ h) and (pair[k].w ≤ ww)) do
            {
                pair[next].p := pair[k].p; pair[next].w := pair[k].w;
                next := next + 1; k := k + 1;
            }
            if ((k ≤ h) and (pair[k].w = ww)) then
            {
                if pp < pair[k].p then pp := pair[k].p;
                k := k + 1;
            }
            // ... (the remaining merge, purge and traceback steps of DKnap follow.)
(a) Directed graph on four vertices (figure omitted)
(b) Cost matrix:
        1    2    3    4
   1    0   10   15   20
   2    5    0    9   10
   3    6   13    0   12
   4    8    8    9    0
For the traveling salesperson problem, let g(i,S) be the length of a shortest path starting at vertex i, going through all vertices in S, and terminating at vertex 1. Then g(1, V-{1}) is the length of an optimal tour, (4.10) expresses it as g(1, V-{1}) = min 2≤k≤n { c1k + g(k, V-{1,k}) }, and (4.11) is the general recurrence g(i,S) = min j∈S { cij + g(j, S-{j}) }, with g(i,∅) = ci1.

Thus g(2,∅) = c21 = 5, g(3,∅) = c31 = 6, and g(4,∅) = c41 = 8. Using (4.11), we obtain
g(2,{3}) = c23 + g(3,∅) = 15      g(2,{4}) = 18
g(3,{2}) = 18                      g(3,{4}) = 20
g(4,{2}) = 13                      g(4,{3}) = 15
Next, we compute g(i,S) with |S| = 2, i ≠ 1, 1 ∉ S and i ∉ S:
g(2,{3,4}) = min{ c23 + g(3,{4}), c24 + g(4,{3}) } = 25
g(3,{2,4}) = min{ c32 + g(2,{4}), c34 + g(4,{2}) } = 25
g(4,{2,3}) = min{ c42 + g(2,{3}), c43 + g(3,{2}) } = 23
Finally, from (4.10) we obtain
g(1,{2,3,4}) = min{ c12 + g(2,{3,4}), c13 + g(3,{2,4}), c14 + g(4,{2,3}) } = min{ 35, 40, 43 } = 35.
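The recurrence (4.11) can also be evaluated mechanically. The Python sketch below does so with bit-mask subsets, using the cost matrix of part (b) above; the function and variable names are illustrative assumptions, not the text's algorithm.

import math
from functools import lru_cache

# Cost matrix of part (b); cost[i][j] is c(i+1, j+1) in the text's numbering.
cost = [[0, 10, 15, 20],
        [5,  0,  9, 10],
        [6, 13,  0, 12],
        [8,  8,  9,  0]]
n = len(cost)

@lru_cache(maxsize=None)
def g(i, S):
    # Length of a shortest path from vertex i through all vertices in the
    # bitmask S (bit j-1 stands for vertex j, 0-indexed) and ending at vertex 0.
    if S == 0:
        return cost[i][0]                     # g(i, empty) = c(i, 1)
    return min(cost[i][j] + g(j, S & ~(1 << (j - 1)))
               for j in range(1, n) if S & (1 << (j - 1)))

full = (1 << (n - 1)) - 1                     # the subset {2, 3, 4} of the text
print(g(0, full))                             # optimal tour length: 35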
3.2.7
Introduction to Searching
Data stored in an organized manner must be accessed for processing. Locating a particular data item in memory involves searching for it. Searching is a technique in which the memory is scanned for the required data. Computer systems are often used to store large amounts of data from which individual records must be retrieved according to some search specification. Thus the efficient storage of data to facilitate fast searching is an important issue.
3.2.8
Basic Search Terminology
There are several types of searching techniques available, and understanding them requires familiarity with certain terms. A table or a file is a group of elements called records. Associated with each record is a key, which is used to differentiate among the records. When the key is present within a record at a specific offset from the start of the record, it is called an internal key or an embedded key. Alternatively, there may be a separate table of keys that includes pointers to the records; such keys are known as external keys. For each file there is at least one set of keys that is unique; such a key is known as a primary key. In a file of names and addresses, if the state is used as the key for a particular search, it will probably not be unique, since there may be two records with the same state in the file. Such a key is called a secondary key.
3.2.9
Sequential Search
The simplest form of search is the sequential or linear search. This search is applicable to a table organized either as an array or as a linked list. The simplest technique for searching an unordered table for a particular record is to scan each entry in the table sequentially until the record is found; it is appropriate when the storage medium lacks any type of direct-access facility. The logic of sequential search is extremely straightforward: it begins with the first available record and repeatedly proceeds to the next record until the search key is found or it can be concluded that it will not be found. This method, which traverses the data sequentially to locate an item, is called linear or sequential search. To simplify the loop test, the item is placed in the position following the last element of data (a sentinel), so that an unsuccessful search terminates with:
loc=n+1
Algorithm for linear search:

Algorithm Linear(data, n, item, loc)
// data is a linear array of n items and item is a given item of information.
{
    data[n+1] := item;             // sentinel
    loc := 1;
    while (data[loc] ≠ item) do    // search for item
        loc := loc + 1;
    if (loc = n+1) then loc := 0;  // item not present
}

To examine how long it will take to find an item matching a key in the collections discussed so far, the following are to be considered: the average time, the worst-case time, and the best possible time. However, the general concern is with the worst-case time, as calculations based on worst-case times lead to guaranteed performance predictions. Conveniently, the worst-case times are generally easier to calculate than average times.
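In Python the sentinel version of this search reads roughly as follows (a sketch; the input list is copied so that appending the sentinel does not disturb the caller's data).

def linear_search(data, item):
    # Sentinel-based sequential search. Returns the 1-based position of
    # item in data, or 0 if it is not present (mirroring the pseudocode).
    a = list(data) + [item]        # place the sentinel after the last element
    loc = 1
    while a[loc - 1] != item:      # guaranteed to stop at the sentinel
        loc += 1
    return 0 if loc == len(data) + 1 else loc

print(linear_search([7, 3, 9, 4], 9))   # 3
print(linear_search([7, 3, 9, 4], 5))   # 0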
3.2.10
Binary Search
Suppose the items are placed in an array that is sorted, in either ascending or descending order, on the key. A much better performance can then be obtained with an extremely efficient searching algorithm known as binary search. In binary search, the key is compared with the item in the middle position of the array. If there is a match, the position is returned. If the key is less than the middle key, then the item sought must lie in the lower half of the array; if it is greater, then the item sought must lie in the upper half. The procedure is then repeated on the lower (or upper) half of the array. In summary:
The key is compared with the middle item of the array.
If there is a match, it is returned successfully.
If the key is smaller, the lower half of the array is searched.
If the key is greater, the upper half of the array is searched.
This procedure is repeated until the array is exhausted or the item is found.
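A minimal iterative binary search over an ascending array, as a Python sketch:

def binary_search(a, key):
    # Binary search in the ascending list a. Returns an index of key,
    # or -1 if key is not present.
    low, high = 0, len(a) - 1
    while low <= high:
        mid = (low + high) // 2
        if a[mid] == key:
            return mid                 # match in the middle position
        elif key < a[mid]:
            high = mid - 1             # search the lower half
        else:
            low = mid + 1              # search the upper half
    return -1

print(binary_search([2, 5, 8, 12, 16, 23, 38], 16))   # 4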
Figure 3.6.a: Indexed sequential file

The real advantage of the indexed sequential method is that the items in the table can still be examined sequentially if all the records in the file must be accessed, yet the search time for a particular item is sharply reduced. A sequential search is performed on the smaller index rather than on the larger table. Once the correct index position has been found, a second sequential search is performed on a small portion of the record table itself. The use of an index is applicable to a sorted table stored as a linked list, as well as to one stored as an array. Use of a linked list implies a larger space overhead for pointers, although insertions and deletions can be performed much more readily. If the table is so large that even the use of an index does not achieve sufficient efficiency (either because the index is large in order to reduce sequential searching in the table, or because the index is small so that adjacent keys in the index are far from each other in the table), a secondary index can be used. The secondary index acts as an index to the primary index.
Figure 3.6.b: Use of a secondary index

Interpolation search
Another technique for searching an ordered array is called interpolation search. If the keys are uniformly distributed between k(0) and k(n-1), this method may be even more efficient than binary search. A variation of interpolation search, called robust interpolation search (or fast search), attempts to remedy the poor practical behavior of interpolation search when the keys are not uniformly distributed.
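When the keys are (roughly) uniformly distributed, the probe position can be interpolated instead of halved. The following Python sketch of the basic interpolation probe assumes ascending numeric keys; the names are illustrative.

def interpolation_search(a, key):
    # Interpolation search in the ascending numeric list a.
    # Returns an index of key, or -1 if it is not present.
    low, high = 0, len(a) - 1
    while low <= high and a[low] <= key <= a[high]:
        if a[high] == a[low]:                  # avoid division by zero
            return low if a[low] == key else -1
        # Probe where the key "ought" to be under a uniform distribution.
        pos = low + (key - a[low]) * (high - low) // (a[high] - a[low])
        if a[pos] == key:
            return pos
        elif a[pos] < key:
            low = pos + 1
        else:
            high = pos - 1
    return -1

print(interpolation_search([10, 20, 30, 40, 50, 60], 40))   # 3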
3.2.11
Traversal Techniques
During a search or traversal, the fields of a node may be used several times, and it may be necessary to distinguish the nodes that have been searched from those that have not. The nodes that have been examined are said to be visited. Visiting a node may involve printing its data field, evaluating the operation specified by the node in the case of a binary tree representing an expression, setting a mark bit to one or zero, and so on. The term visited is used rather than a term for the specific function performed on the node.

Techniques for Binary Trees
Many problems can be solved by manipulating binary trees or graphs. This manipulation requires us to determine a vertex or a subset of vertices in the given data object that satisfies a given property, for instance finding all vertices in a binary tree with a data value less than x, or finding all vertices in a given graph G that can be reached from another given vertex v. The determination of this subset of vertices can be carried out by systematically examining the vertices of the given data object; this often takes the form of a search in the data object. A search that necessitates the examination of every vertex in the object being searched is called a traversal. There are many operations that could be performed on binary trees. One that is done frequently is traversing a tree, that is, visiting each node in the tree exactly once. A traversal produces a linear order for the information in a tree, and this linear order may be familiar and useful. When traversing a binary tree, each node and its subtrees are treated in the same fashion. If L, D, and R stand for moving left, printing the data, and moving right when at a node, then there are six possible combinations of traversal: LDR, LRD, DLR, DRL, RDL, and RLD. If one adopts the convention that left is traversed before right, then only three traversals are possible: LDR, LRD, and DLR, named inorder, postorder, and preorder respectively. A recursive formulation of inorder traversal is shown below.

treenode = record
{
    Type data;          // Type is the data type of data.
    treenode *lchild;
    treenode *rchild;
}

Algorithm InOrder(t)
// t is a binary tree. Each node of t has three fields: lchild, data and rchild.
{
    if t ≠ 0 then
    {
        InOrder(t→lchild);
        Visit(t);
        InOrder(t→rchild);
    }
}
Algorithm: Recursive formulation of inorder traversal

Theorem: Let T(n) and S(n) respectively represent the time and space needed by any one of the traversal algorithms when the input tree t has n ≥ 0 nodes. If the time and space needed to visit a node are Θ(1), then T(n) = Θ(n) and S(n) = O(n). Figure 3.7 below shows a binary tree.
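The three traversals can be written almost verbatim in Python; this is a minimal sketch with an assumed node class, not the text's record type.

class TreeNode:
    def __init__(self, data, lchild=None, rchild=None):
        self.data, self.lchild, self.rchild = data, lchild, rchild

def inorder(t, visit=print):           # LDR
    if t is not None:
        inorder(t.lchild, visit)
        visit(t.data)
        inorder(t.rchild, visit)

def preorder(t, visit=print):          # DLR
    if t is not None:
        visit(t.data)
        preorder(t.lchild, visit)
        preorder(t.rchild, visit)

def postorder(t, visit=print):         # LRD
    if t is not None:
        postorder(t.lchild, visit)
        postorder(t.rchild, visit)
        visit(t.data)

root = TreeNode('A', TreeNode('B', TreeNode('D')), TreeNode('C'))
inorder(root)     # prints D, B, A, C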
Figure 3.7: A binary tree

Techniques for Graphs
In its simplest form, graph search requires us to determine whether there exists a path in the given graph G=(V,E) that starts at vertex v and ends at vertex u. A more general form is to determine, for a given starting vertex v ∈ V, all vertices u such that there is a path from v to u. This can be solved by starting at vertex v and systematically searching the graph G for vertices that can be reached from v. Two such search methods are:
Depth First Search
Breadth First Search
3.2.12
Optimization
Optimization is the process of improving the speed at which a program executes. Depending on context it may refer to the human process of making improvements at the source level or to a compiler's efforts to re-arrange code at the assembly level.
Code Optimization
Recompile your program with profiling enabled and with whatever optimization options are compatible with profiling. Run your program again on real-world data and generate a profiling report. Figure out which function uses the most CPU time, then look over it very carefully and see if any of the following approaches might be useful. Make one change at a time and then run the profiler again. Repeat the process until there is no obvious bottleneck or the program runs sufficiently fast.

Choose a Better Algorithm
First decide on what the code has to do. Become familiar with the body of literature that describes your specialty, and learn and use the most appropriate algorithms. Familiarize yourself with O(n) notation, which is very commonly used to describe the running time of algorithms. Some of the obvious replacements:

Slow algorithm               Replace with
sequential search            binary search or hash lookup
insertion or bubble sort     quicksort, merge sort, radix sort

The data structure should also be chosen appropriately. A linked list would be a good option if you will be doing a lot of insertions and deletions at random positions, whereas an array would be better if you will be doing binary searching.

Write Clear, Simple Code
Code that is clear and readable to humans is also clear and readable to compilers. Complicated expressions are harder to optimize and can cause the compiler to "fall back" to a less intense mode of optimization. Part of clarity is making chunks of code into functions when appropriate; the cost of a function call is extremely small on modern machines. Writing clean, portable code also helps in quickly moving to the latest, fastest machine and offering that as a solution to customers who are interested in speed.

Perspective
Take note of operations that take time. Among the slowest are opening a file, reading or writing significant amounts of data, starting a new process, searching, and sorting.
3.2.13
AND/OR Graphs
Many complex problems can be broken down into a series of subproblems such that solving these subproblems either yields a solution to the original problem or leaves problems that are sufficiently primitive as to be trivially solvable. This breaking down of a complex problem into several subproblems can be represented by a directed graph-like structure in which nodes represent problems and descendants of nodes represent the associated subproblems.
Example: The graph of Figure 3.8(a) represents a problem A that can be solved by solving either both the subproblems B and C or the single subproblem D or E.
Figure 3.8: Graphs representing problems

Groups of subproblems that must all be solved in order to imply a solution to the parent node are joined together by an arc going across the respective edges. By introducing dummy nodes as in Figure 3.8(b), all nodes can be made to be such that their solution requires either all descendants to be solved or only one descendant to be solved. Nodes of the first type are called AND nodes and those of the latter type are called OR nodes. Nodes A and A'' of Figure 3.8(b) are OR nodes whereas node A' is an AND node. AND nodes are drawn with an arc across all edges leaving the node. Nodes with no descendants are called terminal. Terminal nodes represent primitive problems and are marked either solvable or not solvable; solvable terminal nodes are represented by rectangles. An AND/OR graph need not always be a tree. Problem reduction is the process of breaking down a problem into several subproblems. Problem reduction has been used on such problems as theorem proving, symbolic integration, and the analysis of industrial schedules. When problem reduction is used, two different problems may generate a common subproblem. In this case, it may be desirable to have only one node representing the subproblem. Figure 3.9 shows two AND/OR graphs for cases in which this is done.
Figure 3.9: Two AND/OR graphs that are not trees

Note that the graph is no longer a tree. Furthermore, such graphs may have directed cycles, as in Figure 3.9(b). The presence of a directed cycle does not in itself imply the insolvability of the problem. In fact, solving the primitive problems G, H and I can help in solving problem A; this leads to the solution of D and E and hence of B and C. A subgraph of solvable nodes that shows that the problem is solved is called a solution graph. Possible solution graphs for the graphs of Figure 3.9 are shown by heavy edges. Let there be a cost associated with each edge in the AND/OR graph. The cost of a solution graph H of an AND/OR graph G is the sum of the costs of the edges in H. The AND/OR graph decision problem (AOG) is to determine whether G has a solution graph of cost at most k, for k a given input.
Theorem: CNF-satisfiability reduces to the AND/OR graph decision problem.
Proof: Let P be a propositional formula in CNF. Let us see how to transform a formula P in CNF into an AND/OR graph such that the AND/OR graph so obtained has a certain minimum-cost solution if and only if P is satisfiable. Let
P = C1 ∧ C2 ∧ ... ∧ Ck,  where Ci = ∨j lij
and the lij are literals. The variables of P, V(P), are x1, x2, ..., xn. The AND/OR graph will have nodes as follows:
2. The node S is an AND node with descendant nodes P, x1, x2, ..., xn.
3. Each node xi represents the corresponding variable xi in the formula P.
4. Each xi is an OR node with two descendants denoted Txi and Fxi respectively. Solving Txi corresponds to assigning a truth value of true to the variable xi; solving Fxi corresponds to assigning false.
5. Each node of type Txi or Fxi has exactly one descendant node that is terminal (i.e., has no edges leaving it). These terminal nodes are denoted v1, v2, ..., v2n.

To complete the construction of the AND/OR graph, the following edges and costs are added:
1. From each node Ci an edge ⟨Ci, Txj⟩ is added if xj occurs in clause Ci. An edge ⟨Ci, Fxj⟩ is added if the negation of xj occurs in clause Ci. This is done for all variables xj appearing in the clause Ci. Clause Ci is designated an OR node.
2. Edges from nodes of type Txi or Fxi to their respective terminal nodes are assigned a weight, or cost, of 1.
3. All other edges have a cost of 0.

The node S can be solved by solving each of the nodes P, x1, x2, ..., xn. Solving the nodes x1, x2, ..., xn costs n. To solve P, we must solve all the nodes C1, C2, ..., Ck. The cost of solving a node Ci is at most 1. However, if one of its descendant nodes was solved while solving the nodes x1, x2, ..., xn, then the additional cost to solve Ci is 0, as the edges to its descendant nodes have cost 0 and one of its descendants has already been solved. That is, a node Ci can be solved at no extra cost if one of the literals occurring in the clause Ci has been assigned a value of true. If there is some assignment of truth values to the xi's such that at least one literal in each clause is true under that assignment, that is, if the formula P is satisfiable, it follows that the entire graph can be solved at a cost of n. If P is not satisfiable, then the cost is more than n. The construction clearly takes only polynomial time. This completes the proof.
3.2.14
Bi-directional Components
The ability to combine separate reusable software components to form a complete program is necessary for effective software reuse. Views provide a clean, flexible, and efficient mechanism for combining reusable software components. A view describes how an application data type implements features of an abstract type; it provides a bi-directional mapping between the application type and the abstract type.
3.2.15
Depth First Search
A depth first search of a graph differs from a breadth first search in that the exploration of a vertex v is suspended as soon as a new vertex is reached. At that time the exploration of the new vertex u begins. When this new vertex has been explored, the exploration of v continues. The search terminates when all reached vertices have been fully explored. This search process is best described recursively, as in the algorithm below.

Algorithm DFS(v)
// Visit all vertices reachable from v.
{
    Mark v as visited;
    for each vertex w such that ⟨v,w⟩ is an edge of G do
    {
        if w has not been visited then DFS(w);
    }
}
(a) Graph G (figure omitted); (b) a directed graph (figure omitted); (c) adjacency lists for G:
[1]: 2, 3         [5]: 2, 8
[2]: 1, 4, 5      [6]: 3, 8
[3]: 1, 6, 7      [7]: 3, 8
[4]: 2, 8         [8]: 4, 5, 6, 7
Figure 3.10: Example graphs and adjacency lists

Example: A depth first search of the graph of Figure 3.10(a), starting at vertex 1 and using the adjacency lists of Figure 3.10(c), results in the vertices being visited in the order 1, 2, 4, 8, 5, 6, 3, 7. One can easily prove that DFS visits all vertices reachable from vertex v. If T(n,e) and S(n,e) represent the maximum time and maximum additional space taken by DFS for an n-vertex, e-edge graph, then S(n,e) = Θ(n) and T(n,e) = Θ(n+e) if adjacency lists are used, and T(n,e) = Θ(n²) if adjacency matrices are used. A depth first traversal of a graph is carried out by repeatedly calling DFS, with a new unvisited starting vertex each time. The algorithm for this (DFT) differs from BFT only in that the call to BFS(i) is replaced by a call to DFS(i). The exercises contain some problems that are solved best by BFS and others that are solved best by DFS. BFS and DFS are two fundamentally different search methods. In BFS a node is fully explored before the exploration of any other node begins; the next node to explore is the first unexplored node remaining. In DFS the exploration of a node is suspended as soon as a new unexplored node is reached.
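A small Python sketch of this recursive search, using the adjacency lists of Figure 3.10(c) as listed above, reproduces the stated visiting order; the dictionary representation is an illustrative choice.

def dfs(adj, v, visited=None, order=None):
    # Recursive depth first search from v over adjacency lists adj
    # (a dict mapping each vertex to the list of its neighbours).
    if visited is None:
        visited, order = set(), []
    visited.add(v)
    order.append(v)
    for w in adj[v]:
        if w not in visited:
            dfs(adj, w, visited, order)
    return order

adj = {1: [2, 3], 2: [1, 4, 5], 3: [1, 6, 7], 4: [2, 8],
       5: [2, 8], 6: [3, 8], 7: [3, 8], 8: [4, 5, 6, 7]}
print(dfs(adj, 1))     # [1, 2, 4, 8, 5, 6, 3, 7]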
Searching: Data stored in an organized manner must be accessed for processing. Locating a particular data item in memory involves searching for it. Searching is a technique in which the memory is scanned for the required data.
Dynamic Programming: Dynamic programming is an algorithm design method that can be used when the solution to a problem can be viewed as the result of a sequence of decisions.
Optimization: Optimization is the process of improving the speed at which a program executes.
3.4
Intext Questions
1. Explain the concept of dynamic programming in detail.
2. Find a minimum-cost path from s to t in the multistage graph of Figure A below, first using the forward approach and then using the backward approach.
3. Explain multistage graphs in detail.
4. Describe the all-pairs shortest paths problem and its algorithm in detail.
3.5
Summary
Dynamic programming is an algorithm design method that can be used when the solution to a problem can be viewed as the result of a sequence of decisions.
Dynamic programming often drastically reduces the amount of enumeration by avoiding the enumeration of decision sequences that cannot possibly be optimal.
In dynamic programming an optimal sequence of decisions is obtained by making explicit appeal to the principle of optimality.
The important difference between the greedy method and dynamic programming is that in the greedy method only one decision sequence is ever generated, whereas in dynamic programming many decision sequences may be generated.
The multistage graph problem is to find a minimum-cost path from source to target.
The all-pairs shortest-path problem is to determine a matrix A such that A(i,j) is the length of a shortest path from i to j.
A search that necessitates the examination of every vertex in the object being searched is called a traversal.
Optimization is the process of improving the speed at which a program executes.
Problem reduction is the process of breaking down a problem into several subproblems.
A subgraph of solvable nodes that shows that the problem is solved is called a solution graph.
In BFS a node is fully explored before the exploration of any other node begins; the next node to explore is the first unexplored node remaining. In DFS the exploration of a node is suspended as soon as a new unexplored node is reached.
Supplementary Materials
1. Ellis Horowitz, Sartaj Sahni, Fundamentals of Computer Algorithms, Galgotia Publications, 1997.
2. Aho, Hopcroft, Ullman, Data Structures and Algorithms, Addison Wesley, 1987.
3. Jean-Paul Tremblay & Paul G. Sorenson, An Introduction to Data Structures with Applications, McGraw-Hill, 1984.
Assignments
1. Prepare an assignment on the applications of searching and its importance.
Learning Activities
Students can perform the following tasks in small groups:
1. Searching
2. Indexed sequential file