DESIGN AND ANALYSIS OF ALGORITHMS
(DAA)
UNIT – 1
Introduction:
Algorithm
Pseudo Code for Expressing Algorithms
Performance Analysis, Space Complexity
Time Complexity
Asymptotic Notation:
Big Oh Notation
Omega Notation
Theta Notation
Little Oh Notation
Probabilistic Analysis
Amortized Analysis.
DESIGN AND ANALYSIS OF ALGORITHMS
(DAA)
UNIT – 2
Disjoint Sets:
Disjoint Sets
Disjoint Set Operations
Union and Find Algorithms
Connected Components and Bi-Connected Components.
Unit - 2
Disjoint set:
A disjoint-set data structure is a data structure that keeps track of a set of elements
partitioned into a number of disjoint (non-overlapping) subsets. A union-find algorithm is
an algorithm that performs two useful operations on such a data structure.
Operations on disjoint sets:
Find: Determine which subset a particular element is in. This can be used for
determining if two elements are in the same subset.
Union: Join two subsets into a single subset.
Find :
Find(x) follows the chain of parent pointers from x upwards through the tree until an
element is reached whose parent is itself. This element is the root of the tree and is the
representative member of the set to which x belongs, and may be x itself.
Path compression is a way of flattening the structure of the tree whenever Find is used
on it. Since each element visited on the way to a root is part of the same set, all of these
visited elements can be reattached directly to the root. The resulting tree is much flatter,
speeding up future operations not only on these elements, but also on those referencing
them.
Pseudo code:
function Find(x)
    if x.parent != x
        x.parent := Find(x.parent)
    return x.parent
Tarjan and Van Leeuwen also developed one-pass Find algorithms that are more efficient
in practice while retaining the same worst-case complexity.
Union:
Union (x, y) uses Find to determine the roots of the trees x and y belong to. If the roots
are distinct, the trees are combined by attaching the root of one to the root of the other.
If this is done naively, such as by always making x a child of y, the height of the trees
can grow as O(n). To prevent this, union by rank is used.
Union by rank always attaches the shorter tree to the root of the taller tree. Thus, the
resulting tree is no taller than the originals unless they were of equal height, in which
case the resulting tree is taller by one level.
To implement union by rank, each element is associated with a rank. Initially a set has
one element and a rank of zero. If two sets are unioned and have the same rank, the
resulting set's rank is one larger; otherwise, if two sets are unioned and have different
ranks, the resulting set's rank is the larger of the two. Ranks are used instead of height
or depth because path compression will change the trees' heights over time.
Pseudo code:
function Union(x, y)
    xRoot := Find(x)
    yRoot := Find(y)
    // x and y are already in the same set
    if xRoot == yRoot
        return
    // x and y are not in the same set, so we merge them
    if xRoot.rank < yRoot.rank
        xRoot.parent := yRoot
    else if xRoot.rank > yRoot.rank
        yRoot.parent := xRoot
    else
        xRoot.parent := yRoot
        yRoot.rank := yRoot.rank + 1
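Putting the two operations together, here is a minimal runnable Python sketch of the same structure, combining Find with path compression and Union by rank (the class and variable names are our own, chosen for illustration):

    class DisjointSet:
        def __init__(self, n):
            # every element starts as its own root with rank 0
            self.parent = list(range(n))
            self.rank = [0] * n

        def find(self, x):
            # path compression: hang x directly off the root
            if self.parent[x] != x:
                self.parent[x] = self.find(self.parent[x])
            return self.parent[x]

        def union(self, x, y):
            xroot, yroot = self.find(x), self.find(y)
            if xroot == yroot:
                return                      # already in the same set
            if self.rank[xroot] < self.rank[yroot]:
                self.parent[xroot] = yroot  # attach shorter tree under taller
            elif self.rank[xroot] > self.rank[yroot]:
                self.parent[yroot] = xroot
            else:
                self.parent[xroot] = yroot  # equal ranks: pick one, bump its rank
                self.rank[yroot] += 1

    # usage: ds = DisjointSet(5); ds.union(0, 1); ds.find(0) == ds.find(1)  -> True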
Connected Components:
A connected component of an undirected graph is a subgraph in which any two
vertices are connected to each other by paths, and which is connected to no additional
vertices in the supergraph. For example, the graph in the original illustration has three
components. A vertex with no incident edges is itself a component. A graph that is itself
connected has exactly one component, consisting of the whole graph.
It is straightforward to compute the components of a graph in linear time (in
terms of the numbers of the vertices and edges of the graph) using either breadth-first
search or depth-first search. In either case, a search that begins at some particular
vertex v will find the entire component containing v (and no more) before returning. To
find all the components of a graph, loop through its vertices, starting a new breadth-first
or depth-first search whenever the loop reaches a vertex that has not already been
included in a previously found component.
DFS-recursive(G, s):
    mark s as visited
    for all neighbours w of s in Graph G:
        if w is not visited:
            DFS-recursive(G, w)
Time complexity is O(V + E) when the graph is represented using adjacency lists.
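The component-finding loop described above can be written as a short Python sketch; the adjacency-list dictionary format and the function names here are illustrative assumptions:

    def connected_components(adj):
        # adj: dict mapping each vertex to a list of its neighbours
        visited = set()
        components = []

        def dfs(v, comp):
            visited.add(v)            # mark v as visited
            comp.append(v)
            for w in adj[v]:          # for all neighbours w of v
                if w not in visited:
                    dfs(w, comp)

        for v in adj:                 # start a new DFS at every unvisited vertex
            if v not in visited:
                comp = []
                dfs(v, comp)
                components.append(comp)
        return components

    # usage: connected_components({0: [1], 1: [0], 2: []}) -> [[0, 1], [2]]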
Biconnected Graphs:
Let us assume that G is an undirected connected graph. An articulation point is a
vertex v of G such that the deletion of v, together with all edges incident on v, produces
a graph, G', that has at least two connected components. For example, the connected
graph of Figure(a) has four articulation points, vertices 1,3,5, and 7. A biconnected
graph is a connected graph that has no articulation points.
In many graph applications, articulation points are undesirable. For instance,
suppose that the graph of Figure (a) represents a communication network. In such
graphs, the vertices represent communication stations and the edges represent
communication links. Now suppose that one of the stations that is an articulation point
fails. The result is a loss of communication not just to and from that single station, but
also between certain other pairs of stations. A biconnected component of a connected
undirected graph, G, is a maximal biconnected subgraph, H, of G. By maximal, we mean
that G contains no other subgraph that is both biconnected and properly contains H. For
example, the graph of Figure (a) contains the six biconnected components shown in
Figure(b). It is easy to verify that two biconnected components of the same graph have
no more than one vertex in common. This means that no edge can be in two or more
biconnected components of a graph. Hence, the biconnected components of G partition
the edges of G. We can find the biconnected components of a connected undirected
graph, G, by using any depth first spanning tree of G.
BINARY SEARCH:
1) Let ai, 1 <= i <= n, be a list of elements sorted in non-descending order.
2) Suppose we have n elements and x is the element being searched for.
3) If n = 1, Small(P) is true and there is no need to apply divide and conquer;
if n > 1, the problem can be divided into smaller subproblems.
4) Pick an index q in the range [i, l] and compare x with aq; then one of the following
3 possibilities will occur:
   a) x = aq: the search element is found;
   b) x < aq: x is searched for in the left sublist;
   c) x > aq: x is searched for in the right sublist.
EXAMPLE:
Consider the list with n = 6 elements, a = {1, 2, 3, 4, 5, 6}:
    Index: 0 1 2 3 4 5
    Value: 1 2 3 4 5 6
x = 4 (the key element which has to be found)
First find the mid value of the given array:
    mid = (low + high) / 2
low = index of the first element in the array
high = index of the last element in the array
    mid = (0 + 5) / 2 = 2.5, truncated to 2
Cases:-
1) Compare the key element with a[mid]:
   x == a[mid] => 4 == a[2] => 4 == 3 (false)
2) Check whether the key element is bigger or smaller than the mid element:
   x < a[mid]: 4 < 3 (false)        x > a[mid]: 4 > 3 (true)
As said in the procedure, the key element is bigger than the respective array element,
so the key element will be in the right sublist of the array.
Again find the mid for the right sublist:
Now low = mid + 1 = 2 + 1 = 3
High will not be changed because we are searching in the right sublist.
mid = (3 + 5) / 2 = 4
    Index: 3 4 5
    Value: 4 5 6
a[mid] = a[4] = 5
Cases:-
1) Compare the key element with the new mid value:
   x == a[mid]: 4 == 5 (false)      x < a[mid]: 4 < 5 (true)
The key element is less than the mid value, so now we have to calculate mid again to
find the key element.
Now low = 3
The high value changes now:
high = mid - 1 = 4 - 1 = 3
mid = (low + high) / 2 = (3 + 3) / 2 = 3
a[mid] = a[3] = 4
Case:-
   x == a[3] => 4 == 4 (true), so the key element is found at index 3.
ANALYSIS:-
TIME COMPLEXITY:
T(n) = T(n/2) + 1. This is in the form T(n) = a*T(n/b) + f(n),
here a = 1, b = 2, f(n) = 1.
By using the master theorem, with g(n) = n^(log_b a):
1) f(n) = g(n) => T(n) = O(f(n) log n)
2) f(n) < g(n) => T(n) = O(g(n))
3) f(n) > g(n) => T(n) = O(f(n))
g(n) = n^(log_2 1) = n^0 = 1
f(n) = g(n) => 1 = 1
Time complexity T(n) = O(f(n) log n) = O(log n)
ALGORITHM FOR BINARY SEARCH:
BinarySearch(A, x, l, r)
Begin:
    If l > r then
        Return -1 (not found)
    End if
    m = (l + r) / 2
    If A[m] == x then
        Return m
    Else if x < A[m] then
        Return BinarySearch(A, x, l, m - 1)
    Else
        Return BinarySearch(A, x, m + 1, r)
    End if
End
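For comparison, here is the same search as a runnable Python function (an iterative version; the variable names follow the pseudocode above):

    def binary_search(a, x):
        # returns the index of x in sorted list a, or -1 if not found
        low, high = 0, len(a) - 1
        while low <= high:
            mid = (low + high) // 2   # integer division truncates, as in the example
            if a[mid] == x:
                return mid
            elif x < a[mid]:
                high = mid - 1        # search the left sublist
            else:
                low = mid + 1         # search the right sublist
        return -1

    # usage: binary_search([1, 2, 3, 4, 5, 6], 4) -> 3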
MERGE SORT:
Merge sort is an efficient, general-purpose, comparison-based sorting algorithm. Most
implementations produce a stable sort. Merge sort is a divide and conquer algorithm that
was invented by John von Neumann in 1945. It is one of the most popular sorting
algorithms and a great way to develop confidence in building recursive algorithms.
Divide and Conquer Strategy:
Using the Divide and Conquer technique, we divide a problem into sub problems. When
the solution to each sub problem is ready, we 'combine' the results from the sub
problems to solve the main problem.
Suppose we had to sort an array A. A sub problem would be to sort a sub-section of this
array starting at index p and ending at index r, denoted as A[p..r].
Divide:
If q is the half-way point between p and r, then we can split the subarray A[p..r] into
two subarrays A[p..q] and A[q+1..r].
Conquer:
In the conquer step, we try to sort both the subarrays A[p..q] and A[q+1..r]. If we
haven't yet reached the base case, we again divide both these subarrays and try to sort
them.
Combine:
When the conquer step reaches the base step and we get two sorted subarrays A[p..q]
and A[q+1..r] for array A[p..r], we combine the results by creating a sorted array A[p..r]
from the two sorted subarrays A[p..q] and A[q+1..r].
The MergeSort Algorithm:
The MergeSort function repeatedly divides the array into two halves until we reach a
stage where we try to perform MergeSort on a subarray of size 1, i.e., p == r.
After that, the merge function comes into play and combines the sorted arrays into
larger arrays until the whole array is merged
MergeSort(A, p, r) {
    if (p >= r)
        return;
    q = (p + r) / 2;
    MergeSort(A, p, q);
    MergeSort(A, q + 1, r);
    Merge(A, p, q, r);
}

void Merge(int A[], int p, int q, int r)
{
    /* Create L <- A[p..q] and M <- A[q+1..r] */
    int n1 = q - p + 1;
    int n2 = r - q;
    /* ... copy A[p..q] into L[0..n1-1] and A[q+1..r] into M[0..n2-1] ... */
    while (i < n1 && j < n2) {
        if (L[i] <= M[j]) {
            A[k] = L[i];
            i++;
        } else {
            A[k] = M[j];
            j++;
        }
        k++;
    }
    /* ... copy any remaining elements of L and M back into A ... */
}
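Since the scanned fragment above is incomplete, here is a self-contained Python version of the same divide-and-conquer scheme, a sketch rather than the textbook's exact code:

    def merge_sort(a):
        if len(a) <= 1:                   # base case: a size-1 array is sorted
            return a
        q = len(a) // 2                   # divide
        left = merge_sort(a[:q])          # conquer left half
        right = merge_sort(a[q:])         # conquer right half
        return merge(left, right)         # combine

    def merge(left, right):
        result, i, j = [], 0, 0
        while i < len(left) and j < len(right):
            if left[i] <= right[j]:       # <= keeps the sort stable
                result.append(left[i]); i += 1
            else:
                result.append(right[j]); j += 1
        result.extend(left[i:])           # copy any leftovers
        result.extend(right[j:])
        return result

    # usage: merge_sort([5, 2, 4, 6, 1, 3]) -> [1, 2, 3, 4, 5, 6]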
DESIGN AND ANALYSIS OF ALGORITHMS
(DAA)
UNIT – 3
Greedy Method:
General Method
Applications:
Job Sequencing with Deadlines
Knapsack Problem
Minimum Cost Spanning Trees
Single Source Shortest Path Problem.
UNIT – III
GREEDY METHODOLOGY:-
Every problem has some constraints and an objective function.
OBJECTIVE FUNCTION:
An objective function expresses a business goal (or a quantity) that is desired to be
maximized or minimized.
FEASIBLE SOLUTION:
If the given problem has ‘n’ inputs then the subset of inputs satisfies the constraints of
particular problem is called feasible solution.
OPTIMAL SOLUTION:
A feasible solution that either maximizes or minimizes the objective function is called
optimal solution.
The greedy method works in stages. At each stage, we take one input at a time and
make a decision about whether it leads to an optimal solution.
A decision made in one stage cannot be changed in a later stage, i.e., there is no
backtracking.
ALGORITHM:
Algorithm Greedy(a, n)  // a[1:n] contains the n inputs
{
    solution := ∅;  // initialize the solution
    for i := 1 to n do
    {
        x := Select(a);
        if Feasible(solution, x) then
            solution := Union(solution, x);
    }
    return solution;
}
KNAPSACK PROBLEM:-
We have n objects and a knapsack (bag). Each object i has weight wi and profit pi, and
the knapsack has capacity m. The objective is to fill the knapsack so as to maximize the
total profit earned. So the problem can be stated as:
    maximize Σ pixi (1 <= i <= n)
    subject to Σ wixi <= m (1 <= i <= n) and 0 <= xi <= 1, 1 <= i <= n
To compute the maximum profit, we take a solution factor xi for each object:
If the object is placed directly, xi = 1 (if enough space is available).
If the object does not fit at all, xi = 0.
If the object does not fit entirely but some amount of space is available,
xi = remaining space / actual weight of object.
ALGORITHM:
Algorithm GreedyKnapsack(p, w, n)
// p[1:n] and w[1:n] are the profits and weights of the objects, ordered such that
// p[i]/w[i] >= p[i+1]/w[i+1] >= ...; u is the capacity of the bag.
{
    for i := 1 to n do
        x[i] := 0;
    m := u;
    for i := 1 to n do
    {
        if (w[i] > m) then break;
        x[i] := 1;
        m := m - w[i];
    }
    if (i <= n) then
        x[i] := m / w[i];
}
*Time Complexity is O(n).
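A minimal runnable Python sketch of the same greedy procedure follows; unlike the pseudocode above it sorts the objects by profit/weight ratio itself, and all names are illustrative:

    def greedy_knapsack(profits, weights, capacity):
        n = len(profits)
        # order objects by decreasing profit/weight ratio
        order = sorted(range(n), key=lambda i: profits[i] / weights[i], reverse=True)
        x = [0.0] * n
        m = capacity
        total = 0.0
        for i in order:
            if weights[i] <= m:           # object fits entirely
                x[i] = 1.0
                m -= weights[i]
                total += profits[i]
            else:                         # place the fraction that fits, then stop
                x[i] = m / weights[i]
                total += profits[i] * x[i]
                break
        return x, total

    # usage: greedy_knapsack([25, 24, 15], [18, 15, 10], 20) -> ([0.0, 1.0, 0.5], 31.5)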
EXAMPLE:
1. Number of objects n = 3, capacity of knapsack m = 20
(p1, p2, p3) = (25, 24, 15)
(w1, w2, w3) = (18, 15, 10)
Fill the bag with maximum profit using the greedy knapsack method.
Sol: Given, n = 3, m = 20.
Our main aim is to fill the bag with maximum profit.
Case 1: (Maximum profit)
First we place the maximum-profit object:
p1 = 25   w1 = 18   x1 = 1
After placing the first object, M = 20 - 18 = 2.
Next, p2 = 24, w2 = 15, so x2 = 2/15.
There is no space left, so x3 = 0.
Σ pixi = p1x1 + p2x2 + p3x3 = 25(1) + 24(2/15) + 15(0) = 25 + 3.2 = 28.2
Case 2: (Minimum weight)
Among the three objects, w3 is the minimum weight.
w3 = 10   M = 20 - 10 = 10   x3 = 1 (fitted)
The next lightest object is w2 = 15, so x2 = 10/15; the bag is full, so x1 = 0.
Here Σ pixi = p1x1 + p2x2 + p3x3 = 25(0) + 24(10/15) + 15(1) = 0 + 16 + 15 = 31
Case 3: (Maximum profit/weight ratio)
p2/w2 = 24/15 = 1.6, p3/w3 = 15/10 = 1.5, p1/w1 = 25/18 ≈ 1.39, so we place the
objects in the order 2, 3, 1.
Object 2: w2 = 15 <= 20, so x2 = 1 and M = 20 - 15 = 5.
Object 3: only 5 units of capacity remain, so x3 = 5/10 = 1/2; the bag is full, so x1 = 0.
Here Σ pixi = 25(0) + 24(1) + 15(1/2) = 0 + 24 + 7.5 = 31.5
Therefore,
the maximum profitable feasible solution, which maximizes the profit, is 31.5.
The optimal solution is (0, 1, 1/2).
Time complexity is O(n).
MINIMUM COST SPANNING TREES:
1) PRIM'S ALGORITHM (worked example; figure not reproduced in this scan):
Total cost = 4.0 + 8.0 + 1.0 + 2.0 + 2.0 + 11.0 + 7.0 + 4.0 = 39
2) KRUSKAL'S ALGORITHM (worked example; figure not reproduced in this scan):
Total cost = 4 + 8 + 7 + 9 + 2 + 4 + 1 + 2 = 37
ALGORITHM:
ALGORITHM Kruskal(G):
// INPUT: A weighted connected graph G = <V, E>
// OUTPUT: T, a minimum spanning tree of G
{
    T = { }  // empty set of edges
    for i = 1 to n - 1 do {
        e := any edge in G with the smallest weight that does not form a cycle
             when added to T
        T := T union { e }
    }
    return T;
}
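A runnable Python sketch of Kruskal's method, reusing the union-find idea from Unit 2 to detect cycles; the edge format (weight, u, v) and the names are assumptions made for illustration:

    def kruskal(n, edges):
        # n: number of vertices (0..n-1); edges: list of (weight, u, v)
        parent = list(range(n))

        def find(x):                      # find with path halving
            while parent[x] != x:
                parent[x] = parent[parent[x]]
                x = parent[x]
            return x

        tree, total = [], 0
        for w, u, v in sorted(edges):     # consider edges in order of weight
            ru, rv = find(u), find(v)
            if ru != rv:                  # adding (u, v) does not form a cycle
                parent[ru] = rv
                tree.append((u, v, w))
                total += w
        return tree, total

    # usage: kruskal(3, [(1, 0, 1), (2, 1, 2), (3, 0, 2)]) -> ([(0, 1, 1), (1, 2, 2)], 3)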
SINGLE SOURCE SHORTEST PATH PROBLEM:
Graphs can be used to represent distances between states or countries, with vertices
representing cities and edges representing sections of highway. The edges can be
assigned weights, which may be either the distance between the two cities connected by
the edge or the average time to drive along that section of highway.
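The standard greedy solution to the single-source shortest path problem is Dijkstra's algorithm; the following minimal Python sketch assumes (as an illustration) an adjacency dictionary mapping each vertex to a list of (neighbour, weight) pairs:

    import heapq

    def dijkstra(adj, source):
        # returns a dict of shortest distances from source to every reachable vertex
        dist = {source: 0}
        heap = [(0, source)]
        while heap:
            d, u = heapq.heappop(heap)
            if d > dist.get(u, float('inf')):
                continue                  # stale entry, already improved
            for v, w in adj[u]:
                nd = d + w
                if nd < dist.get(v, float('inf')):
                    dist[v] = nd          # greedily relax the edge (u, v)
                    heapq.heappush(heap, (nd, v))
        return dist

    # usage: dijkstra({'a': [('b', 4), ('c', 1)], 'c': [('b', 2)], 'b': []}, 'a')
    #        -> {'a': 0, 'c': 1, 'b': 3}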
DESIGN AND ANALYSIS OF ALGORITHMS
(DAA)
UNIT – 4
Dynamic Programming:
General Method
Applications:
Matrix Chain Multiplication
0/1 Knapsack Problem
All Pairs Shortest Path Problem
Travelling Sales Person Problem
Reliability Design
DESIGN AND ANALYSIS OF ALGORITHMS
(DAA)
UNIT – 5
Backtracking:
General Method
Applications:
N-Queen Problem
Sum of Subsets Problem
Graph Coloring
Hamiltonian Cycles.
Branch and Bound:
General Method
Applications:
Travelling Sales Person Problem
0/1 Knapsack Problem
LC Branch and Bound Solution
FIFO Branch and Bound Solution.
DESIGN AND ANALYSIS OF ALGORITHMS
(DAA)
UNIT – 6
UNIT – 6
NP Hard and NP Complete

Polynomial time                       Exponential time
Linear search          n              0/1 knapsack             2^n
Binary search          log n          Travelling salesperson   2^n
Insertion sort         n^2            Sum of subsets           2^n
Merge sort             n log n        Graph colouring          2^n
Matrix multiplication  n^3            Hamiltonian cycle        2^n
This topic is really a research topic. Algorithms are categorized into two types:
1. Algorithms taking polynomial time
2. Algorithms taking exponential time
For example, we have the linear search algorithm, which takes O(n) time; faster than that
is the binary search algorithm, which takes O(log n) time; and among sorting algorithms,
merge sort takes O(n log n) time. Even for these we keep looking for faster algorithms.
Similarly, for the exponential time algorithms we need polynomial time alternatives. For
all the problems that currently take exponential time, we want a faster and easier method
that solves them in just polynomial time, because 2^n, n^n, etc. are much bigger than any
polynomial.
As these exponential algorithms are very time consuming, we need polynomial time
algorithms for them; this is our requirement.
This is a research area, and people from computer science and mathematics have been
doing research on it, but no particular solution has been found so far for solving these
problems in polynomial time.
Since the research work has so far been fruitless, we want whatever work is done to be
useful, so guidelines, or a framework, were made for doing research on these types of
problems, i.e., exponential type problems, and that framework is NP-Hard and
NP-Complete.
Let us see the basic idea behind this framework; the entire topic is based on just two points.
1. When you are unable to solve these exponential problems, that is, unable to get a
polynomial time algorithm for them, at least do the work of showing the similarities
between them, so that if one problem is solved the others can also be solved. We will
not do research individually on each and every problem, with one person working on the
knapsack problem and another on the travelling salesperson problem.
Let us combine them and put in collective effort, so that if one problem is solved all the
other problems are solved as well. For that we have to show the relationship between
them.
"Try to relate the problems: either solve one, or at least relate them."
2. When we are unable to write a deterministic algorithm for them, why don't we
write a non-deterministic algorithm?
Deterministic Algorithm:
We know clearly how each and every statement works, i.e., we know the working of the
algorithm and are sure how it behaves. This type of algorithm is called a deterministic
algorithm.
In a non-deterministic algorithm, a statement like choice() is different: if key = 5, then
choice() will directly give the index 2, but how it does so we do not know.
Similarly, we write non-deterministic algorithms for exponential problems, and with this
we define two classes.
P and NP classes:
P: 'P' is the set of problems for which deterministic algorithms exist that take polynomial
time. Examples: linear search, binary search, bubble sort, etc.
NP: these algorithms are non-deterministic but take polynomial time. We write these
algorithms for problems whose known deterministic algorithms take exponential time.
NP stands for Non-deterministic Polynomial time.
(Diagram: the class P is contained inside the class NP.)
• If you are unable to solve a problem in polynomial time, at least show the relationship
between the problems, such that if one problem is solved, it becomes easy to solve the
other problems in polynomial time as well.
• We will not work on each and every problem individually, so to relate them together
we need some problem as a base problem.
• The base problem is Satisfiability.
• Let x1, x2, x3, ..., xn denote Boolean variables.
• Let x̄i denote the negation of xi.
• A literal is either a variable or its negation.
• A formula in the propositional calculus is an expression that can be constructed using
literals and the operators ∧ (and) and ∨ (or).
• A clause is a disjunction of literals.
• The satisfiability problem is to determine whether a formula is true for some assignment
of truth values to the variables.
• It is easy to obtain a polynomial time non-deterministic algorithm that terminates
successfully if and only if a given propositional formula E(x1, x2, ..., xn) is satisfiable.
• Such an algorithm could proceed by simply choosing (non-deterministically) one of the
2^n possible assignments of truth values to (x1, x2, ..., xn) and verifying that
E(x1, x2, ..., xn) is true for that assignment.
• A literal: xi or x̄i
• A clause Ci: a disjunction of literals, e.g., x1 ∨ x2 ∨ x̄3
• A formula: a conjunctive normal form (CNF) expression C1 ∧ C2 ∧ … ∧ Cm
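To make the guess-and-verify idea concrete, here is a small deterministic Python sketch (our own illustration) that tries all 2^n assignments, which is exactly the work a non-deterministic algorithm avoids by 'choosing' the right one:

    from itertools import product

    def satisfiable(clauses, n):
        # clauses: CNF as a list of clauses; each clause is a list of literals,
        # where literal +i means x_i and -i means the negation of x_i (1-indexed)
        for bits in product([False, True], repeat=n):   # all 2^n assignments
            assign = dict(enumerate(bits, start=1))
            if all(any(assign[abs(l)] == (l > 0) for l in c) for c in clauses):
                return True          # this assignment makes every clause true
        return False

    # usage: (x1 v x2 v -x3) & (-x1)  ->  satisfiable([[1, 2, -3], [-1]], 3) -> True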
If a polynomial time algorithm exists for any of these problems, all problems in NP would
be polynomial time solvable. These problems are called NP-complete. The phenomenon of
NP-completeness is important for both theoretical and practical reasons.
Definition of NP-Completeness:
A language B is NP-complete if it satisfies two conditions
• B is in NP
• Every A in NP is polynomial time reducible to B.
If a language satisfies the second property, but not necessarily the first one, the
language B is known as NP-Hard. Informally, a search problem B is NP-Hard if there
exists some NP-Complete problem A that Turing reduces to B.
Following are some NP-Complete problems, for which no polynomial time algorithm is
known.
NP-Hard Problems
TSP is NP-Complete
The traveling salesman problem consists of a salesman and a set of cities. The
salesman has to visit each one of the cities starting from a certain one and returning to the
same city. The challenge of the problem is that the traveling salesman wants to minimize the
total length of the trip
Proof
To prove TSP is NP-Complete, first we have to prove that TSP belongs to NP. In
TSP, we find a tour and check that the tour contains each vertex once. Then the total cost of
the edges of the tour is calculated. Finally, we check if the cost is minimum. This can be
completed in polynomial time. Thus, TSP belongs to NP.
Secondly, we have to prove that TSP is NP-hard. To prove this, one way is to show
that Hamiltonian cycle ≤p TSP (as we know that the Hamiltonian cycle problem is
NP-complete).
Assume G = (V, E) to be an instance of Hamiltonian cycle.
Hence, an instance of TSP is constructed. We create the complete graph G' = (V, E'),
where E' = {(i, j) : i, j ∈ V and i ≠ j}.
Thus, the cost function is defined as follows:
t(i, j) = 0 if (i, j) ∈ E
          1 otherwise
Now, suppose that a Hamiltonian cycle h exists in G. It is clear that the cost of each edge
in h is 0 in G' as each edge belongs to E. Therefore, h has a cost of 0 in G'. Thus, if
graph G has a Hamiltonian cycle, then graph G' has a tour of 0 cost.
Conversely, we assume that G' has a tour h' of cost at most 0. The cost of edges
in E' are 0 and 1 by definition. Hence, each edge must have a cost of 0 as the cost of h' is 0.
We therefore conclude that h' contains only edges in E.
We have thus proven that G has a Hamiltonian cycle, if and only if G' has a tour of cost at
most 0. TSP is NP-complete.
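The construction of G' and the cost function t in this reduction is mechanical; the short Python sketch below (function and variable names are ours) shows that it runs in polynomial time:

    def hamiltonian_to_tsp(vertices, edges):
        # vertices: list of vertex names; edges: set of pairs (i, j) of G
        # returns the cost function t of the complete graph G'
        cost = {}
        for i in vertices:
            for j in vertices:
                if i != j:
                    # t(i, j) = 0 if (i, j) is an edge of G, else 1
                    cost[(i, j)] = 0 if (i, j) in edges or (j, i) in edges else 1
        return cost

    # G has a Hamiltonian cycle  <=>  the TSP instance (G', cost) has a tour of cost 0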
The general strategy for showing that a problem L2 is NP-hard is:
I. Pick a problem L1 already known to be NP-hard.
II. Show how to obtain an instance I' of L2 from any instance I of L1 such that from the
solution of I' we can determine (in polynomial deterministic time) the solution to
instance I of L1.
III. Conclude from (II) that L1 ∝ L2.
IV. Conclude from (I), (III), and the transitivity of ∝ that
Satisfiability ∝ L1 and L1 ∝ L2;
therefore Satisfiability ∝ L2,
and L2 is NP-hard.
• Satisfiability with at most three literals per clause ∝ the chromatic number problem.
Therefore, the chromatic number problem (CNP) is NP-hard.
• The directed Hamiltonian cycle (DHC) is a cycle that goes through every vertex exactly
once and then returns to the starting vertex.
Sum of subsets:-
The problem is to determine whether A = {a1, a2, ..., an} (where a1, a2, ..., an are
positive integers) has a subset S that sums to a given integer M.
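In code, the NP character of this problem is easy to see: verifying a proposed subset (a certificate) takes polynomial time, while the obvious exact method tries all 2^n subsets. A small Python sketch of both, using illustrative names:

    from itertools import combinations

    def verify(certificate, M):
        # polynomial-time check of a proposed subset S
        return sum(certificate) == M

    def subset_sum(A, M):
        # brute force: try all 2^n subsets of A
        for r in range(len(A) + 1):
            for S in combinations(A, r):
                if verify(S, M):
                    return S
        return None

    # usage: subset_sum([11, 13, 24, 7], 31) -> (24, 7)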
Scheduling identical processors:-
• For each job Ji, a schedule S specifies the time intervals and the processors on which
this job is to be processed.
• A job cannot be processed by more than one processor at any given time. The problem
is to find a schedule that minimizes the finish time max{Ti},
• where Ti is the time at which processor Pi finishes processing all jobs (or job segments)
assigned to it.
Cook’s Theorem:
• The Cook-Levin theorem, also known as Cook’s theorem, states that the Boolean
satisfiability problem is NP-complete. That is, any problem in NP can be reduced in
polynomial time by a deterministic Turing machine to the problem whether a Boolean
formula is satisfiable.
• The theorem is named after Stephen Cook and Leonid Levin.
• An important consequence of this theorem is that if there exists a deterministic
polynomial time algorithm for solving Boolean satisfiability, then every NP problem can
be solved by a deterministic polynomial time algorithm.
This work shows that Cook's theorem is the origin of the loss of non-determinism, in
terms of the equivalence of the two definitions of NP: the one defining NP as the class of
problems solvable by a non-deterministic Turing machine in polynomial time, and the
other defining NP as the class of problems verifiable by a deterministic Turing machine
in polynomial time. Therefore, it is argued that the fundamental difficulties in
understanding P versus NP lie firstly at the cognition level, then at the logic level.
Theorem-1
Theorem-2
The following sets are P-reducible to each other in pairs (and hence each has the same
polynomial degree of difficulty): {tautologies}, {DNF tautologies}, D3, {sub-graph pairs}.
Theorem-3
Theorem-4
If the set S of strings is accepted by a non-deterministic machine within time T(n) = 2^n,
and if T_Q(k) is an honest (i.e. real-time countable) function of type Q, then there is a
constant K, so S can be recognized by a deterministic machine within time T_Q(K·8^n).
• First, he emphasized the significance of polynomial time reducibility. It means that if
we have a polynomial time reduction from one problem to another, this ensures that
any polynomial time algorithm for the second problem can be converted into a
corresponding polynomial time algorithm for the first problem.
• Second, he focused attention on the class NP of decision problems that can be solved
in polynomial time by a non-deterministic computer. Most of the intractable
problems belong to this class, NP.
• Third, he proved that one particular problem in NP has the property that every other
problem in NP can be polynomially reduced to it. If the satisfiability problem can be
solved with a polynomial time algorithm, then every problem in NP can also be
solved in polynomial time. If any problem in NP is intractable, then satisfiability
problem must be intractable. Thus, satisfiability problem is the hardest problem in
NP.
• Fourth, Cook suggested that other problems in NP might share with the satisfiability
problem this property of being the hardest member of NP.
2. At each computation step, M is in at most one state. For each i = 0, . . . , P(n) and for each
pair j, k of distinct states, we have the clause
¬(Qij ∧ Qik),
giving a total of q(q − 1)(P(n) + 1) = O(P(n)) literals altogether.
3. At each step, each tape square contains at least one alphabet symbol. For each i = 0, . . . ,
P(n) and −P(n) ≤ j ≤ P(n) we have the clause
Sij1 ∨ Sij2 ∨ · · · ∨ Sijs,
giving (P(n) + 1)(2P(n) + 1)s = O(P(n)^2) literals altogether.
4. At each step, each tape square contains at most one alphabet symbol. For each i = 0, . . . ,
P(n) and −P(n) ≤ j ≤ P(n), and each distinct pair ak, al of symbols we have the clause
¬(Sijk ∧ Sijl),
giving a total of (P(n) + 1)(2P(n) + 1)s(s − 1) = O(P(n)^2) literals altogether.
5. At each step, the tape is scanning at least one square. For each i = 0, . . . , P(n), we have
the clause
Ti(−P(n)) ∨ Ti(1−P(n)) ∨ · · · ∨ Ti(P(n)−1) ∨ TiP(n) ,
giving (P(n) + 1)(2P(n) + 1) = O(P(n)^2) literals altogether.
6. At each step, the tape is scanning at most one square. For each i = 0, . . . , P(n), and
each distinct pair j, k of tape squares from −P(n) to P(n), we have the clause
¬(Tij ∧ Tik),
giving a total of 2P(n)(2P(n) + 1)(P(n) + 1) = O(P(n)^3) literals.
7. Initially, the machine is in state 1 scanning square 1. This is expressed by the two clauses
Q01, T01,
giving just two literals.
8. The configuration at each step after the first is determined from the configuration at the
previous step by the functions T, U, and D defining the machine M. For each i = 0, . . . ,
P(n), −P(n) ≤ j ≤ P(n), k = 0, . . . , q − 1, and l = 1, . . . , s, we have the clauses
10. By the P(n)th step, the machine has reached the halt state, and is then scanning square 0,
which contains the symbol a1. This is expressed by the three clauses
QP(n)0, SP(n)01, TP(n)0,
giving another 3 literals.
Altogether the number of literals involved in these clauses is O(P(n)^3) (in working
this out, note that q and s are constants; that is, they depend only on the machine and do
not vary with the problem instance, so they do not contribute to the growth of the number
of literals with increasing problem size, which is what the O notation captures for us). It is
thus clear that the procedure for setting up these clauses, given the original machine M and
the instance I of problem D, can be accomplished in polynomial time.
We must now show that we have succeeded in converting D into SAT. Suppose first
that I is a positive instance of D. This means that there is a certificate c such that when M is
run with inputs c, I, it will halt scanning symbol a1 on square 0. This means that there is some
sequence of symbols that can be placed initially on squares −P(n), . . . , −1 of the tape so that
all the clauses above are satisfied. Hence those clauses constitute a positive instance of SAT.
Conversely, suppose I is a negative instance of D. In that case there is no certificate
for I, which means that whatever symbols are placed on squares −P(n), . . . , −1 of the tape,
when the computation halts the machine will not be scanning a1 on square 0. This means that
the set of clauses above is not satisfiable, and hence constitutes a negative instance of SAT.
Thus from the instance I of problem D we have constructed, in polynomial time, a set
of clauses which constitute a positive instance of SAT if and only if I is a positive instance of
D. In other words, we have converted D into SAT in polynomial time. And since D was an
arbitrary NP problem it follows that any NP problem can be converted to SAT in polynomial
time.