Longest Common Subsequence
Submitted to:
Dr. V.K.Pathak
C.S.E. Deptt.
HBTI-Kanpur
Submitted by :
Shweta Singhal
3rd C.S.E.
127/07
LONGEST COMMON SUBSEQUENCE
Definition
Character
Substring
CBD is a substring of ABCBDAB
Subsequence
BCDB is a subsequence of ABCBDAB
Common subsequence
BCA is a common subsequence of ABCBDAB and BDCABA
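The difference between a substring (contiguous) and a subsequence (in order, gaps allowed) can be checked mechanically. The sketch below is only an illustration; the helper names `is_substring` and `is_subsequence` are made up for this example and do not come from the slides.

```python
def is_substring(s: str, text: str) -> bool:
    """True if s occurs in text as one contiguous block."""
    return s in text


def is_subsequence(s: str, text: str) -> bool:
    """True if the characters of s occur in text in order, gaps allowed."""
    it = iter(text)
    # `ch in it` advances the iterator until it finds ch (or runs out),
    # so later characters can only match later positions of text.
    return all(ch in it for ch in s)


print(is_substring("CBD", "ABCBDAB"))     # True
print(is_subsequence("BCDB", "ABCBDAB"))  # True
print(is_substring("BCDB", "ABCBDAB"))    # False: the letters are not contiguous
```

Every substring is also a subsequence, but not the other way round, as the last line shows.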
USE OF LCS
In biological applications, we may want to compare the DNA
of two organisms. A strand of DNA consists of a string of
molecules called bases, where the possible bases are
adenine, guanine, cytosine, and thymine. We represent each
of these bases by their initial letters.
The DNA of one organism might be:
S1=ACCGGTCGAGTGCGCGGAAGCGGCCGAA
and another might be:
S2=GTCGTTCGGAATGCCGTTGCTCTGTAAA
One goal is to determine how “similar” these strands are.
We can say that two DNA strands are similar if one
is a substring of the other. In our case, we
say two strands are similar if we can find a third
strand S3 in which the bases appear in
each of the first two strands, in the same order
but not necessarily consecutively.
The longer the strand S3 we can find that
appears in both S1
and S2, the more similar S1 and S2 are.
Dynamic programming
The i-th prefix of X = <x1, x2, ..., xm> is Xi = <x1, x2, ..., xi>.
If X = <A, B, C, B, D, A, B>, then
X4 = <A, B, C, B>
X0 = <>
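As a small illustration of the prefix notation, a prefix is just a slice from the start of the sequence. The function name `prefix` is an assumption made for this sketch.

```python
def prefix(seq, i):
    """Return the i-th prefix <x1, ..., xi> of seq; i = 0 gives the empty sequence."""
    return seq[:i]


X = ["A", "B", "C", "B", "D", "A", "B"]
print(prefix(X, 4))  # ['A', 'B', 'C', 'B']
print(prefix(X, 0))  # []
```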
X=ABCBDAB
Y=BDCABA
LCS problem
Given two sequences X=<x1,x2,…,xm> and Y=<y1,y2,…,yn>,
find an LCS of X and Y.
How to Solve LCS Quickly
$$
c[i, j] =
\begin{cases}
c[i-1, j-1] + 1 & \text{if } x[i] = y[j], \\
\max(c[i, j-1],\ c[i-1, j]) & \text{otherwise.}
\end{cases}
$$
Optimal substructure
Let X = <x1,x2,...,xm> and Y = <y1,y2,...,yn> be the
sequences, and let Z = <z1,z2,...,zk> be any LCS of X
and Y.
1. If xm = yn,
then zk = xm = yn and Zk-1 is an LCS of Xm-1 and Yn-1.
2. If xm ≠ yn,
then zk ≠ xm implies Z is an LCS of Xm-1 and Y.
3. If xm ≠ yn,
then zk ≠ yn implies Z is an LCS of X and Yn-1.
Can we use a brute-force approach?
Brute-force algorithm: for every subsequence of X, check whether it is also a subsequence of Y.
Infeasible! The number of subsequences of X is 2^m.
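To make the cost of brute force concrete, here is a minimal sketch that enumerates subsequences of X from longest to shortest and tests each one against Y. The function names are assumptions made for this illustration. It is only practical for very short strings, because X has 2^m subsequences.

```python
from itertools import combinations


def is_subsequence(s, text):
    """True if the characters of s occur in text in order, gaps allowed."""
    it = iter(text)
    return all(ch in it for ch in s)


def lcs_brute_force(x: str, y: str) -> str:
    """Try subsequences of x from longest to shortest; return the first found in y."""
    for length in range(len(x), 0, -1):
        # Each increasing index tuple picks one subsequence of x.
        for idx in combinations(range(len(x)), length):
            candidate = "".join(x[i] for i in idx)
            if is_subsequence(candidate, y):
                return candidate
    return ""  # no common symbol at all


print(lcs_brute_force("ABCB", "BDCAB"))  # BCB
```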
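Full recurrence for c[i, j], the length of an LCS of the prefixes Xi and Yj:

$$
c[i, j] =
\begin{cases}
0 & \text{if } i = 0 \text{ or } j = 0, \\
c[i-1, j-1] + 1 & \text{if } i, j > 0 \text{ and } x_i = y_j, \\
\max(c[i, j-1],\ c[i-1, j]) & \text{if } i, j > 0 \text{ and } x_i \neq y_j.
\end{cases}
$$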
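The recurrence can be transcribed almost literally as a memoized recursive function. This is a sketch rather than the algorithm the slides go on to present (that one fills the table bottom-up); the function name `lcs_length_recursive` is an assumption, and `functools.lru_cache` is used so that each (i, j) pair is computed only once.

```python
from functools import lru_cache


def lcs_length_recursive(x: str, y: str) -> int:
    """Length of an LCS of x and y, computed top-down from the recurrence."""

    @lru_cache(maxsize=None)
    def c(i: int, j: int) -> int:
        if i == 0 or j == 0:            # an empty prefix has an empty LCS
            return 0
        if x[i - 1] == y[j - 1]:        # x_i = y_j (Python strings are 0-based)
            return c(i - 1, j - 1) + 1
        return max(c(i, j - 1), c(i - 1, j))

    return c(len(x), len(y))


print(lcs_length_recursive("ABCBDAB", "BDCABA"))  # 4
```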
LCS Length Algorithm
LCS-Length(X, Y)
    m = length(X)                   // get the # of symbols in X
    n = length(Y)                   // get the # of symbols in Y
    for i = 1 to m c[i,0] = 0       // special case: Y0
    for j = 0 to n c[0,j] = 0       // special case: X0 (j starts at 0 so that c[0,0] is set)
    for i = 1 to m                  // for all Xi
        for j = 1 to n              // for all Yj
            if ( Xi == Yj )
                c[i,j] = c[i-1,j-1] + 1
            else c[i,j] = max( c[i-1,j], c[i,j-1] )
    return c
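A minimal Python sketch of LCS-Length follows. It assumes the sequences are given as strings and stores the table as a list of lists. Python indexes from 0, so the i-th symbol of X is `x[i - 1]`.

```python
def lcs_length(x, y):
    """Return the table c, where c[i][j] is the LCS length of x[:i] and y[:j]."""
    m, n = len(x), len(y)
    # Row 0 and column 0 stay 0: the special cases X0 and Y0.
    c = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                c[i][j] = c[i - 1][j - 1] + 1
            else:
                c[i][j] = max(c[i - 1][j], c[i][j - 1])
    return c


table = lcs_length("ABCB", "BDCAB")
print(table[4][5])  # 3
```

The bottom-right entry `c[m][n]` is the length of an LCS of the full sequences.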
LCS Example
We’ll see how the LCS algorithm works on the following example:
X = ABCB
Y = BDCAB
LCS(X, Y) = BCB
LCS Example (0)
j         0   1   2   3   4   5
i        Yj   B   D   C   A   B
0   Xi
1   A
2   B
3   C
4   B
X = ABCB; m = |X| = 4
Y = BDCAB; n = |Y| = 5
Allocate array c[0..m, 0..n], i.e. 5 rows by 6 columns
LCS Example (1)
j         0   1   2   3   4   5
i        Yj   B   D   C   A   B
0   Xi    0   0   0   0   0   0
1   A     0
2   B     0
3   C     0
4   B     0
for i = 1 to m c[i,0] = 0
for j = 0 to n c[0,j] = 0
LCS Example (2)
j         0   1   2   3   4   5
i        Yj   B   D   C   A   B
0   Xi    0   0   0   0   0   0
1   A     0   0
2   B     0
3   C     0
4   B     0
if ( Xi == Yj )
    c[i,j] = c[i-1,j-1] + 1
else c[i,j] = max( c[i-1,j], c[i,j-1] )
LCS Example (3)
j         0   1   2   3   4   5
i        Yj   B   D   C   A   B
0   Xi    0   0   0   0   0   0
1   A     0   0   0   0
2   B     0
3   C     0
4   B     0
if ( Xi == Yj )
    c[i,j] = c[i-1,j-1] + 1
else c[i,j] = max( c[i-1,j], c[i,j-1] )
LCS Example (4)
j         0   1   2   3   4   5
i        Yj   B   D   C   A   B
0   Xi    0   0   0   0   0   0
1   A     0   0   0   0   1
2   B     0
3   C     0
4   B     0
if ( Xi == Yj )
    c[i,j] = c[i-1,j-1] + 1
else c[i,j] = max( c[i-1,j], c[i,j-1] )
LCS Example (5)
j         0   1   2   3   4   5
i        Yj   B   D   C   A   B
0   Xi    0   0   0   0   0   0
1   A     0   0   0   0   1   1
2   B     0
3   C     0
4   B     0
if ( Xi == Yj )
    c[i,j] = c[i-1,j-1] + 1
else c[i,j] = max( c[i-1,j], c[i,j-1] )
LCS Example (6)
j         0   1   2   3   4   5
i        Yj   B   D   C   A   B
0   Xi    0   0   0   0   0   0
1   A     0   0   0   0   1   1
2   B     0   1
3   C     0
4   B     0
if ( Xi == Yj )
    c[i,j] = c[i-1,j-1] + 1
else c[i,j] = max( c[i-1,j], c[i,j-1] )
LCS Example (7)
j         0   1   2   3   4   5
i        Yj   B   D   C   A   B
0   Xi    0   0   0   0   0   0
1   A     0   0   0   0   1   1
2   B     0   1   1   1   1
3   C     0
4   B     0
if ( Xi == Yj )
    c[i,j] = c[i-1,j-1] + 1
else c[i,j] = max( c[i-1,j], c[i,j-1] )
LCS Example (8)
j         0   1   2   3   4   5
i        Yj   B   D   C   A   B
0   Xi    0   0   0   0   0   0
1   A     0   0   0   0   1   1
2   B     0   1   1   1   1   2
3   C     0
4   B     0
if ( Xi == Yj )
    c[i,j] = c[i-1,j-1] + 1
else c[i,j] = max( c[i-1,j], c[i,j-1] )
LCS Example (10)
j         0   1   2   3   4   5
i        Yj   B   D   C   A   B
0   Xi    0   0   0   0   0   0
1   A     0   0   0   0   1   1
2   B     0   1   1   1   1   2
3   C     0   1   1
4   B     0
if ( Xi == Yj )
    c[i,j] = c[i-1,j-1] + 1
else c[i,j] = max( c[i-1,j], c[i,j-1] )
LCS Example (11)
j         0   1   2   3   4   5
i        Yj   B   D   C   A   B
0   Xi    0   0   0   0   0   0
1   A     0   0   0   0   1   1
2   B     0   1   1   1   1   2
3   C     0   1   1   2
4   B     0
if ( Xi == Yj )
    c[i,j] = c[i-1,j-1] + 1
else c[i,j] = max( c[i-1,j], c[i,j-1] )
LCS Example (12)
j         0   1   2   3   4   5
i        Yj   B   D   C   A   B
0   Xi    0   0   0   0   0   0
1   A     0   0   0   0   1   1
2   B     0   1   1   1   1   2
3   C     0   1   1   2   2   2
4   B     0
if ( Xi == Yj )
    c[i,j] = c[i-1,j-1] + 1
else c[i,j] = max( c[i-1,j], c[i,j-1] )
LCS Example (13)
j         0   1   2   3   4   5
i        Yj   B   D   C   A   B
0   Xi    0   0   0   0   0   0
1   A     0   0   0   0   1   1
2   B     0   1   1   1   1   2
3   C     0   1   1   2   2   2
4   B     0   1
if ( Xi == Yj )
    c[i,j] = c[i-1,j-1] + 1
else c[i,j] = max( c[i-1,j], c[i,j-1] )
LCS Example (14)
j         0   1   2   3   4   5
i        Yj   B   D   C   A   B
0   Xi    0   0   0   0   0   0
1   A     0   0   0   0   1   1
2   B     0   1   1   1   1   2
3   C     0   1   1   2   2   2
4   B     0   1   1   2   2
if ( Xi == Yj )
    c[i,j] = c[i-1,j-1] + 1
else c[i,j] = max( c[i-1,j], c[i,j-1] )
LCS Example (15)
j         0   1   2   3   4   5
i        Yj   B   D   C   A   B
0   Xi    0   0   0   0   0   0
1   A     0   0   0   0   1   1
2   B     0   1   1   1   1   2
3   C     0   1   1   2   2   2
4   B     0   1   1   2   2   3
if ( Xi == Yj )
    c[i,j] = c[i-1,j-1] + 1
else c[i,j] = max( c[i-1,j], c[i,j-1] )
LCS Algorithm Running Time
O(m*n),
since each c[i,j] is calculated in
constant time, and the two nested loops
fill m*n entries of the array
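The bound can be written out as a short count. It assumes each table entry costs a constant amount of work, and that initializing row 0 and column 0 costs time proportional to m + n:

```latex
T(m, n)
  = \underbrace{\Theta(m + n)}_{\text{initialization}}
  + \sum_{i=1}^{m} \sum_{j=1}^{n} \Theta(1)
  = \Theta(m + n) + \Theta(mn)
  = \Theta(mn) \qquad (m, n \ge 1)
```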
How to find actual LCS
So far, we have just found the length of the LCS, but not the LCS itself.
We want to modify this algorithm to make it output the Longest
Common Subsequence of X and Y.
Each c[i,j] depends on c[i-1,j] and c[i,j-1],
or on c[i-1,j-1].
For each c[i,j] we can say how it was acquired:

$$
c[i, j] =
\begin{cases}
c[i-1, j-1] + 1 & \text{if } x[i] = y[j], \\
\max(c[i, j-1],\ c[i-1, j]) & \text{otherwise.}
\end{cases}
$$

■ So we can start from c[m,n] and go backwards
■ Whenever x[i] = y[j] (so c[i,j] = c[i-1,j-1] + 1), remember x[i] (because x[i]
is a part of the LCS) and move to c[i-1,j-1]; otherwise move to the larger of
c[i-1,j] and c[i,j-1]
■ When i=0 or j=0 (i.e. we reached the beginning), output the
remembered letters in reverse order
Finding LCS
j         0   1   2   3   4   5
i        Yj   B   D   C   A   B
0   Xi    0   0   0   0   0   0
1   A     0   0   0   0   1   1
2   B     0   1   1   1   1   2
3   C     0   1   1   2   2   2
4   B     0   1   1   2   2   3
LCS (reversed order): B C B
LCS (straight order): B C B
(this string turned out to be a palindrome)
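The backward walk described above can be sketched in Python using only the table c. The function names are assumptions made for this illustration. When the two neighbouring entries tie, this sketch moves up (to c[i-1][j]); a different tie-breaking rule could return a different LCS of the same length.

```python
def lcs_length(x, y):
    """Return the table c, where c[i][j] is the LCS length of x[:i] and y[:j]."""
    m, n = len(x), len(y)
    c = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                c[i][j] = c[i - 1][j - 1] + 1
            else:
                c[i][j] = max(c[i - 1][j], c[i][j - 1])
    return c


def backtrack_lcs(c, x, y):
    """Walk from c[m][n] back to row 0 or column 0, collecting matched symbols."""
    i, j = len(x), len(y)
    remembered = []
    while i > 0 and j > 0:
        if x[i - 1] == y[j - 1]:         # x_i = y_j: this symbol is in the LCS
            remembered.append(x[i - 1])
            i, j = i - 1, j - 1
        elif c[i - 1][j] >= c[i][j - 1]:  # value came from above (ties go up)
            i -= 1
        else:                             # value came from the left
            j -= 1
    # Symbols were collected back to front, so reverse them.
    return "".join(reversed(remembered))


x, y = "ABCB", "BDCAB"
print(backtrack_lcs(lcs_length(x, y), x, y))  # BCB
```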
Another example:
X = <A, B, C, B, D, A, B> and Y = <B, D, C, A, B, A>
j         0   1   2   3   4   5   6
i        yj   B   D   C   A   B   A
0   xi    0   0   0   0   0   0   0
1   A     0   0   0   0   1   1   1
2   B     0   1   1   1   1   2   2
3   C     0   1   1   2   2   2   2
4   B     0   1   1   2   2   3   3
5   D     0   1   2   2   2   3   3
6   A     0   1   2   2   3   3   4
7   B     0   1   2   2   3   4   4
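The table above can be regenerated with a few lines of Python, which is a handy check when filling one in by hand. This is only a sketch; `format_rows` is a hypothetical helper that prints each row as the row index, the symbol of X, and then the values c[i][0..n].

```python
def lcs_length(x, y):
    """Return the table c, where c[i][j] is the LCS length of x[:i] and y[:j]."""
    m, n = len(x), len(y)
    c = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                c[i][j] = c[i - 1][j - 1] + 1
            else:
                c[i][j] = max(c[i - 1][j], c[i][j - 1])
    return c


def format_rows(x, c):
    """One text line per table row: index i, symbol x_i (or 'xi' for row 0), values."""
    rows = []
    for i, row in enumerate(c):
        label = "xi" if i == 0 else x[i - 1]
        rows.append(f"{i} {label} " + " ".join(str(v) for v in row))
    return rows


X, Y = "ABCBDAB", "BDCABA"
for line in format_rows(X, lcs_length(X, Y)):
    print(line)
# The last line printed is: 7 B 0 1 2 2 3 4 4
```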
b[i,j] records which case produced c[i,j]: “↖” for c[i-1,j-1] + 1, “↑” for c[i-1,j], “←” for c[i,j-1].
PRINT-LCS(b,X,i,j)
    if i = 0 or j = 0
        then return
    if b[i,j] = “↖”
        then PRINT-LCS(b,X,i-1,j-1)
             print xi
    else if b[i,j] = “↑”
        then PRINT-LCS(b,X,i-1,j)
    else PRINT-LCS(b,X,i,j-1)
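A hedged Python sketch of PRINT-LCS follows. The slides do not show how the table b is filled, so `lcs_with_arrows` is an assumed companion that records an arrow for each entry while computing c, with ties resolved in favour of “↑”. Instead of printing symbol by symbol, the sketch appends them to a list `out`, which is also an assumption made for easier testing.

```python
def lcs_with_arrows(x, y):
    """Fill c as in LCS-Length and record in b which case produced each entry."""
    m, n = len(x), len(y)
    c = [[0] * (n + 1) for _ in range(m + 1)]
    b = [[""] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                c[i][j] = c[i - 1][j - 1] + 1
                b[i][j] = "↖"
            elif c[i - 1][j] >= c[i][j - 1]:
                c[i][j] = c[i - 1][j]
                b[i][j] = "↑"
            else:
                c[i][j] = c[i][j - 1]
                b[i][j] = "←"
    return c, b


def print_lcs(b, x, i, j, out):
    """Follow the arrows back from (i, j); symbols are appended in forward order."""
    if i == 0 or j == 0:
        return
    if b[i][j] == "↖":
        print_lcs(b, x, i - 1, j - 1, out)
        out.append(x[i - 1])      # emitted after the recursive call, so order is left to right
    elif b[i][j] == "↑":
        print_lcs(b, x, i - 1, j, out)
    else:
        print_lcs(b, x, i, j - 1, out)


X, Y = "ABCBDAB", "BDCABA"
c, b = lcs_with_arrows(X, Y)
out = []
print_lcs(b, X, len(X), len(Y), out)
print("".join(out))  # BCBA
```

Because the symbol is emitted after the recursive call returns, no explicit reversal is needed.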