Longest Common Subsequence

The document discusses the longest common subsequence (LCS) problem in computer science. It defines key terms like subsequence, substring, and common subsequence. It then provides an example to find the LCS of two DNA strings using dynamic programming. Specifically, it fills a 2D array where each entry represents the length of the LCS of the corresponding prefixes of the two strings.

Uploaded by

api-19981779
Copyright
© Attribution Non-Commercial (BY-NC)

LONGEST COMMON SUBSEQUENCE

Submitted to:
Dr. V.K.Pathak
C.S.E. Deptt.
HBTI-Kanpur

Submitted by :
Shweta Singhal
3rd C.S.E.
127/07
LONGEST COMMON SUBSEQUENCE
■ Definition
  ○ Character
  ○ Alphabet: a set of characters
  ○ String (or sequence): a list of characters from an alphabet
    ■ Example: strings over {0,1} are binary strings
    ■ Example: strings over {A,C,G,T} are DNA sequences

■ Substring (the characters must be consecutive)
  ○ CBD is a substring of ABCBDAB
■ Subsequence (the characters must be in order, but need not be consecutive)
  ○ BCDB is a subsequence of ABCBDAB
■ Common subsequence
  ○ BCA is a common subsequence of X=ABCBDAB and Y=BDCABA

USE OF LCS
■ In biological applications, we may want to compare the DNA of two organisms. A strand of DNA consists of a string of molecules called bases, where the possible bases are adenine, guanine, cytosine, and thymine. We represent each base by its initial letter.
■ The DNA of one organism might be:
  S1=ACCGGTCGAGTGCGCGGAAGCGGCCGAA
  and another might be:
  S2=GTCGTTCGGAATGCCGTTGCTCTGTAAA
■ One goal is to determine how “similar” these strands are.

LONGEST COMMON SUBSEQUENCE
■ One way to say that two DNA strands are similar is that one is a substring of the other. In our case, we say two strands S1 and S2 are similar if we can find a third strand S3 whose bases appear in each of the first two, in the same order but not necessarily consecutively.
■ The longer the strand S3 we can find that appears in both S1 and S2, the more similar S1 and S2 are.

LONGEST COMMON SUBSEQUENCE

■ Dynamic programming
  ○ The ith prefix Xi of X is Xi=<x1,x2,…,xi>.
  ○ If X=<A, B, C, B, D, A, B>, then
    ■ X4=<A, B, C, B>
    ■ X0=<>

LONGEST COMMON SUBSEQUENCE

■ Longest common subsequence (LCS)
  ○ BCBA is a longest common subsequence of X and Y (BDAB is another one of the same length)
    X=ABCBDAB
    Y=BDCABA

■ LCS problem
  ○ Given two sequences X=<x1,x2,…,xm> and Y=<y1,y2,…,yn>, find an LCS of X and Y.

How to Solve LCS Quickly

■ If X and Y are each 1 character long, the LCS length is 0 or 1

    X            a    a
    Y            b    a
    LCS length   0    1

■ If we then add 1 character to X and Y, the LCS length increases by at most 1

    X            ab   ab
    Y            bd   ad
    LCS length   1    1

■ Note that we do not need to rescan the first character

Longest Common Subsequence (LCS)
■ Define Xi, Yj to be the prefixes of X and Y of length i and j; m = |X|, n = |Y|
■ We store the length of LCS(Xi, Yj) in c[i,j]
■ Trivial cases: LCS(X0, Yj) and LCS(Xi, Y0) are empty (so c[0,j] = c[i,0] = 0)
■ Recursive formula for c[i,j], for i, j > 0:

$$c[i,j] = \begin{cases} c[i-1,j-1] + 1 & \text{if } x[i] = y[j], \\ \max(c[i,j-1],\ c[i-1,j]) & \text{otherwise} \end{cases}$$

c[m,n] is the final solution
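
The recursive formula can be evaluated almost literally. Below is a minimal top-down sketch in Python that caches each c[i,j] so it is computed only once. The function name and the use of `functools.lru_cache` are choices made for this illustration, not part of the slides. Python strings are 0-based, so x[i] in the formula becomes `x[i - 1]`.

```python
from functools import lru_cache


def lcs_length_top_down(x: str, y: str) -> int:
    """Return c[m, n] by evaluating the recurrence with memoization."""

    @lru_cache(maxsize=None)
    def c(i: int, j: int) -> int:
        if i == 0 or j == 0:
            # Trivial cases: one of the prefixes is empty.
            return 0
        if x[i - 1] == y[j - 1]:
            # x[i] = y[j] in the 1-based notation of the slides.
            return c(i - 1, j - 1) + 1
        return max(c(i, j - 1), c(i - 1, j))

    return c(len(x), len(y))


if __name__ == "__main__":
    print(lcs_length_top_down("ABCBDAB", "BDCABA"))
```

The recursion depth grows with m + n, so this form is suitable for short inputs only; filling the table iteratively avoids that limit.
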


LONGEST COMMON SUBSEQUENCE

■ Optimal substructure
  ○ Let X = <x1,x2,...,xm> and Y = <y1,y2,...,yn> be the sequences, and let Z = <z1,z2,...,zk> be any LCS of X and Y.
    1. If xm = yn, then zk = xm = yn and Zk-1 is an LCS of Xm-1 and Yn-1.
    2. If xm ≠ yn, then zk ≠ xm implies Z is an LCS of Xm-1 and Y.
    3. If xm ≠ yn, then zk ≠ yn implies Z is an LCS of X and Yn-1.

LONGEST COMMON SUBSEQUENCE

■ Brute force approach
  ○ Enumerate all subsequences of X, check whether each one is also a subsequence of Y, and keep the longest one.
■ Infeasible!
  ○ The number of subsequences of X is 2^m.

Longest common subsequence
■ Can we use a brute-force approach?
■ Brute-force algorithm:
  1. For every subsequence of x, check whether it is a subsequence of y.
  2. Worst-case running time: Θ(n·2^m), because we have 2^m subsequences of x to check and each check takes Θ(n) time: scan y for the first element, scan from there for the next element, and so on.
Clearly, to optimize this, we would let m ≤ n, that is, enumerate the subsequences of the shorter sequence.
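
To make the cost concrete, here is a small Python sketch of the brute-force idea. It is an illustration only: the function names are invented here, and the enumeration order (longest candidates first, via `itertools.combinations`) is a convenience that lets the search stop at the first hit. In the worst case it still examines all 2^m subsequences of x.

```python
from itertools import combinations


def is_subsequence(s: str, t: str) -> bool:
    """Theta(len(t)) check: scan t once, matching the characters of s in order."""
    remaining = iter(t)
    return all(ch in remaining for ch in s)


def lcs_brute_force(x: str, y: str) -> str:
    """Try subsequences of x from longest to shortest; return the first found in y."""
    for k in range(len(x), -1, -1):
        # Each choice of k positions of x yields one subsequence of length k.
        for positions in combinations(range(len(x)), k):
            candidate = "".join(x[p] for p in positions)
            if is_subsequence(candidate, y):
                return candidate
    return ""


if __name__ == "__main__":
    print(lcs_brute_force("ABCB", "BDCAB"))
```

Passing the shorter string as `x` keeps the exponent as small as possible, which is the m ≤ n remark above.
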
Longest Common Subsequence
■ Problem: Given 2 sequences, X = 〈x1,...,xm〉 and Y = 〈y1,...,yn〉, find a common subsequence whose length is maximum.

    springtime    ncaa tournament    basketball
    printing      north carolina     snoeyink

A subsequence need not be consecutive, but must be in order.
LONGEST COMMON SUBSEQUENCE

■ c[i, j]: the length of an LCS of the sequences Xi and Yj.
  ○ If either i = 0 or j = 0, one of the sequences is empty, so the LCS has length 0.

$$c[i,j] = \begin{cases} 0 & \text{if } i = 0 \text{ or } j = 0, \\ c[i-1,j-1] + 1 & \text{if } i, j > 0 \text{ and } x_i = y_j, \\ \max(c[i,j-1],\ c[i-1,j]) & \text{if } i, j > 0 \text{ and } x_i \neq y_j. \end{cases}$$

LCS Length Algorithm
LCS-Length(X, Y)
m = length(X)                   // get the # of symbols in X
n = length(Y)                   // get the # of symbols in Y
for i = 0 to m: c[i,0] = 0      // special case: Y0
for j = 0 to n: c[0,j] = 0      // special case: X0
for i = 1 to m                  // for all Xi
    for j = 1 to n              // for all Yj
        if ( xi == yj )         // compare the ith symbol of X with the jth symbol of Y
            c[i,j] = c[i-1,j-1] + 1
        else c[i,j] = max( c[i-1,j], c[i,j-1] )
return c

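The LCS-Length pseudocode translates almost line for line into Python. The sketch below is one possible rendering. The name `lcs_length_table` is an assumption of this example, and the table is a list of lists indexed `c[i][j]` with 0-based strings, so the symbol xi is `x[i - 1]`.

```python
from typing import List


def lcs_length_table(x: str, y: str) -> List[List[int]]:
    """Fill and return the (m+1) x (n+1) table c of LCS lengths of prefixes."""
    m, n = len(x), len(y)
    # Row 0 and column 0 stay 0: the special cases X0 and Y0.
    c = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):          # for all Xi
        for j in range(1, n + 1):      # for all Yj
            if x[i - 1] == y[j - 1]:
                c[i][j] = c[i - 1][j - 1] + 1
            else:
                c[i][j] = max(c[i - 1][j], c[i][j - 1])
    return c


if __name__ == "__main__":
    for row in lcs_length_table("ABCB", "BDCAB"):
        print(row)
```

The bottom-right entry `c[m][n]` is the length of an LCS of the two full strings.
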
LCS Example
We’ll see how the LCS algorithm works on the following example:
■ X = ABCB
■ Y = BDCAB

What is the Longest Common Subsequence of X and Y?

LCS(X, Y) = BCB
X = A B   C   B
Y =   B D C A B

LCS Example (0)
     j    0    1    2    3    4    5
i         Yj   B    D    C    A    B
0    Xi
1    A
2    B
3    C
4    B

X = ABCB; m = |X| = 4
Y = BDCAB; n = |Y| = 5
Allocate array c[0..4, 0..5] (m+1 = 5 rows, n+1 = 6 columns)

LCS Example (1)
     j    0    1    2    3    4    5
i         Yj   B    D    C    A    B
0    Xi   0    0    0    0    0    0
1    A    0
2    B    0
3    C    0
4    B    0

for i = 0 to m: c[i,0] = 0
for j = 0 to n: c[0,j] = 0

LCS Example (2)
     j    0    1    2    3    4    5
i         Yj   B    D    C    A    B
0    Xi   0    0    0    0    0    0
1    A    0    0
2    B    0
3    C    0
4    B    0

if ( xi == yj )
    c[i,j] = c[i-1,j-1] + 1
else c[i,j] = max( c[i-1,j], c[i,j-1] )

LCS Example (3)
     j    0    1    2    3    4    5
i         Yj   B    D    C    A    B
0    Xi   0    0    0    0    0    0
1    A    0    0    0    0
2    B    0
3    C    0
4    B    0

if ( xi == yj )
    c[i,j] = c[i-1,j-1] + 1
else c[i,j] = max( c[i-1,j], c[i,j-1] )

LCS Example (4)
     j    0    1    2    3    4    5
i         Yj   B    D    C    A    B
0    Xi   0    0    0    0    0    0
1    A    0    0    0    0    1
2    B    0
3    C    0
4    B    0

if ( xi == yj )
    c[i,j] = c[i-1,j-1] + 1
else c[i,j] = max( c[i-1,j], c[i,j-1] )

LCS Example (5)
     j    0    1    2    3    4    5
i         Yj   B    D    C    A    B
0    Xi   0    0    0    0    0    0
1    A    0    0    0    0    1    1
2    B    0
3    C    0
4    B    0

if ( xi == yj )
    c[i,j] = c[i-1,j-1] + 1
else c[i,j] = max( c[i-1,j], c[i,j-1] )

LCS Example (6)
     j    0    1    2    3    4    5
i         Yj   B    D    C    A    B
0    Xi   0    0    0    0    0    0
1    A    0    0    0    0    1    1
2    B    0    1
3    C    0
4    B    0

if ( xi == yj )
    c[i,j] = c[i-1,j-1] + 1
else c[i,j] = max( c[i-1,j], c[i,j-1] )

LCS Example (7)
     j    0    1    2    3    4    5
i         Yj   B    D    C    A    B
0    Xi   0    0    0    0    0    0
1    A    0    0    0    0    1    1
2    B    0    1    1    1    1
3    C    0
4    B    0

if ( xi == yj )
    c[i,j] = c[i-1,j-1] + 1
else c[i,j] = max( c[i-1,j], c[i,j-1] )

LCS Example (8)
     j    0    1    2    3    4    5
i         Yj   B    D    C    A    B
0    Xi   0    0    0    0    0    0
1    A    0    0    0    0    1    1
2    B    0    1    1    1    1    2
3    C    0
4    B    0

if ( xi == yj )
    c[i,j] = c[i-1,j-1] + 1
else c[i,j] = max( c[i-1,j], c[i,j-1] )

LCS Example (10)
     j    0    1    2    3    4    5
i         Yj   B    D    C    A    B
0    Xi   0    0    0    0    0    0
1    A    0    0    0    0    1    1
2    B    0    1    1    1    1    2
3    C    0    1    1
4    B    0

if ( xi == yj )
    c[i,j] = c[i-1,j-1] + 1
else c[i,j] = max( c[i-1,j], c[i,j-1] )

LCS Example (11)
     j    0    1    2    3    4    5
i         Yj   B    D    C    A    B
0    Xi   0    0    0    0    0    0
1    A    0    0    0    0    1    1
2    B    0    1    1    1    1    2
3    C    0    1    1    2
4    B    0

if ( xi == yj )
    c[i,j] = c[i-1,j-1] + 1
else c[i,j] = max( c[i-1,j], c[i,j-1] )

LCS Example (12)
     j    0    1    2    3    4    5
i         Yj   B    D    C    A    B
0    Xi   0    0    0    0    0    0
1    A    0    0    0    0    1    1
2    B    0    1    1    1    1    2
3    C    0    1    1    2    2    2
4    B    0

if ( xi == yj )
    c[i,j] = c[i-1,j-1] + 1
else c[i,j] = max( c[i-1,j], c[i,j-1] )

LCS Example (13)
     j    0    1    2    3    4    5
i         Yj   B    D    C    A    B
0    Xi   0    0    0    0    0    0
1    A    0    0    0    0    1    1
2    B    0    1    1    1    1    2
3    C    0    1    1    2    2    2
4    B    0    1

if ( xi == yj )
    c[i,j] = c[i-1,j-1] + 1
else c[i,j] = max( c[i-1,j], c[i,j-1] )

LCS Example (14)
     j    0    1    2    3    4    5
i         Yj   B    D    C    A    B
0    Xi   0    0    0    0    0    0
1    A    0    0    0    0    1    1
2    B    0    1    1    1    1    2
3    C    0    1    1    2    2    2
4    B    0    1    1    2    2

if ( xi == yj )
    c[i,j] = c[i-1,j-1] + 1
else c[i,j] = max( c[i-1,j], c[i,j-1] )

LCS Example (15)
     j    0    1    2    3    4    5
i         Yj   B    D    C    A    B
0    Xi   0    0    0    0    0    0
1    A    0    0    0    0    1    1
2    B    0    1    1    1    1    2
3    C    0    1    1    2    2    2
4    B    0    1    1    2    2    3

if ( xi == yj )
    c[i,j] = c[i-1,j-1] + 1
else c[i,j] = max( c[i-1,j], c[i,j-1] )

LCS Algorithm Running Time

■ The LCS algorithm calculates the value of each entry of the array c[0..m, 0..n]
■ So what is the running time?

O(m*n), since each c[i,j] is calculated in constant time, and there are (m+1)*(n+1) entries in the array

How to find actual LCS
■ So far, we have just found the length of the LCS, but not the LCS itself.
■ We want to modify this algorithm to make it output the Longest Common Subsequence of X and Y

Each c[i,j] depends on c[i-1,j] and c[i,j-1], or on c[i-1,j-1]
For each c[i,j] we can say how it was acquired. For example, here

    2    2
    2    3

c[i,j] = c[i-1,j-1] + 1 = 2 + 1 = 3

How to find actual LCS - continued
■ Remember that

$$c[i,j] = \begin{cases} c[i-1,j-1] + 1 & \text{if } x[i] = y[j], \\ \max(c[i,j-1],\ c[i-1,j]) & \text{otherwise} \end{cases}$$

■ So we can start from c[m,n] and go backwards
■ Whenever x[i] = y[j], so that c[i,j] = c[i-1,j-1]+1, remember x[i] (because x[i] is a part of the LCS) and move to c[i-1,j-1]; otherwise move to whichever of c[i-1,j] and c[i,j-1] is larger
■ When i=0 or j=0 (i.e. we reached the beginning), output the remembered letters in reverse order

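The backward walk can be sketched in Python as follows. This is an illustrative version, not the slides' own code: the function name is invented, the table is rebuilt inside the function so the block is self-contained, and ties between c[i-1,j] and c[i,j-1] are broken by moving up, which is one of several valid choices.

```python
def lcs_by_backtracking(x: str, y: str) -> str:
    """Build the table c, then walk back from c[m][n] collecting matched symbols."""
    m, n = len(x), len(y)
    c = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                c[i][j] = c[i - 1][j - 1] + 1
            else:
                c[i][j] = max(c[i - 1][j], c[i][j - 1])

    remembered = []
    i, j = m, n
    while i > 0 and j > 0:
        if x[i - 1] == y[j - 1]:
            # This symbol is part of the LCS; move diagonally.
            remembered.append(x[i - 1])
            i -= 1
            j -= 1
        elif c[i - 1][j] >= c[i][j - 1]:
            i -= 1   # the value came from the row above (ties go up)
        else:
            j -= 1   # the value came from the column to the left
    # Symbols were collected from the end, so output them in reverse order.
    return "".join(reversed(remembered))


if __name__ == "__main__":
    print(lcs_by_backtracking("ABCB", "BDCAB"))
```

With a different tie-breaking rule the walk may return a different LCS of the same length.
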
Finding LCS
     j    0    1    2    3    4    5
i         Yj   B    D    C    A    B
0    Xi   0    0    0    0    0    0
1    A    0    0    0    0    1    1
2    B    0    1    1    1    1    2
3    C    0    1    1    2    2    2
4    B    0    1    1    2    2    3

Finding LCS (2)
     j    0    1    2    3    4    5
i         Yj   B    D    C    A    B
0    Xi   0    0    0    0    0    0
1    A    0    0    0    0    1    1
2    B    0    1    1    1    1    2
3    C    0    1    1    2    2    2
4    B    0    1    1    2    2    3

LCS (reversed order): B C B
LCS (straight order): B C B
(this string turned out to be a palindrome)

Another example:
X = <A, B, C, B, D, A, B> and Y = <B, D, C, A, B, A>
     j    0    1    2    3    4    5    6
i         yj   B    D    C    A    B    A
0    xi   0    0    0    0    0    0    0
1    A    0    0    0    0    1    1    1
2    B    0    1    1    1    1    2    2
3    C    0    1    1    2    2    2    2
4    B    0    1    1    2    2    3    3
5    D    0    1    2    2    2    3    3
6    A    0    1    2    2    3    3    4
7    B    0    1    2    2    3    4    4

PRINT-LCS(b,X,i,j)
if i = 0 or j = 0
    then return
if b[i,j] = “↖”
    then PRINT-LCS(b,X,i-1,j-1)
         print xi
else if b[i,j] = “↑”
    then PRINT-LCS(b,X,i-1,j)
else PRINT-LCS(b,X,i,j-1)

Here b[i,j] records which case of the recurrence produced c[i,j]: “↖” for c[i-1,j-1]+1, “↑” for c[i-1,j], and “←” for c[i,j-1].

○ Note: This procedure takes O(m+n) time, since at least one of i and j is decremented in each stage of the recursion.
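
A possible Python rendering of PRINT-LCS is sketched below. The slides do not show how b is filled, so the helper `lcs_with_directions` is an assumption of this example. It records a direction alongside each c[i,j], using the plain strings "diag", "up" and "left" in place of the three arrows, and breaks ties in favour of "up". The recursive function collects the symbols in a list instead of printing them.

```python
from typing import List, Tuple


def lcs_with_directions(x: str, y: str) -> Tuple[List[List[int]], List[List[str]]]:
    """Fill c as in LCS-Length and record in b which case produced each entry."""
    m, n = len(x), len(y)
    c = [[0] * (n + 1) for _ in range(m + 1)]
    b = [[""] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                c[i][j], b[i][j] = c[i - 1][j - 1] + 1, "diag"
            elif c[i - 1][j] >= c[i][j - 1]:
                c[i][j], b[i][j] = c[i - 1][j], "up"
            else:
                c[i][j], b[i][j] = c[i][j - 1], "left"
    return c, b


def collect_lcs(b: List[List[str]], x: str, i: int, j: int, out: List[str]) -> None:
    """Mirror of PRINT-LCS: follow the directions in b and append matched symbols."""
    if i == 0 or j == 0:
        return
    if b[i][j] == "diag":
        collect_lcs(b, x, i - 1, j - 1, out)
        out.append(x[i - 1])          # "print xi" in the pseudocode
    elif b[i][j] == "up":
        collect_lcs(b, x, i - 1, j, out)
    else:
        collect_lcs(b, x, i, j - 1, out)


if __name__ == "__main__":
    x, y = "ABCBDAB", "BDCABA"
    c, b = lcs_with_directions(x, y)
    symbols: List[str] = []
    collect_lcs(b, x, len(x), len(y), symbols)
    print("".join(symbols))
```

Because the recursion emits each symbol after returning from the smaller subproblem, the letters come out in straight order and no final reversal is needed.
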
Longest Common Subsequence (LCS)
■ Need a separate data structure to retrieve the answer
■ The dynamic-programming algorithm runs in O(m*n)
■ Brute-force algorithm: O(n·2^m)
