0% found this document useful (0 votes)
2 views24 pages

Dynamic Programming - 2

The document outlines the agenda for a CS5114 class on the Theory of Algorithms, focusing on Dynamic Programming topics such as RNA Secondary Structure Determination and Edit Distance. It explains the principles of Dynamic Programming, constraints for RNA secondary structures, and the concept of Edit Distance as a measure of similarity between strings. The document also discusses the optimal substructure property and efficiency of algorithms related to these topics.

Uploaded by

ravichandu.b1409
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
2 views24 pages

Dynamic Programming - 2

The document outlines the agenda for a CS5114 class on the Theory of Algorithms, focusing on Dynamic Programming topics such as RNA Secondary Structure Determination and Edit Distance. It explains the principles of Dynamic Programming, constraints for RNA secondary structures, and the concept of Edit Distance as a measure of similarity between strings. The document also discusses the optimal substructure property and efficiency of algorithms related to these topics.

Uploaded by

ravichandu.b1409
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 24

CS5114: Theory of Algorithms

Sharath Raghvendra
Associate Professor, Dept. of Computer Science,
Virginia Tech
Fall 2022
Agenda

▪ Today’s topic
▪ Dynamic Programming (Continued)
▪ RNA Secondary Structure Determination
▪ Edit Distance
▪ Shortest path with negative edge costs
▪ After that…
▪ NP-Completeness and reductions between problems (Chapter 8)
Principles of Dynamic Programming

1. Define polynomial number of sub-problems. One of them should be the


final solution
2. There is a natural ordering of problems from smallest to largest such that
we can obtain solution to the larger problem by combining solutions of
smaller ones.
3. Optimal value of a problem can be computed from optimal value of its
sub-problems
RNA Secondary Structure

▪ RNA : is a string 𝑏1 𝑏2 , … . , 𝑏𝑛 over alphabet {A, C, G, U}


▪ Secondary Structure: RNA is a single-stranded, therefore it loops back and
forms base pairs with itself. Determining this 2-dimensional structure is
essential for predicting its function in the body.
Constraints on Secondary Structure

▪ Given a string, we need to compute a set of base pairings S = 𝑏𝑖 , 𝑏𝑗


that satisfy:
▪ [Watson-Crick Pairings] S is a matching, i.e., no alphabet of the string is involved in
multiple pairings. Furthermore, this pairing is A—U , U—A , C—G or G—C
▪ [No Sharp Turns] The ends of each pair are separated by at least four bases. If
𝑏𝑖 , 𝑏𝑗 ∈ 𝑆, then 𝑖 < 𝑗 − 4
▪ [Non-crossing] If 𝑏𝑖 , 𝑏𝑗 and 𝑏𝑘 , 𝑏𝑙 are in S, then we cannot have 𝑖 < 𝑘 < 𝑗 < 𝑙
▪ Maximizing the size of S ensures that the structure is stable.
▪ Objective: Find a base pairing S which is of the largest size.
Examples
Optimal Substructure

▪ What is the optimal substructure property here?

▪ Results in two sub-problems, one is not in the desired form.


▪ Sub-problem 1 𝑏1 , 𝑏2 , 𝑏3 , … , 𝑏𝑡−1 = 𝑂𝑃𝑇(𝑡 − 1)
▪ Sub-problem 2 𝑏𝑡+1 , 𝑏𝑡+2 , … , 𝑏𝑛−1
Dynamic Programming – False Start

▪ Lets try defining sub-problems similar to weighted interval scheduling


▪ OPT(j) = Max number of base pairings in a secondary structure of the sub-
string 𝑏1 , 𝑏2 , … , 𝑏𝑗

▪ We need to express solution to OPT(j) as solution to smaller sub-problems


▪ Suppose 𝑏𝑛 matches to 𝑏𝑡 . What sub-problems do we get?
▪ Results in two sub-problems, one is not in the desired form.
▪ Sub-problem 1 𝑏1 , 𝑏2 , 𝑏3 , … , 𝑏𝑡−1 = 𝑂𝑃𝑇(𝑡 − 1)
▪ Sub-problem 2 𝑏𝑡+1 , 𝑏𝑡+2 , … , 𝑏𝑛−1
Dynamic Programming – Sub-Problems

▪ Define sub-problems over intervals


▪ OPT(i, j) = max number of base pairs in a secondary structure of the sub-
string 𝑏𝑖 , 𝑏𝑖+1 , … , 𝑏𝑗
▪ Case 1: If 𝑖 ≥ 𝑗 − 4
▪ 𝑂𝑃𝑇 𝑖, 𝑗 = 0
▪ Case 2: Base 𝑏𝑗 is not involved in a pair
▪ 𝑂𝑃𝑇 𝑖, 𝑗 = 𝑂𝑃𝑇(𝑖, 𝑗 − 1)
▪ Case 3: Base 𝑏𝑗 maps with 𝑏𝑡 for some 𝑖 ≤ 𝑡 < 𝑗 − 4
▪ Problem splits into 𝑂𝑃𝑇(𝑖, 𝑡 − 1) and 𝑂𝑃𝑇(𝑡 + 1, 𝑗 − 1)
Ordering Sub-Problems

▪ We need to still define an order for solving sub-problems


▪ We will solve 𝑂𝑃𝑇 𝑖, 𝑗 in increasing order of their interval length
▪ Note, we have expressed optimal for interval 𝑖, 𝑗 as a combination of
solution to sub-problems of smaller intervals.
Efficiency

▪ There are 𝑂 𝑛2 intervals, each corresponds to a sub-problem


▪ To compute the solution to one sub-problem, we need to compute max
over 𝑂 𝑛 smaller sub-problems
▪ Therefore, total time taken is 𝑂(𝑛3 )
▪ Retrieving the base pairings can be done via backtracking using the table
▪ You can figure out a procedure for retrieving solution on your own.
Edit Distance
Edit Distance

▪ The Edit Distance (or Levenshtein distance) is a distance function


between two strings. It is used to measure their similarity.
▪ Edit Distance = minimum number of edits needed to transform one string into the
other.

▪ Applications:
▪ spell checkers,
▪ natural language translation,
▪ bioinformatics.
Application
How does google know that “dynamic” and
Dymanic are similar?
Edit Distance

▪ Given initial string x and final string y, what is the minimum number of edits
required to convert string x to string y.
▪ Valid edits include:
▪ Insert a character to x
▪ Delete a character from x
▪ Change a character in x to another character
Example

Alternately, we can ask how well the words line-up? Such line-ups are called alignments.
Example

Alternately, we can ask how well the words line-up? Such line-ups are called alignments.
Representing edits as alignments

▪ An alignment is a matching of positions in strings x and y given by

▪ Sets 1 … 𝑚 , {1 … 𝑛} represent positions in x and y


▪ A matching of positions (or indices) is an alignment if there is no “crossing
pairs” in M: if 𝑖, 𝑗 , 𝑖 ′ , 𝑗 ′ ∈ 𝑀, 𝑖 < 𝑖′ then 𝑗 < 𝑗′
▪ Not every index needs to be matched in an alignment.
▪ Unmatched indices create a gap. Every gap has a gap penalty 𝛿
▪ 𝑖, 𝑗 ∈ 𝑀 is a mismatch if 𝑥𝑖 ≠ 𝑦j . Mismatch penalty is 𝛼𝑥𝑖 𝑦𝑗
▪ Goal: Given two strings obtain an alignment with minimum penalty.
Dynamic Programming – Observations

▪ So, when matching one word to another one, consider the last characters
of strings s and t.
▪ If we are lucky enough that they ALREADY match,
▪ then we can simply "cancel" and recursively find the edit distance between the two
strings left when we delete this last character from both strings.
▪ Otherwise, we MUST make one of three changes:
1) delete the last character of string x (this character matches to a “blank” on y)
2) delete the last character of string y (this character matches to a “blank” on x)
3) change the last character of string s to the last character of string t. (this will
correspond to a mismatch)
▪ Edit distance between an empty string and another string is the length of
the second string times insertion/deletion cost.
Defining Sub-Problems

You might also like

pFad - Phonifier reborn

Pfad - The Proxy pFad of © 2024 Garber Painting. All rights reserved.

Note: This service is not intended for secure transactions such as banking, social media, email, or purchasing. Use at your own risk. We assume no liability whatsoever for broken pages.


Alternative Proxies:

Alternative Proxy

pFad Proxy

pFad v3 Proxy

pFad v4 Proxy