FASTA Algorithm

Download as pdf or txt
Download as pdf or txt
You are on page 1of 15

Overview

z Introduction
z Heuristic approach
z Steps in the algorithm
Introduction

z The FASTA algorithm is a heuristic method


for string comparison. It was developed by
Lipman and Pearson in 1985 and further
improved in 1988.
z It performs Local alignment of sequences.
Heuristic Approach

z Definition: A heuristic method is a method


that uses hash coding in which the sequencce
is broken into small “words or K-tuples”of
specific sizes.
z Dynamic programming methods consume
time to analyse and search the entire
database.Heuristic approach overcomes this
drawback.
FASTA - Idea -
z Problem of Dynamic Programming
D.P. compute the score in a lot of useless area for optimal
sequence
G A A T T C A G T T A
G 1 1 1 1 1 1 1 1 1 1 1
G 1 1 1 1 1 1 1 2 2 2 2
A 1 2 2 2 2 2 2 2 2 2 2
T 1 2 2 3 3 3 3 3 3 3 3
C 1 2 2 3 3 4 4 4 4 4 4
G 1 2 2 3 3 4 4 5 5 5 5
A 1 2 3 3 3 4 5 5 5 5 6

z FASTA focuses on diagonal area


Steps in FASTA Algorithm

z Four steps:
1) Identify regions of similarity:
z Using the ktup parameter which specifies #
consecutive identities required in a match
z 10 best diagonal regions found based on #matches
and distance between matches
2) Rescore regions and identify best initial regions
z PAM250 or other scoring matrix used for rescoring the
10 diagonal regions identified in step 1 to allow for
conservative replacements and runs of identities
shorter than ktup
z For each the best diagonal regions, identify “initial
region” that is best scoring subregion
FASTA - Algorithm -
z Step 1
Find all hot-spots
// Hot spots is pairs of words of length k that exactly match
Sequence 1
Hot Spots
Sequence 2
Steps in FASTA Algorithm

3) Optimally join initial regions with scores > T


ƒ Given: location of initial regions, scores, gap penalty
ƒ Calculate an optimal alignment of initial regions as a
combination of compatible regions with maximal score
ƒ Use resulting score to rank the library sequences
ƒ Selectivity degradation limited by using initial regions that
score greater than some threshold T
4) Align the highest scoring library sequences using
modification of global and local alignment algorithms
ƒ Considers all possible alignments of the query and library
sequence that falls within a band centered around the
highest scoring initial region
OVERVIEW OF THE FASTA ALGORITHM

•FastA locates regions of the query sequence and the search set
sequence that have high densities of exact word matches.
•For DNA sequences the word length usually used is 6.
•The 10 highest-scoring sequence regions are saved and re-scored
using a scoring matrix. These scores are the init1 scores
•FastA determines if any of the initial regions from different diagonals
may be joined together to form an approximate alignment with gaps.
Only non-overlapping regions may be joined.
•The score for the joined regions is the sum of the scores of the initial
regions minus a joining penalty for each gap. The score of the highest
scoring region, at the end of this step, is saved as the initn score.
•FastA uses dynamic programming (Smith-Waterman algorithm ) over a
narrow band of high scoring diagonals between the query sequence
and the search set sequence, to produce an alignment with a new
score.
FASTA - Algorithm -

z Step 1 in detail
Use look-up Table
Query : GAATTCAGTTA
Sequence: G G A T C G A Dot—Matrix

Look-up Table G A A T T C A G T T A

Q Location G * *
G * *
A 2,3,7,11
A * * * *
C 6 T * * * *
G 1,8 C *
T 4,5,9,10 G * *
A * * * *
FASTA - Algorithm -

zStep 2
Score the Hot-spot and locate the ten best diagonal run.
// There is some scoring system; ex. PAM250
FASTA - Algorithm -

z Step 3
Combine sub-alignments into one alignment with GAP

GAP

One of local
alignment
FASTA - Algorithm -

z Step 4
Use the dynamic programming in restricted area around the best-
score alignment to find out the higher-score alignment than the
best-score alignment

Width of this band


is a parameter
Conclusion

Algorithm Sensitivity Running Time


D.P 1 3
FASTA 3 2
BLAST 2 1
Reference

z N Gautham, Introduction to bioinformatics


databases and algorithms.
z T.K.Atwood,Parry Smith and Samiron
Pukhan ,introduction to bioinformatics.
Thank you

You might also like

pFad - Phonifier reborn

Pfad - The Proxy pFad of © 2024 Garber Painting. All rights reserved.

Note: This service is not intended for secure transactions such as banking, social media, email, or purchasing. Use at your own risk. We assume no liability whatsoever for broken pages.


Alternative Proxies:

Alternative Proxy

pFad Proxy

pFad v3 Proxy

pFad v4 Proxy