FASTA Algorithm
• Introduction
• Heuristic approach
• Steps in the algorithm
Introduction
• Four steps:
1) Identify regions of similarity:
• The ktup parameter specifies the number of
consecutive identities required in a match
• The 10 best diagonal regions are found based on the
number of matches and the distance between matches
2) Rescore regions and identify best initial regions
• PAM250 or another scoring matrix is used to rescore the
10 diagonal regions identified in step 1, allowing for
conservative replacements and runs of identities
shorter than ktup
• For each of the best diagonal regions, identify the “initial
region”, the best-scoring subregion
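The rescoring in step 2 can be sketched as a search for the best-scoring ungapped run along one diagonal. This is a minimal sketch, not the FASTA implementation: the function names `best_subregion` and `simple_score` are hypothetical, and a flat match/mismatch score (+5/-4, an assumption) stands in for PAM250.

```python
def simple_score(a, b):
    # Assumed stand-in for a real scoring matrix such as PAM250.
    return 5 if a == b else -4


def best_subregion(query, subject, diag, score=simple_score):
    """Return (score, start, end) of the best ungapped run on one diagonal.

    diag is (subject index - query index), 0-based.
    start and end are query indices; end is exclusive.
    """
    first = max(0, -diag)                        # first query index on the diagonal
    last = min(len(query), len(subject) - diag)  # one past the last query index
    best = (0, first, first)
    running, start = 0, first
    for i in range(first, last):
        running += score(query[i], subject[i + diag])
        if running <= 0:
            # A non-positive prefix can never help: restart after it.
            running, start = 0, i + 1
        elif running > best[0]:
            best = (running, start, i + 1)
    return best


print(best_subregion("GAATTCAGTTA", "GGATCGA", diag=0))
```

Because mismatches only lower the running total instead of ending the run, a region can extend across conservative replacements and across identities shorter than ktup, which is the point of rescoring.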
FASTA - Algorithm -
• Step 1
Find all hot spots
// Hot spots are pairs of words of length k that match exactly
[Figure: hot spots marked between Sequence 1 and Sequence 2]
Steps in FASTA Algorithm
• FastA locates regions of the query sequence and the search set
sequence that have high densities of exact word matches.
• For DNA sequences the word length (ktup) is usually 6.
• The 10 highest-scoring sequence regions are saved and re-scored
using a scoring matrix. These scores are the init1 scores.
• FastA determines whether any of the initial regions from different diagonals
may be joined together to form an approximate alignment with gaps.
Only non-overlapping regions may be joined.
• The score for the joined regions is the sum of the scores of the initial
regions minus a joining penalty for each gap. The score of the highest-
scoring region at the end of this step is saved as the initn score.
• FastA uses dynamic programming (the Smith-Waterman algorithm) over a
narrow band of high-scoring diagonals between the query sequence
and the search set sequence to produce an alignment with a new
score (the opt score).
FASTA - Algorithm -
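• Step 1 in detail
Use a look-up table
Query:    GAATTCAGTTA
Sequence: GGATCGA

Look-up table (positions of each residue in the query)
Residue  Location
A        2, 3, 7, 11
C        6
G        1, 8
T        4, 5, 9, 10

Dot matrix (columns: query; rows: sequence; * marks a match)
    G A A T T C A G T T A
G   * . . . . . . * . . .
G   * . . . . . . * . . .
A   . * * . . . * . . . *
T   . . . * * . . . * * .
C   . . . . . * . . . . .
G   * . . . . . . * . . .
A   . * * . . . * . . . *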
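The look-up table and the dot-matrix hits for this example can be reproduced with a short sketch. The function names `build_lookup` and `find_hot_spots` are hypothetical, and positions are 1-based to match the slide. With `k=1` the table indexes single residues, as in the example; larger `k` corresponds to a larger ktup.

```python
from collections import defaultdict


def build_lookup(query, k=1):
    """Map every word of length k in the query to its 1-based start positions."""
    table = defaultdict(list)
    for i in range(len(query) - k + 1):
        table[query[i:i + k]].append(i + 1)
    return dict(table)


def find_hot_spots(query, subject, k=1):
    """Return (query_pos, subject_pos) pairs, 1-based, where k-words match exactly."""
    table = build_lookup(query, k)
    hits = []
    for j in range(len(subject) - k + 1):
        # One table look-up per subject word replaces a scan of the whole query.
        for i in table.get(subject[j:j + k], []):
            hits.append((i, j + 1))
    return hits


query, subject = "GAATTCAGTTA", "GGATCGA"
print(build_lookup(query))                      # positions of G, A, T, C in the query
print(len(find_hot_spots(query, subject)))      # number of * cells in the dot matrix
print(find_hot_spots(query, subject, k=2))      # hot spots for words of length 2
```

The table is built once per query, so scanning each database sequence costs one dictionary look-up per word rather than a comparison against every query position.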
FASTA - Algorithm -
• Step 2
Score the hot spots and locate the ten best diagonal runs.
// A scoring system is used, e.g. PAM250
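One simple way to locate candidate diagonal runs is to group hot spots by diagonal and keep the most populated ones. The sketch below is a simplification under stated assumptions: it ranks diagonals by hit count only and ignores the distance between matches, which FASTA also takes into account. The function name `best_diagonals` is hypothetical.

```python
from collections import Counter


def best_diagonals(hits, n=10):
    """Rank diagonals by number of hot spots and keep the top n.

    hits is a list of (query_pos, subject_pos) pairs.
    A diagonal is identified by subject_pos - query_pos.
    Returns a list of (diagonal, hit_count), highest count first.
    """
    counts = Counter(j - i for i, j in hits)
    # Sort by descending count; break ties by diagonal for a stable result.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))[:n]


hits = [(1, 1), (2, 2), (3, 3), (1, 4), (2, 5), (5, 1)]
print(best_diagonals(hits, n=2))
```

Hot spots that lie on the same diagonal share the same offset between the two sequences, so they can belong to one ungapped alignment; that is why the grouping key is the position difference.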
FASTA - Algorithm -
• Step 3
Combine sub-alignments into one alignment with gaps
[Figure: sub-alignments on different diagonals joined across a gap; each sub-alignment is one local alignment]
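The joining step can be sketched as a chaining problem: pick a sequence of non-overlapping regions that maximizes the sum of their scores minus a penalty per join. This is an illustrative sketch only; the function name `initn_score`, the region tuple layout, and the penalty value of 20 are assumptions, not values taken from FASTA.

```python
def initn_score(regions, join_penalty=20):
    """Best score obtainable by chaining non-overlapping regions.

    Each region is (q_start, q_end, s_start, s_end, score) with
    half-open, 0-based coordinates in the query (q) and subject (s).
    Region b may follow region a only if a ends before b starts
    in both sequences.
    """
    ordered = sorted(regions)  # by query start, so predecessors come first
    best = []                  # best[i]: best chain score ending at ordered[i]
    for idx, (qs, qe, ss, se, score) in enumerate(ordered):
        chain = score          # option: the region on its own
        for prev in range(idx):
            _, pqe, _, pse, _ = ordered[prev]
            if pqe <= qs and pse <= ss:
                chain = max(chain, best[prev] + score - join_penalty)
        best.append(chain)
    return max(best, default=0)


regions = [(0, 10, 0, 10, 50), (12, 20, 15, 23, 40), (5, 15, 30, 40, 30)]
print(initn_score(regions))
```

With a large enough penalty, joining stops paying off and the result falls back to the single best region, so the joined score is never lower than the best individual region score.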
FASTA - Algorithm -
• Step 4
Use dynamic programming in a restricted area around the best-scoring
alignment to find an alignment that scores higher than it.
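The restricted dynamic programming can be sketched as a Smith-Waterman recurrence evaluated only inside a band of diagonals. This is a minimal sketch under stated assumptions: a linear gap penalty of -8, a flat +5/-4 match/mismatch score in place of a real matrix, and hypothetical names `banded_smith_waterman` and `simple_score`. It returns only the score, not the alignment.

```python
def simple_score(a, b):
    # Assumed stand-in for a real scoring matrix such as PAM250.
    return 5 if a == b else -4


def banded_smith_waterman(query, subject, center_diag, half_width,
                          score=simple_score, gap=-8):
    """Best local alignment score using only cells (i, j), 1-based, with
    |(j - i) - center_diag| <= half_width. Cells outside the band count as 0."""
    n, m = len(query), len(subject)
    prev = {}  # previous row: column j -> cell score
    best = 0
    for i in range(1, n + 1):
        cur = {}
        lo = max(1, i + center_diag - half_width)
        hi = min(m, i + center_diag + half_width)
        for j in range(lo, hi + 1):
            diag = prev.get(j - 1, 0) + score(query[i - 1], subject[j - 1])
            up = prev.get(j, 0) + gap        # gap in the subject
            left = cur.get(j - 1, 0) + gap   # gap in the query
            cur[j] = max(0, diag, up, left)
            best = max(best, cur[j])
        prev = cur
    return best


# One extra T in the subject shifts the second half onto the next diagonal.
print(banded_smith_waterman("ACGTACGT", "ACGTTACGT", center_diag=0, half_width=1))
print(banded_smith_waterman("ACGTACGT", "ACGTTACGT", center_diag=0, half_width=0))
```

The band keeps the work proportional to the sequence length times the band width instead of the full product of the two lengths, at the cost of missing alignments that stray outside the band.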