BLAST N FASTA


BLAST and FASTA

Pairwise Alignment

Global
• Best score from among alignments of full-length sequences
• Needleman-Wunsch algorithm

Local
• Best score from among alignments of partial sequences
• Smith-Waterman algorithm

Local vs. Global Alignment
• Global Alignment
--T--CC-C-AGT--TATGT-CAGGGGACACG--A-GCATGCAGA-GAC
  |  || |  ||  | | | |||    || |  | |  | ||||   |
AATTGCCGCC-GTCGT-T-TTCAG----CA-GTTATG--T-CAGAT--C

• Local Alignment: better alignment to find conserved segment
                  tccCAGTTATGTCAGgggacacgagcatgcagagac
                     ||||||||||||
aattgccgccgtcgttttcagCAGTTATGTCAGatc
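
As a rough illustration of how a local alignment picks out the conserved segment, here is a minimal Smith-Waterman sketch in Python. The scoring values (match +1, mismatch -1, gap -1) and the function name `smith_waterman` are assumptions chosen for this example, not values from the slides.

```python
def smith_waterman(a, b, match=1, mismatch=-1, gap=-1):
    """Return (best_score, aligned_a, aligned_b) for the best local alignment.

    Scoring values are illustrative assumptions; comparison ignores case.
    """
    a, b = a.upper(), b.upper()
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            # Local alignment: a cell never drops below zero.
            H[i][j] = max(0, H[i - 1][j - 1] + s, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Trace back from the best cell until the score reaches zero.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


seq1 = "tccCAGTTATGTCAGgggacacgagcatgcagagac"
seq2 = "aattgccgccgtcgttttcagCAGTTATGTCAGatc"
score, top, bottom = smith_waterman(seq1, seq2)
print(score, top, bottom)
```

With these assumed scores the traceback recovers the 12-base conserved segment and ignores the unrelated flanking sequence, which is the point of the local alignment example above.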

Why do we need local alignments?

• To compare a short sequence to a large one.

• To compare a single sequence to an entire database

• To compare a partial sequence to the whole.

Heuristic Methods: FASTA and BLAST

FASTA
• First fast sequence searching algorithm for
comparing a query sequence against a database.

BLAST
• Basic Local Alignment Search Tool:
improvement on FASTA in search speed, ease of
use, and statistical rigor.
FASTA and BLAST
• Basic idea: a good alignment contains
subsequences of absolute identity (short lengths
of exact matches):

– First, identify very short exact matches.
– Next, the best short hits from the first step are
extended to longer regions of similarity.
– Finally, the best hits are optimized.
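
The first step (finding very short exact matches) can be sketched as a word lookup. This is a minimal Python illustration, not the code of either program; the function name `exact_word_hits` and the toy sequences are assumptions for the example.

```python
from collections import defaultdict


def exact_word_hits(query, subject, k):
    """Return (query_pos, subject_pos) pairs where a length-k word matches exactly."""
    # Index every length-k word of the subject by its start position.
    index = defaultdict(list)
    for j in range(len(subject) - k + 1):
        index[subject[j:j + k]].append(j)
    # Look up each query word in the index.
    hits = []
    for i in range(len(query) - k + 1):
        for j in index.get(query[i:i + k], []):
            hits.append((i, j))
    return hits


print(exact_word_hits("ACGTAC", "TTACGT", 3))
```

Building the index once and then scanning the query is what makes this step fast: no position pair is compared unless the words are already identical.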

FASTA
Derived from logic of the dot plot
– compute best diagonals from all frames of
alignment
The method looks for exact matches between
words in query and test sequence
– DNA words are usually 6 nucleotides long
– protein words are 2 amino acids long
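
The link to the dot plot can be made concrete: every exact word match at query position i and subject position j lies on diagonal i - j, so diagonals with many hits mark candidate alignments. The sketch below only counts hits per diagonal; the name `diagonal_hit_counts` and the test sequences are assumptions, and the real program scores diagonals more carefully.

```python
from collections import Counter, defaultdict


def diagonal_hit_counts(query, subject, k=6):
    """Count exact length-k word matches on each dot-plot diagonal (i - j)."""
    positions = defaultdict(list)
    for j in range(len(subject) - k + 1):
        positions[subject[j:j + k]].append(j)
    counts = Counter()
    for i in range(len(query) - k + 1):
        for j in positions.get(query[i:i + k], ()):
            counts[i - j] += 1
    return counts


query = "GGCAGTTATGTCAGAA"
subject = "TTTTCAGTTATGTCAGCC"
print(diagonal_hit_counts(query, subject).most_common(1))
```

In this toy case the shared 12-base segment produces seven overlapping 6-base words, all on the same diagonal, so that diagonal stands out immediately.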

FASTA Algorithm

Makes Longest Diagonal
After all diagonals are found, tries to join
diagonals by adding gaps

Computes alignments in regions of best
diagonals
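
Joining diagonals can be pictured as chaining compatible regions and charging a penalty for each join. The following is a simplified sketch under stated assumptions: the region tuples, the penalty value, and the name `best_joined_score` are all hypothetical, and the real program's joining rules are more involved. The result is roughly analogous to what the initn column of FASTA output reports.

```python
def best_joined_score(regions, join_penalty=20):
    """Best total score from chaining non-overlapping diagonal regions.

    regions: list of (q_start, q_end, s_start, s_end, score), ends exclusive.
    join_penalty is an assumed flat cost for each gap-introducing join.
    """
    ordered = sorted(regions)  # by query start, so predecessors come first
    best = []
    for idx, (qs, qe, ss, se, score) in enumerate(ordered):
        total = score
        for prev in range(idx):
            _, pqe, _, pse, _ = ordered[prev]
            # A region can follow another only if it starts after it
            # in both the query and the subject.
            if pqe <= qs and pse <= ss:
                total = max(total, best[prev] + score - join_penalty)
        best.append(total)
    return max(best, default=0)


regions = [(0, 10, 0, 10, 50), (15, 30, 18, 33, 60), (5, 12, 40, 47, 30)]
print(best_joined_score(regions))
```

Raising `join_penalty` makes joins less attractive, so the best single region eventually wins over any chain.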

FASTA Alignments

FASTA Results - Histogram
!!SEQUENCE_LIST 1.0
(Nucleotide) FASTA of: b2.seq from: 1 to: 693 December 9, 2002 14:02
TO: /u/browns02/Victor/Search-set/*.seq Sequences: 2,050 Symbols:
913,285 Word Size: 6
Searching with both strands of the query.
Scoring matrix: GenRunData:fastadna.cmp
Constant pamfactor used
Gap creation penalty: 16 Gap extension penalty: 4

Histogram Key:
Each histogram symbol represents 4 search set sequences
Each inset symbol represents 1 search set sequences
z-scores computed from opt scores
z-score  obs  exp
         (=)  (*)
   < 20    0    0:
     22    0    0:
     24    3    0:=
     26    2    0:=
     28    5    0:==
     30   11    3:*==
     32   19   11:==*==
     34   38   30:=======*==
     36   58   61:===============*
     38   79  100:==================== *
     40  134  140:==================================*
     42  167  171:==========================================*
     44  205  189:===============================================*====
     46  209  192:===============================================*=====
     48  177  184:=============================================*
FASTA Results - List
The best scores are: init1 initn opt z-sc E(1018780)..

SW:PPI1_HUMAN Begin: 1 End: 269
! Q00169 homo sapiens (human). phosph... 1854 1854 1854 2249.3 1.8e-117
SW:PPI1_RABIT Begin: 1 End: 269
! P48738 oryctolagus cuniculus (rabbi... 1840 1840 1840 2232.4 1.6e-116
SW:PPI1_RAT Begin: 1 End: 270
! P16446 rattus norvegicus (rat). pho... 1543 1543 1837 2228.7 2.5e-116
SW:PPI1_MOUSE Begin: 1 End: 270
! P53810 mus musculus (mouse). phosph... 1542 1542 1836 2227.5 2.9e-116
SW:PPI2_HUMAN Begin: 1 End: 270
! P48739 homo sapiens (human). phosph... 1533 1533 1533 1861.0 7.7e-96
SPTREMBL_NEW:BAC25830 Begin: 1 End: 270
! Bac25830 mus musculus (mouse). 10, ... 1488 1488 1522 1847.6 4.2e-95
SP_TREMBL:Q8N5W1 Begin: 1 End: 268
! Q8n5w1 homo sapiens (human). simila... 1477 1477 1522 1847.6 4.3e-95
SW:PPI2_RAT Begin: 1 End: 269
! P53812 rattus norvegicus (rat). pho... 1482 1482 1516 1840.4 1.1e-94

FASTA Results - Alignment
SCORES Init1: 1515 Initn: 1565 Opt: 1687 z-score: 1158.1 E(): 2.3e-58
>>GB_IN3:DMU09374 (2038 nt)
initn: 1565 init1: 1515 opt: 1687 Z-score: 1158.1 expect(): 2.3e-58
66.2% identity in 875 nt overlap
(83-957:151-1022)

                   60        70        80        90       100       110
u39412.gb_pr CCCTTTGTGGCCGCCATGGACAATTCCGGGAAGGAAGCGGAGGCGATGGCGCTGTTGGCC
                                           || |||  | ||||| |    ||| |||||
DMU09374     AGGCGGACATAAATCCTCGACATGGGTGACAACGAACAGAAGGCGCTCCAACTGATGGCC
                    130       140       150       160       170       180

                  120       130       140       150       160       170
u39412.gb_pr GAGGCGGAGCGCAAAGTGAAGAACTCGCAGTCCTTCTTCTCTGGCCTCTTTGGAGGCTCA
             |||||||||   ||  |||   |   | ||  |||  |       || || ||||| ||
DMU09374     GAGGCGGAGAAGAAGTTGACCCAGCAGAAGGGCTTTCTGGGATCGCTGTTCGGAGGGTCC
                    190       200       210       220       230       240

                  180       190       200       210       220       230
u39412.gb_pr TCCAAAATAGAGGAAGCATGCGAAATCTACGCCAGAGCAGCAAACATGTTCAAAATGGCC
               |||  | ||||| ||   |||   ||||    | || |  |||||||| || ||| ||
DMU09374     AACAAGGTGGAGGACGCCATCGAGTGCTACCAGCGGGCGGGCAACATGTTTAAGATGTCC
                    250       260       270       280       290       300

                  240       250       260       270       280       290
u39412.gb_pr AAAAACTGGAGTGCTGCTGGAAACGCGTTCTGCCAGGCTGCACAGCTGCACCTGCAGCTC
             ||||||||||     |||||  |    |||||| |||| |||   || |||  || |
DMU09374     AAAAACTGGACAAAGGCTGGGGAGTGCTTCTGCGAGGCGGCAACTCTACACGCGCGGGCT
                    310       320       330       340       350       360
FASTA Format
• Simple format used by almost all programs
• Header line beginning with ">" and ending with a hard return
• Sequence (no specific requirements for line
length, characters, etc.)

>URO1 uro1.seq Length: 2018 November 9, 2000 11:50 Type: N Check: 3854 ..
CGCAGAAAGAGGAGGCGCTTGCCTTCAGCTTGTGGGAAATCCCGAAGATGGCCAAAGACA
ACTCAACTGTTCGTTGCTTCCAGGGCCTGCTGATTTTTGGAAATGTGATTATTGGTTGTT
GCGGCATTGCCCTGACTGCGGAGTGCATCTTCTTTGTATCTGACCAACACAGCCTCTACC
CACTGCTTGAAGCCACCGACAACGATGACATCTATGGGGCTGCCTGGATCGGCATATTTG
TGGGCATCTGCCTCTTCTGCCTGTCTGTTCTAGGCATTGTAGGCATCATGAAGTCCAGCA
GGAAAATTCTTCTGGCGTATTTCATTCTGATGTTTATAGTATATGCCTTTGAAGTGGCAT
CTTGTATCACAGCAGCAACACAACAAGACTTTTTCACACCCAACCTCTTCCTGAAGCAGA
TGCTAGAGAGGTACCAAAACAACAGCCCTCCAAACAATGATGACCAGTGGAAAAACAATG
GAGTCACCAAAACCTGGGACAGGCTCATGCTCCAGGACAATTGCTGTGGCGTAAATGGTC
CATCAGACTGGCAAAAATACACATCTGCCTTCCGGACTGAGAATAATGATGCTGACTATC
CCTGGCCTCGTCAATGCTGTGTTATGAACAATCTTAAAGAACCTCTCAACCTGGAGGCTT
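
Because the format is so simple, a reader for it fits in a few lines. This is a minimal Python sketch rather than a full parser: the name `parse_fasta` is hypothetical, and it assumes the whole file is already in memory as text.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # blank lines carry no information
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            # Sequence lines of any length are simply concatenated.
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


example = ">seq1 first\nACGT\nTTGA\n\n>seq2\nGGCC\n"
print(parse_fasta(example))
```

The sequence lines are joined without regard to their length, which reflects the lack of line-length requirements noted above.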
BLAST
• Basic Local Alignment Search Tool
– Altschul et al. 1990,1994,1997
• Heuristic method for local alignment
• Designed specifically for database searches
• Based on the same assumption as FASTA
that good alignments contain short lengths
of exact matches
BLAST
• Both BLAST and FASTA search for local
sequence similarity - indeed they have exactly
the same goals, though they use somewhat
different algorithms and statistical approaches.

• BLAST benefits
– Speed
– User friendly
– Statistical rigor
– More sensitive
BLAST Searches GenBank
[BLAST= Basic Local Alignment Search Tool]
The NCBI BLAST web server lets you compare your
query sequence to various sections of GenBank:
– nr = non-redundant (main sections)
– month = new sequences from the past few weeks
– refseq_rna = RNA entries from NCBI's Reference Sequence project
– refseq_genomic = genomic entries from NCBI's Reference Sequence project
– ESTs
– Taxon = e.g., human, Drosophila, yeast, E. coli
– proteins (by automatic translation)
– pdb = sequences derived from the 3-dimensional structures
in the Brookhaven Protein Data Bank
BLAST
• Uses word matching like FASTA
• Similarity matching of words (3 amino acids, 11
bases)
– does not require identical words.
• If no words are similar, then no alignment
– Will not find matches for very short sequences
• Does not handle gaps well
• “gapped BLAST” is somewhat better
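
To show what "similar but not identical" words means, the sketch below lists every 3-letter word that scores at or above a threshold against a query word. The scoring scheme here is a toy stand-in invented for this example (5 for identity, 2 within a crude residue group, -2 otherwise). BLAST itself scores words with a substitution matrix such as BLOSUM62, so the groups, scores, threshold, and function names are all assumptions.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Toy similarity groups: an assumption for illustration, NOT a real matrix.
GROUPS = ["AVLIM", "FWY", "ST", "NQ", "DE", "KRH", "C", "G", "P"]
GROUP_OF = {aa: g for g, members in enumerate(GROUPS) for aa in members}


def toy_score(a, b):
    """Score one residue pair with the toy scheme described above."""
    if a == b:
        return 5
    return 2 if GROUP_OF[a] == GROUP_OF[b] else -2


def neighborhood(word, threshold):
    """Return (candidate, score) for every word scoring >= threshold against word."""
    result = []
    for candidate in product(AMINO_ACIDS, repeat=len(word)):
        score = sum(toy_score(x, y) for x, y in zip(word, candidate))
        if score >= threshold:
            result.append(("".join(candidate), score))
    # Highest score first, then alphabetical for a stable order.
    return sorted(result, key=lambda item: (-item[1], item[0]))


words = neighborhood("MEA", threshold=12)
print(words)
```

Lowering the threshold admits more neighbors, which raises sensitivity at the cost of more word hits to examine.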
BLAST Algorithm

BLAST Word Matching

Break query into words:

MEAAVKEEISVEDEAVDKNI
MEA
EAA
AAV
AVK
VKE
KEE
EEI
EIS
ISV
...

Break database sequences into words:
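
Breaking a sequence into overlapping words is a sliding window of fixed length. A minimal Python sketch, using the query from this slide (the function name `overlapping_words` is an assumption):

```python
def overlapping_words(sequence, k=3):
    """Return every overlapping length-k word, in order of position."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


query = "MEAAVKEEISVEDEAVDKNI"
print(overlapping_words(query)[:9])
```

A sequence of length n yields n - k + 1 words, so this 20-residue query gives 18 three-letter words.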

Find locations of matching words
in database sequences

Query words:
MEA
EAA
AAV
AVK
KLV
KEE
EEI
EIS
ISV

Database sequences:
ELEPRRPRYRVPDVLVADPPIARLSVSGRDENSVELTMEAT
TDVRWMSETGIIDVFLLLGPSISDVFRQYASLTGTQALPPLFSLGYHQSRWNY
IWLDIEEIHADGKRYFTWDPSRFPQPRTMLERLASKRRVKLVAIVDPH
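
Locating the words in the database can be sketched with a word index built once over all database sequences. The example uses the words and sequences from this slide; the function names `index_database` and `find_word_hits` are assumptions, and only exact word matches are looked up here.

```python
from collections import defaultdict


def index_database(sequences, k=3):
    """Map each length-k word to the (sequence_id, position) pairs where it occurs."""
    index = defaultdict(list)
    for seq_id, seq in enumerate(sequences):
        for pos in range(len(seq) - k + 1):
            index[seq[pos:pos + k]].append((seq_id, pos))
    return index


def find_word_hits(words, index):
    """Return the database locations of every word that occurs at least once."""
    return {w: index[w] for w in words if w in index}


database = [
    "ELEPRRPRYRVPDVLVADPPIARLSVSGRDENSVELTMEAT",
    "TDVRWMSETGIIDVFLLLGPSISDVFRQYASLTGTQALPPLFSLGYHQSRWNY",
    "IWLDIEEIHADGKRYFTWDPSRFPQPRTMLERLASKRRVKLVAIVDPH",
]
words = ["MEA", "EAA", "AAV", "AVK", "KLV", "KEE", "EEI", "EIS", "ISV"]
hits = find_word_hits(words, index_database(database))
print(hits)
```

Each hit is a starting point for the extension step; words with no hit, such as EAA here, contribute nothing.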

Extend hits one base at a time
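
Extension can be sketched as walking outward from a word hit one position at a time, keeping the best score seen and stopping once the running score falls too far below it. This is a simplified ungapped illustration: the scores, the drop-off value `x_drop`, and the name `extend_hit` are assumptions for the example.

```python
def extend_hit(query, subject, q_pos, s_pos, k, match=1, mismatch=-1, x_drop=3):
    """Extend a length-k exact word hit in both directions without gaps.

    Returns (q_start, q_end, s_start, s_end, score) with exclusive ends.
    """
    score = best = k * match
    q_end, s_end = q_pos + k, s_pos + k
    # Extend to the right until the score drops x_drop below the best.
    i, j = q_end, s_end
    while i < len(query) and j < len(subject) and best - score < x_drop:
        score += match if query[i] == subject[j] else mismatch
        i += 1
        j += 1
        if score > best:
            best, q_end, s_end = score, i, j
    # Extend to the left, starting again from the best score so far.
    score = best
    q_start, s_start = q_pos, s_pos
    i, j = q_pos - 1, s_pos - 1
    while i >= 0 and j >= 0 and best - score < x_drop:
        score += match if query[i] == subject[j] else mismatch
        if score > best:
            best, q_start, s_start = score, i, j
        i -= 1
        j -= 1
    return q_start, q_end, s_start, s_end, best


query = "GGCAGTTATGTCAGAA"
subject = "TTTTCAGTTATGTCAGCC"
# Seed: the word "GTT" at query position 4 and subject position 6.
result = extend_hit(query, subject, 4, 6, 3)
print(result, query[result[0]:result[1]])
```

Starting from the 3-base seed, the extension grows to cover the full shared segment and stops at the mismatching flanks.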

BLAST variants

Understanding BLAST output
