Creating Phylogenetic Trees With Mega: Prat Thiru
MEGA
Prat Thiru
Outline
MEGA Features
Background on Phylogenetic Trees
Brief Overview of Tree Building Methods
MEGA Demo
MEGA
Easy-to-use software with multiple features
Features:
Aligning sequences
Estimating evolutionary distances
Building trees using several methods
Testing tree reliability
Marking genes/domains
Testing for selection
Computing sequence statistics
Phylogenetics
Study of evolutionary relationships
A phylogenetic tree is a graphical representation of evolutionary relationships
Phylogeny of:
Species
Strains
Genes
Metabolic pathways
Trees can be inferred from morphological or molecular information
Why Create Phylogenetic Trees?
Reconstruct evolutionary history
Draw conclusions about biological functions that might not be apparent
Precomputed trees (e.g. Ensembl, Pfam) might not include the genes or species of interest
Parts of a Tree
Nodes: taxonomic units (e.g. genes, species, etc.)
Internal: ancestral state
Bifurcating
Multifurcating
External: Operational Taxonomic Units (OTUs)
Branches: relationships among the taxonomic units (i.e. ancestor-descendant relationships)
Clade
Branch length: number of changes that have occurred
Topology: branching pattern
Rooted vs. unrooted tree
[Figure: example tree with an Operational Taxonomic Unit (OTU) and an internal node labeled]
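The parts of a tree listed above map naturally onto a small recursive data structure. The following is a minimal sketch, not MEGA's internal representation: the `Node` class, its field names, and the example branch lengths are all hypothetical and chosen only for illustration. External nodes carry an OTU name, internal nodes carry children, and a clade is simply a node together with everything below it.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """One node of a phylogenetic tree (hypothetical helper class)."""
    name: str = ""                          # OTU label; empty for internal nodes
    branch_length: Optional[float] = None   # length of the branch to the parent
    children: List["Node"] = field(default_factory=list)

    def is_external(self) -> bool:
        """External nodes (OTUs) have no children."""
        return not self.children

    def otus(self) -> List[str]:
        """Names of all external nodes in the clade rooted at this node."""
        if self.is_external():
            return [self.name]
        return [name for child in self.children for name in child.otus()]

    def is_bifurcating(self) -> bool:
        """True if every internal node in this clade has exactly two children."""
        if self.is_external():
            return True
        return len(self.children) == 2 and all(
            child.is_bifurcating() for child in self.children
        )

    def to_newick(self) -> str:
        """Serialize the clade in Newick format (without the trailing ';')."""
        text = self.name
        if self.children:
            inner = ",".join(child.to_newick() for child in self.children)
            text = "(" + inner + ")" + text
        if self.branch_length is not None:
            text += f":{self.branch_length:g}"
        return text


# A rooted, bifurcating tree with made-up branch lengths.
ab = Node(branch_length=0.2, children=[Node("A", 0.1), Node("B", 0.1)])
cd = Node(branch_length=0.2, children=[Node("C", 0.1), Node("D", 0.1)])
root = Node(children=[ab, cd])

# A multifurcating node: three lineages arising from one ancestor.
star = Node(children=[Node("A"), Node("B"), Node("C")])

newick = root.to_newick() + ";"
print(newick)
```

The topology is the nesting of `children`; the branch lengths are optional, so the same structure can hold a tree with or without them.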
Steps to Create Phylogenetic Trees
Identify and acquire the sequences that are to be included in the tree
Align the sequences (MSA using ClustalW, T-Coffee, MUSCLE, etc.)
Estimate the tree by one of several methods
Draw the tree and present it
From Hall, B.G. (p. 3; see Further Reading slide)
Tree Building Methods
Example:
Character State Matrix
Species A   ACTTC
Species B   AGTTC
Species C   CGTAC
Species D   CCTAC
Distance Matrix (values are the fraction of identical sites between each pair)
     A     B     C     D
A    1     0.8   0.4   0.6
B    0.8   1     0.6   0.4
C    0.4   0.6   1     0.8
D    0.6   0.4   0.8   1
Distance based
Character based
Parsimony
Probabilistic
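The matrix on this slide can be reproduced from the character state matrix by counting matching sites. The sketch below is illustrative only: the function names are hypothetical, and the distance it reports is the simple proportion of differing sites (p-distance), which is one of several possible distance measures and applies no correction for multiple substitutions.

```python
from itertools import combinations

# Character state matrix from the slide.
alignment = {
    "A": "ACTTC",
    "B": "AGTTC",
    "C": "CGTAC",
    "D": "CCTAC",
}


def identity(seq1: str, seq2: str) -> float:
    """Fraction of aligned sites at which the two sequences agree."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(a == b for a, b in zip(seq1, seq2))
    return matches / len(seq1)


def p_distance(seq1: str, seq2: str) -> float:
    """Fraction of aligned sites at which the two sequences differ."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    mismatches = sum(a != b for a, b in zip(seq1, seq2))
    return mismatches / len(seq1)


# Pairwise matrices keyed by (taxon, taxon) pairs.
identities = {
    (x, y): identity(alignment[x], alignment[y])
    for x, y in combinations(alignment, 2)
}
distances = {
    (x, y): p_distance(alignment[x], alignment[y])
    for x, y in combinations(alignment, 2)
}

for pair in identities:
    print(pair, identities[pair], distances[pair])
```

Distance-based methods work only from a pairwise matrix like this, while character-based methods keep working with the individual columns of the alignment.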
Distance based
Unweighted Pair Group Method with Arithmetic mean (UPGMA)
Neighbor Joining (NJ)
Start with all taxa joined at a single node (a star tree) and decompose it with each iteration
The pair of nodes pulled out (grouped) at each iteration is chosen so that the total length of the branches on the tree is minimized
Does not assume that mutation rates are constant
[Figure: three trees over taxa a, b, c, d, e illustrating successive neighbor-joining iterations]
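The pair-selection step of neighbor joining can be sketched in a few lines. This is a single iteration only, assuming the standard Q-criterion formulation of the method; it is not MEGA's implementation. The function name, the `frozenset`-keyed distance table, and the five-taxon distances are hypothetical values chosen for illustration.

```python
def nj_join_step(names, dist):
    """Choose the pair to join in one neighbor-joining iteration.

    names: list of current node labels (at least three)
    dist:  dict mapping frozenset({x, y}) to the distance between x and y
    Returns the chosen pair and the branch lengths from each member of the
    pair to the new internal node.
    """
    n = len(names)
    if n < 3:
        raise ValueError("need at least three nodes")

    def d(x, y):
        return 0.0 if x == y else dist[frozenset((x, y))]

    # Total distance from each node to all the others.
    totals = {x: sum(d(x, y) for y in names) for x in names}

    # The pair with the smallest Q value is the one whose grouping gives
    # the smallest total branch length.
    best_pair, best_q = None, None
    for i, x in enumerate(names):
        for y in names[i + 1:]:
            q = (n - 2) * d(x, y) - totals[x] - totals[y]
            if best_q is None or q < best_q:
                best_pair, best_q = (x, y), q

    x, y = best_pair
    # The two branch lengths may differ, so no constant rate is assumed.
    length_x = 0.5 * d(x, y) + (totals[x] - totals[y]) / (2 * (n - 2))
    length_y = d(x, y) - length_x
    return best_pair, (length_x, length_y)


# Hypothetical distances between five taxa.
taxa = ["a", "b", "c", "d", "e"]
raw = {
    ("a", "b"): 5, ("a", "c"): 9, ("a", "d"): 9, ("a", "e"): 8,
    ("b", "c"): 10, ("b", "d"): 10, ("b", "e"): 9,
    ("c", "d"): 8, ("c", "e"): 7, ("d", "e"): 3,
}
dist = {frozenset(pair): float(value) for pair, value in raw.items()}

pair, lengths = nj_join_step(taxa, dist)
print(pair, lengths)
```

A full run would replace the joined pair with the new internal node, recompute its distances to the remaining nodes, and repeat until the tree is fully resolved. Note that `d` and `e` are the closest pair here, yet `a` and `b` are joined first: the criterion is total branch length, not the smallest pairwise distance.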
Character based: Parsimony
The preferred phylogenetic tree is the one with the fewest evolutionary steps
Identify informative sites
For each possible tree, calculate the number of changes at each informative site
Sum the total number of changes for each possible tree; the tree with the smallest number of changes is selected as the preferred (most parsimonious) tree
* = informative site
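The parsimony steps above can be worked through on the four-species character state matrix shown earlier. The sketch below makes two assumptions: it counts changes with the Fitch algorithm, one common way to score a fixed tree, and it writes each candidate tree as nested tuples. The function names are hypothetical.

```python
from collections import Counter

alignment = {
    "A": "ACTTC",
    "B": "AGTTC",
    "C": "CGTAC",
    "D": "CCTAC",
}


def informative_sites(aln):
    """Indices of sites where at least two states each occur in two or more taxa."""
    seqs = list(aln.values())
    sites = []
    for i in range(len(seqs[0])):
        counts = Counter(seq[i] for seq in seqs)
        shared_states = [s for s, c in counts.items() if c >= 2]
        if len(shared_states) >= 2:
            sites.append(i)
    return sites


def fitch(tree, site, aln):
    """Return (possible ancestral states, number of changes) for one site."""
    if isinstance(tree, str):              # external node: observed state
        return {aln[tree][site]}, 0
    left, right = tree
    left_states, left_cost = fitch(left, site, aln)
    right_states, right_cost = fitch(right, site, aln)
    common = left_states & right_states
    if common:                             # children agree: no change needed
        return common, left_cost + right_cost
    return left_states | right_states, left_cost + right_cost + 1


def parsimony_score(tree, aln, sites):
    """Total number of changes over the given sites."""
    return sum(fitch(tree, site, aln)[1] for site in sites)


# The three possible groupings of four taxa.
candidate_trees = [
    (("A", "B"), ("C", "D")),
    (("A", "C"), ("B", "D")),
    (("A", "D"), ("B", "C")),
]

sites = informative_sites(alignment)
scores = [parsimony_score(tree, alignment, sites) for tree in candidate_trees]
best_tree = candidate_trees[scores.index(min(scores))]
print(sites, scores, best_tree)
```

Sites 3 and 5 are identical in all four species, so they are uninformative and are skipped. Enumerating every tree is only practical for a handful of taxa, because the number of possible trees grows very quickly.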
Character based: Probabilistic
Maximum Likelihood
At each site, the likelihood is determined by evaluating the probability that a certain evolutionary model (e.g. BLOSUM or PAM matrices) has generated the observed data
The likelihoods for each site are then multiplied to provide the likelihood for each tree
Choose the tree with the maximum likelihood
Bayesian Inference
Recent variant of ML
Finds a set of trees with the greatest likelihood given the data
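The per-site multiplication described for maximum likelihood can be illustrated on the smallest possible case: two sequences joined by a single branch. This sketch assumes the Jukes-Cantor (JC69) nucleotide model rather than the BLOSUM or PAM protein matrices named above, and it finds the best branch length with a plain grid search. The function name and the grid are hypothetical choices. Log-likelihoods are summed instead of multiplying raw likelihoods, which is equivalent and avoids numerical underflow.

```python
import math


def jc69_log_likelihood(seq1: str, seq2: str, t: float) -> float:
    """Log-likelihood of two aligned sequences separated by branch length t.

    Under JC69, t is the expected number of substitutions per site.
    """
    decay = math.exp(-4.0 * t / 3.0)
    p_same = 0.25 + 0.75 * decay   # probability a site is unchanged
    p_diff = 0.25 - 0.25 * decay   # probability of one specific different base
    log_l = 0.0
    for a, b in zip(seq1, seq2):
        # Site likelihood: base frequency (1/4) times transition probability.
        site_l = 0.25 * (p_same if a == b else p_diff)
        log_l += math.log(site_l)  # sum of logs = log of the product
    return log_l


seq_a, seq_b = "ACTTC", "AGTTC"    # species A and B from the earlier slide

# Evaluate candidate branch lengths and keep the most likely one.
grid = [i / 1000 for i in range(1, 2001)]
best_t = max(grid, key=lambda t: jc69_log_likelihood(seq_a, seq_b, t))

# Closed-form JC69 estimate for comparison (1 of 5 sites differs).
analytic_t = -0.75 * math.log(1.0 - 4.0 * 0.2 / 3.0)
print(best_t, analytic_t)
```

For a real tree the same idea applies, but the likelihood at each site must also sum over the unknown states at the internal nodes, and the search runs over topologies as well as branch lengths. This is why ML is slow for large datasets.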
Comparison of Methods
Distance based
Results in a single tree
UPGMA: reliable only for closely related species; replaced by NJ
NJ: fast, suitable for large datasets
Character based
Multiple trees will be found
MP: long-branch attraction problem
ML: statistically well founded but slow for large datasets
BI: faster to assess support for the tree; prior distribution parameters must be specified
Hall, B.G. (p. 60; see Further Reading slide)
Which Method to Use?
Accuracy
How close the estimated tree is to the true tree (in order of decreasing accuracy):
Bayesian Inference > Maximum Likelihood > Maximum Parsimony > Neighbor Joining
Ease of interpretation
Single vs. multiple trees
Time and convenience*
Hall, B.G. (p. 160; see Further Reading slide)
Data Set       Neighbor Joining   Maximum Parsimony   Maximum Likelihood   Bayesian Inference
Small Data     1 sec              3 sec               6 sec                -
Small Data**   9 sec              10 min              1 hr 34 min          29 min 40 sec
Large Data     1 sec              22 sec              3 min 29 sec         -
Large Data**   86 sec             10 hr 2 min         58 hr                6 hr 33 min
* Mac Pro dual processor (2.6 GHz); NJ and MP were run using MEGA 4.0, ML using PHYML, and BI using MrBayes
** with reliability estimate
Tree Reliability: Bootstrapping
A statistical resampling procedure commonly used to provide confidence for branches in phylogenetic trees
A measure of repeatability: the probability that the branch would be recovered if the taxa were sampled again
Bootstrap values are typically reported from 1000 repeated calculations
Bootstrap values of >70% are recommended
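The resampling procedure can be sketched on the four-species alignment used earlier. Each replicate draws alignment columns with replacement, rebuilds the tree, and checks whether the branch of interest reappears. As a stand-in for a real tree-building method, this sketch assumes a simple four-taxon rule: choose the grouping with the fewest total mismatches within its two pairs. The function names, the seed, and the tie-handling rule (a tie counts as not recovering the branch) are all hypothetical choices, and a five-site alignment is far too short for a meaningful estimate.

```python
import random

alignment = {
    "A": "ACTTC",
    "B": "AGTTC",
    "C": "CGTAC",
    "D": "CCTAC",
}

# The three possible groupings of four taxa.
SPLITS = {
    "AB|CD": (("A", "B"), ("C", "D")),
    "AC|BD": (("A", "C"), ("B", "D")),
    "AD|BC": (("A", "D"), ("B", "C")),
}


def mismatches(seq1, seq2):
    """Number of aligned sites at which two sequences differ."""
    return sum(a != b for a, b in zip(seq1, seq2))


def recovers(aln, label):
    """True if the named grouping is strictly the best one for this alignment."""
    scores = {
        name: mismatches(aln[a], aln[b]) + mismatches(aln[c], aln[d])
        for name, ((a, b), (c, d)) in SPLITS.items()
    }
    return all(scores[label] < s for name, s in scores.items() if name != label)


def resample_columns(aln, rng):
    """Draw alignment columns with replacement, keeping the original length."""
    n_sites = len(next(iter(aln.values())))
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {name: "".join(seq[i] for i in columns) for name, seq in aln.items()}


def bootstrap_support(aln, label, replicates=1000, seed=0):
    """Percentage of bootstrap replicates in which the grouping is recovered."""
    rng = random.Random(seed)
    hits = sum(recovers(resample_columns(aln, rng), label) for _ in range(replicates))
    return 100.0 * hits / replicates


support = bootstrap_support(alignment, "AB|CD")
print(f"Bootstrap support for AB|CD: {support:.1f}%")
```

The percentage returned plays the role of the bootstrap value written on a branch. In practice the tree is rebuilt from each replicate with the same method used for the original tree.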
Some Tips
Use more than one method
Use more than one software package
Examine more than one tree if multiple trees are generated
Bootstrap your data
Homology vs. homoplasy
Consider additional intermediate taxa to resolve relationships, if needed
Phylogenetic Tree Software
PHYLIP (the PHYLogeny Inference Package):
http://evolution.genetics.washington.edu/phylip.html
PAUP*:
http://paup.csit.fsu.edu
MrBayes:
http://mrbayes.csit.fsu.edu
MEGA (Molecular Evolutionary Genetics Analysis):
http://www.megasoftware.net
A more comprehensive listing of phylogeny programs:
http://evolution.genetics.washington.edu/phylip/software.html
* Commercial software
Summary
Nature Reviews Genetics 4, 275-284 (2003)
Further Reading
Kumar, S., Dudley, J., Nei, M., and Tamura, K., MEGA: A biologist-centric software for evolutionary analysis of DNA and protein sequences. Briefings in Bioinformatics 9:299-306 (2008)
Huelsenbeck, J.P., and Ronquist, F., MRBAYES: Bayesian inference of phylogeny. Bioinformatics 17:754-755 (2001)
Holder, M., and Lewis, P.O., Phylogeny estimation: traditional and Bayesian approaches. Nature Reviews Genetics 4:275-284 (2003)
Books:
Phylogenetic Trees Made Easy: A How-to Manual, 3rd Ed. Hall, B.G. (2008)
Inferring Phylogenies. Felsenstein, J. (2003)