Using IMa3 PDF
Jody Hey
Table of Contents
Introduction
3.2. Assessing mixing and run duration when sampling G and τ (but not Φ)
RECENT CHANGES
3/22/2019 changed the recommended heating model from SIGMOID to GEOMETRIC.
INTRODUCTION
This document explains how to use the IMa3 computer program (Hey, et al. 2018). Unlike previous IM
programs, IMa3 allows an investigator to estimate the topology of the phylogenetic tree for the sampled
populations or species. In this document the topology will often be referred to as Φ (Phi). Once an
estimate of Φ has been obtained, IMa3 can be run using that fixed value of Φ in order to estimate the
parameters of an Isolation-with-Migration model, just as one can do with IMa2, which requires a fixed
value of Φ.
IMa3 is based on the older IMa2 and IMa2p programs and uses the same input file format. Nearly all of
the options and analyses that were available with IMa2 are present in IMa3, although many of the
command line options have changed. If compiled with MPI (MPI_ENABLED is defined during
compilation) then IMa3 can be run with multiple heated chains distributed across multiple cores.
• MDIV implements a two-population IM model for a single locus (Nielsen and Wakeley 2001).
http://people.binf.ku.dk/rasmus/webpage/mdiv.html
• IM implements a full six-parameter IM model for multiple loci (Hey and Nielsen 2004).
https://bio.cst.temple.edu/~hey/software#im-div
• IMa implements a full six-parameter IM model with the prior on genealogies calculated by
integrating over the prior for population size and migration rate parameters. This greatly
reduces the dimensionality of the state space of the Markov chain simulation and allows for
calculation of the joint posterior probability density for all rate parameters, which in turn allows
for likelihood ratio tests of nested demographic models (Hey and Nielsen 2007).
https://bio.cst.temple.edu/~hey/software#im-div
• IMa2 implements a full IM model for more than two populations using the same approach as
IMa does for two populations. IMa2 requires a user-specified population phylogeny (Hey 2010).
https://bio.cst.temple.edu/~hey/software#ima2-div
• IMa2p is a parallel (MPI-based) implementation of IMa2 (Sethuraman and Hey 2016).
https://bio.cst.temple.edu/~hey/software#ima2p-div
• IMGui provides a graphical user interface for running IMa2 or for preparing IMa2 command lines
(Knoblauch, et al. 2017).
• IMa3 provides for estimation of the posterior probability of the population phylogenetic
topology by using a new kind of data augmentation called a ‘hidden genealogy’ (Hey, et al.
2018). IMa3 also implements hyperpriors.
The basic ‘Isolation with Migration’ model and the use of Bayesian inference and Markov chain Monte
Carlo are described in the “Introduction_to_IM_and_IMa” document available from the HeyLab website.
To understand the general principles behind the overall approach, all of which apply to the latest
program, it is useful to read this introductory document.
• Φ (Phi) is the ordered rooted topology of a population tree. It does not include the branch
lengths. By “ordered” we mean that the sequence in time of the internal nodes is
considered to be part of the topology. Edwards (1970) identified such trees as
‘labelled histories’.
• τ (tau) is a vector of the times of common ancestry of the populations, also called ‘splitting
times’. If we have an estimate of τ for a given value of Φ, then the τ values give the lengths
of the branches of Φ.
• Θ (Theta) is a vector of all the population size parameters and migration rate parameters. Θ
is not sampled in the MCMC simulations, but rather is estimated using a sample of genealogies.
• G is a genealogy for a locus, or if a data set has multiple loci, it refers to a set of genealogies,
one per locus.
• μ is a vector of mutation rate scalars, usually one per locus.
On a different note, users will notice that most countable things in the program output are counted
beginning with the number 0. For example, in a model with four populations, populations are indexed
using 0,1,2 and 3. The same goes for loci and other things. Thus, for example, if an error is reported
involving locus 1 it means the second locus in the data file.
The new command line term of greatest note is –j0, which tells the program to run an MCMC
simulation over phylogenetic topologies. When this is used the output to the screen and results file
contains information on sampling of topologies, whereas when –j0 is not used the output contains
information on sampling of genealogies, splitting time values and mutation rate scalars.
By virtue of a parallel implementation, it is possible to analyze larger data sets than with earlier non-
parallel IM programs; however, this does not circumvent the challenges that arise with larger numbers
of loci. Like its predecessors, the MCMC simulation of IMa3 implements some kinds of updates that
apply to all genealogies in the simulation (one per locus per chain), so adding more loci has a large
effect on the mixing process. In the current release of the program we have fixed the maximum
number of loci at 500 (this can be changed - see MAXLOCI in ima.hpp). The maximum number of
populations that can be run is 8 (9 with a ghost), and this cannot be increased. With this many
populations there are a total of 1,587,600 different topologies (i.e. values of Φ). The maximum number
of gene copies per locus is 1000. However, the effect on the speed of the analysis of having large
numbers of gene copies can be considerable.
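The count of 1,587,600 topologies for eight populations follows from the standard formula for the number of labelled histories of n tips, n!(n−1)!/2^(n−1) (Edwards 1970). A quick check of the arithmetic:

```python
from math import factorial

def num_labelled_histories(n):
    # Number of ordered rooted topologies (labelled histories) for n populations.
    return factorial(n) * factorial(n - 1) // 2 ** (n - 1)

print(num_labelled_histories(8))  # 1587600, the maximum-population case noted above
```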
For data sets with multiple loci it is usually desirable to run multiple heated chains simultaneously, and
if the number of chains is 10 or more, considerable time can be saved by running on multiple
processors. Even so, run times can be quite long for large data sets. For example, for phylogeny
estimation using a data set of 100 loci it is not uncommon to run on 20 to 40 processors for several days.
When the phylogeny is known then IMa3 takes a similar amount of time as IMa2p, which is roughly the
amount of time required of IMa2 divided by the number of processors used. These runs typically
proceed more quickly than when estimating the phylogeny.
The relationship between data size (e.g. # of loci) and model size (e.g. # of populations that can be
examined with the data) certainly depends on how much the populations have actually diverged, and
how much information is in the data, but otherwise is hard to predict. For data sets that show clear
patterns of divergence for each of multiple loci, phylogeny estimation using IMa3 is probably
unnecessary as many other methods will return the correct tree. We’ve had success estimating
phylogenies with 7 populations that had a lot of gene flow using simulated data sets with 50 loci and 5
gene copies per locus.
Figure 1 also introduces the string formatting used for trees by IMa3. The string contains information on
the topology of the tree for the sampled populations and information on the ordering of the internal
nodes in time. For starters, the counting of populations begins with 0, and sampled populations are
numbered 0 to npops-1, where npops is the number of sampled populations. The internal nodes of
the tree correspond to ancestral populations, and these are numbered beginning with
npops for the most recent ancestral population and proceeding up to 2×(npops-1)
for the ancestor of all the sampled populations. Ancestral populations are represented by their
numbers in the tree string.
The tree string format differs from that used in IMa2, in which the ancestral population number was
preceded by a colon (‘:’). IMa3 does not use this colon, and so the format now corresponds to the
conventional Newick format.
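The numbering convention just described can be summarized in a small sketch (an illustration, not program code):

```python
def population_numbering(npops):
    # Sampled populations are numbered 0 .. npops-1; ancestral populations are
    # numbered from npops (most recent ancestor) up to 2*(npops-1) (the root).
    sampled = list(range(npops))
    ancestral = list(range(npops, 2 * (npops - 1) + 1))
    return sampled, ancestral

print(population_numbering(4))  # ([0, 1, 2, 3], [4, 5, 6])
```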
[Figure 1. Three example population trees for four sampled populations (0, 1, 3, 2 from left to right),
each with ancestral populations numbered 4, 5 and 6 and splitting times t0, t1 and t2.]
At any point in time during the MCMC simulation run by IMa3, there will be a current tree with a
topology (e.g. as represented by the tree string) and splitting time values. The program is designed so
that the trees that are generated by the simulation will be drawn from the posterior probability
distribution. In other words, if D is the data and Φ is the topology, then the current value of Φ is a
random draw from p(Φ|D). Then by recording lots of values of Φ the user can build a frequency
distribution that will approximate p(Φ|D).
In order to monitor how well the MCMC simulation is mixing, IMa3 provides output to the screen at
intervals, and it can be run in modes that provide output files at intervals.
• The current step number, the number of sampled values of Φ, the current values for the
probability of the data, given the current genealogies, ‘p(D|G)’, and the prior probability of the
genealogy ‘p(G)’. Both of these probabilities are also of course conditioned on the value of Φ
that exists in the simulation at the time they were recorded. They can jump around a lot
because Φ is changing constantly. Therefore these probabilities are really only useful as a
check, for example to see that they are actual numbers (e.g. non-numerical values indicate a
floating point problem and the run should be halted).
• The value of the text string for the current topology (see Figure 1).
• The current value of τ (splitting times) that were sampled along with the current topology.
o The update rates for τ using an update method called RannalaYang, which is based on
the method in Rannala and Yang (2003).
o Update rates for the branches of the phylogeny using a branch sliding update. Rates are
shown for changes that affect only branch lengths (i.e. effectively updates to τ), for
changes that affect the topology per se (i.e. Φ), and changes that affect the time of the
root of the phylogenetic tree (the TMRCA, which is the last element of τ).
o Update rates for genealogies, which are also branch slide updates. For these updates,
some fraction of these branch slides also affect the topology and/or the genealogy
TMRCA and the update rates for these are given as well.
o If the data have multiple loci, then update rates are also available for mutation scalars
(μ), and these can be requested for runtime output to the screen using a command line
flag (‘-r4‘).
• A table of Effective Sample Size (ESS) estimates and autocorrelations for Φ, for the sum of
P(D|G) and P(G) (called L[P]), splitting times, and tmrca values of genealogies. This table begins
to appear after 100,000 steps or so.
• If Metropolis-coupling has been invoked then an additional table appears with columns:
o The number of accepted swap proposals between successive chains (swaps between
chains with non-successive heating terms are not shown).
o The rate of accepted swapping. This value will depend on the number of chains and the
heating values. In general, the closer the heating values of two chains, the higher the
acceptance rate of proposed swaps between them. For large data sets with
large numbers of chains, it seems best to have very high swap acceptance rates (e.g. >
0.95) between those chains nearest the cold chain.
It is possible for the simulation to appear to be mixing well on the basis of large numbers of phylogeny
updates that include a wide variety of different topologies. However mixing of phylogenies is entirely
dependent on mixing of the latent genealogies and hidden genealogies.
When estimating the phylogeny with IMa3, and particularly when there are more than 10 loci and more
than three populations, the safest route to ensuring a good burnin is to simply observe the
distribution of sampled values at intervals, and to wait until that distribution appears to not be changing
(see below).
Similarly during the sampling phase, and particularly for larger problems, the safest route to ensuring a
good sample is to first ensure a good burnin, and then observe the sampled distribution at intervals to
see that it is no longer changing. Even then, it is quite important to repeat the entire process with a
different random number seed, at least once, to see that you have obtained the same distribution.
The first way to extend a burnin is to have a file named ‘IMburn’ in the same folder with the data, with
that file beginning with ‘yes’. (Deletion of the IMburn file, or deletion of the ‘yes’ in the file, causes the
burnin period to end at the close of the current time interval specified by –b.)
The second way to extend a burn is to restart a previously completed run using a checkpoint file that
contains the state space at the end of that previous run. The creation of checkpoint files can be invoked
To load a previously saved mcf file, the user can use –r3 (together with –f to specify the file name) or
–r7 which causes the current output file name to be used as a base for the mcf file and if a file of that
name is present at the beginning of the run, it will be loaded. The –r6 option (together with –r3 or
–r7) causes the previously recorded sampled values in the mcf file to be ignored, so that the MCMC
simulation resumes from the state at which the previous run left off, but with sampling begun afresh.
So by using –r6 and –r7 together it is possible to keep repeating a command line, and assessing
mixing by looking at the output files, until such time as it seems that the actual sampling can begin (at
which point –r6 is no longer needed).
Another tool for assessing mixing for Φ is to compare the distribution of trees for the first half of the
sample with that for the second half of the sample. These values (“freq_set1” and “freq_set2”) in the
table with sorted topologies and estimated posterior probabilities should be similar to each other and to
“freq_ALL” if a large sample of effectively independent values has been obtained.
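The first-half/second-half comparison amounts to the following (the tree strings here are hypothetical placeholders):

```python
from collections import Counter

def half_sample_frequencies(topologies):
    """Relative frequency of each topology in the first half, the second half,
    and the whole of a list of sampled topology strings."""
    def freqs(sample):
        return {t: c / len(sample) for t, c in Counter(sample).items()}
    half = len(topologies) // 2
    return freqs(topologies[:half]), freqs(topologies[half:]), freqs(topologies)

# A poorly mixed sample: one (hypothetical) topology dominates early, another late.
sample = ["((0,1)3,2)4"] * 60 + ["((0,2)3,1)4"] * 40
freq_set1, freq_set2, freq_all = half_sample_frequencies(sample)
# freq_set1 and freq_set2 differ sharply here, which would indicate poor mixing.
```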
Users are encouraged to be quite conservative with respect to assessing mixing. Even if the sample
appears to be of high quality, based on these criteria, it can be useful to extend the run (e.g. using –r7)
and to repeat the run with a different random number seed.
Like its immediate predecessors, IMa3 implements two quite different suites of calculations when
estimating an IM model, given a phylogenetic topology. One part runs the MCMC simulation that
generates samples of genealogies (i.e. samples from the posterior probability density of genealogies,
given the data and a value of Φ). The second part does the analyses on the function that is built from
these genealogies.
The MCMC part of this must include a burnin period of sufficient length, followed by a period of
sampling of genealogies. At the end of the run, or at intervals specified by the user, the default analyses
and additional analyses that were specified on the command line are done. Typically the analyses are
carried out by the program as soon as the sampling phase is complete. However if desired, and if the
genealogies from a prior run were saved, the program can be run in a ‘Load-Genealogies’ mode (using
–r0) to do additional analyses using the function generated from saved genealogies.
For most data sets, and for all data sets with more than a few loci or with many gene copies per locus,
predicting run duration is a complex problem that depends on how well the Markov chain explores the
state space of genealogies (see discussion in Introduction_to_IM_and_IMA.pdf). These are
fundamentally the same issues that arise when using IMa3 to estimate Φ; however, it can be easier to
assess mixing when not sampling topologies, that is, when running with a fixed value of Φ and sampling
genealogies and splitting times.
During the run the following information will appear in the command window (or wherever screen
output is redirected to, depending on the environment):
• The current step number, the number of sampled genealogies, the current values for the
probability of the data, given the genealogies, ‘p(D|G)’, and the prior probability of the
genealogy ‘p(G)’. These probabilities are not particularly useful, but it is good to check that
they are actual numbers (non-numerical values indicate a floating point problem and the run
should be halted).
• The update rates for τ and G. These are the percentage of proposed updates that were
accepted based on the corresponding Metropolis-Hastings criteria. By default two types of
updates are done for splitting times. One, called NielsenWakeley, is based on the method in
Nielsen and Wakeley (2001). The other is called RannalaYang and is based on the method in
Rannala and Yang (2003). For genealogies, branch slide updates are done, with updates for
single branches (identified as ‘branch’ in the output). For these updates, some fraction of these
branch slides also affect the topology and/or the TMRCA and the update rates for these are
given as well.
o In the results file of a run sampling values of G, update rates are also available for
mutation scalars (μ), and these can be requested for runtime output to the screen using
a command line flag (‘-r4‘).
• A table of Effective Sample Size (ESS) estimates and autocorrelations for splitting times and for
the sum of P(D|G) and P(G) (called L[P], this is a useful summary of where the MCMC is at).
This table begins to appear after 100,000 steps or so. The ESS estimates are of limited
usefulness. For most runs they are pretty chaotic in the way they change over the course of the
run.
• If Metropolis-coupling has been invoked then an additional table appears with columns:
o The rate of accepted swapping. This value will depend on the number of chains and the
heating values. In general, the closer the heating values of two chains, the higher the
acceptance rate of proposed swaps between them. For large data sets with
large numbers of chains, it seems best to have very high swap acceptance rates (e.g. >
0.95) between those chains nearest the cold chain.
3.2. Assessing mixing and run duration when sampling G and τ (but not Φ)
For most data sets, and for all data sets with more than a few loci or with many gene copies per locus,
predicting run duration is a complex problem that depends on how well the Markov chain explores the
state space of genealogies. These issues are discussed in the document:
Introduction_to_IM_and_IMA.pdf. The program provides several tools to help deal with this issue:
• The program will generate plots showing the value of L[P] and t over the course of the run.
Runs that are mixing well will not show long term trends in these plots.
• The program estimates the autocorrelations of these same quantities. Long term trends
appear as non-zero autocorrelations. Effective Sample Size (ESS) estimates are calculated
from these autocorrelations.
• At the point when the posterior probability function is generated and an output file is
created, the sampled genealogies are divided evenly into two sets: SET1 for the first half,
and SET2 for the second half. If the sampled genealogies are not auto-correlated over the full
length of the run, such that SET1 and SET2 both contain a fair number of effectively
independent samples, then the estimated posterior density functions based on SET1 and
SET2 should be very similar. Similarly parameter estimates based on these functions should
be similar. The program calculates parameter estimates for both sets.
• For each run multiple Metropolis-Coupled chains can be run to improve mixing over the
course of a run. In general, multiple chains are recommended or required when there are
five or more loci.
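For readers who want a rough external check on the ESS values the program reports, a generic estimator (truncating the autocorrelation sum at the first non-positive lag) looks like this; it is a textbook sketch and will not exactly reproduce IMa3's internal calculation:

```python
def effective_sample_size(values):
    # ESS = n / (1 + 2 * sum of positive-lag autocorrelations), with the sum
    # truncated at the first non-positive autocorrelation estimate.
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    if var == 0.0:
        return float(n)
    rho_sum = 0.0
    for lag in range(1, n):
        rho = sum((values[i] - mean) * (values[i + lag] - mean)
                  for i in range(n - lag)) / (n * var)
        if rho <= 0.0:
            break
        rho_sum += rho
    return n / (1.0 + 2.0 * rho_sum)

# A strongly trending series has few effectively independent samples:
print(effective_sample_size(list(range(100))) < 100)  # True
```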
When starting out it is a good idea to keep an eye on things at the beginning of a run to be sure that
quantities are being updated. However, even if update rates are nontrivial it is quite possible that the
chain is mixing poorly. Note that even if update and swap rates look ok at the start of a run, they may
change as the system approaches stationarity, so it is usually advisable to watch a run for a few tens of
thousands of steps before deciding whether or not to restart with different terms.
If you are running many chains, remember that the total update rate for any term in chain 0 (cold chain
from which samples are taken) includes the rate of swapping with other chains. If splitting times are
updating at higher rates in higher numbered chains (as they typically will be), then it may be the case
that splitting times for chain 0 have an overall update rate that is acceptable. The trend plots will tell
the tale.
The IM programs can be used to generate estimates of model parameters (θ1, θ2, θA, m1, m2 and
t). Typically the peaks of the estimated distributions are taken as the estimates, just as if one
was taking a maximum likelihood estimate. Indeed, because the method uses uniform prior
distributions, the peak of the posterior density is also the maximum likelihood estimate.
Once the model parameter estimates are in hand, many investigators will wish to also generate
estimates of the demographic quantities (i.e. N1, N2, NA, t, 2N1m1, and 2N2m2). Most of these
conversions can be done automatically by the program, provided that mutation rate estimates
are included in the input file (see the section on Input File Format and command line flags for the
program you are using). Whether this is done by the program or by the investigator, it is
important to understand what is being done here, as the subject can be a bit confusing. The next
few paragraphs explain how these calculations are made.
First, note that all of the parameters in the model include the mutation rate u (which is a
value for the gene, not per base pair). If you have multiple loci, then u is the geometric mean of
the mutation rates of all the loci. The method for converting parameter scales is explained for
diploid autosomal loci for the parameters t, θ1, and m1. The same approach applies to
θ2, θA, and m2.
First, gather your model parameter estimates, and an estimate of actual mutation rates (using
an outgroup or some other relevant data).
- Let A be your estimate of θ1 (i.e. 4N1u, where N1 is the effective size of population 1)
- Let B be your estimate of t, the time parameter (i.e. t × u, where t is the time since
splitting)
- Let C be your estimate of m1 (i.e. m1/u). It is important to understand that m1 is the rate
per gene per generation from population 1 to population 2, in the coalescent. Since the
coalescent goes backwards in time, m1 is more easily thought of as the rate at which
genes come into population 1, from population 2, as time moves forward.
- Let U be an estimate of the mutation rate per year for the gene being studied (not per
base pair, but for the entire gene). This must usually be obtained using some other data,
such as distance from an outgroup of known time separation. If you are doing multiple
loci, then U is the geometric mean of the mutation rates (per year) for the loci.
- Let G be your estimate of the generation time in years, and let V = U × G, so that V is an
estimate of the mutation rate per generation.
- To estimate the effective population size, N1, calculate A/(4 V). This is because N1 is
defined in the coalescent models as being proportional to the inverse of the coalescent
rate per generation. Therefore we need to use V since it is an estimate of the mutation
rate on a scale of generations.
- To estimate the time since splitting, t, in units of years, take B/U.
- To estimate the time since splitting in generations, take B/V.
- To estimate the migration rate per generation, m1, take C × V.
- To estimate a migration rate per year, take C × U.
- To estimate the population migration rate (the effective rate at which genes come into a
population, per generation) for population 1 (i.e. 2 N1 m1) you don't even need the
estimate of mutation rate. Since (4N1u × m1/u)/2 = 2N1m1, all you need to do is take
A×C/2.
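The arithmetic above can be collected into a short sketch. Here A, B, C and U are as defined in the text, and V is the per-generation mutation rate (U multiplied by the generation time in years); the numerical values are made up purely for illustration:

```python
def demographic_estimates(A, B, C, U, gen_time_years):
    # A = estimate of theta1 (4*N1*u); B = estimate of t*u; C = estimate of m1/u;
    # U = mutation rate per year for the gene; V = mutation rate per generation.
    V = U * gen_time_years
    return {
        "N1": A / (4 * V),            # effective size of population 1
        "t_years": B / U,             # time since splitting, in years
        "t_generations": B / V,       # time since splitting, in generations
        "m1_per_generation": C * V,   # migration rate per generation
        "m1_per_year": C * U,         # migration rate per year
        "2N1m1": A * C / 2,           # population migration rate
    }

est = demographic_estimates(A=4.0, B=2.0, C=0.5, U=1e-6, gen_time_years=2.0)
```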
When multiple loci are studied, the program also allows estimation of mutation rate scalars for
each locus. If xi is the estimate of the scalar for locus i, then an estimate of 4N1ui can be
obtained by taking A × xi.
If data from multiple loci are used, but per-year mutation rates are only available for a subset of
them, then it is still possible to generate estimates of demographic quantities. The trick is to
make use of the estimates of the mutation rate scalars for those same loci. Let X be the
geometric mean of the estimates of the mutation rate scalars for just those loci for which per-
year mutation rates are also available. Let U be the geometric mean of the per-year mutation
rates for just those same loci, and let V = U × G (where G is the generation time in years). Then
A X/(4 V) is an estimate of N1. Similarly an
estimate of the number of years since divergence began is obtained with B X/U. An estimate of
the migration rate per generation per gene copy is obtained with C V/X.
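The subset-based conversions in this paragraph can be sketched similarly (illustrative values; G here is the generation time in years, and the function name is just for illustration):

```python
def demographic_estimates_subset(A, B, C, U, gen_time_years, X):
    # X = geometric mean of mutation rate scalars for the loci with known rates;
    # U = geometric mean of per-year mutation rates for those same loci.
    V = U * gen_time_years          # per-generation mutation rate for that subset
    return {
        "N1": A * X / (4 * V),
        "t_years": B * X / U,
        "m1_per_generation": C * V / X,
    }

# With X = 1 this reduces to the ordinary conversions given earlier.
```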
If you are working with multiple loci that have mixed inheritance models, then you will want to
be setting the inheritance scalars in the input file (see Input File Format). In this case the method
described above can be applied in exactly the same way without changes.
However, if you are working with a single locus that does not have diploid autosomal inheritance,
but the inheritance scalar is set to 1 in the input file, then the estimates of θ1, θ2, and θA are not
of 4Neu but rather of the product of 4Neu and whatever the true inheritance scalar actually is for
that locus.
A sufficient burnin period is required in order to have sampled genealogies be independent of the
starting state of the Markov chain. The minimal length of a burn-in chain depends on the data set
and cannot be known prior to looking at the results of some runs. If the user is loading a
previously saved Markov chain (using –r3 and –f; see below), and if the goal is to essentially
just continue the previous run that was used to save the state of the Markov chain, then the
burnin can be made very short.
• If ‘-b’ is followed by a floating point number (i.e. with a decimal point), then this is
the duration of the burnin time interval in hours. If there is not a valid file named
‘IMburn’ in the current directory/folder, or if the first letter in that file is not a
‘y’, then the burnin stops at the end of the interval; otherwise it continues. This
usage of the ‘IMburn’ file is similar to the use of the ‘IMrun’ file (see below) for
controlling the length of the run.
If the burntrend file option is invoked (-r5) then following each burnin time interval a
burntrend file is written. In this way the user can inspect the state of the burnin and
decide when it has been sufficient. Simply renaming or deleting the IMburn file will
cause the burn to stop at the end of the current period.
-c Calculation options:
Multiple options can be specified at once (e.g. –c01 invokes both options 0 and 1).
If mutation rate range priors are included in the input file, for multiple mutation rates,
then the ratios of these limits are used as limits on the ratios of the mutation rate scalars.
If two or more loci in the analysis have a prior range, then this allows the prior
information on mutation rates to be included in the analysis and to shape the findings.
Care must be taken in selecting a prior range, as it really should include all of the sources
of uncertainty in the mutation rate. The IMa2 documentation has some additional
information about selecting these ranges.
Invoke this option to calculate the joint posterior density for demographic parameters
(population sizes and migration rates). For accuracy this requires a good sample of at least
100,000 genealogies. For models with more than two sampled populations it is not
possible (in a short enough time, with workable numbers of sampled genealogies) to
jointly estimate all parameters. In these cases joint estimates are obtained for all
population size parameters, and then for all migration rate parameters. This option is
usually used in Load-Genealogy mode. If likelihood ratio tests of nested models are to be
conducted then those models should be specified in a nested model file with the name
given using the –w option.
3 Get prior distribution terms from file (requires filename given with –g)
This option allows the user to specify the upper bounds of uniform prior distributions
individually for each of the splitting time, population size, and migration rate parameters.
This option requires the use of a priorfile (see Parameter Prior File Format). In the
-d Number of steps between saving genealogies
This is ignored when sampling phylogenies. When the phylogeny is fixed, this is the length of the
interval (in steps of the MCMC simulation) between the saving of genealogies. The default is 100.
If a specific number of sampled genealogies is requested (i.e. -L is used with an integer) then
the total length of the run will be the burnin length (-b) plus the product of the integer specified
by -L and the integer specified by -d (or the default for –d). It is hard to know how many
genealogies should be saved. Values less than 10,000 are probably only useful for two population
models. Values greater than 100,000 often take a long time to analyze. For joint parameter
estimates, or tests of nested models, at least 100,000 genealogies are needed.
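The run-length arithmetic just described amounts to the following (assuming the burnin length is specified as a number of steps):

```python
def total_run_steps(burnin_steps, num_genealogies, save_interval=100):
    # Total run = burnin (-b) + (number of genealogies to save, -L)
    #             * (steps between saved genealogies, -d; default 100).
    return burnin_steps + num_genealogies * save_interval

print(total_run_steps(1000000, 100000))  # 11000000
```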
-f Name of file with saved Markov chain state generated in previous run (use
with –r3)
This is used when the user has saved a file containing the state of the Markov chain, from a
previous run, and wishes to load that file to begin a new run. When using this option -r3 must also
be invoked. In general the input file, the priors and the heating model must be the same for the
two runs.
-g Name of file with parameter prior distributions (use with -c3)
This is used together with the -c3 option to use prior distributions for population sizes, migration
rates, and/or splitting times that vary among parameters. This option offers an alternative to
setting the priors to be the same (e.g. using the –m, –q and –t options). For example it is
possible to turn off some, but not all, migration rate parameters and to exclude them from the
model by setting their upper bound to zero.
-h Heating terms (MCMC mode only. Default: EVEN. -ha only: SIGMOID. -ha
and -hb: GEOMETRIC):
-hb lower heating value (required for GEOMETRIC, optional for EVEN)
Most sampling runs (whether of topologies or genealogies) require the running of multiple Metropolis-
coupled (Geyer 1991) chains in which all chains, with the exception of chain 0, have updates accepted at
a higher rate than specified by the Metropolis-Hastings criterion. This is called heating, and by swapping
the state space of heated and unheated chains, it is possible to get much improved overall mixing of
the state space for the unheated chain.
If multiple Metropolis-coupled chains are run, then it is important to have a heating scheme that leads
to sufficient rates of swapping among the chains. For chain i (where i goes from 0 – the cold chain – to
n-1, where n is the number of chains), the Metropolis criteria for parameter updating are raised to a
power βi, where βi ≤ 1. For all heating schemes chain 0 is not heated, and chains 1 through n-1 have
successively smaller values of β. Swapping is attempted by picking chains at random that are within 8
chain steps of one another, but most accepted swaps are between chains with the smallest difference
in heating values. Chains with low values of β will be strongly heated, but will have lower swapping
rates with chains that are much less heated.
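The chain-swapping step follows the usual Metropolis-coupling recipe. In generic textbook form (not necessarily IMa3's exact internal bookkeeping), a proposed swap between chains with heating values beta_i and beta_j and log posterior values logP_i and logP_j is accepted with probability min(1, exp((beta_i − beta_j)(logP_j − logP_i))):

```python
import math
import random

def swap_accepted(beta_i, beta_j, logP_i, logP_j, rng=random.random):
    # Generic Metropolis-coupled swap criterion: chains with closer beta values
    # give a log-ratio nearer 0 and hence higher acceptance, as noted above.
    log_ratio = (beta_i - beta_j) * (logP_j - logP_i)
    return log_ratio >= 0.0 or rng() < math.exp(log_ratio)
```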
For runs that sample τ and G (but not Φ) lots of trial and error suggests that a heating scheme should
be selected that provides swap rates of between 0.4 and 0.8 between chain 0 and chain 1, and
generally between chains i and i+1.
For runs that sample Φ it seems necessary to have a very gentle gradient of heating terms, so that
swapping rates among low numbered chains (i.e. the cold chain and others with only a little heating)
are very high (preferably > 0.9).
It is not uncommon to run a very large number of chains (e.g. 100 or more). The speed of the program
with n chains is roughly 1/n as fast as with 1 chain (though it is a bit slower than this with very large
numbers of chains), so it may seem that having 100 chains will cause the program to be incredibly slow.
The tradeoff is that with a large number of chains the sampled genealogies are much more independent
of each other and they can be sampled more often. When there are many chains and the Markov chain
is mixing reasonably well, the rate of sampling genealogies can be increased by reducing the interval
between samples (i.e. reducing the –d value).
The primary heating option is the number of chains: –hn (e.g. –hn 10 for n=10 chains).
The number of chains must be at least twice the number of cpus, and must be a multiple of the number
of cpus.
The heating options in IMa3 are different from previous programs. There are three heating models:
EVEN (default), SIGMOID, and GEOMETRIC.
Under the EVEN heating model, the user need only specify the number of chains using –hn and the
heating values will be set automatically between 0 and 1 among that many chains, or they can set a
lower bound using –hb and the values will be evenly spaced between that lower bound and 1.
Under the SIGMOID heating model, the heating value as a function of chain number is a sigmoid shaped
function (i.e. flatter when β is near 0 or 1), which improves mixing. To specify this model the user
invokes a shape term given by –ha. Values near 1 are close to the EVEN model
(i.e. non-sigmoid), whereas values close to 0.95 are strongly sigmoid shaped. Under the SIGMOID
model the lowest β value is 0, and in order to have good swapping rates between adjacent chains
(including chains with very low β values) it is necessary to have a large number of chains.
For the GEOMETRIC model the user specifies both –ha and –hb values. This model requires fewer
chains than SIGMOID and is recommended for most data sets, as it is often not necessary for the
most heated chains to have a β near 0.
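To make the spacing concrete, here is a small Python sketch. The even_betas function follows the EVEN model's description above (values evenly spaced between the –hb lower bound and 1), while geometric_betas is only a hypothetical illustration of geometric-style spacing; it is not IMa3's actual GEOMETRIC formula, which depends on –ha and –hb in a way not detailed here.

```python
def even_betas(n_chains, hb=0.0):
    """EVEN model as described above: beta values evenly spaced between
    the lower bound hb (set with -hb, default 0) and 1.
    Chain 0 is the cold chain (beta = 1)."""
    step = (1.0 - hb) / (n_chains - 1)
    return [1.0 - i * step for i in range(n_chains)]

def geometric_betas(n_chains, ha=0.96, hb=0.01):
    """HYPOTHETICAL geometric-style spacing between 1 and hb. This only
    illustrates the general idea (adjacent low-numbered chains kept close
    together so they swap often); it is not IMa3's actual formula."""
    return [hb + (1.0 - hb) * ha ** i for i in range(n_chains)]
```

For example, even_betas(10, hb=0.1) gives betas decreasing from 1.0 down to 0.1 in steps of 0.1, while geometric_betas keeps the gaps between low-numbered (weakly heated) chains much smaller than those between strongly heated chains.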
-j Model options
There are various model options. Multiple options can be specified at once (e.g. -j134
invokes options 1,3 and 4). Note that the numbering is quite different from the same or
similar options in IMa2.
0 Update and sample the phylogenetic tree topology (Φ)
This invokes updating and sampling of Φ (the topology of the population phylogeny). Once the
burnin is complete values of Φ are sampled every 10 steps. Topology sampling requires a basic
underlying Isolation-with-Migration model in which all sampled and ancestral populations have
a population size parameter, and all pairs of populations that co-exist have two migration
parameters (one each direction). This means that some IM models with reduced parameter
sets (e.g. –j4 and others) will not run with –j0.
1 Add a ghost population to the model
A ghost population is an unknown and unsampled population that might have affected
your data, e.g. by exchanging genes with your sampled populations (Beerli 2004).
Invoking this option will add an additional population to the model. The phylogenetic
position of the ghost population is assumed to be as an outgroup to all sampled
populations. Invoking this option will add two population size parameters and will greatly
increase the number of migration parameters in the model.
2 Use exponential prior distributions for migration rate parameters
Following the original Bayesian model specified by Nielsen and Wakeley (2001), the basic
method uses uniform prior distributions by default. These so-called ‘uninformative’ priors
are simple to work with and can provide a posterior density that is proportional
to the likelihood. However, two common issues arise regarding the use of a flat prior for
migration rate. One issue is that flat priors are only truly non-informative if (a) the user has an
actual prior belief about the upper bound (which is not common), or (b) the posterior density
reaches zero within the bounds of the prior. In the latter case, increasing the upper bound is
not expected to alter the shape of the posterior density. But for many analyses the estimated
posteriors for some parameters, particularly migration rates, are fairly flat and do not reach
zero within the bounds of the priors that are used.
The second issue is that for many problems we actually do have a basic prior expectation that
migration rates are low or zero. This is simply because the analyses are usually done on
These issues suggest that it might be useful to consider a prior on migration parameters that has
its highest value at zero and that does not have an explicit upper bound. The exponential
distribution has these properties, and invoking this option will cause an exponential prior to be
used for migration parameters. In this case the value following the –m is the mean of the prior
distribution or if a prior file is used the values given for migration rates are means of exponential
distribution priors. Typically users will want to use, or at least start with, small mean values.
IMa3 has an important feature in that users can work with hierarchical priors for the population
size and the migration rate parameters. To do so the user specifies the hyperprior probability
density from which the terms of the prior distributions will be sampled. Then the actual priors
are included in the state space of the MCMC simulation and are updated using Metropolis-
Hastings updates just as are other elements of the state space.
There are two benefits to using hyperpriors. One is that it allows the prior distributions to
adjust to the data and to the model, which can improve mixing and, in our limited experience,
yield higher posterior probabilities for the phylogeny that fits the data best. The second benefit is
that by specifying hyperprior densities the investigator is partly freed from having to carefully
match the priors to the data (a messy and not easily justified practice). When –j3 is invoked
the -q and –m terms apply, not to the prior distributions, but to the hyperprior distributions.
For population size terms, with –j3 the number specified with -q becomes the upper bound
of a uniform hyperprior distribution (with lower bound 0). Each individual population size parameter
then has a uniform prior distribution (lower bound 0), the upper bound of which is drawn from this
uniform hyperprior distribution.
For migration parameters, the default action of –j3 is to invoke a uniform hyperprior
distribution just as for population size terms. If exponential priors are specified for use with
migration parameters (-j2) and hyperpriors are used, then both the hyperprior density and
the individual prior densities follow exponential distributions. Specifically, with -j2 and –j3,
the value following the –m term is the mean of an exponential hyperprior distribution. Each
Invoking –j3 causes tables with update rates of population size and migration
hyperparameters to appear in the screen output and in the main output file.
When –j3 is invoked without using –j0 (i.e. when sampling genealogies and splitting times,
with a fixed topology), then trend plots and ASCII curves approximating the posterior density of
each of the priors for the given topology will appear in the output file. Because there are no
fixed prior distributions, the program will not estimate the joint or marginal densities of migration
and population size parameters under these circumstances. However IMa3 will generate a
priorfile that contains suggested values for –q and –m (the splitting time priors are set
to the same as specified with –t), based on the estimated posterior distribution of priors, for
use in a run that uses neither topology sampling nor hyperpriors (i.e. without –j0 or –j3).
When generating a priorfile, which happens when –j3 is used without –j0, the (highly
arbitrary) rules for generating the suggested priors are as follows:
With a uniform hyperprior (either population size or migration rate), the method (that is
applied to each individual parameter prior) is as follows:
• Let the suggested prior that is to be determined (i.e. the upper bound of the prior
distribution for the parameter) be called ‘sugpr’
• Identify the upper bound of the hyperprior distribution (specified using –q on the
command line), call it ‘ub’.
• Identify the prior value with the highest estimated posterior probability, call it
‘prmax’
• Identify the value of the prior for which the estimated cumulative probability is
0.8, call it ‘prcump8’ (i.e. prcump8 is the value x for which it is estimated that
prob(prior <= x|Data) = 0.8 ).
• Then follow this short conditional:
o If prmax > 0.8×ub: sugpr = ub
o else if prmax < prcump8: sugpr = 0.8×ub
o else: sugpr = prmax
With an exponential hyperprior (used for migration rate parameters when -j2 is invoked along
with –j3), the method (that is applied to each individual parameter prior) is as follows:
• Let the suggested prior that is to be determined (i.e. the upper bound of the prior
distribution for the parameter) be called ‘sugpr’
• Identify the prior value with the highest estimated posterior probability, call it
‘prmax’
• Identify the value of the prior for which the estimated cumulative probability is
0.1, call it ‘prcump1’ (i.e. prcump1 is the value x for which it is estimated that
prob(prior <= x|Data) = 0.1 ).
• Then follow this short conditional:
o If prmax < prcump1: sugpr = prcump1
o else: sugpr = prmax
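The two rule sets above can be expressed as a short Python sketch (function and variable names are mine, not IMa3's):

```python
def suggest_uniform_prior(ub, prmax, prcump8):
    """Suggested upper bound for a parameter's uniform prior, per the rules
    above: ub is the hyperprior upper bound, prmax the prior value with the
    highest estimated posterior probability, prcump8 the value at which the
    estimated cumulative posterior probability of the prior reaches 0.8."""
    if prmax > 0.8 * ub:
        return ub
    if prmax < prcump8:
        return 0.8 * ub
    return prmax

def suggest_exponential_prior(prmax, prcump1):
    """Suggested mean for an exponential migration prior: prcump1 is the
    value at which the estimated cumulative probability reaches 0.1."""
    if prmax < prcump1:
        return prcump1
    return prmax
```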
Of course users can make their own priorfiles. The histograms for the priors are in the output
file and are valid estimates of the posterior probability densities for the priors.
4 Migration only between sister populations (do not use with -j0)
Removes migration parameters from the model for pairs of non-sister populations (i.e. those not
directly related by a splitting event). This reduces the number of migration parameters to two
per splitting time interval, and greatly reduces the size of the overall model if there are many
populations.
5 One migration parameter for each pair of populations (do not use
with -j0)
For each pair of populations that have two migration parameters, these rates are set
equal to each other and only a single parameter is used. Note this also requires turning
off estimates of 2NM (-p4).
6 Migration only between sampled populations (do not use with -j0)
Removes migration parameters from the model for migration involving ancestral
populations. This model is not terribly realistic, but it is difficult to have a data set large
enough to inform on migration among ancestral populations. So it is possible that some
8 No migration (do not use with -j0)
No migration parameters are included and the model becomes a pure isolation model.
9 One single migration parameter for all pairs of populations (do not
use with -j0)
Sometimes it is useful to ask about an overall migration rate. One way to get this is to test
a nested model with just one migration parameter. However a more complete way is to
invoke this option. When this is used, all migration between all pairs of populations in all
time periods occurs under the same parameter. This could be useful for multi-population
data sets that are too small for estimating multiple migration parameters.
x One single population size parameter for all populations (do not use
with -j0)
All populations have the same effective population size parameter value.
-L Run duration
If the value has a decimal point it is interpreted as a time in hours, specifying the duration of the
sampling period (after any burnin period). In this case, the program will run continuously, as long
as there is a file named ‘IMrun’ in the current directory/folder and as long as that is a simple text
file that begins with the word ‘yes’ (or the first character in that file is a ‘y’). The only purpose of
this file is to be present when the user wants a run to continue, and to be absent when a run
should come to an end. When using the ‘-L’ flag with a floating point value the user controls the
If –L is followed by a floating point value and topologies are being sampled (-j0), then one
topology is sampled every 10 steps of the run. If genealogies are being sampled, then the
number of steps between sampling of genealogies is given by the –d option.
If –L is followed by an integer (e.g. –L 10000) then this value is the number of times the chain is
sampled (topologies or genealogies, depending on the presence of –j0) after the burnin is
completed.
If the program is run in Load-Genealogies mode (-r0) : An integer given with the –L flag will
specify the number of genealogies to load from the .ti files. If this integer is greater than the total
number of genealogies contained in all files to be loaded, then all genealogies in those files will be
used in the analyses. If this integer is less than the number of genealogies in those files, then the
specified number of genealogies will be sampled from the total (with even spacing among all
genealogies available in the files). This option allows a user to sample the results from a long
genealogy-sampling run that generated a large number of saved genealogies. This can be
useful if a very large number of genealogies have been saved and the Load-Genealogies mode
analysis is slow if they are all used.
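The even-spacing behavior described above can be illustrated with a small sketch (my own code, not the program's):

```python
def evenly_spaced_subsample(items, k):
    """Pick k items from a list with approximately even spacing; if k is at
    least the number available, everything is used (as -L does in
    Load-Genealogies mode)."""
    n = len(items)
    if k >= n:
        return list(items)
    if k == 1:
        return [items[0]]
    # map each of the k slots onto the full index range 0..n-1
    return [items[round(i * (n - 1) / (k - 1))] for i in range(k)]
```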
This sets the upper limit of the prior distribution of the migration parameters, or if an exponential
distribution is used (-j2) it sets the mean of that prior distribution. If the prior is set to zero, it is
the same as invoking a model with no migration (i.e. the same as –j8). If the user wishes to set
priors individually for each parameter then a priorfile should be used (see –c3).
-p Output options:
There are various output options that determine which tables of results are included in the output
file. They can be given separately or all can be given at once (e.g. –p452 invokes options 2,4 and
5). The numbering is different than in IMa2.
This turns off the printing of plots of parameter value trends. These plots are an essential
tool for assessing MCMC mixing so ordinarily they would not be turned off. However they
do take up space and sometimes it is simpler to not create them.
This turns off the printing of plots of the estimated marginal posterior density for all
parameters in the output file. Such plots are not suitable for publication as they are based
on ASCII characters and have low resolution, so you can turn them off if you prefer.
However it is important to look at these curves, either as ASCII plots or by loading
histograms into a spreadsheet or some other program, at some point in the analysis.
This can only be used when sampling genealogies. This prints a table in the output file of
the distributions of times of the most recent common ancestor (TMRCA) for each locus.
The units on these times are the same as for the splitting time parameters (i.e. mutation
rate times time).
When there are two or more splitting times in the model the marginal prior distributions
are not uniform (Hey 2010). In this case it can be useful to distinguish how much of the
posterior density is due to the data and how much is due to the prior. This option causes
By default IMa3 will provide estimates of the posterior density for the population
migration rate associated with each migration parameter. These calculations can take a
while (sometimes hours in a big model with lots of genealogies), and so this can be
turned off if desired. In general, for migration from population i to j, backwards in time,
the population migration rate can be written as a function of a population size and a
migration rate parameter: 2NiMi→j = (4Niu)×(mi→j /u)/2 = qi×mi→j /2 (Hey 2010). Considering
this forward in time it is the rate at which population i receives migrants from population
j.
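In other words, in IMa's scaled units (q = 4Nu and a migration parameter scaled as m/u, per the formula above), the population migration rate is simply q×m/2. A one-line sketch:

```python
def population_migration_rate(q_i, m_i_to_j):
    """2*N_i*M_(i->j) from IMa-scaled parameters: with q_i = 4*N_i*u and
    m_i_to_j = (migration rate)/u, we get (4*N_i*u)*(m/u)/2 = q_i*m/2."""
    return q_i * m_i_to_j / 2.0
```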
From the posterior density for pairs of parameters it is possible to calculate the
probability that one term is larger than another. This option causes two tables to be
printed, one for population sizes and one for migration rates. Calculations are only
conducted for parameters with identical prior distributions.
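For intuition, given paired posterior samples of two parameters, such a probability is just the fraction of samples in which one exceeds the other. This is a sketch of the idea only; IMa3's own calculation works from the estimated posterior densities.

```python
def prob_greater(samples_a, samples_b):
    """Estimate P(a > b | data) from paired posterior samples."""
    pairs = list(zip(samples_a, samples_b))
    return sum(a > b for a, b in pairs) / len(pairs)
```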
This generates a file (with extension ‘.mpt’) of the histograms that estimate the posterior
distribution of the number of migration events. This is a large file if there are many loci and
migration parameters. Histograms are provided in each direction for each locus and for
the sum of all loci. These tables and numbers do not have a direct connection to the
migration parameters. These measurements are done for each pair of populations
regardless of how migration has been parameterized.
7 Print joint estimate for splitting times (not with –j0, for
models with 3 or 4 populations)
In case a user wants to compare the joint estimates of splitting times to the estimates
from the marginal densities, this option can be invoked. It works by setting up a series of
-q Upper bound on the prior distributions for population size parameters
This is the value of the upper bound on the prior distribution for all of the population size
parameters. These parameters are values of 4Nu, where N is the effective population size and u is
the mutation rate per generation. For multiple loci u is the geometric mean of the mutation rates
of all the loci, per generation. Importantly the mutation rates are not per base pair and so it
follows that the population size parameters are not on a per base pair scale. Users might want to
estimate population size parameters ahead of time for their data to help think about where to set
the priors. If you do so, be sure not to calculate on a per base pair basis. If the user wishes to set
priors individually for each parameter then a priorfile should be used (see –c3). If hyperpriors are
invoked (-j3), then –q sets the upper bound of the hyperprior distribution.
-r Run options
This causes the program to run in Load-Genealogy mode. There is no MCMC simulation,
no burnin, and no genealogy sampling. Rather the program loads information on
previously sampled genealogies that are contained in .ti files, as specified using –v.
Prevents creation of .ti file(s) with genealogy information. This is ignored with –j0.
Generates a file of the state space of the Markov chain simulation. This file is saved at the
end of the burnin and each time an output file is generated. One file is saved for each cpu
used. If only one cpu, the file ends in ‘.mcf.0’.
4 Write all mutation related updates rates to stdout during the run
(default is to suppress this)
This option causes values and update information for mutation rate scalars to be printed
to the screen during the run. If there are many loci this is a large table. Usually mutation
rate scalars update at high rates with no problem. This information is always printed to
the output file regardless of whether it is requested for screen output.
This option results in a file of update information and trend plots that is printed at the end
of a burn period that was started using –b followed by an integer (indicating the number
of steps in the burnin). If the burn was set using a time period, then this option is
automatically invoked.
6 When loading mcf files (-r3,-r7) do not load sampled values (i.e.
use previous run as burnin)
This option causes the program to load a saved mcf file from a previous run, however only the
state space at the time the previous run ended will be restored. All of the data structures for
recording sampled values will be initialized as if the sampling phase is just beginning. This
option allows the user to treat a previous sampling run as a burnin run.
This option can be used to continue runs using the same command line. It is especially useful
for running IMa3 on systems that have a maximum time that a job can be run. In such cases a
run can be continued an indefinite number of times using the exact same command line
repeatedly, with the set of sampled values growing with each repeat of the command.
Typically –r7 is to be used after a burn has been completed (-b is ignored when –r7 is used). If
-s Random number seed
This is an integer (e.g. ‘-s 1731’). It is useful to specify a value for two reasons: first if you want
to repeat an identical run; and second if you are starting multiple separate replicate runs on the
same data set, in which case you will want to make sure they start with different seeds. It can
happen on some computers that if the runs start near each other in time they will end up with the
same starting seed from the system clock.
Importantly, if multiple cpus are being used, then there is no guarantee that reusing a random
number seed will generate an identical run.
-t Upper bound on the prior distributions for splitting times
The upper bound on the prior distribution for splitting times. This applies to all splitting times in
the model. If the user wishes to set priors individually for each splitting time then a priorfile
should be used (see –c3).
Specify the generation time. This is used for demographically scaled estimates of splitting times
and population sizes.
-v Base name (no extension) of *.ti files with genealogy data (requires
use of -r0)
When running in Load-Genealogy mode the user must specify the base name of the files with an
extension of ‘.ti’. All of the *.ti files to load must be in the same directory. For example if you have
two files with genealogies that you want to load and they are named myrun1.out.ti and
myrun2.out.ti you would specify as the base name ‘myrun’ (i.e. –v myrun ). If the .ti files
are in a directory that is not the same as the directory with the data file, that directory should be
included as part of the base name of the .ti files.
To conduct log-likelihood ratio tests of nested models the user must generate a file that specifies
precisely which models to test. This option is used to read in the name of a file containing those
-x Adjust the topology prior for particular clades
By default IMa3 assumes a uniform prior over all ordered rooted topologies. By using this option
users can adjust the topology prior for specific clades. For example, if a clade of populations 0
and 1 is considered to be half as probable as other clades, then include –x 0 1 0.5 . If a clade
of populations 2, 3 and 5 is considered to be twice as probable as others then include –x 2 3 5
2.0. Users can include as many –x terms as they wish.
-y Mutation rate scalar for relevant loci - for use with -p3
This is the geometric mean of the estimated mutation rate scalars. This is usually used for a Load-
Genealogy mode run with a mean value obtained from a previous run in which genealogies were
sampled. If all loci in the data file have a mutation rate provided in that file, then use –y1.
The default is 10,000, but if the program is quite slow for your data, and you want to look at
things immediately, it is useful to set it to a small number (e.g. 100 or 1000).
o After line 1, but before line 2, comments can be included to provide explanatory
information. Each line of comment must begin with a ‘#’
• Line 3 – the population names in order, separated by one or more spaces. There must be
npops names. This order also corresponds to the order in which the populations are
numbered in the population tree and the order in which the data occur for each locus. The
program numbers populations from 0 up to npops-1.
• Line 4 – a string of text representing the phylogenetic tree for the populations. If not
estimating the phylogeny, then it is very important to get this right. If phylogenetic topology
is being estimated then this line is ignored.
The string contains information on the topology of the tree for the sampled populations
and information on the ordering of the internal nodes in time. These internal nodes
correspond to ancestral populations. The ancestral populations are numbered beginning
with npops for the most recent ancestral population, proceeding up to 2×(npops-1)
for the ancestor of all the sampled populations. Sampled populations in the string are
represented by their respective number (see Figure 1). Ancestral populations are
represented by their ancestral population number. Examples of trees and strings for
three different topologies for samples from four populations are shown. If there is only
If estimating the phylogenetic topology, the tree string specified in the input file will be
ignored. If desired, the user can simply omit the string from the input file for a
–j0 run (i.e. delete the entire line where the tree string would be, do not leave an
empty line).
• Line 6 - basic information for locus 1. This line contains, in order and each separated by
spaces: the locus name; the sample sizes for each population; the size of the locus; the
mutation model; the inheritance scalar; possibly a mutation rate; and possibly a range of
mutation rates.
o n0 thru npops-1, the sample sizes for each population for that locus. These
numbers do not need to be the same for different loci. If a population is not
represented at this locus, a zero is used for that population.
o the length of the sequence (for the SSM model a number is needed here but it is
ignored; for HapSTR, the length pertains to the sequence portion of the data)
o Letter indicating the mutation model (I - IS, H - HKY, S - SSM, J - joint SSM and IS,
i.e. HapSTR). If SSM (S) or HapSTR (J), the letter is followed immediately (no
spaces) by the number of linked STR markers within the locus.
o Inheritance scalar - For example: 1 for autosome, 0.75 for X-linked, 0.25 for Y-
linked or mtDNA.
o The mutation rate per year for the locus (not per base pair). This can be left
blank, but is needed for at least one locus in the data set if parameters on
demographic scales are to be estimated. If there are multiple STRs in the locus
then there can be multiple mutation rates on this line separated by spaces. If
the locus is a HapSTR, then the first mutation rate given applies to the
o line 7 - data for gene copy # 1 from population 0. For this line and all other data
lines, the first 10 spaces are devoted to the sample name. The sequence or
allele length (for SSM model) begins in column 11 of the file.
- Note that each line is for a single copy of the locus. If you have two copies
of a gene from an individual (e.g. STR genotypes), then use two lines, one
for each gene copy.
- For SSM or HapSTR data, the allele length assumes a step size of 1. This
means that data from STRs that are multiples of lengths greater than 1
must be converted to counts of the number of base repeats (e.g. for a
dinucleotide ‘CACACACACACACACACA’ the length would be 9). Any
number less than 5 causes the program to stop with an error (it is
assumed that such low counts could not evolve under the stepwise
mutation model). If the data is for an SSM model locus and there are
multiple STRs, then there will be one integer on each line for each STR,
separated by a space. If the locus is a HapSTR (joint IS and SSM) then the
STR data is given on the line, beginning at column 11, followed by the
sequence data. For SSM data, as for other types of data, only one gene
copy is represented on each line of the data file. This is true even if the
original data consists of diploid genotypes. In other words, diploid
genotype data must be broken up and listed, with one data line for each
gene copy.
• lines 8 thru line (6+n0+n1+... +npops-1) - the remainder of the data for locus 1.
Each line contains the data for one sample. The data for locus 1 for population 1
immediately follow those for population 0, and so on.
• Additional lines for additional loci. Each locus begins with a line containing the
information for that locus, in the same format as for the first locus. The sample names
and sample sizes for additional loci and the inheritance scalars and mutation model for
additional loci do not have to be the same as for locus 1 (generally they are not).
• Finally, after all the data the file should end on a blank line.
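As a quick sanity check on a bare tree string (topology only, no prior annotations), the node labels should be exactly 0 through 2×(npops-1). A small sketch (my own helper, not part of IMa3):

```python
import re

def node_labels(tree_string):
    """Collect all node labels from a bare IMa tree string (topology only,
    no ':' prior annotations). Sampled populations are 0..npops-1 and
    ancestral populations are npops..2*(npops-1)."""
    return sorted(int(x) for x in re.findall(r"\d+", tree_string))
```

For the tree used in the examples, node_labels("((0,1)5,(2,3)4)6") returns the seven labels 0 through 6, as expected for four sampled populations.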
Here is an example for a tiny three locus data set (a larger example is provided with the source code). In
this made-up example the mutation rate per year is known and specified for locus 1, and for each STR in
locus 3, but not for locus 2. The inheritance scalars are 0.25 for locus 1, 0.75 for locus 2 and 1.0
for locus 3. In one case the sample size for a population is 0 (pop3 in locus 2).
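The repeat-count conversion described above for SSM/HapSTR data can be sketched as follows (function name is mine):

```python
def to_repeat_count(allele_length_bp, motif_length):
    """Convert an STR allele length in base pairs to a repeat count, e.g.
    the 18 bp dinucleotide run 'CACACACACACACACACA' -> 9 repeats.
    The program rejects counts below 5 as implausible under the stepwise
    mutation model."""
    count = allele_length_bp // motif_length
    if count < 5:
        raise ValueError("repeat count < 5 not allowed under the SSM model")
    return count
```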
• The user has reason to think that the most recent splitting time is much more recent than the other
splitting events in the history of the sampled species and wishes to constrain the first splitting event
to fall within this time.
• The user has reason to think that gene exchange did not happen between particular pairs of
populations, either between sampled populations or between their ancestors, and so wishes to
exclude some migration parameters from the model. This can be done by setting the upper bound
on the prior distribution for those terms to zero.
If it is used, the priorfile must specify all priors for all splitting-time, population size, and migration rate
parameters. The file is a simple text file with format as follows:
• Any line at any point in the file that begins with a pound sign, ‘#’, is ignored. These lines can be
used for explanatory text.
• There are four keywords: tree, theta, migration and time. Each must appear exactly once
in a priorfile as the only word on a line. The first non-comment line of the file must have tree
on it. After the line with tree, the order of lines with keywords does not matter. Following a
keyword line, the next line(s) has prior information (see below).
• The first non-comment line following the line with tree gives the population tree string for the
sampled data. This must be the exact same tree string as is given in the data file. For example,
the left-most population tree in Figure 1 has the following string: ((0,1)5,(2,3)4)6. This means
that sampled populations 2 and 3 joined most recently (at ancestral population 4); that sampled
populations 0 and 1 joined longer ago (at ancestral population 5); and that ancestral populations
4 and 5 joined longest ago at ancestral population 6.
• The first non-comment line following the line with time repeats the population tree string but
also contains information on the upper bounds for the prior distributions for splitting times.
Each population number in the string is followed by a colon, which is followed in turn by a
• The first non-comment line following the line with theta repeats the population tree string but
this time it contains information on the upper bound of the population size parameters. Each
population number (for sampled and ancestral populations) is followed by a colon and then by a
floating point number. For example, if populations 0, 1 and 4 have an upper bound of 5.0 and
the remainder have an upper bound of 10.0 the string would be:
((0:5.0,1:5.0)5:10.0,(2:10.0,3:10.0)4:5.0)6:10.0
• Following the line with migration, there is a matrix that provides the prior terms for migration
parameters (either upper bounds for uniform priors, or means for exponential priors). With n
sampled populations there are a total of k = 2n-1 populations (including sampled and
ancestral). The migration terms are given in a k×k matrix, with the migration term for the
parameter corresponding to migration from population i to population j (backwards in time, in
the coalescent) given in row i and column j of the matrix. The term for the migration parameter
for the reverse direction is of course given in row j and column i of the matrix. Many cells of the
matrix will be zero simply because it is not possible to have migration, given the population tree,
between the corresponding row and column populations. In the tree used here for examples -
((0,1)5,(2,3)4)6 - the migration parameters are m0→1, m1→0, m0→2, m2→0, m0→3, m3→0, m1→2, m2→1,
m1→3, m3→1, m2→3, m3→2, m0→4, m4→0, m1→4, m4→1, m4→5, m5→4. However there can be no
parameter m3→4 (or many others) simply because the two populations never coexist during any
time period (see Figure 1).
For any non-zero element in row i and col j of the migration term matrix there must also be a
non-zero element in row j and col i and vice versa. Similarly if either is zero, then the other must
also be zero. This is a basic constraint of the method - it is not possible in the MCMC simulation
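The zero-pattern constraint just described can be checked with a small sketch (my own helper, not part of IMa3):

```python
def zero_pattern_ok(m):
    """Check that for every cell of a migration-term matrix, m[i][j] is zero
    if and only if m[j][i] is zero, as the priorfile format requires."""
    k = len(m)
    return all((m[i][j] == 0) == (m[j][i] == 0)
               for i in range(k) for j in range(k))
```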
Migration terms can be set to zero, with the effect of removing those migration terms from the
model. In the following example migration upper bounds are set to 2.0 between sister
populations (i.e. 0&1, 2&3 and 4&5), and to zero between all others (this actually mimics the –
j4 option if used with –m 2.0). Rows are counted down from the top and columns starting
IMa3 will generate a priorfile when topologies are not being sampled and when hyperpriors are used
(see –j3).
The nested model file is a simple text format file with format as follows:
• Any line at any point in the file that begins with a pound sign, ‘#’, or that is empty is ignored. The
lines that begin with ‘#’ can be used for explanatory text.
• Each model begins on a new line with the word ‘model’ followed by some text explaining the model.
• After the line that begins with model there are one or more lines, each of which begins with either
with the word ‘constant’ or the word ‘equal’
• After the word, either ‘constant’ or ‘equal’, there is a ‘p’ or an ‘m’ indicating that the constraint
applies to either population size or migration rate parameters.
• For lines that begin with the word ‘constant’, after the ‘p’ or ‘m’ there is a floating point number
that is the constant value that some parameters must take. The most typical use of this is to set
migration rates to zero in the model, but any constants can be used.
• For lines that begin with ‘equal’, after the ‘p’ or ‘m’, or for lines that begin with the word
‘constant’, after the floating point constant value, the parameter numbers are given to which
the constraint (i.e. being equal to each other, or constant at the given value) applies.
• The ‘constant’ and ‘equal’ lines refer to parameters by their parameter number, which in the
case of migration rates is not related to the populations they apply to.
o The numbering of population size parameters is the same as is the index number of the
populations themselves (i.e. parameter # 4 applies to population #4).
o The numbering of migration rate parameters is more complex but it can be found simply by
looking at the sequence of migration parameters in tables in the main output file. Tests of
nested models are done in L-mode runs, so the results of a previous M-mode run can be used
to get the numbering of migration parameters.
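Putting these rules together, a nested model file might look like the following. This is a hypothetical sketch: the parameter numbers are placeholders and should be replaced with the numbering shown in the tables of an M-mode output file, as described above.

```
# hypothetical nested model file (parameter numbers are placeholders)
model no migration for one pair of migration parameters
constant m 0.0 0 1
model two population size parameters equal
equal p 0 1
```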
For models with just two sampled populations it is possible to develop models that impose constraints
on both population size and migration rate parameters. Samples of 100,000 genealogies seem to give
adequate estimates of the joint posterior density and of the corresponding joint parameter estimates.
The file All_nested_model_tests_two_populations.txt gives all 24 nested models for the case
of two sampled populations.
6. OUTPUT FILES
The program generates up to five main types of output files: the main results file; genealogy
files (ending in .ti); Markov chain state files (ending in .mcf); migration histogram files (ending
in .mpt); and burntrend files that report update rates and sampling trends during the burnin period. In
addition, if using hyperpriors without sampling topologies, the program will generate a priorfile (see –
j3). The great majority of the time, the user will only need to look at the main results file.
This section near the top of the file gives the names of the tables (all upper case) that contain
the parameter estimates. This information is just to assist navigation in what can be a very
large file.
This section lists the starting information from the input file and the command line settings,
including parameter counts and priors and information on each of the loci in the data file.
SAMPLING SUMMARIES
This is a short section that gives information on the duration of the run, in terms of numbers of
steps, length of burnin, and numbers of samples taken. If sampling topologies (-j0) this also
MCMC INFORMATION
This section includes tables of MCMC update rates for the cold chain. In each of these tables is
listed the quantity being updated, the type of update, the number of attempted updates
(‘#Tries’), the number of accepted updates (‘#Accp’), and the acceptance rate (‘%’). If
multiple chains have been used there will be a table of swapping rates between successive
chains (i.e. adjacent heating values). Following the update rates, the output file has a table of
parameter autocorrelations and ESS (effective sample size) estimates. These are the same as
appear on the computer screen during the course of the run. ESS values are estimates of the
number of independent points that have been sampled for each parameter. They can be useful,
but are often highly unstable and should be used only in conjunction with other assessments of
mixing. This section is absent in Load-Genealogy mode.
This section appears only if topologies are being sampled (-j0). It lists the sampled topologies
sorted by sampled frequency for all trees that were sampled at least once. Also shown are
sampled frequencies in the first half of the sample and the second half (set1 and set2).
Comparison of these two values can be very useful for assessing mixing. Also listed is the prior
probability for each tree (which will be 1 unless –x is used on the command line) and the
product of clade posterior probabilities for each tree (ppcp). If the number of possible trees is
not too large, then following this table will be a similar one of unsorted trees, including those
that were not sampled.
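The set1/set2 comparison can be mimicked by hand from any topology sample; a minimal sketch (not IMa3 code):

```python
# Split a sample of topology indices into halves and compare sampled
# frequencies; large disagreements between the halves suggest poor mixing.
def half_sample_frequencies(topology_sample):
    half = len(topology_sample) // 2
    topologies = set(topology_sample)
    freqs = []
    for part in (topology_sample[:half], topology_sample[half:]):
        freqs.append({t: part.count(t) / len(part) for t in topologies})
    return freqs

set1, set2 = half_sample_frequencies([0, 0, 1, 0, 0, 1, 1, 0])
```

If set1 and set2 give very different frequencies for the common topologies, the run is probably too short.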
MEANS, VARIANCES AND CORRELATIONS OF PARAMETERS ('$' R > 0.4 '*' R >
0.75)
This section appears only with sampling genealogies with fixed priors. It does not appear when
using hyperpriors (-j3) or when sampling topologies. It provides a table of the mean and
standard deviation of the marginal posterior probability distribution for each of the population
size parameters and migration rate parameters. Population size parameters are denoted by a q
followed by the population number. Migration rate parameters are denoted by an m followed
by the source population, an arrow >, and the target population.
This section appears only when sampling topologies (-j0). It lists the populations in the model,
the topology prior, a summary of sampling information, including the most frequently sampled
topology and the topology with the highest product of posterior clade probabilities (ppcp). Also
provided are a list of topologies, ppcp values, and counts sorted by counts, and an unsorted list
of all topologies with counts.
This section provides estimates of population size and migration rate parameters. It appears
only with sampling genealogies with fixed priors. It does not appear when using hyperpriors (-
j3) or when sampling topologies.
A set of tables are provided that give the parameter estimates obtained from the location of the
peaks of the estimated marginal densities for each term. For each time period there are tables
for population size parameters, migration rate parameters, and population migration rates (i.e.
2NM terms). For each parameter set in each period, the parameters are listed as a table, with
each parameter heading a column. Only those parameters that appear first in that period are
listed for that period. For example, if a population size parameter extends through periods 1
and 2, then results will be listed under period 1 and not period 2. The table has the following
rows:
• ‘Parameter’ this row has the parameter labels alternated with the letter ‘P’ (for
probability). For migration rates and population migration rates the parameters refer to
migration backwards in time (i.e. “in the coalescent”). This is important to remember,
because when you write up your results you don’t want to get things backwards. For
migration and population migration rates the first number in the label is the population
in which the genes started and the second number is where they went to. For example, m1>2
refers to genes that went from population 1 to 2 backwards in time, which corresponds to the reverse
direction (i.e. from population 2 to 1 forwards in time). For 2N0M0>3 the genes went
from population 0 to 3 backwards in time and from 3 to 0 forwards in time.
• ‘Set0’ this row has the parameter estimates calculated using ½ of the genealogies (the
first half of all those that are in memory).
• ‘Set1’ this row has the parameter estimates calculated using the second ½ of the
genealogies that are in memory. If there are enough genealogies and they are
sufficiently independent of one another, then the Set0 and Set1 values should be quite
similar. This is yet another way to check to see if the MCMC simulation sufficiently
explored the state space.
• ‘All’ parameter estimates and probabilities using all the genealogies. These numbers
will not be very useful if the Set0 and Set1 values are not similar.
• ‘Pmax’ is the highest marginal posterior density observed for that parameter.
• ‘LR95%Lo’ the estimated 95% lower limit, calculated on the basis of likelihood ratio assumptions.
• ‘LR95%Hi’ same as ‘LR95%Lo’ but for a value higher than the estimate.
• ‘LLRtest’ For migration rates and population migration rates the likelihood ratio test
of Nielsen and Wakeley (Nielsen and Wakeley 2001) is done. See also Hey (2010) .
Statistical significance is indicated by asterisks, or lack thereof by ‘ns’. These tests are
prone to false positives when actual divergence is quite low and the amount of data is
not large (Hey, et al. 2015). For 2NM values, the reported significance level for the
LLRtest results is actually copied from the LLRtest for the corresponding migration rate parameter.
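The direction convention in the parameter labels can be summarized with a small helper (hypothetical, for illustration only):

```python
# Labels such as m0>3 describe migration backwards in time (in the
# coalescent); the forwards-in-time direction is the reverse.
def forward_in_time(label):
    src, dst = label.lstrip('m').split('>')
    return f"genes move from population {dst} to population {src} forwards in time"

assert forward_in_time("m0>3") == (
    "genes move from population 3 to population 0 forwards in time")
```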
If the option to get joint estimates and do LLR tests of nested models (-c3 together with –r0) is
invoked then IMa3 will generate a joint parameter estimate. This can only be used for two or
three population models. This table will also report the results of likelihood ratio tests of nested
models if a nested model file is loaded using the –w option.
The table reports the maximum joint posterior density, and the estimated parameter values at
that maximum, for various models. At the least a full model is analyzed, which for two
population models means a model with all three population size parameters and both migration
rate parameters. For more than two populations there are two ‘full’ models, one for all
population size parameters and one for all migration rate parameters, since a model with all of
both requires too many sampled genealogies and too much time to find the joint posterior
density estimate.
For each model the log of the posterior probability is shown with the parameter estimates.
Parameter values that are shown as bracketed (e.g. [0.01] ) represent the values for parameters
that are not in the model, either because they were set to a fixed value or because they were
set to be identical to another parameter.
For likelihood ratio tests the table lists the degrees of freedom (which may not apply in cases of
parameter values fixed at the boundary of a prior distribution, e.g. migration set to zero, see
(Hey and Nielsen 2007)) and the LLR statistic which can be compared with a χ2 distribution. Also
shown is the effective sample size (ESS) for the number of genealogies used to estimate the
parameters. For portions of the posterior density that are high and flat (e.g. as expected near
the peak) there are likely to be several genealogies that contribute substantially to the
calculation of the posterior density. However for parts of the surface that have lower
probability and are less flat, there is likely to be a much wider variance among genealogies in
their contribution to the estimate of the posterior. Typically many nested models have their
peak posterior density at a low point on the surface and for which there may be very few
genealogies that contribute much. These models will usually have an ESS value of 1.0, which
indicates that the density estimate rests on what is effectively a single genealogy.
HISTOGRAMS
Here begins a set of large tables that are suitable for importing into a spreadsheet and
generating figures if desired. For each set there are actually two tables, the first of which summarizes the
key things about the larger, second table. The larger table includes 1000 rows and gives the
estimated posterior probability for each of 1000 values of a model parameter. These tables can
be used for estimating parameter values and are suitable for importing into a spreadsheet
program and generating a plot. For demographic parameters (population size and migration
rate parameters) the estimates obtained from these tables should be very similar to those
obtained from the table labeled MARGINAL PEAK LOCATIONS AND PROBABILITIES.
The histogram tables are generated for a variety of summaries of the data, but all include the
following information:
o ‘Value’ this line contains labels for the terms in the histograms
o ‘Minbin’ the midpoint value of the lowest bin of the histogram with a non-zero
value
o ‘Maxbin’ the midpoint value of the highest bin of the histogram with a non-
zero value
o ‘Mean’ the mean value calculated from the histogram. For terms with posterior
density functions this should be close to the mean calculated for the tables of
means and variances (see above).
o ‘95%Lo’ the estimated point to which 2.5% of the total area lies to the left. This
is probably not very useful. Note that this is different from the 95% lower limit
calculated on the basis of likelihood ratio assumptions and reported in the table
on MARGINAL PEAK LOCATIONS AND PROBABILITIES (See above).
o ‘95%Hi’ the estimated point to which 2.5% of the total area lies to the right-
see info for ‘95%Lo’.
o ‘HPD95Lo’ the lower bound of the estimated 95% highest posterior density
(HPD) interval. The 95% HPD interval is the shortest span (on the X axis) that
contains 95% of the posterior probability. A question mark, ‘?’, is added if the
HPD interval did not appear to be contiguous (in which case the HPD estimates
are not reliable) which can happen with multiple peaks or if the surface is rough.
A hatch symbol, '#', is added if the posterior density does not reach low levels
near either the upper or the lower limit of the prior. In such cases the HPD
intervals will obviously change if the prior distribution is changed.
o ‘HPD95Hi’ - the upper bound of the estimated 95% HPD interval (see notes for
‘HPD95Lo’).
• Below the summary table the main histogram table begins. At the top are parameter
labels, alternating with the letter ‘P’. Below this is a row that gives the bin value and
probability for the highest probability in the table. If a pair of columns is imported into
a spreadsheet the first column will be the X axis values and the second column the Y axis
values. Below the table is a row titled ‘SumP’. This is the sum of the probability values in that column.
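As an illustration of what can be recomputed from one imported pair of columns (bin values and probabilities), the following sketch (not IMa3 code) finds the peak, the mean, and a 95% HPD interval as described above:

```python
def histogram_summaries(values, probs):
    """Peak, mean, and 95% HPD interval from one pair of histogram columns."""
    total = sum(probs)                      # compare with the 'SumP' row
    peak = values[probs.index(max(probs))]  # bin with the highest probability
    mean = sum(v * p for v, p in zip(values, probs)) / total
    # HPD: the shortest contiguous span of bins holding >= 95% of the area
    best = (values[0], values[-1])
    for i in range(len(values)):
        running = 0.0
        for j in range(i, len(values)):
            running += probs[j]
            if running >= 0.95 * total:
                if values[j] - values[i] < best[1] - best[0]:
                    best = (values[i], values[j])
                break
    return peak, mean, best

values = [0, 1, 2, 3, 4]                     # bin midpoints (X axis)
probs = [0.0625, 0.25, 0.375, 0.25, 0.0625]  # posterior per bin (Y axis)
peak, mean, hpd = histogram_summaries(values, probs)
```

Note that this simple search reports only a contiguous interval; as the text explains, a multi-peaked surface can make a contiguous HPD interval unreliable.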
These are plots of the values for parameters in the MCMC simulation over the course of
the run. They are ASCII-based and crude, but are still quite useful for assessing how well
the parameters are mixing. Any sign of a trend over a substantial portion of the plot, or
from one end of the plot to the other, is indicative of too short a run. Similarly a plot in
which a substantial region of the parameter space is visited only once or a small number
of times, is indicative of too short a run. A plot is also provided for Log(P) =
Log(P(Data|Genealogy)) + Log(P(Genealogy)), which is useful for an overall sense of how
well the chain is mixing. In the case of Load-Genealogy mode, trends are provided only
for the Log(P) values and the splitting time values that are saved in the ‘.ti’ files.
If topologies are sampled (-j0) a trend plot will appear showing topology number on the
y axis. Because this is a discrete quantity, and two topologies with adjacent numbers can
be very different, this plot is of limited usefulness with more than 4 populations (5
populations have 180 topologies). When there are 6 or more populations, topology
numbers are replaced by the Robinson-Foulds distance from topology 0.
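For reference, the Robinson-Foulds distance for rooted topologies can be sketched as the size of the symmetric difference between the two trees' clade sets (a generic sketch of the metric, not IMa3's implementation):

```python
# Represent each rooted topology by its set of clades (each clade a
# frozenset of population indices); the RF distance is the number of
# clades found in one tree but not the other.
def rf_distance(clades_a, clades_b):
    return len(set(clades_a) ^ set(clades_b))

# ((0,1),(2,3)) vs ((0,2),(1,3))
t1 = {frozenset({0, 1}), frozenset({2, 3})}
t2 = {frozenset({0, 2}), frozenset({1, 3})}
assert rf_distance(t1, t1) == 0
assert rf_distance(t1, t2) == 4
```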
Here begins a set of curves that are estimates of the marginal posterior densities for
model parameters. These plots are not suitable for publication as they are based on ASCII
characters and have low resolution, however they are useful for getting a rough picture of
the marginal densities. For population size and migration parameters, the plots are based
on the estimated marginal density functions. For terms that are in the MCMC simulation
(like splitting times and mutation rate scalars), the plots are based on recorded values.
After this final step the user may find that they need to revisit some issues about their data or
their priors and to begin again.
1. The first step is to assemble the data to be used and to construct a data file. Be sure
to review the section of this manual on the Data file format.
a. For sequence data make sure that the data do not contain obvious signals of
past recombination events. Hey and Nielsen (2004) discuss this issue at some
length.
b. Remember that for DNA sequences any base positions with missing data (i.e.
ambiguous bases) or indels will cause all sequences for that locus to be ignored
at those base positions.
c. For microsatellite (STR) loci convert the allele designations to integers that
differ by the number of repeats. It is ok if you know only the relative number.
For example if you have STR allele lengths of 154, 156 and 158 and you know
the base repeat is of length 2 but you don’t know the length of the flanking
sequences (so you don’t know the absolute number of repeats), you can
convert this simply to 77,78 and 79.
e. Obtain estimates for as many loci as you can of the mutation rate per year (for
the complete locus, not per base pair). These are usually obtained by
comparing one of your sampled populations with another population for which
you have a divergence time estimate in years. Include this information in the
data file (see Data file format). These numbers are essential for getting
splitting time estimates in years and effective population size estimates.
f. If you don’t intend to estimate the phylogeny, and you have more than two
populations, figure out your phylogenetic tree for your sampled populations.
This is known of course for models with one or two populations.
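The conversion in item (c) can be sketched as plain arithmetic, using the example values from the text:

```python
# Divide STR allele lengths by the repeat-unit length to get relative
# repeat counts; only the differences between alleles matter, so the
# unknown flanking-sequence length does not matter.
def to_relative_repeats(allele_lengths, repeat_length):
    return [length // repeat_length for length in allele_lengths]

print(to_relative_repeats([154, 156, 158], 2))   # [77, 78, 79]
```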
2. Do you know your phylogeny? Or do you want to estimate it? If you are going to
estimate phylogeny then you will do a topology sampling run (-j0). It is also usually
recommended that hyperpriors be used (-j3).
3. The next step is to figure out what values to use for the upper bounds on your prior
distributions or hyperprior distributions. In this checklist it is assumed that you are setting
priors by using the –q, -m and –t flags and that you are using uniform (not exponential)
priors for migration rates.
The prior distributions can have a very large impact on the analysis, both in a
conventional Bayesian sense of shaping the posterior distribution, but also (and
usually more importantly) in affecting the speed of the analysis. If the upper
bounds of the priors are too low, then the parameter estimates might be strongly
affected by the prior in a way the user did not intend. On the other hand the
higher the upper bounds of the priors the larger is the space of genealogies that
have high prior probabilities, which (unless the data strongly dominates over the
prior) effectively expands the state space of the MCMC simulation. This means
slower mixing and a longer run. This issue can be particularly acute with migration rate priors.
Hyperpriors offer the great advantage that the user does not need to be as careful
optimizing or carefully selecting prior distributions, and so it may also be useful to
use hyperpriors for a fixed topology. The downside is that, when using
hyperpriors on a fixed topology, that the genealogies are actually not sampled
(splitting times and prior values are sampled) and the population sizes and
migration rates cannot be estimated without an additional run using priors
selected on the basis of a previous hyperprior run.
In general users should follow the instructions (below) for picking priors, and
then, because these values are intended to be somewhat liberal, if they wish to
do a hyperprior run, they can use the same values to specify the hyperprior
distributions.
ii. Let the largest value of these geometric means, across your sampled
populations, be x.
iii. Set your upper bound for your population size parameters to be 5x
(e.g. if x=2.0 then use –q10.0)
iv. Set the upper bound of your splitting times to be 2x (i.e. if x=2.0 then
use –t4.0)
v. Set the upper bound of your migration rate parameters to be 2/x (i.e.
if x=2.0 then use –m1.0)
ii. Geometric means are used because all parameters are scaled by the
geometric mean of the mutation rate across loci.
iii. For the prior on splitting times you also want to work from your
population size parameters. Splitting time and population size
parameters are on the same scale, and the splitting times of your species
(in generations) will generally be of the same order as 4N. If it was much
older than that, then you won’t be able to study it much anyway because
it will have been so long ago that drift will have removed the information
about ancestral populations and your genealogies will coalesce with few
lineages present in the ancestral populations. So a reasonable strategy for
your first run, is to pick a splitting time prior that is based on your
ballpark estimates of population size parameters. Another thing to keep in
mind is that if you do not have a lot of information in your data for
iv. For the prior on migration rates it is usually best to start with a value
not much higher than would correspond to a value of
2NM=(4Nu×M/u)/2 = 1. This is moderately high migration as it is, and
because many data sets can’t resolve migration posteriors very well it is
easy to get in a situation where you have a high prior on migration and
where mixing goes very badly. For this reason I suggest starting with an
upper bound on the migration prior that is a few times the inverse of the
ballpark estimate of the population size. I suggest 2 times the inverse, but
5 might also be fine. Unless you have a lot of data, starting with much
higher values is likely to lead to a poorly mixing MCMC simulation.
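As a quick sketch of the rules above (plain arithmetic, not IMa3 code), with x the largest geometric mean across the sampled populations:

```python
# Suggested prior upper bounds derived from x, following the rules above.
def prior_upper_bounds(x):
    q = 5 * x   # population size prior upper bound  (-q)
    t = 2 * x   # splitting time prior upper bound   (-t)
    m = 2 / x   # migration rate prior upper bound   (-m)
    return q, t, m

q, t, m = prior_upper_bounds(2.0)   # gives -q10.0 -t4.0 -m1.0
# Sanity check on the migration bound: with a ballpark population size near
# x, 2NM = q*m/2 is about x * (2/x) / 2 = 1, the moderately high value
# suggested in the text.
```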
4. Once the data, model and priors are set it is useful to do a very short trial run that
stops pretty quickly after a burnin and a run. The purpose of this is to see that your data is
loading, and that you are getting updating. For example if your priors are ‘-q2 –m1 –t3’
and your input file is called ‘mydata.txt’, you could try the following command:
IM –imydata.txt –otrialrun.out –q2 –m1 –t3 –b100 –L100 –s123. If –j0 is
included topologies will be sampled, in which case –L100 specifies that 100 topologies
will be sampled. If the topology is fixed and genealogies are sampled then –L100
specifies that 100 genealogies will be sampled, which means that following the burnin
the run will proceed for 10000 steps (i.e. the default number of steps between saving
genealogies is 100). After the run is completed you will have a file called trialrun.out
that shows the results. After such a short run the results file won’t be of any use for
parameter estimates, but it will contain summary information on your data and on the
MCMC update rates.
a. It can be useful to set the burnin to an indefinite period so that you can look at
the trend plots and decide when to stop the burnin (see notes for –b and the
use of the IMburn file). For example, to generate a burntrend file every 6
hours you would use –b6.0 and have an IMburn file in the directory with your
data. Then just look at trend plots in the burntrend file that is rewritten every 6
hours of the run. Typically these plots will show something like a curve that
plateaus. When all the trendplots, including those for Log[P] and all splitting
times have reached a sort of plateau after which there are no clear trends, you
can delete or rename the IMburn file. Then when the current burn period
reaches its end, one last burntrend file will be written and the actual run will
begin.
b. For the actual duration of the run it is often useful to use –L with a floating point
number (# of hours between generation of output files) and an IMrun file,
much as was done for the burnin period. If –L is followed by an integer then
this is how many genealogies will be saved and it will dictate the duration of
the run (i.e. IMrun is not used).
c. Almost all data sets with multiple loci require multiple Metropolis-coupled
chains, often a great many. The guidelines here are to be sure to pick heating
terms that lead to high swap rates between all adjacent pairs of chains and to
have enough heating that the overall MCMC simulation mixes well. Don’t
scrimp on the number of chains (see notes under –h ).
Below are some examples of geometric heating schemes for a couple types of
problems. These are simply based on experience running infinite sites data. If
you are sampling topologies, then you should increase these numbers, and if
you have microsatellite data, then you will probably need a great deal more
heating than suggested here:
• Very small data sets (< 4 loci < 20 individuals per locus) : data sets of this size
might do ok without multiple chains.
• Small data set (e.g. < 15 loci, < 20 individuals per locus), low heating:
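A generic geometric heating scheme can be sketched as follows (an assumption about the general form only; IMa3's own parameterization under –h may differ, see notes under –h):

```python
# Chain i gets inverse temperature (beta) g**i for some 0 < g < 1, so the
# ratio between successive chains is constant. More chains, or a smaller g,
# give more total heating.
def geometric_betas(num_chains, g):
    return [g ** i for i in range(num_chains)]

betas = geometric_betas(8, 0.96)
# betas[0] is 1.0: the cold chain, which is the one actually sampled
```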
5. Finally, when everything is set, including priors and heating terms, the user can set up
two or more runs (using different starting seed values). In the end you will want to have at
least several thousand topologies (or at least 10,000 genealogies) from at least two well
mixed runs. Whether or not these actually constitute a good sample from the target
density depends entirely on how well the simulation was mixing.
If you are sampling genealogies in order to estimate joint posteriors and do likelihood-
ratio tests you will need at least 100,000 genealogies. One way to do this is to do multiple
runs that save a lot of genealogies but that do only basic analyses (e.g. don’t use –c2
which can take a long time, and use –p4), and then after all these runs are done to do a
Load-Genealogy mode run that uses all the .ti files and that invokes all the analyses that
you want. When you do a Load-Genealogy mode run you can specify how many
genealogies to load.
If it is desired to join the results of multiple genealogy sampling runs using a single Load-
Genealogy mode run, it is necessary to keep in mind that in order to generate histograms
for effective population sizes and splitting times in years (i.e. parameters on demographic
scales), the user will need to provide the geometric mean of the mutation rate scalar
estimates for those loci in the data file that have a mutation rate (using -y on the
command line; this applies only if all loci have mutation rates provided). These can be obtained
from the output files of the genealogy sampling runs.
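The geometric mean needed for -y can be computed directly from the per-locus scalar estimates reported in those output files; a minimal sketch (the scalar values here are hypothetical):

```python
import math

# Geometric mean of mutation rate scalar estimates across loci.
def geometric_mean(values):
    return math.exp(sum(math.log(v) for v in values) / len(values))

scalars = [0.8, 1.0, 1.25]          # hypothetical per-locus scalar estimates
y_value = geometric_mean(scalars)   # value to supply with -y
```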
IMa3 can accommodate anywhere from one to 8 sampled populations (9 with a ghost). Whether
or not phylogeny is being estimated, analyses with more than two populations present a number of challenges.
-r2 or –r7 will save an mcf with a name based on the output filename
–r7 loads an mcf with a name based on the output file name (-o) including state space and recorded
values; saves a new mcf using the output (-o) name; adds to a preexisting .ti file; adds to preexisting
sampled values, no additional burn, regardless of –b values on the command line
–r67 loads an mcf with a name based on the output file name (-o ) including state space but not the
recorded values; saves a new mcf using the output (-o) name; overwrites a preexisting .ti file; no
–r3 will load the state space, but not recorded values, from the mcf file with name given by –f . A run
with –r3 can include an additional burn using –b.
–r37 load the mcf, ignore saved values, makes a new .ti file with new name, can do an additional
burn with -b , makes new mcf file with the output file name (-o).
1. Do preliminary runs to find heating terms that lead to both high heating values in the
most heated chains and high swap rates between adjacent chains. This might require
one hundred, or multiple hundreds of heated chains.
2. Run on multiple processors, with saving of checkpoint files and writing of a burntrend
file at intervals (e.g. with –b12.0 in the presence of an IMburn file so that a
burntrend file is generated every 12 hours). Continue the run, while checking the
burntrend file to assess mixing. Once convinced that the burn is sufficient, the
IMburn file can be deleted or its name changed, and after the next interval the sampling
period will begin. Since checkpoint files are being saved, the actual –L value does not
really matter for these runs, as new sampling runs can be restarted from the mcf files.
3. Much as when assessing the burnin period, sampling runs can be done by writing
results at intervals (using –L followed by a floating point value in the presence of an
IMrun file). Once these chains have run sufficiently long that various indicators
suggest a good sample has been taken (e.g. ESS values are high, there are no
perceivable trends in the trendplots, set0 and set1 samples are similar, etc.) they can be stopped.
4. If desired a new set of runs for genealogy sampling can be started by reloading the
.mcf files. These runs can do without a burnin period (i.e. –b0). Also it is possible to
start more runs for sampling genealogies than were used for the burnin. Each set of
.mcf files generated can be used to start multiple sampling runs, provided each is
given an additional burnin period so that they become independent of each other.
This time may be much less than was required for the initial burnin.
If mcf files are loaded with –r6 then the previously sampled values will not be
loaded and the run begins anew at the current value of the state space that is
given in the mcf file. This can be used to treat a previous run as a burnin run.
If mcf files are read and written with –r7 then the same command line can be
used repeatedly.
7.7. Cautions
One of the greatest challenges is knowing whether the chain is mixing sufficiently. Update rates, trend
plots, and comparisons within and between runs must all be considered for deciding if your runs are
long enough or have sufficient heating. One of the tools for assessing mixing is the ESS value. However
it is recommended that users not rely strongly on ESS estimates. These values are highly unstable over the course of a run.
Regarding the mutation model, and the type of data to use, the program runs fastest with the infinite
sites (IS) model (Kimura 1969), which may be a reasonable model for many nuclear gene loci sampled
from closely related species. However the IS model is certainly not completely accurate and it may be a
poor model for your data. For mitochondrial data the IS model is usually not reasonable and the HKY
(Hasegawa, et al. 1985) model is more likely to provide a reasonable fit (though even the HKY is often
not adequate for control region or D-loop data). The program will handle microsatellite (STR) data, but
be forewarned that runs will require many, many heated chains, that burnin times will be quite long,
and that a complete analysis may take much longer than would a data set of similar size under the IS
model. The runs in Hoelzel et al. (2007) each took about a month on a single cpu. These
were for two populations, 15 loci, and a small number of alleles at each STR locus.
If the data set is small, and sometimes even if not, it can happen that a large portion of the likelihood
surface is very flat over the range of some parameters. In particular it is not uncommon to have high,
flat likelihoods for high values of t, for ancestral population sizes, or for migration rates. One may find
for example in the curve for a splitting time (typically that for the oldest split in the phylogeny) a peak at
a low value, and then a plateau that extends indefinitely to the right at an even higher value. This is
awkward because the highest likelihood appears to be associated with an infinitely wide range of
parameter values. In these situations, the data does not contain enough information to clearly identify
the model under the prior you have specified.
To run this executable under multiple processors on windows, the user will need to have installed the
Microsoft version of MPI (called Microsoft MPI, or sometimes MSMPI). After installation the user will
need to find the location of mpiexec.exe and put this location in their PATH variable (or include the full
path when running mpiexec). Users should be aware of the number of logical cores on their machine.
Running at the command line in Linux is typically done using mpirun, similar to the way mpiexec is
used in the Windows example above.
10. COPYRIGHT
The source code files and this documentation are copyrighted. The source code may be modified as
needed to recompile for different computers, or with different runtime constants, as needed to analyze
your data. The source code may not be incorporated into other programs without permission from the
authors.
11. REFERENCES
Beerli P. 2004. Effect of unsampled populations on the estimation of population sizes and
migration rates between sampled populations. Mol Ecol 13:827-836.
Edwards AW. 1970. Estimation of the branch points of a branching diffusion process. Journal of
the Royal Statistical Society. Series B (Methodological):155-174.
Geyer CJ. 1991. Markov chain Monte Carlo maximum likelihood. Computing Science and
Statistics, Proceedings of the 23rd Symposium on the Interface:156-163.
Hasegawa M, Kishino H, Yano T. 1985. Dating of the human-ape splitting by
a molecular clock of mitochondrial DNA. Journal of Molecular Evolution 22:160-174.