ITS As An Environmental DNA Barcode For Fungi: An In Silico Approach Reveals Potential PCR Biases
Abstract
Background: During the last 15 years the internal transcribed spacer (ITS) of nuclear DNA has been used as a target for
analyzing fungal diversity in environmental samples, and has recently been selected as the standard marker for fungal
DNA barcoding. In this study we explored the potential amplification biases that various commonly utilized ITS primers
might introduce during amplification of different parts of the ITS region in samples containing mixed templates
('environmental barcoding'). We performed in silico PCR analyses with commonly used primer combinations using
various ITS datasets obtained from public databases as templates.
Results: Some of the ITS primers, such as ITS1-F, were hampered by a high proportion of mismatches relative to the
target sequences, and most of them appeared to introduce taxonomic biases during PCR. Some primers, e.g. ITS1-F,
ITS1 and ITS5, were biased towards amplification of basidiomycetes, whereas others, e.g. ITS2, ITS3 and ITS4, were
biased towards ascomycetes. The assumed basidiomycete-specific primer ITS4-B only amplified a minor proportion of
basidiomycete ITS sequences, even under relaxed PCR conditions. Due to systematic length differences in the ITS2
region as well as the entire ITS, we found that ascomycetes will more easily amplify than basidiomycetes using these
regions as targets. This bias can be avoided by using primers amplifying ITS1 only, but this would imply preferential
amplification of 'non-dikarya' fungi.
Conclusions: We conclude that ITS primers have to be selected carefully, especially when used for high-throughput
sequencing of environmental samples. We suggest that different primer combinations or different parts of the ITS
region should be analyzed in parallel, or that alternative ITS primers should be searched for.
18S (SSU) and 28S (LSU) genes in the nrDNA repeat unit (Figure 1). The large number of ITS copies per cell (up to
250; [13]) makes the region an appealing target for sequencing environmental substrates where the quantity of DNA
present is low. The entire ITS region has commonly been targeted with traditional Sanger sequencing approaches and
typically ranges between 450 and 700 bp. Either the ITS1 or the ITS2 region has been targeted in recent
high-throughput sequencing studies [14-17], because the entire ITS region is still too long for 454 sequencing or
other high-throughput sequencing methods. Using high-throughput sequencing, thousands of sequences can be analysed
from a single environmental sample, enabling in-depth analysis of the fungal diversity.
Various primers are used for amplifying the entire or parts of the ITS region (Figure 1). The most commonly used
primers were published early in the 1990's (e.g. [18,19]) when only a small fraction of the molecular variation in
the nrDNA repeat across the fungal kingdom was known. Several other ITS primers have been published more recently
[20] but have not been used extensively compared to the earlier published primers. However, little is actually
known about the potential biases that commonly used ITS primers introduce during PCR amplification. Especially
during high-throughput sequencing, where quantification (or semi-quantification) of species abundances is also
possible to a certain degree (although hampered by factors like copy-number variation), primer mismatches might
potentially introduce large biases in the results because some taxonomic groups are favoured during PCR. Our main
focus in this study is on the two dominating taxonomic groups of fungi in the Dikarya, Ascomycota and
Basidiomycota. Ascomycota represents the largest phylum of Fungi, with over 64,000 species, while Basidiomycota
contains about 30,000 described species [21]. In total those two groups represent 79% of the described species of
true Fungi.
The aim of this study was to analyse the biases commonly used ITS primers might introduce during PCR
amplification. First, we addressed to what degree the various primers mismatch with the target sequence and
whether the mismatches are more widespread in some taxonomic groups. Second, we considered the length variation in
the amplified products, in relation to taxonomic group, to assess amplification biases during real (in vitro) PCR
amplification, as shorter DNA fragments are preferentially amplified from environmental samples containing DNA
from a mixture of different species [22]. Finally, we analyzed to what degree the various primers co-amplify
plants, which often co-occur in environmental samples. For these purposes we performed in silico PCR using various
primer combinations on target sequences retrieved from EMBL databases as well as subset databases using the
bioinformatic tool EcoPCR [23]. In order to better simulate real PCR conditions, we allowed a maximum of 0 to 3
mismatches except for the 2 last bases of each primer and we assessed the melting temperature (Tm) for each primer
in relation to primer mismatches.
Methods
Compilation of datasets
The EcoPCR package contains a set of bioinformatics
tools developed at the Laboratoire d'Ecologie Alpine,
Grenoble, France ([23], freely available at http://www.grenoble.prabi.fr/trac/ecoPCR). The package is
composed of four pieces of software, namely 'ecoPCRFormat', 'ecoFind', 'ecoPCR' and 'ecoGrep'. Briefly, EcoPCR is
based on the pattern matching algorithm agrep [24] and
selects sequences from a database that match (exhibit
similarity to) two PCR primers. The user can specify (1)
which database the given primers should be tested
against, and (2) the primer sequences. Different options
allow specification of the minimum and maximum ampli-
fication length, the maximum count of mismatched posi-
tions between each primer and the target sequence
(excluding the two bases at the 3' end of each primer),
and restriction of the search to given taxonomic groups.
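The matching rule described above can be sketched in a few lines of Python. This is an illustrative sketch only, not ecoPCR's implementation (which is based on agrep): the function names, the substitution-only mismatch model, the absence of IUPAC ambiguity codes, and the choice to report amplicon length including both primers are all assumptions made for the example.

```python
from typing import Iterator, Optional, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def primer_hit(primer: str, site: str, max_mismatches: int) -> Optional[int]:
    """Return the mismatch count if the primer anneals to the site, else None.

    Both strings are read 5'->3' in the primer's orientation. The two bases
    at the primer's 3' end must match exactly; elsewhere up to
    max_mismatches substitutions are tolerated (hypothetical rule set).
    """
    if len(primer) != len(site) or primer[-2:] != site[-2:]:
        return None
    mismatches = sum(p != s for p, s in zip(primer[:-2], site[:-2]))
    return mismatches if mismatches <= max_mismatches else None


def find_sites(primer: str, strand: str, max_mismatches: int) -> Iterator[Tuple[int, int]]:
    """Yield (start, mismatches) for every position where the primer anneals."""
    k = len(primer)
    for start in range(len(strand) - k + 1):
        hit = primer_hit(primer, strand[start:start + k], max_mismatches)
        if hit is not None:
            yield start, hit


def in_silico_pcr(template: str, forward: str, reverse: str,
                  max_mismatches: int = 1, min_len: int = 0,
                  max_len: int = 10_000) -> Optional[Tuple[int, int, int]]:
    """Return (amplicon_length, fwd_mismatches, rev_mismatches) or None.

    The amplicon length here includes both primers (an assumption of this
    sketch). Only the first valid primer-site pair is reported.
    """
    top = template.upper()
    # The reverse primer anneals to the opposite strand, so search for it
    # on the reverse complement of the template.
    bottom = reverse_complement(top)
    for f_start, f_mm in find_sites(forward, top, max_mismatches):
        for r_start, r_mm in find_sites(reverse, bottom, max_mismatches):
            length = len(top) - r_start - f_start
            if length >= len(forward) + len(reverse) and min_len <= length <= max_len:
                return length, f_mm, r_mm
    return None


if __name__ == "__main__":
    # Toy primers and template, invented for illustration only.
    fwd, rev = "ACGTACGTAC", "GGATCCATGA"
    template = "AAAA" + fwd + "C" * 10 + reverse_complement(rev) + "TTTT"
    print(in_silico_pcr(template, fwd, rev, max_mismatches=0))
```

Running the same template with max_mismatches set from 0 to 3 mimics the stringency series used in this study; the exact-match requirement on the last two bases reflects the observation that 3' mismatches are the most damaging to primer extension.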
The ecoPCR output contains, for each target sequence, amplification length, melting temperature (Tm), taxonomic
information as well as the number of mismatched positions for each strand.
Figure 1 Commonly used primers for amplifying parts or the entirety of the ITS region. a) Relative position of the
primers, design of the subsets and number of sequences in each subset. b) Primer sequences, references and position
of the primer sequence according to a reference sequence of Serpula himantioides (AM946630) stretching the entire
nrDNA repeat.
Bellemain et al. BMC Microbiology 2010, 10:189 Page 3 of 9
http://www.biomedcentral.com/1471-2180/10/189
First, we retrieved fungal sequences from EMBL in the following categories: 'standard', 'Genome sequence scan',
'High Throughput Genome sequencing', 'Whole Genome Sequence' from ftp://ftp.ebi.ac.uk/pub/databases/embl/release/
(release embl_102, January 2010) to create our initial database. It corresponds to 1,212,954 sequences including
approximately 79,500 ITS sequences (estimated from EMBL SRS website requesting for fungi sequences annotated with
'ITS' or 'Internal Transcribed Spacer'). These ITS entries refer to more than 10,800 taxa. This database, hereafter
referred to as the "fungi database", was compiled using EcoPCRFormat.
To assess the specificity of the primers to fungi, we used the plant database from EMBL (release embl_102, January
2010 from ftp://ftp.ebi.ac.uk/pub/databases/embl/release/) to run amplifications using the same primers as for
fungi. This database, hereafter referred to as the "plant database", contained 1,253,565 sequences, including
approximately 65,000 ITS sequences (estimated from EMBL SRS website requesting for viridiplantae sequences
annotated with 'ITS' or 'Internal Transcribed Spacer'). These ITS entries refer to more than 6,100 taxa. This
database was also compiled using EcoPCRFormat.
As there are relatively few sequences submitted to public databases covering the entire ITS region as well as the
commonly used universal primer sites in the flanking SSU and LSU regions, we created three subset datasets covering
either ITS1, ITS2 or the entire ITS region. From the initial fungi database, we compiled three subset databases
(hereafter referred to as subset 1, 2, and 3) by in silico amplification (see below) of target sequences using the
following primer pairs: NS7-ITS2 (dataset 1, focused on ITS1 region), ITS5-ITS4 (dataset 2, including both ITS1 and
ITS2 regions) and ITS3-LR3 (dataset 3, focused on ITS2 region). To simulate relatively stringent PCR conditions, a
single mismatch between each primer and the template was allowed except in the 2 bases of the 3' primer end. These
three subsets were then compiled using EcoPCRFormat and included 1291, 5924 and 2459 partial nrDNA sequences,
respectively.
In silico amplification and primer specificity to fungi
Using EcoPCR, we ran in silico amplifications from both the fungi and the plant databases using various commonly
used primer combinations, to assess the number of amplifications and the specificity of the primers to fungi. For
each amplification, we allowed from 0 to 3 mismatches between each primer and the template (excluding mismatches in
the 2 bases of the 3' primer end) in order to simulate different stringency conditions of PCRs. Secondly, from the
three subsets, we amplified sequences using different internal primer combinations in order to evaluate the various
primers (Figure 1). From dataset 1 we used the primer combinations ITS1-F-ITS2, ITS5-ITS2 and ITS1-ITS2. From
dataset 2 we used the combinations ITS1-ITS4 (amplifying both ITS1 and ITS2 introns), ITS3-ITS4 and ITS5-ITS2. From
dataset 3 we used the combinations ITS3-ITS4 and ITS3-ITS4-B. During these virtual PCRs we also allowed from 0 to 3
mismatches between each primer and the template, except in the 2 bases of the 3' primer end.
Assessing the degree of primer mismatches and Tm
For all in silico internal amplifications from each subset, we assessed the proportion of sequences retrieved when
allowing for 0 to 3 mismatches between each primer and the template. For the amplifications from each subset, we
used an external primer (one of the primers used to create the subset) and an internal primer. Therefore, for each
analysis, we assessed the proportion of sequences including mismatches for the internal primer only. The primer
pair ITS5-ITS2 was evaluated both for subset 1 and subset 2, with the focus on ITS5 for subset 1 and on ITS2 for
subset 2 (as those primers correspond to internal primers within their respective subsets). Similarly, the primer
pair ITS3-ITS4 was evaluated both for subsets 2 and 3, with the focus on ITS3 in subset 2 and ITS4 in subset 3. The
primer ITS1 was evaluated both for subset 1 (with the combination ITS1-ITS2) and for subset 2 (with the combination
ITS1-ITS4) as ITS2 and ITS4 were used as external primers in subsets 1 and 2, respectively.
To assess whether certain taxonomic groups were more prone to mismatches, we assessed the proportion of sequences
including one mismatch for each of the three taxonomic groups 'ascomycetes', 'basidiomycetes' and 'non-dikarya'
(the latter is a highly polyphyletic group including e.g. Blastocladiomycota, Chytridiomycota, Glomeromycota and
Zygomycota [25]). We also assessed the Tm for each primer based on the analyses from internal amplifications,
allowing a single mismatch. The Tm is defined as the temperature at which half of the DNA strands are in the
double-helical state and half are in the "random-coil" state. The strength of hybridization between the primers and
the template affects Tm. It is therefore informative to assess how Tm decreases as the number of mismatches
increases, i.e. with less stringent PCR conditions. Tm was calculated in ecoPCR based on a thermodynamic nearest
neighbor model [26]. Exact computation was performed following [27].
Assessing bias in amplification length relative to taxonomic group
To further assess the taxonomic bias introduced by the use of the different primer pairs, we separated the
amplified sequences from selected analyses into the groups 'ascomycetes', 'basidiomycetes' and 'non-dikarya' based
on their taxonomic identification number, using the ecoGrep tool. These selected analyses were (1) the three subsets,
and (2) all internal amplifications within each subset with one mismatch allowed. The amplification length was
reported for each analysis.
Results
Relative amplification of different primer combinations from the fungi and plant databases
The number of fungal versus plant sequences amplified in silico with various ITS primer combinations directly from
the raw data downloaded from EMBL (Table 1) mainly reflected the number of sequences deposited. However, the number
of amplified sequences varied considerably with varying stringency conditions (in this context allowing zero to
three mismatches) across different primer combinations (see Table 1 for details). Only a few plant ITS sequences
were amplified using the fungus-specific primer ITS1-F (ranging from 20 to 24 sequences under different stringency
conditions). Assessing these sequences using Blast, 20 out of 24 were revealed to be fungal sequences erroneously
deposited as algae from an unpublished study (six Liagora species, two Caulerpa species, Helminthocladia australis,
and Ganonema farinosum). There was a sequence deposited as Chlorella matching a fungal sequence. The three others
were Chlorarachniophyte species that did not match any known fungal sequence. Some of the other primer
combinations, including ITS1-ITS2, amplified a high number of plant sequences from different orders. We also
confirmed that the assumed basidiomycete-specific primer ITS4-B did not amplify any plant sequences even when
allowing 3 mismatches.
Primer mismatches in sequence subsets
The selected ITS primers showed large variation in their ability to amplify fungal sequences from the three subsets
when allowing different numbers of mismatches (Figure 2). All primer pairs amplified at least 90% of the sequences
when allowing two or three mismatches, with the exception of ITS4-B (see below). It is noteworthy that the
percentages of sequences were quite similar for two and three mismatches, indicating that rather few sequences
included three mismatches. Under strict conditions (i.e. allowing no mismatches), the proportion of amplified
sequences varied considerably between primer pairs, ranging from 36% for ITS1-F to 81% for ITS5 (Figure 2).
Allowing one mismatch increased the proportion of amplified sequences from 36% to 91.6% for the commonly used
primer ITS1-F, implying that more than half of the amplified sequences included one mismatch. ITS5 amplified the
highest proportion of the sequences when allowing for a single mismatch (97.5%), and less than 10% of the sequences
in each taxonomic group included one mismatch. The primer ITS1, on the other hand, only amplified 56.8% and 65.9%
of the sequences from subsets one and two, respectively, when allowing no mismatches. Allowing three mismatches,
ITS1 was still only able to amplify 92% of the sequences in subsets one and two. Allowing no mismatches, the
complementary primers ITS2 and ITS3 amplified 79.4% and 77.3% of all sequences, respectively, in subset 2. Allowing
one mismatch, these numbers increased to 87.5% and 90%, respectively. Primer ITS4 amplified 74.9% of all sequences in
Table 1: Number of plant and fungi ITS sequences amplified in silico from EMBL fungal and plant databases, using the
various primer combinations and allowing none to three mismatches.
Number of mismatches * 0 1 2 3 0 1 2 3
Table 2: Percentage of sequences amplified in silico, allowing one mismatch, from ascomycetes, basidiomycetes and 'non-
Dikarya' with different primer combinations and using the three sequence subsets 1-3 (see Material and Methods) as
templates.
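The per-group percentages summarised in a table of this kind can be tallied from ecoPCR-style output with a short script. The sketch below is a minimal illustration under stated assumptions: the record layout (a sequence-to-group mapping plus a mismatch count for the internal primer of each amplified sequence), the function name and the toy data are all hypothetical and are not taken from the study.

```python
from collections import Counter
from typing import Dict


def percent_amplified(subset_groups: Dict[str, str],
                      internal_mismatches: Dict[str, int],
                      max_mismatches: int) -> Dict[str, float]:
    """Percentage of each taxonomic group amplified at a given stringency.

    subset_groups maps every sequence id in the subset to its group.
    internal_mismatches maps the ids of sequences that were amplified at the
    loosest setting to the number of mismatches with the internal primer;
    sequences missing from it are treated as never amplified.
    """
    totals = Counter(subset_groups.values())
    amplified = Counter(
        subset_groups[seq_id]
        for seq_id, mismatches in internal_mismatches.items()
        if mismatches <= max_mismatches
    )
    return {group: 100.0 * amplified[group] / totals[group] for group in totals}


if __name__ == "__main__":
    # Invented toy data: four ascomycete and two basidiomycete sequences.
    groups = {"a1": "ascomycetes", "a2": "ascomycetes", "a3": "ascomycetes",
              "a4": "ascomycetes", "b1": "basidiomycetes", "b2": "basidiomycetes"}
    hits = {"a1": 0, "a2": 1, "a3": 2, "b1": 0}
    for allowed in range(4):
        print(allowed, percent_amplified(groups, hits, allowed))
```

Dividing by the size of each group within the subset, rather than by the total number of amplified sequences, keeps the groups comparable even though the subsets contain very different numbers of ascomycete, basidiomycete and 'non-Dikarya' sequences.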
Table 3: Melting temperature (Tm) of each primer according to the number of mismatches allowed between the primer
and the target sequence.
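For readers unfamiliar with nearest-neighbor Tm estimates, the following sketch shows the general idea for a perfectly matched primer. It is not the computation used in ecoPCR and it does not model mismatches: it uses the unified nearest-neighbor parameters of SantaLucia (1998) at 1 M NaCl with a simple entropy salt correction, and the default strand and sodium concentrations, the function name, the neglect of self-complementarity and the ITS4 sequence (given as commonly cited, not taken from this text) are assumptions of the example.

```python
import math

# Unified nearest-neighbor parameters (SantaLucia 1998, 1 M NaCl), keyed by
# the dinucleotide read 5'->3' on the primer:
# (enthalpy in kcal/mol, entropy in cal/(mol*K)).
NN_PARAMS = {
    "AA": (-7.9, -22.2), "TT": (-7.9, -22.2),
    "AT": (-7.2, -20.4), "TA": (-7.2, -21.3),
    "CA": (-8.5, -22.7), "TG": (-8.5, -22.7),
    "GT": (-8.4, -22.4), "AC": (-8.4, -22.4),
    "CT": (-7.8, -21.0), "AG": (-7.8, -21.0),
    "GA": (-8.2, -22.2), "TC": (-8.2, -22.2),
    "CG": (-10.6, -27.2), "GC": (-9.8, -24.4),
    "GG": (-8.0, -19.9), "CC": (-8.0, -19.9),
}
# Initiation terms, applied once per duplex end depending on the terminal pair.
INIT_PARAMS = {"G": (0.1, -2.8), "C": (0.1, -2.8), "A": (2.3, 4.1), "T": (2.3, 4.1)}
GAS_CONSTANT = 1.987  # cal/(mol*K)


def melting_temperature(primer: str, strand_conc: float = 0.5e-6,
                        na_conc: float = 0.05) -> float:
    """Estimate Tm (degrees Celsius) of a perfectly matched primer duplex.

    strand_conc is the total strand concentration in mol/L and na_conc the
    sodium concentration in mol/L; both defaults are illustrative only.
    """
    seq = primer.upper()
    enthalpy = 0.0  # kcal/mol
    entropy = 0.0   # cal/(mol*K)
    for i in range(len(seq) - 1):
        dh, ds = NN_PARAMS[seq[i:i + 2]]
        enthalpy += dh
        entropy += ds
    for terminal_base in (seq[0], seq[-1]):
        dh, ds = INIT_PARAMS[terminal_base]
        enthalpy += dh
        entropy += ds
    # Salt correction on the entropy, scaled by the number of stacked pairs.
    entropy += 0.368 * (len(seq) - 1) * math.log(na_conc)
    kelvin = 1000.0 * enthalpy / (entropy + GAS_CONSTANT * math.log(strand_conc / 4.0))
    return kelvin - 273.15


if __name__ == "__main__":
    its4 = "TCCTCCGCTTATTGATATGC"  # ITS4 as commonly cited; an assumption here
    print(round(melting_temperature(its4), 1))
```

Each mismatch replaces favorable stacking terms with less favorable ones, which is why Tm drops as more mismatches are tolerated; modeling that drop requires mismatch-specific parameters that this sketch deliberately leaves out.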
Figure 3 Box plots illustrating length differences between the amplicons obtained using different primer
combinations for each of the three subsets. The plot in each subset represents the primer pair used to create the
subset (*).
PCR conditions. Overall, the results indicate that it is important to assess the specificity of the amplification
in relation to PCR stringency before interpreting the results from environmental samples in terms of abundance and
diversity.
Our in silico analyses further indicate that most of the primers will introduce a taxonomic bias due to higher
levels of mismatches in certain taxonomic groups. When
Conclusion
The in silico method used here allowed for the assessment of different parameters for commonly used ITS primers,
including the length of the amplicons generated, taxonomic biases, and the consequences of primer mismatches. The
results provide novel insights into the relative performance of commonly used ITS primer pairs. Our analyses
suggest that studies using these ITS primers to retrieve the entire fungal diversity from environmental samples
including mixed templates should use lower annealing temperatures than the recommended Tm to allow for primer
mismatches. A high Tm has been used in most studies, which likely biases the inferred taxonomic composition and
diversity. However, one has to find a balance between allowing some mismatches and avoiding non-specific binding in
other genomic regions, which can also be a problem.
Considering the different types of biases (specificity to fungi; mismatches; length; taxonomy), we suggest that
different primer combinations targeting different parts of the ITS region should be analyzed in parallel. When
dealing with single culture isolates compared to environmental samples, the choice of a primer pair to amplify ITS
is less problematic because there is no 'competition' between DNA fragments of different taxonomic groups/lengths,
and the DNA quality is generally higher.
This study also illustrates potential benefits of using a bioinformatics approach before selecting primer pairs for
Table 4: Number (percentage) of sequences in each of the most common basidiomycete groups in the original subset 3,
and number amplified with ITS3-ITS4-B from subset 3, allowing no or 3 mismatches.
11. Nilsson R, Ryberg M, Abarenkov K, Sjökvist E, Kristiansson E: The ITS region as a target for characterization of fungal communities using emerging sequencing technologies. FEMS Microbiology Letters 2009, 296:97-101.
12. Nilsson R, Ryberg M, Kristiansson E, Abarenkov K, Larsson K, Koljalg U: Taxonomic reliability of DNA sequences in public sequence databases: a fungal perspective. PLoS One 2006, 1(1):e59.
13. Vilgalys R, Gonzalez D: Organisation of ribosomal DNA in the basidiomycete Thanatephorus praticola. Current Genetics 1990, 18:277-280.
14. Buée M, Reich M, Murat C, Morin E, Nilsson R, Uroz S, Martin F: 454 Pyrosequencing analyses of forest soils reveal an unexpectedly high fungal diversity. New Phytologist 2009, 2:449-456.
15. Ghannoum M, Jurevic R, Mukherjee P, Cui F, Sikaroodi M, Naqvi A, Gillevet P: Characterization of the Oral Fungal Microbiome (Mycobiome) in Healthy Individuals. PLoS Pathogens 2010, 6(1):e1000713.
16. Jumpponen A, Jones K: Massively parallel 454-sequencing of Quercus macrocarpa phyllosphere fungal communities indicates reduced richness and diversity in urban environments. New Phytologist 2009, 184:438-448.
17. Jumpponen A, Jones K, Mattox J, Yeage C: Massively parallel 454-sequencing of Quercus spp. ectomycorrhizosphere indicates differences in fungal community composition richness, and diversity among urban and rural environments. Molecular Ecology 2010 in press.
18. Gardes M, Bruns T: ITS primers with enhanced specificity for basidiomycetes - application to the identification of mycorrhizae and rusts. Molecular Ecology 1993, 2(2):113-118.
19. White T, Bruns T, Lee S, Taylor J: Amplification and direct sequencing of fungal ribosomal RNA genes for phylogenetics. In PCR Protocols: A Guide to Methods and Applications. Edited by: Innis MA, Gelfand DH, Sninsky JJ, White TJ. San Diego: Academic Press; 1990:315-322.
20. Martin K, Rygiewicz P: Fungal-specific primers developed for analysis of the ITS region of environmental DNA extracts. BMC Microbiology 2005, 5:28.
21. Kirk PM, Cannon PF, David JC, Stalpers J: Ainsworth and Bisby's Dictionary of the Fungi. 9th edition. Wallingford UK: CAB International; 2001.
22. Deagle B, Eveson J, Jarman S: Quantification of damage in DNA recovered from highly degraded samples - a case study on DNA in faeces. Frontiers in Zoology 2006, 3:11.
23. Ficetola GF, Coissac E, Zundel S, Riaz T, Shehzad W, Bessière J, Taberlet P, Pompanon F: An In silico approach for the evaluation of DNA barcodes. BMC Genomics in press.
24. Wu S, Manber U: Agrep - a fast approximate pattern matching tool. Proceedings of the Winter 1992 USENIX Conference San Francisco USA. Berkeley 1992:153-162.
25. James T, et al.: Reconstructing the early evolution of Fungi using a six-gene phylogeny. Nature 2006, 443:818-822.
26. SantaLucia JJ, Hicks D: The thermodynamics of DNA structural motifs. Annual Review of Biophysics and Biomolecular Structure 2004, 33:415-440.
27. Duitama J, Kumar D, Hemphill E, Khan M, Mandoiu I, Nelson C: Primerhunter: a primer design tool for PCR-based virus subtype identification. Nucleic Acids Research 2009, 37(8):2483-2492.
28. Peay K, Kennedy P, Davies S, Tan S, Bruns T: Potential link between plant and fungal distributions in a dipterocarp rainforest: community and phylogenetic structure of tropical ectomycorrhizal fungi across a plant and soil ecotone. New Phytologist 2010, 185:529-542.
29. Harris D: Can you bank on GenBank? Trends in Ecology and Evolution 2003, 18(7):317-319.
30. Landeweert R, Leeflang P, Kuyper T, Hoffland E, Rosling A, Wernars K, Smit E: Molecular identification of ectomycorrhizal mycelium in soil horizons. Applied and Environmental Microbiology 2003, 69(1):. DOI: 10.1128/AEM.1169.1121.1327-1333.2003
31. Robinson C, Szaro T, Izzo A, Anderson I, Parkin P, Bruns T: Spatial distribution of fungal communities in a coastal grassland soil. Soil Biology and Biochemistry 2009, 41:414-416.
32. Hong S, Bunge J, Leslin C, S J, Epstein S: Polymerase chain reaction primers miss half of rRNA microbial diversity. The ISME Journal 2009, 3:1365-1373.
33. Jeon S, Bunge J, Leslin C, Stoeck T, Hong S, Epstein S: Environmental rRNA inventories miss over half of protistan diversity. BMC Microbiology 2008, 8:222.
34. Sipos R, Szekely A, Palatinszky M, Revesz M, K M, Nikolausz M: Effect of primer mismatch annealing temperature and PCR cycle number on 16S rRNA gene-targetting bacterial community analysis. FEMS Microbiology Ecology 2007, 60:341-350.
35. Engelbrektson A, Kunin V, Wrighton K, Zvenigorodsky N, Chen F, Ochman H, Hugenholtz P: Experimental factors affecting PCR-based estimates of microbial species richness and evenness. The International Society for Microbial Ecology Journal 2010. doi:10.1038/ismej.2009.153
36. Huber J, Morrison H, SM H, Neal P, Sogin M, Welch D: Effect of PCR amplicon size on assessments of clone library microbial diversity and community structure. Environmental Microbiology 2009, 11(5):1292-1302.
doi: 10.1186/1471-2180-10-189
Cite this article as: Bellemain et al., ITS as an environmental DNA barcode for fungi: an in silico approach reveals potential PCR biases BMC Microbiology 2010, 10:189