Complete Sequence Analysis of the Genome of the Bacterium Mycoplasma pneumoniae
Zentrum für Molekulare Biologie Heidelberg, Mikrobiologie, Universität Heidelberg, 69120 Heidelberg, Germany
Received August 22, 1996; Revised and Accepted October 10, 1996
DDBJ/EMBL/GenBank accession no. U00089
*To whom correspondence should be addressed. Tel: +49 6221 54 68 27; Fax: +49 6221 54 58 93; Email: r.herrmann@mail.zmbh.uni-heidelberg.de
Present addresses: +QIAGEN GmbH, 40724 Hilden, Germany and Cancer Research Center (DKFZ), 69120 Heidelberg, Germany
Nucleic Acids Research, 1996, Vol. 24, No. 22 4421
Computer assisted analysis

Sequence assembly, map drawing and multiple alignments were done with the Lasergene program package (DNA STAR). Other analyses were performed with the HUSAR (Heidelberg Unix Sequence Analysis Resources) program package release 4.0 at the German Cancer Research Center, Heidelberg, Germany. This package is based on the GCG program package version Unix-8.1 of the Genetics Computer Group, Wisconsin. For searching the DNA and protein databases [SWISS-PROT (19) and PIR (20)] the FASTA (21) and BLAST (22) programs (BLASTX, BLASTN and BLASTP) were used. Conserved motifs in proteins and peptides were identified by using the program PROSITE (23). Open reading frames (ORFs) were calculated by the program FRAMES allowing AUG (or GUG, UUG) as start

ORFs (49.2%) were functionally assigned, based on significant sequence similarities to genes or proteins from other organisms with known functions (e.g. ribosomal proteins) or at least known categories of function (e.g. proteins involved in cytadherence). Significant similarities to proteins without known function from other bacteria, mostly M.genitalium, were shown for 181 proposed ORFs (26.7%). We also included in this group those M.pneumoniae proteins which were identified in protein extracts of M.pneumoniae by monospecific antibodies or by the N-terminal amino acid sequences of enriched proteins (26,27). The group of ORFs without significant similarity or without indication for their in vivo expression comprised 109 members (16.1%); 42 of them carry characteristic motifs, which are not sufficient for defining a function. Examples of such motifs are the leucine zipper (29
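The ORF search described under "Computer assisted analysis" can be illustrated with a minimal sketch. It is not the FRAMES program: it only scans both strands in three reading frames, accepts AUG, GUG or UUG (written in the DNA alphabet) as start codons and reports ORFs at or above a length cut-off. The stop codon set is an assumption (UAA and UAG only, since mycoplasmas read UGA as tryptophan); the function names, the default cut-off argument and the coordinate convention are likewise illustrative and not taken from the paper.

```python
# Minimal ORF scan, assuming start codons ATG/GTG/TTG and stop codons TAA/TAG.
START_CODONS = {"ATG", "GTG", "TTG"}  # AUG, GUG, UUG in the DNA alphabet
STOP_CODONS = {"TAA", "TAG"}          # assumption: TGA is read as Trp, not stop

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_orfs(seq, min_aa=100):
    """Return (strand, start, end, aa_length) for every ORF of >= min_aa codons.

    Coordinates are 0-based, end-exclusive, and refer to the scanned strand
    (for strand "-" they index the reverse complement). The start codon is
    counted in aa_length, the stop codon is not. Only the first start codon
    after a stop is used, so the longest ORF per stop is reported.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon in START_CODONS:
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    aa_len = (i - start) // 3
                    if aa_len >= min_aa:
                        orfs.append((strand, start, i + 3, aa_len))
                    start = None
    return orfs
```

Calling `find_orfs(sequence, min_aa=50)` on a region would lower the cut-off; ORFs that run off the end of the sequence without a stop codon are not reported in this sketch.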
Figure 1. (Following two pages) The gene map of the complete M.pneumoniae genome. The arrows indicate the position and the size of the predicted ORFs. The colour
refers to the functional category in which the ORFs are sorted. The complete name of an ORF can be deduced by the cosmid name above the horizontal scale-line
and the number below the arrows (e.g. the ORF name of the first complete arrow in this figure is E07_orf1113). Rectangles above the scale-line indicate the size and
the position of different repetitive DNA sequences (see also Table 4).
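The naming rule given in the legend (cosmid name, then "_orf", then the number shown below the arrow) can be sketched as a small parser. The regular expression is an assumption based only on the names that occur in the text, such as E07_orf1113, E07_orf540o and F11_orf358a; the optional trailing lowercase letter is kept as an opaque suffix because its meaning is not explained in this section. `OrfName` and `parse_orf_name` are hypothetical helper names.

```python
import re
from typing import NamedTuple, Optional

# Assumed pattern: <cosmid>_orf<number><optional lowercase suffix>
_ORF_NAME = re.compile(r"^([A-Za-z0-9]+)_orf(\d+)([a-z]?)$")


class OrfName(NamedTuple):
    cosmid: str   # cosmid name shown above the scale-line, e.g. "E07"
    number: int   # number shown below the arrow, e.g. 1113
    suffix: str   # optional trailing letter, "" if absent


def parse_orf_name(name: str) -> Optional[OrfName]:
    """Split an ORF name into its parts, or return None if it does not fit."""
    match = _ORF_NAME.match(name.strip())
    if match is None:
        return None
    return OrfName(match.group(1), int(match.group(2)), match.group(3))
```

For example, `parse_orf_name("E07_orf1113")` yields the cosmid `"E07"` and the number `1113`. The number appears to correspond to the length in amino acids (GT9_orf37 is the 37 amino acid protein L36), but that reading is an inference from the examples in the text.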
coding densities have also been estimated for the smaller M.genitalium genome (9) and for the genome of Haemophilus influenzae which is more than twice as large (30). The length of the proposed proteins in M.pneumoniae ranges from 37 (4.3 kDa) to 1882 (209.4 kDa) amino acids (Fig. 3). One of the largest proteins is the cytadherence accessory protein HMW2 (F10_orf1818) and the smallest identified protein is the 37 amino acid ribosomal protein L36 (GT9_orf37). For practical reasons we introduced at the beginning of the sequence analysis a cut-off point of 100 amino acids for proposed proteins unless we found smaller proteins such as some of the ribosomal proteins during the initial BLASTX homology search. All intergenic or non-coding regions were reanalyzed with a cut-off point of 50 amino acids and searches were done for specific small proteins. However, we cannot exclude the possibility that some of the smaller proteins, not showing similarities to known proteins from other organisms, have been missed in our analysis.

The codon usage of M.pneumoniae is summarized in Table 3. We compared it for all proposed genes, for the subsets of genes with a low G+C content (below 35 mol%) and a high G+C content (between 50 and 56 mol%) and for all 50 ribosomal protein genes (42.8 mol%) as an example for frequently translated genes. Codon usage of the low and high G+C content subfractions is clearly influenced by the DNA composition, favouring either codons with G/C or A/T at the third position. The codon usage pattern also differs between the complete genome and genes which are frequently expressed, like the ones coding for ribosomal proteins.

The most frequently used codons are AUU (Ile, 4.6%); AAA (Lys, 4.6%); UUU (Phe, 4.3%); GAA (Glu, 4.2%) and UUA (Leu, 3.9%) and the most common amino acids are Leu (10.3%), Lys (8.5%), Ile (6.6%), Ala (6.6%) and Val (6.5%). The high value for Lys is in agreement with the relatively high percentage of proposed proteins with calculated isoelectric points between pH 9 and 12 (Fig. 4). The least frequently used codons are UGC (Cys, 0.2%); CGA (Arg, 0.25%); AGG (Arg, 0.29%); AGA (Arg, 0.4%) and UGU (Cys, 0.55%).

All M.pneumoniae gene products were classified (Table 1 and 2), with some minor modifications, in accordance with criteria introduced for Escherichia coli (31) and adapted for the classification of putative genes from H.influenzae. We added
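The codon usage comparison described above can be sketched in a few lines. This is a minimal illustration, not the procedure used to produce Table 3: it counts codons over a set of coding sequences, reports them as percentages, and computes overall and third-position G+C content. The 35 mol% and 50 to 56 mol% limits are the ones stated in the text; the function names and the "other" class for genes outside both ranges are assumptions made for the example.

```python
from collections import Counter


def codon_usage(cds_list):
    """Return {codon: percentage of all codons} in the RNA alphabet."""
    counts = Counter()
    for cds in cds_list:
        rna = cds.upper().replace("T", "U")
        usable = len(rna) - len(rna) % 3  # ignore a trailing partial codon
        counts.update(rna[i:i + 3] for i in range(0, usable, 3))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {codon: 100.0 * n / total for codon, n in counts.items()}


def gc_content(seq):
    """G+C content of a sequence in mol% (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def gc3_content(cds):
    """G+C content at the third codon position of a coding sequence."""
    return gc_content(cds[2::3])


def gc_class(gc_percent):
    """Assign a gene to the low or high G+C subset used in the text."""
    if gc_percent < 35.0:
        return "low"
    if 50.0 <= gc_percent <= 56.0:
        return "high"
    return "other"  # assumption: genes outside both ranges are left unclassified
```

Grouping genes with `gc_class(gc_content(cds))` and then running `codon_usage` on each group would reproduce the kind of subset comparison the paragraph describes, with `gc3_content` showing the third-position bias directly.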
Mycoplasma pulmonis and experimental results on enzyme purification and characterization of enzyme activities were published (34). The results indicated that the polC gene from M.pulmonis also codes for a 3′–5′ exonuclease, that the size of the predicted PolC protein, 1435 amino acids, is very similar to the PolC homolog B01_orf1443 in M.pneumoniae and that the polymerase could be inhibited by compounds specific for PolC proteins of Gram-positive bacteria. Furthermore, the authors provided some experimental evidence for a second, smaller enzyme with DNA polymerase activity. Considering the characterization data of DNA polymerase activities in M.pulmonis and the nucleotide sequence data on DNA polymerase genes of M.pneumoniae and M.genitalium (9,35), one can conclude that at least these three Mycoplasma species have two DNA polymerase (polC) genes coding for a larger protein (≈1400 amino acids) with a 3′–5′ exonuclease activity and with the highest sequence similarities to the Gram-positive B.subtilis polymerase III. Therefore it is unlikely that an increased mutation frequency is caused by the DNA replication process. The nucleotide sequence of the smaller Pol III homolog (≈100 kDa) of M.pneumoniae and M.genitalium (9,35) more closely resembles the polC gene from the Gram-negative E.coli. This is also emphasized by the absence of the 3′–5′ exonuclease domain in the proposed genes. The gene for the smaller, Gram-negative typical PolC has not yet been found in M.pulmonis, but during the purification of the larger PolC, a second polymerase activity lacking exonuclease activity has been identified. The function of the exonuclease negative DNA polymerase can only be elucidated experimentally and it remains to be seen if it can substitute for the function of the polymerase I (PolA) in combination with the proposed 5′–3′ exonuclease of the truncated polA gene (A19_orf291). This topic has also been discussed for M.genitalium (35).

In addition to the DNA polymerase many more gene products are necessary for DNA replication, e.g. initiation, elongation and termination (32). The most obvious functions missing in M.pneumoniae according to the sequence analysis are an RNaseH for primer removal and a protein for the termination of replication.

The number of genes involved in DNA repair is considerably smaller in M.pneumoniae than in the ‘standard’ eubacteria E.coli and B.subtilis or even H.influenzae with the smaller genome. Mycoplasma pneumoniae codes only for 13 of the genes known to be involved in excision repair of DNA, recombination and SOS repair. Thus the genes recB, recC, recD, recG and ruvC involved in recombination are missing as well as the genes recN, recO, recQ and recR involved in SOS repair in E.coli. Nevertheless, a rudimentary stock of enzymes has been conserved in M.pneumoniae to permit homologous recombination [RecA, Ssb, PolA (see above), GyrA, GyrB, RuvA and RuvB] (36), excision repair (37) and a kind of truncated SOS repair (38). Notably missing is the lexA gene, which plays a central role in regulating the SOS response including the expression of the recA gene in other bacteria.

We were also unable to find components of the so-called mismatch-repair system encoded by the mutS, mutL and mutH genes. Since bacteria which normally carry the mut genes show a reduced genetic stability if these genes are mutated, it seems likely that the absence of these genes in mycoplasmas causes an increased mutation rate (65).

Transcription

The DNA dependent RNA polymerase of M.pneumoniae is coded by the conserved genes rpoA (α subunit), rpoB (β subunit), rpoC (β′ subunit) and rpoE (δ′ subunit). The only sigma factor found (H91_orf499) shares the highest similarity with the sigma factor SigA from B.subtilis (39). Presently, not enough experimental data are available for defining promoter sequences in M.pneumoniae. The promoters of only three genes/operons have been determined experimentally by primer extension. These genes are the P1 operon (14), the ribosomal RNA operon (40) and F10_orf405 (27). The –10 region and to a lesser extent the –35 region of these three examples are comparable with consensus promoter sequences in B.subtilis (41). Termination of transcription seems to be independent of the termination factor Rho, since the corresponding gene could not be found. Transcription stops on typical terminator sequences which are short interrupted palindromic regions followed by a run of U residues. The Nus transcription termination factors, of which NusA (E07_orf540) and NusG (D09_orf320) are present, may play a role in the termination of transcription. NusB and NusC are absent. NusA is involved in termination and NusG in antitermination in other bacteria. Finally, GreA promotes elongation by the RNA polymerase by utilizing a novel transcript-cleavage reaction (42).

Translation

The translation machinery of M.pneumoniae is rather extensive. About 15% of all proposed ORFs are involved in translation including 19 tRNA synthetases, 50 ribosomal proteins, various factors and enzymes, 33 tRNAs, one ribosomal RNA operon with one copy of each 5S, 16S and 23S rRNA (45), and a gene coding for the 10Sa RNA. The conservation of the 10Sa RNA, which functions as tRNA and mRNA and is implicated in trans-translation (66), is interesting in evolutionary terms. Three exceptions are
Table 4. List of the proposed ORFs, RNAs and REPs in numerical order starting with E07_orf540o on the gene map (Fig. 1)
noteworthy: the lack of the ribosomal protein S1, of the peptide chain release factor 2 (RF2) and of the glutaminyl-tRNA synthetase. So far, quite a number of Gram-positive bacteria including Bacillus or Lactobacillus species also lack the S1

the ORF6 gene of the P1 operon (40 kDa protein = C, 90 kDa protein = B). The gene for A is still unknown. Another criterion for a putative protein of the cytoskeleton-like structure is its partitioning into the Triton X-100 insoluble fraction after treating M.pneumoniae with this detergent. This fraction is ill defined and comprises ∼50 proteins, of which only a subfraction is associated with the cytoskeleton and/or cytadherence. The following proteins have been identified as most likely components of a cytoskeleton (2): HMW1 (H08_orf1018), HMW2 (F10_orf1818; Krause, submitted), HMW3 (H08_orf672), P200 (D02_orf1036o) (49), P65 (F10_orf405) (27). These proteins, with the exception of HMW2, share some common peculiar features, like an extended acidic proline rich domain and an abnormal migration in SDS-PAGE (49). The adhesin P1 is mainly
acid of the processed protein. The cleavage site including the cysteine and the three (positions –3, –2 and –1) upstream located amino acids, is to some extent conserved (–3: 37×L, 6×F, 1×A, 1×V; –2: 19×S, 10×A, 8×T, 6×V, 2×I; –1: 37×A, 7×S, 1×G). The number of lipoproteins in M.pneumoniae is relatively high compared with the Gram-negative bacteria E.coli and H.influenzae. Even in the closely related M.genitalium only 21 putative lipoproteins could be found by analyses of the published data (9). The lipoproteins of M.pneumoniae can be divided into six subgroups based on sequence similarities; also included in these groups are proteins with similarities to lipoproteins but without the lipoprotein signature at the N-terminal end. Quite a number of these proposed genes with high similarities are organized in tandem. For instance seven lipoproteins and one protein without

for its inability to synthesize essential compounds like amino acids. Three different transport systems, mainly involved in import, were found in M.pneumoniae: (i) the ABC transporter system (57) consisting of two ATP-binding, two membrane-spanning and one substrate-binding domain which are frequently present on separate polypeptides, but sometimes also consist of two or three different domains located on the same peptide (D12_orf634 or D12_orf623), (ii) the phosphoenolpyruvate:carbohydrate phosphotransferase system (PTS) (58) and (iii) facilitated diffusion systems with transmembrane proteins functioning as specific carriers. Mycoplasma pneumoniae codes for 43 genes involved in the above mentioned transport systems according to the present status of annotation. In addition, there are several proposed proteins with 6 or 12 transmembrane segments
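The residue counts reported above for positions –3, –2 and –1 of the lipoprotein cleavage site can be turned into a simple position-specific table and a search pattern. The sketch below is only an illustration under stated assumptions: it treats any residue observed at a position as allowed, requires the cysteine at position +1, and searches an N-terminal window whose length (40 residues) is a hypothetical choice. A match is a candidate site only; it does not show that a protein is a lipoprotein.

```python
import re

# Residue counts at positions -3, -2 and -1 as reported in the text.
SITE_COUNTS = {
    -3: {"L": 37, "F": 6, "A": 1, "V": 1},
    -2: {"S": 19, "A": 10, "T": 8, "V": 6, "I": 2},
    -1: {"A": 37, "S": 7, "G": 1},
}


def position_frequencies(counts):
    """Convert raw counts into relative frequencies per position."""
    freqs = {}
    for pos, residues in counts.items():
        total = sum(residues.values())
        freqs[pos] = {aa: n / total for aa, n in residues.items()}
    return freqs


def build_site_pattern(counts):
    """Build a lookahead regex: allowed residues at -3, -2, -1, then Cys."""
    classes = ["[" + "".join(sorted(counts[pos])) + "]" for pos in sorted(counts)]
    # The lookahead makes matches zero-width so overlapping sites are all found.
    return re.compile("(?=" + "".join(classes) + "C)")


def find_candidate_sites(protein, pattern, window=40):
    """Return 0-based indices of the +1 cysteine within the first `window` residues."""
    n_term = protein.upper()[:window]
    return [m.start() + 3 for m in pattern.finditer(n_term)]
```

With these counts each position sums to 45 observed sites, so leucine at –3 and alanine at –1 each occur in 37 of 45 cases. A weighted score built from `position_frequencies` would be a natural refinement over the all-or-nothing pattern.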
Figure 5. Schematic diagram of the metabolic pathways of M.pneumoniae deduced from Table 1. Shaded arrows with question marks indicate missing enzymatic
activities.
It is also possible that one or more of the lipoproteins function as substrate-binding proteins.

There is also evidence for bacterial ABC export systems in M.pneumoniae (59). For example D12_orf634 (msbA), D12_orf623 (pmd1) and D02_orf660 (lcnDR3) have the conserved ATP binding motif and the membrane-spanning domains on the same polypeptide. In addition D12_orf623 and D12_orf634 also show significant similarities to multidrug resistance proteins of different organisms.

Among the proposed PTS transport systems, we identified one for glucose and one for mannitol. They are similar to the homologous systems from several Gram-positive bacteria, with EIIA and EIIBC domains on two separate polypeptides for the mannitol transport system and with three domains (EIIABC) of

Mollicutes. These components can be used for the synthesis of ribonucleotides by the salvage pathway as predicted from the enzymes listed (Table 1, Fig. 5). The ribonucleotides are converted to deoxyribonucleotides by ribonucleoside–diphosphate reductase, an enzyme complex formed by the gene products of nrdE (F10_orf721) and nrdF (F10_orf339). Adenine, guanine and uracil can be metabolized directly to the corresponding nucleoside monophosphates by the enzymes adenine phosphoribosyltransferase (apt, F11_orf133), hypoxanthine-guanine phosphoribosyltransferase (hpt, K05_orf175) and uracil phosphoribosyltransferase (upp, B01_orf178). Uridylate, adenylate and guanylate kinases catalyze the generation of ADP, GDP and UDP.

Surprisingly, we could not find the nucleoside diphosphate kinase (ndk), the key enzyme for the conversion from NDP to NTP. This
Pyruvate can be further metabolized by two alternative reactions, either to lactate by lactate dehydrogenase (K05_orf312) or to acetyl-CoA by the pyruvate dehydrogenase complex and further to acetate by the phosphotransacetylase (A05_orf320, pta) and the acetate kinase (G12_orf390, ackA). The pyruvate dehydrogenase complex consists of E1α (F11_orf358a) and E1β (F11_orf327), the two subunits of the pyruvate dehydrogenase, the dihydrolipoamide acetyltransferase E2 (F11_orf402) and the dihydrolipoamide dehydrogenase E3 (F11_orf457). The corresponding genes are clustered (nt 549 943–557 431; pcosMPF11); the cluster also contains the genes coding for NADH oxidase (nox, F11_orf479) and lipoate protein ligase (lplA, F11_orf339). The latter enzyme joins lipoic acid in an amide linkage to the ε amino group of a lysine residue of the dihydrolipoamide acetyltransferase.

find any indication for a number of genes/proteins, which should be there based on experimental evidence. Mycoplasma pneumoniae has been shown to be motile and to exhibit chemotactic behaviour (64). Motility genes are difficult to identify since the motility in M.pneumoniae is independent of pili or flagella and it is not yet known which are potential candidates. Therefore, any progress in this field depends on the isolation of mutants. Furthermore, none of the components of the chemotactic signal pathway, the Che proteins, which are well conserved among bacteria, or any other ‘two-component signal transduction system’ could be detected. Chemotactic behaviour in M.pneumoniae is difficult to study. While it might be possible that these bacteria are chemotaxis negative, only additional experiments will clarify this point.

It has been reported that M.pneumoniae produces hydrogen
One obvious topic is the comparative analysis between the completely sequenced genomes of the closely related species M.pneumoniae and M.genitalium (9). Since the present paper is already very voluminous we decided to publish this analysis in an additional paper (Himmelreich et al., in preparation).

ACKNOWLEDGEMENTS

We thank R. Frank and A. Bosserhoff for the synthesis of oligonucleotides, B. Reiner for her expertise in computer data analysis, Raphael Mosbach for his technical assistance concerning hardware problems, U. Leibfried for technical assistance, I. Schmidt for preparing the manuscript, D. Hofmann and H. Göhlmann for reading of the manuscript and H. Schaller for

26 Proft, T. and Herrmann, R. (1994) Mol. Microbiol., 13, 337–348.
27 Proft, T., Hilbert, H., Layh Schmitt, G. and Herrmann, R. (1995) J. Bacteriol., 177, 3370–3378.
28 Razin, S. and Jacobs, E. (1992b) J. Gen. Microbiol., 138, 407–422.
29 Ruland, K., Wenzel, R. and Herrmann, R. (1990) Nucleic Acids Res., 18, 6311–6317.
30 Fleischmann, R. D., Adams, M. D., White, O., Clayton, R. A., Kirkness, E. F., Kerlavage, A. R., Bult, C. J., Tomb, J. F., Dougherty, B. A., Merrick, J. M. et al. (1995) Science, 269, 496–512.
31 Riley, M. (1993) Microbiol. Rev., 57, 862–952.
32 Baker, T. A. and Wickner, S. H. (1992) Annu. Rev. Genet., 26, 447–477.
33 Mills, L. B., Stanbridge, E. J., Sedwick, W. D. and Korn, D. (1977) J. Bacteriol., 132, 641–649.
34 Barnes, M. H., Tarantino, P. M., Jr., Spacciapoli, P., Brown, N. C., Yu, H. and Dybvig, K. (1994) Mol. Microbiol., 13, 843–854.
35 Koonin, E. V. and Bork, P. (1996) Trends Biochem. Sci., 21, 128–129.
36 Camerini-Otero, R. D. and Hsieh, P. (1995) Annu. Rev. Genet., 29,