Smallpox Dna Sequence
Nucleotide sequence of 21.8 kbp of variola major virus strain Harvey and
comparison with vaccinia virus
Begoña Aguado,† Ian P. Selmes and Geoffrey L. Smith*
Sir William Dunn School of Pathology, University of Oxford, South Parks Road, Oxford OX1 3RE, U.K.
A 21.8 kbp region of the genome of variola major virus (strain Harvey), a virus that caused haemorrhagic-type smallpox, has been sequenced and shown to possess 96% nucleotide identity to the corresponding region of vaccinia virus, the smallpox vaccine. Overall the gene arrangement in the two viruses is highly similar and individual open reading frames (ORFs) display a high degree of amino acid identity; for instance, 26 of the 32 variola virus ORFs have ≥ 90% identity with their vaccinia virus counterparts. A remarkable difference is the disruption of seven vaccinia virus ORFs into small fragments in variola virus. These include the variola virus homologue of vaccinia virus SalF2R, which encodes a protein related to C-type animal lectins, and SalF7L, which encodes an active 3β-hydroxysteroid dehydrogenase enzyme that contributes to vaccinia virus virulence. Upstream of the variola virus haemagglutinin gene there is a deletion of 1910 bp so that the equivalent of vaccinia virus gene SalF17R is truncated, and SalF16R, which shows amino acid similarity to the tumour necrosis factor receptor, is absent. The region sequenced includes the genes for thymidylate kinase and DNA ligase, both of which are active in vaccinia virus and are highly conserved in variola virus. Other conserved ORFs with interesting homologies are those encoding profilin, superoxide dismutase and part of guanylate kinase. Two vaccinia virus genes encoding glycoproteins of the outer envelope of extracellular enveloped virus are also conserved in variola virus and this homology is likely to have contributed to the immunological protection which vaccinia virus evoked against smallpox. Lastly, there are multiple instances in which short oligonucleotide direct repeats flank a region absent from either variola or vaccinia virus.
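The identity figures quoted above (96% at the nucleotide level, and 26 of 32 ORFs at ≥ 90% amino acid identity) are pairwise percent identities. The following is a minimal sketch of how such a figure can be computed from two already-aligned sequences. It is not the authors' procedure: the function names, the convention of skipping gap columns, and the toy strings are all assumptions made for illustration.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns in which neither sequence has a gap.

    Assumption: both inputs come from the same pairwise alignment, so they
    have equal length and use '-' for gaps. Skipping gap columns is one
    convention among several; other conventions give different percentages.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # gap column: excluded from the denominator
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


def count_conserved(identities, threshold=90.0):
    """Count how many per-ORF identity values meet or exceed the threshold."""
    return sum(1 for value in identities if value >= threshold)


if __name__ == "__main__":
    # Toy aligned strings, invented for illustration only (not real sequence data).
    toy_a = "ATGGCT-AAGT"
    toy_b = "ATGGCTTAAGA"
    print(percent_identity(toy_a, toy_b))
```

Applied per ORF, `count_conserved` gives a tally of the kind reported above, provided the same gap convention is used for every pair.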
† Present address: MRC Immunochemistry Unit, Department of Biochemistry, University of Oxford, South Parks Road, Oxford OX1 3QU, U.K.

Introduction

In 1798 Jenner introduced vaccination for the prevention of smallpox (Jenner, 1798) and predicted a few years later that this practice would result in the elimination of smallpox, 'the greatest scourge of the human species' (Jenner, 1801). This prophecy was fulfilled in 1977 and subsequently certified by the WHO (World Health Organization, 1980). Variola virus is now contained only in two high security laboratories in Moscow and Atlanta, and the WHO has proposed that once complete genomic sequences of representative strains of variola virus have been determined all remaining virus and DNA clones should be destroyed. The virus used for smallpox immunoprophylaxis in the modern era, vaccinia virus, has already been completely sequenced for strain Copenhagen (Goebel et al., 1990) and mostly sequenced for the commonly used laboratory strain Western Reserve (WR) (see G. L. Smith et al., 1991 and references therein), and the future prospect of comparing the sequences of the two viruses, one the pathogen and the other the vaccine, is of great interest.

A 42 kbp region of the vaccinia virus strain WR genome from near the right terminus has been sequenced in this laboratory (G. L. Smith et al., 1991) and shown to contain several genes that contribute to virus virulence in vivo but are not essential for virus replication in vitro. These include genes encoding the enzymes DNA ligase (Colinas et al., 1990; Kerr et al., 1991; Kerr & Smith, 1991), thymidylate kinase (TmpK) (Hughes et al., 1991) and 3β-hydroxysteroid dehydrogenase (3β-HSD), an enzyme that synthesizes steroid hormones (Moore & Smith, 1992). Other genes may interfere with the immune response to virus infection, including those encoding serine protease inhibitors (Kotwal & Moss, 1989; Smith et al., 1989b), soluble receptors for the cytokines tumour necrosis factor (TNF) (Howard et al., 1991; Upton et al., 1991) and interleukin-1 (Smith & Chan, 1991), and a membrane glycoprotein related to complement control factors (Takahashi-Nishimaki et al., 1991; Engelstad et al., 1992). Three vaccinia virus genes which have interesting homologies [superoxide dismutase (SOD), TNF receptor (TNFR) and guanylate
kinase (GmpK)] are all unlikely to encode proteins with these activities (G. L. Smith et al., 1991), yet if the proteins were active they might contribute to virus virulence. In the case of SOD, the protein might confer on the virus resistance to destruction by an oxidative burst in phagocytes. A soluble TNFR would sequester soluble TNF and obviate the antiviral activities of the cytokine, as has been shown for leporipoxviruses (C. A. Smith et al., 1991; Upton et al., 1991). GmpK might contribute to virus virulence by providing increased nucleotide pool sizes to aid virus replication. Another group of important genes from this region are those encoding glycoproteins that form part of the envelope of the extracellular enveloped virus (EEV) (Shida, 1986; Duncan & Smith, 1992a; Engelstad et al., 1992). For immunity against orthopoxviruses these antigens are especially important because they induce protective immunity, in contrast to inactivated intracellular naked virus (INV) (Appleyard et al., 1971; Boulter & Appleyard, 1973; Payne, 1980). EEV is also the form of the virus that mediates long-range virus dissemination in vitro (Appleyard et al., 1971; Boulter & Appleyard, 1973; Payne, 1980) and in vivo (Payne & Kristensson, 1985).

In view of the abundance of interesting genes from this region of vaccinia virus, we wished to determine whether similar genes existed in variola virus that might contribute to virus pathogenesis or explain the immunological cross-protection afforded by prior infection with either virus. Variola major virus was selected owing to the higher mortality (30%) caused by this virus than by variola minor virus (1 to 2%), and two cloned restriction fragments (Hamilton et al., 1985) containing 21.8 kbp were obtained and sequenced. This substantially increases the available variola virus sequence data, which had previously been restricted to the thymidine kinase (TK) gene (Esposito & Knight, 1984) and a 500 bp region near the left inverted terminal repeat (Cowley & Greenaway, 1990).

Methods

DNA sequencing and computer analysis. Plasmids containing the variola major virus (strain Harvey) 7.2 kbp HindIII I fragment and the 14.6 kbp SstI-HindIII fragment from the right end of the HindIII A fragment (Hamilton et al., 1985) were obtained from Drs R. Cowley and P. Greenaway, Porton Down, U.K. The poxvirus insert was excised from these plasmids, self-ligated, randomly fragmented by sonication and cloned into M13. Single-stranded DNAs from recombinant phages were sequenced by the dideoxynucleoside triphosphate chain termination method (Sanger et al., 1977) using [35S]dATP and buffer gradient gels (Biggin et al., 1983), and either the Klenow fragment of DNA polymerase I or Sequenase. Autoradiograms of sequencing gels were read by using a sonic digitizer and the sequence data assembled into contiguous sequence using the program SAP (R. Staden, MRC, Cambridge, U.K.). A consensus nucleotide sequence was translated into open reading frames (ORFs) representing ≥ 65 amino acids using the programs ORFFILE and DELIB (M. E. G. Boursnell, Institute of Animal Health, Houghton, U.K.), and these were screened against the SWISSPROT protein database using the program FASTA (Pearson & Lipman, 1988).

Results

The region of the variola major virus (strain Harvey) genome we have sequenced is shown in Fig. 1, and the nucleotide sequences of the 14.6 kbp SstI-HindIII and 7.2 kbp HindIII I fragments are shown in Fig. 2. The restriction map of the variola virus Harvey genome (Mackett & Archard, 1979; Esposito & Knight, 1985) predicts that these fragments should directly abut. However, after the ends of the sequences were compared against the sequences from vaccinia virus strains WR (G. L. Smith et al., 1991) and Copenhagen (Goebel et al., 1990), it was apparent that a second HindIII site lies within this region of the variola virus genome and that a 67 bp HindIII fragment between these sites had not been cloned. In an attempt to obtain the sequence of the missing fragment, the large overlapping SstI A fragment was used as a template for the polymerase chain reaction (PCR) with oligonucleotides (5' TGCCGCCTTATTGTCGC 3' and 5' CGCTACTCCAGAGAACGC 3') containing nucleotides 85 to 69 upstream and nucleotides 635 to 652 downstream of the unsequenced region. These attempts yielded PCR fragments which, when cloned and sequenced, did not match the predicted sequence in any way. Examination of the cloned SstI A fragment by restriction digestion and agarose gel electrophoresis revealed that it had a size and structure incompatible with the published data (data not shown). We conclude that this instability was due to the very large insert (50 kbp) and that the region sought had unfortunately been deleted. Since variola virus is no longer available within the U.K., the extra sequence could not be obtained and so the sequence is presented in two pieces which are predicted to be joined by an extra 67

[Fig. 1: restriction map graphic, panels (a) and (b); the fragment labels are not recoverable from the extracted text.]

Fig. 1. HindIII restriction maps of the variola major virus (Harvey) (b) and vaccinia virus (WR) (a) genomes indicating the regions sequenced in this study and previously (G. L. Smith et al., 1991) (hatched areas). The 10 kbp inverted terminal repeats of vaccinia virus are shown as stippled boxes. Note the presence of two HindIII sites between the variola virus HindIII A and I restriction fragments. The missing 67 bp fragment might be called fragment Q when it is formally identified.
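The ORF screening step described in Methods (translating a consensus sequence into ORFs of at least 65 amino acids) can be illustrated with a short sketch. This is not a reimplementation of ORFFILE or DELIB, whose algorithms are not described in the text. It assumes that an ORF runs from an ATG to the first in-frame stop codon, that all six reading frames are scanned, and that the input contains only A, C, G and T. The function names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string containing only A, C, G and T."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_orfs(seq: str, min_aa: int = 65):
    """Return (strand, frame, start, end, n_aa) tuples for ATG-to-stop ORFs.

    Assumptions (not taken from the paper): an ORF starts at the first ATG
    after the previous stop in a frame and ends at the next in-frame stop.
    start and end are 0-based, end-exclusive offsets on the scanned strand,
    and end includes the stop codon. n_aa counts the ATG but not the stop.
    ORFs that run off the end of the sequence without a stop are ignored.
    """
    results = []
    for strand, s in (("+", seq.upper()), ("-", reverse_complement(seq))):
        for frame in range(3):
            orf_start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if orf_start is None and codon == "ATG":
                    orf_start = i  # open a new ORF at this start codon
                elif orf_start is not None and codon in STOP_CODONS:
                    n_aa = (i - orf_start) // 3
                    if n_aa >= min_aa:
                        results.append((strand, frame, orf_start, i + 3, n_aa))
                    orf_start = None  # close the ORF and look for the next ATG
    return results


if __name__ == "__main__":
    # Synthetic toy sequence invented for illustration: ATG, 70 GCT codons, TAA.
    toy = "CC" + "ATG" + "GCT" * 70 + "TAA" + "G"
    for orf in find_orfs(toy):
        print(orf)
```

The 65-codon cut-off is exposed as `min_aa` so that other thresholds can be tried. Each ORF found this way would then be translated and compared against a protein database in a separate step, which this sketch does not cover.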
(.)
TCCTAATGGAGATTTTCTATT~TCGTCCATTTTAGGATATG~TTTCATAAAGTCCCTAATAACTTCGTGAATAATGTTTCTATGTTTTCTACTGATC~TGTATTTC4~TTC~ATTTTTTT 120
G L P S K R N E D M K P Y A K M F D R I V E H I I N R H K R S I C T N A E I K K
T•CAACCTAAATAGAACGTCATCATTGCGTTTAcAAcACTTTTCTATTTGTTCAAACTTTGTTGTTACATTAGTAATTTTTTTTTCCAAATTAGTTAGCCGTTGTTTGAGAGTTTCCTcA 480
E L R F L V D D N R K C C K E I Q E F K T T V N T I K K E L N T L R Q K L T E E
T T T A C A T C A G A ~ A T T G T G T C A T TGACATC TTGAACT CTC CTATCTAATGC TGG TGTACCACCTATAGATTTTG AATAT TCGAATGCT GCATGAGTACCATT~u%ATTCCTTAATATTGCCA 960
K V D S I T D N V D Q V R R D L A P T G G I S K S Y E F A A H T A N F E K I N G
fiAACTCTTT AAACGAT A T T C T T G A A A T AC ATGTAAC AAAGT TTCCT TT AACTCGGTCGGTTT AT CTACC AT AGTTACAGAATT TGTATCCTT AT CTATAATATAAT AATCAAAAT CGTAT 1680
S S K L R Y E Q F V H L L T E K L E T P K D V M T V S N T D K D I I Y Y D F D Y
GGTTCGAGT TCAACGACGAT TGAAT TCTCTTCCCGCGGATGCC GCATGATGAACGACTGGATGT TGT TCGATTGATTTGGAATTCTTTT TCGAC T TTTTGT TTATATTAAATAT T T TAAA 2040
• R R N F E R G A S A A H H V V P H Q E I S K S N K K S K K N I N F I K F
P E L E V V I S N E E R P H R M
• -- A29L 3 5 . 3 K
ATTTATAGCTGATAGCAATTCATGTACTACGGATAATGTAGACGCGTATTGTGCATCGATATCTTTATTATTAGATAAATTTATCAATAAATGTGAGAAGTTTGCCTCGTTAAGGTCTTC 2160
N I A S L L E S V V S L T S A Y Q A D I D K N N S L N [ L L H S ~ N A E N L D E
CATTTAAATATTATATAAACATTTGTGTTTGTATCTTATTCGTCTTTTATGGAATAGTTTTTAAcTAGTAAACCTGTAATTACATAcTTTGTCCGTAAAACATAAATATAAACACCCC-CT 2280
M
4-- & 3 0 L 8.7K A31R "-~ / SalLIR 16.3K
M V S I L N T L R F L E K T S F Y N C N D S I T K E
TTTATCAAACGTTCCAAAAAGTCGTTAGTAGACATTTTTAACATGGTATCTATTTTAAATACACTTAGGTTTTTAGAAAAAACATCATTTTATAATTGTAACGATTCAATAACTAAAGAA 2400
K I K I K H K G M S F V F Y K P K H S T V V K Y L S G G C I Y H D D L V V L G K
AAGATTAAGATTAAACATAAGGGAATGTCATTTGTATTTTATAAC.CCAAAGCATTCTA•CGTTGTTAAATACTTATCTGGAGGATGTATATATCATGATGATTTGGTTGTATTGGGGAAG 2520
V T I N D L K M M L F Y M D L S Y H G V T S S G V I Y K L G S S I D R L S L N R
GTAACAATTAATGATCTAAAGATGATGCTATTTTACATGGATTTATCATATCATGGAGTGACAAGTAGTGGAGTAATTTACA~ATTGGGATCATCCATAGATAGACTTTCTCTA~ATAGG 2640
T I V T K V N N N Y N N Y N N Y N N Y Y N C Y N Y D D T F F D D D D *
ACTATTGTTACAAAAGTTAATAACAATTATAACAATTATAAcAATTATAACAATTATTATAATTGTTATAATTATGATGATACATTTTTTGACGATGATGATTGATCGCTATTACACAAT 2760
• S S V N K S S S S Q D S N C L
TTTG'I~Tr T GTACTTT CTi~TATAGTGTTTAGGT TCTTT TT CATATGAG-I~TA'I~GATI"TACT / ~ T A T C TATGTTT~ C ' I ~ TT GTTCTATi~CGTCCTTAT CC~CG~I'ATC~TACAT' 2880
K T K T S E L I T N L N K K M H S Y Q N V L I D I N L K Q E I V D K D A T D T C
AT AC GT AATTC ACCTT CACAAAAT ACGGAGTCTT CG AT AAT AAT AGCC AATCG AT T ATT GGATCTAACCGT CT GTATCAT A T T C A A C A T G T T T A A T AT ATCCTTTCGT TT ACCCTTT ACA 3000
I R L E G E C F V S D E I I I A L R N N S R V T Q I M N L M N L I D K R K G K V
GGCATCGATCGTAGCATATTTTCCGCGTCTGAGATGGAAATGTTAAAACTACAAAAATGCGTAATGTTAGCCCGTCCTAATATTGGTACATGCCTATAAGTTTGGCATAGTAGAATAATA 3120
P M S R L M N E A D S I S I N F S C F H T I N A R G L I P V H R Y T Q C L L I I
GACGTGTTCAAATGCCTTCCAAAGTTTAAGAATTCTATTAGAGTATTGCATTTTGATAGTTTATCACCTACATCATCAAAAATAAGTAAAAAGTGTGCTGATTTTTTATCATTTTGTGCG 3240
S T N L H [% G F N L F E I L T N C K S L K D G V D D F I L L F H ]% $ K K H N Q A
ACAGTAATACA T T T T T C T A T G T T A C T T T T A G T T C G T A T T A G A T T A T A T T C T A G A G A T T C C T G A C T A C T A A C C ~ T T A A T A T G A T T T G G C C A A A T G T A C C C A T C A T A A T C T C ~ T T A T A A 3360
V T I C K E I N S K T R I L N Y E L S E Q S S V F N I H N P W I Y G D Y D P N Y
ACGGGTGTAAACAAGAATATATGTTTATATTTTTTAACTAGTGTAGAAAACAGAGATAGTAAATAGATAGTTTTTCCAGATCCAGATCCTCCCGTTAAAACCATTCTAAACGGCATTTTT 3480
V P T F L F I H K Y K K V L T S F L S L L Y I T K G S G S G G T L V M R F P M K
AATAAATTTTCTcTTAAAAATTGTTTTTCTTGGAAACAATTcATAATTATATTTACAGTTAcTAAATTAATTTGATAATAAATCAAAATATGGAAAACTAAGGTCGTTAGTAGGGAGGAG 3600
L L N E R L F Q K E Q F C N M
(-- A 3 2 L / SalL2L 31.0K
A33R / S a l L 3 R 20.5K --)
M M T P E N D E E O T S V F S A T V Y G D K I Q G K
AACAAAGAAGGCACATCGTGATATAAATAATATTTGTTATcATGATGACACCAGAAAACGACGAAGAGCAAAcATCTGTGTTCTCCGCCACTGTTTAcGGAGACAAAATTCAGGGAAAAA 3720
N K R K R V I G I C I R I S M V I S L L S M I T M S A F L I V R L N Q C M S A N
E A A I T D A T A V A A A L S T H R K V A S S T T Q Y K H Q E S C N G L Y Y Q G
AGGCTGCTATTP~C T G A C G C C A C T G C A G T T G C T G C T G C A T T A T C T A C T C AT A G A A A G G T T G C G T C T A G C A C T A C A C A A T A T A A A C A C C A A G A A A G C T G T A A T G G T T T A T A T T A C C A G G G T T 3960
S C Y I F H S D Y Q L F S D A K A N C A T E S S T L P N K S D V L T T W L I D Y
CTTGTTATATATTCCATTCAGACTACCAGTTATTCTCGGATGCTAAAGCAAATTGCGCCACAGAATCATCAACACTACCCAATAAATCTGATGTCTT~~A~ATG 4080
V E D T W G S D G N P I T K T T T D Y Q D S D V S Q E V R K Y F C V K T M N *
TT G A G G A T A C A T G G G G A T C T G A T G G T A A T C C A A T T A C A A A A A C T A C A A C C G A T T A T C A A G A T T C C G A T G T A T C A C A A G A A G T T A G A A A G T A T T T T T G T G ~ ~ T ~ T A ~ 4200
A34R / 8alL4R ig.sK -9
M K S L N R Q T V S R F K K ;. S V p A A I M M T L S T ~ I S c. T G T
TTTT~TACATTAATAAATG AAAT CGCTTAATAGAC AAACT GTAAGTAGGTT TAAGAAGTTGTC GGTGCCGGC CGCTATAATGAT GATACTC TCAACCATTATTAGCGGCATAGGAAC 4320
F L H Y K E E L M P S A C A N G W I Q Y D K H C Y L D T N I K M S T D N A V Y Q
A T TT C T G C A T T A C A A A G A A G A A C T G A T G C C T A G T G C T T G C G C C A A T G G A T G G A T A C A A T A C G A T A A A C A T T G T T A T T T A G A T A C T ~ C A T T ~ T ~ T A ~ T ~ T A ~ 4440
M E A N C L L V S A * H K R W H I S V I R Y V N N N L Y * C * F T * L Y H P K D
C R K L R A R L P R P D T R H L R V L F S I F Y K D Y W V S L K K T N N K W L D
GTGTCGTAAATTACGAGCCAGATTGCCTAGACCGGATACTAGACATCTAAGAGTATTGTT TAGTAT TTTTTATAAAGATTATTGGGTAAGTTTAAA~G~C~T~T~T~A~ 4560
I N N D K D I D I S K L T N F K Q L N S T T D A E A C Y I Y K S G K L V K T V C
T A T T A A T A A T G A T A A A G A T A T A G A T A T T A G T A A A T T AAC A A A T T T T A A A C A A C T A A A C A G T A C A A C G G A T G C T GAAGC GT GTT AT AT A T A C A A G T C T G G A A A A C T G G T T ~ A~ 4680
(A35R / SalLSR} -9
K S T O S V L C V K R F Y K * m d t t f v i t p m g
TAAAAGTACTCAATCTGTACTATGTGTTAAAAGATTCTACAAGTAAcAACAAAAAATAAAATAATAATAAGT~CTTAACGAAcGT~GATGGACAcCAcGTTTGTTATTACTCCAATGGGT 4800
m 1 t i t d t 1 y d d 1 d i s i m d f i g p y i i g n i k t v q i d v r d i k y
A T G C T G A C T AT A A C A G A T AC AT T A T A T G A T G A T C T C GAT AT CT C A A T A A T G G A C T T T A T A G G A C C A T A C A T T A T A G G T A A C A T A A A A A C T GT CC A A A T A G A T G TAC G G G A T A T A A A A T A T 4920
s d mY q k c f s n t i i
•k g k i v p * d s n d 1 v r f n i y s i c t a y r s k
TCCGACATGCAAAAATGCTACTTTAGCTAAGGGTAAAATAGTTCCTTAGGATTCTAATGATTTGGTTAGATTCAACATTTATAGCATTTGTACCGCATACAGATCAAAAATACCATCATC 5040
i a c d y d t m 1 d i e g k h q p f y 1 f t s i d v f n a t i i e a y n 1 y t a
ATAGCATGCGACTATGATACCATGTTAGATATAGAAGGTAAACATCAG•CATTTTATCTATTTACATCTATTGATGTTTTTAACGCTACAATCATAGAAGCGTATAACCTGTATACAGCT 5160
g d y h 1 i i n p s d n 1 k m k 1 * f n s s f C i s n g n g w i ~ i d g k c n s
GGAGATTATCATCTGATCATCAATCCTTCAGATAATCTGAAAATGAAATTGTAGTTTAATT~TTCATTCTGCATATCAAACGGCAATGGATGGATCATAATTGATGC~SAAATGCAATAGT 5280
A36R / SalL6R 24.41( -9
n f 1 s M I L V P L I T V T V V A
AATTTTTTATCATAAAAGTTGTAAAGTAAATAATAAAACAATAAATATTGAACTAGTAGTACGTATATTGAGCAATCAGAAATGATACTGGTACCTCTTATCAcAGTGACCGTAGTTGCG 5400
G T I L V C Y I L Y I C R K K I R T V Y N D N K I I M T K L K K [ K S S N S $
C~SAACAATATT AG T A T G T T A T A T A T T A T A T A T T T GT A G G A A A A A G A T A C G T A C T G T C T A T A A T G A C A A T A A A A T T A T T AT G A C A A A A T T A A A A A A G A T A A A G A G T T C T A A T T C C A G C A A A 5520
S S K S T D N E S D W E D H C S A M E Q N N D V D N I S R N E I L D D D S F A G
TCTAGTAAATCAACTGATAACGAATCAGACTGGGAGGATCACTGTAGTGCTATGGAACAAAATAATGAcGTGGATAATATTTCTAGGAATGAGATATTGGATGATGATAGCTTCGCTGGT 5640
S L I W D N E ~ N V I A P S T E H I Y D S V A G S T R L I N N D C N E Q T I Y Q
AGTTTAATATGGGATAACGAATCCAAT GTCATAGCGCCTAGCACAGAACACATTTACGATAGTGTTGCT GGAAGCACGCGGCTAATAAATAATGATTGTAATGAACAAACTATTTATCAG 5760
N T T V I N E T E T I E V L N E D T K Q N P S Y S S N P F V N Y N K T S I C S K
AACACTACAGTAATTAATGAGACAGAGACTATTGAAGTACTTAATGAAGATACCAAACAGAATCCTAGCTATTCGTCCAATCCTTTCGTAAATTATAATAAAACCAGTATTTGTAGCAAG 5880
S N P F I T E L N N .K F S E N N P F R R A H S D D Y L N K Q E H D D I E S S V V
TCAAATCCGTTCATTACAGAACTCAACAATAAATTTAGTGAGAATAATCCGTTTAGGAGAGCACATAGCGATGATTACCTAAATAAGCAAGAACATGATGATATAGAATCATCTGTTGTA 6900
(A37R / SalL7R)-9
S L V * m e i f p v f g i s k i s n f
TCATTAGTCTGATTAGTTTCC TTTTTATAAAATTGAAGTAATATTTAGTATTAATTGCTACCGTTACATTGTACAAATGGAGATATTCCCTGTATTTGGCATTTCTAAAATTAGCAATTT 6120
A37R / SalLTR 7.7K -9
i a n n d c r y y i d t e h h * k i i p n e I n r q M D E M v L L T N I L S V E
TATTGCTAATAATGACTGTAGATATTATATAGATACAGAACATCATTAAAAAATTATACCTAATGAGATCAATAGACAGATGGATGAAATGGTACTTCTTACCAACATCTTAAGCGTAGA 6240
V V N N N E M Y H L I P H R L S M I I L C I S S I G R C V I S I D N D V N N K N
AGTTGTAAATAACAATGAGATGTATCATCTTATTCC CCATAGACTATCGATGATTATACTCTGTATTAGTTCTATTGGAAGATGTGTTATCTCTATAGATAATGACGTTAATAACAAAAA 6360
p 1 s k c v v v s k g p t t i 1 v v k a d i p s k r h
I L T F P I D H A V I I S H *
TATTCTAACCTTTCCCATTGATCAT GCTGTAATCATATCCCACTAAGTAAATGTGTCGTAGT TAGTAAGGGTC CTACAACCATATTGGTTGTTAAAGCGGATATACCCAGCAAACGACAT 6480
i y v n n 1 s 1 i n y 1 p 1 s v f I i r r v t n y 1 d r h I c d q i f a n n k w
AcTATATGTAAATAATCTGTCACTGATTAATTATTTGC•GTTGTcTGTATTCATTATTAGACGAGTTA•AAACTATTTGGATAGAcAcATATGCGATCAGATATTTGCTAATAATAAGTG 6600
y s i I t i d d k q f p i p 1 n c i g m s s t k y i n s s i e q d t 1 i h v c n
GTATTCCATTATAACCATCGACGATAAGCAGTTTCCTATTCCATTAAACT GTATAGGTATGTCC TCTACCAAGTACATAAATTCTAGCATCGAGCAAGATACTTTAATCCATGTTTGTAA 6720
n s v p i k e q i 1 y g r i d n i n m s I s i s v
1 e h p f d s v y k
k m q s y
CCTCGAO~ATCCATTCGACTCAGTAT AcAAAAATGCAGTCGTACA~uATTCTGTACCTAT~AAGG.r~C~TATTGTACGGTAGAATTGAT~TAT~ATATGAc9~ATrAGTATTTCTGTG 6840
d *
G A T T A A T A G A T TTC TAGTATGC43ATC AT T A A T C ATC T C T ~ T C TC TP~.AT A C C TC A T A A ~ A C ~ . C ~ T A T T A TC~.~.TACT G T A C G GAATGGAT']'C AT TCT CT TCT CT T T T T A T ~ 6960
TAACTCTATCATGGCAATAATACAACCAAACACTTGTAAAATTCCTAAATTAGTAGAAAATACAACTGATATCGATGTATAAGCGATTTCGAGGAATAATAAGAACAAAGTAATGCCTGT 7560
-L E T M A I I C G F V O L I G L N T ~ F V V S ~ S T Y A I E L F L L F L T I G T
TTTATACTTTATAATTAATGAAACATCGTTATCAGATAACGAATGGAGTTTGGCAcTAGTATGCCATTTACTTAATATGATCGT•TTGGAAGTTTTATTATAAGTTAAAATATTATGGTT 7800
K Y K I I L S V D N D S L S H L K A S T H W K S L I I T K S T K N Y T L I N H N
S I K V
d t 1 V C g t n n @ n p k c w k i d g s k d p k h r g r g y a p y q
e d
TCTATAAAAGTGAGGATGTGATACATTAGTATGCGGAACCAATAACGGAAATcC~AAATGTTGGAAAATAGACGGTTCCAAAGATCCAAAAcATAGAGGTAGAGGATACGCTC~TATcA 8280
~ g R / Sa2.Lglit 14.31r. - ~
n s k v t i i a h n k c i 1 s n i n i s k e g i k r w r r f d g p c g M I Y L
AAAT AGTAAAGTAACGAT AATCAh-'TC A T A A C A A A T G T AT ACT ATCTAACATAAACAT ATCAAAAGAAGGAATT AAACG ATGGAGAAGATTTGAC ~ C AT~TATG A~TA~TAT 8400
¥ T A D N V I P K D G L Q G A F V D K D G T Y D K V Y I L F T V T I G S K R I V
A C A C G G C G G A T A A C G T A A T T C C A A A A G A T G G T TTACAAGGAGC ATT CG TCGATAAAGAC GGT AC TTATGAC AAAGT TT ACATTCTTTTCACT ~ T A ~ A T C ~ C ~ T ~ A 8520
K I P Y I A Q M C L N D E C" G P S S L S S H R W S T L L K V E L E C D I D G R S
AAATTC C G T A T A T ~ J C A C A A A T G T G C T T A ~ A C G A C G A A T GT GG TCC AT C~TCATTGTCTAGT CATAGATGG TCGAC GT T G ~ C ~ G
TCG ~ A G ~ T ~ GACAT C G ~ ~ 8640
/ ~.FIR 1 6 . 3 K --~ A39R
M N T I K Q
Y S Q I N H S K T I K Q I M I R Y Y M Y S L I V L F Q V R I M Y L F Y E Y H *
ATAGTCAAATTAATCATTCTAAAACTATAAAACAGATAATGATACGATACTATATGTATTCTTTGATAGTCCTTTTcCAAGTCCGC~TTATGTACCTATTCTATGAATAC~TT~C~ 8760
S F S T S N W E D I Q S N Y C L Q L L V Y V Y Q L E K V V P H N T F D V I E Q Y
TCTTTTTCTACGTCAAATTGGGAGGATATACAAAGCAATTACTGTCTCcAGCTTTTGGTATATGTCTAC~AGCTGGAAAAAGTTGTTCCACATAACACGTTTGACGTTATAGAACAATAT 8880
N V L D N I I K P L S N Q P I F K G P S D V K W F D I K E K E N E H R K Y R I Y
A A T G T A C T A G A T A A T A T T A T AAAGCCTTT ATCTAAC CAACC TATCT TCAAAGGACCGTC TGATGTTAAATGGTTCGAT AT AAAGGAG A A G G A A A A T G A A C A T C GGAAATATAGAATATAC 9000
F I K E N T I Y S F N T K K Q T R S S Q V D A Q L F S V M V T S K P L F S I A D
TTCATA~C4%A~u%TACT AT AT A T T C G T T C A A T A C ~ T CTAAAC AAAC TCGTAGC TC GCAAGTC GAT GCGCAACTATTTTC A G T ~ A T G G T A A ~ C G ~ C ~ T A ~ A T ~ T 9120
(A40R / S&IF2R) --)
I G I E V G M P R I K N T * m t m n k p k t n v a a v a e c v
ATAGG~ATAGAAGTAGC4%AT G C C A C G A A T A A ~ T ACT T ~ A A T G TAAT C T T A A T C G A G T A C G C C A T A T G A C A A T G ~ C AAACC T A A G A C A ~ T T A T G C T GG T T A T G C T T ~ T G C G T ~ 9240
y n k c i h 1 s t n q k t w e e g r n a c k a 1 n p n s d 1 i n i
c p t d * i s y
GTCCTACTGACTGAATAAGCTATATAATAAATGTATACATTTATCGACTAATCAAAAAACCTGGGA GAAGGACGTAATGCATGCAAAGCTCTAAATCCAAATTCGGATCTAATTAATAT 9480
e t 1 n e 1 s f 1 r s 1 r ~ ~ y w v g e f k i 1 n ~ t t t y n f i a k . v t k ~I
AGAGACTCTAAAC G A G T T A A G T T T T T T A A G A A G C C T T A G A A G A C G C T A T T GGG TAGGAGAATTC AAAATATTAAAC C A G A C A A C C ACGTATAATTT T A T A G C T A A A A A T G T C A C G A A G A A 9600
k y i c s t t n t p k 1 h s c y t i *
s t k k r
TG~AACTAAAAAACGAAATATATTTGTAGTACAACGAATACTCCCAAACTGcATTCATGTTAcACTATATAAcAATTACACTACA TTTTTATCATAACAcTACTTTGGTTAGATGTTTTA 9720
TGGAATGGAATGCTGTTTAATGTTTCCACAcTCATCGTATATTTTGACGTATGCAGTTACAT•GTTTACGCAATAGT•AGACTGTAGTTCTATTATC.CTTCCTAcATTAGGAGGAACAGT 9960
P I S H Q K I N G C E D Y I K V Y A T V D N V C Y D S Q L E I I S G V N P P V T
TTTAAAGTCTCTTGGTTTTAATCTATTACCGTTAGTTTTTATGAAATCCTTTGTTTTATCCACTTCAcATTTTAAATAAATGTC•ACTATACATTCTTCTGTTAATTTTACTAGATCGTC 10080
K F D R P K L R N G N T K I F D K T K D V E C K L Y I D V I C E E T L K V L D D
ATCAACAGAAATATTTAATCCTCCGTTTGATACAGATGCAcCATATTTATGGATTTCGGATTCACACGTTGTTTGTCTGAGGGGTTCGTCTAGCGTTGCTTCTACATAAACTTCTATTCC 10320
D V S I N L G G N S V S A G Y K H I E S E C T T Q R L P E D L T A E V Y V E I G
CATATATTC TTTATTATC A G A A T C G C A T A C C G A T TT ATCAT CATAC ACTGTTTGAAAAC TAAAT GGTATAC A C A T C A A A A T A A C A A A T A C TAAC G A G T A C A T T C T G C A A T ATT GTTATCG 10440
M Y E K N D S D C V S K D D Y V T Q F S F P I C M L I V F V L S Y M
4-- A 4 1 L / S a l F 3 L 24.9K
TAATTGGAAAATTAGTGTT~GGGTGAGTTGGATTATGTGAGTACTGGATTG~ATATTATATTTTATATTTTATATTTTATATTTTGTAATAAGAATA~T~T~C~TA~ 10560
A42R / SalF6R iS OK. ( p z o f i l i n )
M-" A E W H K I I E D I S K N N N F E D A A I V D Y K T
CAATAAATGACTTATTAAAAAACATATATAATAAATAACAATGGCTGAATGC.CATAAAATTATcGAGGATATTTCAAAAAATAATAATTTcGAGGATGCCGcCATCGTTGATTACAAGAC 10680
T K N V L A A I P N R T F A K I N P G E V I P L I T N H N I L K P L I G Q K F C
TACAAAGAATGTT CTAGCGGCT ATT CCTAACAGAAC ATTTGCAAAGATTAATC C ~ GG CGAAGTT ATTCCGC TC ATCACTAATC AT AATATTCT AAAACCTCTT ATT G G T C A G A A G T T ~ G 10800
I V Y T N S L M D E N T Y A M E L L T G Y A P V S P I V I A R T H T A L I F L M
TATTGTATATACTAACTCTCTAATGGATGAGAACACGTATGCTATGGAGTTGCTTACTGGGTACGCCCCTGTATCTCCGATCGTTATAGCGAGAACTCATACCGCACTTATATTTTTGAT 10920
--~A43R / S ~ I F S R 22.7K
M M ~ K W I [ S [ L T M S T M P V L V Y S s S I F R F R S E D V E L C Y G N L Y
"~GA T G A T A A A A T G G A T A A T A T C C A T A T T G A C GATGT C A A T A A T G C C G G T A T TGGTATACAGCTC ATCGATTT TTAGATTTCGTTCAGAGGATGTC43 A A T T A T G T T A T G G G A A T T T ~ A ~ 11160
F D R I Y N N V V N I K Y I P E H I P Y K Y N F I N R • F S V D E L D N N V F F
TTGAT AQGATCT AT AAT AAT GT AGT AAAT AT AAAAT AT ATT CCTGAGC AT ATT CC AT AT AAAT AT AATT TT ATT AATCGT ACGTT CT CCGT AGATG AACT AGACAAT AAT GTCTT T T ~ A 11280
T H G Y F L K H K Y G S L N P S L I V S L S G N L K Y N D I Q C S V N V S C L I
C A C A T G G T T AT TT T T T ~ % A C A C A A A T AT G G T T C A C T T A A T C C T A G T T T G A T T G T C T C A T T A T C A G G A A A C T T A A A A T A T A A T G A T A T A C A A T G C T C A G T A A A T G T A T C G T G C CT C A T T A 11400
K N L A T S I S T I L T S K H K T Y S L H R $ K C I T I I G Y D S I I W Y K D I
A A A A T T T G G C A A C G A G T A T A T C T A C T A T A T T A A C A T C T A A A C A T A A G A C G T A T T C T C T A C A T C G G T C C A A G T G T A T T A C T A T A A T A G G A T A T G A T T C T A T T A T A T G G T A T A A A G A T A T A A 11520
(SalF6R) -~
m 1 1 e
N D K Y N D I Y D F T A I C M L I A S T L I V T I Y V F K K I K M N S
AT G A C A A G T A T A A T G A T A T C T A T G A T T T T A C C G C A A T A T G T A T G C T A A T A G C G T C T A C A T T G A T A G T G A C C A T A T A C G T G T T T A A A A A A A T A A A A A T ~ T T A ~ T A ~ 11640
e k d v 1 1 a
m d k i k i t v d s k i g n v v t i s y n 1 e k i t i n d t p k k k
A A T G G A T A A A A T C A A A A T T A C G G T T G A T T C A A A A A T T G G T A A T G T T G T T A C C A T A T C G T A T A A C TT G G A A A A A A T A A C T A T T A A C G A C A C A C C A A A A A A G A A A A G G A T G T A ~ A T T A ~ 11760
q s v a v e e a k d v k v e e k n i d i e d d n d d d m d v e s v
C A A T C A G T T G C T G T T G A A G A G G C A A A A G A T G T C A A G G T C~]AAG A A A A A A A T A T C G A T A T T G A A G A T G A C A A T G A C G A T G A T A T G G A T G T A G A A A G C G T G T A A T A C ~ A T ~ T A 11880
AGTATATAAATACTTTTTATTTAcGGTACTCTTGTAGTC~TAATACC~CTAcT~GATTATTTT~FFFrA~AAAAAATACTTATTCTGATTATTCTAGCCATTTCCGTGTTCGTTcAAATG 12000
e s * e 1 w k r t r e f a
CcAcATCAAAGATATGGGAGTAGTTGAAATCTAGTTCTGCATTGTTGG•G•GCCTCAAATGTAGTGTTGGATATCTT•AAcGTATAGTTGTTGAGTAGTGATGGTTTTCTAAATAGAATT 12120
v d f i h s y n f d 1 e a n n a r r
E F T T N S I K L T Y N N L L S P K R F L I
C T C T T C A T A T C A T T C T T G C A C G T G T A C A T T T T T A G C A T C C A T C T T G G A A T C C T A G A T C C T T G T T C T A T T C C C A A T G G T T T C A T C A A T A G A A G A T T A A A A T A T C G T A C G A A C A C G A T G G A 12240
R K M D N K C T ¥ M K L M W R P I R S G Q E I G L P K M L L L N F M D Y S C S P
G A G T A A C C G T A G C A A A A G T A A G C A T T T C C T T T A A T C T C A G A T C CCG G A T A C T G G A T A T A T T T T A C A G C C A A C A C A T G C A T C C A T G C A A C A T T TC CT A C A T A T A C C C G G C T A T G T A C C G C G 12360
S Y G Y C F Y A N G K I E S G P Y Q I Y K V ~. L V H M W .1%- V /~ G V Y V R S H V A
AATGGATTACCGTGTTTATTTGGTCCTATTC~CTTcCATGCTACCTAATATAAATCAAATACTTGATTCCTAGGTCTACAGAAGCTGCCAATATAGTCTGTGTTACATAATAGTTTACTTT 12720
~&44L / S & I F 7 L 24.1K (3~-HSD) ~- y i 1 y k i g 1 d V s a a 1 i t q t v y y n v k
F P N G H K N P G I A E M s g 1
CATGATTTCATTATCGGTGTATTTTCCAAATACATCCACTAGAGCAACCGTATGAATAATCAGATTTACCCCATCTAGTGCTTCTTTCACCTTATTAAGTCGTTTATATcACATTGTATA 12840
m i e n d t y k g f v d v 1 a v t h i i 1 n v g d 1 a e k v k n 1
d n i d c q i
TAGTTTATAACTTTAACCTTCGATACGAGAGGTTGTGGATCTTcTACGACATTGATAACTCTGATTTCTTGAACATCATCTGCGCTAATTAAAAGTTTTACTATATACCTC~CCTAGAAAT 12960
y n i v k v k s v 1 p q p d e v v n i v r i e q v d d a s i 1 1 k v i y r g 1 f
A45R / S a l F S R 1 3 . 7 K (SOD) , - . b
A V C I I D H D N I R G
TCGGCAcCAcCAGTAACCGCGTAcACGGTCATTGCTGCTGT•ACT•ATAAATAT•GGACTACTTATTCTATTTTACAAATAATGGCTGTTTGTATAATAGA•CACGATAATATCAGAGGA 13080
e a g g m t v a y v t
•- ( A 4 8 L / SalF7L 3~-~SD)
V I Y F E P V H G K D K V L G S V I G L K S G T Y N L I I H R Y G D I S R G C N
GTTATTTACTTTGAACCAGTCCATGGAAAAGATAAAGTTTTAGGATCAGTTATTGGATTAAAATCCGGAACGTATAATTTGATAATTCATCGTTACGGGGATATTA~CGA~AT~T 13200
S I G S P E I F I G N I F V N R Y G V A Y V Y L D T D V N I S T I I G K A L S I
T C C A T A G G C A G TC C A G A A A T A T T T A T C G G T A A C A T C T T T G T A A A C A G A T A T G G T G T A G C A T A T G T T TAT T T A G A T A C A G A T GT A A A T A T A T C T A C A A T T AT T G G A A A G G C G T T A T C T A T T 13320
A 4 6 R / S a l F g R 2 7 . 6 K -~
M A F D I S V N A S K
S K N D Q R L A C G, V I G I S Y I N E K I I H F L T I N E N G V *
T C A A A A A A T G A T C A G A G A T T A G C G T G T G G A G T T A T T G G T A T T T C T T A C A T A A A T G A A A A G A T A A T A C A T T T T C T T A C A A T T A A C G A G A A T G G C G T T T G A T A T A T C A G T T A A T G C A T C T A A 13440
T I N A L V Y F S T Q Q N K L V I R N E V N D T H Y T V E F D R D K V V D T F I
A A C A A T A A A T G C A T T A G T T T A C T T T T C T A C T C A G C A A A A T A A A T T A G T C A T A C G T A A T G A A G T T A A T G A T A C A C A C T A T A C T G T C G A A T T T G A T A G G G A C A A A G T A G T T G A C A C G T T T A T 13560
S Y N R H N D S I E I R G V L P E E T N I G C T V N T P V S M T Y L Y N K Y S F
T T C A T A T A A T A G A C A T A A T G A C T C C A T A G A G A T A A G A G G G G T G C T T C C A G A G G A A A C T A A T A T C G G T T G C A C G G T T A A T A C G C C G G T T A G T A T G A C T T A C T T G T A T A A T A A G T A T A G T T T 13680
K L I L A E Y I R H R N T V S G N I Y S A L M T L D D L V I K Q Y G D I D L L F
T A A A C T G A T T T T A C 4 • A G A A T A T A T A A G A C A C A G A A A T A C T G T A T C C G G C A A T A T T T A T T C G G C A T T G A T G A C A C T A G A T G A T T T G G T T A T T A A A C A G T A T G G C G A C A T T G A T C T A T T A T T 13800
N E K L K V D S D S G L F D F V N F V K D I I C C D S R I V V A L S S L V S K H
T A A T G A G A A A C T T A A A G T G G A C T C C G A T T C G G G A T T A T T T G A C T T T G T C A A C T T T G T A A A G G A T A T T A T A T G T T G T G A T T C T A G A A T A G T A G T A G C T C T A T C T A G T C T A G T A T C T A A A C A 13920
W E L T N K K Y R C M A L A E H I A D S I P I S E L S R L R Y N L C K Y L R G H
T T G G G A A T T G A C A A A T A A A A A G T A T A G G T G T A T G G C A T T A G C C G A A C A T A T A G C T G A T A G T A T T C C A A T A T C T G A G C T A T C T A G A C T A C G A T A C A A T C T A T G T A A G T A T T T A C G A G G A C A 14040
T E S I E D E F D Y F E D D D S S T C S V V T D R E T D V *
CACTGAGAGCATAGAGGATGAATTTGATTATT TTGAAGACGATGATT CGTCGACATG TTCTGTCGTAACCGACAGGGAAACGGATGTAT AAT TTT TTATAGTGTGCAC~ATATG ~ 14160
A T A T A A T T G T T G T A T C C A T T C C C A T T C T A A T C A C A T T A T A T G A T T C T G T A A A A A A T T A T A C T G T A A c A C A A T G A A G T A G T T G C A T A G A T G T A T A T A G G T C A G A T A C T G G T T T G A T A A A C T 14280
• V T V C H L L Q M S T Y L- D S V P K I F K
T T T T A T T C C A C A T G A G T A T G T T T G A C T T T A T G G T T A G A C C C G C A T A C T T T A A C A A A T C A C T G A A A A T T G G A G T T A G G T A T T G A C A T C T C A G A A T C A G T T G C C G T T C T G G A A C A T T A A A T G 14400
K N W M L I N S K I T L G A Y K L L D S F I P T L Y Q C R L I L Q R E P V N F T
T A T T T T T T A T G A T A T A T T C C A A C G C A T T T A T G T G G G T A T A C A A C A A G T C A T T A C T A A T A G A G T A T T C C A A G A G T T T T A A T T G G C T A G T A T T T A A C A A G A G A A G A G A T T T C A A C A A A C T G T 14560
N K I I Y E L A N I H T Y L L D N S I S Y E L L K L Q S T N L L L L S K L L S N
TTATGAACTCGAATGCCGCCTTATTGTCGCTTATATTGATGATGTCGAATTCTCCCAATATCATCACTGATGAGTAGCTCATCTTGTTATCAGGATCCAAGCT 14623
I F E F A A K N D S I N I I D F E G L I M V S S Y S M K N D P D L S
•- A47L / SalFIOL ctd
S T V T G K M I D D Y L T R K K T Y N D H I V N L L F C A N R W E F A S F I Q E
ATCCACTGTC~CTGG~AGATGATAGATGACTATC T ~ C TCG T / L K A A ~ C C T A T A A T G A T C A T A T A G T T A A T C T A T T A T ~ T ~ A A A T A ~ T ~ G ~CATL-FI-I-rAT~ 600
Q L E Q G I T L I V D R Y A F S G V A Y A T A K G A S M T L S K S Y E S G L P K
AcAATTAGAACAGGGAATTACTTTAATAGTTGATAGATACGCGTTCTCTGGAGTAGCATATG•CACCGCTAAAGGCGCGTCAATGAcTCTCAGTAAGAGTTATGAATCTGGATTGCCTAA 720
P D L V I F L E S G S K E I N R N V G E E I Y E D V A F Q Q K V L Q E Y K K M I
~C CC GAC T T A G T T A T A T ~ %~fGGAATC TG G T A G C A A A G A ~ T T A A T A G A A A C G T C GG C G A G G A A A T T T A T G A A G A T G TAGCA~'f C ~ C ~ G G T A ~ A ~ T A T ~ T 840
E E G E D I H W Q I I S S E F E E D V K K E L I K N I V I E A I H T V T G P V G
TGAAGAAGGAGA~JGATATTCATTGGCAAATTATTTCTTC T G A A T T C G A G G A A G A T G T A A A G A A G G A G T T G A T T A A G A A T A T A G T T A T A G A G G CTATACATAC GGTTACTGGACCAGTGGG 960
A49R / SalF12R 18.7K →
Q L W M * M D E G ¥ Y S G N L E S V L G ¥ V S
GCAAC TG T G G A T G T A A T A A A G T G A A A T T A C A T T T T T A T A A A T A G A T G T T A G T G C A G T G T T A A A A A A T G G A T G A A G G A T A T T A C TC TGGCAAC ~TG GAATCFE TTC TC GGATACG TATCTG 1080
D M H T K L A S I T Q L V I A K I E T I" D N D I L N N D I V N F I M C R S N L N
ATATGCATACTAAACTCGCATCAATAACT•AATTAGTTATTG•CAAGATAGAAACTATAGATAATGATATATTAAAcAACGACATTGTAAATTTCATTATGTGTAGATCAAACTTAAATA 1200
N P F I S F L D T V Y T I I D Q E I Y Q N E L I N S L D D N K I I D C I V N K F
A T C C A T T T A T C T C T T T C C T A G A T A C TG TATATAC T A T T A T A G A T C A A G A G A T C T A T C A G A A C G A G T T G A T T A A T T C A T T A G A C G A C A A T A A A A T T A T C G A T T G T A T A G T T A A T A A G T T T A 1320
M S F Y K D N L E N I V D
I I T L K Y A
I M N N P D F K T T Y A E V L G S R I A
TGAGC %~fTTATAAG GATAAC C T A G A ~ T A T A G T A G A ~ C
TATC ATTAC T C T A A A A T A T A T A A T G A A T A A T C C A G A T T T T A A A A C T A C G T A T G C A G A A G T A C TC G GTTC C AGAATAGCGG 1440
A50R / SalF13R 63.3K (DNA ligase) →
D I D I K Q V I R E N I L Q L S N D I R E R Y L * M T S L R
ATATAGATATTAAACAAGTAATACGTGAGAATATACTACAATTGTCTAATGATATCCGCGAACGATATTTGTGAAAATA~[~AAAAAAAAATAC ACGTCTCTTCGT 1560
E F R K L C C A I Y H A S G Y K E K S K L I R D F I T D R D D K Y L I I K L L L
GAATTTAC~TTATGCTGTGCTATATATCACGCATCAGGATAT/~AGA~TCTAAA~AAT TAGAGAC I ~ T A T A A C A G A T A G G G A T G A T A A A T A T q ' T A A T C A T T ~ T A ~ 1680
P G L D D R I Y N M N D K Q I I K I Y S I I F K Q S Q K D M L Q D L G Y G Y I G
CC CG GATTAGAC GATAGAA~q~TATAA~AT G A A C G A T ~ A A C A A A T T A T A ~ T A T A T A G T A T A A T A T ~ T / ~ C A A T C l ~ A C 4 % A A G A T A T GCTAC A A G A T T T A G G A T A C G G A T A ~ T A ~ A 1800
D T I S T F F K E N T E I R P R N K S I L T L E D V D S F L T T L S S I T I( E S
GACAC T A T T A G T A C A T T C T T C A A A G A G A A C A C A G A A A T C C GTCC AAGAAATAAAAC~CATTTTAAC T T T A G A A G A C G T G G A T A G T T T C TTAAC T A C A T T A T CATC C A T A A C TAAAGAATC G 1920
H Q I K L L T D I A S V C T C N D L }( C V V M L I D K D L K I K A G P R Y V L N
C A T C A A A T A A A A T T A T T G A C T G A C A T C G C A T C C G T ~ f G T A C A T G T A A T G ATTTAAAATG TGTAG TCATGC T T A T T G A T A A A G A T C TAAAAAT TAAAGCGG GTCC TC G GTACGTAC TTAAC 2040
A I S P H A Y D V F R K S N N L K E I I E N E S K Q N L D S I S V S V M T P I N
GC TATCAGTC CTCAT GC C T A T G A T G T T ~ T A G A A A A T C T A A T A A C TT G A A A G A G A T A A T A G A A A A T G A A T C T A A A C A A A A T C TAGAC TC TATATC TG TTT C T G T T A T G A C TC CAATTAAT 2160
P M L A E S C D S V N K A F K K F P S G M F A E V K Y D G E R V Q V H K N N N E
CC CAT G T T A G C G C ~ T C G T G T G A T T C T G T C A A T ~ G G C A T T T A ~ T T T C CAT C A G G A A T G T T T G CT G ~ G T C A A A T A C G A T G G C GAGAGAG TACAAGTTCATAAAAATAAT/u~C GAG 2280
F A F F S R N M K P V L S Y K V D Y L K E Y I P K A F K K A T S I V L D S E I V
TTTGC CTTr TTTAG T A G A A A C A T G A A A C C A G T A C TCT C T T A T A A A G T G G A T T A T C T C A A A G A A T A C A T A C C GAAAGC A T T T A A A A A A G C TAC GT C TATC G T A T T G G A T T C T G A A A T T G T T 2400
L V D E H N V Q L P F G S L G I H K K K E Y K N S N M C L F V F D C L Y F D G F
CTTG TAGAC G A A C A T A A T G T A C A G C TC C C G Tr TG G A A G T T T A G G TATAC A C A A A A A G A A A G A A T A T A A A A A C TC T A A C A T G T G T T T G T r T G T G T T T G A C TGTITG T A C T T T G A T G GATTC 2520
D M T D I P L Y K R R S F L K D V M V E I P N R I V F S E L T N I S N E S Q L T
GATATGACG G A C A T T C C A T T G T A C A A A C G A A G A T C TTTT C TCAAAGATG TTATG G TC GAGATAC C C A A T A G A A T A G T A T T CTCAGAG ~VfGAC G A A T A T T A G T A A C GAGT C TCAGTTAAC T 2640
D V L D D A L T R K L E G L V L K D I N G V Y E P G K R R W L K I K R D Y L N E
GACGTATTG G A T G A T G C A C T A A C A A G A A A A T T A G A A G GATTGGTCI~f A A A A G A T A T T A A T G G A G TATAC GAG CC G G G A A A G A G A A G A T G G T T A A A A A T A A A G C G A G A C T A T T T G A A C GAG 2760
G S M A D S A D L V V L G A Y Y G K G A K G G I M A V F L M G C Y D D E S G K W
GGTT C C A T G G C A G A T T C T G C C G A T T T A G T A G T A C TAG GTGCTTAC TATGG TAAAG GAGC A A A G G G T G G T A T T A T G G C A G TCTTT C TAAT G GG TT G TTAC G A T G A T G A A T C CGGTAAATGG 2880
K T V T K C S G H D D N T L R V L Q D Q L T M V K I N K D P K K I P E W L V V N
AAGAC GG T A A C T A A A T G T T C A G G A C A T G A T G A T A A T A C G T T ~ A G G GTTTTGC AAG AC CAATTAAC GATG G T T / ~ T T ~ A C A A G G A T C C CA~I~f CCA~G ~T~TAG ~ T 3000
K I Y I P D F V V E D P K Q S Q I W E I S G A E F T S S K S H T A N G I S I R F
A A A A T C T A T A T T C C C fiATT T TGTAGTAGAGGATCCGAAACAATCTCAAATATGGGAAATTTCAGGAGCAGAG~CPTAC ATC T T C C A A G T C A C A T A C A G C G A A T G G A A T A T C G A ~ T A G A T T T 3120
P R F T R I R E D K T W K E S T H L N D L V N L T K S *
C C T A G A T T T A C T A G G A T T A G A G A A G A T A A A A C GT G GAAAGAATC TAC TCATC TAAAC GATTTAG T A A A T T T G A C TAAATC TTAATAG T T A C A T A C A A A C TGAAAA~'2AAAATAACAC TAT 3240
A51R / SalF14R 37.6K →
M D G V I V Y C L N A L V K H G E E I N H l K N D F M 1 K P C C E R V
TTAGTTGGTGATCG C CATG G A T G G T G T T A T T G T A T A C TG TCTAAACG CG T T A G T A A A A C A T G GTGAG G A A A T A A A T C A T A T A A A A A A T G A T T T C A T G A T T A A A C C A T G T T ~TGAAAGAGT 3360
C E K V K N V H I D G Q S K N N T V I A D L P Y L D N A V L D V C K S V Y K I( N
T T G T G A A A A A G T C A A G A A C G T A C A C A T C G A C G G A C A A T C T A A A A A C A A T A C A G T G A T T G C A G A T T T G C C A T A T C T G G A T A A T G C T G T A T T G G A T G T A T G C A A A T C A G T A T A T A A A A A G A A 3480
V S R I S R F A N L I K I D D D D K T P T G V Y N Y F K P K D A I S V I I S I G
T G T A T C A A G A A T A T C C A G A T ~ G C T A A T T T G A T A A A G A T A G A T G A T G A T G A C A A G A C TC C TACC G G T G T A T A T A A T T A T T T T A A A C C T A A A G A T G CTATTTC TG T T A T T A T A T C CATAGG 3600
K D K D V C E L L I A S D K A C A C I E L N S Y K V A I L P M N V S F F T K G N
AAAG G A T A A A G A T G T C T G T G A A C T A T T A A T C G CATCC GATAAAGC GTGTG CGTG T A T A G A G T T A A A T T C A T A T A A A G T A G CCATTCTTC C C A T G A A T G T T T C C T T C T T T A C C A A A G G A A A 3720
A S L I I L L F D F S I N A A P L L R S V T D N N V V I S R H K R L H G E I P S
TGCGTCA~'fGATTATTCTCCTGTTTGACTTCTCTATCAATGC GGCAC CTC TC T T A A G A A G T G T A A C C G A T A A T A A T G ~ G ~ A T A ~ T A G ~ C G ~ ~ G 3840
S N W F K F ~ Y I S I K S N Y C S I L Y M V V D G S V M Y A I A D N K T H T I I S
TTcCAATTGGTTCAAGT~fTATATA~TATAA~JGTCCAACTATTGTTCTATATTATATATGGTAGTTGATGGATCTGTGATGTATGCAATAGCTGATAATAAAACTCACACAATTATTAG 3960
K N I L D N T T I N D E C R C C Y F E P Q I K I L D R D E M L N G S S C D M N R
CAAAAATATATTAGACAATACT~AATTAA~GATGAGTGCAGATGCTGTTATTTTGA~CACAGATTAAGATTCTCGATAGAGATGAGAT~~ ~ ~ G 4080
H C I M M N L P D I G E F G S S I L G K Y E P D M I K I A L S V A G N L I R N Q
ACATTGTATTATGATGAATTTACCAGATATAGGAGAATTCGGTTCCAGTATATTGGGGAAATATGAACCTGACATGATTAAGATTGC TCTTTCAGTGGCTGGT~T~A 4200
D Y I P G R R G Y S Y ¥ V Y G I A S R *
AGACTACA_~TCC C G G G A G A C G C G G C T A T A G CTACTAC G T T T A C G G T A T A G C C T C T A G A T A A T T T T ~ T A A G CACAAAATAAAAAACATAA'±TI'±AA~TAGTC TATTTCATACTA'I'I'I-I~T 4320
-e (J-~2R / I & I F I S R )
m d i k i d i s i f g d k f t v t t r r e n e e r
k k y 1 p 1 q k e k f t
GTGATCAcCATGGACATAAAGATAGATATTAGTATTTTTGGTGATAAATTTAcGGTC~.CTACTAGGAGGGAAAA~GAAAAAATATCTAcCTCTCCAAAAA~CTA 4440
A52R / S a l F I S R @ . 4 K -~
t d v i k p n y 1 e h d n 1 1 d r d e M S T I L E E Y F M Y R G L L G L R I K Y
CTGATGTTATCAAAC CTAATTATCTTGAGCAC GATAACTTATTAGATAGAGATGAGATGTCTAC TATTCTAGAGGAATATTTTAT~ACA~ ~ C ~ ~ T A ~ 4560
G R L F N E I R K F D N D A E E Q F G T I E E L K Q K L R L N A E E G A D N F I
GACGAcA-rlTA~ACGAAATTAGAAAATTC G A C A A T G A T G C G G A A G A A C A A T T c G G T A C T A T A G A A G A A C T C A A A C ~ % A A C T T A G A T T A A A T G C T G A A G A G G G A G C C G A T A A C T T T A T A G 4680
k q
i s m i g 1 c a c v v d v w r k e k 1 f s r w k y c 1 r a i k 1 f
D Y I K V Q
A T T A T A T A A A G G T A C A A A C A G G A T A T C T A T G A T A G G A T T G TGTGC GTGTG TG GTAGATG TTTGGAGAAAG GA A A A C T G T T T T C TAGATG G A A A T A T T G T T r A C G A G C T A T T A A A C T G T T 4800
i d d p i 1 d k i k s i 1 q n r 1 v y v e m 1 *
T A T T G A T G A T C C C A T A C T T G A T A A G A T A A A A T C T A T A C T G C A G A A T A G A C TAGTG TATG TGGAAATG TTATA A A A T T A A A A G T T A A T G A G A G C A A A A T A T A A T G T T G T A T T C TAATC C 4920
A55R / SalFITR @ . 2 K -@
T~- r r y r c s f a v t v n i. i y M M G G Y D ~ Y P Y R S S K V I V Y N
C A T A T T T A T T A T T T T C A C G G A ~ A T A T A G G TGTAG TTTTG C A G T G A C C G T C A T A A T A T T A T C T A T A T G A T G G G TG G A T A T G A T C A G T A C C C G T A T A G A A G T T C ~ ~ATA~TA~ 5040
T C T N S W I Y D I P E L K Y P R S N C G G V A D D E Y I Y C I G D Q D S S L I
T A C A T G T A C ~ % A T T C ~'fGGATATATGATATAC C A G A G C T A A A A T A T C C T C G T T C T A A T T G T G G A G G A G T T G C T G A T G A T G A A T A C A T T T A T T G C A T A G G C ~ A ~ G ~ T 5160
1.55R / S & I F I 7 R 1 9 . 9 K --)
S S I D R W K P S K P Y * M R E T K C D I G V A M L N G L I Y V I G G
ATCTAGTATTGATAGATGG~GCCA~KAACCATATTGATA~C GTATG CTAAAATGC G A G A G A C ~ T G T G A T A ~ f G G T G T A G C GATGq~fIJ%AC G G A ~ T A T A T G ~ T A ~ C G G 5280
V V K G D T C T D T L E S L S E D G W M M H Q R L P I N V Q Y V D D C S Y R Q N
AG T T G T W A A A G G T G A C A C A T G T A C C GACACACI~fGAGAGTTTATCAG A A G A T G G A T G G A T G A T G C A T C A A C G T C T T C C A A T A A A T G T C C A A T A T G T C G A C G A T T G ~ T ~ ~ G ~ 5400
L Y I R R L H N S S V V N G I S N L V L S Y N P I Y D E W T K L S S L N I P R I
T T T A T A T A T C A G G A G G C T A C A C A A T A G T A G TG T A G T T A A T G G A A T A T C A A A T C T A G T CC TTAGC TATAATCC G A T A T A T G A T G A A T G G A C C A A A T T A T C A T C A T T A A A T A T r C C TAGAAT 5520
N P A L W S V H N K V Y V G G I S D D I Q T N T S E T Y N K E K D R W T L D N S
TAATCCTGC TCTATGGTCAGTGCATAATAA~JSTATATGTAGGAG GAATATCTGATGATAT TCAAACTAATACATC TGAAACATACAACAAAGAAAAAGATCGTTGGACATTGGATAATAG 5640
H V L P R N Y I M Y K C E P I K H K Y P L E K H S T R M I F *
TCAC G T G T r A C C A C G C A A T T A T A T A A T G T A T A A A T GC GAACC G A T T A A A C A T A A A T A T C C A T T G G A A A A A C A C A G T A C A C G A A T G A T T T T C T A A A G T A C TTG GAAAG TTTTATATGT~J3T 5720
A56R / SalG1R 34.4K (HA) →
M T R T, S I I, r. ~, L ~ ~ L V Y S T P Y P O T
TGATAGAAC AAAATACATAITI-[TTG T A / ~ C A C q'FFTTATAC TAATATGACAC GATTG TC A A T A C T T ~ G TTAC T A A T A T C A T T A G T A T A C T C T A C A C C T T A T C CTCAGAC~JC 5840
O I S K K I G D D A T L S C S R N N I N D Y V V M S A W Y K E P N S I I L L A A
A G A T A T C T A A A A A A A T A G G T G A T G A T G C A A C T C T A T C A T G T A G T A G A A A T A A T A T A A A T G A T T A T G T T G T T A T G A G T G C TTG G T A T A A G G A G C C CAATrC C A T r A T T C T T T T A G C TGCCA 6000
K S D V L Y F D N Y T K D K I S Y D S P Y D D L V T T I T I K S L T A K D A G T
AAAG TGACG T C T T G T A T T T T G A T A A T T A T A C C A A G G A T A A A A T A T C A T A C GATrC TC CATAC GATGATC TAG TTACAAC T A T T A C A A T T A A A T C A T T G A C T G C T A A A G A T G C C G G T A C T T 6120
Y V C A F F M T S T T N D T N K V D Y E E Y S T E L I V N T D S E S T I D I I L
A T G T A T G T G C A T T C T T T A T G A C A T C A A C T A C A A A T G A T A C TAATAAAGTAGAI~fATGAAGAATAC TC TACAGAG T T G A T T G T A A A C A C A G A T A G TGAATC G A C T A T A G A C A T A A T ~ T A T 6240
S G S S H S P E T S S E K P D Y I N N F N C S L V F E I A T P G P I T D N V E N
CTGGATC T T C A C A T T C A C C G G A A A C T A G T T C T G A G A A A C C T G A T T A T A T A A A T A A T T T T A A T T G C TC GTTGG T A T I T G A A A T C G C GACTC C G G G A C C A A T T A C T G A T A A T G T A G A A A A T C 6360
H T D T V T Y T S D I I N T V S T S S G E S T T D K T S G P I T N K E D H T V T
A T A C A G A C A C TGTC A C A T A C A C TAG T G A T A T C A T T A A T A C A G T A A G T A C A T C A T C TG GAGAATC C A C A A C A G A C A A G A C G TC GGGAC CAATTAC T A A T A A A G A A G A T C A T A C A G T C A C A G 6480
D T V S Y T T V S T S S E I V T T K S T A N D A H N D N E P S T V S P T T V K N
ACAC T G T C T C A T A C A C T A C A G T A A G T A C A T C A T C TGAAATTG TCACTAC TAAATCAACC GCCAATGATG C G C A C A A T G A T A A T G A A C CAT CTAC TGTGTCAC C A A C A A C T G T A A A A A A C A 6600
IT K S I G K Y S T K D Y V K V F G T A A L I I L ~ A V A T F C I T Y Y I C N K
TCAC G A A A T C TATAG G T A A G T A T A G T A C T A A A G A C TATGT CAAAG TATTTGG TATTG CAG C A T T A A T T A T A T TGTC G GC C GTGG CAATTTTC TG TATTAC G T A T T A T A T A T G T A A T A A A C 6720
(A57R / SalG2R GmpK) →
T G N F L E H T E F L G N I Y G T S K T V V N T A A I N N R I C V M D L N I D G
CAC C GGAAAC TTTC T A G A A C A T A C T G A G T T l~fTAG G A A A T A T T T A C G GAACTTC TAAAACAG TT G TGAATACAG C GG C T A T T A A T A A T C G T A T T T G T G T G A T G G A T T T A A A C A T C GACG G 7080
V R S L K N T Y L M P Y S V Y I E P T S L K M V E T K L
TG T T A G A A G T C T T A A A A A T A C T T A C C T A A T G C CTTAC TCG GT G T A T A T A A A A C C TAC CTC TC T T A A A A T G G T T G A G A C C A A G C T T 7165
Fig. 2. (a) The nucleotide sequence of the 14.6 kbp SstI-HindIII fragment from the right end of the HindIII A fragment. Nucleotide
sequence numbering starts from the left end of the fragment with the orientation shown in Fig. 1. Predicted amino acid sequences
(shown in upper case letters) of ORFs ≥65 residues in length and starting with a methionine are included above the nucleotide sequence
for ORFs transcribed from left to right, and below the nucleotide sequence for those ORFs running right to left. ORF names are as for
the vaccinia virus homologues from strains Copenhagen (Goebel et al., 1990) and WR (G. L. Smith et al., 1991), and the predicted sizes
of the polypeptides are indicated. Predicted amino acid sequences which show clear similarity to vaccinia virus ORFs but which are
either <65 residues in length or lack an initiating methionine are indicated in lower case letters. Underlined nucleotide sequences
represent potential early transcription termination signals (Yuen & Moss, 1987) or late initiation signals (Rosel et al., 1986). Underlined
amino acid sequences represent potential hydrophobic transmembrane spanning regions or potential sites for attachment of N-linked
carbohydrate in those polypeptides having hydrophobic membrane sequences. (b) Nucleotide sequence of the 7.1 kbp HindIII I
fragment. The dagger symbols above nucleotides 4935 and 4936 indicate the positions between which 1910 nucleotides are missing in
comparison with vaccinia virus strains WR and Copenhagen. Other annotations are as in (a).
nucleotides. In Fig. 2, variola virus ORFs that start with a methionine codon and are predicted to encode proteins
of ≥65 amino acids in length are indicated in capital letters. For comparative reasons the ORFs are named
according to their vaccinia virus (strains WR and Copenhagen) counterparts.
At the nucleotide level the sequences in Fig. 2 share 96% identity with the corresponding regions of vaccinia
virus strain WR. Translation of the sequence shows that the majority of variola virus ORFs are extremely similar
to the vaccinia virus homologue in overall arrangement, length and amino acid content (Fig. 3 and Table 1).
However, there are three types of significant differences and these are described before cataloguing the similarities.
The first type of diversity is the fragmentation of seven genes which in vaccinia virus are present as a
single ORF but in variola virus are broken into two or more separate pieces by termination codons or frameshifts.
Only some of these derivative fragments contain a suitable initiation codon and encode proteins ≥65 amino
acids in length. The other fragments that are clearly related to vaccinia virus ORFs, but which alone would
not be classified as an ORF due to either their small size or lack of an initiating methionine codon, are represented
in Fig. 2 in lower case letters. These broken genes correspond to vaccinia virus strain WR ORFs SalL5R,
SalL7R, SalL9R, SalF1R, SalF2R, SalF6R, SalF7L and SalF15R, and to vaccinia virus strain Copenhagen ORFs
A35R, A37R, A39R, A40R, A44L and A52R. Note that the strain Copenhagen A39R ORF is equivalent to strain
WR ORFs SalL9R and SalF1R and to variola virus Harvey ORFs 13R, 14R and 15R (Table 1). No strain
Copenhagen ORF equivalent to strain WR ORF SalF6R was described (Goebel et al., 1990) and variola virus
Harvey also lacks a comparable ORF encoding ≥65 amino acids due to a frameshift. Some clustering of the
fragmented genes is apparent because to the right of SalL6R the next three rightward transcribed genes
(Copenhagen strain ORFs A37R, A39R and A40R or WR strain ORFs SalF7R, SalF9R, SalF1R and SalF2R) are
all disrupted in variola virus. The arrangement of ORFs within the SalL9R/SalF1R (Copenhagen strain ORF
A39R) region differs in all the orthopoxviruses for which sequences are available. Possible consequences of the
presence of multiple broken ORFs in variola virus and not in vaccinia virus are considered in the discussion.
Another major difference between the variola virus and vaccinia virus ORFs is the absence from variola
virus of 1910 bp corresponding to vaccinia virus WR nucleotides 17 360 to 19 270 (numbering according to G.
L. Smith et al., 1991). Consequently variola virus Harvey lacks homologues of vaccinia virus ORF SalF16R
(related to TNFR) and part of ORF SalF17R, at least from a comparable genomic position.
The third type of difference between variola virus and vaccinia virus is short deletions (mostly less than 20
nucleotides) in one virus or the other that occur between short direct repeats (Table 2 and Fig. 4). There is no
obvious sequence specificity of the direct repeats and the sequences vary from three to nine nucleotides in length.
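The letter-case convention of Fig. 2 (upper case for translations that start with methionine and reach 65 residues, lower case for shorter or methionine-less fragments) can be expressed as a small rule. The following Python sketch is illustrative only: the function names are hypothetical, the treatment of a trailing `*` stop symbol is an assumption, and the test peptides are invented strings rather than sequences from the figure.

```python
MIN_ORF_RESIDUES = 65  # threshold stated in the Fig. 2 legend


def is_full_orf(peptide: str, min_len: int = MIN_ORF_RESIDUES) -> bool:
    """Return True if the translation starts with methionine and is long enough.

    A trailing '*' (stop symbol) is ignored when counting residues; this is an
    assumption of the sketch, not something the legend specifies.
    """
    residues = peptide.rstrip("*")
    return residues.upper().startswith("M") and len(residues) >= min_len


def fig2_case(peptide: str) -> str:
    """Render a translation in the letter case the legend prescribes."""
    return peptide.upper() if is_full_orf(peptide) else peptide.lower()
```

Under this rule a 70-residue fragment without an initiating methionine is still rendered in lower case, which matches how the legend treats fragments of disrupted genes.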
Table 1. Summary of ORFs in variola virus strain Harvey and vaccinia virus strains WR and Copenhagen
Column headings: Variola virus Harvey ORF | Predicted size of protein (Mr × 10^-3) | Vaccinia virus ORF* (WR, Cop) | Number of amino acids (Harvey, WR, Cop) | Amino acid identity (%)† | Amino acid overlap‡ | Function or homology
* Nomenclature for ORFs from vaccinia virus strains Copenhagen (Cop) and WR as described by Goebel et al. (1990) and G. L. Smith et al. (1991),
respectively.
† Amino acid identity between variola virus strain Harvey and vaccinia virus strain WR, except where only the strain Copenhagen ORFs are
shown.
‡ Where ORFs show incomplete overlap.
§ Vaccinia virus strain WR ORFs for which the variola virus homologue is fragmented.
Eleven of the deletions occur in variola virus and nine in strain WR, and these changes are predicted to cause a
variety of changes in the ORFs or their regulation.
Three of these deletions occur in intergenic regions (Table 2 and Fig. 4). Between SalL7R and SalF8L there
is a loss of 24 nucleotides which also results in the loss of an early transcription termination motif downstream of
SalL8L. However, as noted before (G. L. Smith et al., 1991), where two ORFs are predicted to be transcribed
towards each other early during infection there are multiple T5NT motifs on both strands which may help to
reduce transcriptional interference or double-stranded RNA formation. In the present example there are two
other T5NT signals on the same strand in variola virus (Fig. 2) and consequently the loss of one extra signal
downstream of SalF8L is probably less important. The second intergenic deletion occurs between genes (SalF3L
and SalF4R) which are transcribed away from each other, so that the deletion is likely to affect RNA
initiation rather than termination. Transcriptional mapping is available for SalF4R (homology to profilin)
(Duncan & Smith, 1992b) and the RNA initiates late during infection from the TAAAT motif approximately
10 nucleotides upstream of the ATG codon. Since late promoters are functionally very short (Davison & Moss,
1989), and the deletion is present more than 70 nucleotides upstream of the TAAAT motif, it is unlikely
to affect SalF4R transcription. Interestingly, there is also a difference between vaccinia virus strains WR and
Copenhagen at this position (G. L. Smith et al., 1991), but variola virus and strain Copenhagen share identical
sequences. The third intergenic deletion occurs in the
Table 2. Deletions between direct repeats in either variola virus (Harvey) or vaccinia virus WR
* Nucleotide sequence numbering is from the left end of the 14.6 kb SstI-HindIII fragment, or † the left end of the HindIII I fragment.
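A deletion between short direct repeats leaves a recognisable signature when two aligned sequences are compared: the gap can be slid left or right by exactly the repeat length without changing the alignment. The Python sketch below uses that property to locate a single contiguous deletion and report the flanking repeat. It is a minimal illustration under stated assumptions, not the method used to compile Table 2: the names `Deletion` and `find_flanked_deletion` are hypothetical, the two inputs must differ by one deletion only, and the sequences in the usage note are invented toy strings.

```python
from typing import NamedTuple, Optional


class Deletion(NamedTuple):
    start: int   # 0-based leftmost start of the deleted block in the longer sequence
    length: int  # number of nucleotides missing from the shorter sequence
    repeat: str  # direct repeat flanking the deletion ("" if there is none)


def _common_prefix_len(a: str, b: str) -> int:
    """Length of the longest shared prefix of a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def find_flanked_deletion(longer: str, shorter: str) -> Optional[Deletion]:
    """Locate one contiguous deletion and the direct repeat that flanks it.

    Returns None when the two sequences cannot be reconciled by a single
    deletion from `longer`.
    """
    missing = len(longer) - len(shorter)
    if missing <= 0:
        return None
    prefix = _common_prefix_len(longer, shorter)
    suffix = _common_prefix_len(longer[::-1], shorter[::-1])
    if prefix + suffix < len(shorter):
        return None  # more than one difference between the sequences
    start = len(shorter) - suffix   # leftmost placement of the gap
    repeat_len = prefix - start     # how far the gap can slide to the right
    return Deletion(start, missing, longer[start:start + repeat_len])
```

For the toy pair `"TTACGTGGGGGACGTCC"` and `"TTACGTCC"` the function reports a 9-nucleotide deletion starting at position 2 with the 4-nucleotide repeat `ACGT`, that is, one copy of the repeat plus the intervening sequence has been lost.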
from yeast, vaccinia virus and humans (Smith et al., 1989a; Lasko et al., 1990) and which is immunologically
cross-reactive between vaccinia virus and mammalian DNA ligase I (Kerr et al., 1991). The predicted active site
lysine (Tomkinson et al., 1991) and the surrounding residues are also unaltered in variola virus.
Variola virus TmpK is also very similar to the vaccinia virus enzyme, with only three conservative changes
(98.0% identity) and an extra residue at position 165. The third homology to an enzyme involved in nucleic acid
metabolism is that of GmpK which, like TmpK and DNA ligase, is very similar between the two viruses. In
three strains of vaccinia virus this enzyme is predicted to be active only if the body of the ORF is joined to a
5'-terminal region that contains the ATP-binding domain, possibly by a translational frameshift (G. L. Smith et al.,
1991). The same situation is observed in variola virus GmpK in which the 5'-terminal region upstream of the
main ORF has a nucleotide sequence identical to that of vaccinia virus WR. Although the ORF is incompletely
sequenced in variola virus, because it extends beyond the right end of the HindIII I fragment, the available data
show only three conservative changes in 89 amino acid residues (96.6% identity). If the complete protein is
expressed by translational frameshifting the mechanism is likely to be conserved in all the poxviruses for which
this gene has been sequenced.
Of the other two enzymes (3β-HSD and SOD), SOD is highly conserved, being the same length as in vaccinia
virus and showing 97.6% amino acid identity. Like that of vaccinia virus, the predicted variola virus protein
lacks the two loops that protrude from the β-barrel structure and which are necessary for ion binding.
Consequently, the protein is unlikely to have SOD activity. The 3β-HSD gene is fragmented into four pieces
in variola virus and is therefore the least conserved of the variola virus enzyme genes. Although the individual
fragments share a high degree of amino acid identity with the corresponding region of vaccinia virus 3β-HSD,
the region is most unlikely to encode an active enzyme, owing to this fragmentation. The fragmentation of the
variola virus 3β-HSD gene was surprising given that the vaccinia virus gene encodes an active enzyme which
functions as a virus virulence factor (Moore & Smith, 1992). It would seem more logical for the virulent
pathogen (variola virus) to have an intact virulence factor and for the vaccine (vaccinia virus) to have lost
this.
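The identity percentages quoted in this section are simple ratios; for example, three changes in 89 aligned residues leave 86 matches, and 86/89 rounds to 96.6%. A minimal Python sketch of that calculation is given below. The function name `percent_identity` is hypothetical, and excluding gapped columns from the denominator is an assumption of this sketch, since the text does not state how alignment gaps were counted.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percentage of identical positions between two pre-aligned sequences.

    Columns containing a gap in either sequence are skipped (an assumption of
    this sketch; other conventions count gaps as mismatches).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b)
               if x != gap and y != gap]
    if not columns:
        raise ValueError("no comparable positions")
    matches = sum(x == y for x, y in columns)
    return 100.0 * matches / len(columns)
```

With two invented 89-residue strings that differ at three positions, `round(percent_identity(a, b), 1)` gives 96.6, in line with the figure reported for the partially sequenced GmpK ORF.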
variola virus and vaccinia virus is the disruption of seven variola virus ORFs into small fragments. For the
vaccinia virus counterparts transcriptional mapping is available for three of these ORFs, SalF6R (Duncan &
Smith, 1992b), SalF2R (S. A. Duncan & G. L. Smith, unpublished data) and SalF7L (Moore & Smith, 1992).
For two of these ORFs (SalF2R and SalF7L) the gene product has been detected, and in one case (SalF7L) the
protein has been shown to be an active 3β-HSD enzyme that synthesizes steroid hormones and contributes to
virus virulence (Moore & Smith, 1992). The presence of a virulence factor in vaccinia virus but not in variola virus
is counter-intuitive and remarkable. It is possible that these genes are not broken in variola virus DNA isolated
directly from virions and that the breaks have been introduced during cloning of the virus restriction
fragments and propagation in Escherichia coli. Although more comparative sequence data from other variola
virus strains are required to address this possibility, it seems unlikely that this is the mechanism for four
reasons. (i) These changes are clustered within a few genes (several genes have multiple breaks) and are not
randomly spaced throughout the region sequenced. (ii) Deletions between short direct repeats, which contribute
to gene disruptions, seem to occur equally in vaccinia virus and variola virus, and not only in one virus. (iii)
Comparisons of vaccinia virus strains WR and Copenhagen show that these genomes are very similar and
therefore reasonably stable after cloning in E. coli, thus the same would be expected for the variola virus genome,
with 96% nucleotide identity. (iv) A similar situation has been reported for the variola virus homologue of the
vaccinia virus host range gene K1L, which is fragmented in multiple variola virus strains including Harvey
(Cowley & Greenaway, 1990).
Can one then conclude that these genes are not essential for variola virus growth? For in vitro propagation
of the virus this is very likely to be the case, because these gene disruptions have probably occurred within the
virus and not after cloning of its genome. However, it is also possible that these genes are required for efficient
replication and pathogenesis in vivo and were mutated subsequent to the isolation of the virus from a soldier
named Harvey who returned by convoy from Gibraltar in 1944 and started an outbreak of haemorrhagic-type
smallpox in Middlesex, U.K. (Bradley et al., 1946; Downie & Dumbell, 1947). Following its isolation the
virus was passaged 36 times on the chorioallantoic membrane of chick embryos prior to cloning of the virus
genome (K. R. Dumbell, personal communication). The degree of nucleotide and amino acid variation within the
broken ORFs is comparable to the degree of variation seen for the complete ORFs, implying that the fragmentation
of these genes is not ancient. More comparative sequence data are needed from other variola virus strains
which have a shorter and better characterized passage history since clinical isolation. Nonetheless it would, in
our view, be remarkable if all the broken genes identified in this region are intact in other variola virus strains.
Assuming that the clones sequenced here are representative of virulent variola virus strains, why are so many
genes broken? Perhaps some of the ORFs that are broken in variola virus but complete in vaccinia virus
play a role in moderating virus virulence, so that the virus possessing complete ORFs is attenuated compared to the
virus with defective genes. Thus it may have been the functional disruption of these genes that allowed variola
virus to become such a virulent pathogen. There is a precedent for the loss of an orthopoxvirus gene resulting
in increased virulence: vaccinia virus WR ORF B15R encodes a secretory, high affinity receptor for
interleukin-1β that is not essential for virus replication in vitro, but loss of the gene causes a more rapid onset of
symptoms and death in intranasally infected mice (Alcami & Smith, 1992). Whether this is more generally
true for other genes cannot be predicted, but the observed disruption of variola virus genes has important
implications for researchers deleting vaccinia virus ORFs either individually or in combination and studying
the consequences for virus replication in vitro and in vivo. If the genes deleted are ones that are already defective in
variola virus, then the recombinant vaccinia virus may have a genetic structure more akin to that of variola
virus. Nevertheless, the construction of a 'variola virus' from vaccinia virus is impossible owing to the different
terminal restriction profiles showing sequences unique to variola virus. Instead the information gained from this
type of comparison is helpful in indicating that some combinations of genes should not be deleted from
vaccinia virus.
A prediction at the outset of this sequencing was that the variola virus genes for TNFR, SOD and GmpK
might be functional for these activities and contribute to the greater virulence of variola virus, in contrast to the
situation in vaccinia virus in which the genes are not likely to encode proteins with these activities (see
Introduction). Surprisingly, in each case the variola virus ORF is extremely similar to the vaccinia virus
counterpart (SOD and GmpK) or is absent from the comparable region of the variola virus genome (TNFR). Hence
comparison of these provides no explanation for the increased virulence of variola virus.
What are the implications of these comparisons for orthopoxvirus evolution? The high nucleotide identity
(96%) shows that variola virus and vaccinia virus are very closely related despite having dramatically different
pathogenesis. Nevertheless, the number of differences in the ORFs, the 1.9 kbp deletion in variola virus and the
documented differences in the restriction patterns of the genomic termini suggest that one of these viruses has not recently been derived from the other. It seems more likely that the two viruses have evolved from a common ancestral orthopoxvirus and that for variola virus the evolution from this ancestor was accompanied by the disruption of some genes, and possible acquisition of others, that may have contributed to the enhanced virulence in man. The origin of vaccinia virus remains a mystery but some of the possible sources of the virus, such as its derivation by a simple recombination between cowpox virus and variola virus in recent times, or its derivation from variola virus by passage in cows (Baxby, 1981; Fenner, 1992), are made increasingly unlikely by the data presented here. More extensive sequence data from several orthopoxviruses are required to compile convincing evolutionary trees for these viruses.

The WHO has proposed that all variola virus DNA should be destroyed after the sequencing of several variola virus strains is complete. This proposal seems illogical given that there is complete identity between vaccinia and variola virus over hundreds of nucleotides and 96% identity overall in the region studied. Are vaccinia virus oligonucleotides and cloned DNA fragments, and DNA from other orthopoxviruses, which may in the future be shown to be equally or more related to variola virus, also to be destroyed? It seems more logical that the destruction of variola virus material should be restricted to the infectious virus.

In conclusion, we have presented the sequence of approximately 12% of the variola virus genome, compared this to that of vaccinia virus and demonstrated a remarkable degree of similarity between the two viruses. For the virion envelope glycoproteins this similarity will have contributed to the effectiveness of vaccinia virus as a live vaccine that eradicated smallpox. The most profound differences between the viruses are the fragmentation of seven variola virus ORFs and the partial or complete deletion of two others from a total of 32. A possible explanation for this surprising observation is that the ancestors of these fragmented genes encoded factors that moderated virus virulence and that by their loss variola virus became more pathogenic.

We thank Antonio Alcami for advice and critical reading of the manuscript. This work was supported by the Medical Research Council and the Medical Research Fund of the University of Oxford. G.L.S. is a Lister Institute-Jenner Research Fellow.

References

ALCAMÍ, A. & SMITH, G. L. (1992). A soluble receptor for interleukin-1β encoded by vaccinia virus: a novel mechanism of virus modulation of the host response to infection. Cell 71, 153-167.
ANGULO, A., VIÑUELA, E. & ALCAMÍ, A. (1992). Comparisons of the sequence of the gene encoding African swine fever virus attachment protein p12 from field virus isolates and viruses passaged in tissue culture. Journal of Virology 66, 3869-3872.
APPLEYARD, G., HAPEL, A. J. & BOULTER, E. A. (1971). An antigenic difference between intracellular and extracellular rabbitpox virus. Journal of General Virology 13, 9-17.
BAXBY, D. (1981). Jenner's Smallpox Vaccine. The Riddle of the Origin of Vaccinia Virus. London: Heinemann.
BIGGIN, M. D., GIBSON, T. J. & HONG, G. F. (1983). Buffer gradient gels and 35S label as an aid to rapid DNA sequence determination. Proceedings of the National Academy of Sciences, U.S.A. 80, 3693-3695.
BOULTER, E. A. & APPLEYARD, G. (1973). Differences between extracellular and intracellular forms of poxviruses and their implications. Progress in Medical Virology 16, 86-108.
BRADLEY, W. H., DAVIES, J. O. F. & DURANTE, J. A. (1946). The outbreak of smallpox in Middlesex, 1944. British Medical Journal ii, 194-196.
COLINAS, R. J., GOEBEL, S. J., DAVIS, S. W., JOHNSON, G., NORTON, E. K. & PAOLETTI, E. (1990). A DNA ligase gene in the Copenhagen strain of vaccinia virus is nonessential for virus replication and recombination. Virology 179, 267-275.
COWLEY, R. & GREENAWAY, P. J. (1990). Nucleotide sequence comparison of homologous genomic regions from variola, monkeypox, and vaccinia viruses. Journal of Medical Virology 31, 267-271.
DAVISON, A. J. & MOSS, B. (1989). Structure of vaccinia virus late promoters. Journal of Molecular Biology 210, 771-784.
DIXON, L. K., BRISTOW, C., WILKINSON, P. J. & SUMPTION, K. J. (1990). Identification of a variable region of the African swine fever virus genome that has undergone separate DNA rearrangements leading to expansion of minisatellite-like sequences. Journal of Molecular Biology 216, 677-688.
DOWNIE, A. W. & DUMBELL, K. R. (1947). The isolation and cultivation of variola virus on the chorio-allantois of chick embryos. Journal of Pathology and Bacteriology 59, 189-198.
DUNCAN, S. A. & SMITH, G. L. (1992a). Identification and characterization of an extracellular envelope glycoprotein affecting vaccinia virus egress. Journal of Virology 66, 1610-1621.
DUNCAN, S. A. & SMITH, G. L. (1992b). Vaccinia virus gene SalF5R is non-essential for virus replication in vitro and in vivo. Journal of General Virology 73, 1235-1242.
ENGELSTAD, M., HOWARD, S. T. & SMITH, G. L. (1992). A constitutively expressed vaccinia virus gene encodes a 42 kDa glycoprotein related to complement control factors that forms part of the extracellular envelope. Virology 188, 801-810.
ESPOSITO, J. J. & KNIGHT, J. C. (1984). Nucleotide sequence of the thymidine kinase gene region of monkeypox and variola viruses. Virology 135, 561-567.
ESPOSITO, J. J. & KNIGHT, J. C. (1985). Orthopoxvirus DNA: a comparison of restriction profiles and maps. Virology 143, 230-251.
FENNER, F. (1992). Vaccinia virus as a vaccine, and poxvirus pathogenesis. In Recombinant Poxviruses, pp. 1-43. Edited by M. M. Binns & G. L. Smith. Boca Raton: CRC Press.
GOEBEL, S. J., JOHNSON, G. P., PERKUS, M. E., DAVIS, S. W., WINSLOW, J. P. & PAOLETTI, E. (1990). The complete DNA sequence of vaccinia virus. Virology 179, 247-266.
HAMILTON, A., KINCHINGTON, D., GREENAWAY, P. & DUMBELL, K. (1985). Recombinant bacterial plasmids containing inserts of variola DNA. Lancet ii, 1356-1357.
HOWARD, S. T., CHAN, Y. S. & SMITH, G. L. (1991). Vaccinia virus homologues of the Shope fibroma virus inverted terminal repeat proteins and a discontinuous ORF related to the tumour necrosis factor receptor family. Virology 180, 633-647.
HUGHES, S. J., JOHNSTON, L. H., DE CARLOS, A. & SMITH, G. L. (1991). Vaccinia virus encodes an active thymidylate kinase that complements a cdc8 mutant of Saccharomyces cerevisiae. Journal of Biological Chemistry 266, 20103-20109.
JENNER, E. (1798). An Enquiry into the Causes and Effects of Variolae Vaccinae, a Disease Discovered in some Western Counties of England, Particularly Gloucestershire, and Known By the Name of Cow Pox. London: Reprinted by Cassell, 1896.
JENNER, E. (1801). The Origin of the Vaccine Inoculation. London: D. N. Shury.
JIN, D., LI, Z., JIN, Q., YUWEN, H. & HOU, Y. (1989). Vaccinia virus hemagglutinin. A novel member of the immunoglobulin superfamily. Journal of Experimental Medicine 170, 571-576.
KERR, S. M. & SMITH, G. L. (1991). Vaccinia virus DNA ligase is nonessential for virus replication: recovery of plasmids from virus-infected cells. Virology 180, 625-632.
KERR, S. M., JOHNSTON, L. H., ODELL, M., DUNCAN, S. A., LAW, K. M. & SMITH, G. L. (1991). Vaccinia DNA ligase complements S. cerevisiae cdc9, localizes in cytoplasmic factories, and affects virulence and virus sensitivity to DNA damaging agents. EMBO Journal 10, 4343-4350.
KOTWAL, G. J. & MOSS, B. (1989). Vaccinia virus encodes two proteins that are structurally related to members of the plasma serine protease inhibitor superfamily. Journal of Virology 63, 600-606. Erratum, Journal of Virology 64, 966.
LAI, C., GONG, S. & ESTEBAN, M. (1991). The purified 14-kilodalton envelope protein of vaccinia virus produced in Escherichia coli induces virus immunity in animals. Journal of Virology 65, 5631-5635.
LASKO, D. D., TOMKINSON, A. E. & LINDAHL, T. (1990). Mammalian DNA ligases: biosynthesis and intracellular location of DNA ligase I. Journal of Biological Chemistry 265, 12618-12622.
MACKETT, M. & ARCHARD, L. C. (1979). Conservation and variation in Orthopoxvirus genome structure. Journal of General Virology 45, 683-701.
MEYER, H., SUTTER, G. & MAYR, A. (1991). Mapping of deletions in the genome of the highly attenuated vaccinia virus MVA and their influence on virulence. Journal of General Virology 72, 1031-1038.
MOORE, J. B. & SMITH, G. L. (1992). Steroid hormone synthesis by a vaccinia enzyme: a new type of virus virulence factor. EMBO Journal 11, 1973-1980.
PAYNE, L. G. (1980). Significance of extracellular enveloped virus in the in vitro and in vivo dissemination of vaccinia virus. Journal of General Virology 50, 89-100.
PAYNE, L. G. & KRISTENSSON, K. (1985). Extracellular release of enveloped vaccinia virus from mouse nasal epithelial cells in vivo. Journal of General Virology 66, 643-646.
PEARSON, W. R. & LIPMAN, D. J. (1988). Improved tools for biological sequence comparison. Proceedings of the National Academy of Sciences, U.S.A. 85, 2444-2448.
RODRIGUEZ, J. F. & ESTEBAN, M. (1987). Mapping and nucleotide sequence of the vaccinia virus gene that encodes a 14-kilodalton fusion protein. Journal of Virology 61, 3550-3554.
RODRIGUEZ, J. F. & SMITH, G. L. (1990). IPTG-dependent vaccinia virus: identification of a virus protein enabling virion envelopment by Golgi membrane and egress. Nucleic Acids Research 18, 5347-5351.
RODRIGUEZ, J. F., JANEZCKO, R. & ESTEBAN, M. (1985). Isolation and characterization of neutralizing monoclonal antibodies to vaccinia virus. Journal of Virology 56, 482-488.
RODRIGUEZ, J. F., PAEZ, E. & ESTEBAN, M. (1987). A 14,000-Mr envelope protein of vaccinia virus is involved in cell fusion and forms covalently linked trimers. Journal of Virology 61, 395-404.
ROSEL, J. L., EARL, P. L., WEIR, J. P. & MOSS, B. (1986). Conserved TAAATG sequence at the transcriptional and translational initiation sites of vaccinia virus late genes deduced by structural and functional analysis of the HindIII H genome fragment. Journal of Virology 60, 436-449.
SANGER, F., NICKLEN, S. & COULSON, A. R. (1977). DNA sequencing with chain-terminating inhibitors. Proceedings of the National Academy of Sciences, U.S.A. 74, 5463-5467.
SHIDA, H. (1986). Nucleotide sequence of the vaccinia virus hemagglutinin gene. Virology 150, 451-462.
SHIDA, H., TOCHIKURA, T., SATO, T., KONNO, T., HIRAYOSHI, K., SEKI, M., ITO, Y., HATANAKA, M., HINUMA, Y., SUGIMOTO, M., TAKAHASHI-NISHIMAKI, F., MARUYAMA, T., MIKI, K., SUZUKI, K., MORITA, M., SASHIYAMA, H. & HAYAMI, M. (1987). Effect of the recombinant vaccinia viruses that express HTLV-I envelope gene on HTLV-I infection. EMBO Journal 6, 3379-3384.
SMITH, C. A., DAVIS, T., WIGNALL, J. M., DIN, W. S., FARRAH, T., UPTON, C., MCFADDEN, G. & GOODWIN, R. G. (1991). T2 open reading frame from Shope fibroma virus encodes a soluble form of the TNF receptor. Biochemical and Biophysical Research Communications 176, 335-342.
SMITH, G. L. & CHAN, Y. S. (1991). Two vaccinia virus proteins structurally related to interleukin-1 receptor and the immunoglobulin superfamily. Journal of General Virology 72, 511-518.
SMITH, G. L., CHAN, Y. S. & KERR, S. M. (1989a). Transcriptional mapping and nucleotide sequence of a vaccinia virus gene with extensive homology to DNA ligases. Nucleic Acids Research 17, 9051-9061.
SMITH, G. L., HOWARD, S. T. & CHAN, Y. S. (1989b). Vaccinia virus encodes a family of genes with homology to serine proteinase inhibitors. Journal of General Virology 70, 2333-2343.
SMITH, G. L., CHAN, Y. S. & HOWARD, S. T. (1991). Nucleotide sequence of 42 kbp of vaccinia virus strain WR from near the right inverted terminal repeat. Journal of General Virology 72, 1349-1376.
TAKAHASHI-NISHIMAKI, F., FUNAHASHI, S.-I., MIKI, K., HASHIZUME, S. & SUGIMOTO, M. (1991). Regulation of plaque size and host range by a vaccinia virus gene related to complement system proteins. Virology 181, 158-164.
TOMKINSON, A., TOTTY, N. F., GINSBURG, M. & LINDAHL, T. (1991). Location of the active site for enzyme-adenylate formation in DNA ligases. Proceedings of the National Academy of Sciences, U.S.A. 88, 400-404.
UPTON, C., MACEN, J. L., SCHREIBER, M. & MCFADDEN, G. (1991). Myxoma virus expresses a secreted protein with homology to the tumor necrosis factor receptor gene family that contributes to viral virulence. Virology 184, 370-382.
VIÑUELA, E. (1985). African swine fever virus. Current Topics in Microbiology and Immunology 116, 151-170.
WORLD HEALTH ORGANIZATION (1980). The global eradication of smallpox. Final Report of the Global Commission for the Certification of Smallpox Eradication. In History of International Public Health. Geneva: World Health Organization.
YUEN, L. & MOSS, B. (1987). Oligonucleotide sequence signaling transcriptional termination of vaccinia virus early genes. Proceedings of the National Academy of Sciences, U.S.A. 84, 6417-6421.

(Received 9 June 1992; Accepted 16 July 1992)