Vol. 61, No. 2
JOURNAL OF VIROLOGY, Feb. 1987, p. 480-490
0022-538X/87/020480-11$02.00/0 Copyright © 1987, American Society for Microbiology
Complete Nucleotide Sequence of a Milk-Transmitted Mouse Mammary Tumor Virus: Two Frameshift Suppression Events Are Required for Translation of gag and pol
R. MOORE,1 M. DIXON,1 R. SMITH,1 G. PETERS,2 AND C. DICKSON1*
Imperial Cancer Research Fund Laboratories, London WC2A 3PX,1 and Imperial Cancer Research Fund Laboratories, St. Bartholomew's Hospital, London EC1A 7BE,2 United Kingdom
Received 1 August 1986/Accepted 30 October 1986
We sequenced two recombinant DNA clones constituting a single provirus of the milk-transmitted mouse mammary tumor virus characteristic of BR6 mice. The complete provirus is 9,901 base pairs long, flanked by 6-base-pair duplications of cellular DNA at the site of integration. Five extensive blocks of open reading frame corresponding to the gag gene, the presumed protease, the pol and env genes, and the open reading frame orf within the long terminal repeat of the provirus were readily discernible. Translation of gag, protease, and pol involved three different translational reading frames to produce the three overlapping polyprotein precursors Pr77, Pr110, and Pr160 found in virus-infected cells. Synthesis of the reverse transcriptase and endonuclease therefore required two separate frameshifts to suppress the termination codons at the ends of the Pr77 and Pr110 domains. Direct evidence is presented for translational readthrough of both stop codons in an in vitro protein synthesis system.
Retroviruses are widely dispersed among vertebrate species but are unified by their genome organization and mode of replication. An essential step in their replication is the reverse transcription of the viral RNA into a double-stranded DNA intermediate that becomes integrated as a provirus within the chromosomal DNA of the host cell. As a result, analyses of retroviral genomes have recently focused on the isolation and characterization of molecular clones of proviral DNA to the extent that complete nucleotide sequences are now available for several avian and mammalian retroviruses (18, 40, 42, 44, 46-49). Such techniques and the opportunity to perform site-directed mutagenesis have also led to the identification of previously unrecognized or poorly understood viral functions over and above the well-characterized components encoded by the gag, pol, and env genes. These include a protease, required for processing of the viral polypeptide precursors; an endonuclease, required for proviral integration; and specific sequences needed for the initiation of viral DNA synthesis, the integration of viral DNA, and the packaging of genomic RNA (5, 12, 16, 24, 27, 31, 32, 43, 50, 55). Whereas direct evidence for these functions is restricted to a few specific examples, the general uniformity of retrovirus architecture and conservation of particular localized sequence motifs suggest that most retroviruses conform, with only minor embellishments, to a basic prototype.
Mouse mammary tumor virus (MMTV) is unique among the Retroviridae because it represents a distinct morphological subclass (B type) and is one of the causative factors in a specific epithelial neoplasm (9, 29). However, despite being the first mammalian retrovirus isolated, MMTV has remained among the most refractory to molecular analysis.
Two reasons can be cited for this: its strong tropism for the mammary alveoli, which may underlie the difficulties encountered in tissue culture manipulation of the virus, and its apparent resistance to molecular cloning as a complete DNA provirus (2, 11, 13, 25, 26, 53). Although large segments of
the genome have been previously characterized (6, 11, 13, 14, 20, 25, 26, 37), here we report the successful isolation and DNA sequencing of recombinant phage clones constituting what we believe to be an entire milk-transmitted MMTV provirus. This provides the first complete comparison between the genome organization of MMTV and that of other retroviruses and clarifies the disposition of the open reading frames within the gag and pol domains. Previous experiments from our group and others (for a review, see reference 9) indicating three overlapping polyprotein precursors were confirmed by the DNA sequence and extended by the direct demonstration of two independent translational frameshifts in vitro.
MATERIALS AND METHODS
Cloning of MMTV proviral DNA. High-molecular-weight DNA from BR6 mouse mammary tumors was digested to completion with EcoRI, for which there is a single cleavage site in the MMTV provirus, and ligated into the purified arms of the λgtWES·λB vector (33). Packaged phage was plated on Escherichia coli LE392 and screened by hybridization to a probe for the MMTV long terminal repeat (LTR) by standard procedures. Positive phage was then plaque purified and hybridized to 5'- and 3'-specific MMTV probes to distinguish the respective virus-cell DNA junctions (33). From one particular library prepared from the tumor designated W26 (30), we recovered recombinants corresponding to both halves of a single integrated provirus. These were matched initially by preparing restriction fragment probes specific for the cellular DNA adjacent to the viral sequences in each clone. The procedures used for the labeling of probes and blot hybridization have been adequately detailed elsewhere (33). Although the 3' junction fragment was readily transferred into a plasmid vector to facilitate further analysis, difficulties were encountered in subcloning the corresponding 5' junction (1a).
DNA sequence analysis. The 6.7-kilobase (kb) 5' junction fragment, containing around 1 kb of flanking cellular DNA, and the corresponding 6.4-kb 3' fragment were excised from
* Corresponding author.
their respective vectors with EcoRI and recovered by preparative agarose gel electrophoresis. The fragments were then self-ligated, randomly sheared by sonication, and blunt-end ligated into the SmaI site of the M13 vectors mp8 and mp9 (30). Recombinant M13 phage was grown in small-scale liquid cultures to prepare single-stranded DNA templates for dideoxynucleotide chain termination sequencing procedures (30). DNA sequences were compiled by the DBUTIL computer program (51) and, apart from a small section of the 3' LTR, were obtained for both DNA strands.
In vitro synthesis of MMTV-specific RNAs and proteins. A segment of MMTV proviral DNA extending from the PstI site at nucleotide 2580 to the EcoRI site at nucleotide 5803 was inserted into the polylinker of the pSP65 plasmid vector (Promega Biotec, Madison, Wis.). The orientation was such that MMTV-specific RNA of positive sense could be transcribed in vitro from the SP6 phage promoter with the SP6-specific RNA polymerase (28). Conditions for RNA synthesis were essentially as described by Melton et al. (28). RNAs of different length were obtained by cleavage of the plasmid at the specific KpnI, BglII, and EcoRI sites in the MMTV proviral DNA (see Fig. 3). The size and integrity of each RNA preparation were verified by Northern blot analysis with an MMTV-specific probe (not shown). After synthesis, the products were treated with RNase-free DNase, phenol extracted, and ethanol precipitated. The protein coding capacity of each RNA was then assessed by translation in vitro in a nuclease-treated reticulocyte lysate (8, 10). The analysis of the resultant [35S]methionine-labeled products by acrylamide gel electrophoresis, immune precipitation, and tryptic peptide mapping followed protocols described in detail previously (7, 8, 10).
RESULTS
Cloning and DNA sequencing of the MMTV provirus. A recombinant phage library of EcoRI-digested DNA from a BR6 mouse mammary tumor was screened with a probe specific for the MMTV LTR (33).
Since EcoRI cuts the MMTV provirus at a single site, each positive plaque identified was expected to represent a junction between viral and cellular DNA sequences. During plaque purification, these recombinants were hybridized with probes specific for either the 5' or 3' portions of the provirus (defined relative to the EcoRI site). In contrast to our previous experience and that of other laboratories in cloning MMTV DNA from GR and C3H mouse mammary tumors, we recovered 5' and 3' junction fragments with approximately equal frequency from BR6 tumors (11, 25, 26, 53). Moreover, many of these recombinants represented proviral DNA of the milk-transmitted MMTV characteristic of BR6 mice rather than endogenous sequences (35). This was initially shown by preparing unique sequence probes specific for the cellular DNA adjacent to each provirus as described elsewhere (33, 35). Such probes also permitted the matching of corresponding 5' and 3' junctions, since the probes identified signature restriction fragments derived from the unoccupied site in normal DNA. As a result of these analyses, we were in a position for the first time to determine the complete sequence and genome organization of a single, potentially infectious provirus of milk-borne MMTV. The derived sequence for two matched EcoRI halves and the cellular DNA immediately flanking the LTRs are depicted in Fig. 1. Six base pairs of cellular DNA were duplicated at the site of integration, confirming the linkage between these two junction fragments. It should be
stressed, however, that although the sequence was obtained for both DNA strands, it did not cross the EcoRI site at nucleotide 5803, and we therefore cannot formally exclude additional sequence at this position (e.g., two closely apposed EcoRI sites). Other features of the sequence are described below and in Discussion.
Open reading frames in the MMTV provirus. From the MMTV DNA sequence and known features of the LTRs (11, 13, 14, 20, 21, 26), we deduced that the viral genome RNA must extend for 8,585 nucleotides, beginning at nucleotide 1196 (Fig. 1). Since retroviral RNA is in the positive sense, the sequence as presented was directly translatable into the viral proteins. A computed translation of the MMTV provirus in the three reading frames is depicted in Fig. 2, indicating the positions of all potential termination codons. It was immediately apparent that the viral genome RNA could encompass five substantial protein-encoding domains. The three largest were presumed to encode the viral gag, pol, and env functions, as expected of a prototype retrovirus. An additional segment of open reading frame characteristic of MMTV and designated orf began immediately proximal to the boundary of the 3' LTR as previously reported (10, 11, 14, 20, 26, 37). However, the presumed gag and pol domains, nominally ascribed to reading frames 1 and 2, respectively, were not contiguous, and it was clear that continuity in the generation of a gag-pol precursor would require the inclusion of amino acids encoded in reading frame 3. The situation therefore paralleled that described recently for Mason-Pfizer monkey virus, bovine leukemia virus, and human T-cell leukemia virus types I (HTLV-I) and II (HTLV-II) in which the viral protease bridges the gag and pol domains in a different frame (38, 40, 44, 46, 49, 60; see below).
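The kind of three-frame scan for termination codons described above can be sketched in a few lines. This is a minimal illustration, not the program used by the authors: the function names `stop_positions` and `open_blocks` are hypothetical, frames are indexed here simply as offsets 0, 1, and 2 from the first base of the input, and the toy sequence in the usage note is invented for demonstration.

```python
# Minimal sketch of a three-frame stop-codon scan (illustrative only).
# Function names and the 0/1/2 frame offsets are assumptions of this sketch,
# not the conventions of the original analysis.

STOP_CODONS = {"TAA", "TAG", "TGA"}


def stop_positions(seq, frame):
    """Return 1-based positions of the first base of each stop codon.

    `frame` is the 0-based offset (0, 1, or 2) of the first codon read.
    """
    seq = seq.upper()
    return [
        i + 1
        for i in range(frame, len(seq) - 2, 3)
        if seq[i:i + 3] in STOP_CODONS
    ]


def open_blocks(seq, frame, min_codons=1):
    """Return (start, end, codons) for each stop-free stretch in one frame.

    Coordinates are 1-based and inclusive; stop codons are excluded.
    """
    blocks = []
    start = frame  # 0-based index of the first base of the current block
    # 0-based index just past the last complete codon in this frame
    last_codon_end = frame + ((len(seq) - frame) // 3) * 3
    # A sentinel "stop" just past the last complete codon closes the final block.
    for stop in stop_positions(seq, frame) + [last_codon_end + 1]:
        stop0 = stop - 1
        codons = (stop0 - start) // 3
        if codons >= min_codons:
            blocks.append((start + 1, stop0, codons))
        start = stop0 + 3  # resume reading after the stop codon
    return blocks
```

Applied to a full proviral sequence, calling `open_blocks(sequence, frame, min_codons=100)` for each of the three offsets would list the long stop-free blocks of the sort summarized in Fig. 2; the threshold of 100 codons is an arbitrary choice for illustration.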
The data also concur with previous immunobiochemical and in vitro translation experiments in which three overlapping precursors were identified with antisera to the viral structural proteins (for a review, see reference 9). Designated Pr77, Pr110, and Pr160, these precursors probably share the same amino terminus, initiating at the methionine codon at nucleotide 1508 and terminating at nucleotides 3281, 4087, and 6771, respectively (Fig. 2). Synthesis of the Pr160 pol precursor would therefore necessitate two separate -1 translational frameshifts, switching from frame 1 to 3 at the Pr77-Pr110 boundary and from frame 3 to 2 at the end of Pr110.
Demonstration of translational frameshifting in vitro. We and others have previously shown that rabbit reticulocyte lysates programmed with purified MMTV genome RNA can support synthesis of Pr77, Pr110, and Pr160 in roughly the same relative proportions as detected in lysates from infected cells (8, 45). However, these studies could never rigorously exclude the presence of minor RNA species in virion preparations which might direct synthesis of the longer readthrough products. To circumvent these problems, we followed the example of Jacks and Varmus (19) and constructed recombinant plasmids in which appropriate segments of MMTV RNA could be transcribed in vitro with the specific RNA polymerase of phage SP6 (28). Thus, a segment of MMTV proviral DNA extending from the PstI site at nucleotide 2580 to the EcoRI site at 5803 was inserted into the polylinker of the vector pSP65 (Fig. 3). Linearization of the resultant plasmid at either the KpnI, BglII, or EcoRI sites within the MMTV sequences would be expected to yield RNA transcripts of 0.9, 1.8, and 3.2 kb, which could discriminate between the proposed translational frameshifts at the ends of Pr77, Pr110, and Pr160. Although this fragment lacks the normal initiation codon at the start of the gag gene,
FIG. 1. Complete nucleotide sequence of a milk-transmitted MMTV provirus. Two EcoRI fragments representing the 5' and 3' ends of a single MMTV provirus were recovered as recombinant clones and sequenced by shotgun cloning into M13 vectors and dideoxynucleotide chain termination procedures (30, 33). The sequence across the EcoRI site was not determined, but with the exception of a small section of the 3' LTR, all sequences were obtained for both strands. Shown are 9,901 nucleotides of proviral sequence, numbered from base 1 of the 5' LTR and flanked by segments of the cellular DNA adjacent to the provirus (shown in lower case). The boundaries of the LTRs are indicated by rectangular brackets, and the six bases of cellular DNA duplicated during proviral integration are underlined. The beginnings of long open reading frames are also shown, each accompanied by a number referring to the translation frame used. Other sequence features indicated are the 5' end of the viral genome RNA (cap); the binding site for tRNA3Lys (pbs); the splice donor (sd) and acceptor (sa) sites for env and orf mRNAs; the known amino-terminal sequences of p27, p14-p30, gp52, and gp36; and the potential inverted repeat sequences distal to translational frameshift sites (overlined) (11, 13, 14, 17, 20, 21, 25, 26, 34, 37, 54, 57).
we noticed a fortuitous ATG codon 36 nucleotides distal to the PstI site, which was likely to act as an initiator under the more promiscuous translation conditions of the reticulocyte lysate. The 0.9-kb RNA species should therefore be translated into a 25-kilodalton (kDa) protein which terminates at the TAA codon at the end of Pr77 (nucleotide 3281), whereas readthrough of this codon should yield a 28-kDa species terminated at the KpnI site marking the end of the transcript (Fig. 3a). Similar reasoning for the 1.8-kb RNA would predict naturally terminated 25- and 51-kDa species and a 64-kDa readthrough product ending at the BglII site, whereas the 3.2-kb RNA should yield 25-, 51-, and 92-kDa products. The results obtained by translating these RNAs in a rabbit reticulocyte lysate and by analyzing the [35S]methionine-labeled products by polyacrylamide gel electrophoresis (7, 8) are shown in Fig. 3b. Densitometric scanning of the autoradiographs, with corrections for the relative contents of methionine, indicated that the predicted products were present in proportions consistent with approximately 15 to 25% readthrough of the successive termination codons. With RNA terminating at the EcoRI site, the low level of the 92-kDa gag-pol product presumably reflected the vulnerability of this longer RNA to degradation in the lysate or premature termination of the translation machinery. A similar explanation could be proposed for the background in these gels, since breakage of RNA chains permits initiation at essentially any methionine codon encountered close to the
5' ends of the resultant RNA fragments (8, 10). Nevertheless, the indicated products were specifically precipitated by antisera against the major gag determinant, p27 (not shown). Partial amino acid sequencing of the virion proteins p27 and p14 has established their amino termini at nucleotides 2315 and 2976, respectively (N. Totty, M. Waterfield, R. Moore, M. Dixon, R. Smith, G. Peters, and C. Dickson) so that the in vitro translation products predicted in Fig. 3a should include approximately 60% of the primary sequence of p27. Tryptic peptide analysis of in vitro readthrough products. To verify the identity of the in vitro translation products detected in Fig. 3, we prepared fingerprints of the methionine-labeled tryptic peptides from the 25-, 51-, and 92-kDa proteins (Fig. 4). Previous reports from this laboratory have described two-dimensional tryptic-peptide maps of individual MMTV proteins and of Pr77, Prl1O, and Pr160 produced either in vivo or in vitro (7, 8). A schematic representation of the pattern of spots that we obtained from Prl60 expressed in a reticulocyte lysate (8) is shown in Fig. 4d. In this earlier
work, it was established that peptides 1 through 8 were present in Pr77 and peptides 9 and 10 were additionally present in Pr110, whereas only peptide 11 distinguished Pr160 from Pr110. Although not all of these peptides were represented in the present work (because of the artificial initiation site generated in the SP6 transcripts), it was clear that peptides 9 and 10 were again acquired in the readthrough from the 25- to 51-kDa species and that peptide 11 was acquired in the readthrough from the 51- to 92-kDa products. Moreover, the other major peptides were entirely consistent with the initiation of translation within the coding region of virion protein p27.

FIG. 2. Disposition of open reading frames in the MMTV proviral sequence. (a) Computed translation of the MMTV DNA sequence of Fig. 1 in the three possible reading frames. The positions of all potential termination codons are indicated relative to the linear map from nucleotides 1 to 9901. (b) Open reading frames shown in panel a directly correlated with the precursor polyproteins identified in MMTV-infected cells and characterized by immune precipitation and tryptic peptide mapping (9). The different reading frames are depicted by distinct symbols for frames 1, 3, and 2. Thus Pr77 is translated in frame 1, Pr110 spans frames 1 and 3, and Pr160 uses frames 1, 3, and 2. The open reading frame orf in the LTR is not included in this figure, since the predicted protein has not been detected in MMTV-infected cells (8, 10, 45).
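The computed translation summarized in Fig. 2a amounts to listing every potential termination codon in each of the three forward reading frames. The following is a minimal sketch of that bookkeeping, not the program used for the paper. The function name `stop_positions` and the short test sequence are hypothetical, and the frame labels are plain offsets (0, 1, 2) from the first nucleotide, which need not coincide with the frame numbers 1, 2, and 3 used in Fig. 2.

```python
# Potential termination codons on the DNA (coding) strand.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def stop_positions(seq):
    """Map each frame offset (0, 1, 2) to the 1-based start positions
    of every stop codon read in that frame."""
    seq = seq.upper()
    positions = {}
    for offset in range(3):
        positions[offset] = [
            i + 1  # convert 0-based index to 1-based nucleotide number
            for i in range(offset, len(seq) - 2, 3)
            if seq[i:i + 3] in STOP_CODONS
        ]
    return positions


# Hypothetical 24-nucleotide sequence, used only to exercise the function.
toy = "ATGAAATAGCCCTGAGGGTAATTT"
for offset, stops in stop_positions(toy).items():
    print(f"offset {offset}: stop codons start at {stops}")
```

Long gaps between successive stop positions in one frame correspond to the extensive open blocks described for gag, the protease, pol, and env; a stretch that is open in one frame but closed in the others is what makes a frame assignment possible.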
DISCUSSION

The majority of previous studies on the MMTV provirus and its sequence have concentrated on the milk-borne viruses characteristic of the GR and C3H strains of mice or on endogenous elements. Although there have been numerous reports on the sequence of the LTRs, the env gene, and parts of the pol gene, none of these has provided a comprehensive picture of the MMTV genome (6, 11, 13, 14, 20, 25, 26, 37). Our success in obtaining recombinant clones encompassing the 5' half of the genome may in some measure reflect our choice of the BR6 mouse strain. The exact origin of the milk-transmitted MMTV in these mice remains unclear, since they were derived initially from a cross between an RIII male and a nonviremic C57BL female (15), but we assume that this virus represents the one that is characteristic of RIII mice. However, the so-called poison sequences which have hampered many studies on MMTV are not completely abrogated in this strain, since we encountered difficulties whenever attempts were made to transfer cloned 5' junction fragments from λ vectors into plasmids (1a). It was therefore impracticable to reconstruct a complete provirus by joining the two EcoRI junction fragments, which thus precluded any tests on the biological activity of this provirus. Nevertheless, all the features of the derived DNA sequence would fulfil the expectations for an infectious virus and would be entirely consistent with the known biochemical data.
The 5' LTR and leader sequences. The LTR of the provirus shown in Fig. 1 extends for 1,328 base pairs and is bounded by the expected 6-base-pair inverted repeats (11, 14, 20, 25). The various elements required for initiation of transcription and glucocorticoid regulation have been described in detail by others (for a review, see reference 39) and are maintained in our sequence with only scattered base changes. Flanking the LTR to the 5' side is the hexanucleotide CGTCAG duplicated upon integration of the provirus into cellular DNA (11, 20, 25), and at the 3' boundary is the 18-nucleotide binding site for the DNA synthesis primer tRNA(Lys) (25, 34). The sequences distal to the primer binding site are loosely described as leader sequences, since they precede the start of the gag gene, but they encompass the splice donor site for env mRNA at nucleotide 1484 (Fig. 1) and presumably all or part of the packaging signal for MMTV genome RNA (13, 26, 27, 54, 55).

gag gene. Identification of the ATG codon at nucleotide 1508 as the initiator for the gag gene products is based on a number of considerations: it is the first ATG encountered after the LTR (Fig. 1), it conforms to the GNNATG consensus (23), it is followed by 590 codons in frame (Fig. 1 and 2), and it predicts the amino-terminal Met-Gly sequence consistent with the observed myristylation of the MMTV gag precursor (13, 41). However, we have as yet no information that would allow us to align the amino-terminal virion protein, p10, with the DNA sequence. Partial amino acid sequencing has allowed us to match the N terminus of p27, the major core protein, with the proline codon at nucleotide 2315, and upstream of this point are two domains of around 20 amino acids, one of which is composed of 50% acidic residues, the other being 50% basic. This latter region is reminiscent of the highly basic virion protein, p8, described previously (7, 9). The amino-terminal sequences of virion protein p14 and the minor protein p30 both align perfectly with the DNA sequence beginning at nucleotide 2996 (N. Totty, M. Waterfield, R. Moore, M. Dixon, R. Smith, G. Peters, and C. Dickson, unpublished results). p14 is the major nucleic-acid-binding protein of MMTV, a conclusion that is supported by the presence of two of the motifs (N, N + 3, N + 13) of cysteine residues identified in other systems (1, 4). However, any function of p30 remains a mystery, since to our knowledge, no equivalent protein has been described for other retroviruses. As p30 represents a transframe protein, sharing an amino terminus with p14 in frame 1 and a carboxy terminus with Pr110 in frame 3, it will clearly be interesting to determine the complete amino acid sequence of p30 to define exactly where in the nucleotide sequence the frameshift occurs.

FIG. 3. Translational frameshifting in vitro. (a) Partial restriction map of MMTV proviral DNA. The positions of significant (but not all) PstI (P), KpnI (K), BglII (Bg), and EcoRI (E) sites are indicated. The segment from PstI to EcoRI depicted by the bold line was subcloned into the polylinker of the pSP65 vector such that transcription of the resultant plasmid DNA by the SP6 phage RNA polymerase initiated at the specific promoter (*) and copied the MMTV sequences into positive-sense RNA (19, 28). Linearization of the plasmid at the KpnI, BglII, or EcoRI sites yielded specific RNA transcripts of 0.9, 1.8, and 3.2 kb, respectively. When these were used to direct protein synthesis in a rabbit reticulocyte lysate (8, 10), the expected products had the sizes (in kilodaltons) and composition shown in the diagram. The symbols for each translational frame correspond to those used in Fig. 2. Termination occurred either at the normal stop codons or at the 3' end of the specific RNA transcripts. (b) In vitro translation products generated from RNAs terminating at the KpnI (K), BglII (Bg), or EcoRI (E) sites in the MMTV sequence. These products were labeled by the addition of [35S]methionine to the reticulocyte lysates, analyzed by electrophoresis in 10% polyacrylamide gels (7, 8), and visualized by autoradiography. The positions and molecular sizes (in kilodaltons) of the relevant products are indicated. The No RNA lane shows the background of protein synthesis observed in the nuclease-treated lysate. We noted that several of the major in vitro products migrated as triple rather than single entities in acrylamide gels, but we have no explanation for this phenomenon, nor have we been able to distinguish differences between the various species.

FIG. 4. Tryptic peptide maps of in vitro translation products. Two-dimensional fingerprints were prepared of tryptic peptides from the 35S-labeled in vitro products depicted in Fig. 3b (7, 8). Shown are the 25-kDa (a), the 51-kDa (b), and the 92-kDa (c) products and a schematic diagram of the previously reported fingerprint of in vitro synthesized Pr160 (d) (8). Numbered peptides correspond to those discussed in earlier publications and are represented according to the frame symbols used in Fig. 2. Separate symbols mark the additional peptides found in the 51-kDa and 92-kDa products, respectively.

Protease. One consequence of expressing viral structural proteins in the form of precursors is the requirement for a specific protease to process these polypeptides into the mature forms. It is therefore hardly surprising that retroviruses encode their own enzymes to perform this role, but it has become apparent that they have adopted various strategies for implementing this function. Avian leukosis viruses, for example, include the protease within the gag gene; in murine and feline leukemia viruses, the enzyme is formally part of the pol gene, occurring downstream of the termination codon separating gag and pol; and in HTLV-I, HTLV-II, bovine leukemia virus, and Mason-Pfizer monkey virus, the protease is encoded in a separate reading frame between the gag and pol genes (5, 18, 19, 38, 40, 42, 44, 46, 47, 49, 58-60). The situation in MMTV most closely parallels that of the last virus group in that the sequence LLDTGADK, which is homologous to the consensus active site of retroviral and some cellular acid proteases (52), occurs in frame 3 (centered on nucleotide 3821) in the bridge between the gag and pol domains (Fig. 1 and 2). Although we are not yet in a position to define the boundaries or exact size of the MMTV protease, it is clear from these data that its expression requires at least one translational frameshift.

FIG. 5. Amino acid sequence homologies among retrovirus pol genes. Sequence comparisons are presented between the pol genes of MMTV (this paper), Mason-Pfizer monkey virus (49), Rous sarcoma virus (42), Moloney murine leukemia virus (47), HTLV-I (44), and Visna virus (48). Residues homologous to the MMTV sequence are shaded. The distinction between polymerase (a) and endonuclease (b) domains is based on the features of these proteins in Rous sarcoma virus and Moloney murine leukemia virus. [Aligned sequences not recoverable here.]

Polymerase and endonuclease. Characterization of the MMTV reverse transcriptase has been and remains a contentious topic, with conflicting reports of a 100-kDa monomer or 85- and 55-kDa dimers as the active moiety (for a review, see reference 9). Now that the full DNA sequence is known, it should be possible to resolve this issue by obtaining and aligning terminal amino acid sequences. A similar strategy will be required to characterize the endonuclease domain, since the presence of poison sequences has precluded the types of mutagenesis studies applied to other retroviruses (12, 16, 32, 43). As noted by others, the coding domains of polymerase and endonuclease are nevertheless readily discernible in the MMTV sequence simply by homologies to other known sequences (3, 6, 38, 40, 42, 44, 46-49). The degree of conservation is most striking within the polymerase itself (Fig. 5), particularly between MMTV and Mason-Pfizer virus, where the homology in the region shown is around 65% (49).

FIG. 6. Sequences involved in translational frameshifting. The DNA sequences spanning the boundaries between Pr77 and Pr110 (a) and between Pr110 and Pr160 (b) are displayed in a simplified form. The amino acids encoded in the relevant frames are indicated by the single-letter code, with stop codons also shown (*). The sequence motifs discussed in the text are underlined.
(a) Pr77/Pr110 readthrough, frames 1 and 3, position label 3260: GCTGAAAATTCAAAAAACTTGTAAAGGGGCA
(b) Pr110/Pr160 readthrough, frames 3 and 2, position label 4069: GATGATTCACAGGATTTATGATAGGGGCCAT

env gene. We previously reported the env gene sequence of the GR strain of MMTV and discussed its implications in terms of the viral glycoproteins and their precursor Pr73 (37). The nucleotide sequence of the corresponding region of the BR6 provirus differs at 60 positions, reflected in only 10 amino acid differences. Five of these changes are localized within the leader preceding the gp52 domain (17, 26, 37), but their significance is unclear, since we do not yet know which ATG codon serves as the initiator for Pr73 (26, 37). None of the essential features of the two mature glycoproteins, such as the number and position of glycosylation sites or the nature of the transmembrane anchor, is affected.

The 3' LTR and orf. The most significant feature of the 3' LTR of MMTV is the existence of a substantial open reading frame designated orf, which begins at an A residue in the polypurine tract immediately preceding the boundary of the LTR and continues for 962 residues in the U3 portion of the genome (8, 10, 11, 14, 20, 26, 45). We previously argued that the maintenance of orf in several endogenous and milk-borne strains of MMTV indicates it has a functional role (9). The sequence presented in Fig.
1 extends this argument, since the integrity of the reading frame is maintained despite 94 single-base differences between this sequence and the GR virus LTR. Moreover, the putative splice acceptor site mapped approximately 50 nucleotides upstream of the LTR is also represented in our sequence (54, 57) (Fig. 1).

Suppression by translational frameshifting. Although it is still unclear how or why orf may be expressed, synthesis of the env gene products from a spliced, subgenomic mRNA presumably reflects the need for high levels of the viral glycoproteins, roughly equivalent to the levels of the gag structural proteins. In contrast, the protease, polymerase, and endonuclease are nonstructural components, and although they may occur in virions, they are required in only catalytic amounts. It has long been recognized that infected cells express the gag-pol precursor at only around 5% of the level of the gag precursor, and it has recently become apparent that retroviruses have adopted an unusual strategy, namely suppression of termination codons, for controlling the levels of these functions (19, 36, 58-60). In MMTV, there are three precursors, now identified as Pr77gag, Pr110gag-pro, and Pr160gag-pro-pol, so that two termination codons must be bypassed during synthesis of the pol gene products. This required two frameshifts (Fig. 2), and both events could occur in rabbit reticulocyte lysates primed with defined segments of MMTV RNA (Fig. 3 and 4). From the ratios of readthrough products to terminated products, we estimated that each frameshift occurred with an efficiency of around 15 to 25% in vitro, which is reasonably consistent with the levels of Pr77, Pr110, and Pr160 observed for infected cells (7, 9). However, these studies cannot formally exclude the possibility of self-splicing or splicing catalyzed by components of the reticulocyte lysate. The sequences at which frameshifting occurs are displayed in more detail in Fig. 6. At present, we have insufficient information to determine where within the overlaps between frames 1 and 3 in Fig. 6a and between frames 3 and 2 in Fig. 6b the actual switches occur. Resolution of this issue awaits protein sequence data for the virion p30 and protease, the two likely candidates for products which span these frameshifts.
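The overlaps shown in Fig. 6 can be checked directly by translating the two displayed sequences in the upstream frame and in the frame reached by moving one nucleotide backward. The sketch below is only an illustration under stated assumptions: the helper `translate` and the constant names are hypothetical, the standard genetic code is assumed, and offset 0 is taken to be the upstream frame simply because the first nucleotide shown in each panel begins a codon of that frame. Offset 2 has the same phase as a shift of -1.

```python
from itertools import product

# Standard genetic code, built in the conventional T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(seq, offset):
    """Translate complete codons of seq starting at the given offset;
    stop codons are written as '*'."""
    codons = (seq[i:i + 3] for i in range(offset, len(seq) - 2, 3))
    return "".join(CODON_TABLE[c] for c in codons)


# DNA sequences as displayed in Fig. 6a and 6b.
FIG6A = "GCTGAAAATTCAAAAAACTTGTAAAGGGGCA"  # Pr77/Pr110 boundary
FIG6B = "GATGATTCACAGGATTTATGATAGGGGCCAT"  # Pr110/Pr160 boundary

for name, seq, motif in (
    ("Pr77/Pr110", FIG6A, "AAAAAAC"),
    ("Pr110/Pr160", FIG6B, "ATTTA"),
):
    print(name)
    print("  upstream frame (offset 0):", translate(seq, 0))
    print("  shifted frame  (offset 2):", translate(seq, 2))
    print("  motif", motif, "starts at 0-based index", seq.find(motif))
```

In both panels the upstream frame runs into a stop codon while the shifted frame is open beyond its first codon, and each motif lies upstream of the stop in the frame being left. If the label 3260 in Fig. 6a marks the first nucleotide shown, the in-frame TAA falls at 3260 + 21 = 3281, the position quoted for the end of Pr77.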
However, we note that both these frameshift sequences comply with precedents set by other retroviruses, in that one occurs at or near the sequence ATTTA, as in avian sarcoma virus (19, 42), and the other probably involves the AAAAAAC sequence identified as the potential frameshift site in bovine leukemia virus, HTLV-I, and HTLV-II and paralleled in some procaryote systems (22, 38, 46, 56, 60). Moreover, both frameshift sites are closely followed by potential stem-loop structures in the virion RNA. Although we have not assessed the thermodynamic stability of such secondary structural features relative to others in the sequence, it is interesting to speculate that they may contribute in cis to frameshifting (38, 46, 60). Whether such processes have wider significance in the general control of gene expression remains an intriguing possibility.

ACKNOWLEDGMENTS

We thank S. Brookes for additional technical help; L. Crawford, J. Wyke, and J. Witkowski for comments on the manuscript; and A. Kessler for its preparation.

LITERATURE CITED

1. Arthur, L. O., C. W. Long, G. H. Smith, and D. L. Fine. 1978. Immunological characterization of the low molecular weight DNA binding protein of mouse mammary tumor virus. Int. J. Cancer 22:433-440.
1a. Brookes, S., M. Placzek, R. Moore, M. Dixon, C. Dickson, and G. Peters. 1986. Insertion elements and transitions in cloned mouse mammary tumour virus DNA: further delineation of the poison sequences. Nucleic Acids Res. 14:8231-8245.
2. Buetti, E., and H. Diggelmann. 1981. Cloned mouse mammary tumor virus DNA is biologically active in transfected cells and its expression is stimulated by glucocorticoid hormones. Cell 23:335-345.
3. Chiu, I. M., R. Callahan, S. R. Tronick, J. Schlom, and S. A. Aaronson. 1984. Major pol gene progenitors in the evolution of oncornaviruses. Science 223:364-370.
4. Copeland, T. D., S. Oroszlan, V. S. Kalyanaraman, M. G. Sarngadharan, and R. C. Gallo. 1983. Complete amino acid sequence of human T-cell leukemia virus structural protein p15. FEBS Lett. 162:390-395.
5. Crawford, S., and S. P. Goff. 1985. A deletion mutation in the 5' part of the pol gene of Moloney murine leukemia virus blocks proteolytic processing of the gag and pol polyproteins. J. Virol. 53:899-907.
6. Deen, K. C., and R. W. Sweet. 1986. Murine mammary tumor virus pol-related sequences in human DNA: characterization and sequence comparison with the complete murine mammary tumor virus pol gene. J. Virol. 57:422-432.
7. Dickson, C., and M. Atterwill. 1979. Composition, arrangement and cleavage of the mouse mammary tumor virus polyprotein precursor Pr77gag and p110gag. Cell 17:1003-1012.
8. Dickson, C., and G. Peters. 1981. Protein-coding potential of mouse mammary tumor virus genome RNA as examined by in vitro translation. J. Virol. 37:36-47.
9. Dickson, C., and G. Peters. 1983. Proteins encoded by mouse mammary tumour virus. Curr. Top. Microbiol. Immunol. 106:1-34.
10. Dickson, C., R. Smith, and G. Peters. 1981. In vitro synthesis of polypeptides encoded by the long terminal repeat region of mouse mammary tumour virus DNA. Nature (London) 291:511-513.
11. Donehower, L. A., A. L. Huang, and G. L. Hager. 1981. Regulatory and coding potential of the mouse mammary tumor virus long terminal redundancy. J. Virol. 37:226-238.
12. Donehower, L. A., and H. E. Varmus. 1984. A mutant murine leukemia virus with a single missense codon in pol is defective in a function affecting integration. Proc. Natl. Acad. Sci. USA 81:6461-6465.
13. Fasel, N., E. Buetti, J. Firzlaff, K. Pearson, and H. Diggelmann. 1983. Nucleotide sequence of the 5' noncoding region and part of the gag gene of mouse mammary tumor virus; identification of the 5' splicing site for subgenomic mRNAs. Nucleic Acids Res. 11:6943-6955.
14. Fasel, N., K. Pearson, E. Buetti, and H. Diggelmann. 1982. The region of mouse mammary tumor virus DNA containing the long terminal repeat includes a long coding sequence and signals for hormonally regulated transcription. EMBO J. 1:3-7.
15. Foulds, L. 1949. Mammary tumours in hybrid mice: the presence and transmission of the mammary tumour agent. Br. J. Cancer 3:230-239.
16. Grandgenett, D., T. Quinn, P. J. Hippenmeyer, and S. Oroszlan. 1985. Structural characterization of the avian retrovirus reverse transcriptase and endonuclease domains. J. Biol. Chem. 260:8243-8249.
17. Henderson, L. E., R. Sowder, G. Smythers, and S. Oroszlan. 1983. Terminal amino acid sequences and proteolytic cleavage sites of mouse mammary tumor virus env gene products. J. Virol. 48:314-319.
18. Herr, W. 1984. Nucleotide sequence of AKV murine leukemia virus. J. Virol. 49:471-478.
19. Jacks, T., and H. E. Varmus. 1985. Expression of the Rous sarcoma virus pol gene by ribosomal frameshifting. Science 230:1237-1242.
20. Kennedy, N., G. Knedlitschek, B. Groner, N. E. Hynes, P. Herrlich, R. Michalides, and A. J. J. van Ooyen. 1982. Long terminal repeats of endogenous mouse mammary tumour virus contain a long open reading frame which extends into adjacent sequences. Nature (London) 295:622-624.
21. Klemenz, R., M. Reinhardt, and H. Diggelmann. 1981. Sequence determination of the 3' end of mouse mammary tumor virus RNA. Mol. Biol. Rep. 7:123-126.
22. Kohno, T., and J. R. Roth. 1978. A salmonella frameshift suppression that acts at runs of A residues in the messenger RNA. J. Mol. Biol. 126:37-52.
23. Kozak, M. 1984. Compilation and analysis of sequences upstream from the transcriptional start site in eucaryotic mRNAs. Nucleic Acids Res. 12:857-872.
24. Lobel, L. I., and S. P. Goff. 1985. Reverse transcription of retroviral genomes: mutations in the terminal repeat sequences. J. Virol. 53:447-455.
25. Majors, J. E., and H. E. Varmus. 1981. Nucleotide sequences at host-proviral junctions for mouse mammary tumour virus. Nature (London) 289:253-258.
26. Majors, J. E., and H. E. Varmus. 1983. Nucleotide sequencing of an apparent proviral copy of env mRNA defines determinants of expression of the mouse mammary tumor virus env gene. J. Virol. 47:495-504.
27. Mann, R. S., R. C. Mulligan, and D. Baltimore. 1983. Construction of a retrovirus packaging mutant and its use to produce helper-free defective retrovirus. Cell 32:871-879.
28. Melton, D. A., P. A. Krieg, M. R. Rebagliati, T. Maniatis, K. Zinn, and M. R. Green. 1984. Efficient in vitro synthesis of biologically active RNA and RNA hybridization probes from plasmids containing a bacteriophage SP6 promoter. Nucleic Acids Res. 12:7035-7056.
29. Moore, D. H., C. A. Long, A. A. Vaidya, J. B. Sheffield, A. S. Dion, and E. Y. Lasfargues. 1979. Mammary tumor viruses. Adv. Cancer Res. 29:347-418.
30. Moore, R., G. Casey, S. Brookes, M. Dixon, G. Peters, and C. Dickson. 1986. Sequence, topography and protein coding potential of mouse int-2: a putative oncogene activated by mouse mammary tumour virus. EMBO J. 5:919-924.
31. Panganiban, A. T., and H. M. Temin. 1984. Circles with two tandem LTRs are precursors to integrated retrovirus DNA. Cell 36:673-679.
32. Panganiban, A. T., and H. M. Temin. 1984. The retrovirus pol gene encodes a product required for DNA integration: identification of a retrovirus int locus. Proc. Natl. Acad. Sci. USA 81:7885-7889.
33. Peters, G., S. Brookes, R. Smith, and C. Dickson. 1983. Tumorigenesis by mouse mammary tumor virus: evidence for a common integration region for provirus integration in mammary tumors. Cell 33:369-377.
34. Peters, G., and C. Glover. 1980. tRNA's and priming of RNA-directed DNA synthesis in mouse mammary tumor virus. J. Virol. 35:31-40.
35. Peters, G., M. Placzek, S. Brookes, C. Kozak, R. Smith, and C. Dickson. 1986. Characterization, chromosomal assignment, and segregation analysis of endogenous proviral units of mouse mammary tumor virus. J. Virol. 59:535-544.
36. Philipson, L., P. Anderson, U. Olshevsky, R. Weinberg, D. Baltimore, and R. Gesteland. 1978. Translation of MuLV and MSV RNAs in nuclease-treated reticulocyte extracts: enhancement of the gag-pol polypeptide with yeast suppressor tRNA. Cell 13:189-199.
37. Redmond, S. M. S., and C. Dickson. 1983. Sequence and expression of the mouse mammary tumour virus env gene. EMBO J. 2:125-131.
38. Rice, N. R., R. M. Stephens, A. Burny, and R. V. Gilden. 1985. The gag and pol genes of bovine leukemia virus: nucleotide sequence and analysis. Virology 142:357-377.
39. Ringold, G. M. 1983. Regulation of mouse mammary tumor virus gene expression by glucocorticoid hormones. Curr. Top. Microbiol. Immunol. 106:79-103.
40. Sagata, N., T. Yasunaga, J. Tsuzuku-Kawamura, K. Ohishi, Y. Ogawa, and Y. Ikawa. 1985. Complete nucleotide sequence of the genome of bovine leukemia virus: its evolutionary relationship to other retroviruses. Proc. Natl. Acad. Sci. USA 82:677-681.
41. Schultz, A. M., and S. Oroszlan. 1983. In vivo modification of retroviral gag gene-encoded polyproteins by myristic acid. J. Virol. 46:355-361.
42. Schwartz, D. E., R. Tizard, and W. Gilbert. 1983. Nucleotide sequence of Rous sarcoma virus. Cell 32:853-869.
43. Schwartzberg, S., J. Colicelli, and S. P. Goff. 1984. Construction and analysis of deletion mutations in the pol gene of Moloney murine leukemia virus: a new viral function required for productive infection. Cell 37:1043-1052.
44. Seiki, M., S. Hattori, Y. Hirayama, and M. Yoshida. 1983. Human adult T-cell leukemia virus: complete nucleotide sequence of the provirus genome integrated in leukemia cell DNA. Proc. Natl. Acad. Sci. USA 80:3618-3622.
45. Sen, G. C., J. Racevskis, and N. H. Sarkar. 1981. Synthesis of murine mammary tumor viral proteins in vitro. J. Virol. 37:963-975.
46. Shimotohno, K., Y. Takahashi, N. Shimizu, T. Gojobori, D. W. Golde, I. S. Y. Chen, M. Miwa, and T. Sugimura. 1985. Complete nucleotide sequence of an infectious clone of human T-cell leukemia virus type II: an open reading frame for the protease gene. Proc. Natl. Acad. Sci. USA 82:3101-3105.
47. Shinnick, T. M., R. A. Lerner, and J. G. Sutcliffe. 1981. Nucleotide sequence of Moloney murine leukemia virus. Nature (London) 293:543-548.
48. Sonigo, P., M. Alizon, K. Staskus, D. Klatzmann, S. Cole, O. Danos, E. Retzel, P. Tiollais, A. Haase, and S. Wain-Hobson. 1985. Nucleotide sequence of the visna lentivirus: relationship to the AIDS virus. Cell 42:369-382.
49. Sonigo, P., C. Barker, E. Hunter, and S. Wain-Hobson. 1986. Nucleotide sequence of Mason-Pfizer monkey virus: an immunosuppressive D-type retrovirus. Cell 45:375-385.
50. Sorge, J., and S. H. Hughes. 1982. Polypurine tract adjacent to the U3 region of the Rous sarcoma virus genome provides a cis-acting function. J. Virol. 43:482-488.
51. Staden, R. 1980. A new computer method for the storage and manipulation of DNA gel reading data. Nucleic Acids Res. 8:3673-3694.
52. Toh, H., M. Ono, K. Saigo, and T. Miyata. 1985. Retroviral protease-like sequence in the yeast transposon Ty1. Nature (London) 315:691.
53. Ucker, D. S., S. R. Ross, and K. R. Yamamoto. 1981. Mammary tumor virus DNA contains sequences required for its hormone regulated transcription. Cell 27:257-266.
54. Van Ooyen, A. J. J., R. J. A. M. Michalides, and R. Nusse. 1983. Structural analysis of a 1.7-kilobase mouse mammary tumor virus-specific RNA. J. Virol. 46:362-370.
55. Watanabe, S., and H. M. Temin. 1982. Encapsidation sequences for spleen necrosis virus, an avian retrovirus, are between the 5' long terminal repeat and the start of the gag gene. Proc. Natl. Acad. Sci. USA 79:5986-5990.
56. Weiss, R. B. 1984. Molecular model of ribosome frameshifting. Proc. Natl. Acad. Sci. USA 81:5797-5801.
57. Wheeler, D. A., J. S. Butel, D. Medina, R. D. Cardiff, and G. L. Hager. 1983. Transcription of mouse mammary tumor virus: identification of a candidate mRNA for the long terminal repeat gene product. J. Virol. 46:42-49.
58. Yoshinaka, Y., I. Katoh, T. D. Copeland, and S. Oroszlan. 1985. Murine leukemia virus protease is encoded by the gag-pol gene and is synthesized through suppression of an amber termination codon. Proc. Natl. Acad. Sci. USA 82:1618-1622.
59. Yoshinaka, Y., I. Katoh, T. D. Copeland, and S. Oroszlan. 1985. Translational readthrough of an amber termination codon during synthesis of feline leukemia virus protease. J. Virol. 55:870-873.
60. Yoshinaka, Y., I. Katoh, T. D. Copeland, G. W. Smythers, and S. Oroszlan. 1986. Bovine leukemia virus protease: purification, chemical analysis, and in vitro processing of gag precursor polyproteins. J. Virol. 57:826-832.