MuLV - Journal of Virology - American Society for Microbiology

JOURNAL OF VIROLOGY, Oct. 2006, p. 9497–9510 0022-538X/06/$08.00+0 doi:10.1128/JVI.00856-06 Copyright © 2006, American Society for Microbiology. All Rights Reserved.

Vol. 80, No. 19

Revealing Domain Structure through Linker-Scanning Analysis of the Murine Leukemia Virus (MuLV) RNase H and MuLV and Human Immunodeficiency Virus Type 1 Integrase Proteins

Jennifer Puglia,1† Tan Wang,2† Christine Smith-Snyder,1 Marie Cote,1 Michael Scher,1 Joelle N. Pelletier,4 Sinu John,3 Colleen B. Jonsson,2 and Monica J. Roth1*

1 Department of Biochemistry, Robert Wood Johnson Medical School, University of Medicine and Dentistry of New Jersey, 675 Hoes Lane, Piscataway, New Jersey 08854
2 Department of Biochemistry and Molecular Biology, Southern Research Institute, 2000 9th Ave. S., Birmingham, Alabama 35205
3 Graduate Program in Biochemistry and Molecular Genetics, University of Alabama at Birmingham, Birmingham, Alabama 35294
4 Département de Chimie, Faculté des Arts et Sciences, et Département de Biochimie, Faculté de Médecine, Université de Montréal, C.P. 6128, Succursale Centre-Ville, Montréal, Québec H3C 3J7, Canada

Received 26 April 2006/Accepted 7 July 2006

* Corresponding author. Mailing address: Department of Biochemistry, Robert Wood Johnson Medical School, University of Medicine and Dentistry of New Jersey, 675 Hoes Lane, Piscataway, NJ 08854. Phone: (732) 235-5048. Fax: (732) 235-4783. E-mail: [email protected] .edu.
† These authors contributed equally to the manuscript.

Linker-scanning libraries were generated within the 3′ terminus of the Moloney murine leukemia virus (M-MuLV) pol gene encoding the connection-RNase H domains of reverse transcriptase (RT) as well as the structurally related M-MuLV and human immunodeficiency virus type 1 (HIV-1) integrase (IN) proteins. Mutations within the M-MuLV proviral vectors were Tn7 based and resulted in 15-bp insertions. Mutations within an HIV-1 IN bacterial expression vector were based on Tn5 and resulted in 57-bp insertions. The effects of the insertions were examined in vivo (M-MuLV) and in vitro (HIV-1). A total of 178 individual M-MuLV constructs were analyzed; 40 in-frame insertions within RT connection-RNase H, 108 in-frame insertions within IN, 13 insertions encoding stop codons within RNase H, and 17 insertions encoding stop codons within IN. For HIV-1 IN, 56 mutants were analyzed. In both M-MuLV and HIV-1 IN, regions are identified which functionally tolerate multiple-linker insertions. For MuLV, these correspond to the RT-IN proteolytic junction, the junction between the IN core and C terminus, and the C terminus of IN. For HIV-1 IN, in addition to the junction between the IN core and C terminus and the C terminus of IN, insertions between the N terminus and core domains maintained integration and disintegration activity. Of the 40 in-frame insertions within the M-MuLV RT connection-RNase H domains, only the three C-terminal insertions mapping to the RT-IN proteolytic junction were viable. These results correlate with deletion studies mapping the domain and subdomain boundaries of RT and IN. Importantly, these genetic footprints provide a means to identify nonessential regions within RT and IN for targeted gene therapy applications.

Methods have been developed for the comprehensive analysis of a gene by construction of a saturating or near-saturating library of mutants (5, 78, 83). This approach has defined domain boundaries, provided functional maps, and given insights into previously predicted unstructured loops (4, 5, 50, 71, 73, 78, 83). In this report, this method of insertional functional mapping is applied to three catalytically related domains: the Moloney murine leukemia virus (M-MuLV) RNase H domain of the reverse transcriptase (RT), and the M-MuLV and human immunodeficiency virus type 1 (HIV-1) integrase (IN) proteins. Inclusion of the HIV-1 IN protein assisted comparison and model building, since structural information is available (7, 18, 26, 34, 35, 37, 87). In the retroviral life cycle, the RNase H activity is required for viral replication during the conversion of the viral RNA (vRNA) into double-stranded DNA through the RNA-DNA intermediate. The IN protein is required for the insertion of the double-stranded DNA into the host chromosome, establishing the integrated provirus.

The replication and integration of retroviral particles are two distinct yet interrelated processes. Replicative complexes and preintegrative complexes have been purified and characterized from infected cells (6, 9, 17, 28–30, 39, 52, 53, 55, 56, 66, 67). Within viral species as well as between viral species, the composition of replicative complexes differs from that of preintegrative complexes. Interactions between RT and IN are also reported (40, 69, 89, 90, 96), and multiple mutations of IN are known to alter viral replication (27, 58–60). Despite extensive efforts, the assembly of these complexes is not well understood. These studies have been assisted by structural studies. A structure of the M-MuLV RT has recently been reported (21), as have structures of related retroviral IN subdomains (8, 14, 18, 19, 26, 35, 43, 87, 94). However, to date, neither a structure of a complete retroviral IN protein nor one of a subdomain in complex with DNA has been obtained. The ability of retroviral particles to stably integrate into the host genome is a great benefit for gene delivery, but the potential for insertional mutagenesis cannot be overlooked (15, 22, 38, 63). Schemes to target integration into alternative positions within the host chromosome to avoid this issue frequently involve generation of fusion proteins with novel targeting domains (10, 48, 84). The linker insertion genetic footprint provides a means to identify nonessential regions


within proteins capable of withstanding insertions. Extending these studies to include the RNase H domain provides a parallel analysis of a protein containing a related catalytic core consisting of an acidic catalytic triad. In this report, the 3′ terminus of the M-MuLV pol gene and the HIV-1 IN gene were subjected to random insertion mutagenesis. Individual constructs, selected from the library, were assayed for the effects on virus viability in vivo or IN functions in vitro. Using this complementary approach, four regions functionally tolerant of insertions were identified within RNase H-IN. These regions correlate with domain and protein junctions. No viable linker insertions were identified within any nonstructured regions of connection-RNase H.

MATERIALS AND METHODS

Generation of plasmids. Construction and analysis of pNCA-C, a viable, replication-competent M-MuLV proviral construct, have been previously described (31). The pNCA-C-XN-SU8 M-MuLV proviral construct was derived from pNCA-C (74, 80). This contains a NotI linker within the XbaI site at the 3′ terminus of the M-MuLV pol gene, yielding a 23-amino-acid C-terminal truncation of the MuLV IN protein plus a suppressor tRNA in the 3′ long terminal repeat (LTR) (SU8) (57). The 3′ terminal two-thirds of the pol gene was subcloned into a minimal plasmid backbone for mutagenesis. The amp gene and the origin of replication of pGEM-3Zf(+) (Promega) were PCR amplified using primer Hpatag/45314 (5′-GCCGTTAACACATGTGAGCAAAAGGCC-3′) and primer Bamtag/45315 (5′-CGGGATCCTTGAAAAAGGAAGAGTATG-3′) with a mixture of 5 U Taq DNA pol (Invitrogen) and 2.5 U cloned Pfu (Stratagene). The 1.7-kb PCR product was digested with BamHI and HpaI and ligated to the 2.3-kb BamHI/HpaI fragment from pNCA-C-XN-SU8. The resulting plasmid, pGEM-BH-XN-BH, was used for mutagenesis. To facilitate the reconstruction of the insertional library into pNCA-C-XN-SU8, a deletion within MuLV IN was generated.
pNCA-C-XN-SU8 was partially digested with XmnI, and the linear product was isolated and digested with PmlI. The 10-kb DNA fragment was isolated and ligated to yield pNCA-C-XN-SU8-ΔIN. This deletion within IN maintains the SalI and NotI sites required for reconstruction of the library. pNCA-C-XN-SU8-ΔIN and pGEM-BH-XN-BH plasmids were generated in HB101 Escherichia coli cells.

Mutagenesis. M-MuLV mutagenesis was performed on pGEM-BH-XN-BH (80 ng) using the GPS-LS linker scanning system kit (NEB). The method is based on random Tn7 transposition (5) introducing the chloramphenicol resistance gene (Cm^r). DNA was introduced into ElectroMAX DH10B cells (Gibco BRL) by electroporation. Chloramphenicol-resistant colonies (10^5) were selected on one 245- by 245-mm plate. Colonies were scraped off the plate and pooled; the mutagenized pGEM-BH-BH-chlor plasmids were isolated (Midi protocol; QIAGEN) and maintained as a library. This initial library was digested with PmeI to remove the chloramphenicol resistance gene, ligated, and electroporated into ElectroMAX DH10B cells (Gibco BRL). Ampicillin-resistant colonies were pooled and lysed to isolate the pGEM-BH-XN-BH-15 constructs, which contained the 15-bp linker insertion encoding a PmeI site.

The EZ:TN in-frame linker insertion kit (Epicenter Biotechnologies) was used to generate a library of mutants of HIV-1 IN with 19-amino-acid insertions within the target plasmid pINSD.His (NIH AIDS Research and Reference Reagent Program) by following the manufacturer’s protocols. Mutants were screened using a PCR-based strategy. The pinsdBscreen primer (5′-CGG GCT TTG TTA GCA GCC GG-3′) and pinsdFscreen primer (5′-GGT GCC GCG CGG CAG CC-3′), which anneal to nucleotide positions 301 to 320 and 335 to 351 of the pET15b plasmid, respectively, were used to amplify the HIV-1 IN sequence.
The PCR mixtures contained 1× PCR buffer from the Expand Long template PCR system (Boehringer Mannheim), 2.25 mM MgSO4, 0.2 mM deoxynucleoside triphosphate, 2 μM primers, 2 U Taq polymerase, and a toothpick trace of the glycerol stocks stored at −80°C. PCR conditions were as follows: 94°C for 4 min, followed by 35 cycles of denaturing at 94°C for 30 s, annealing at 69°C for 30 s, and an extension step at 72°C for 2 min and 15 s. This cycle was followed by a final extension period at 72°C for 4 min, which was followed by a hold at 4°C. After PCR, the samples were loaded onto a 1.5% agarose gel and examined to determine which of the clones were positive for linker insertion within the IN gene. Clones with insertions were individually digested with NotI according to the manufacturer’s recommendations.
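The cycling conditions above can be written out as a small data structure, which makes the programmed run time easy to check. This is an illustrative sketch only: the variable and function names are hypothetical, and instrument ramp times and the final 4°C hold are not included.

```python
# Thermocycler program transcribed from the screening PCR conditions in the text.
# Ramp times between steps depend on the instrument and are NOT modeled here.
INITIAL_DENATURATION = (94, 4 * 60)          # (temperature in deg C, seconds)
CYCLE_STEPS = [
    ("denature", 94, 30),
    ("anneal", 69, 30),
    ("extend", 72, 2 * 60 + 15),
]
N_CYCLES = 35
FINAL_EXTENSION = (72, 4 * 60)


def programmed_hold_seconds():
    """Sum of all programmed hold times, excluding ramps and the 4 deg C hold."""
    per_cycle = sum(seconds for _, _, seconds in CYCLE_STEPS)
    return INITIAL_DENATURATION[1] + N_CYCLES * per_cycle + FINAL_EXTENSION[1]


total = programmed_hold_seconds()
print(f"programmed hold time: {total} s (about {total // 60} min)")
```

Under these assumptions the programmed holds add up to roughly two hours per plate, which is the practical unit of work for the 96-well screening format described in the Results.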

Reconstruction of library into MuLV provirus. The 15-bp insertion library was reconstructed into the pNCA-C-XN-SU8 provirus backbone. The SalI-NotI 2,060-bp fragment from the pGEM-BH-XN-BH-15 library was exchanged into the pNCA-C-XN-SU8-ΔIN, which was digested with the same enzymes. Library DNA was introduced into chemically competent UltraMAX DH5α-FT (Tet^r) (Gibco BRL) and maintained on one 245- by 245-mm plate. Since the Tn7 transposition was performed on the BamHI-HpaI 2,281-bp region of MuLV within pGEM-BH-XN-BH, the possibility remained that the insertions occurred outside the SalI-NotI fragment utilized in the provirus reconstruction. To eliminate constructs in which the wild-type coding sequence was transferred, the library DNA was digested with PmeI, and the linear DNA was isolated and ligated to generate the final mutant library. This selected for pNCA-C-XN-SU8 plasmids bearing a PmeI linker insertion.

Single isolate mapping with MuLV. The position of the PmeI sites within MuLV (pNCA-C-XN-SU8) was determined by size analysis of the SalI/PmeI fragment released from the individual mutated plasmid library isolates. Automated sequencing was performed in the DNA core facility of Robert Wood Johnson Medical School (UMDNJ) using an appropriate primer determined after restriction mapping. Alternatively, individual colonies were directly sequenced with primers spanning the MuLV RT/IN coding region to identify PmeI-containing sequences. Approximately 750 individual colonies were isolated and screened for insertions. DNA sequencing of HIV-1 clones was performed with the ABI PRISM BigDye Primer v3.0 cycle sequencing ready reaction kit with AmpliTaq DNA polymerase, FS (Applied Biosystems, Foster City, CA) to determine the site of the 19-codon insertion. Sequence data were analyzed with VectorNTI from InforMax Inc. (Frederick, MD).

Cell culture.
The generation and maintenance of canine D17 cells expressing MCAT, the receptor for ecotropic M-MuLV (pJET) (1), has been previously described (68). Individual PmeI-encoding MuLV proviral constructs (100 ng each) from the final library were transiently introduced to 2 × 10^4 D17/pJET (15 mm wells) in the presence of 150 μg/ml DEAE-dextran (64). Upon confluence, supernatants were collected and cells were passed to six-well (60 mm) plates for maintenance. Supernatants were collected on all subsequent days of confluence and assayed for RT activity (33). Viral DNA was isolated from RT-positive cultures using the method of Hirt (41).

PCR of MuLV viral DNA. Unintegrated MuLV viral DNA (41) was isolated from D17/pJET cells and used as a template for PCR in the presence of 100 pmol of primers JR6325L (5′-CAGTACTGACCCCTCTGAGCATC-3′) and JR4085R (5′-ATCAAGCAAGCTCTTCTAACTGCC-3′) using a mixture of Taq DNA pol (5 U; Invitrogen) and cloned Pfu (2.5 U; Stratagene). The amplified 2.2-kb product (bp 4085 to bp 6325 in the pNCA-C-XN-SU8 parental vector) was isolated from a 1% agarose gel using the QIAquick gel extraction kit protocol (QIAGEN) and subjected to automated DNA sequencing.

Expression of the M-MuLV IN C terminus. A directional deletion analysis was performed to select for a stable MuLV IN C terminus expression construct. The His6-thrombin-WTIN construct within a pET vector was digested with SphI and subjected to Bal31 digestion for five time points between 5 and 30 min. The DNA was digested with PstI, and the deletion fragments between 1 and 1.8 kbp were gel isolated. The pIN1-105 plasmid contains the His6-thrombin-leader followed by the IN 1-105 expressed downstream of an NdeI site (CΔ303) (91). The pIN1-105 was digested with NdeI and blunt ended by filling in with Klenow polymerase. After PstI digestion, the 4,568-bp fragment was isolated and ligated with the Bal31 deletion fragments.
Individual colonies (total of 93) were analyzed for the size of the deletion, and 20 were further selected for protein expression in E. coli BL21(DE3). Isolate 77 was subjected to DNA sequence analysis.

Expression and purification of HIV IN and mutants. Wild-type HIV-1 IN and insertion mutants were expressed in E. coli BL21(DE3) cells in 50 ml of medium and purified as hexahistidine-tagged fusion proteins as described previously (88). Purification from 50-ml cultures yielded approximately 2 mg of 90 to 95% homogeneous protein. The protein fraction refolded at a concentration of 5 mg/ml exhibited the greatest enzymatic activity. HIV-1 IN precipitated upon addition of buffer C (0.2 M NaCl). The precipitated protein was resuspended in buffer D (0.5 M NaCl) to a final concentration of 1 mg/ml.

In vitro integration and disintegration assays of HIV-1 IN. Strand transfer and disintegration reactions were performed as described previously (88). Reaction products were separated on a 20% polyacrylamide denaturing gel and subjected to autoradiography or PhosphorImager screens (Molecular Dynamics). Products were quantified with ImageQuant software (Molecular Dynamics). Oligonucleotides were purified on 20% denaturing polyacrylamide gels, 32P labeled at the 5′ end with T4 polynucleotide kinase, and hybridized to complementary strands as previously described (47). Unincorporated radioactivity was removed from labeled integration and disintegration substrates with G-25 or G-50 Quick Spin columns (Boehringer Mannheim, IN).


FIG. 1. (A) Generation of the M-MuLV pol insertional library. The seven steps required to generate the insertional library within the retroviral proviral construct are outlined. The top figure schematically outlines the pNCA-C XN construct, containing the viral LTR and gag, pol, and env genes. The region of the pol gene encoding the RT, connection (C), and RNase H (R) domains and the IN protein subjected to Tn7 insertional mutagenesis (GPS-LS linker scanning system; NEB) are expanded. Restriction sites utilized and their positions within the M-MuLV viral RNA (82), where appropriate, are BamHI (B, 3535), SalI (S, 3705), XbaI (X, 5766), NotI (N), HpaI (H, 5816), and PmeI (P). (B) Generation of the HIV-1 insertional library. The five steps required to generate the insertional library are outlined. The region of the IN gene encoding the protein was subjected to Tn5 insertional mutagenesis, which contains the kanamycin resistance gene between its short 19-bp mosaic end (ME) Tn5 transposase recognition sequences. NotI restriction sites flanking the ME also are shown.
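The map coordinates given in the Fig. 1 legend can be used to check the fragment sizes quoted in the text (the 2,281-bp BamHI-HpaI target, the roughly 2,060-bp SalI-NotI exchange fragment, and the 170-bp and 50-bp flanks). The sketch below is illustrative only: the function names are hypothetical, the NotI position is approximated by the XbaI coordinate because the NotI linker sits in the XbaI site, and the predicted SalI/PmeI fragment size ignores exact cut positions and the 15-bp linker.

```python
# Restriction-site coordinates on the M-MuLV viral RNA, as listed in the Fig. 1 legend.
SITES = {"BamHI": 3535, "SalI": 3705, "XbaI": 5766, "HpaI": 5816}
# Assumption: the NotI linker lies within the XbaI site, so reuse that coordinate.
SITES["NotI"] = SITES["XbaI"]


def span_bp(left, right):
    """Distance in base pairs between two named sites."""
    return SITES[right] - SITES[left]


def approx_sal_pme_fragment(insertion_position):
    """Approximate size of the SalI/PmeI fragment for an insertion at a vRNA position."""
    return insertion_position - SITES["SalI"]


target = span_bp("BamHI", "HpaI")       # region exposed to Tn7 transposition
exchanged = span_bp("SalI", "NotI")     # fragment moved back into the provirus
left_flank = span_bp("BamHI", "SalI")
right_flank = span_bp("NotI", "HpaI")
outside_fraction = (left_flank + right_flank) / target

print(target, exchanged, left_flank, right_flank, round(outside_fraction, 3))
print(approx_sal_pme_fragment(5450))    # example: insertion in5450 from Table 1
```

On these coordinates a little under 10% of the mutagenized target lies outside the exchanged fragment, which is the population of potential wild-type transfers that the PmeI linearization step is designed to remove.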

Molecular modeling of MuLV RT. A three-dimensional model of the M-MuLV RT was reconstructed using the 1RW3 crystal structure comprising the fingers, palm, thumb, and connection domains (21) and the preliminary RNase H ΔC crystal structure (54; Wayne Hendrickson, personal communication). A crude full model was generated using the O program (45) by positioning the RNase H ΔC domain into the diffuse electron density observed in the 1RW3 structure. Nonstructured regions were molecularly modeled using the O program and subjected to energy minimization using the AMBER suite of programs (70). Regions reconstructed include residues 327 to 334 (at the tip of the thumb), residues 475 to 504 (between connection and RNase H), and residues 592 to 603 and 633 to 642 (RNase H). Additional modification included insertions of the C-helix from E. coli RNase H (1G15) (32) between the B- and D-helices of the M-MuLV RNase H domain, mutating the residues to the correct M-MuLV residues and subjecting the final model to energy minimization. The corresponding figures were generated using MOLSCRIPT (49) and Raster3D (65).

Structural model of HIV-1 IN monomer. The structural model of the HIV-1 IN monomer was constructed from a combination of two X-ray crystal structures, represented by PDB codes 1k6y (the two-domain finger/core) and 1ex4 (the two-domain core/C terminus). The “A” molecule core region of 1k6y was superimposed onto the “A” molecule core region of 1ex4 using the program O. The 1k6y structure is comprised of residues 1 to 46, 56 to 139, and 149 to 210; and 1ex4 is comprised of residues 56 to 141 and 145 to 270. Thus, the superpositioning consisted of overlaying the Cα atoms of all common core residues (root mean square deviation, 0.83 angstroms). Where the model contained disordered regions (residues 47 to 55 and 142 to 144, inclusive), polyalanine segments containing the correct number of amino acids were created and moved into the appropriate linking positions in the model.
The Ala residues were then changed to the proper residues, and the regions were subjected to least-squares minimization. Similarly, residues 271 to 288 (absent from 1ex4) were created using a polyalanine chain, mutating to the appropriate amino acid residues, followed by energy minimization.
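The residue bookkeeping behind this model can be reproduced with simple set arithmetic. The sketch below takes only the residue ranges quoted above for 1k6y and 1ex4 and derives the shared core residues used for the superposition and the segments that had to be built. The helper names are hypothetical, and the protein length of 288 is taken from the last residue the text says was modeled.

```python
def residues(*ranges):
    """Expand inclusive (first, last) residue ranges into a set of residue numbers."""
    return {n for first, last in ranges for n in range(first, last + 1)}


def to_ranges(numbers):
    """Collapse a set of residue numbers into sorted inclusive (first, last) ranges."""
    out = []
    for n in sorted(numbers):
        if out and n == out[-1][1] + 1:
            out[-1][1] = n          # extend the current run
        else:
            out.append([n, n])      # start a new run
    return [tuple(r) for r in out]


IN_LENGTH = 288  # assumption: last residue built in the model (residues 271 to 288)
in_1k6y = residues((1, 46), (56, 139), (149, 210))
in_1ex4 = residues((56, 141), (145, 270))

common_core = in_1k6y & in_1ex4                          # residues available for superposition
unresolved = residues((1, IN_LENGTH)) - (in_1k6y | in_1ex4)  # residues seen in neither structure

print(to_ranges(common_core), len(common_core))
print(to_ranges(unresolved))
```

The unresolved set comes out as the three segments named in the text (47 to 55, 142 to 144, and 271 to 288), which is a useful consistency check on the quoted ranges.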

RESULTS

Figure 1 outlines the series of steps used to generate the linker insertion library within the M-MuLV proviral construct pNCA-C-XN-SU8 (panel A) and the HIV-1 IN expression construct (panel B). For M-MuLV, the target fragment encoding the 3′ terminal two-thirds of the pol gene (2.3 kb BamHI/HpaI fragment) was first subcloned into a minimal plasmid encoding ori/amp, generating pGEM-BH-XN-BH. The Tn7 mutagenesis system results in the random insertion of the transposon encoding the chloramphenicol resistance gene throughout the plasmid. The use of a minimal plasmid biases the nonessential regions to be within the target viral insert. With the target sequence 2,281 bp in size, the generation of 10^5 Cm^r colonies was indicative of an extensive mutational library. The colonies were pooled, and the plasmid DNA was isolated as a population and digested with PmeI to remove the chloramphenicol resistance gene. After ligation, the remainder of the Tn7 element reconstitutes a 15-bp linker insertion encoding a PmeI site. This population of 5-amino-acid insertions was selected for Amp^r, colonies were pooled, and the plasmid DNA was isolated as a population. Reconstruction of the library back into a retroviral backbone utilized a proviral construct bearing a deletion of the IN gene, decreasing the possibility of wild-type (WT) sequences within the library. Reconstruction was facilitated by the presence of a


FIG. 2. Functional mapping of the linker insertions on the MuLV pol gene products. The figure summarizes the linker insertions and their effects on retroviral viability. E, nonviable termination inserts; F, viable termination inserts; ƒ, nonviable in-frame insertions; , viable in-frame insertions. Asterisks (*) indicate linker insertions previously characterized (74, 79). The insertion marked with a plus sign was temperature sensitive for replication and integration. Amino acid numbering within RT and IN are indicated at the left and right edges. The protease cleavage site marking the junction between MuLV RT and IN is indicated above the sequence. MuLV RT aa 515 is marked, indicating the N terminus of the domain homologous to E. coli RNase H (93). The HHCC N-terminal domain of MuLV IN corresponds to IN1-105 (91). The position of T287 of MuLV IN is indicated, marking the N terminus of the MuLV IN C-terminal domain (Fig. 7). The coding region subjected to mutagenesis in this study includes the sequence N′-DEKQ. . . .GGPS-C′.

unique NotI site introduced at an XbaI site at the C terminus of IN (75). This mutation truncates the C-terminal 23 amino acids of MuLV IN and maintains virus viability (75). The PmeI-bearing pGEM-BH-XN-BH plasmid library was digested with the unique restriction enzymes SalI and NotI, and the library was regenerated by fragment exchange into the pNCA-C-XN-SU8 proviral backbone. With this approach, it is possible that a small number of WT sequences could be transferred if the initial transposition occurred either within the 170-bp region between the BamHI and SalI sites at the 5′ end or within the 50-bp region between the NotI and HpaI sites at the 3′ end. To eliminate these particular constructs from the library, the reconstructed pNCA-C-XN-SU8-PmeI library was digested with PmeI, and the linear DNA was isolated, religated, and transformed back into E. coli to generate the final library. WT MuLV does not encode a PmeI site and would be eliminated during this process.

The Tn5 mutagenesis system was used to create a library of mutants within HIV-1 IN. The generation of over 2,000 Kan^r colonies was indicative of a large-scale mutational library. To make the screening process high throughput, each individual colony was picked and transferred into 96-well culture blocks, and PCR-based screening of insertions was conducted. In total, 1,056 colonies were analyzed for the presence of the Tn5 transposon insertion. One hundred eleven clones were positive for having one insertion within the HIV-1 IN gene; of these, 56 were unique.

Insertion sites of M-MuLV and HIV-1 individual isolates. The final pNCA-C-XN-SU8 PmeI insertion library for M-MuLV was characterized by analyzing individual isolates. Isolates of the final library were subjected to restriction mapping and sequencing analysis (summarized in Fig. 2 and Tables 1 and 2). The 15-bp insertion generated by the linker scanning

system resulted in a 5-amino-acid insertion in 4/6 reading frames and a TAA stop codon in 2/6 reading frames. Sequencing and restriction mapping of isolates from the library demonstrated that insertions were distributed throughout the fragment. The insertion sites, however, were not randomly distributed, with clustering of insertions within the center of the fragment. This could indicate a preference for a specific structure within the plasmid DNA by the transposase enzyme or reflect an inadequate sampling of the large population of constructs within the library. However, within the population examined, a large number of duplicate isolates were identified, indicating that the sample size was representative. A total of 148 in-frame insertions were identified. The ratio of in-frame inserts to those with stop codons was as predicted. In the initial screen of 80 individual colonies analyzed, 67 had unique insertion sites that could be readily sequenced. Approximately 2/5 (29/67) individual constructs had mutations that resulted in stop codons; 37 constructs resulted in the insertion of 5 amino acids. One isolate bore a deletion of 20 essential amino acids within the core region of MuLV IN. The composition of 5-amino-acid inserts is determined by the target site selected and duplicated during the transposition process as well as the sequences encoding the PmeI restriction site. Depending upon the reading frame, the in-frame insertions will encode a core of either CLN or FKQ/H (Table 1). The insertions are therefore not simple aliphatic side chains but contain bulky and often reactive or charged species. Similarly, the TAA stop codon cannot be avoided, as it encodes the core of the PmeI site (GTTTAAAC). Of the 148 in-frame insertions, 40 were within MuLV RT: 10 within the connection region and 30 within RNase H. The remaining 108 in-frame insertions mapped within the MuLV IN protein; 45 mapped to the N-terminal zinc-binding

TABLE 1. Summary of in-frame insertions

MuLV(a)  aa(b)     Insert(c)  RT(d)
3815     RT/W406   WCLNRWP    -
3866     RT/A423   ACLNNAG    -
3879     RT/T427   TMFKQTM    -
3899     RT/I434   ICLNIIL    -
3903     RT/L435   LVFKHLA    -
3945     RT/D449   DLFKHDR    -
3947     RT/R450   RCLNNRW    -
3948     RT/R450   RLFKHRW    -
3965     RT/R456   RCLNTRM    -
4023     RT/V475   VVFKQVV    -
4100     RT/E501   ECLNTEA    -
4113     RT/T505   TLFKQTR    -
4149     RT/A517   AVFKHAD    -
4275     RT/Q559   QLFKHQR    -
4299     RT/T567   TLFKHTQ    -
4311     RT/K571   KMFKQKM    -
4313     RT/M572   MCLNKMA    -
4314     RT/M572   MVFKQMA    -
4320     RT/E574   EVFKQEG    -
4343     RT/T582   TCLNNTD    -
4346     RT/D583   DCLNTDS    -
4353     RT/R585   RMFKHRY    -
4362     RT/F588   FVFKHFA    -
4367     RT/T590   TCLNTTA    -
4394     RT/R600   RCLNNRR    -
4458     RT/L620   LLFKHLL    -
4460     RT/L621   LCLNILK    -
4461     RT/L621   LMFKQLK    -
4463     RT/K622   KCLNIKA    -
4464     RT/K622   KVFKQKA    -
4473     RT/F625   FLFKHFL    -
4503     RT/C635   CLFKHCP    -
4518     RT/K640   KVFKQKG    -
4520     RT/G641   GCLNKGH    -
4538     RT/R647   RCLNTRG    -
4541     RT/G648   GCLNRGN    -
4559     RT/Q654   QCLNNQA    -
4583     RT/T662   TCLNITE    +
4604     RT/T669   TCLNTTL    +
4607     RT/L670   LCLNTLL    +
4614     IN/I1     IVFKHIE    -
4628     IN/P6     PCLNTPY    +
4629     IN/P6     PLFKQPY    +
4638     IN/S9     SVFKHSE    +
4640     IN/E10    ECLNTEH    +
4641     IN/E10    ELFKQEH    +
4646     IN/F12    FCLNNFH    +
4647     IN/F12    FLFKHFH    +
4650     IN/H13    HLFKHHY    +
4661     IN/R17    TCLNMTD    -
4671     IN/K20    KVFKQKD    -
4686     IN/L25    LVFKQLG    -
4694     IN/I28    ICLNNIY    -
4695     IN/I28    ILFKHIY    -
4697     IN/Y29    YCLNIYD    -
4698     IN/Y29    YVFKHYD    -
4707     IN/T32    TMFKQTK    -
4724     IN/Y38    YCLNIYQ    -
4728     IN/Q39    QVFKHQG    -
4730     IN/G40    GCLNKGK    -
4733     IN/K41    KCLNRKP    -
4736     IN/P42    PCLNKPV    -
4737     IN/P42    PVFKQPV    -
4740     IN/M44    MFKHVNP    -
4743     IN/M44    MLFKQMP    -
4746     IN/P45    PVFKQPD    -
4755     IN/F48    FMFKHFT    -
4761     IN/F50    FVFKHFE    -
4767     IN/L52    LLFKQLL    -
4775     IN/F55    FCLNNFL    -
4779     IN/L56    LLFKHLH    -
4785     IN/Q58    QLFKHQL    -
4788     IN/L59    LMFKQLT    -
4790     IN/T60    TCLNMTH    -
4797     IN/L62    LMFKHLS    -
4821     IN/L70    LLFKHLL    -
4823     IN/L71    LCLNILE    -
4832     IN/L74    SCLNRSH    -
4845     IN/Y78    YLFKHYY    -
4851     IN/M80    MLFKHML    -
4854     IN/L81    LMFKQLN    -
4856     IN/L82    NCLNMNR    -
4865     IN/R85    RCLNNRT    -
4875     IN/K88    KMFKHKN    -
4895     IN/K95    KCLNSKA    -
4902     IN/C97    CVFKHCA    -
4911     IN/V100   VMFKQVN    -
4917     IN/A102   AMFKHAS    -
4923     IN/K104   KLFKHKS    -
4934     IN/K108   KCLNIKQ    -
4947     IN/R112   RVFKHRV    -
4953     IN/R114   RVFKHRG    -
4967     IN/R119   GCLNNGT    -
4998     IN/I129   IMFKQIK    -
5000     IN/K130   KCLNIKP    -
5009     IN/L133   LCLNRLY    -
5010     IN/L133   LLFKQLY    -
5012     IN/Y134   YCLNMYG    -
5013     IN/Y134   YVFKQYG    -
5018     IN/Y136   YCLNSYK    -
5025     IN/Y138   YLFKQYL    -
5045     IN/T145   TCLNNTF    -
5049     IN/F146   FLFKHFS    -
5057     IN/W149   WCLNSWI    -
5064     IN/E151   EVFKQEA    -
5132     IN/R174   RCLNTRF    -
5138     IN/G176   GCLNIGM    -
5165     IN/N185   NCLNNNG    -
5174     IN/A188   ACLNTAF    -
5175     IN/A188   AFFKHAF    -
5189     IN/V193   VCLNKVS    -
5217     IN/G202   GMFKQGI    -
5243     IN/Y211   YCLNTYR    -
5244     IN/Y211   YMFKQYR    -
5250     IN/P213   PLFKQPQ    -
5253     IN/Q214   QMFKHQS    -
5255     IN/S215   SCLNKSS    -
5256     IN/S215   SLFKQSS    -
5261     IN/G217   GCLNTGQ    -
5273     IN/R221   RCLNKRM    -
5274     IN/R221   RMFKQRM    -
5276     IN/M222   MCLNRMN    -
5279     IN/N223   NCLNMNR    -
5282     IN/R224   RCLNNRT    -
5289     IN/I226   IMFKHIK    -
5291     IN/K227   KCLNIKE    -
5292     IN/K227   KVFKHKE    -
5295     IN/E228   EMFKQET    -
5298     IN/T229   TLFKQTL    -
5300     IN/L230   LCLNTLT    -
5309     IN/L233   LCLNKLT    -
5310     IN/L233   LMFKQLT    -
5327     IN/S239   SCLNSSR    -
5330     IN/R240   RCLNTRD    -
5331     IN/R240   RVFKHRD    -
5354     IN/L248   LCLNTLA    -
5355     IN/L248   LVFKHLA    -
5390     IN/H260   HCLNTHG    -
5435     IN/L275   LCLNTLV    -
5450     IN/D280   DCLNTDP    +
5465     IN/R285   RCLNTRV    +
5487     IN/L292   LLFKHLQ    +
5535     IN/R308   RLFKQRP    -
5574     IN/V322   VCLNTVV    -
5634     IN/K341   KMFKHKN    -
5654     IN/K348   KCLNRKG    -
5673     IN/L354   LLFKHLL    -
5742     IN/A377   AVFKQAA    +

a Position of insertion, based on the vRNA sequence (82).
b Amino acid position within RT or IN, as indicated, N-terminal to the insertion.
c Sequences of the 5-aa inserts are shown in bold. The first and last amino acids are encoded by the WT pol gene product.
d +, positive, or -, negative, for reverse transcriptase activity released into the media after transient introduction of the MuLV provirus in D17/pJET cells (see Materials and Methods).
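The reading-frame dependence of the insert sequences in Table 1 can be illustrated with a short sketch. It assumes that the transposon-derived portion of the linker reads TGTTTAAACA, that is, the PmeI site GTTTAAAC named in the text plus one flanking base on each side. The flanking bases are an inference from the CLN, FKQ/H, and V* cores seen in Tables 1 and 2, not a sequence stated in the text. Unknown target-derived bases are written as N and translate to X.

```python
# Standard genetic code in compact TCAG order.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}

# Assumption: PmeI site GTTTAAAC plus inferred flanking T and A (see lead-in).
LINKER_CORE = "TGTTTAAACA"


def reverse_complement(seq):
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def translate(seq):
    """Translate codon by codon; any codon containing N becomes X."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq), 3))


def frame_products(core):
    """Peptide cores for the three possible positions of the linker within a codon."""
    products = []
    for offset in range(3):
        padded = "N" * offset + core
        padded += "N" * (-len(padded) % 3)   # pad to a whole number of codons
        products.append(translate(padded))
    return products


orientations = {LINKER_CORE, reverse_complement(LINKER_CORE)}
products = frame_products(LINKER_CORE)
stop_frames = sum("*" in p for p in products)
print(products, f"{stop_frames} of 3 frames contain a stop")
```

Because the assumed core is its own reverse complement, both insertion orientations give the same three outcomes, which is consistent with the 4/6 in-frame and 2/6 stop-codon proportions stated in the text.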


TABLE 2. Summary of terminations

MuLV(a)  aa(b)     Insert(c)  RT(d)
3730     RT/A377   ANV*TAK    -
4099     RT/A500   ADV*TAE    -
4159     RT/T520   TCV*TTW    -
4312     RT/K571   KIV*TKM    -
4318     RT/A573   ADV*TAE    -
4363     RT/F588   FAV*TFA    -
4369     RT/T590   TAV*TTA    -
4387     RT/E596   EIV*TEI    -
4411     RT/L604   LTV*TLT    -
4462     RT/L621   LNV*TLK    -
4474     RT/F625   FLV*TFL    -
4483     RT/K628   KSV*TKR    -
4544     RT/G648   GNV*TGN    -
4627     IN/S5     SPV*TSP    -
4672     IN/K20    KDV*TKD    -
4687     IN/L25    LGV*TLG    -
4726     IN/Y38    YHV*TYQ    -
4867     IN/R85    RTV*TRT    -
4876     IN/K88    KNV*TKN    -
4999     IN/I129   INV*TIK    -
5005     IN/P131   PGV*TPG    -
5011     IN/L133   LYV*TLY    -
5020     IN/Y136   YNV*TYK    -
5104     IN/T164   TNV*TTK    -
5257     IN/S215   SSV*TSS    -
5347     IN/L245   LLV*TLL    -
5458     IN/D282   DIV*TNM    -
5554     IN/Y314   YHV*TYQ    -
5743     IN/A377   AAV*TAA    -
5764     IN/P384   PSV*TPS    +

a Position of insertion, based on the vRNA sequence (82).
b Amino acid position within RT or IN, as indicated, N-terminal to the insertion.
c Sequences of the amino acid inserts are indicated in boldface type. Asterisks indicate the stop termination codons. The first and last amino acids are encoded by the WT pol gene product.
d +, positive, or -, negative, for reverse transcriptase activity released into the media after transient introduction of the MuLV provirus in D17/pJET cells (see Materials and Methods).

domain (amino acids [aa] 1 to 105), 56 to the catalytic core (aa 106 to 286), and 7 to the C-terminal domain (aa 287 to 408). Sequencing of the HIV-1 IN library showed that the insertions were distributed throughout the HIV-1 IN gene (Fig. 3 and Table 3). The insertion sites, however, were not randomly distributed, with clustering of insertions within the C-terminal domain. Of the 111 insertions, 2 were within the N-terminal domain, 35 were within the catalytic core, and 74 were within the C-terminal domain. Of these, 56 clones had unique insertion positions and the correct sequence.

In vivo analysis of individual M-MuLV isolates. The viability of individual viral constructs was tested for the passage of transiently expressed virus in tissue culture. Three series of viral constructs were analyzed. The first consisted of a random mixture of both in-frame and terminating codon insertions spanning the complete target sequence. Since the pol gene is expressed as a precursor protein containing protease, reverse transcriptase, and integrase, it was predicted that termination codons within RT would be lethal, resulting in loss of MuLV IN protein. Twenty-nine termination codon insertions were included for analysis. The second series contained linker insertion mutations that mapped to the MuLV IN protein plus one termination codon at the C terminus of IN, and the third

series included mutations within the MuLV RT connection and RNase H domains. Plasmid DNA of the individual constructs from the final pNCA-C-XN-SU8 insertion library was transiently introduced into D17/pJET cells in the presence of DEAE-dextran. On days of confluence, cells were screened for the release of reverse transcriptase into the supernatant. Figure 4 is an autoradiograph of one RT assay performed on day 16. The insertions were arranged within the 96-well plate in a linear order from the 5⬘ end to the 3⬘ end of the pol gene. Rows A to G contained 84 in-frame insertions; a single termination insertion within the C terminus of IN (in5743. IH) was included. The positive controls, pNCA-C-XN-Su8 (H11) and pNCA-C (H12) are clearly positive for RT activity at this time point. Quite remarkably, two regions of viable insertions are readily detected in this series. The first 10 isolates are all viable. These correspond with the linker insertions initiating at the extreme C terminus of RNase H and spanning into the N terminus of MuLV IN. These include in4583, in4603, in4607, in4628, in4629, in4638, in4640, in4641, in4647, and in4650. This indicates that insertions within the first 14 amino acids of MuLV IN are tolerated as well as the terminal 9 aa of MuLV RNase H (Fig. 2). Within this region, one insertion from a separate assay series was found not to be viable (Table 1, in4614). This insertion is within the protease recognition sequence and results in the substitution of the P2⬘ and P3⬘ position from STLL/ IEN to STLL/IVF. Of considerable interest are the three consecutive insertions (G7 to 9) (Fig. 4) consisting of in5450, in5465, and in5487. These three insertions span a 12-aa region between the core and C-terminal domain. This region has not been previously explored in mutational analyses of IN. Using the homology of MuLV and HIV-1 IN defined by McClure et al. (62), the equivalent region of HIV-1 IN was identified. 
Figure 5A shows the mapping of this region onto the two-domain structure (1EX4) of the HIV-1 IN core-C terminus (18). The region of HIV-1 IN homologous to MuLV IN spanning in5450 to in5487 is shown, as are the HIV-1 IN core domain and C terminus. The region connecting the C terminus and core consists of an extended alpha-helix containing a central bend. The homologous insertion-tolerant region maps within this alpha-helical domain, centered at the bend. The net result of the 5-amino-acid insertion would be to lengthen the distance between the two domains and/or increase the discontinuity of the extended alpha-helix.

A third region of the MuLV IN protein was found to be nonessential. This mapped to the extreme C terminus of MuLV IN. Interestingly, insertion in4742, resulting in the in-frame insertion of AVFKAAA (insertion shown in boldface type), was viable (Fig. 2), whereas the terminator insertion in5743 encoding AA*TAA was nonviable (Fig. 2). These studies more finely define the nonessential region of the C terminus of MuLV IN. Linker insertions and truncation studies that mapped three amino acids upstream were previously reported to be nonviable (Fig. 2), whereas in-frame insertions and truncations mapping 2 amino acids downstream were viable (Fig. 2) (74). Of the terminator insertions, only one, in5764, was viable. This mapped within the region previously identified to be nonessential (74).

The 16 viable viruses identified in this analysis appeared with a time course identical to that of the parental pNCA-C-XN virus. The RT+ virus passaged in this study was isolated and utilized to isolate the unintegrated viral DNA by the method of Hirt (41). The terminal two-thirds of the pol gene was PCR amplified from the viral DNA. This PCR product was sequenced in its entirety. All of the viral constructs maintained the linker insertion sequence encoding the PmeI site. No additional second-site mutations within MuLV IN were identified.

Mutations within MuLV RT-connection-RNase H. Of the 40 linker insertions within the C-terminal half of MuLV RT encoding connection and RNase H, only the three extreme C-terminal in-frame insertions were viable (in4583, in4603, and in4607). These define the sequences at the MuLV RT-IN junction. These results are surprising, as the preliminary X-ray structure of the MuLV RT contains unstructured or flexible loops in several regions within connection-RNase H. Figure 6 shows a molecular model of MuLV RT based upon the structure 1RW3 and the model of the MuLV RNase H domain (54). Gaps in the structure were reconstructed and are indicated, including the regions between amino acids 327 to 334 (thumb), 475 to 504 (joining connection with the RNase H domain), 592 to 603 (RNase H), and 633 to 642 (RNase H). The positions of the linker insertions are mapped onto this MuLV RT model (Fig. 6). Although several of these insertions map within structurally undefined regions, none of the inserts were viable. These nonstructured regions display stringent requirements for correct replication of the virus in vivo.

FIG. 3. Mutation functional map of insertions of HIV IN. Positions of each insertion (indicated by arrow) and their activity (using different color scheme) relative to disintegration (circle) and strand transfer activity (square) are shown in the alignment of HIV-1 and MuLV IN protein. The amino acid sequence alignment of MuLV and HIV-1 IN was based on the method of Johnson et al. (44). Dots indicate alignment gap/insertion. Numbering from the N terminus of MuLV IN includes alignment gaps. The GenBank accession number for MuLV IN sequences is NC_001501. Known structural elements of HIV-1 IN, determined by crystallography of recombinant HIV-1 IN (18, 87), are also shown (bold horizontal lines) above the respective homologous segments. Their PDB accession numbers are 1K6Y and 1EX4, respectively. Core structural elements are labeled with a prime (′); C-terminal elements are labeled with a double prime (″). HHCC and DDE motifs are highlighted in red. Activity is based on the WT activity set to 100%: −, 0%; ±, 0 to 5%; +, 6 to 35%; ++, 36 to 75%; +++, 76 to 100%.

Expression of MuLV IN C-terminal domain. The positioning of the viable insertions between nucleotides 5450 and 5487 is indicative of a domain boundary between the core and C terminus. To confirm this through biochemical means, a deletion analysis of the MuLV IN protein was performed to identify a stably expressed C-terminal domain. Of the 93 deletion constructs generated, 20 with deletions beyond position 5137 of MuLV (82) were further analyzed for protein expression. It was predicted that one in three N-terminal directional deletions would result in an in-frame deletion. However, only one construct reproducibly yielded an abundant, stable MuLV C-terminal IN protein. Figure 7 shows the screening of five individual constructs, where isolate 77 expressed a single 17-kDa protein. DNA sequence analysis identified the N terminus of this protein to be TNSP, corresponding to Thr287 of MuLV IN (marked in Fig. 2). This maps to nucleotide 5471 (82), in the center of the region defined by the linker insertion analysis. Additional studies indicated that the IN 287-to-408 construct (no. 77) could be purified from soluble E. coli extracts by nickel-affinity chromatography (data not shown). These results confirm, using biochemical data, that the boundary between the core and C-terminal domain lies within this region.

In vitro analysis of individual HIV-1 IN mutants. Fifty-six insertional mutant proteins were expressed, purified, and assessed for strand transfer and disintegration activity (Table 3 and Fig. 3). These activities varied and are summarized below based on the position of the mutation within the N-terminal, core, and C-terminal domains of the HIV-1 IN protein.
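The correspondence reported above between vRNA nucleotide 5471 and Thr287 of MuLV IN can serve as an anchor for converting insertion names into approximate IN residue numbers. The sketch below is illustrative only. It assumes that nucleotide 5471 is the first base of the Thr287 codon and that no frame shifts occur between the anchor and the queried position; the function name `in_residue` is hypothetical.

```python
ANCHOR_NT = 5471        # vRNA position reported for the N terminus (Thr287) of isolate 77
ANCHOR_RESIDUE = 287    # MuLV IN residue number of that threonine


def in_residue(nt_position: int) -> int:
    """Approximate MuLV IN residue whose codon contains nt_position.

    Assumption: ANCHOR_NT is codon position 1 of ANCHOR_RESIDUE, so
    residue numbers advance by one every three nucleotides.
    """
    return ANCHOR_RESIDUE + (nt_position - ANCHOR_NT) // 3


if __name__ == "__main__":
    for name in (4650, 5450, 5465, 5487):
        print(f"in{name} -> IN residue ~{in_residue(name)}")
```

Under these assumptions, in5450 and in5487 fall 12 residues apart, in line with the 12-aa region described for the three viable insertions, and in4650 falls within the first 14 residues of IN.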


TABLE 3. Summary of HIV-1 IN insertions

Insertion position(a)  Insertion amino acid sequence(b)  DS(c,e)  ST(d,e)
N27↓L                  LSLVHILRPQDVYKRQDFN               ††       ±
N27↓L                  PVSCTHLAAARCVQETDFN               †††      ±
D55↓C                  LSLVHILRPQDVYKRQQVD               †††      ††
D55↓C                  CLLYTSCGRKMCTRDRQVD               †††      ††
P58↓G                  VSCTHLAAARCVQETDCSP               †††      †††
D64↓C                  SVSCTHLAAARCVQETELD               ±        −
I73↓L                  LSLVHILRPQDVYKRQKVI               −        −
Y83↓I                  TVSCTHLAAARCVQETGGY               −        −
E96↓T                  LSLVHILRPQDVYKRQGQE               −        −
T115↓D                 CLLYTSCGRKMCTRDRVHT               −        −
D116↓N                 TVSCTHLAAARCVQETDTD               −        −
T125↓V                 AVSCTHLAAARCVQETGTT               −        −
A128↓A                 CLLYTSCGRKMCTRDRVKA               −        −
W131↓W                 LSLVHILRPQDVYKRQACW               −        −
I135↓K                 CLLYTSCGRKMCTRDRAGI               ††       −
N144↓P                 LSLVHILRPQDVYKRQPYN               †        −
S147↓Q                 CLLYTSCGRKMCTRDSPQS               ±        −
G149↓V                 AVSCTHLAAARCVQETGQG               −        −
V165↓R                 LSLVHILRPQDVYKRQGQV               −        −
D167↓Q                 CLLYTSCGRKMCTRDRVRD               †        −
A169↓E                 CLLYTSCGRKMCTRDRDQA               −        −
E170↓H                 PVSCTHLAAARCVQETEAE               †        −
K173↓T                 LSLVHILRPQDVYKRQHLK               ±        −
T174↓A                 CLLYTSCGRKMCTRDSLKT               †        −
M178↓A                 LSLVHILRPQDVYKRQVQM               ±        −
A196↓G                 CLLYTSCGRKMCTRDRYSA               ±        −
I200↓V                 CLLYTSCGRKMCTRDRERI               †        −
V201↓D                 AVSCTHLAAARCVQETGIV               †        −
A205↓T                 VSCTHLAAARCVQETDIIA               †        −
D207↓I                 LSLVHILRPQDVYKRQATD               †        −
E212↓L                 SVSCTHLAAARCVQETAKE               ††       ††
R228↓D                 AVSCTHLAAARCVQETDYR               †        −
S230↓R                 SCLLYTSCGRKMCTRDRDS               †        −
A239↓K                 TVSCTHLAAARCVQETGPA               †††      −
K240↓L                 PVSCTHLAAARCVQETAAK               ††       −
W243↓K                 NCLLYTSCGRKMCTRDSLW               ††       −
G245↓E                 CLLYTSCGRKMCTRDSWG                ††       −
G247↓A                 LSLVHILRPQDVYKRQGEG               †††      †††
A248↓V                 CLLYTSCGRKMCTRDSEGA               ††       −
V250↓I                 CLLYTSCGRKMCTRDRAVV               ††       −
V250↓I                 LSLVHILRPQDVYKRQAVV               ††       −
I251↓Q                 PVSCTHLAAARCVQETVVI               †††      −
D253↓N                 LSLVHILRPQDVYKRQIQD               †††      −
S255↓D                 CLLYTSCGRKMCTRDRDNS               †††      −
V259↓V                 CLLYTSCGRKMCTRDSIKV               †††      −
I268↓R                 SCLLYTSCGRKMCTRDRII               †††      ††
R269↓D                 AVSCTHLAAARCVQETVIR               ††       ††
G272↓K                 CLLYTSCGRKMCTRDRDYG               †††      ††
K273↓Q                 PVSCTHLAAARCVQETDGK               †††      ††
D279↓C                 LSLVHILRPQDVYKRQGDD               †††      ††
V281↓A                 CLLYTSCGRKMCTRDSDCV               †††      ††
R284↓D                 PVSCTHLAAARCVQETASR               †††      †††
Q285↓D                 AVSCTHLAAARCVQETGRQ               †††      †††
E287↓D                 AVSCTHLAAARCVQETEDE               †††      †††
D288↓                  CLLYTSCGRKMCTRDRDED               †††      †††

a Positions are based on the protein sequence of HIV-1 IN. Arrows mark the insertion between the two amino acids indicated. All isolates are independent insertions.
b Sequence of the 19-aa insertion.
c Disintegration assay.
d Strand-transfer assay.
e Activity is based on WT IN activity (set at 100%). Symbols: −, 0%; ±, 0 to 5%; †, 6 to 35%; ††, 36 to 75%; †††, 76 to 100%.
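The activity symbols defined in footnote e amount to a binning of percent wild-type activity. A minimal sketch of that binning is given below, with the minus and plus-minus symbols written as − and ±. The function name `activity_symbol` is hypothetical, and the handling of values that fall between the integer ranges of the footnote (for example 5.5%) or above 100% is an assumption made only so that every input receives a symbol.

```python
def activity_symbol(percent_of_wt: float) -> str:
    """Map activity (percent of wild-type IN) to the Table 3 symbols.

    Bins follow footnote e: 0%, 0 to 5%, 6 to 35%, 36 to 75%, 76 to 100%.
    Assumption: fractional values are assigned to the next bin up once
    they exceed a bin's upper bound, and values above 100% count as full.
    """
    if percent_of_wt <= 0:
        return "−"
    if percent_of_wt <= 5:
        return "±"
    if percent_of_wt <= 35:
        return "†"
    if percent_of_wt <= 75:
        return "††"
    return "†††"


if __name__ == "__main__":
    for value in (0, 3, 20, 50, 90):
        print(value, activity_symbol(value))
```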

HIV-1 IN N-terminal domain mutants. The HIV-1 N-terminal domain is made of a three-helix bundle (Fig. 5B). Two insertions were identified at N27↓L, located at one end of the helix bundle in the loop connecting the second (α2) and third (α3) helices. The two mutants were at the same position but had different amino acid sequence insertions. Both of the mutants retained full disintegration activity; however, integration activity was barely detectable. Two additional insertions, D55↓C and P58↓G, fall into the hinge region between the HIV-1 HHCC and core domains. In the two-domain crystal structure, this connecting region (residues 47 to 55) is disordered in all four monomers (87). These two insertions retained full disintegration activity and had moderate to full integration activity.

HIV-1 IN core mutants. In the HIV-1 core domain (aa 50 to 186) (12), all insertions resulted in disruption of strand transfer activity. The requirements for strand transfer are more stringent than those for disintegration, and three regions that displayed low levels of disintegration were identified. These include three insertions (I135↓K, N144↓P, and S147↓Q) located between α3 and α4 and a group of five mutations (D167↓Q through M178↓A) from the end of α4 into α5. Interestingly, six insertions that were distributed within α6′ (A196 to E212) showed a gradient of increasing disintegration activity as one moves toward the C-terminal end of the helix. Of considerable interest, insertion E212↓L maintained nearly full disintegration and integration activity. E212↓L is within the region connecting the C terminus and core, which consists of an extended alpha-helix with a bend at the center.

HIV-1 IN C-terminal domain. In the HIV-1 C-terminal domain, four different regions of activity were identified, and the overall activity of each region increased toward the C terminus. In the first region, between β1″ and β2″, two insertions were identified (R228↓D and S230↓R) with barely detectable disintegration and no integration activity. The second region, which comprises β2″ through β4″, revealed 10 insertion sites that retained a higher level of disintegration than the first region but exhibited no integration activity. Mutant G247↓A, which is just before β3″, was the exception, as it retained full integration and disintegration activity. Interestingly, the two insertions immediately before and after G247 had no integration activity and were decreased in disintegration activity. The third region, which is after β5″ (from I268 to V281), had similar levels of activity in disintegration and retained moderate integration activity compared to wild-type HIV-1 IN. The fourth region, which comprised insertions at R284 to the C terminus, retained full disintegration and integration activity.

Overall summary of HIV and MuLV IN analysis. In summary, four regions retained full integration activity in this complementary in vivo and in vitro study of M-MuLV and HIV-1 IN, respectively. These correspond to the first 14 amino acids of IN (MuLV), the hinge region connecting the N-terminal and core domains (HIV), the region within the α6′ helix connecting the core and C-terminal domains (MuLV and HIV), and the extreme C terminus of IN (MuLV and HIV).

FIG. 4. RT assay. RT assay of 85 individual isolates 16 days after transfection into D17/pJET cells (see Materials and Methods). RT-positive constructs are as follows: A1, in4583-15; A2, in4603-15; A3, in4607-15; A4, in4628-15; A5, in4629-15; A6, in4638-15; A7, in4640-15; A8, in4641-15; A9, in4647-15; A10, in4650-15; G7, in5450-15; G8, in5465-15; G9, in5487-15; H11, XN, parental vector pNCA-C-XN-SU8 (positive control); H12, WT, full-length pNCA-C M-MuLV proviral vector.

FIG. 5. (A) MuLV viable domain mapped onto the HIV-1 core-C terminus structure (1EX4). The 14-amino-acid region in MuLV IN spanning insertions in5450-15 through in5487-15 (DPDMTRVTNSPSLQ) was tolerant of 5-amino-acid insertions in vivo. This region corresponds to the HIV-1 IN sequence IATDIQFKELQKQI (44), which is highlighted in red (A204 to I217 of the A molecule in 1EX4 is taken from the two-domain structure of the HIV-1 core-C terminus [18]). The HIV-1 core domain is colored blue; the C terminus is yellow. The C terminus ends at amino acid 271. The figure was generated in MOLSCRIPT V 2.0 (49). (B) A three-dimensional structural model of the HIV-1 monomer (aa 1 to 288). The locations of the insertion mutations and their subsequent effects on disintegration and strand transfer activity are shown using the color scheme corresponding to Fig. 3. Amino acid numbering within HIV-1 IN is shown in white. The large spheres denote disintegration activity, and the widened colored linear portions denote strand transfer activity.
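The correspondence between the insertion-tolerant MuLV IN segment and its HIV-1 counterpart (Fig. 5A) can be tabulated residue by residue. The sketch below is a hedged illustration rather than the alignment procedure used in the study. It assumes an ungapped pairing across the 14-residue window, MuLV numbering starting at D280 (derived from Thr287 being the T of TNSP), and HIV-1 numbering starting at residue 204 (inferred from the D207 and E212 insertion sites in Table 3). The names `aligned_pairs` and `mulv_partner` are hypothetical.

```python
MULV_REGION = "DPDMTRVTNSPSLQ"   # MuLV IN segment spanning in5450 to in5487
HIV_REGION = "IATDIQFKELQKQI"    # homologous HIV-1 IN segment
MULV_START = 280                 # assumption: T of TNSP is Thr287
HIV_START = 204                  # assumption: D and E fall at 207 and 212


def aligned_pairs() -> list[tuple[str, str]]:
    """Pair residues position by position, assuming no gaps in this window."""
    if len(MULV_REGION) != len(HIV_REGION):
        raise ValueError("ungapped pairing requires equal-length segments")
    return [
        (f"{m}{MULV_START + i}", f"{h}{HIV_START + i}")
        for i, (m, h) in enumerate(zip(MULV_REGION, HIV_REGION))
    ]


def mulv_partner(hiv_label: str) -> str:
    """Return the MuLV residue paired with an HIV-1 residue label such as 'E212'."""
    for mulv_label, label in aligned_pairs():
        if label == hiv_label:
            return mulv_label
    raise KeyError(hiv_label)


if __name__ == "__main__":
    for pair in aligned_pairs():
        print(*pair)
```

With these assumptions, the HIV-1 insertion sites D207 and E212 pair with MuLV residues on either side of the region in which the three viable MuLV insertions map.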

FIG. 6. Position of the linker insertions within M-MuLV RT-RNase H. The figure shows two views, differing by 180°, of the molecular model of the M-MuLV RT. The individual subdomains are colored as follows: finger-palm, salmon; thumb, pink; connection, blue; and RNase H, green. The catalytic triad (D524, E562, and D583) is shown as space-filled orange spheres. The loop structures introduced into structurally undefined regions are yellow. The position of each individual linker insertion is shown in red. Amino acid positions within MuLV RT are shown in black. The figure was generated in MOLSCRIPT V 2.0.

FIG. 7. Expression of a stable C-terminal IN domain. Whole-cell E. coli extracts of individual deletion constructs of the MuLV IN N terminus. Extracts were subjected to sodium dodecyl sulfate-polyacrylamide gel electrophoresis, followed by Coomassie blue staining. Five individual colonies are shown. Lane 1, isolate 1; lane 2, isolate 3; lane 3, isolate 74; lane 5, isolate 77; lane 6, isolate 35. The positions of the protein standards are indicated at the left. The arrow marks the stable C-terminal IN287-408 MuLV protein product of approximately 17 kDa (isolate 77).

DISCUSSION

The retroviral genome has evolved to encode multifunctional proteins expressed within polyproteins. These compact viral particles must assemble, infect, replicate, and integrate the viral genome using limited enzymatic functions. In this study, we have used two parallel transposon-based mutational systems (Tn5 and Tn7), differing in the size of the insertion, to create functional maps of the M-MuLV and HIV-1 IN proteins. Studies in M-MuLV extend the region within the 3′ terminus of the pol gene to include the connection and RNase H domains of RT. Analysis of 178 mutations in MuLV and 56 mutations in HIV-1 IN indicates limited nonessential regions tolerant of amino acid insertions. These regions localize to protein and domain boundaries: between RT and IN, between the N terminus and the core of IN, at the C terminus of IN, and between the core and C terminus of IN. Although these results are nonsaturating, the data indicate functional conservation even within regions shown to be disordered within crystallographic structures.

Several systems have been developed for “genetic footprinting” of a gene based upon the generation of a library of random inserts and screening those pools for selectable phenotypes. The systems are based on bacterial transposons, including Tn5, Tn7, and Mu, or viruses (5, 42, 73, 78, 83). These systems have the potential to screen the entire population of insertions before and after a selection process through positional mapping of the inserts by PCR. The two systems utilized in this study characterized individual isolates rather than the population as a whole. For the in vitro studies, selection for IN function is complex, and a high-throughput approach was developed. For the Tn7 system, the unique sequence of the insertion is limited to a 10-nucleotide region, which is insufficient to direct a PCR primer to specifically hybridize. Mapping insertions using a series of nested PCR products followed by PmeI digestion proved difficult, as the PCR products were not efficiently cleaved by PmeI. Due to this technical difficulty, this study focused on analysis of individual isolates whose insertion sites could be predetermined prior to introduction into tissue culture for selection.
This approach eliminated several additional complications, including limiting the number of termination insertions analyzed as well as decreasing the number of false positives resulting from complementation and/or recombination of mixed infections. Approximately 750 isolates were sequenced to identify the 178 unique isolates utilized in these studies. Within this population of 750, duplicates were identified, indicating that the population analyzed was representative of the library.

The domain boundaries defined in these studies are in general agreement with previous biochemical studies. For the MuLV RT, deletion studies in E. coli which identified a stable and active MuLV RT (pB6B15.23) (77) resulted in the truncation of the seven terminal amino acids of RT/RNase H. This truncation is within the 9-aa region at the C terminus of RT, which was tolerant of linker insertions. Similarly, MuLV IN deletion constructs (p135-1) (76), which lacked the N-terminal 8 amino acids, bound DNA similarly to a full-length IN construct. The results of these studies indicate that the N-terminal 13 amino acids of MuLV IN tolerated insertions in vivo. The one exception was the insertion that altered the protease recognition site (in4614). It should be noted that the N terminus of MuLV IN encodes 45 amino acids not conserved in either HIV or avian sarcoma virus-related INs (92). The region tolerant of 5-aa insertions at the N terminus of MuLV IN maps within this nonconserved region. Previous studies indicated that the MuLV IN C terminus could be truncated by 28 amino acids and maintain virus viability (74). These studies refine this region, demonstrating that truncation of 31 aa resulted in nonviable virus. Interestingly, the in-frame linker insertion at this coding region was viable, whereas insertions 3 amino acids upstream were not. These boundaries for IN function may assist in expressing minimized IN constructs for crystallization studies.

In the HIV-1 IN N terminus, only two insertions, at N27, were obtained. These insertions retained disintegration but had barely detectable levels of integration. Relevant to our mutants, it has been shown that a monoclonal antibody which interacts with amino acids 27 to 29 destabilizes the N-terminal helical bundle and decreases 3′ processing and transfer activities of HIV-1 IN in vitro (95).
In addition, it is known that deletion of the N-terminal 39 aa abolishes integration activity (25).

In the core domain of HIV-1 IN, using an extensive panel of mutants, we show that integration was abolished and disintegration was diminished with insertions between D64↓C and E212↓L inclusive. HIV-1 IN disintegration requires only the core domain (residues 50 to 186) (12). Importantly, this set of mutants demonstrates the compactness of IN and underscores the complexity of intramolecular and intermolecular interactions that IN must maintain during the integration process. In our studies, it was anticipated that some of the loop regions within the core might be more amenable to mutation given the solvent accessibility shown in the monomer and dimer structures, such as the loops between the N-terminal α3 and the core β1′, the core β5′ and α4′, and the core α5′ and α6′. While we did not expect integration activity per se, we expected disintegration, since this activity may not require a higher-order complex. However, in our studies, insertions located in the core loops between β5′ and α4′ and between α4′ and α5′ all lost integration activity and had no or barely detectable disintegration activity. These two regions retaining minimal disintegration activity correspond to an extended loop (residues 137 to 156) and a flanking region (residues 161 to 173), which are protected from proteolysis upon metal binding (2, 3). Substitution of Gly140 and Gly149 with more constrained Ala residues impaired catalysis of HIV-1 IN, indicating a requirement for some degree of conformational flexibility for catalytic activity (37). These two loops are believed to undergo significant movement to aid in the coordination of a metal ion by the catalytic triad (2, 3). Interestingly, residues 168 to 171 are also reported to contact the host factor LEDGF (20).

Previously, we and others had shown that the C terminus of HIV-1 and M-MuLV IN can tolerate large C-terminal deletions and, similar to the core, can still retain considerable disintegration activity (12, 25, 46, 61). Herein, we show four different regions in the HIV-1 IN C terminus with a gradient of increasing activity as one moves toward the carboxyl terminus. Insertional mutants after amino acid 239 in β2″ and in the loop between β3″ and β4″ lost strand transfer activity while exhibiting full or moderate levels of disintegration activity. G247↓A was an exception, as it retained full integration and disintegration activity. Interestingly, the insertions in β2″ and β3″, positioned before and after G247, had no integration activity and low disintegration activity. The context of G247 differed within two molecular models of an HIV IN tetramer (72, 87). In contrast to the Wang tetramer model (72, 87), a 19-aa insertion in the Podtelezhnikov et al. model (72, 87) could interfere with the binding of a putative LTR and sterically clash with the loop region (between β1′ and β2′) of another core molecule. Our results are consistent with this tetramer model. Insertions after I268 and before Q284 had similar levels of activity in disintegration and retained moderate integration activity compared to wild-type IN. The terminal region, which comprised insertions after R284, retained full integration and disintegration activity.
It is of interest that, although functional complementation of MuLV IN was achieved in vitro using constructs that stably expressed the N-terminal zinc binding domain (MuLV IN 1 to 105) with the core-C terminus fragment (MuLV IN 106 to 404) (91), no viable linker insertion was identified in vivo at the junction of the HHCC domain and the core domain. However, in the case of the in vitro HIV-1 IN mutational study, three 19-aa insertions at two positions (D55↓C and P58↓G) were identified at the transition between the HHCC and core domains, which retain full activity in both disintegration and strand transfer activity of HIV-1 IN. The D55/C56/S57 sequence is proposed to be in close proximity to HIV LTR positions 1 to 4, based on a structural tetramer model (16).

Although it is possible that the linkers are substituting for natural amino acids at that position, we did not observe instances where two in-frame insertions at the same position resulted in differential effects in either MuLV or HIV-1. This might have been predicted, as the insertions frequently encode Cys, which could alter the protein folding. However, within MuLV IN, 6 of 11 viable insertions encoded Cys. In both the HIV-1 and MuLV IN studies, insertions at the same coding sequence were identified that behaved identically, indicating there was not a positive selection for a Cys residue to, for example, stabilize the region. For MuLV IN, this is exemplified within in4628 PCLNTPY and in4628 PLFKQPY; for HIV, the two insertions at D55↓C encode LSLVHILRPQDVYKRQQVD and CLLYTSCGRKMCTRDRQVD and those at V250↓I encode CLLYTSCGRKMCTRDRAVV and LSLVHILRPQDVYKRQAVV.

Insights into the boundaries defining the insertion-tolerant region between the core and C terminus were obtained in these comparative studies. In M-MuLV, this region, encoding DPDMTRVTNSPSLQ, corresponds to the HIV-1 IN sequence IATDIQFKELQKQI (Fig. 5A). At the 5′ terminus, the closest nonviable 5-amino-acid insertion in MuLV IN is 5 aa upstream. However, the closest insertion downstream of 5487 is at 5535, 16 aa C-terminal. A more-saturated library within this region would be required. The deletion study that identified a stable C-terminal construct mapped directly within this region, supporting this as a domain boundary. The 19-amino-acid insertions within HIV-1 IN provide additional insights into these boundaries. A panel of insertional mutants within the HIV-1 IN α6′ showed a gradient of increasing disintegration activity, with E212↓L active for both disintegration and integration. Insertion E212↓L maps within the 12-aa region homologous to MuLV (IATDIQFKELQKQI, where EL is underlined) (18). Sites C-terminal to the observed bend tolerated insertions of both 5 and 19 amino acids, in vitro and in vivo in the HIV-1 and MuLV IN, respectively. The 19-aa insertion D207↓I maps within the region homologous to MuLV IN (IATDIQFKELQKQI, with the DI insertion site underlined) yet is not active for disintegration or strand transfer activity. Thus, differences in the boundaries between HIV-1 and MuLV IN were identified. This may reflect the differences in the size of the insertions, where 5 amino acids are tolerated and 19 amino acids are not, or structural differences in the assembly of IN multimers. In both MuLV and HIV IN studies, the results indicate considerable flexibility in the linkage between the catalytic core and C-terminal domain, either through lengthening the distance between the two domains and/or increasing the discontinuity of the extended alpha-helix. It is not known whether the insertions into the long α6′ helix that connects the core and C terminus present a favorable condition for the virus.
In the related insertional study of the Cre recombinase, insertions into the M-N linker increased DNA binding cooperativity (71). In this system, it was proposed that extending the length of the linker would lead to a smaller bend angle and thus stabilize partner Cre subunits binding to the loxP. In a similar manner, extending the distance between the core and C terminus in IN may assist in the assembly of the synaptic complex consisting of the two viral termini plus the target DNA. The arrangement of the C-terminal domain relative to the catalytic core differs among HIV-1, simian immunodeficiency virus type 1, and Rous sarcoma virus IN X-ray structures (18, 19, 94).

The results of the linker insertions into the MuLV RT connection and RNase H domains were unexpected, as no viable mutations outside the extreme C terminus were identified. Figure 6 contains a molecular model of the MuLV RT, based on the structure of the MuLV RT (1RW3, 443 residues, encoding through residue 474), plus the model of the MuLV RNase H ΔC domain (54). To assist in mapping the linker insertions, the structurally undefined and deleted regions were reconstructed into this model as tubes. These include the region within the thumb (residues 327 to 334), the region in the connection domain downstream of residue 474 through to the structurally unelucidated region within RNase H (residues 475 to 504), the α-C helix of RNase H, the region homologous to the His loop (23) in HIV-1 RNase H (residues H634 to H642), and residues 592 to 603 of RNase H. The function of the large structurally undefined region between residues 474 and 504 is of interest. Domain mapping using in vitro RT activities (85, 86) mapped the N terminus of RNase H to position 4542 of the DNA provirus (4093 of the viral RNA) (82). Therefore, in4100 localizes within the structurally undefined N terminus of RNase H and in4113 at the beginning of the RNase H structured region. The in vivo data presented in this paper correlate with the in vitro data, indicating that the N terminus of RNase H, despite being structurally undefined, is essential for RNase H activity. In addition, in4023 maps within the structurally undefined region of the RT connection domain. By molecular modeling, residues 475 to 504 were placed on the opposite face of the RT molecule from where the nucleic acid binding site lies, and it was therefore believed that it may reflect a nonessential region of RT. However, in4023 was found to be nonviable in vivo. Interestingly, insertions within this region (M38, H7, and H2) (85) were found to be temperature sensitive for RT activity in vitro. Conformational changes within this region may be required for switching between the polymerase and RNase H activities or to allow steric access to the active sites. Similarly, in Cre, flexible loops were identified which were not tolerant to insertions, indicating their role in Cre function, possibly protein assembly or DNA binding. The function of these structurally uncharacterized loops in both RT and IN needs to be defined. The intrinsic flexibility of both these enzymes may reflect the multifunctional activities and staged assembly steps required to specifically bind and recognize their cognate substrates (24, 51).

One aim of this mutational analysis was to identify sites within the IN protein that may tolerate small insertional tags whose function may alter the target site selection of the viral integrases. Protein domains and tags have been inserted both into the N terminus (11, 48, 84) and C terminus (13, 36, 48, 80, 81, 84) of retroviral IN constructs. The identification of the region between the N terminus, the core, and C terminus of IN as functional in the presence of a variety of linker insertions strongly suggests that this region could serve as a third potential insertion site for short tags within the IN protein. The ability of this site to function in alternative protein-protein or protein-DNA interactions depends on its accessibility within the synaptic complex. Further biochemical and structural studies are required to address this question.

ACKNOWLEDGMENTS

This work was supported by NIH grants RO1 GM070837 issued to M.J.R. and GM07666-24 to C.B.J. We thank Jennifer Jones and Naadira McClean for their assistance.

REFERENCES

1. Albritton, L. M., L. Tseng, D. Scadden, and J. M. Cunningham. 1989. A putative murine retrovirus receptor gene encodes a multiple membrane-spanning protein and confers susceptibility to virus infection. Cell 57:659–666.
2. Asante-Appiah, E., S. H. Seeholzer, and A. M. Skalka. 1998. Structural determinants of metal-induced conformational changes in HIV-1 integrase. J. Biol. Chem. 273:35078–35087.
3. Asante-Appiah, E., and A. Skalka. 1997. A metal-induced conformational change and activation of HIV-1 integrase. J. Biol. Chem. 272:16196–16205.
4. Auerbach, M., C. Shu, A. Kaplan, and I. Singh. 2003. Functional characterization of a portion of the Moloney murine leukemia virus gag gene by genetic footprinting. Proc. Natl. Acad. Sci. USA 100:11929–11930.
5. Biery, M. C., F. J. Stewart, A. E. Stellwagen, E. A. Raleigh, and N. L. Craig. 2000. A simple in vitro Tn7-based transposition system with low target site selectivity for genome and gene analysis. Nucleic Acids Res. 28:1067–1077.
6. Bowerman, B., P. O. Brown, J. M. Bishop, and H. E. Varmus. 1989. A nucleoprotein complex mediates the integration of retroviral DNA. Genes Dev. 3:469–478.
7. Bujacz, G., J. Alexandratos, Z. Qing, C. Clement-Mella, and A. Wlodawer. 1996. The catalytic domain of human immunodeficiency virus integrase: ordered active site in the F185H mutant. FEBS Lett. 398:175–178.
8. Bujacz, G., M. Jaskolski, J. Alexandratos, A. Wlodawer, G. Merkel, R. A. Katz, and A. M. Skalka. 1995. High-resolution structure of the catalytic domain of avian sarcoma virus integrase. J. Mol. Biol. 253:333–346.
9. Bukrinsky, M. I., N. Sharova, T. L. McDonald, T. Pushkarskaya, W. G. Tarpley, and M. Stevenson. 1993. Association of integrase, matrix, and reverse transcriptase antigens of human immunodeficiency virus type 1 with viral nucleic acids following acute infection. Proc. Natl. Acad. Sci. USA 90:6125–6129.
10. Bushman, F. 1995. Targeting retroviral integration. Science 267:1443–1444.
11. Bushman, F. D. 1994. Tethering human immunodeficiency virus 1 integrase to a DNA site directs integration to nearby sequences. Proc. Natl. Acad. Sci. USA 91:9233–9237.
12. Bushman, F. D., A. Engelman, I. Palmer, P. Wingfield, and R. Craigie. 1993. Domains of the integrase protein of human immunodeficiency virus type 1 responsible for polynucleotidyl transfer and zinc binding. Proc. Natl. Acad. Sci. USA 90:3428–3432.
13. Bushman, F. D., and M. D. Miller. 1997. Tethering human immunodeficiency virus type 1 preintegration complexes to target DNA promotes integration at nearby sites. J. Virol. 71:458–464.
14. Cai, M., R. Zheng, M. Caffrey, R. Craigie, G. M. Clore, and A. M. Gronenborn. 1997. Solution structure of the N-terminal zinc binding domain of HIV-1 integrase. Nat. Struct. Biol. 4:567–577.
15. Calmels, B., C. Ferguson, M. O. Laukkanen, R. Adler, M. Faulhaber, H.-J. Kim, S. Sellers, P. Hematti, M. Schmidt, C. von Kalle, K. Akagi, R. E. Donahue, and C. E. Dunbar. 2005. Recurrent retroviral vector integration at the MDS1-EVI1 locus in non-human primate hematopoietic cells. Blood 106:2530–2533.
16. Chen, A., I. T.
Weber, R. W. Harrison, and J. Leis. 2006. Identification of amino acids in HIV-1 and avian sarcoma virus integrase subsites required for specific recognition of the long terminal repeat ends. J. Biol. Chem. 281: 4173–4182. Chen, H., and A. Engelman. 1998. The barrier-to-autointegration protein is a host factor for HIV type 1 integration. Proc. Natl. Acad. Sci. USA 95: 15270–15274. Chen, J. C.-H., J. Krucinski, L. J. W. Miercke, J. S. Finer-Moore, A. H. Tang, A. D. Leavitt, and R. M. Stroud. 2000. Crystal structure of the HIV-1 integrase catalytic core and C-terminal domains: a model for viral DNA binding. Proc. Natl. Acad. Sci. USA 97:8233–8238. Chen, Z., Y. Yan, S. Munshi, Y. Li, J. Zugay-Murphy, B. Xu, M. Witmer, P. Felock, A. Wolfe, V. Sardana, E. A. Emini, D. Hazuda, and L. C. Kuo. 2000. X-ray structure of simian immunodeficiency virus integrase containing the core and C-terminal domain (residues 50–293)-an initial glance of the viral DNA-binding platform. J. Mol. Biol. 296:521–533. Cherepanov, P., A. Ambrosio, S. Rahman, T. Ellenberger, and A. Engelman. 2005. Structural basis for the recognition between HIV-1 integrase and transcriptional coactivator p75. Proc. Natl. Acad. Sci. USA 102:17308–17313. Das, D., and M. Georgiadis. 2004. The crystal structure of the monomeric reverse transcriptase from Moloney murine leukemia virus. Structure (Cambridge) 12:819–829. Dave, U. P., N. A. Jenkins, and N. G. Copeland. 2004. Gene therapy insertional mutagenesis insights. Science 303:333. Davies, J. F., III, Z. Hostomska, Z. Hostomsky, S. R. Jordan, and D. A. Matthews. 1991. Crystal structure of the ribonuclease H domain of HIV-1 reverse transcriptase. Science 252:88–95. Dayam, R., and N. Neamati. 2004. Active site binding modes of the betadiketoacids: a multi-active site approach in HIV-1 integrase inhibitor design. Bioorg. Med. Chem. 12:6371–6381. Drelich, M., R. Wilhelm, and J. Mous. 1992. 
Identification of amino acid residues critical for endonuclease and integration activities of HIV-1 IN protein in vitro. Virology 188:459–468. Dyda, F., A. B. Hickman, T. M. Jenkins, A. Engelman, R. Craigie, and D. R. Davies. 1994. Crystal structure of the catalytic domain of HIV-1 integrase: similarity to other polynucleotidyl transferases. Science 266:1981–1986. Engelman, A. 1999. In vivo analysis of retroviral integrase structure and function. Adv. Virus Res. 52:411–426. Farnet, C. M., and W. A. Hazeltine. 1991. Determination of viral proteins present in human immunodeficiency virus type 1 preintegration complex. J. Virol. 65:1910–1915. Fassati, A., and S. P. Goff. 2001. Characterization of intracellular reverse transcription complexes of human immunodeficiency virus type 1. J. Virol. 75:3626–3635. Fassati, A., and S. P. Goff. 1999. Characterization of intracellular reverse transcription complexes of Moloney murine leukemia virus. J. Virol. 73: 8919–8925. Felkner, R. H., and M. J. Roth. 1992. Mutational analysis of N-linked glycosylation sites of the SU protein of Moloney murine leukemia virus. J. Virol. 66:4258–4264.

VOL. 80, 2006

RNase H AND INTEGRASE DOMAIN STRUCTURE-FUNCTION

32. Goedken, E., and S. Marqusee. 2001. Co-crystal of Escherichia coli RNase HI with Mn2⫹ ions reveals two divalent metals bound in the active site. J. Biol. Chem. 276:7266–7271. 33. Goff, S. P., P. Traktman, and D. Baltimore. 1981. Isolation and properties of Moloney murine leukemia virus mutants; use of a rapid assay for release of virion reverse transcriptase. J. Virol. 38:239–248. 34. Goldgur, Y., R. Craigie, G. H. Cohen, T. Fujiwara, T. Yoshinaga, T. Fujishita, H. Sugimoto, T. Endo, H. Murai, and D. R. Davies. 1999. Structure of the HIV-1 integrase catalytic domain complexed with an inhibitor: a platform for antiviral drug design. Proc. Natl. Acad. Sci. USA 96:13040–13043. 35. Goldgur, Y., F. Dyda, A. B. Hickman, T. M. Jenkins, R. Craigie, and D. R. Davies. 1998. Three new structures of the core domain of HIV-1 integrase: an active site that binds magnesium. Proc. Natl. Acad. Sci. USA 95:9150– 9154. 36. Goulaouic, H., and S. A. Chow. 1996. Directed integration of viral DNA mediated by fusion proteins consisting of human immunodeficiency virus type 1 integrase and Escherichia coli LexA protein. J. Virol. 70:37–46. 37. Greenwald, J., V. Le, S. Butler, F. Bushman, and S. Choe. 1999. The mobility of an HIV-1 integrase active site loop is correlated with catalytic activity. Biochemistry 38:8892–8898. 38. Hacein-Bey-Abina, S., V. K. C., M. Schmidt, M. P. McCormack, N. Wulffraat, P. Leboulch, A. Lim, C. S. Osborne, R. Pawliuk, E. Morillon, R. Sorensen, A. Forster, P. Fraser, J. I. Cohen, G. de Saint Basile, I. Alexander, U. Wintergerst, T. Frebourg, A. Aurias, D. Stoppa-Lyonnet, S. Romana, I. Radford-Weiss, F. Gross, F. Valensi, E. Delabesse, E. Macintyre, F. Sigaux, J. Soulier, L. E. Leiva, M.Wissler, C. Prinz, T. H. Rabbitts, F. Le Deist, A. Fischer, and M. Cavazzana-Calvo. 2003. LMO2-associated clonal T cell proliferation in two patients after gene therapy for SCID-X1. Science 302: 415–419. 39. Hansen, M. S., and F. D. Bushman. 1997. 
Human immunodeficiency virus type 2 preintegration complexes: activities in vitro and response to inhibitors. J. Virol. 71:3351–3356. 40. Hehl, E. A., P. Joshi, G. V. Kalpana, and V. R. Prasad. 2004. Interaction between human immunodeficiency virus type 1 reverse transcriptase and integrase proteins. J. Virol. 78:5056–5067. 41. Hirt, B. 1967. Selective extraction of polyoma DNA from infected mouse cell cultures. J. Mol. Biol. 26:365–371. 42. Hoffman, L., J. Jendrisak, R. Meis, I. Goryshin, and S. Reznikof. 2000. Transposome insertional mutagenesis and direct sequencing of microbial genomes. Genetica 108:19–24. 43. Hyde, C. C., F. D. Bushman, T. C. Mueser, and Z.-N. Yang. 1999. Crystal structure of an active two-domain derivative of rous sarcoma virus integrase. J. Mol. Biol. 296:535–538. 44. Johnson, M. S., M. A. McClure, D. F. Feng, J. Gray, and R. F. Doolittle. 1986. Computer analysis of retroviral pol genes: assignment of enzymatic functions to specific sequences and homologies with nonviral enzymes. Proc. Natl. Acad. Sci. USA 83:7648–7652. 45. Jones, T. A., J. Y. Zou, S. W. Cowan, and M. Kjeldgaard. 1991. Improved methods of building protein models in electron density maps and the location of errors in these models. Acta Crystallogr. A. 47:110–119. 46. Jonsson, C. B., G. A. Donzella, E. Gaucan, C. M. Smith, and M. J. Roth. 1996. Functional domains of Moloney murine leukemia virus integrase defined by mutation and complementation analysis. J. Virol. 70:4585–4597. 47. Jonsson, C. B., and M. J. Roth. 1993. Role of the His-Cys finger of Moloney murine leukemia virus integrase protein in integration and disintegration. J. Virol. 67:5562–5571. 48. Katz, R. A., G. Merkel, and A. M. Skalka. 1996. Targeting of retroviral integrase by fusion to a heterologous DNA binding domain: in vitro activities and incorporation of a fusion protein into viral particles. Virology 217:178– 190. 49. Kraulis, P. J. 1991. 
MOLSCRIPT: a program to produce both detailed and schematic plots of protein structures. J. Appl. Crystallogr. 24:946–950. 50. Laurent, L. C., M. N. Olsen, R. A. Crowley, H. Savilahti, and P. O. Brown. 2000. Functional characterization of the human immunodeficiency virus type 1 genome by genetic footprinting. J. Virol. 74:2760–2769. 51. Lee, M. C., J. Deng, J. M. Briggs, and Y. Duan. 2005. Large scale conformational dynamics of the HIV-1 integrase core domain and its catalytic loop mutants. Biophys. J. 88:3133–3146. 52. Lee, M. S., and R. Craigie. 1998. A previously unidentified host protein protects retroviral DNA from autointegration. Proc. Natl. Acad. Sci. USA 95:1528–1533. 53. Li, L., C. M. Farnet, W. F. Anderson, and F. D. Bushman. 1998. Modulation of activity of Moloney murine leukemia virus preintegrative complexes by host factors in vitro. J. Virol. 72:2125–2131. 54. Lim, D. 2001. Functional and structural analysis of the RNaseH domain of the Moloney murine leukemia virus reverse transcriptase. Ph.D. dissertation. Columbia University, New York, N.Y. 55. Lin, C.-W., and A. Engelman. 2003. The barrier-to-autointegration factor is a component of functional human immunodeficiency virus type 1 preintegration complexes. J. Virol. 77:5030–5036. 56. Llano, M., M. Vanegas, O. Fregoso, D. Saenz, S. Chung, M. Peretz, and E. M. Poeschla. 2004. LEDGF/p75 determines cellular trafficking of diverse

57.

58.

59.

60.

61.

62.

63.

64.

65. 66.

67.

68.

69.

70.

71.

72.

73.

74. 75.

76.

77.

78.

79.

80.

81.

9509

lentiviral but not murine oncoretroviral integrase proteins and is a component of functional lentiviral preintegration complexes. J. Virol. 78:9524– 9537. Lobel, L. I., and S. P. Goff. 1984. Construction of mutants of Moloney murine leukemia virus by suppressor-linker insertion mutagenesis: positions of viable insertion mutations. Proc. Natl. Acad. Sci. USA 81:4149–4153. Lu, R., H. Z. Ghory, and A. Engelman. 2005. Genetic analyses of conserved residues in the carboxyl-terminal domain of human immunodeficiency virus type 1 integrase. J. Virol. 79:10356–10368. Lu, R., A. Limon, E. Devroe, P. A. Silver, P. Cherepanov, and A. Engelman. 2004. Class II integrase mutants with changes in putative nuclear localization signals are primarily blocked at a postnuclear entry step of human immunodeficiency virus type 1 replication. J. Virol. 78:12735–12746. Lu, R., A. Limon, H. Z. Ghory, and A. Engelman. 2005. Genetic analyses of DNA-binding mutants in the catalytic core domain of human immunodeficiency virus type 1 integrase. J. Virol. 79:2493–2505. Lutzke, R. A. P., and R. H. A. Plasterk. 1998. Structure-based mutational analysis of the C-terminal DNA-binding domain of human immunodeficiency virus type 1 integrase: critical residues for protein oligomerization and DNA binding. J. Virol. 72:4841–4848. McClure, M. A., M. S. Johnson, D.-F. Feng, and R. F. Doolittle. 1988. Sequence comparisons of retroviral proteins: relative rates of change and general phylogeny. Proc. Natl. Acad. Sci. USA 85:2469–2473. McCormack, M., A. Forster, L. Drynan, R. Pannell, and T. H. Rabbitts. 2003. The LMO2 T-cell oncogene is activated via chromosomal translocations or retroviral insertion during gene therapy but has no mandatory role in normal T-cell development. Mol. Cell. Biol. 23:9003–9013. McCutchan, J. H., and J. S. Pagano. 1968. Enhancement of the infectivity of simian virus 40 deoxyribonucleic acid with diethylaminoethyl-dextran. J. Natl. Cancer Inst. 41:351–357. Merrit, E. A., and D. J. 
Bacon. 1997. Raster3d: photorealistic molecular graphics. Methods Enzymol. 227:505–524. Miller, M. D., C. M. Farnet, and F. D. Bushman. 1997. Human immunodeficiency virus type 1 preintegration complexes: studies of organization and composition. J. Virol. 71:5382–5390. Nermut, M. V., and A. Fassati. 2003. Structural analyses of purified human immunodeficiency virus type 1 intracellular reverse transcription complexes. J. Virol. 77:8196–8206. O’Reilly, L., and M. J. Roth. 2000. Second-site changes affect viability of amphotropic/ecotropic chimeric enveloped murine leukemia viruses. J. Virol. 74:899–913. Oz-Gleenberg, I., O. Avidan, Y. Goldgur, A. Herschhorn, and A. Hizi. 2005. Peptides derived from the reverse transcriptase of human immunodeficiency virus type 1 as novel inhibitors of the viral integrase. J. Biol. Chem. 280: 21987–21996. Pearlman, D. A., D. A. Case, J. W. Caldwell, W. R. Ross, T. E. Cheatham III, S. DeBolt, D. Ferguson, G. Seibel, P. Kollman, and P. Amber. 1995. AMBER, a computer program for applying molecular mechanics, normal mode analysis, molecular dynamics and free energy calculations to elucidate the structure and energies of molecules. Comput. Phys. Commun. 91:1–41. Petyuk, V., J. McDermott, M. Cook, and B. Sauer. 2004. Functional mapping of Cre recombinase by pentapeptide insertional mutagenesis. J. Biol. Chem. 279:37040–37048. Podtelezhnikov, A., K. Gao, F. Bushman, and J. McCammon. 2003. Modeling HIV-1 integrase complexes based on their hydrodynamic properties. Biopolymers 68:110–120. Quinonez, R., I. Sinha, I. R. Singh, and R. E. Sutton. 2003. Genetic footprinting of the HIV co-receptor CCR5: delineation of surface expression and viral entry determinants. Virology 307:98–115. Roth, M. J. 1991. Mutational analysis of the carboxyl terminus of the Moloney murine leukemia virus integration protein. J. Virol. 65:2141–2145. Roth, M. J., P. Schwartzberg, N. Tanese, and S. P. Goff. 1990. 
Analysis of mutations in the integration function of Moloney murine leukemia virus: effects on DNA binding and cutting. J. Virol. 64:4709–4717. Roth, M. J., N. Tanese, and S. P. Goff. 1988. Gene product of Moloney murine leukemia virus required for proviral integration is a DNA-binding protein. J. Mol. Biol. 203:131–139. Roth, M. J., N. Tanese, and S. P. Goff. 1985. Purification and characterization of murine retroviral reverse transcriptase expressed in Escherichia coli. J. Biol. Chem. 260:9326–9335. Rothenberg, S. M., M. N. Olsen, L. C. Laurent, R. A. Crowley, and P. O. Brown. 2001. Comprehensive mutational analysis of the Moloney murine leukemia virus envelope protein. J. Virol. 75:11851–11862. Schwartzberg, P., M. Roth, N. Tanese, and S. Goff. 1993. Analysis of a temperature-sensitive mutation affecting the integration protein of Moloney murine leukemia virus. Virology 192:673–678. Seamon, J. A., M. Adams, S. Sengupta, and M. J. Roth. 2000. Differential effects of C-terminal molecular tagged integrase on replication competent Moloney-murine leukemia virus. Virology 274:412–419. Seamon, J. A., C. Miller, K. S. Jones, and M. J. Roth. 2002. Inserting nuclear targeting signals onto a replication-competent M-MuLV affects viral export

9510

82. 83.

84.

85.

86.

87.

88.

89.

PUGLIA ET AL.

and is not sufficient for cell cycle independent infection. J. Virol. 76:8475– 8484. Shinnick, T. M., R. A. Lerner, and J. G. Sutcliffe. 1981. Nucleotide sequence of Moloney murine leukaemia virus. Nature 293:543–548. Singh, I. R., R. A. Crowley, and P. O. Brown. 1997. High-resolution functional mapping of a cloned gene by genetic footprinting. Proc. Natl. Acad. Sci. USA 94:1304–1309. Tan, W., K. Zhu, D. J. Segal, I. Carlos F. Barbas, and S. A. Chow. 2004. Fusion proteins consisting of human immunodeficiency virus type 1 integrase and the designed polydactyl zinc finger protein E2C direct integration of viral DNA into specific sites. J. Virol. 78:1301–1313. Tanese, N., and S. P. Goff. 1988. Domain structure of the Moloney murine leukemia virus reverse transcriptase: mutational analysis and separate expression of the DNA polymerase and RNase H activities. Proc. Natl. Acad. Sci. USA 85:1777–1781. Tanese, N., A. Telesnitsky, and S. P. Goff. 1991. Abortive reverse transcription by mutants of Moloney murine leukemia virus deficient in the reverse transcriptase-associated RNase H function. J. Virol. 65:4387–4397. Wang, J.-Y., H. Ling, W. Yang, and R. Craigie. 2001. Structure of a twodomain fragment of HIV-1 integrase: implications for domain organization in the intact protein. EMBO J. 20:7333–7343. Wang, T., M. Balakrishnan, and C. B. Jonsson. 1999. Major and minor groove contacts in retroviral integrase-LTR interactions. Biochemistry 38: 3624–3632. Wilhelm, M., and F.-X. Wilhelm. 2005. Role of integrase in reverse tran-

J. VIROL.

90.

91.

92. 93. 94. 95. 96.

scription of the Saccharomyces cerevisiae retrotransposon Ty1. Eukaryot. Cell 4:1057–1065. Wu, W., H. Lui, L. Xiao, J. A. Conway, E. Hehl, G. V. Kalpana, V. Prasad, and J. C. Kappes. 1999. Human immunodeficiency virus type 1 integrase protein promotes reverse transcription through specific interactions with the nucleoprotein reverse transcription complex. J. Virol. 73:2126–2135. Yang, F., O. Leon, N. J. Greenfield, and M. J. Roth. 1999. Functional interactions of the HHCC domain of Moloney murine leukemia virus integrase revealed by non-overlapping complementation and zinc dependent dimerization. J. Virol. 73:1809–1817. Yang, F., J. A. Seamon, and M. J. Roth. 2001. Mutational analysis of the N-terminus of Moloney murine leukemia virus integrase. Virology 291: 32–45. Yang, W., W. A. Hendrickson, R. J. Crouch, and Y. Satow. 1990. Structure of ribonuclease H phased at 2 Å resolution by MAD analysis of the selenomethionyl protein. Science 249:1398–1405. Yang, Z. N., T. C. Mueser, F. D. Bushman, and C. C. Hyde. 2000. Crystal structure of an active two-domain derivative of Rous Sarcoma virus integrase. J. Mol. Biol. 296:535–548. Yi, J., J. Arthur, R. Dunbrack, Jr., and A. Skalka. 2000. An inhibitory monoclonal antibody binds at the turn of the helix-turn-helix motif in the N-terminal domain of HIV-1 integrase. J. Biol. Chem. 275:38739–38748. Zhu, K., C. Dobard, and S. A. Chow. 2004. Requirement for integrase during reverse transcription of human immunodeficiency virus type 1 and the effect of cysteine mutations of integrase on its interactions with reverse transcriptase. J. Virol. 78:5045–5055.

Suggest Documents