AIDS RESEARCH AND HUMAN RETROVIRUSES Volume 19, Number 10, 2003, pp. 917–922 © Mary Ann Liebert, Inc.

Sequence Note

HIV Type 1 C and C′ Subclusters Based on Long Terminal Repeat Sequences in the Ethiopian HIV Type 1 Subtype C Epidemic

MICHEL P. DE BAAR,1,2 ALMAZ ABEBE,3 ALETTA KLIPHUIS,1 GIRMA TESFAYE,4 JAAP GOUDSMIT,5 and GEORGIOS POLLAKIS1

ABSTRACT

While the Ethiopian HIV-1 epidemic is dominated by subtype C, two distinguishable cocirculating C genotypes have been identified based on sequences of the C2V3 envelope region. In this study we sequenced and analyzed the long terminal repeat (LTR) sequence from 22 Ethiopian HIV-1-positive individuals. The two phylogenetically distinguishable genotypes C (n = 13) and C′ (n = 4) are separated by significant bootstrap values. Nucleotide differences between the two groups were identified in the NF-AT, TCF-1α, and SP1 transcription factor binding sites, whereas the NF-κB and NRE-core sequences were identical between the two groups. Five isolates that could not be classified as C or C′ were found to be recombinants within the LTR sequence upon bootscan analysis. Comparison of all the LTR sequences with their corresponding C2V3 envelope sequence revealed a further four intersubtype C/C′ recombinant isolates. Thus, the prevalence of C/C′ recombinant viruses is well over 40%. Interestingly, the C2V3 envelope sequences of all recombinant viruses belonged to the genotype C′, whereas every LTR sequence belonged to the genotype C. This result indicates that recombination between the two genotypes is unidirectional, possibly as the result of evolutionary pressure on the respective biological functions of the LTR promoter and the envelope protein.

The extraordinarily high genetic diversity of HIV-1 together with its capacity for rapid adaptation poses serious challenges to the development and application of successful chemotherapy as well as the development of an effective vaccine. The identification and surveillance of HIV-1 viruses that are newly transmitted or already circulating in distinct populations are therefore crucial, especially in areas with a high HIV prevalence. Subtype C forms by far the largest group of HIV-1 isolates circulating worldwide, creating a pandemic that is spreading alarmingly fast. Thus far, a solid biological explanation for the preferential spread of subtype C has not been provided. Several viral functions could contribute: the three NF-κB sites of the subtype C long terminal repeat (LTR) sequence have been correlated with increased replication rates,1–3 although this effect also seems to be cell-type dependent.2,4 Sociodemographic factors may also influence the more rapid spread of subtype C isolates versus other subtypes, but a significantly higher viral load has been observed in individuals infected with subtype C isolates.5 The Ethiopian HIV-1 epidemic has been dominated by subtype C viruses, and it has been estimated that the epidemic originated in the early 1980s.6,7 The incidence of HIV infection in

1Department of Human Retrovirology, Academic Medical Center, University of Amsterdam, Amsterdam, The Netherlands.
2Primagen, Amsterdam, The Netherlands.
3Ethiopian-Netherlands AIDS Research Project (ENARP) and the Ethiopian Health and Nutrition Research Institute (EHNRI), Addis Ababa, Ethiopia.
4Ethiopian Red Cross Society-National Blood Transfusion Service, Addis Ababa, Ethiopia.
5Crucell, Leiden, The Netherlands.

Ethiopia has increased dramatically, with reported prevalence rising from 17% in 1988 to 74% in 1998 among commercial sex workers,8,9 and from 11% in 1991 to 18% in 1997 among pregnant women.10 We have previously identified in Ethiopia two distinct subtype C subclusters (C and C′) based on phylogenetic analysis of the C2V3 envelope region. We found no association between the spread of the two virus groups and the geographic location or risk group classification of the human hosts.11

To further study the Ethiopian epidemic, we have now analyzed the LTR nucleotide sequence of 22 HIV-1-infected individuals for whom the C2V3 sequence is known (GenBank accession numbers AF245563, AF245552, AF307298, AF307299, AF245554, AF245581, AF245565, AF245523, AF245584, AF245594, AF245530, AF245549, AF245583, AF245553, AF245538, AF245580, AF245613, AF245605, AF245601, AF245581, AF245536, and AF245602). The primers and protocols of the reverse transcription polymerase chain reactions (RT-PCRs) as well as the sequencing have been described previously in detail.12 The complete LTR region was amplified in two separate RT-PCR and nested PCR reactions, yielding two overlapping DNA products. After sequence determination and alignment, phylogenetic analysis was performed using MEGA for Windows v2. The distance matrix was calculated according to the Kimura two-parameter model, whereas the topology of the isolates in the phylogenetic trees was determined with the neighbor-joining method.

Phylogenetic analysis revealed two clearly identifiable subclusters (Fig. 1A). Both subgroups cluster with significant bootstrap values of 81% for C and 98% for C′. In contrast, clustering of the C2V3 envelope sequences was supported by a significant bootstrap value of 94% only for subcluster C′, but not for subcluster C (Fig. 1B). We also performed a separate analysis of the U3 region of the LTR promoter, which contains most transcription factor binding sites. The U3 sequence also discriminates between the two subclusters with significant bootstrap values of 72% for C and 93% for C′ (Fig. 2). Five of the 22 isolates could not be classified in either one of the two subclusters. These isolates were not included in the phylogenetic

FIG. 1. LTR (A) and C2V3 envelope (B) phylogenetic tree analysis of Ethiopian HIV-1 sequences using the neighbor-joining method under the Kimura two-parameter model of the MEGA program. The two genotypes cocirculating in Ethiopia are indicated as C and C′. The primary isolates were isolated from HIV-1-infected individuals from seven Ethiopian towns, which are indicated by the first two letters of the sequence name: AA, Addis Ababa; AM, Arba Minch; AS, Assab; DD, Dire Dawa; DE, Dessie; JM, Jima; GO, Gondar. The first two digits following these codes indicate the sample collection year and the next three digits indicate the sample number. Numbers by the branches represent bootstrap values out of 100 replications, and values smaller than 75 were considered nonsignificant. The LTR sequences 89UG57/AF196739 and 96ZR429/AF196737 of subtype A, 92SU64/AF196719 and SF2/K02007 of subtype B, and 89ZR56/AF196722 and 95TN48/AF196727 of subtype D, together with the V3 sequences UG95454/AF000512 and 95KEB2-Q23/AF048183 of subtype A and 94UG114/AF016338 and 84ZR085/U88822 of subtype D, were taken as references.
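The naming convention described in the caption of Figure 1 can be decoded mechanically. The following is a minimal sketch, assuming every name consists of exactly two letters, two year digits, and three sample digits, and that two-digit years fall in the 1900s (the isolates named in this note were collected between 1988 and 1997). The function name `parse_isolate` and the returned field names are illustrative and not part of the original study.

```python
import re

# Town codes as listed in the Figure 1 caption.
TOWNS = {
    "AA": "Addis Ababa",
    "AM": "Arba Minch",
    "AS": "Assab",
    "DD": "Dire Dawa",
    "DE": "Dessie",
    "JM": "Jima",
    "GO": "Gondar",
}

# Two letters (town), two digits (year), three digits (sample number).
_NAME = re.compile(r"([A-Z]{2})(\d{2})(\d{3})")


def parse_isolate(name):
    """Split an isolate name such as 'AM96146' into town, year, and sample number."""
    match = _NAME.fullmatch(name)
    if match is None or match.group(1) not in TOWNS:
        raise ValueError(f"unexpected isolate name: {name!r}")
    code, year, sample = match.groups()
    return {
        "town": TOWNS[code],
        "year": 1900 + int(year),  # assumption: all samples predate 2000
        "sample": int(sample),
    }


print(parse_isolate("AM96146"))
# {'town': 'Arba Minch', 'year': 1996, 'sample': 146}
```

Names that do not follow this pattern, such as the early isolates ETH1984 or 86ETH2220, are rejected rather than guessed at.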

FIG. 2. Phylogenetic tree analysis performed on the U3 region (3′ LTR) of the sequences described in Figure 1 together with the sequences AF067154/93IN999, AF067155/95IN21068, AF067157/93IN904, and AF067158/93IN905 from India. The two groups are indicated as C and C′. Numbers at the branches represent bootstrap values out of 100 replications.

analysis, but we will deal with them separately as they were identified to be C/C′ recombinants. Intriguingly, we found that within the C group there is a subcluster with a bootstrap value of 78% (Fig. 1A) and 79% for U3 (Fig. 2), which does not seem to be linked to geography or to time. The earliest Ethiopian isolates ETH1984, ETH1985, and 86ETH2220 were not part of this last group, which may indicate multiple introductions and that the original isolates did not spread epidemically.

The long terminal repeat of HIV-1 subtype C has been studied extensively, both genetically and functionally.2,3,12–14 In these studies, no significant genetic difference was reported for the sequences of the different subtype C viruses, collected mainly in India and southern Africa. Interestingly, the analysis of the U3 LTR region demonstrates that sequences from India cluster with the Ethiopian C′ genotype, with a significant bootstrap value of more than 90% (Fig. 2). We did not find any Indian HIV-1 sequence that clusters with the Ethiopian C genotype.

The C2V3 analysis revealed that among the 22 isolates, 9 were C and 13 were C′. The analysis of the LTR region of the same isolates revealed 13 C sequences and only 4 C′ sequences. Four isolates (AM96146, AS88649, JM96111, and JM96125) belong to the C′ genotype for the C2V3 region but switch to the C genotype for the LTR region, providing strong evidence for recombination. The five isolates (AA88055, GO88052, GO96009, AA97202, and DE96043) that could not be classified as C or C′ by their LTR sequence were also investigated for recombination within the LTR region. Sliding window bootscanning analysis was performed with the Simplot program, with the results shown in Figure 3. The results clearly indicate that these isolates had recombined within the LTR region. We could identify two separate recombination crossover points: one around nucleotide 350 and the other around nucleotide 550. Similar crossover points have been documented previously.15,16

Using both the sliding window bootscanning and the phylogenetic analyses, we found evidence that 9 of the 22 studied isolates could be C/C′ recombinants, indicating a high recombination rate that possibly exceeds 40%. Six of these presumably recombinant viruses (66%) were collected in 1996/1997, and only 3 (33%) were collected in 1988, which is in accordance with the fact that recombination is more likely to occur with increasing HIV-1 prevalence. The most interesting observation is that all nine recombinant isolates cluster with the C′ genotype in the C2V3 region, indicating that HIV-1 subtype C isolates may favor the C′ envelope. Similarly, we found by analysis of the LTR sequences that only four sequences belong to the C′ genotype, indicating that there may be a biological advantage to the C genotype LTR. These hypotheses can be investigated experimentally, and such studies may also provide insight into the recombination mechanism and the evolution of fitter HIV-1 recombinant strains.

The LTR region plays a key role in transcriptional regulation and is also a key factor for efficient viral replication. We therefore studied the sequence differences in the known promoter motifs. Several groups have already suggested that sequence variation in LTR motifs can affect the binding of cellular regulatory proteins, and thus exert either a positive or negative effect on viral replication (reviewed in Pereira et al.17). We show the consensus sequence of both subclusters as determined by the CONSENS feature of the Phylip software package in Figure 4. Sequence differences between the two genotypes were located in the NF-AT sites (bases 163–181, 182–200, and 240–250), in the TCF-1α site (bases 305–329), and in the SP1 sites (bases 378–388 and 396–405). The NF-AT-I site in C′ contains an adenosine at position 170, whereas the C isolates contain a guanosine, and an AGC motif in C′ replaces the GAA of C at position 176. The C′ NF-AT-II site contains a uracil, an adenosine, and a cytosine at positions 183, 192, and 200, respectively, instead of the cytosine, guanosine, and uracil of the C isolates. The C′ NF-AT-III site contains a guanosine at position 240 instead of the cytosine of the C isolates. The C′ TCF-1α site contains a uracil at position 316 instead of the adenosine of the C isolates. The sequence of the TAR element of the two genotypes differs by only two nucleotides, at positions 466 and 506, which is a degree of diversity not greater than the overall C/C′ intrasubtype diversity. The Gibbs free energy for both these sequences is identical (ΔG = −25.6 kcal/mol) and within the normal values for both genotypes.2
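The recombinant proportions quoted above follow directly from the isolate names given in the text. The short sketch below recomputes them; it uses only the nine isolate names listed in this note and reads the collection year from the two digits after the town code. The variable names are illustrative.

```python
# Isolates whose C2V3 region is C' but whose LTR is C (named in the text).
env_ltr_switch = ["AM96146", "AS88649", "JM96111", "JM96125"]
# Isolates found to be recombinant within the LTR by bootscanning (named in the text).
intra_ltr = ["AA88055", "GO88052", "GO96009", "AA97202", "DE96043"]

total_isolates = 22
recombinants = env_ltr_switch + intra_ltr

# Characters 3-4 of each name encode the collection year (e.g. "96" for 1996).
years = [int(name[2:4]) for name in recombinants]
from_1988 = sum(1 for y in years if y == 88)
from_1996_97 = sum(1 for y in years if y in (96, 97))

share = 100 * len(recombinants) / total_isolates
print(f"recombinants: {len(recombinants)}/{total_isolates} = {share:.1f}%")
print(f"collected 1996/1997: {from_1996_97} ({100 * from_1996_97 / len(recombinants):.1f}%)")
print(f"collected 1988: {from_1988} ({100 * from_1988 / len(recombinants):.1f}%)")
```

Nine of 22 isolates is 40.9%, which is the basis for the statement that the recombination rate possibly exceeds 40%.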

FIG. 3. LTR analysis of C/C′ recombinant isolates by bootscanning, based on the neighbor-joining tree and Kimura two-parameter methods with bootstrapping. The bootstrap values that support the clustering of the sample sequences with the references are plotted. The chosen window size is 200 nt, moving in steps of 20 nt along the alignment. The region noted by a solid line belongs to C and the region noted by a broken line belongs to C′. The shaded columns indicate the areas of crossover points. The last two panels show the plot analysis of the isolate DE88404 shown as C and the plot analysis of the isolate GO88019 shown as C′. The asterisk indicates the crossover point(s) for each isolate.
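The two ingredients named in the caption of Figure 3, Kimura two-parameter distances and a 200-nt window moved in 20-nt steps, can be illustrated with a simplified scan. The sketch below is not the Simplot bootscan used in the study: instead of bootstrapping neighbor-joining trees in each window, it only reports which reference lies closest to the query in each window. The distance is d = −½ ln[(1 − 2P − Q)·√(1 − 2Q)], where P and Q are the proportions of transitions and transversions. All function names are hypothetical.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq_a, seq_b):
    """Kimura two-parameter distance between two aligned sequences.

    Columns with a gap or any symbol other than A, C, G, T are skipped.
    math.log raises ValueError if the sequences are too divergent.
    """
    transitions = transversions = compared = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1  # purine<->purine or pyrimidine<->pyrimidine
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    p = transitions / compared
    q = transversions / compared
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


def windows(length, size=200, step=20):
    """Yield half-open (start, end) windows of `size` columns every `step` columns."""
    for start in range(0, max(length - size, 0) + 1, step):
        yield start, min(start + size, length)


def closest_reference_per_window(query, references, size=200, step=20):
    """For each window, name the reference with the smallest distance to the query."""
    calls = []
    for start, end in windows(len(query), size, step):
        distances = {
            name: k2p_distance(query[start:end], ref[start:end])
            for name, ref in references.items()
        }
        calls.append((start, min(distances, key=distances.get)))
    return calls
```

Given a query and a dictionary of aligned reference sequences, `closest_reference_per_window` returns one (window start, nearest reference) pair per window. A change of nearest reference along the alignment is the kind of signal that a bootscan then tests with bootstrap support.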

The regulatory SP1-I site of the C′ group has a thymidine at position 382 where the C group has an adenosine, and in the SP1-III site the C′ group has a cytosine at position 399 where the C group has a thymidine, while the SP1-II site remains unchanged. The NF-κB sites, which have been reported to contribute to virus replication, are identical for both genotypes, and we observe the three copies characteristic of subtype C in all viruses.

Other functional regions such as the NRE-core site, the E box, the TATAA box, the polyadenylation site [poly(A)], and the primer-binding site (PBS) are indistinguishable between the two genotypes, confirming that sequences in the LTR leader region are subject to stringent structural or biological constraints. The most dramatic sequence difference between C′ and C is in the region upstream of the PBS element. From position 609 to 619, we observe a cluster of seven nucleotide substitutions.

FIG. 4. Alignment of the LTR consensus sequences of the C and C′ genotypes. Indicated are enhancer and promoter elements that have been identified mainly for subtype B sequences.
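A consensus comparison like the one in Figure 4 can be approximated by a per-column majority vote within each group, followed by a listing of the positions where the two consensus sequences differ. The sketch below is a generic illustration on made-up toy sequences; it is not the Phylip program used in the study, and the helper names `consensus` and `differences` are hypothetical. Ties within a column are resolved in favor of the base seen first.

```python
from collections import Counter


def consensus(aligned):
    """Majority-rule consensus of equal-length aligned sequences."""
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*aligned))


def differences(cons_a, cons_b, offset=1):
    """List (position, base_a, base_b) where two consensus sequences differ.

    Positions are 1-based by default, matching the numbering used in the text.
    """
    return [
        (i + offset, a, b)
        for i, (a, b) in enumerate(zip(cons_a, cons_b))
        if a != b
    ]


# Toy data only; these are not sequences from the study.
group_c = ["ACGTA", "ACGTA", "ACCTA"]
group_c_prime = ["ATGTA", "ATGTG", "ATGTG"]

cons_c = consensus(group_c)
cons_c_prime = consensus(group_c_prime)
print(cons_c, cons_c_prime)
print(differences(cons_c, cons_c_prime))
# [(2, 'C', 'T'), (5, 'A', 'G')]
```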

This region has been demonstrated to fold into a complex RNA structure that is implicated in the regulation of reverse transcription, which is primed at the PBS by an annealed tRNA primer.18

Besides the functional sites, we also compared the regions involved in the amplification of a 120-nt region, including the PBS and part of U5, that is targeted by real-time monitored, NASBA-based quantitative viral RNA assays currently used for determining and monitoring viral RNA levels in serum or plasma from HIV-infected individuals.19 The 5′ amplification site contains one or two mismatches that have been shown not to affect the accuracy of the assay, whereas the 3′ amplification site and the molecular beacon detection site sequence are completely conserved and identical to the oligonucleotides used in this assay.19

We have no evidence as to whether the differences in sequence of the LTR region reflect any difference in pathogenicity of the two genotypes. Nevertheless, it is important to point out that the sequence analysis indicates the presence of significant differences. There is evidence that both genotypes have been

introduced in Ethiopia in the early 1980s, and the analysis of Ethiopian C2V3 sequences revealed that the prevalence of C′ was 48% in 1988. This prevalence rose to 70% in 1997, indicating that C′ envelope viruses may be spreading faster. Among the 22 viruses that we studied, on the other hand, 64% carry the LTR sequence of the C group while only 18% carry the sequence of the C′ group, indicating that, as far as the LTR is concerned, the C genotype sequence may be advantageous. The analysis of the recombinant viruses that we have identified supports this, as all recombinant viruses carry the C′ C2V3 and the C LTR regions. Other parts of the genome may also be involved. Because such findings are relevant to vaccine research, it would be highly informative to have full-length genome sequences from isolates of both genotypes and the recombinant forms.

ACKNOWLEDGMENTS

The authors would very much like to thank Drs. Ben Berkhout and William A. Paxton for critical reading of the manuscript as well as helpful discussions.

REFERENCES

1. Hunt G and Tiemessen CT: Occurrence of additional NF-kappaB-binding motifs in the long terminal repeat region of South African HIV type 1 subtype C isolates. AIDS Res Hum Retroviruses 2000;16:305–306.
2. Jeeninga RE, Hoogenkamp M, Armand-Ugon M, de Baar M, Verhoef K, and Berkhout B: Functional differences between the long terminal repeat transcriptional promoters of human immunodeficiency virus type 1 subtypes A through G. J Virol 2000;74:3740–3751.
3. Montano MA, Novitsky VA, Blackard JT, Cho NL, Katzenstein DA, and Essex M: Divergent transcriptional regulation among expanding human immunodeficiency virus type 1 subtypes. J Virol 1997;71:8657–8665.
4. Chen BK, Feinberg MB, and Baltimore D: The kappaB sites in the human immunodeficiency virus type 1 long terminal repeat enhance virus replication yet are not absolutely required for viral growth. J Virol 1997;71:5495–5504.
5. Neilson JR, John GC, Carr JK, Lewis P, Kreiss JK, Jackson S, Nduati RW, Mbori-Ngacha D, Panteleeff DD, Bodrug S, Giachetti C, Bott MA, Richardson BA, Bwayo J, Ndinya-Achola J, and Overbaugh J: Subtypes of human immunodeficiency virus type 1 and disease stage among women in Nairobi, Kenya. J Virol 1999;73:4393–4403.
6. Abebe A, Lukashov VV, Pollakis G, Kliphuis A, Fontanet AL, Goudsmit J, and de Wit TF: Timing of the HIV-1 subtype C epidemic in Ethiopia based on early virus strains and subsequent virus diversification. AIDS 2001;15:1555–1561.
7. Abebe A, Lukashov VV, Rinke De Wit TF, Fisseha B, Tegbaru B, Kliphuis A, Tesfaye G, Negassa H, Fontanet AL, Goudsmit J, and Pollakis G: Timing of the introduction into Ethiopia of subcluster C′ of HIV type 1 subtype C. AIDS Res Hum Retroviruses 2001;17:657–661.
8. Aklilu M, Messele T, Tsegaye A, Biru T, Mariam DH, van Benthem B, Coutinho R, Rinke de Wit T, and Fontanet A: Factors associated with HIV-1 infection among sex workers of Addis Ababa, Ethiopia. AIDS 2001;15:87–96.
9. Mehret M, Mertens TE, Carael M, Negassa H, Feleke W, Yitbarek N, and Burton T: Baseline for the evaluation of an AIDS programme using prevention indicators: A case study in Ethiopia. Bull WHO 1996;74:509–516.
10. Fontanet AL, Messele T, Dejene A, Enquselassie F, Abebe A, Cutts FT, Rinke de Wit T, Sahlu T, Bindels P, Yeneneh H, Coutinho RA, and Nokes DJ: Age- and sex-specific HIV-1 prevalence in the urban community setting of Addis Ababa, Ethiopia. AIDS 1998;12:315–322.
11. Abebe A, Pollakis G, Fontanet AL, Fisseha B, Tegbaru B, Kliphuis A, Tesfaye G, Negassa H, Cornelissen M, Goudsmit J, and Rinke De Wit TF: Identification of a genetic subcluster of HIV type 1 subtype C (C′) widespread in Ethiopia. AIDS Res Hum Retroviruses 2000;16:1909–1914.
12. de Baar MP, De Ronde A, Berkhout B, Cornelissen M, van der Horn KHM, van der Schoot AM, De Wolf F, Lukashov VV, and Goudsmit J: Subtype-specific sequence variation of the HIV type 1 long terminal repeat and primer-binding site. AIDS Res Hum Retroviruses 2000;16:499–504.
13. Choudhury S, Montano MA, Womack C, Blackard JT, Maniar JK, Saple DG, Tripathy S, Sahni S, Shah S, Babu GP, and Essex M: Increased promoter diversity reveals a complex phylogeny of human immunodeficiency virus type 1 subtype C in India. J Hum Virol 2000;3:35–43.
14. Scriba TJ, de Villiers T, Treurnicht FK, zur Megede J, Barnett SW, Engelbrecht S, and van Rensburg EJ: Characterization of the South African HIV type 1 subtype C complete 5′ long terminal repeat, nef, and regulatory genes. AIDS Res Hum Retroviruses 2002;18:149–159.
15. Jubier-Maurin V, Saragosti S, Perret J-L, Mpoudi E, Esu-Williams E, Mulanga C, Liegeois F, Ekwalanga M, Delaporte E, and Peeters M: Genetic characterization of the nef gene from human immunodeficiency virus type 1 group M strains representing genetic subtypes A, B, C, E, F, G, and H. AIDS Res Hum Retroviruses 1999;15:23–32.
16. Rodenburg CM, Li Y, Trask SA, Chen Y, Decker J, Robertson DL, Kalish ML, Shaw GM, Allen S, Hahn BH, and Gao F: Near full-length clones and reference sequences for subtype C isolates of HIV type 1 from three different continents. AIDS Res Hum Retroviruses 2001;17:161–168.
17. Pereira LA, Bentley K, Peeters A, Churchill MJ, and Deacon NJ: A compilation of cellular transcription factor interactions with the HIV-1 LTR promoter. Nucleic Acids Res 2000;28:663–668.
18. Berkhout B, Ooms M, Beerens N, Huthoff H, Southern E, and Verhoef K: In vitro evidence that the untranslated leader of the HIV-1 genome is an RNA checkpoint that regulates multiple functions through conformational changes. J Biol Chem 2002;277:19967–19975.
19. de Baar MP, van Dooren MW, de Rooij E, Bakker M, Van Gemen B, Goudsmit J, and De Ronde A: Single rapid real-time monitored isothermal RNA amplification assay for quantification of human immunodeficiency virus type 1 isolates from group M, N, and O. J Clin Microbiol 2001;39:1378–1384.

Address reprint requests to:
Georgios Pollakis
Department of Human Retrovirology
Academic Medical Center
University of Amsterdam
Meibergdreef 15
1105 AZ Amsterdam, The Netherlands

E-mail: [email protected]
