AIDS RESEARCH AND HUMAN RETROVIRUSES Volume 24, Number 7, 2008 © Mary Ann Liebert, Inc. DOI: 10.1089/aid.2008.0064
Sequence Note
Discrepancies in Assignment of Subtype/Recombinant Forms by Genotyping Programs for HIV Type 1 Drug Resistance Testing May Falsely Predict Superinfection
Michel Ntemgwa,1,2 M. John Gill,3 Bluma G. Brenner,1 Daniela Moisi,1 and Mark A. Wainberg1,2
Abstract
With the growing diversity of the HIV pandemic, routine genotyping is an important tool for monitoring viral subtype as well as drug resistance. In this regard, numerous subtyping tools and drug resistance algorithms are available online. However, these online tools disagree in their designation of HIV-1 subtypes or recombinant forms, and such discrepancies may have an impact on drug susceptibility profiles. Indeed, inconsistencies in some of these tools may lead to a false designation of dual infection and/or superinfection. In this case study, we evaluated the sequence diversity of an infection that was referred to us as a potential case of superinfection as a result of variations in designation of subtype. We evaluated sequences using five different online tools and finally determined by phylogenetic analysis that the sequence was a unique A1/C intersubtype recombinant at baseline and not a case of superinfection.
The HIV/AIDS pandemic has diversified to include 11 subtypes and 34 circulating recombinant forms (CRFs). Routine genotyping of the pol gene is now being used to classify HIV-1 subtypes, in addition to determining drug resistance profiles, and many online tools exist for classifying HIV-1 subtypes based on comparisons of available pol gene sequences to consensus sequences. Previously, such consensus determinations were based on whole genome or env sequencing. Now, the shorter sequence length of pol (900–1400 nucleotides) and the lower genetic diversity in this region can contribute to inaccuracy of web-based algorithms in determining subtypes and CRFs. Moreover, these online sites routinely modify their algorithms, and this can sometimes result in inconsistencies in comparison with data obtained in real time. Indeed, limitations in regard to the use of such tools and variations among different methods of subtyping based on pol sequences have been described.1 Although subtype assignment has been highly consistent (94%) across methods for earlier reported subtypes (A1, A2, B, C, D, and F), there is relatively little agreement with regard to CRFs and newly
arising recombinant forms. These differences or variations in HIV-1 subtype designation may have serious implications for study of dual infections (i.e., a second infection before seroconversion) and superinfections (i.e., a second infection after seroconversion).2,3 A patient can be classified as superinfected if viral sequences are initially described as representative of one subtype and later of a different subtype; sometimes such data have been obtained using different methods or as a result of variations in the same algorithm in real time. Although a confident assignment of viral subtypes requires full-length genome sequences, practical limitations often result in such decisions being based on subgenomic analysis. Traditionally, gag and env sequences have been used to designate subtypes, as genetic variability in these regions is relatively high. However, the routine use of HIV resistance genotyping as a means of following up patients on antiretroviral therapy has increased the volume of pol sequence data for each of the protease (PR) and reverse transcriptase (RT) genes.4,5 Recently, some of these pol sequences have been used to study HIV-1 transmission.6 Other studies have
1McGill University AIDS Centre, Lady Davis Institute-Jewish General Hospital, Montreal, Quebec, Canada.
2Division of Experimental Medicine, McGill University, Montreal, Quebec, Canada.
3Department of Microbiology and Infectious Diseases, University of Calgary, Calgary, Alberta, Canada.
TABLE 1. SUBTYPE ASSIGNMENT OF SEQUENCES USING FIVE DIFFERENT ONLINE SUBTYPING TOOLS AND VIRCO INTERPRETATION(a)

              Method used (online tool)
Sample ID     HIVDB PR/RT             STAR  REGA  Geno2pheno       jpHMM    Virco
5915_2002(b)  CRF01_AE_D 94.9%/90.6%  U(c)  U(d)  CRF10_CD (100%)  A1/C/A2  D (2002), U (2008)
5915_2002     CRF01_AE_B 94.9%/92.6%  U(c)  U(d)  A1 (100%)        A1/C     n.d.
2566_2005     CRF02_AG_B 92.3%/92.6%  U(c)  U(d)  A1 (100%)        A1/C     n.d.
2566_2006     CRF02_AG_B 92.3%/92.6%  U(c)  U(d)  A1 (100%)        A1/C     n.d.

(a) n.d., not determined; U, unknown.
(b) This sequence was longer, containing up to 400 amino acids in the RT.
(c) The subtype was unassigned but predicted a subtype A recombinant using STAR recombination analysis.
(d) The subtype was unassigned based on sequence 800 bps that clustered with a pure subtype with bootstrap 70%. Also shown are detection of recombination in the bootscan and failure to classify the sequence as either a CRF or subtype (bootstrap support). Bootscan predicted a URF (unidentified recombinant form) containing subtypes A1 and C.
FIG. 1. Recombinant patterns of (a) 5915_2002, (b) 2566_2005, and (c) 2566_2006 (with corresponding color-coded regions) were evaluated using the jumping profile Hidden Markov Model (jpHMM), a probabilistic generalization of the jumping-alignment approach. The positions of the breakpoints are indicated at positions 3066 nt, 3003 nt, and 3003 nt, respectively, using the HIV-1 HXB2 numbering engine.
FIG. 2. Phylogenetic tree derived from 5915_2002, 2566_2005, and 2566_2006 (1017 nucleotides) showing their relationship to alignments of group M reference strains. N.CM.95.YBF30 was used as an outgroup. Also included are subtype A1/C recombinants (92RW009, 04CA7750, and 95IN21301) from the Los Alamos HIV Sequence Database. Subtype designations for reference strains from the Los Alamos HIV Sequence Database are indicated on the appropriate branches. Bootstrap values supporting a monophyletic group are provided on the branches.
FIG. 3. Phylogenetic tree derived from 5915_2002, 2566_2005, and 2566_2006 based on breakpoints derived from the jpHMM approach for 2253–3065 (A) and 3066–3269 (B), showing their relationship to alignments of group M reference strains from the Los Alamos HIV Sequence Database. N.CM.95.YBF30 was used as an outgroup. Bootstrap values are provided on the branches. This confirms that the pol sequence was composed of two subtypes (A1 and C).
DISCREPANCIES IN HIV-1 SUBTYPING TOOLS
demonstrated the utility of genotyping data for assignment of subtype despite high levels of conservation in pol.7,8 Doubtless, the ongoing exchange of viruses between geographic areas will continue to render interpretation of data more complex, especially in regard to superinfection.
In this study, we describe a patient from Montreal, Canada of Pakistani origin who harbored a non-B HIV-1 variant, designated as subtype D by Virco sequencing (VircoTYPE HIV-1 report)9 in 2002. As of January 2008, the Virco subtype call for this sequence is no longer defined, i.e., unknown (U), following the addition of CRF consensus sequences to the subtype call algorithms. The patient later moved to Calgary and subsequent genotyping in 2005 and 2006 suggested a virus of subtype A1. The sequences were referred to the McGill AIDS Centre as a potential case of superinfection. To investigate this case, we analyzed the historical sequence obtained in 2002 and compared it with sequences obtained in 2005 and 2006 using many currently available methods. The 2002 sequence was given the laboratory identification 5915_2002 while those of 2005 and 2006 were labeled 2566_2005 and 2566_2006, respectively.
Genotyping for 5915_2002 was performed at the McGill AIDS Centre using a published protocol9 (Virco, BVBA, Mechelen, Belgium) based on sequencing of a 1497-bp fragment of the HIV-1 pol gene (positions 2253–3749), encompassing up to 400 amino acids in RT, while genotyping of 2566_2005 and 2566_2006 was performed using a similar method in use in the Calgary laboratory, in which only 250 amino acids in RT were sequenced.
Five methods of subtype assignment were used: (1) percentage identity-based genotyping determined using the Stanford database (HIVDB program) (http://hivdb.stanford.edu/); (2) STAR, a distance-based tool for rapid and accurate determination of HIV-1 subtype from University College London (www.vgb.ucl.ac.uk/starn.shtml); (3) automated neighbour joining phylogeny-based genotyping as implemented in the REGA HIV-1 subtyping tool, version 2.0 (http://www.bioafrica.net/subtypetool/html/index.html) with mirror sites at Stanford (http://dbpartners.stanford.edu/REGASubtyping/) and the REGA Institute for Medical Research (http://jose.med.kuleuven.be/subtypetool/html/index.html); (4) the geno2pheno program available online (http://www.geno2pheno.org); and (5) the jumping profile Hidden Markov Model (jpHMM) that is widely used for detection of genomic recombinations in HIV-1 (http://jphmm.gobics.de/jphmm.html), representing a collaborative effort between the Department of Bioinformatics of the University of Göttingen and the Los Alamos HIV Sequence Database Group.10 In addition, all sequences were compared with those of 462 non-B subtype infections in our laboratory database from persons living in the province of Quebec.
The subtype assignments of the sequences using the various tools are shown in Table 1. The 2002 sequence was longer with 400 amino acids in RT, while those of 2005 and 2006 had only 250 amino acids, due to differences in sequencing protocols. For comparison, the longer ancestral sequence, i.e., 5915_2002*, was cut to a length equal to those of 2005 and 2006. This equal length 2002 sequence was designated 5915_2002, consistent with recommendations of the International AIDS Society–USA Drug Resistance Mutations Group, which has identified the relevant drug resistance mutations as residing only within this 250 amino acid segment.
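The fragment lengths quoted above can be checked with simple coordinate arithmetic. The following is a minimal sketch, not part of the published protocol: the function names are hypothetical, and the HXB2 start positions of protease (2253) and reverse transcriptase (2550) are standard landmarks assumed here rather than values stated in the text (the end position 3269 is taken from the Fig. 3 caption).

```python
# Illustrative HXB2 coordinate arithmetic (assumed landmarks, hypothetical names).
PR_START = 2253  # assumed first nucleotide of protease in HXB2 numbering
RT_START = 2550  # assumed first nucleotide of reverse transcriptase in HXB2 numbering


def span_length(start, end):
    """Number of nucleotides in an inclusive coordinate interval."""
    return end - start + 1


def rt_codons(end):
    """Complete RT codons covered by a fragment that ends at position `end`."""
    return span_length(RT_START, end) // 3


def trim_to(seq, seq_start, new_end):
    """Trim a sequence whose first base sits at `seq_start` so it ends at `new_end`."""
    return seq[: span_length(seq_start, new_end)]


long_fragment = span_length(PR_START, 3749)   # 1497 nucleotides
short_fragment = span_length(PR_START, 3269)  # 1017 nucleotides
```

Under these assumptions the 2253–3749 fragment covers 400 complete RT codons, in agreement with the text, while the 1017-nucleotide region used in the figures (2253–3269) covers 240 complete RT codons, slightly fewer than the approximate figure of 250 amino acids.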
Genotyping showed that 5915_2002 was a wild-type sequence containing only the M36I polymorphism in the PR gene and no resistance mutations in the RT gene. A repeat report of the same sample in 2008 revealed 13V, 35D, 36I, 41K, 69K, and 89M as PR polymorphisms that possessed a measurable impact on drug resistance. Samples 2566_2005 and 2566_2006 possessed the L10F, K20I, M36I, I54V, A71V, and V82A mutations in PR and M184V in RT, indicative of antiretroviral (ARV) experience. Of importance, the sample included the M89I mutation in PR, a major resistance mutation that has been linked to therapy failure in patients infected with HIV-1 non-B subtypes.11 The existence of M89I was not included in the resistance reports of 2005 and 2006.
The three sequences 5915_2002, 2566_2005, and 2566_2006 were compared using the five online tools described above, and the results are shown in Table 1. The 2002 sequence was evaluated twice, first as the original longer sequence 5915_2002* and again as the shortened sequence 5915_2002 for comparison with 2566_2005 and 2566_2006. When 5915_2002* was evaluated using the Stanford online tool, the subtype obtained was CRF01_AE/D. This algorithm designates subtypes based on PR and RT sequences, independently. The results suggested that the PR and RT regions in this sample were most closely related to the CRF01_AE and subtype D consensus sequences, respectively. Analysis of the shorter 250 aa sequence 5915_2002 designated the same sample CRF01_AE/B. Thus, according to the Stanford tool, the RT fragment of the 2002 sample is subtype D based on the longer sequence and subtype B based on the shorter sequence. It should be noted that subtypes B and D, in general, show little sequence divergence. Sequence evaluations of 2566_2005 and 2566_2006 both revealed CRF02_AG/B using the Stanford online tool, a clear disparity with the 2002 result. These same sequences were also evaluated using more rigorous online algorithms for designations of subtypes and recombinant forms.
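To illustrate why independent, identity-based calls for PR and RT can yield composite labels such as CRF01_AE/D, the sketch below assigns each region to its closest reference by percentage identity. It is a toy illustration only and not the HIVDB implementation: the function names, the reference labels "X" and "Y", and the short sequences are all hypothetical.

```python
def percent_identity(query, reference):
    """Percentage of aligned, non-gap positions at which two sequences agree."""
    pairs = [(q, r) for q, r in zip(query, reference) if q != "-" and r != "-"]
    if not pairs:
        return 0.0
    matches = sum(q == r for q, r in pairs)
    return 100.0 * matches / len(pairs)


def closest_reference(query, references):
    """Return (name, identity) for the reference most similar to the query."""
    scored = {name: percent_identity(query, ref) for name, ref in references.items()}
    best = max(scored, key=scored.get)
    return best, scored[best]


def subtype_by_region(regions, references_by_region):
    """Call each region independently, as an identity-based tool might."""
    return {region: closest_reference(seq, references_by_region[region])
            for region, seq in regions.items()}


# Hypothetical pre-aligned toy data; real consensus sequences are far longer.
toy_refs = {
    "PR": {"X": "ACGTACGTAC", "Y": "ACGTTCGTTC"},
    "RT": {"X": "GGCCAAGGTT", "Y": "GGCTAAGCTT"},
}
toy_query = {"PR": "ACGTACGTAC", "RT": "GGCTAAGCTA"}
calls = subtype_by_region(toy_query, toy_refs)
```

In this toy case PR is closest to "X" and RT to "Y", so the composite call would read X/Y. A scheme of this kind reports only the nearest consensus for each region; it cannot by itself locate a recombination breakpoint inside a region, and shortening the query can change which reference is nearest.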
The STAR algorithm did not assign any subtype but predicted a subtype A recombinant form (Table 1). Similarly, the REGA tool left the subtype unassigned, classifying the sequences as neither a pure subtype nor a CRF, although the bootscan revealed a recombinant form showing homology to subtypes A1 and C in different regions of pol. Geno2pheno predicted subsubtype A1 with 100% confidence for the shorter sequences, but assigned the longer ancestral sequence 5915_2002* to the CRF10_CD recombinant form. The jpHMM tool predicted an A1/C recombinant for the 2005/2006 sequences, and a mosaic virus involving subtypes A1, C, and A2 (A1/C/A2) for the longer ancestral sequence 5915_2002*, as shown in Table 1.
Alignments of the three sequences with reference strains from the Los Alamos HIV database were generated using Bioedit 7.0 software. Online software was used to remove all gaps in the alignments (http://www.hiv.lanl.gov/content/hivdb/GAPSTREEZE/strip_ready.html), and similarity and bootstrap plots were generated using the Simplot software of the Phylip package to analyze the recombinant structure of the genome (http://sray.med.som.jhmi.edu/RaySoft/Simplot/). Recombinant breakpoints were further evaluated using the jpHMM,10 which is a probabilistic generalization of the jumping-alignment approach that is more rigorous than traditional methods in determining recombination breakpoints. Phylogenetic trees were constructed using Mega 3.0 software12 and bootstrap values were used to confirm the subtypes of the segments.
Since several of the methods used suggested that recombination had occurred, we used the jpHMM approach to identify potential intersubtype recombinant breakpoints. The length of the 5915_2002* sequence was reduced at the 3′ end to that of 2566_2005 and 2566_2006. The recombinant structures are shown in Fig. 1. The structures indicate that all three sequences were derived from subtypes A1 and C. As shown in Fig. 1a, the breakpoint of 5915_2002 was at position 3066 (HXB2 numbering) while those of 2566_2005 and 2566_2006 were both at 3003 (Fig. 1b and c).
Three phylogenetic trees were then constructed. One involved aligning and comparing all three sequences (Fig. 2) while the other two involved cutting the sequences at the breakpoints and constructing the phylogenetic trees independently. Figure 2 shows a tree generated by aligning relevant sequences with those of diverse HIV-1 reference strains. Also included in the tree were recombinants involving subtypes A1 and C (92RW009, 04CA7750, and 95IN21301) obtained from the Los Alamos HIV Sequence Database. Of note is isolate 04CA7750, which is a novel subtype A1/C recombinant recently described in our laboratory.13 As shown, the sequences of 5915_2002, 2566_2005, and 2566_2006 clustered around subtype A1/C samples with 100% bootstrap support, forming their own distinct cluster. 2566_2005 and 2566_2006 were identical but slightly different from 5915_2002, probably due to viral evolution over time and also because of resistance mutations that the patient had acquired while on therapy. Based on the breakpoints derived from the jpHMM approach, two other phylogenetic trees were reconstructed spanning the 2253–3065 (Fig. 3A) and 3066–3269 (Fig. 3B) regions.
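The idea behind a similarity scan followed by cutting at a breakpoint can be sketched as follows. This is a simplified illustration, not the Simplot or jpHMM algorithm: it uses plain per-window identity rather than bootstrapped trees or a hidden Markov model, and the function names, reference labels, and toy sequences are hypothetical.

```python
def window_identity(a, b, start, size):
    """Fraction of matching positions between a and b in [start, start + size)."""
    matches = sum(x == y for x, y in zip(a[start:start + size], b[start:start + size]))
    return matches / size


def scan(query, references, size, step):
    """For each window start, record the reference most similar to the query."""
    calls = []
    for start in range(0, len(query) - size + 1, step):
        scores = {name: window_identity(query, ref, start, size)
                  for name, ref in references.items()}
        calls.append((start, max(scores, key=scores.get)))
    return calls


def switch_points(calls):
    """Window starts at which the closest reference changes."""
    return [(start, prev, cur)
            for (_, prev), (start, cur) in zip(calls, calls[1:])
            if prev != cur]


def split_at(seq, seq_start, breakpoint):
    """Split a sequence so that the second segment begins at coordinate `breakpoint`."""
    cut = breakpoint - seq_start
    return seq[:cut], seq[cut:]


# Hypothetical toy references that differ at every position, and a mosaic query.
ref_a = "ACGT" * 10
ref_c = "TGCA" * 10
mosaic = ref_a[:22] + ref_c[22:]
switches = switch_points(scan(mosaic, {"ref_a": ref_a, "ref_c": ref_c}, size=8, step=4))

# Cutting a 1017-nucleotide region starting at 2253 at the 3066 breakpoint.
left, right = split_at("N" * 1017, 2253, 3066)
```

With these toy inputs the closest reference switches once, at the window starting at index 20, which brackets the true junction at index 22. Splitting a 1017-nucleotide region that starts at position 2253 at the 3066 breakpoint gives segments of 813 and 204 nucleotides, matching the 2253–3065 and 3066–3269 regions used for the two segment trees.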
The results show that sequences from the 2253–3065 region (HXB2 numbering) clustered with subsubtype A1 while sequences spanning nucleotides 3066–3269 (HXB2 numbering) clustered with reference sequences of subtype C. Thus, phylogenetic tree analysis indeed confirmed that the pol sequences represented an intersubtype recombinant, consisting of subsubtype A1 and subtype C. The sequence was also inserted into a phylogenetic tree of non-B viral samples isolated in Quebec, many of which originate in Francophone countries in Central and West Africa (unpublished). This virus did not segregate with any of the major viral subtypes, but rather appeared to be an A/C recombinant. However, it did not share sequence homology with the other subtype A/C recombinant that we identified through whole genomic sequencing.13
The three viral genomes presented here represent a unique A1/C recombinant form in pol. This finding is consistent with the patient’s country of origin of Pakistan, a region in which subtypes A and C cocirculate. The virus was identical at the three time points analyzed, apart from the acquisition of resistance mutations related to treatment.
This case underscores the problematic issue of subtype interpretation in clinical settings. Physicians who see different subtypes reported on genotyping reports may falsely presume that a superinfection occurred. Aside from the difficulty of raising this subject with patients, such inconsistencies may also lead to questions about the quality assurance programs of different reference laboratories. This case emphasizes the need to conduct detailed phylogenetic analyses, especially in cases in which superinfection is suspected. It also emphasizes the limitations of online subtyping tools in reporting subtypes. As it is not practical to perform full-length genomic analysis for each sample genotyped, phylogenetic analysis should at least be used to validate any potential case of superinfection.
The impact of viral subtype on drug resistance profiles is an important issue that is currently being debated. Our laboratory has shown that a signature polymorphism can lead to preferential acquisition of V106M in persons with subtype C viruses treated with efavirenz.14 Our studies also show the preferential acquisition of the K65R/Q151M resistance pathway in subtype C viruses.15,16 Others have demonstrated the importance of the 89I pathway for PI resistance in subtype A, C, and F viruses, due to an 89M polymorphism. Of note, the virus in our case harbored 89M, but many algorithms do not yet consider this polymorphism to be significant in regard to PI resistance. This emphasizes that it will become increasingly important to identify the correct subtypes of viral isolates in regard to both treatment options and proper understanding of HIV epidemiology.
Sequence Data
The nucleotide sequences (5915_2002*, 2566_2005, and 2566_2006) used in this study have been submitted to GenBank and are available under accession numbers EU531746, EU531747, and EU531748.
Acknowledgments
This work was supported in part by the Canadian Institutes of Health Research (CIHR) and the Réseau SIDA of the Fonds de la Recherche en Santé du Québec. Michel Ntemgwa is the recipient of a CIHR doctoral fellowship award. We thank Jorge L. Martinez-Cajas for editing the manuscript, Brenda Beckthold for the acquisition of sequence data, and André Dascal for patient referral.
References
1. Gifford R, de Oliveira T, Rambaut A, et al.: Assessment of automated genotyping protocols as tools for surveillance of HIV-1 genetic diversity. AIDS 2006;20:1521–1529.
2. Kozaczynska K, Cornelissen M, Reiss P, Zorgdrager F, and van der Kuyl AC: HIV-1 sequence evolution in vivo after superinfection with three viral strains. Retrovirology 2007;4:59.
3. Pernas M, Casado C, Fuentes R, Perez-Elias MJ, and Lopez-Galindez C: A dual superinfection and recombination within HIV-1 subtype B 12 years after primoinfection. J Acquir Immune Defic Syndr 2006;42:12–18.
4. Turner D, Brenner B, Moisi D, et al.: Nucleotide and amino acid polymorphisms at drug resistance sites in non-B-subtype variants of human immunodeficiency virus type 1. Antimicrob Agents Chemother 2004;48:2993–2998.
5. Turner D, Brenner B, Moisi D, Liang C, and Wainberg MA: Substitutions in the reverse transcriptase and protease genes of HIV-1 subtype B in untreated individuals and patients treated with antiretroviral drugs. MedGenMed 2005;7:69.
6. Brenner BG, Roger M, Routy JP, et al.: High rates of forward transmission events after acute/early HIV-1 infection. J Infect Dis 2007;195:951–959.
7. Pasquier C, Millot N, Njouom R, et al.: HIV-1 subtyping using phylogenetic analysis of pol gene sequences. J Virol Methods 2001;94:45–54.
8. Yahi N, Fantini J, Tourres C, Tivoli N, Koch N, and Tamalet C: Use of drug resistance sequence data for the systematic detection of non-B human immunodeficiency virus type 1 (HIV-1) subtypes: How to create a sentinel site for monitoring the genetic diversity of HIV-1 at a country scale. J Infect Dis 2001;183:1311–1317.
9. Akouamba BS, Viel J, Charest H, et al.: HIV-1 genetic diversity in antenatal cohort, Canada. Emerg Infect Dis 2005;11:1230–1234.
10. Zhang M, Schultz AK, Calef C, et al.: jpHMM at GOBICS: A web server to detect genomic recombinations in HIV-1. Nucleic Acids Res 2006;34:W463–465.
11. Abecasis AB, Deforche K, Snoeck J, et al.: Protease mutation M89I/V is linked to therapy failure in patients infected with the HIV-1 non-B subtypes C, F or G. AIDS 2005;19:1799–1806.
12. Kumar S, Tamura K, and Nei M: MEGA3: Integrated software for molecular evolutionary genetics analysis and sequence alignment. Brief Bioinform 2004;5:150–163.
13. Ntemgwa M, Toni T, Brenner BG, Routy J-P, Moisi D, Oliveira M, and Wainberg MA: Near full-length genomic analysis of a novel subtype A1/C recombinant HIV-1 isolate from Canada. AIDS Res Hum Retroviruses 2008;24:655–659.
14. Brenner B, Turner D, Oliveira M, et al.: A V106M mutation in HIV-1 clade C viruses exposed to efavirenz confers cross-resistance to non-nucleoside reverse transcriptase inhibitors. AIDS 2003;17:F1–5.
15. Brenner BG, Oliveira M, Doualla-Bell F, et al.: HIV-1 subtype C viruses rapidly develop K65R resistance to tenofovir in cell culture. AIDS 2006;20:F9–13.
16. Doualla-Bell F, Avalos A, Brenner B, et al.: High prevalence of the K65R mutation in human immunodeficiency virus type 1 subtype C isolates from infected patients in Botswana treated with didanosine-based regimens. Antimicrob Agents Chemother 2006;50:4182–4185.
Address reprint requests to:
Mark A. Wainberg
McGill AIDS Centre
Jewish General Hospital
3755 Cote Ste Catherine Rd.
Montreal, Quebec, Canada H3T 1E2
E-mail:
[email protected]