Hepatitis C virus variants from Jakarta, Indonesia ... - CiteSeerX

12 downloads 0 Views 881KB Size Report
Hepatitis C virus variants from Jakarta, Indonesia classifiable into novel genotypes in the second (2e and 2f), tenth (10a) and eleventh (lla) genetic groups.
Journal of General Virology (1996), 77, 293 301.

293

Printed in Great Britain

Hepatitis C virus variants from Jakarta, Indonesia classifiable into novel genotypes in the second (2e and 2f), tenth (10a) and eleventh (lla) genetic groups H a j i m e T o k i t a , 1 H i r o a k i O k a m o t o , 1 H i s a o Iizuka, 2 Junichi K i s h i m o t o , 3 F u m i o Tsuda, 4 Laurentius A. L e s m a n a , 5 Y u z o M i y a k a w a 6 and M a k o t o M a y u m i 1. 1Immunology Division, Jichi Medical School, Minamikawachi-Machi, Tochigi-Ken 329-04, Japan, 2 Japanese Red Cross Saitama Blood Center, Saitama-Ken 338, Japan, 31nstitute of Immunology, Tokyo 112, Japan, 4Department of Medical Sciences, Toshiba General Hospital, Tokyo 140, Japan, 5 Department of Internal Medicine, Faculty of Medicine, University of Indonesia, Jakarta 10430, Indonesia and 6 Mita Institute, Tokyo 108, Japan

Hepatitis C virus (HCV) isolates from 126 hepatitis patients in Jakarta, Indonesia were genotyped by PCR with genotype-specific primers deduced from the HCV core gene. Fifty-five isolates (44%) were classified as genotype II/lb, 15 (12 %) as lc, 33 (26 %) as III/2a, and 1 (1%) as V/3a, while the remaining 22 (17 %) were not classifiable into any of the five common genotypes (I/la, II/lb, III/2a, IV/2b and V/3a) or lc. Sequences of a part of the NSSb region [1093 bp (nucleotides 8279-9371)] of the 22 isolates of unclassifiable genotype were subjected to pair-wise comparison and phylogenetic analysis along

with those of 62 isolates of 25 genotypes in nine genetic groups. Seven of the isolates were classified into 2e and two into 2f, representing novel genotypes in genetic group 2, while ten and three were classified into two new genetic groups, l0 and 11, respectively, and their genotypes were provisionally designated 10a and l la. The isolates of genotype 10a (JK049) and 1 la (JK046) were sequenced in full. Comparison of 24 HCV genomes including those of JK049 and JK046, over the entire genome and subgenomic regions, supported the classification of HCV into 11 genetic groups.

Introduction

in various areas of the genome is used for classifying the virus. Since the divergence in the entire genome and parts of the NS5b region distributes in two distinct tiers, HCV is classified into genetic groups or types named 1, 2, 3 etc. which divide further into genotypes (Okamoto et al., 1994; Tokita et al., 1995) or subtypes called a, b, c etc. (Simmonds et at., 1993; Simmonds, 1995). Up to the present, at least 34 genotypes in nine genetic groups have been reported (Cha et al., 1992; Mori et al., 1992; Bukh et al., 1993, 1994; Simmonds et al., 1993; Stuyver et al., 1993, 1994; Okamoto et al., 1994; Tokita et al., 1994a, b, 1995). HCV genotypes have distinct geographical distributions. It is not clear, however, how HCV genotypes should be classified in a purely virological context or for any clinical purposes. From the standpoint that HCV genotypes should not be characterized fully except by determining the entire genomic sequence, we have proposed five common genotypes designated I, II, III, IV and V (Sakamoto et al., 1994). These five represent essentially all HCV genotypes in Europe, North America, Oceania and most areas in Asia (Bukh et al., 1995; Miyakawa et al., 1995; Simmonds, 1995). Genotype I corresponds to la, II to lb, III to 2a, IV to 2b and V to 3a.

Hepatitis C virus (HCV) is a positive-stranded linear RNA virus of approximately 9500 nucleotides (nt), and is the major causative agent of blood-borne non-A, nonB hepatitis worldwide (for a review, see Houghton et al., 1991). It has a single, long open reading frame flanked by 5' and 3' untranslated regions (UTR). The polyprotein precursor encoded by the open reading frame is cleaved by viral and host proteases into core (C) protein, envelope 1 (El) and envelope 2/nonstructural 1 (E2/NS1) glycoproteins, as well as six nonstructural proteins designated NS2, NS3, NS4a, NS4b, NS5a and NS5b (Houghton et al., 1991; Grakoui et al., 1993). HCV is one of the few viruses pathogenic for humans for which detection of viral RNA remains the sole means of establishing current infection, and sequence divergence

* Author for correspondence. Fax +81 285 44 1557. The sequences reported in this paper have been deposited in the DDBJ, GenBank and E M B L nucleotide sequence databases (accession nos D49745 D49780, D6382! and D63822). 0001-3437 © 1996 S G M

294

H. T o k i t a and others

Sequence divergence of the entire HCV genome is reflected, to a greater or lesser extent, within the various genomic regions. Hence, new genotypes keep being recorded based on divergence in partial genomic sequences, typified by 222 bp in the NS5b region representing only ~ 2 % of the entire genome (Simmonds et al., 1993). Longer sequences have been proposed as the basis of c o m p a r i s o n , such as 576 b p o f the E1 gene (Bukh et al., 1993; T o k i t a et at., 1994b), 573 b p o f the C gene ( B u k h et al., 1994), a n d p a r t s o f the N S 5 b region s p a n n i n g 329 b p ( n t 8 2 7 9 - 8 6 0 7 ) ( T o k i t a et al., 1994b), 3 4 0 b p (nt 8276-8615) ( E n o m o t o et al., 1990; Stuyver et al., 1994) a n d 1093 b p (nt 8279-9371) ( T o k i t a et al., 1994b). O f 126 H C V isolates f r o m hepatitis patients in J a k a r t a , I n d o n e s i a , 22 (17 % ) were n o t classifiable into a n y o f the five c o m m o n g e n o t y p e s ( I / l a , I I / l b , I I I / 2 a , I V / 2 b a n d V / 3 a ) or into g e n o t y p e l c (for which the entire g e n o m i c sequences have been c h a r a c t e r i z e d ) by P C R with typespecific p r i m e r s d e d u c e d f r o m the H C V core gene. Pairwise c o m p a r i s o n a n d p h y l o g e n e t i c analysis o f the 1093 b p (nt 8279-9371) N S 5 b sequence i n d i c a t e d t h a t these new isolates b e l o n g to novel g e n o t y p e s 2e a n d 2f in genetic g r o u p 2, as well as 10a a n d l l a in new genetic groups. F u r t h e r , full-length sequences o f two H C V g e n o m e s o f g e n o t y p e 10a o r l l a were d e t e r m i n e d a n d were c o m p a r e d with the c o m p l e t e genomes o f 22 H C V isolates o f v a r i o u s genotypes.

Methods Sera containing HCV. Sera from 126 hepatitis patients in Jakarta, Indonesia, testing positive for HCV RNA, were genotyped by PCR with type-specific primers deduced from the HCV core gene (Okamoto et al., 1992b, 1993, 1994). Genotype lc was determined by PCR with primers specificfor this genotype deduced from the core gene (Okamoto et al., 1994). Amplification by PCR and sequence analysis. Nucleic acids were extracted from 50100 ~1 of serum, and reverse-transcribed to cDNA with HCV-specific 20-mer antisense primers [#299 (nt 250-269), #122 (nt 828-847), a mixture of #127, #204 and #337 (nt 1587-1606), and #316 (nt 8608 8627)] or a non-specific 43-mer primer (# 165) with (A)I7 as described previously (Okamoto et al., 1992a, 1993; Tokita et al., 1994a, b). Nucleotides were numbered from the putative 5" end of the HCV genome of genotype II/lb [HC-J4/83 (accession no. D01217)]. Amplification and sequencing of the Y-terminal 1.6 kb (containing the 5'UTR, C gene, E1 gene and part of the E2/NS1 gene) and the 31terminal 1-1 kb (containing parts of the NS5b region and the 3'UTR) were performed by methods described elsewhere (Okamoto et al., 1993 ; Tokita et al., 1994a, b). The consensus sequence was determined from three independent clones. We also sequenced HCV isolates in group 4 (HEMA51) and group 5 (FR741). HEMA51 was 89-91% similar in the NS5b 222bp sequence to the genotype 4a isolates (EG-I3 and EG-19) reported by Simmonds et al. (1993) but only 75 % similar in the entire E1 gene sequence to the genotype 4a isolate (Z4) documented by Bnkh et al. (1993); therefore, it was deduced to be of genotype 4a (Simmonds et al., 1993). FR741 was 95-97 % similar in the NS5b 222 bp sequence when compared with the genotype 5a isolates (SA30, SA156, SA183 and

34REV) reported by Simmonds et al. (1993) and 89 91% similar in the entire E1 gene sequence to the genotype 5a isolates (SA1, SA4-7 and SA13) reported by Bukh et al. (1993). Hence, it was of genotype 5a. Construction of the phylogenetic tree. A phylogenetic tree for HCV was constructed by the unweighted pair-group method with the arithmetic mean of Nei (1987) using a molecular evolutionary analysis system for DNA and amino acid sequences (treeupg of the ODEN programs version 1.1.1, National Institute of Genetics, Mishima, Japan). Determination of the full-length sequences of two HCV isolates in novel genetic groups. The entire genomic sequences of isolate JK049 of genotype 10a and isolate JK046 of genotype 1la were determined. The nucleotide sequences of the Y-terminal 1.6 kb and the Y-terminal 1.1 kb were determined as described above. The central portion was divided into four domains, nt 1469 1871 (403bp), nt 1811-4958 (3148 bp), nt490~5489 (590bp) and nt 5439-8333 (2895 bp) for JK049, and nt 14421867 (421bp), nt 1803 5233 (3431bp), nt 5160 5485 (326 bp) and nt 5427-8346 (2920 bp) for JK046; nt were numbered in accordance with sequences specific to genotype 10a or I la. These sequences were amplified by PCR with universal primers deduced from the conserved areas among HCV genomes of known fulllength sequence or primers specific for JK049 or JK046. PCR was performed by the method described above or by means of long-distance PCR with Taq Extender PCR additive (Stratagene) and Takara Ex Taq DNA polymerase (Takara Biochemicals). Long-distance PCR was performed for 30 cycles with each cycle consisting of denaturation at 94 °C for 30 s, primer annealing at 55 °C for 30 s, and primer extension at 72 °C for 6 rain.

Results Classification o f H C V isolates o f u n k n o w n genotype f r o m Jakarta, Indonesia by pair-wise comparison o f a 1093 bp N S 5 b sequence

O f 126 sera f r o m h e p a t i t i s p a t i e n t s in J a k a r t a , H C V g e n o t y p e I I / l b was detected in 55 ( 4 4 % ) , l c in 15 (12 %), I I I / 2 a in 33 (26 % ) a n d V / 3 a in 1 ( 1 % ) . H C V isolates o f unclassifiable g e n o t y p e in the r e m a i n i n g 22 ( 1 7 % ) sera were sequenced p a r t i a l l y . T h e y were subjected to pair-wise c o m p a r i s o n , a l o n g with 62 H C V isolates o f 25 g e n o t y p e s in nine genetic groups, within a 1093 b p (nt 8279 9371) sequence in the N S 5 b region (Table 1). W e have r e p o r t e d t h a t H C V isolates can be classified by the criteria o f isolate similarity o f 91.7 99.2%, g e n o t y p e similarity o f 80.2 8 8 ' 6 % a n d g r o u p similarity o f 66.5-80-0% ( T o k i t a et al., 1994b, 1995). The 22 J a k a r t a isolates o f u n k n o w n g e n o t y p e showed a sequence similarity o f ~< 87.6 % to the r e p o r t e d isolates o f 25 genotypes, i n d i c a t i n g t h a t they w o u l d b e l o n g to distinct genotypes. T h e y were classified into two novel g e n o t y p e s in g r o u p 2 (2e a n d 2f) a n d two novel g e n o t y p e s in new genetic g r o u p s (10a a n d 1 l a ) for the r e a s o n s detailed below. Seven o f the J a k a r t a isolates (JK020, JK025, JK109, J K l l 7 , JK128, JK143 a n d JK151) differed f r o m one a n o t h e r b y < 7"7 % a n d were classified as g e n o t y p e 2e, while two o f the isolates (JK081 a n d JK139), differing

Hepatitis C virus genotypes

295

Table 1. Comparison of genotypes 2e, 2f, lOa and l l a from Jakarta, Indonesia

with reported genotypes within a 1093 bp (nt 8279-9371) NS5b sequence Jakarta HCV isolates of novel genotypes (2e and 2f) and in new genetic groups (10a and 1la) were compared with HCV isolates of various genotypes in nine genetic groups for which the 1093 bp sequence (nt 827%9371) is available. Sequence similarity (%) to new genotypes of HCV from Jakarta Groups/genotypes

n

2e (n = 7)

Group 1 (I/la, II/lb, lc) Group 2 III/2a IV/2b 2c Group 3 V/3a 3b 3c 3d 3e 3f Group 4 Group 5 5a Group 6 (6a, 6b) Group 7 7a 7b 7c 7d Group 8 (Sa, 8b) Group 9 (9a, 9b, 9c)

18

2f (n = 2)

10a (n = 10)

1la (n = 3)

66-2-68.9 67.4-69.5

68.0-72.0

70.4-73.2

2 2 1 5 4 1 1 1 1 1 1 5

83.5-85.9 82.2-83-3 86.2-87.5 66.2-67.8 67.7-69-2 66.4-67.7 66-8-68.6 67.2-68-3 68.6-69.2 68-4-69.7 69.2-70.0 67.8-69.7

84.7-85.6 82.9-83.8 86.9-87.6 66.8-68.2 67-0-68.4 66.5-67.2 68.1-68.6 67.8-68.0 69.2-69.3 69.0-69-6 70-9-71.4 68.3-71.0

69.3 7 0 . 4 69.1-71.5 69.5-70.5 78.0-81.0 78-4-81.3 80-1-82.2 80.6-82.9 77.4-79.9 80.6-82.0 73.9 7 5 . 0 71.3 72.1 71"0-73.5

68-2-69.3 69.3 70-4 68.3 69.0 71.4-72.4 71-3-72-9 71.7-73.1 72.6-73.0 71.3 72-1 71-8 72.1 73.4-73.6 70.1 70.8 75.9-79.1

4 1 4 1 4

66.4-68.9 66.5-68.4 65.9-68.6 67.9-69.6 67.2-69.0

67.2-68.8 67-6-68.1 66.7-67.6 67.7-68.5 68.2-68.8

694-71.5 70-4-71.8 68.9 7 1 . 2 70.5-71.5 69.8 7 1 . 6

78.8-80.1 78-7-79.2 78.4-80.7 80.~81.0 76.3-78.0

5

66.3-69"2 67.4-69.4

68"9-71'6

74.6-76.9

f r o m each other by 6' 1%, were classified as genotype 2f. These two new genotypes had a similarity o f only 83.4-86.1%, and were distinct f r o m genotypes in genetic g r o u p 2, such as I I I / 2 a , I V / 2 b and 2c, based on the above criteria. Genotypes 2e and 2f also differed f r o m genotype 2d (Stuyver et al., 1994), for which the 1093 bp sequence is not available, based on sequence similarity o f only 73.2-78.6 % in a 384 bp stretch o f the E1 gene and 82.0-85.6 % in a 334 bp stretch o f NS5b. G e n o t y p e 10a, represented by ten J a k a r t a isolates, resembled genotypes in g r o u p 3 with a sequence similarity o f 77-4-82.9 %, which is on the b o u n d a r y o f g r o u p and genotype similarity. The 10 isolates differed f r o m the other groups, however, with a similarity o f < 7 5 % . Likewise genotype l la, represented by three J a k a r t a isolates, resembled g r o u p 7 with a similarity of 78.4-81-0%; it differed f r o m all the others with a similarity o f < 79-1%. In order to classify the 22 J a k a r t a isolates o f novel genotype by another method, they were subjected to phylogenetic analysis, within 1093 bp (nt 8279 9371) in the N S 5 b region, along with 62 H C V isolates o f 25 genotypes in nine genetic groups (Fig. 1). The phylogenetic tree supported the classification o f nine J a k a r t a isolates into genotypes 2e and 2f. Ten J a k a r t a isolates (JK012, JK030, JK049, JK055, JK070, JK072, JK092,

JK093, J K l l l and J K l 1 4 ) made a g r o u p which bifurcated f r o m the same b r a n c h as the genetic g r o u p 3. They were separated f r o m g r o u p 3 at an evolutionary distance o f 0.236, and therefore were classified into a putative new genetic g r o u p designated g r o u p 10. The ten J a k a r t a isolates, with a distance o f < 0.068 f r o m each other, were o f the same genotype which was provisionally named 10a. The remaining three J a k a r t a isolates (JK046, JK065 and JK148) were considered to f o r m another new genetic g r o u p tentatively designated 11. It was on the same b r a n c h as g r o u p 7, but separated by a distance o f 0.239. Since this distance was m u c h closer to the value o f 0.246, which separates g r o u p 8 f r o m g r o u p 9, than the range 0.191-0.214 which separates genotypes o f g r o u p 7 (7a 7d), it would constitute a new genetic group.

Determination of the entire genomic sequences of Jakarta isolates of genotypes lOa and l l a In order to confirm the new genetic groups 10 and 11, isolate JK049 o f genotype 10a and isolate JK046 o f genotype l l a were sequenced in full (accession nos D63821 and D63822, respectively). JK049 had a genomic length o f 9433 bp [poly(T) tail excluded] with a 5 ' U T R and a 3 ' U T R o f 339 and 37 bp, respectively. Its open

296

H. Tokita and others

,/ 5

10

10a

/

1 la

[-1 t

8

[

9

I ~

2f I

I/la (3) lc (3) II/lb (12) 5a(l) V/3a (5) 3c (1) 3d(1) 3e(l) 3b (4) 3f(1) JK012 JK092 JK070 JKll4 JK030 JK093 JK049 JK072 JK111 JK055 4(1) 6a (4) 6b (1) 7a (4) 7d (1) 7b (1) 7c (4) JK046 JK148 J K065 8a(3) 8b(l) 9a (2) 9b (2) 9c (1) III/2a (2) 2c (1) JK081 JKI39 JK020

F----JKo25 I~---Jg143 ('~l L----JK15 l [[' JKll7 I i JKI09 --JK128 IV/2b (2)

2 2e

i

0.4

I

i

I

0.3 0.2 0.1 No. ofnucleotide substitutionsper site

i

0

Fig. 1. Phylogenetic tree of HCV constructed on an NS5b sequence of 1093 bp (nt 8279-9371). Sequences of 62 HCV isolates of 25 genotypes are compared along with 22 Jakarta isolates of novel genotypes 2e, 2f, 10a and lla. Figures in parentheses are numbers of HCV isolates reported for each genotype. One isolate of group 4 (HEMA51) is deduced to be of genotype 4a of Simmonds et al. (1993).

reading frame coded for a 3019 amino acid (aa) precursor which could be C, El, NS2, NS3, NS4a, NS4b and NS5b proteins of sizes identical to those in other genotypes, except for E2/NS1 with 431 aa and NS5a with 451 aa. The genomic length of JK046 was 9439 bp [poly(T) tail excluded] with a 5 ' U T R and a 3 ' U T R of 338 and 35 bp, respectively. The open reading frame coded for a 3022 aa precursor with E2/NS1 of 430 aa and NS5a of 455 aa, which, as in JK049, differed in size from those previously reported. The entire genomic sequences of JK049 (genotype 10a) and JK046 (genotype lla), as well as their various subgenomic areas, were subjected to pair-wise comparison with the 22 HCV genomes for which the fulllength sequences are known (Fig. 2). Within the entire genomic sequence, group similarity spanned 65.0-75"0 %, genotype similarity 76-9-80'0% and isolate similarity 90-1-99.1%, which were clearly separated without overlapping. Such a clear separation was seen only in the comparison of NS3 and NS5b regions among various subgenomic areas. JK049 showed a similarity of 74-4~75-0 % to genomes of HCV isolates of genotype V / 3 a (NZL1 and K3a/650) or 3b (HCV-Tr) within the entire genomic sequence, which justified its classification into a new genetic group separate from group 3, as deduced from phylogenetic analysis of the 1093 bp NS5b region. Likewise, JK046 was only 66.2-70.0 % similar to reported H C V genomes in the entire genomic sequence, and therefore was in a novel genetic group. JK049 was 67.5 % similar to JK046, thereby indicating that these two isolates belong to distinct genetic groups.

Comparison of 5'-terminal 1"6 kb and 3'-terminal 1"1 kb stretches among Jakarta isolates of four novel genotypes Four Jakarta isolates (JK020, JK025, JK109 and JK128) were selected randomly from the seven of genotype 2e, as were four (JK030, JK055, JK070 and JK072) from the ten of genotype 10a. Along with both Jakarta isolates of genotypes 2f (JK081 and JK139) and l la (JK065 and JK148), they were sequenced within 5'-terminal 1.6 kb and 3'-terminal 1.1 kb stretches which covered approximately 30 % of the entire genome. They were compared along with JK049 and JK046 for which the entire genomic sequences had been determined. Jakarta isolates of the same genotype (2e, 2f, 10a or l la) showed a similarity in nucleotide sequence of > 91-8 % and in amino acid sequence of > 91.9 %. As in the other genotypes, the 5 ' U T R sequence was most conserved with a similarity > 99-1%, while the 3 ' U T R sequence was most divergent. The hypervariable region 1 in E2/NS1 (Hijikata et al., 1991 ; O k a m o t o et al., 1992a) spanned 72-84 bp; it differed by 27-54 % in nucleotide

Hepatitis C virus genotypes

Group •

Genotype [ ] 65.0

Jakarta isolates were characterized further in distinct regions of the genome in comparison with reported genotypes.

Isolate [ ] 75.0

I90.1- - 199.1

Genome 76.9 80.0

99.4 5' UTR 87.3

96.5 98.2

[--

84.3

76.8 C

81.7 54.5

91.1 93.5 99.7

69.6

F - - I98.4 88.9

El 69.3 76.9 71.5

60.5

I84.6

E2/NS 1

712 76.1 70.1

53.8

I

98.5

[

NS2

84.8

66.5 75.3 66.8 74.5

99.5

[--1

NS3 75.9 80. I 78.4

57.4

89.5 88.9

75.9 74.8

62.5

100.0

85.2

[

NS4b 74.7 79.6

89.1

70.2 76.7 67.9 79.6

89.1

99.9

70.4

NS5a

80.0 83.4 19.4

73.9

99.2

I--1 92.0 99.0

NS5b 87.8

3' UTR 52.5

//,

so

100.0

72.5

6'0

7'o

8.0

9'o

5'UTR. Of four Jakarta isolates of genotype 2e sequenced, three (JK025, JK109 and JK128) had an insertion of A after nt 14 of the 5'UTR, resulting in a length of 342 bp. By contrast, both isolates of genotype 2f lacked nt 6, resulting in a 5'UTR of 340 bp. All five isolates ofgenotype 10a had a deletion of 2 bp (nt 13 and 14) which shortened the length of the 5'UTR to 339 bp, and all three of genotype 1 la were 338 bp long due to the absence ofnt 6, 15 and 35 as seen in isolates of genotypes 7a-7c, 8a, 8b and 9a-9c. All six Jakarta isolates of genotype 2e or 2fhad A as nt 210 and T as nt 214, which were shared by every reported isolate in group 2 but not by any genotypes in the other groups.

99.0

NS4a

57.5

297

lOO

Percentage sequence similarity Fig. 2. Pair-wise comparison of the 24 H C V genomes of various genotypes over the nucleotide sequence of the entire genome and subgenomic regions. Ranges of group, genotype and isolate similarity are shown for the entire genome and for various subgenomic regions. The comparison was made among the 22 HCV genomes for which the full-length sequences have been reported, and for isolates JK049 of genotype 10a a n d J K 0 4 6 of genotype 11 a. The 22 genomes with known full-length sequences are classified in three genetic groups and seven different genotypes and are deposited under the following designations and accession numbers: M62321 (HCV-1), M67463 ( H C V - H ) and D10749 ( H C - J I ) of genotype I / l a ; D90208 (HCV-J), M58335 (HCVBK), M84754 (HCV-T), D01217 ( H C - J 4 / 8 3 ) , D10750 ( H C - J 4 / 9 1 ) ,

D01171 (HCV-JT), D01172 ( H C V - J T ' ) , X61596 ( H C V - J K 1 ) , D10934 (HC-C2), $62220 ( H C V - N ) , L02836 ( H C V - H B ) , M96362 ( H C V - L 1 ) and U01214 ( H C V - L 2 ) o f I I / l b ; D14853 ( H C - G 9 ) of lc; D00944 (HCJ6) o f I I I / 2 a ; D01221 (HC-JS) o f I V / 2 b ; D17763 ( N Z L 1 ) a n d D28917 ( K 3 a / 6 5 0 ) of V / 3 a ; and D26556 (HCV-Tr) of 3b.

and 40-79% in amino acid sequences even among Jakarta isolates of the same genotype. JK 148 of genotype 1la had an in-phase insertion of 9 bp, while the other two isolates of genotype l l a (JK046 and JK065) and three of the four isolates of genotype 2e (JK020, JK109 and JK128) had an in-phase deletion of 3 bp at the start of the E2/NS1 protein. The four novel genotypes of

Amino acid sequence of the E1 protein. Fig. 3 compares the entire amino acid sequence of the E1 protein of HCV isolates of reported genotypes and the 14 Jakarta isolates of four novel genotypes. All Jakarta isolates of genotypes 2e, 2f and 10a possessed the five potential N-glycosylation sites, Asn at amino acid positions 196, 209, 234, 305 and 325, as reported for isolates of genotypes I/la, lc, III/2a, V/3a, 3b, 3d-3f, 4aMd and 5a. All three Jakarta isolates of genotype l l a had another potential Nglycosylation site, Asn-250, as reported for isolates of genotypes II/lb, 6a, 7a-7d, 8a, 8b and 9a-9c. All 14 Jakarta isolates of novel genotype conserved the eight Cys residues at positions 207, 226, 229, 238,272, 281,304 and 306 as did the other isolates of all other genotypes. In addition, all five Jakarta isolates of genotype 10a had Cys-376, which is not found in any isolates of group 3. Jakarta isolates of genotypes 2e and 2f highlighted amino acids specific for group 2. These were Trp-GlnLeu at positions 214-216, as well as Val-220, Glu-230, Asn-245, Leu-254, Met-285 and Phe-300. His-336, which is characteristic of group 3, was not found in isolates of any of the other groups; it was not shared by Jakarta isolates of genotype 10a, either. Gly-290 and Arg/His295 and Cys-376 were specific for genotype 10a, as was Gln-260 for l la. These amino acids reinforced the distinctness of 10a and 1 la from genotypes in groups 3 and 7, respectively, although they were close in the phylogenetic tree (Fig. 1) and similar in sequence (Table 1). 3'UTR. Fig. 4 compares the sequence of the 3'UTR [poly(T) tail excluded] among HCV isolates of known genotype including the four novel Jakarta genotypes. All 22 Jakarta isolates had a stop codon at nucleotide positions 9372 9374: TAG for genotypes 2e, 2f and 10a and TAA for genotype l la. A second in-phase stop

298

H. Tokita and others

192

IC

HCV

-1

*

YQV~

NS~ GLYHVTNDCI

IVYEAADAI

HC-J4/83

-E--

-V~ -I .......

.......

e*



...........

~-s~

HC-G9

VGAE-I

-I~ TG-M

HC-J8

VE--

- I -~ S S - Y A

2c

FR866

VE--

-T~ DS-MA

2d

N~92

L- -~ -T~ SS-M

.....

I. . . . .

JK020

A---

-T~ TS-M

. . . . .

!-N- - - ~ E - - ~ - - I

......

~LR-

,fK025

A--~

-T- HT-M

.....

- N - - - ~ K - ~ - - V

......

~LRE-K.

IV/2b

2e

JKI09

2f

V/3a

- --~ -V ........



AE--

-A- V---TENL-M-L

.....

....

-T- D--I

.....

_-NDI - ~ Q A - ~

....

- T ~ T $

.....

~ R -

!-N- - - ~ E

-L ......

~

H

-~ .........

~%~ ~EK

- ~-

......

SLS

-

-IQV-

~

....

PVA

. .-.I. ...

ISV--%--ISQPG~KG--T---TI-~

- I - - ~

.......

I - - ~ E

IPV-S~I

NZLI

LEW-

- T -~ - - - V L

....

.....

NEI37

LEY-

- T -~ - - - V L . . . .

-Q ......

DEV---L

.......

ATA-Q~

3--TPVS

3c

NE048

LEY-

-Vfi - - - I L

.....

DHV---L

.......

QN~

-~.

P--IPV

-G ......

PEV---L

.......

QS-

-S-

.....

E-V---M

.......

QN--T-

-V~ ---VL

. . . .

$

....

.~

-T~ ---IL

....

10a

.....

.....

DNV-

LEY-

- A -c - - - M

LEY-

- A -~ - - - T . . . . .

-G ......

G-V---L---I

JK055

LEY-

-A~

--G

JK070

LEY-

-I -M ..... ..... .....

LEY-

4a

Z4

EHY-

- A -~ - I - - I

4b

Z1

VHY-

-Aft - V . . . . . . . .

4c

Z6

VNY-

-A~ -V . . . . . . . .

4d

DKI3

-NY-

---~ -V

5a

SAI

VPY-

- A -~ - V -

6a

VN506

6b

Th580

L T Y C - - - ~ . . . . . .L. . . LTYC- --S .... L .....

7a

VN540

.....

c

.....

G-I---L

....

Q ....

.....

DSL-

VHy~

-KE ....

7b

VN235

VHY~

-KS -I--L

7c

Th271

VHY~

-R$ DI -QL .....

.......

7d

Th846

VNY~

-KS -I--L

.....

........

L ............. .............

PTI-M-F

.......

E-F-M-L

......

TDSM---L D-I-M-L

~--TPIS TPVS ISIS

-T ....

TPV

T!

-T .....

V-

-Q ......

....

VSHPGAATAS

- -T -V- MM - -A ........

AI ....

~

VAHPGA-LESF APYPNA-

L .....

AQHLNA-

K%q

QI -- - LSAPTFGAVTAP

-R- T - - L S A S - - L - % U

IKS-

NA~

.......

G ....... G .......

I-T-

[QQI - - - T P A - - - L - I I

N S ~.V - - S G F ~

....

M--A-A

.... M-L

.....

G .......

E--TPA---L-II

NS~ V-VSGF~

....

M--A-A

....

M .......

G .......

....

M-A

G .......

-AS -I--L

.....

........

ETI---L

-T .....

VN405

VHY-

-IS -I--L

.....

....

9a

VN004

I---

-AS -I--L

....

9b

Th555

IHA-

-VS -V--L

9c

Th553

IH--

-ASS - I - - L

HCV-I

FTFSPRRHWTTQG(

HC-J4/83

TSA---L-I-'

.....

A-AF---M-I

N S ~.V - V S G F ~ R - - - M - - A - A

......

T-T-

.........

KT-

-N- --F--ETI---L

......

IKV-

-G ....

LSVS--L-VI

NS~

V-IHGF---V

. . . . .

........

DNV---L

.......

QV-

-T ....

YSV---L-AS

NA$

V-VHGF---V--I--A-AF---M-I

. . . . .

-A__£ . . . . .

DTM---L

........

, --

HC-G9

........ E-V-DI-...... H ..... D- - -

III/2a

HC-J6

-~V--QH-

-FV-D-

- -

IV/2b

HC-J8

-~--o-~-~~--QH-~V-N-

--

I--~ . . . . . .

.....

l--~--G---v

v ...........

FR866

2d

NE92

-~I--0H-~-D . . . . . . . . . . . . . . .

2e

JK020

- I V - - T H - ~ - D

JK025

J'KI28

-iV--~-~v-D -%V--~-~V-D---~V--~a-e~v-D-

JK081

-~I--

M

RL . . . . . . . . . . .

........

L. . . . . . . . . . .

.....

-- -

S ....

NZLI

---R

Q-V-~-

- -

L ....

3b

NEI37

........

A-V-~-

--

......

3c

NE048

- - -R

Q-V-~-

- -

L ....

3d

NE274

- - -R-

- -

L .....

vs ........

w - - v

..........

LTMIL-YAA-V-ELV-EI-F

*

......

MF-L

-G .... I-TEG

.....

G ......

I

.....

G ......

M

*

L--Y

..........

L--Y

.........

.....

Q-~

I-A

.......

VF-L

.....

Q-A~---

....

MF-L

.....

QJA~---

....

MF-L--

- - ~ .... ....... IAI-~ .... I-I---~--E-

.... I-I---~S

MF-L--

....

- -V-EVL-EI-G-M

--ATMI--Y-M-V-EVV-EI

-ALGM-VS~I -AVG-

--A--T-A-IV

--

L ..................

-AAG-AL

---RA-Q-y-v-~-

--

L ........

- -A---V-~

-S~-Q---V-D-

--

.....

L .............

-AAT-

-VS-V--L--T-

JK049

-SWR~-Q-

--

.....

L .............

-AMT-

IVS-V--L-

JK055

....

.....

L .............

-AVTM-VS-V--L-

~070

-s~-e---V-D -s~-0---v-~

....

.....

L .............

-AVTM-VS-V-

-L--T-F-LV

JK072

-SWS~-QL--V-E

....

.....

L .............

-AVTM-VS-V-

-L--T---LVI

4a

Z4

I - -R ........

E ....

--T .................

---,-,,--,,-,-,-,---,--,

4b

Zl

-D-R

........

D ....

.....

- -S--I-

4C

Z6

-S-Q

........

D ....

--A .................

-- -T-LL-

4d

DKI3

---Q

........

D

....

--T .................

- -AT-

-,---,---,-,-,,--,---,-,

.....

L- --M .....

-N---V--T-F ......

--I ....

--LVI

M--V-

-Y-

-Q ......

-Y--Q

SILG-LLT-G

-S-N

- -WQ--V-H-V-D

7d

Th846

-AIR--Q-E-V-E

....

lla

JK046

---R--I-Q-V-D

....

- -T-

..............

--ATF

JK065

- --R--I-Q-V-D

....

--T--V

..............

--ASF-VSSA-

JKI48

- - -R- -M-Q-V-D

....

- -T

- -V

..............

--VT-

8a

VN507

- --R--Y-TIV-E

....

--T

--V

.............

--I-YA-SS-

8b

VN405

- --R--Q-T-V-E

....

--T ................

--VTFITSS---V--LL-EIALEG

9a

VN004

- - -R-KK-QV-

-D ....

--A ................

- -VSY-VSSA--V--LL-EV-T

9b

Th555

- -LR-KI

....

--T ................

- -VTY-VSSA-

9c

Th553

-A-R-

....

--T--V

- -T-K

--V .................

--AT--LSYVM

7-V ..................

- -AG-

.......

VVI

.....

- -AT - IVSYVM-V--L-

I- ILV-

....

- - -T-LLSYAM-T-VSSA-

IFV-G

-L-I-ILV-G -A--VLF-

- -A-YA-SSV

--G

--V--LL-EIFLEG

-T- -LL.....

.......

........

-IFL

VL-EIFL

....

FL-

-~--S

....

VL-

-F .......

IL--F

.....

QA .....

IL--F

.......

QA .....

IL--F

.......

VL--F

......

LL-Y-

-A ...... ..... --A ......

-Y-TAA

GALL-Y-

.....

-AA .....

-Y--AA

....

IGAVL-Y--

....

IGALL-Y-

.......

GALL

.......

GAVV-YC

.......

GAVV-

-L- -IA-

- -A .....

L-Y--

--- IMGALL

.....

- -L---A-

G-S--

GALM-Y---A

I F - -G - - - I IGALL

-A--VLF-IF

-VSSV--A-EVLFNIF

FL--~--S

......

-G-A-

L-Y-

-I .... .....

....

-S ....

Q ......

--ASAA

IA ....

G ........

V-M-IFT-G--

IVSYA--V-EL-M-

FAA-

I- LAV--

.....

-~--S

-QS .....

.....

.......

F-G---

-LSSI--V-EIV-EVF--G

L-F-V-L

-I- - -G .....

--V-EICASV-

L .....

.......

-VM---STLV-LL--G

V-IR-

....

......

FL-

....

M--V-

VN540

. . . .

......

....

.......

-Y- -Q ......

I- I-VM GI--I--S

FL--'~--S

Th271

....

......

....

-Q ......

VN235

-L-Q-V-E

....

M- -V--Y-

7c

-- -T-

-VM-S

FIIIVM-S

......

7b

---T--LSSI

FII

.....

-T---LVI

7a

..............

....

.....

FL--~-

---Q

..............

-S ....

-IM-S

-Q ......

M- -L- - -A-Q

- -~ .....

GI-

-Q ......

A .....

I-T--V

-Y-LQS

I-I-

AI IMVM

M--V--Y-

SA1

--T--V

....

-G-

M--V-

Th580

--S .................

Q-~

Q ......

-~S-

- - -T---E-

......

VNS06

....

I -I-V-I

......

6b

....

....

.... ....

.....

-TMF-LVI

6a

....

- --~S

I-I---~

....

-L--Y--QT

M- -L- -Y-

--V ......

IAI

Q-~

-Q ......

.....

M--LIM-

-VL ....... TVF-

5a

I-DI--D

M--L

.......

--

....

I -- -L- -Y-

I .......

NEI25

.....

MF-LV

...... ........

-V-'~--L--T-F-

- -A-Q-~

MF-L

........

-L--TLF-IM

-AIG-AVS~-M-L--TFF-L

....

-A-Q-~

MF-L-

........

- -VTMIV-Y-V-V-EVV-EI -AVGM-V-:RV-

....

-A-Q-~

G

V ...........

- - -V-I

Mr-L---A-0-~

-Y-M-V-EVLFEIMG-M

- -ATRA--Y-

383 ,

*

....

Y -M-V-EVLFEI-G-M

NEI45

..............

A-AF---M-I

....

S . . . . . . . . .

-V

.............

- -MTMA-

VS ............

...............

.....

..........

I- I-G

JK030

- -V-R

A-AF---M-I

- - A T ~ - - Y - M - V - E V L F E I - G - M

S ............

...............

.....

--

LS ............

VS .............

G .......

v .............

3f

--V-D-

G .......

I .....

--ATMAV-

S ............

H .......

I .....

AA .....

LDMIAGAHWGVLAGIAYFSMVGNWAKVLWLLLFAGVDA

10a

T-V-~-

A ......

-~

..... L ............. ...... p ............ ......

.......

350 I

PQAI

.....

-S -V .......

N-__~ ~ V-IHGF---V

A---TMLL-Y-V---EV---

L .............

-~T-~.-~V-D---Q-N-V-~-

L

NA.' V - I R G F

*

i- - q - - A T M I L - Y A M - V - E V -

Q ..........

N A ~.V - V R G F - S - V

-L-V@

-K~ Q - - L P V S - - L - V ~

*

..... LS ..........

2c

-Y-QPV-D-

A/

I T G H R M A W D ~ T T A L V M A Q L L R I

.... T ............ i --Q .............

. . . . . . .

SVA--L-II

-K- Q - - - P V A -

M

GI ......

.....

MT-

PAS--L-II

I-VRGF---V--M--A-AF---M

.....

8b

*

...... L--A--A---A---

NA~ A-L-GF-K-V--M--A-AF---M-I

VHY-

[Y P G H

A--A

NA~ V-IQGF-K-V

VN507

-T-V-D

.........

M

.......

-N- K - - L P A - - - M - V I

8a

-V-M-V-E

---AV-Y-A-G-A

GA .....

KS-

-Q ....

e

.....

I--V--G

.......

I-T-

VV-D

G .......

-M- - -F-I

-V ........... ........

NA~' T - L - G F - K - V - - M - - , - , , - - - , - , - - , - - , , - - , - - V

-G- S - - L P A - L - I - V ]

......

.....

-A-

-Q .....

......

V-D

-M-

V--M--AV--M--G

MT-

.......

-Q-T-V-D

- -V-

.... ....

-M- -M

.......

EAI---L

......

LES

......

ETL---L

--YR-

GA-

.....

ETI---L

....

-A ..............

NA: V--SGF-K-V---A-A-VV--SM-I

........

-R

-M-

LESM-

VSYIGA-LDS

.....

I .....

- - -V-

NA: T---GF---V---A-A-VV--S--I

........

3e

~.~

HPVS--L-V1

--- M ....

....

~

AL ....

-H- T - - H - V S A - L - V |

.....

....

AL ....

-A ..............

-I ....

.....

V/3a

-A ..............

- -T -V-MM-

A-

-KS -I--L

JK139

- -T -V-MM-

A/

VNY~

2f

VSRPGAATAS VSRPGAVTAS

PL .....

-V .....

,

.... ....

.....

K- T---SL

JKI48

i

A ~ ~ J

.....

300

A ...... AL ....

-KS -I--L

JKI09

-A ..............

AL ....

VNY~

ic

-M-

A A

-A ..............

VNY~ -KS -I--L

I/la

I - - -V-

...... A ......

-A ..............

JK046

I---DNI-M

M--A

....................

A

-V-MM-

JK065

II/Ib

A ...........

A

A ......

-V-MM-

QI

.... .......

......

A ....... A

--T

S- -T ....

- -A ........

--L--DAM---L---L

T ..............

VSHPGAATAS--T

...........

R----L--EAM---LA

M--A

VSRPGAATAS

MT-

....

.....

I- S -V .....

VKHPGAVTAS

........

....

DHH-

........

A--M

..............

V-YAGATTASV-G-V

.....

V--A~IAS-A

.....

....

T-I---L

EHQ---L---L

VKYAGATTAS

V- -A-~IAA-A

-M .... V--A~IAS-A F ........

K--TSVS

.....

TDYH---L

I- -V--A~IAA-A

K--TPVS

.....

TEHH-M-L

TPV

.........

VKHLGATTASI-S-V-M---A

....

A-~-AS-V

V--A~IAA-A I--V--A~IAA-A

-T --- IV-~---

VSHVGATTASI-G-V

.....

V- -A~ILS-A V - -A~IAA-V

I .....

.........

-II-~---F--

I

S--T.

S - -T ....

....

-A--

I .....

LN--. A--T.

........

....

-A ........

--F .....

- -TI -~ ............

V-YVGATTASI-S-V

.....

~-- IPVA

G- I - - - L ........

- -L .......

.....

~--TPA

GEI---L

.....

-- . . . . .

T ....

........

r--TPV

.....

-T .....

........

QD--T'

- -M ......

JK030

J'K072



...........

JK049

-V~ ---I ____~C_ _ _ M

lla

D-V

-VNQPG~RG-

--- IPV--~I-VSRPG~TEG-

S--

G~-AA-M

............

- -A-

3b

-A~ ---ML

-~-

-~ ............ I~-

~--IPV--~--VSQRG~KG--A---TI-~

-VSQPG~KR

........

--TI

- -A---TI-~

--~EG~--V---

IEY-

*

...............

- -VI

-~--VSQPG~TRG

--~EG-~--V--

LEY-

- -A-

-~-

. . . . .

LE--

- -T-V-MI

-VSQPG~TKG--T-

IPV-

-Rfi - S - M

NE274

-VKHRG~RS -IK Q P G ~ K G

TPV-

V--I

NEI25

-~-~

-T ....

JKI39

NEI45

- -MV-~

-T ....

. . . . .

3f

*e

............

A-AF---H

I--A-AF---M

- -T-

IPVS-%I

-K ....

A--SRV-VSEV--RV-S -VQQPGA~TQG

~LQ-

-S-M

3e

....

- - -IPVS-~-

~LR-

.....

-R~

3d



LVGQL

-V ......

- T .~ T S - M

V--I

....

*

.....

-~- -V ......

l---

JK081

- - ~ K -

....

~V-~T~

-~--V

JKI28

.~

.

,

L---L-A~V-T-TI---V

.....

Y---~--~

- -V ......

- ~-

I ~ E G

-

*

LHTPGCVPCVRE~RCWVAMTPTVATRDGKLPATQLRRHIDLLVGSATLCSALYVGDLCGSVF --.

HC-J6

III/2a

250

200 *

I / la II/ib

..... -A ......

- -A .....

.....

A .....

- -A ..... YC--A

.....

EG

G

-F .....

E-

-F .....

E-

IGI ....... CI-F IA--I GI-F

E-

....... .....

EG

.......

I I .......... II--

-M ......

II--

-M ......

A--V

......

G

FA .......... IA--F

....

A- -

I .... M ...... IA ..........

Fig. 3. Amino acid sequences of the entire E1 protein. Sequences of representative HCV isolates of 29 genotypes are compared along with 14 Jakarta isolates of novel genotypes 2e, 2f, 10a and 1la. Potential N-glycosylation sites are boxed, and Cys residues are marked by dots. Amino acid sequences characteristic of groups 2, 3, 10 and 11 are shaded.

Hepatitis C virus genotypes

I/la

II/lb ic

i

1o

I

I

20 .........

30

I...

40

~ A G G T T G G G G ~ A A ~ G G C C T C

HC-J4/83

T G A - C - G G - A - CTAACi~i~!::~!~!~i~A - G - C A A . . . . . . . . CCCG TAG - C- G - T - - - C - - C ~ A - G- CT ....... C- G - - T A A ~ A G G C C T T T A G G C C C C G

HC-G9

TAG

- - CGGCACAC

-

A

~

A

-AGCTAAC-

TAG

- - CGGCAAACCCTAG

--

A

~

A

-AGCTAGT-

2c

FR866

TAG

- - CGGCACAC

-- A ~ T A G C T A A C -

G

2e

,.TK020

TAG-

- A ~ A - A G C T A A C -

G

JK025

TAG

JKI09

TAG-

JKII7 JKI28

2f

- TTAG-

TTAGGCCATTTTCTGTG

HC-J6

IV/2b

/ TTAG

- CGGCACACCTTAG-

-CCG

-

i

~

A

-AGCTAAT-

-CGGCACACCTTAG-

-

i

~

A

-AGCTAAC

TAG-

-CGGCACACCTCAG-

- i ~ A - A G C T A A C -

TAG-

- CGGAACACCTTAG-

- A ~ A C A G C T A A T -

JKI43

TAG-

-CGGCACACCTTAG-

- i ~ A - A G C T A A C

-G

JKI51

TAG-

- CGGCACACCTTAG-

-A

-G

JK081

TAG-

- CGGCACACCTTAG-

-A~I~A-AGCTAAC-

JKI39

TAG

- - CGGCACAC

- CTAG

~

A

NZLI

TGAGCTGGTAA TGAGCTGGTA

3c

NE048

TGAGCTGGTA-

3d

NE274

TGAGCTGGTA

3e

NEI45

TC~AGCTGGTA-

3f

NEI25

TAGGCTGGTA

JK012

T A G G C T G G T A A T T A A ~

G- - - C -G G - - -C G- - - C

- AGCTAAC

- -i%~:A~A

NEI37

10a

G - C- C

- - CGGCACACCTTAG-

3b

v/3a

6,o

I

HC-JI

HC- J8

III/2a

%0

299

G - - - CTG

- AGCTAAC

- G - - -CCC

-AT - - ~ NI~I~I~A - - G C - - ~i~i~i~i~ A A T T C T G -AC-

-:~i~i~i~i~::~A T - T C T G

- TTA -

- - A C - - ~::~!~::~i~i~ATTTCA - A C - -iN~i~!#~%i?~A T T T C - - - T - -~!~!~!~i~ ~ A T T

JK030

TAGGCTGGTAAT'I~NATAT-

JK049

T A G G C T G G T A A C T A A ~

JK055

TAGGCTGGTAATT~~ATATTC-

JK070

T A G G C T G G T A A ~ N A

TG -G

- CACAGTTT CT -A

ATTTT

-AACTTA

JK072

T A G G C T G G T A A T T ~ ~ A

JK092

TAGGCTGGTAACTAA~ACT

JK093

TAGGCTGGTAATTAA~ATATA

JKIII

TAGGCTGGTAACTAA~ATAATA

JKII4

T A G G C T G G T A A T T A A ~ A

4

HEMA51

T A G G C A G C T T A A ~ G A -

5a

FR741

TAGGCTGG

6a

VN506

TAG-C-AGC-

-CCCT-

6b

Th580

TAG-

-CCCT-A-

7a

VN540

TAAGC-GGAA-CCC

7b

VN9.35

TAAGC-GG-A-C-T-

- A G - A A ~ A

7c

Th271

TAG-

- ATAAi~:.~

7d

Th846

TAGGC-GG-A-CAT-

11a

JK046

TAA-C-GGC--CCTC-AG-AA~A-

-T-C-

JK065

TAACC

- TCCG

-

- A - CTAA

C-AGC-

- GGC

- TTAGGG-

CCTAA-

-i~i~::~i~i~i~A T A A C T

- A A ~ - A - C A

-AG-

- A ~ A C

-

A

- -ATA

AT - -CC~

....

- -C C C T - A C A A G ~ -

,.TEl 4 8

TAA-C-GGC--CCTT-AG--A~-

8a

VN507

TAGGCAGG-A-CAT-

8b

TAG

9a

VN405 VN004

TAGGCTGA-A-CTAA

.....

9b

Th555

TAGGCTGA-A-C-T-

- ~N~Ni~i~ AAA

-ATC

9c

Th553

TAGGCTGA

- ~::~::~::~::~::~ ACA

-A

- C - GGTA

-

-T-TT-

- ~IN~ATTTC-A-

-AC

- C - T - - ~::~::~::~::~!~ ATTTC

- A - CCCC

TG

- CT -

GCAA~::~A

- -AG-

CAGG-A-CAT-

-C

GA

A

~

TG

G -G

Fig. 4. Nucleotide sequences of the 3'UTR. Sequences of representative H C V isolates of 25 genotypes are compared along with 22 Jakarta isolates of novel genotypes 2e, 21", 10a and 11 a. The C A C T C C motif is shaded, and in-frame stop codons are underlined. Dashes represent nucleotides that are the same as in HC-J1 of genotype I / l a shown at the top.

codon was found in six of the seven Jakarta isolates of genotype 2e and in both isolates of genotype 2f; it was TAG at positions 9387 9389 as in isolates of genotype III/2a or IV/2b. All ten Jakarta isolates of genotype 10a had a TAA stop codon at nt 9384-9386 as in isolates of genotypes I / l a , I I / l b , 5a and 9a. Along with the presence of a second stop codon, this makes the genotype 10a isolates markedly different from isolates of genotypes in group 3. All three Jakarta isolates of genotype l la lacked a second stop codon as reported for isolates of genotypes in groups 3 and 6 4 (with the exception of 9a). The

length of the 3'UTR before the poly(T) tail was 38-42 bp for genotype 2e, 44 bp for 2f, 22-33 bp for 10a and 34-35 bp for 1la. Thus, genotype 10a had a short 3'UTR comparable with isolates in groups 3-9.

Discussion Classification of HCV into genotypes now faces a critical turning point. Genotypes defined by pair-wise comparison and phylogenetic analysis of nucleotide sequences, within the entire genome or over shorter stretches, now number at least 34. Enduring enthusiasm

300

H. Tokita and others

to pursue new HCV variants is driven by the high prevalence of HCV throughout the world and its strong disease-inducing activity, resulting in chronic hepatitis in > 80 % of those who become infected. An endless list of new genotypes is being logged, whenever HCV variants are obtained from previously unexamined areas of the world (Bukh et al., 1995). A long-standing dilemma of taxonomists is that they are inclined to be either 'splitters' or 'lumpers'. Are we going to split HCV genotypes any further when there appears to be little, if any, definitive virological or clinical significance known for them? Alternatively, should we start to lump them together into fewer categories based on common features in various regions of the genome? The 22 HCV isolates from Jakarta, Indonesia of four novel genotypes, 2e, 2f, 10a and 1 la, may provide some insight into the way forward for HCV genotyping. First, they allow us to argue against the two-tiered classification of HCV genotypes presently advocated. We have reported group similarities (66.5-80.0%) which did not overlap with the genotype similarities (80.2-88.6 %) in a comparison of a 1093 bp NS5b sequence spanning nt 8279-9371 (Tokita et al., 1994b, 1995). The Jakarta HCV isolates of genotypes 10a and 1la, however, were 77.4-82.9% and 78.7 81.0% similar, respectively, to genotypes in groups 3 and 7 within this sequence. Therefore it is questionable whether groups 10 and 11 are independent or should be classified as genotypes in groups 3 and 7, respectively (Table 1). For the purpose of establishing these two genetic groups, one Jakarta isolate of each of the suggested genotypes (10a or l la) was sequenced in full, and compared with 22 HCV isolates for which the entire genomic sequences are known. The comparison substantiated the new genetic groups, which shared only < 75.0 % and < 70.0 % of the sequence, respectively, of genomes from the other genetic groups. In these comparisons, it has become evident that sequence divergence over the entire genome can distinguish genetic groups and genotypes most clearly. Two subgenomic regions, NS3 and NS5b, can be used to separate group from genotype similarity without overlapping, but with margins much less than those obtained by comparison of the entire genome. These results highlight the limitations of distinguishing genotypes by comparison of subgenomic sequences, and underscore the need to compare entire genomic sequence when new genotypes are proposed. Complete sequences for genetic groups 4-9 are needed urgently. Although HCV isolates are classified into eleven genetic groups (or types) by us (Tokita et al., 1994a, b, 1995; present report) and six groups by P. Simmonds and co-workers (Simmonds et al., 1993; Simmonds, 1995), i.e. 1, 5, 3 (group 10 included), 4, 6 (groups 7, 11,

8 and 9 included), and 2, such groupings must be further evaluated for validity as sequence data for new HCV isolates are accumulated. Based on the phylogenetic analysis on the 1093 bp NS5b sequence, five groups (1 ; 5; 3, 10 and 4; 6, 7, 11, 8 and 9; 2), four groups (1 and 5; 3, 10 and 4; 6, 7, 11, 8 and 9; and 2) or even two groups (2 and all the others) can be distinguished by setting an arbitrary distance for grouping (Fig. 1). From a practical point of view, only five of >~ 34 genotypes prevail in most areas of the world. The five common genotypes, I / l a , I I / l b , III/2a, IV/2b and V/3a, account for essentially all HCV isolates in Europe and North America as well as North and Far East Asia and Oceania (Bukh et al., 1995; Miyakawa et al., 1995; Simmonds, 1995). Among all HCV genotypes presently recorded, only genotype I I / l b (group 1) is associated with severe hepatitis and poor response to interferon (Yoshioka et al., 1992; Hino et al., 1994). It remains to be seen whether or not such clinically important features of genotype I I / l b are shared by other genotypes in genetic group 1. Some groups of investigators report differences in the response to interferon between patients infected with HCV genotype I / l a and those infected with I I / l b (Qu et al., 1994; Nousbaum et al., 1995), while others do not (Mahaney et al., 1994). Additional clinical and therapeutic trials are required to settle this. The time may have come when we must start to ponder the future for logging new HCV genotypes by an intricate classification based on sequence divergence in partial sequences of the genome. There is concern that such an effort may overlook the true genomic divergence which may be defined only by comparing the entire genomic sequences. Furthermore, it can potentially distract from essential virological characters of HCV, and can isolate HCV genotyping from other fields of medicine. For exact classification of HCV further studies are required, in which virologists and clinicians proceed in close cooperation.

References BUKH, J., PURCELL,R. H. & MILLER, R. H. (1993). At least 12 genotypes of hepatitis C virus predicted by sequence analysis of the putative E1 gene of isolates collected worldwide. Proceedings of the National Academy of Sciences, USA 90, 8234-8238. BUKH, J., PURCELL, R. H. & MILLER, R. H. (1994). Sequence analysis of the core gene of 14 hepatitis C virus genotypes. Proceedings of the National Academy of Sciences, USA 91, 8239 8243. BUKH, J., MILLER, R. H. & PURCELL, R. H. (1995). Genetic heterogeneity of hepatitis C virus: quasispecies and genotypes. Seminars in Liver Disease 15, 41 63. CHA, T. A., BEALL, E., IRVINE, B., KOLBERG, J., CHIEN, D., KUO, G. & URDEA, M. S. (1992). At least five related, but distinct, hepatitis C viral genotypes exist. Proceedings of the National Academy of Sciences, USA 89, 7144-7148. ENOMOTO, N., TAKADA, A., NAKAO, T. & DATE, T. (1990). There are two major types of hepatitis C virus in Japan. Biochemical and Biophysical Research Communications 170, 1021 1025.

Hepatitis C virus genotypes GRAKOUI,A., McCOURT, D. W., WYCHOWSKI,C., FEINSTONE,S. M. & RmE, C.M. (1993). Characterization of the hepatitis C virusencoded serine proteinase: determination of proteinase-dependent polyprotein cleavage sites. Journal of Virology 67, 283~2843. HIJIKATA, M., KATO, N., OOTSUYAMA,Y., NAKAGAWA,M., OHKOSHI, S. ~¢ SHIMOTOHNO,K. (1991). Hypervariable regions in the putative glycoprotein of hepatitis C virus. Biochemical and Biophysical Research Communications 175, 220-228. HINO, K., SAINOKAMI,S., SHIMODA,K., IINO, S., WANG, Y., OKAMOTO, H., MIYAKAWA,Y. & MAYUMI,M. (1994). Genotypes and titers of hepatitis C virus for predicting response to interferon in patients with chronic hepatitis C. Journal of Medical Virology 42, 299 305. HOUGHTON,M., WEINER,A., HAN, J., Kuo, G. & CHOO, Q.-L. (1991). Molecular biology of the hepatitis C viruses: implications for diagnosis, development and control of viral disease. Hepatology 14, 381 388. MAHANEY, K., TEDESCHI, V., MAERTENS,G., DI BISCEGLIE, A.M., VERGALLA, J., HOOFNAGLE,J. H. & SALLIE, R. (1994). Genotypic analysis of hepatitis C virus in American patients. Hepatology 20, 1405 1411. MIYAKAWA, Y., OKAMOTO, H. & MAYUMI, M. (1995). Classifying hepatitis C virus genotypes. Molecular Medicine Today 1, 20-25. MORI, S., KATO,N., YAGYU,A., TANAKA,T., IKEDA,Y., PETCHCLAI,B., CHIEWSILP,P., KURIMURA,T. & SHIMOTOHNO,K. (1992). A new type of hepatitis C virus in patients in Thailand. Biochemical and Biophysical Research Communications 183, 334-342. NEI, M. (1987). Phylogenetic trees. In Molecular Evolutionary Genetics, pp. 287 326. Edited by M. Nei. New York: Columbia University Press. NOUSBAUM,J. B., POE, S., NALPAS, B., LANDAIS,P., BERTHELOT,P., BRECHOT,C. & THE COLLABORATIVESTUDYGROUP(1995). Hepatitis C virus type lb (II) infection in France and Italy. Annals' oflnternal Medicine 122, 161-168. OKAMOTO, H., KOJIMA,M., OKADA, S., YOSHIZAWA,H., IIZUKA, H., TANAKA,T., MUCHMORE,E. E., PETERSON,D. A., ITO, Y. & MISHIRO, S. (1992a). Genetic drift of hepatitis C virus during an 8.2-year infection in a chimpanzee: variability and stability. Virology 190, 894-899. OKAMOTO, H., SUGIYAMA,Y., OKADA, S., KURAI, K., AKAHANE,Y., SUGAI, Y., TANAKA, T., SATO, K., TSUDA, F., MIYAKAWA,Y. & MAYUMI,M. (1992b). Typing hepatitis C virus by polymerase chain reaction with type-specific primers: application to clinical surveys and tracing infectious sources. Journal of General Virology 73, 673-679. OKAMOTO,H., TOKITA,H., SAKAMOTO,M., HORIKITA,M., KOJIMA,M., hZUKA, H. & MISHIRO, S. (1993). Characterization of the genomic sequence of type V (or 3a) hepatitis C virus isolates and PCR primers for specific detection. Journal of General Virology 74, 2385-2390. OKAMOTO, H., KOJIMA, M., SAKAMOTO, M., hZUKA, H., HADIWANDOWO,S., SUWIGNYO,S., MIYAKAWA,Y. & MAYUMI,M. (1994). The entire nucleotide sequence and classification of a hepatitis

301

C virus isolate of a novel genotype from an Indonesian patient with chronic liver disease. Journal of General Virology 75, 629-635. Qu, D., LI, J. S., VITVITSKI,L., MECHAI,S., BERBY, F., TONG, S. P., BAILEY, F., WANG, Q.S., MARTIN, J.L. • TREPO, C. (1994). Hepatitis C virus genotypes in France: comparison of clinical features of patients with HCV type I and type II. Journal of Hepatology 21, 70-75. SAKAMOTO,M., AKAHANE,Y., TSUDA, F., TANAKA,T., WOODFIELD, D.G. & OKAMOTO, H. (1994). Entire nucleotide sequence and characterization of a hepatitis C virus of genotype V/3a. Journal of General Virology 75, 1761-1768. SIMMONDS,P., HOLMES,E. C., CHA, T. A., CHAN, S. W., McOMISH, F., IRVlNE, B., BEALL, E., YAP, P. L., KOEBERG, J. & URDEA, M. S. (1993). Classification of hepatitis C virus into six major genotypes and a series of subtypes by phylogenetic analysis of the NS-5 region. Journal of General Virology 74, 2391-2399. SIMMONDS,P. (1995). Variability of hepatitis C virus. Hepatology 21, 570-583. STUYVER,L., ROSSAU,R., WYSEUR,A., DUHAMEL,M., VANDERBORGHT, B., VAN HEUVERSWYN, H. & MAERTENS, G. (1993). Typing of hepatitis C virus isolates and characterization of new subtypes using a line probe assay. Journal of General Virology 74, 1093 1102. STUYVER, L., VAN ARNHEM, W., WYSEUR, A., HERNANDEZ, F., DELAPORTE,E. & MARTENS, G. (1994). Classification of hepatitis C viruses based on phylogenetic analysis of the envelope 1 and nonstructural 5B regions and identification of five additional subtypes. Proceedings of the National Academy of Sciences, USA 91, 10134-10138. TOKITA,H., SHRESTHA,S. M., OKAMOTO,H., SAKAMOTO,M., HORIKITA, M., IIZUKA, H., SHRESTHA, S., MIYAKAWA,Y. & MAYUMI, M. (1994 a). Hepatitis C virus variants from Nepal with novel genotypes and their classification into the third major group. Journal of General Virology 75, 931-936. TOKITA, H., OKAMOTO,H., TSUDA,F., SONG, P., NAKATA,S., CHOSA, T., IIZUKA,H., MISHIRO,S., MIYAKAWA,Y. & MAYUMI,M. (1994b). Hepatitis C virus variants from Vietnam are classifiable into the seventh, eighth, and ninth major genetic groups. Proceedings of the National Academy of Sciences, USA 91, 11022-11026. TOKITA,H., OKAMOTO,H., LUENGROJANAKUL,P., VAREESAGNTHIP,K., CHAINUVATI,T., IIZUKA,H., TSUDA,F., OKAMOTO,H., MIYAKAWA, Y. 8¢ MAYUMI,M. (1995). Hepatitis C virus variants from Thailand classifiable into five novel genotypes in the sixth (6b), seventh (7c, 7d) and ninth (9b, 9c) major genetic groups. Journal of General Virology 76, 2329 2335. YOSHIOKA, K., KAKUMU, S., WAKITA, T., ISHIKAWA,T., ITOH, Y., TAKAYANAGI, M., HIGASHI, Y., SHIBATA, M. ~ MORISHIMA, T. (1992). Detection of hepatitis C virus by polymerase chain reaction and response to interferon-alpha therapy: relationship to genotypes of hepatitis C virus. Hepatology 16, 293-299.

(Received 8 June 1995; Accepted 29 September 1995)