Glycobiology vol. 11 no. 1 pp. 75–87, 2001
Chromosomal localization and genomic organization for the galactose/N-acetylgalactosamine/N-acetylglucosamine 6-O-sulfotransferase gene family
Stefan Hemmerich1, Jin K. Lee2, Sunil Bhakta, Annette Bistrup2, Nancy R. Ruddle3, and Steven D. Rosen2 Department of Respiratory Diseases, Roche Bioscience, Palo Alto, CA 94304, USA, 2Department of Anatomy and Program in Immunology, University of California, San Francisco, CA 94143–0452, USA, and 3Department of Epidemiology and Public Health, Yale University School of Medicine, New Haven, CT 06520–8034, USA Received on July 7, 2000; revised on September 5, 2000; accepted on September 6, 2000
The galactose/N-acetylgalactosamine/N-acetylglucosamine 6-O-sulfotransferases (GSTs) are a family of Golgi-resident enzymes that transfer sulfate from 3′-phosphoadenosine 5′-phosphosulfate to the 6-hydroxyl group of galactose, N-acetylgalactosamine, or N-acetylglucosamine in nascent glycoproteins. These sulfation modifications are functionally important in settings as diverse as cartilage structure and lymphocyte homing. To date, six members of this gene family have been described in human and in mouse. We have determined the chromosomal localization of these genes as well as their genomic organization. While the genes for the broadly expressed enzymes implicated in proteoglycan biosynthesis are located on different chromosomes, the highly tissue-specific enzymes GST-3 and GST-4 are both encoded by genes located in band q23.1–23.2 on chromosome 16. In the mouse, both genes reside in the syntenic region 8E1 on chromosome 8. This cross-species conserved clustering is suggestive of related functional roles for these genes. The human GST4 locus actually contains two highly similar open reading frames (ORFs) that are 50 kb apart and encode two highly similar enzyme isoforms termed GST-4α and GST-4β. All genes except GST0 (chondroitin 6-O-sulfotransferase) contain intron-less ORFs. With one exception, these are fused directly to sequences encoding the 3′ untranslated regions (UTRs) of the respective mature mRNAs. The 5′UTRs of these mRNAs are usually encoded by a number of short exons 5′ of the respective ORF. 5′UTRs of the same enzyme expressed in different cell types are sometimes derived from different exons located upstream of the ORF. The genomic organization of the GSTs resembles that of certain glycosyltransferase gene families.
Key words: sulfotransferase/chromosomal localization/genomic organization/glucosaminoglycan biosynthesis/lymphocyte homing
1To whom correspondence should be addressed at: Thios Biotechnologies, 828 Clayton Street, San Francisco, CA 94117.
© 2001 Oxford University Press
Introduction
The carbohydrates of glycoconjugates are highly diverse structures with variation in monosaccharide composition, position of glycosidic linkages, and branching of chains (Nelson et al., 1995). Further diversity is generated by the covalent attachment of sulfate (SO3-) to distinct hydroxyl and amino groups in these saccharides (Hooper et al., 1996). As sulfate modifications can be extensive and tend to occur in clusters in glycoproteins, they can have a profound effect on the physicochemical character of the glycoconjugate, at least in part by conferring a negative charge at physiological pH (Bowman and Bertozzi, 1999). In addition, regioselective sulfation modifications of carbohydrates frequently generate specific epitopes that can be recognized by hormones, extracellular matrix proteins, cell surface receptors, and viruses (Hooper et al., 1996; Bowman and Bertozzi, 1999; Rosen et al., 2000). Hexose units in glycoconjugates can be sulfated either at a 2-amino group or at the 3-, 4-, or 6-hydroxyl groups. These regioselective sulfation modifications are facilitated in the Golgi by a class of enzymes known as carbohydrate sulfotransferases. Based on sequence similarities, some of these have been grouped into families including the heparan sulfate sulfotransferases (Shworak et al., 1999) and the galactose/N-acetylgalactosamine/N-acetylglucosamine 6-O-sulfotransferases (GSTs). A number of other carbohydrate sulfotransferases such as the HNK-1 sulfotransferase or the galactocerebroside sulfotransferase display sequence similarities to certain glycosyltransferases or possess unique amino acid sequences (Rosen et al., 2000). The GST family (Hemmerich and Rosen, 2000) is a recently discovered family of carbohydrate sulfotransferases that facilitate sulfation of the 6-hydroxyl of galactose, GalNAc, or GlcNAc. To date, six members of this family (GST-0 through GST-5) are known in human and mouse (Table I).
These enzymes have been implicated in proteoglycan biosynthesis (Fukuta et al., 1995, 1997; Uchimura et al., 1998b; Kitagawa et al., 2000) as well as lymphocyte homing (Hemmerich and Rosen, 2000). In particular, one member of this enzyme family, GST-3, also known as high endothelial cell N-acetylglucosamine 6-sulfotransferase (HEC-GlcNAc6ST) or L-selectin ligand sulfotransferase (LSST), is highly restricted in its expression to lymph-node high endothelial venules (HEV) and has been implicated in L-selectin ligand biosynthesis (Bistrup et al., 1999; Hiraoka et al., 1999). It is tempting to speculate that the highly related intestinal N-acetylglucosamine 6-O-sulfotransferase GST-4 may have similar functions (Lee et al., 1999). Other members of this family such as GST-0, -1, and -2 can sulfate GalNAc, galactose, or GlcNAc (respectively) in chondroitin sulfate or keratan sulfate biosynthesis (Fukuta et al., 1995, 1997; Uchimura et al., 1998b). However, at least in vitro, GST-1 can also contribute to L-selectin ligand formation
S. Hemmerich et al.
Table I. Nomenclatures of GST-type carbohydrate sulfotransferases
Columns: recommended designation; species; various names (abbreviations); references (accession no.); modified monosaccharide; predominant expression pattern
GST-0; chick, human, mouse; chondroitin 6-O-sulfotransferase (C6ST); Fukuta et al., 1995 (D49915), Fukuta et al., 1998 (AB012192), Uchimura et al., 1998a (AB008937); GalNAc, Gal; broad
GST-1; human, mouse; keratan sulfate 6-O-sulfotransferase (KSGal6ST), chondroitin 6-O-sulfotransferase, carbohydrate sulfotransferase 1 (CHST-1), mouse GST-1; Fukuta et al., 1997 (AB003791), Mazany et al., 1998 (U65637), Li and Tedder, 1999 (AF090137), this paper (Fig. 6) (AF280087); Gal; broad
GST-2; human, mouse; N-acetylglucosamine 6-O-sulfotransferase (GlcNAc6ST), carbohydrate sulfotransferase 2 (CHST-2); Uchimura et al., 1998c (AB014680*), Li and Tedder, 1999 (AF083066), Uchimura et al., 1998b (AB011451); GlcNAc; broad
GST-3; human, mouse; HEC-restricted N-acetylglucosamine 6-O-sulfotransferase (HEC-GlcNAc6ST), L-selectin ligand sulfotransferase (LSST); Bistrup et al., 1999, Hiraoka et al., 1999 (AF131235, AF131236, AF109155); GlcNAc; HEC
GST-4; human α, human β, mouse; intestinal N-acetylglucosamine 6-O-sulfotransferase (I-GlcNAc6ST); Lee et al., 1999 (AF176838), this paper (Fig. 5) (AF280086), Lee et al., 1999 (AF176840); GlcNAc; intestine
GST-5; human, mouse; chondroitin 6-O-sulfotransferase 2 (C6ST-2), human GST-5, mouse GST-5; Kitagawa et al., 2000 (AB037187), Bhakta et al., 2000 (AF280089); GalNAc, GlcNAc; broad
(Bistrup et al., 1999). In order to facilitate further functional comparison within this family and enable gene-deletion strategies, we have determined the chromosomal localizations and genomic organizations of the genes encoding these GST enzymes in human and mouse. The human and murine genes encoding GSTs implicated in proteoglycan biosynthesis are located at loci on different chromosomes that are usually syntenic between mouse and human. The genes encoding GST-3 and GST-4 are located within the same band on chromosome 16 in human and the syntenic region on chromosome 8 in mouse.
Results
Isolation and characterization of genomic clones containing GST-genes
BAC libraries from human and mouse (C57BL/6) were screened at Incyte Genomics (St. Louis, MO) with EST-derived probes as described in Materials and methods. The probes used for each gene are summarized in Table II. The GST ORFs contained within the BAC DNAs were then sequenced with cDNA- or EST-derived primers (cf. Table II).
Table II. Chromosomal localization of GST genes
Columns: gene (locus); chromosome; chromosomal location; reference; syntenic region in human; syntenic marker (accession no.); cDNA (accession no.); probe for BAC-lifting derived from; probe length; BAC-ORF sequenced. A dash marks a cell with no entry.
huC6ST (H0); 10; 10q21.3; GeneMap98*; -; -; AB012192; -; -; -
msC6ST (M0); 9; 9C; Uchimura et al., 1998a; 11q22.3-23.3; AA013756; AB008937; -; -; -
huGST1 (H1); 11; 11p11.1-11.2; Fukuta et al., 1997; -; -; AB003791; -; -; -
msGST1 (M1); 2; 2D-E1; this paper (Fig. 1); 11p11-13; D45212; AF280087; cDNA AF280087 nt 487-1287; 800 bp; in full
huGST2 (H2); 7, 3; 7q31, 3q24; Uchimura et al., 1998c, Li and Tedder, 1999; -; -; AF083066; -; -; -
msGST2 (M2); 9; 9E3.2-3.3; this paper (Fig. 1); 3q23; AF043120; AB011451; EST AA103962 nt 5-752; 747 bp; partial
huGST3 (H3); 16; 16q23.1-23.2; this paper (Fig. 1); -; -; AF131235; cf. Bistrup et al., 1999; 496 bp; partial
msGST3 (M3); 8; 8E1; this paper (Fig. 1); 16q23.1; D26046; AF109155; cf. Bistrup et al., 1999; 241 bp; in full
huGST4 α, huGST4 β (H4); 16; 16q23.1; this paper (Fig. 1); -; -; AF176838, AF280086; cf. Lee et al., 1999; 237 bp; in full
msGST4 (M4); 8; 8E1; this paper (Fig. 1); 16q23.1; D26046; AF176840; cf. Lee et al., 1999; 151 bp; in full
huGST5 (H5); X; Xp11; Kitagawa et al., 2000; -; -; AB037187; -; -; -
msGST5 (M5); X; XA3.1-3.2; Bhakta et al., 2000; Xp11.23; X91238; AF280089; EST A1419198 nt -5 to 1385; 1390 bp; in full
*http://www.ncbi.nlm.nih.gov/genemap98/loc.cgi?ID=18552
Chromosomal localization and genomic organization of GSTs
Chromosomal localization of GST genes
Chromosomal localizations of GST genes in human and mouse that had not been previously assigned in the literature were determined by fluorescence in situ hybridization (FISH) analysis at Incyte Genomics as described in Materials and methods. Results of our FISH assignments of loci M1, M2, H3, M3, H4, and M4 are depicted in Figure 1. The results for all known GST genes as established here and previously are summarized in Table II.
Southern analysis on the human GST4 gene
BAC DNA containing the human GST4 gene was digested with appropriate restriction enzymes (cf. Materials and methods), and fractionated digests were probed with a 153 bp probe derived from the center of the human GST4 ORF (GenBank accession no. AF176838). Surprisingly, with three different combinations of restriction enzymes we always observed two bands of different sizes (Figure 2). This suggested that the genomic sequence within this BAC contained two identical or highly similar sequences that are complementary to the GST4-derived probe and prompted us to search for GST4-like sequences in genomic sequence databases.
Genomic organization of human GST-genes
5′UTR, ORF, and 3′UTR of human GST-cDNAs (cf. Table II for pertinent accession numbers) were used as probes in screening (BLASTn) for matching genomic sequences contained in GenBank’s nr and htgs databases. Matching sequences in htgs were always found within collections of unordered fragments. These were converted into separate GCG documents and assembled in a contig assembly program (Sequencher). As shown in Figure 3, the ORFs coding for GST-3, -4, and -5 were always found as contiguous, intron-less sequences within single genomic exons. The respective 3′UTRs were in all cases, except GST5, fused to the exons encoding the ORFs. The respective 5′UTRs were found in upstream sequences within the same fragments or in fragments within the respective htgs-derived pools.
Two separate exons contribute to the ORF encoding GST-0 in agreement with data reported earlier (Tsutsumi et al., 1998). Genomic sequences containing GST1 and GST2 were incomplete and could not be assembled into contigs. Therefore, their detailed respective genomic organizations remain to be determined. However, as depicted in Figure 3, it is plausible that these genes are organized similarly to GST3, -4, and -5: a series of short exons encoding the 5′UTR is followed by one or two exons containing the ORF and 3′UTR.
Detailed analysis of the H4 locus
The screen for genomic sequences corresponding to 5′UTR, ORF, or 3′UTR of human GST4 cDNA (GenBank accession no. AF176838) (Lee et al., 1999) yielded the following collections of genomic sequences (listed by accession numbers): AC009105 (61 unordered fragments), AC009163 (58 unordered fragments), AC011934 (15 unordered fragments), AC025287 (42 unordered fragments), and AC026419 (17 unordered fragments). These 193 fragments were analyzed using a contig alignment program (Sequencher); 110 of the sequences assembled into 11 contigs. The largest of these contigs was
comprised of 49 fragments spanning a total sequence of 161 kb and contained the GST4α ORF. It was therefore termed H4. The shortest overlap of fragments used in assembly of H4 was 660 identical bases (nt 77114–77773). Editing and trimming of low-quality regions in individual fragments yielded a 160552 nt consensus sequence that is available upon request in GenBank or FASTA format; 88.9% of the consensus was derived from at least two overlapping fragments. Where there were differences, base-calls in the consensus were based on majority; where no clear call could be obtained, the consensus base was noted as ambiguous (S = G or C; Y = C or T; W = A or T; M = A or C; R = A or G; K = G or T). Only 117 out of a total of 160552 bases in the contig were ambiguous. An overview of this contig including the corresponding cDNA fragments is shown in Figure 4. Closer examination of this contig revealed that it contained bases 328 through 2134 of the human GST4 cDNA (GenBank accession no. AF176838), which includes the ORF, 16 bp of 5′UTR, and all of the 3′UTR (5U0+ORF+3U), within one exon located at position 47940 through 49745. The remaining 5′UTR of the GST4 cDNA (bases 9–326) appears to be contained within at least four short upstream exons: 4a_5U1 (bases 263–327 in GST4 cDNA), nt 46637–46701 in the contig; 4a_5U2 (bases 168–262 in GST4 cDNA), nt 45094–45188 in the contig; 4a_5U3 (bases 86–167 in GST4 cDNA), nt 35593–35674 in the contig; 4a_5U4 (bases 11–85 in GST4 cDNA), nt 32849–32923 in the contig. The introns between these five transcribed exons all feature typical splice junctions. The overall structure of the human GST4 gene (locus H4) is shown in Figure 3. The entire contig sequence was screened for sequence tagged sites (STS), repeats, EST matches, and potential binding sites for transcription factors, using GeneMiner software (deCode Genetics). A GenBank format summary of these features along with the sequence is available upon request.
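The majority rule with two-base ambiguity codes described above can be illustrated with a short sketch. This is not the procedure implemented in Sequencher; the function names are hypothetical, and two choices are assumptions made only for illustration: an exact tie between two bases is reported with the corresponding code from the list in the text, and any other unresolved column is reported as N.

```python
from collections import Counter

# Two-base ambiguity codes, restricted to the six listed in the text.
IUPAC_TWO_BASE = {
    frozenset("GC"): "S",
    frozenset("CT"): "Y",
    frozenset("AT"): "W",
    frozenset("AC"): "M",
    frozenset("AG"): "R",
    frozenset("GT"): "K",
}


def consensus_base(calls):
    """Return the consensus call for one alignment column.

    calls: iterable of base calls (one per overlapping fragment).
    A strict majority base is returned as-is. An exact tie between two
    bases yields the matching two-base code (assumption). Anything else,
    including an empty column, yields 'N' (assumption).
    """
    counts = Counter(base.upper() for base in calls)
    if not counts:
        return "N"
    top = max(counts.values())
    leaders = frozenset(b for b, n in counts.items() if n == top)
    if len(leaders) == 1:
        return next(iter(leaders))
    return IUPAC_TWO_BASE.get(leaders, "N")


def consensus(columns):
    """Build a consensus string from a list of alignment columns."""
    return "".join(consensus_base(col) for col in columns)


if __name__ == "__main__":
    # Toy columns, not taken from the H4 contig.
    print(consensus(["AAA", "CT", "GGT", "ACG"]))
```

Counting the ambiguity codes in such a consensus string would give the kind of figure quoted above for the contig (117 ambiguous positions out of 160552).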
Interestingly, the H4 contig featured a conspicuous repeat of three Sp1 consensus sequences (5′-GGGGCGGGGC-3′). This Sp1 triplet is contained within a stretch of 72 bp located at nt 11942–12013, 20 kb upstream of the first known exon of GST4 5′UTR (4a_5U4, nt 32849–32923, Figure 4).
A novel highly related GST-gene is present in the H4 contig downstream of the GST4 gene
Further examination of the H4 contig revealed a long open reading frame (nt 98474–99661) potentially encoding a novel member of the GST family. The enzyme encoded by this 1188 bp ORF is predicted to be a typical type II transmembrane protein of 395 amino acids with 85.6% identity and 87.4% similarity to human GST-4 (GenBank accession no. AF176838; Lee et al., 1999) on the amino acid level. The predicted gene product was therefore termed GST-4β to highlight its similarity to GST-4, with the latter hereafter referred to as GST-4α. To address the question of whether GST4β is expressed in vivo, we searched the GenBank and LifeSeq EST (Incyte Genomics) databases for matching expressed sequence tags (ESTs). We found three (accession number AI824100 from GenBank, and clones #5968031 and 6869651 from LifeSeq). Plasmids containing these three sequences were retrieved and sequenced in full. AI824100 was found to contain the GST4β ORF from its start ATG through a NotI site (GCGGCCGC) at position 795 of the ORF. In addition, this plasmid contained 188 bases of GST4β 5′UTR. Incyte clone #6869651 contained the GST4β ORF
Fig. 1. Chromosomal localization of GST genes. Chromosomal localizations of GST genes were assigned by fluorescence in situ hybridization (FISH) as described in Materials and methods. M1 (mouse GST1 locus), measurement of 10 specifically labeled (>) chromosomes 2 demonstrated that msGST1 is located at a position (—), which is 47% of the distance from the heterochromatic-euchromatic boundary to the telomere of mouse chromosome 2, an area that corresponds to the interface between bands 2D and 2E1. A total of 80 metaphase cells were analyzed with 73 exhibiting specific labeling. M2 (mouse GST2 locus), measurement of 10 specifically labeled (>) chromosomes 9 demonstrated that msGST2 is located at a position (—), which is 69% of the distance from the heterochromatic-euchromatic boundary to the telomere of mouse chromosome 9, an area that corresponds to the interface between bands 9E3.2 and 9E3.3. A total of 80 metaphase cells were analyzed with 72 exhibiting specific labeling. H3 (human GST3 locus), measurements of 10 specifically labeled (>) chromosomes 16 demonstrated that huGST3 is located at a position which is 75% of the distance from the heterochromatic-euchromatic boundary to the telomere of human chromosome arm 16q, an area which corresponds to band 16q23.1–23.2. A total of 80 metaphase cells were analyzed with 69 exhibiting specific labeling. M3 (mouse GST3 locus), measurements of 10 specifically labeled (>) chromosomes 8 demonstrated that msGST3 is located at a position (—), which is 90% of the distance from the heterochromatic-euchromatic boundary to the telomere of mouse chromosome 8, an area that corresponds to band 8E1. A total of 80 metaphase cells were analyzed with 73 exhibiting specific labeling.
H4 (human GST4 locus), measurements of 10 specifically labeled (>) chromosomes 16 demonstrated that huGST4 is located at a position which is 67% of the distance from the heterochromatic-euchromatic boundary to the telomere of human chromosome arm 16q, an area which corresponds to 16q23.1. A total of 80 metaphase cells were analyzed with 73 exhibiting specific labeling. M4 (mouse GST4 locus), measurements of 10 specifically labeled (>) chromosomes 8 demonstrated that msGST4 is located at a position (—), which is 90% of the distance from the heterochromatic-euchromatic boundary to the telomere of mouse chromosome 8, an area that corresponds to band 8E1. A total of 80 metaphase cells were analyzed with 68 exhibiting specific labeling.
Definition and characterization of the human GST3 gene (H3)
Fig. 2. Southern analysis of genomic DNA from human GST4 locus. Genomic BAC DNA was digested with restriction enzymes, and digests were fractionated, Southern blotted, and probed with a DNA fragment from the human GST4α ORF as described in Materials and methods. Digestion with EcoRI and ClaI or EcoRI and XbaI yielded the same pattern of two bands of ∼15 kb and ∼4 kb size. Digestion with BglII and HindIII yielded again two bands of ∼4.5 kb and ∼5.5 kb size.
Genomic sequences containing the human GST3 gene were identified through the same approach described above for GST4. This search yielded a group of 108 unordered genomic fragments in GenBank’s HTGS database (accession number AC010547). The ORF and 3′UTR of human GST3 cDNA (GenBank accession no. AF131235; Bistrup et al., 1999; nt 163–2017) were present in a single exon (nt 3114–4968) contained in fragment AC010547–108 (total length 12 kb). Most of the 5′UTR of the same human cDNA (AF131235, bases 9–162), obtained from human gall bladder, was in a single short exon (nt 2720–2873) on another fragment, AC010547–89 (reverse complement, total length 3.94 kb). The 5′UTR from another GST3 cDNA (GenBank accession no. AF280088, bases 34–102; Bistrup, Hemmerich, and Rosen, unpublished observations), which was obtained from human tonsillar high endothelial cells, mapped to another short exon (nt 2988–3056) immediately downstream on the same fragment AC010547–89. Based on the organization of the human GST4 gene and other related GST genes as determined above, we predict that fragment AC010547–89 containing the 5′UTR is located upstream from fragment AC010547–108 containing the ORF and 3′UTR. The predicted organization of the human GST3 gene (H3) is shown in Figure 3. The size of the gap between fragments AC010547–89 and AC010547–108 is unknown at present.
Mouse homologues of human GST-1 and GST-5
from the NotI site at position 795 of the ORF through the stop codon (TAG) and an additional 300 bp of 3′UTR. An additional 2106 bp of 3′UTR terminating in a poly A tail were contained in Incyte clone 5968031. The 3786 bp complete GST4β cDNA sequence constructed from these three ESTs (submitted to GenBank under accession no. AF280086) is presented in Figure 5 with the amino acid sequence of the predicted protein. This sequence was then mapped against the H4 contig. It was thus found that the GST4β ORF together with 16 bp of 5′UTR and the first 1590 bp of the 3′UTR were contained within a single exon located at positions 98458–101251 (commencing 50.5 kb downstream from the start of the GST4α ORF). The remaining 792 bp of 3′UTR including two polyadenylation signals (AATAAA, nt 3106–3111 and 3734–3739) were contained in a second exon (nt 104402–105193 in the genomic contig) 3.2 kb downstream of the coding exon. The GST4β 5′UTR was again contained in at least two small exons located upstream of the GST4β ORF but downstream of the GST4α ORF. Thus, 4b_5U1 (bases 98–172 in GST4β cDNA) corresponds to bases 96411–96485 in the contig, while 4b_5U2 (bases 10–97 in GST4β cDNA) corresponds to bases 83258–83345 in the contig. Again, as was the case with the GST4α gene, the introns between these four GST4β exons feature typical splice junctions. Thus, as illustrated in Figure 3, the H4 locus is a bigenic locus in that it contains a tandem repeat of two highly similar GST genes, GST4α and GST4β. The enzyme encoded by GST4α has been shown experimentally to facilitate 6-O-sulfation at GlcNAc in mucin-type acceptor glycoproteins (Lee et al., 1999). Since GST-4β is 85.6% identical to GST-4α, it is likely to possess a highly similar enzymatic activity.
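The exon coordinates quoted for GST4β can be cross-checked with simple interval arithmetic. The sketch below is only a bookkeeping aid, assuming 1-based inclusive coordinates throughout; the helper names (`span`, `lengths_agree`) are hypothetical, while all numbers are copied from the text above.

```python
def span(start, end):
    """Length of a 1-based, inclusive interval."""
    return end - start + 1


# (exon name, cDNA interval, contig interval), values as quoted in the text.
GST4B_UTR_EXONS = [
    ("4b_5U2", (10, 97), (83258, 83345)),
    ("4b_5U1", (98, 172), (96411, 96485)),
]


def lengths_agree(exons):
    """Map each exon name to True if cDNA and contig intervals are equally long."""
    return {name: span(*cdna) == span(*contig) for name, cdna, contig in exons}


# Coding exon: 16 bp 5'UTR + 1188 bp ORF + first 1590 bp of 3'UTR.
coding_exon_len = span(98458, 101251)
coding_exon_expected = 16 + 1188 + 1590

# Second 3'UTR exon and the intron separating it from the coding exon.
utr3_exon_len = span(104402, 105193)
intron_len = 104402 - 101251 - 1

if __name__ == "__main__":
    print(lengths_agree(GST4B_UTR_EXONS))
    print(coding_exon_len, coding_exon_expected, utr3_exon_len, intron_len)
```

Under these assumptions the interval lengths agree with the stated sizes, and the intron between the coding exon and the second 3′UTR exon comes out at roughly the 3.2 kb quoted.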
A full-length cDNA encoding mouse GST-1 was identified by screening GenBank’s mouse EST database with a probe derived from the human GST1 ORF. The matching EST (accession number AA821661) was retrieved, and plasmid DNA was sequenced in full. This clone was found to be a full-length cDNA (2164 bp) containing a complete long ORF of 1236 bp followed by a polyadenylated 3′UTR (submitted to GenBank under accession no. AF280087). The predicted 411 aa mouse enzyme encoded by the long ORF is a typical type II transmembrane glycoprotein with 95.6% similarity and 94.2% identity to human GST-1 on the amino acid level. The mouse GST1 cDNA sequence together with translation of the ORF and a comparison to human GST-1 are shown in Figure 6. Mouse GST-5 (the murine analogue of human chondroitin 6-O-sulfotransferase-2; Kitagawa et al., 2000) has been cloned by homology screening of a mouse BAC library with a human GST-5 derived probe (GenBank accession no. AF280089; Bhakta et al., 2000).
Evolutionary and functional relationship of GST genes
The protein sequences of human and mouse GST-0 through -5 were compared by pair-wise alignment using the ClustalW algorithm (Thompson et al., 1994). A phylogenetic tree was constructed from the results and is shown in Figure 7. Within their catalytic domains, the GSTs are >40% similar (>30% identical) to one another at the amino acid level, and fall into two subfamilies: those that sulfate galactose and those that sulfate GlcNAc (cf. Table I). GST-0 and GST-5 have also been reported to catalyze 6-O-sulfation of GalNAc in chondroitin (Fukuta et al., 1998; Kitagawa et al., 2000). Within each subfamily, sequence similarity is ∼50%. Cross-species homologies between human and mouse for a given GST range between 75% and 98%.
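Percent identity values of the kind quoted above are computed from a pairwise alignment. The following is a minimal sketch, not the ClustalW scoring itself: it assumes the two sequences are already aligned to equal length with "-" as the gap character, and it assumes identity is expressed relative to the number of ungapped aligned positions (other denominators are also in common use). The function name and the toy sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over ungapped columns of a pairwise alignment.

    aligned_a, aligned_b: equal-length strings taken from an existing
    alignment. Columns with a gap in either sequence are skipped
    (assumption; conventions differ between programs).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if a != gap and b != gap
    ]
    if not pairs:
        return 0.0
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)


if __name__ == "__main__":
    # Toy peptides for illustration only, not GST sequences.
    print(percent_identity("MKV-LL", "MRVALL"))
```

A similarity score would additionally count conservative substitutions as matches, which requires a substitution grouping that is not specified in the text.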
Fig. 3. Structure of human GST genes. Exons or parts of exons transcribed into 5′UTR are highlighted in red. Sequence transcribed into a sulfotransferase open reading frame (ORF) is highlighted in green. Sequence transcribed into 3′UTR is highlighted in yellow. H3, Information from the htgs database did not allow construction of a complete contig. However, the available data suggest a genomic organization of H3 along the lines shown for H4 below. Interestingly, different exons appear to be utilized for mRNA transcription in different tissues. Thus an mRNA from human gall bladder contains 5′UTR derived from exon 1 (highlighted in magenta), while mRNA derived from human tonsillar high endothelial cells contains 5′UTR derived from exon 1′ (highlighted in orange) located just a few bases further downstream of exon 1. H4, The exons relate to the nomenclature of Figure 4 as follows: exon 1, 4a_5U4; exon 2, 4a_5U3; exon 3, 4a_5U2; exon 4, 4a_5U1; exon 5, 4a_5U0+ORF+3U; exon 6, 4b_5U2; exon 7, 4b_5U1; exon 8, 4b_5U0+ORF+3U0; exon 9, 4b_3U1. The short exons transcribed into 5′UTR are not necessarily drawn to scale. The position of a triplet of putative Sp1 transcription factor binding sites is indicated.
Discussion
The Gal/GalNAc/GlcNAc 6-O-sulfotransferases are a newly described family of carbohydrate sulfotransferases, which are implicated in biological processes as diverse as proteoglycan biosynthesis and lymphocyte homing (Bistrup et al., 1999; Hiraoka et al., 1999; Habuchi, 2000; Hemmerich and Rosen, 2000). To date six different members of this family, GST-0
through GST-5, have been identified in human. We have cloned the heretofore unknown mouse homologues of human GST1 (Figure 6) and GST5 (GenBank accession no. AF280089; Bhakta et al., 2000) and have assigned the chromosomal localizations of several of the GST loci in human and mouse (bold entries in Table II). The genomic loci in human of the broadly expressed GST genes 0, 1, and 5 have previously been
Fig. 4. Overview of genomic contig H4. The contig was assembled from fragments of human genomic sequence retrieved from GenBank’s high throughput genome sequencing (htgs) database. The positions of exons that are transcribed into GST4α or GST4β mRNAs are indicated. Thus 4a_5U0+ORF+3U is an exon that is transcribed into a short stretch of GST4 5′ untranslated cDNA (5′UTR) followed by the coding open reading frame (ORF) and the 3′-untranslated region (3′UTR) of the GST4α cDNA. 4a_5U1 is the next exon further upstream that is transcribed into GST4α 5′UTR. The positions of other pertinent exons are indicated analogously. The 160552 nt consensus sequence with a list of features in GenBank or FASTA format is available upon request.
assigned to chromosomes 10, 11, and X, respectively (Table II). The genomic locus of the broadly expressed human N-acetylglucosamine 6-O-sulfotransferase GST-2 has been
assigned by one group to human chromosome 7 (band 7q31; Uchimura et al., 1998c) while another group mapped the same locus to human chromosome 3 (band 3q24; Li and Tedder,
1999). We have assigned the loci of the mouse GST1, 2, and 5 genes. Using syntenic markers specified in the mouse genome informatics database (Blake et al., 2000; Table II), we found that the loci of human and murine GST1 and 5 are respectively syntenic. Our data support the assignment of the human GST2 locus to chromosome region 3q24 rather than 7q31, because the former locus is syntenic to mouse chromosome region 9E3.2–3.3, the position to which we have assigned the mouse GST2 gene. The locus of human GST0 has been assigned to chromosome region 10q21.3 in the GeneMap’98 database. This localization is not syntenic with mouse chromosome region 9C, the previously assigned locus of mouse GST0 (Uchimura et al., 1998a). In contrast to the broadly expressed genes GST0, 1, 2, and 5, the human loci of the genes coding for the two tissue-restricted sulfotransferases GST-3 and -4 are clustered in chromosomal region 16q23.1–23.2. A number of human patients have been described who exhibit deletions in this region on a single chromosome. A variety of developmental and morphological anomalies were present in these individuals (Callen et al., 1993; Werner et al., 1997). In the mouse the clustering of the GST3 and GST4 loci is conserved in the syntenic chromosomal region 8E1. The HEC-restricted N-acetylglucosamine 6-O-sulfotransferase has been proposed to play a functional role in lymphocyte homing to peripheral lymph nodes (Bistrup et al., 1999; Hiraoka et al., 1999). The fact that GST4 expression is highly restricted to intestinal tissue (Lee et al., 1999), together with the close proximity of its locus to the GST3 locus in human and mouse, suggests that GST-4 may play a role in lymphocyte migration to mucosal lymphoid organs. Using data generated by the public human genome sequencing project, we have assembled partial or complete genomic sequences corresponding to the GST loci.
Only for the GST4 locus (H4) were we able to assemble a complete 160 kb contig (Figure 4); however, the information obtained from the other partial contigs suggests that in human all six GST genes are organized along similar lines. Thus, except for GST0, the coding sequences (ORF) in all GST genes are contained within single exons (Figure 3). In most cases the 3′UTR is fused to the ORF within the same exon; however, the 5′UTRs of the GST mRNAs are usually contained within several short exons upstream of the ORF. This overall genomic organization resembles that of the topologically and functionally related β3-galactosyltransferases and α(1,3)fucosyltransferases (Gersten et al., 1995; Smith et al., 1996; Amado et al., 1998), but is very different from the genomic organization of the human heparan sulfate N-deacetylase/N-sulfotransferase genes, whose protein coding sequences are distributed over 13 or 14 exons (Gladwin et al., 1996; Humphries et al., 1998). For the GST3 gene we have shown that its mRNA expressed in human tonsillar HEC utilizes a 5′UTR that is different from the 5′UTR utilized in GST3 mRNA from human gall bladder. The presence of two different 5′UTRs associated with the two cDNAs corresponding to GST3 implies that different mature mRNAs are derived by differential splicing. Features of the 5′UTR contribute significantly to the specificity and overall efficiency of translation initiation (Gary and Wickens, 1998). This is controlled at the level of secondary structure, which affects general accessibility, and also at the level of specific sequences, which interact with specific mRNA binding proteins. It is also possible that features of the 5′UTR affect the stability of the transcript, thereby conferring another level of
control of expression. Thus, the expression of different 5′UTRs for GST3 possibly represents one mechanism for exerting tissue-specific control of enzymatic activity. Our analysis has revealed that the human GST4 locus (H4) contains a tandem repeat of two genes predicting highly similar isozymes GST-4α and GST-4β (Figure 3). GST-4α has been shown to encode a novel N-acetylglucosamine 6-O-sulfotransferase that is expressed predominantly in intestinal tissue. The existence of these tandem GST4 genes in the human genome is further evidenced by our Southern analysis (Figure 2). Thus, the BglII or HindIII restriction sites flanking the GST4α ORF are two BglII sites located in H4 at positions 45752 and 50139 (5′ and 3′ of GST4α ORF, respectively). The pertinent sites flanking the GST4β ORF are again two BglII sites located at positions 95503 and 101014 (5′ and 3′ of GST4β ORF, respectively). The pattern of labeled bands predicted from this arrangement (4387 bp and 5511 bp) indeed agrees with our experimental data. Of the three other enzymes used in our analysis, the closest flanking sites for the GST4α or GST4β ORF are in both cases EcoRI sites located in H4 at positions 39727, 55239, 98108, and 108392. This is consistent with our observation that the banding pattern with EcoRI was unchanged by either of the other two enzymes (XbaI or ClaI) used in combination with EcoRI (Figure 2). Of the predicted band pattern (15512 bp and 10284 bp), only the 15 kb band is seen in Figure 2. The second ∼4 kb band in the EcoRI digests requires the existence of an additional EcoRI site in H4 more proximal to either the GST4α or GST4β ORF. The absence of such a site from the H4 sequence could be due either to a sequencing error or to a polymorphism at an appropriate location within H4. The existence of ESTs mapping to either GST4α (Lee et al., 1999) or GST4β suggests that both isozymes are indeed expressed.
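The predicted band sizes quoted above follow from subtracting the coordinates of the restriction sites that flank each probed region. The sketch below reproduces that arithmetic; it assumes that the sites named in the text are the only relevant sites in these intervals, and it uses the coding-exon and ORF coordinates given earlier for the H4 contig as the probed regions. The function name `fragment_spanning` is hypothetical.

```python
from bisect import bisect_left, bisect_right


def fragment_spanning(sites, start, end):
    """Return (left_site, right_site, size) of the fragment containing [start, end].

    sites: restriction site positions on the contig.
    The left site is the last site at or before `start`, the right site
    the first site at or after `end`. Returns None if the region is not
    flanked on both sides.
    """
    ordered = sorted(sites)
    i = bisect_right(ordered, start) - 1  # index of last site <= start
    j = bisect_left(ordered, end)         # index of first site >= end
    if i < 0 or j >= len(ordered):
        return None
    return ordered[i], ordered[j], ordered[j] - ordered[i]


# Site positions in H4 as quoted in the text.
BGLII_SITES = [45752, 50139, 95503, 101014]
ECORI_SITES = [39727, 55239, 98108, 108392]

# Probed regions: GST4-alpha coding exon and GST4-beta ORF (contig coordinates).
GST4A_REGION = (47940, 49745)
GST4B_REGION = (98474, 99661)

if __name__ == "__main__":
    for label, sites in (("BglII", BGLII_SITES), ("EcoRI", ECORI_SITES)):
        for gene, region in (("GST4a", GST4A_REGION), ("GST4b", GST4B_REGION)):
            print(label, gene, fragment_spanning(sites, *region))
```

With these inputs the sketch yields the 4387 bp and 5511 bp BglII fragments and the 15512 bp and 10284 bp EcoRI fragments stated in the text; it cannot account for the observed ∼4 kb EcoRI band, which, as discussed, would require an additional site.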
Three overlapping human ESTs mapping to GST4β were found in a lung tumor derived library (EST AI824100) or brain libraries (Incyte ESTs). A full-length GST4β cDNA sequence (Figure 5) was constructed from these ESTs and compared to our GST4α cDNA (Lee et al., 1999). The amino acid sequences predicted by the GST4α and GST4β ORFs are highly similar (85.6% identity). Therefore, probes derived from either ORF are unlikely to differentiate between the two genes, as exemplified by our 152 bp probe from the center of the GST4α ORF, which hybridized to both genes on our Southern blot. The same probe was used previously for Northern analysis. It hybridized to a relatively abundant ∼2.6 kb transcript as well as at least three less abundant longer transcripts at ∼3.5, ∼4.0 and ∼4.5 kb present only in intestinal tissue-derived mRNA (Lee et al., 1999). Based solely on size comparison, the 2.6 kb transcript may correspond to our GST4α cDNA (2.2 kb), while the larger transcripts may correspond to the GST4β cDNA (3.8 kb). Splicing variants of both transcripts may occur. Thus, both GST4α and GST4β appear to be expressed predominantly in the intestine. Analysis of differential expression of the two genes requires probes specific for one or the other gene. As discussed above, comparison of the GST4α and GST4β cDNAs across their coding sequences reveals high similarity at the nucleotide and protein levels. However, the two cDNAs diverge rapidly about 270 bp downstream of their respective ORF stop codons as well as 70 bp upstream of their ORF start codons. Sequences derived from these regions are expected to be useful in the design of probes specific for GST4α or GST4β gene expression.
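For readers unfamiliar with how an identity figure such as the 85.6% quoted above is obtained, the following sketch computes percent identity over a pairwise alignment. It is an illustration only: the text does not state which denominator convention was used, so counting only gap-free columns is an assumption, and the two short sequences are made-up toy strings, not GST-4α or GST-4β sequence.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity between two rows of a pairwise alignment.

    Assumption: columns containing a gap in either row are excluded
    from both numerator and denominator. Other conventions exist.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip gapped columns
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * identical / compared


if __name__ == "__main__":
    # Hypothetical toy alignment: 7 gap-free columns, 6 of them identical.
    toy_a = "MKVLQ-TR"
    toy_b = "MKILQATR"
    print(f"{percent_identity(toy_a, toy_b):.1f}% identity")
```

The same function applied to an alignment restricted to the divergent UTR regions would be expected to return a much lower value, which is the property that makes those regions candidates for gene-specific probes.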
Fig. 5. Nucleotide and deduced amino acid sequence of human GST4β cDNA (GenBank accession no. AF280086). The open reading frame is denoted by capital letters. The predicted amino acid sequence is indicated below the nucleotide sequence and compared to the protein sequence of human GST-4α (italics). The putative transmembrane domains are underlined and potential N-linked glycosylation sites are boxed. A 5′ in-frame stop codon (tga) and two putative 3′ polyadenylation signals (aataaa) are indicated.
The close proximity of both GST4 genes ( in Figure 1) and the specific locus of the pertinent GST gene (— in Figure 1).
Southern analysis of human GST4 genomic BAC DNA
DNA generated from the BAC containing the human GST4α gene was digested with restriction enzymes EcoRI and ClaI, EcoRI and XbaI, or HindIII and BglII. These enzyme combinations were chosen because of the absence of the respective restriction sites within the GST4α ORF. Digests were fractionated by horizontal agarose gel electrophoresis followed by transfer of the DNA fragments to a nitrocellulose membrane. The membrane was then probed with a 32P-labeled DNA probe from the center of the GST4α ORF (AF176838, nt 620–772; Lee et al., 1999).
Cloning of a human GST4β cDNA
EST clones AI824100, 5968031, and 6869651 were retrieved from Research Genetics or Incyte Genomics and expanded in E. coli, and plasmid DNA was isolated. All three plasmids were sequenced over their entire inserts on both strands. AI824100 was found to contain a 5′ portion of the novel GST4β ORF as well as 5′UTR, capped by a 5′ EcoRI cloning site and ending in a 3′ NotI cloning site that represents the internal NotI site at nt 796 in the GST4β cDNA depicted in Figure 5. EST 6869651 was found to contain the residual 3′ portion of the GST4β ORF, starting at the same NotI site as the 5′ cloning site, followed by 300 bp of 3′UTR capped by a 3′ EcoRI cloning site. These inserts were excised and ligated into the EcoRI site of a pCDNA3.1 expression vector (Invitrogen). Recombinant plasmids (pCDNA3.1-GST4β) were screened for correct orientation by PCR across the 5′ EcoRI junction and the internal NotI junction. The sequence of this partial cDNA (AF280086 nt 1–1694), which contained the entire GST4β ORF, was confirmed by sequencing of both strands. Incyte EST 5968031 was found to contain 2106 bp of GST4β 3′UTR (AF280086 nt 1681–end). The complete GST4β cDNA sequence depicted in Figure 5 was compiled from pCDNA3.1-GST4β and EST 5968031.
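The coordinates given for the two pieces used to compile the GST4β cDNA allow a simple consistency check. The sketch below assumes 1-based inclusive nucleotide numbering and assumes that the 2106 bp of EST 5968031 span AF280086 from nt 1681 to the end; the class and function names are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """A stretch of the compiled cDNA in 1-based, inclusive coordinates."""
    name: str
    start: int
    end: int

    @property
    def length(self) -> int:
        return self.end - self.start + 1


def segment_from_length(name: str, start: int, length: int) -> Segment:
    """Build a segment from its first nucleotide and its length in bp."""
    return Segment(name, start, start + length - 1)


def overlap(a: Segment, b: Segment) -> int:
    """Number of nucleotides shared by two segments (0 if disjoint)."""
    return max(0, min(a.end, b.end) - max(a.start, b.start) + 1)


def merged_length(a: Segment, b: Segment) -> int:
    """Length of the sequence compiled from two overlapping segments."""
    if overlap(a, b) == 0:
        raise ValueError("segments do not overlap")
    return max(a.end, b.end) - min(a.start, b.start) + 1


# Values taken from the text; the inclusive-coordinate reading is an assumption.
orf_clone = Segment("pCDNA3.1-GST4beta insert", 1, 1694)
utr_est = segment_from_length("EST 5968031", 1681, 2106)

if __name__ == "__main__":
    print("overlap:", overlap(orf_clone, utr_est), "nt")
    print("compiled cDNA:", merged_length(orf_clone, utr_est), "nt")
```

Under these assumptions the two pieces share 14 nt and the compiled sequence is 3786 nt long, in line with the 3.8 kb stated for the GST4β cDNA.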
Acknowledgments
The work described in this study has been supported by grants from the NIH (RO1GM5741) and Roche Bioscience to SDR. JKL was supported by a postdoctoral fellowship from the Arthritis Foundation. The LifeSeq database was accessed through a licensing agreement between Incyte Genomics and Roche. We thank Sophia Lai, Steve Stoufer, and Chinh Bach for sequencing plasmids and BACs, and Annemieke van Zante and Dr. Frank Ruddle for helpful comments on the manuscript.
Abbreviations
aa, amino acid residues; BAC, bacterial artificial chromosome; EST, expressed sequence tag; Gal, galactose; GalNAc, N-acetylgalactosamine; GlcNAc, N-acetylglucosamine; GST, Gal/GalNAc/GlcNAc 6-O-sulfotransferase; HEC, high endothelial cell; nt, nucleotide; ORF, open reading frame; 5′UTR, 5′ untranslated region; 3′UTR, 3′ untranslated region.
Sequence data from this article have been deposited with the GenBank data library under accession nos. AF280086 (human GST4β cDNA), AF280087 (mouse GST1 cDNA), AF280088
(human GST3 cDNA from tonsillar high endothelial cells), and AF280089 (mouse GST5 intronless genomic coding sequence).
References
Amado, M., Almeida, R., Carneiro, F., Levery, S.B., Holmes, E.H., Nomoto, M., Hollingsworth, M.A., Hassan, H., Schwientek, T., Nielsen, P.A., Bennett, E.P., and Clausen, H. (1998) A family of human β3-galactosyltransferases. J. Biol. Chem., 273, 12770–12778.
Bhakta, S., Bartes, A., Bowman, K.G., Polsky, I., Lee, J.K., Kao, W.-M., Cook, B.N., Bruehl, R.E., Bertozzi, C.R., Rosen, S.D., and Hemmerich, S. (2000) Sulfation of N-Acetylglucosamine by chondroitin 6-O-sulfotransferase-2. J. Biol. Chem., forthcoming.
Bistrup, A., Bhakta, S., Lee, J.K., Belov, Y.Y., Gunn, M.D., Zuo, F.-R., Huang, C.-C., Kannagi, R., Rosen, S.D., and Hemmerich, S. (1999) Sulfotransferases of two specificities function in the reconstitution of high endothelial cell ligands for L-selectin. J. Cell Biol., 145, 899–910.
Bowman, K.G. and Bertozzi, C.R. (1999) Carbohydrate sulfotransferases: mediators of extracellular communication. Chem. Biol., 6, R9–R22.
Blake, J.A., Eppig, J.T., Richardson, J.E., and Davisson, M.T. (2000) The Mouse Genome Database (MGD): expanding genetic and genomic resources for the laboratory mouse. The Mouse Genome Database Group. Nucleic Acids Res., 28, 108–111.
Callen, D.F., Eyre, H., Lane, S., Shen, Y., Hansmann, I., Spinner, N., Zackai, E., McDonald-McGinn, D., Schuffenhauer, S., Wauters, J., Van Thienen, M.-N., Van Roy, B., Sutherland, G.R., and Haan, E.A. (1993) High resolution mapping of interstitial long arm deletions of chromosome 16: relationship to phenotype. J. Med. Genet., 30, 828–832.
Eckhardt, M. and Gerardy-Schahn, R. (1998) Genomic organization of the murine polysialyltransferase gene ST8SiaIV (PST-1). Glycobiology, 8, 1165–1172.
Fukuta, M., Inazawa, J., Torii, T., Tsuzuki, K., Shimada, E., and Habuchi, O. (1997) Molecular cloning and characterization of human keratan sulfate Gal 6-O-sulfotransferase. J. Biol. Chem., 272, 32321–32328.
Fukuta, M., Kobayashi, Y., Uchimura, K., Kimata, K., and Habuchi, O. (1998) Molecular cloning and expression of human chondroitin 6-sulfotransferase. Biochim. Biophys. Acta, 1399, 57–61.
Fukuta, M., Uchimura, K., Nakashima, K., Kato, M., Kimata, K., Shinomura, T., and Habuchi, O. (1995) Molecular cloning and expression of chick chondrocyte chondroitin 6-sulfotransferase. J. Biol. Chem., 270, 18575–18580.
Gary, N.K. and Wickens, M. (1998) Control of translation initiation in animals. Annu. Rev. Cell. Dev. Biol., 14, 399–459.
Gersten, K.M., Natsuka, S., Trinchera, M., Petryniak, B., Kelly, R.J., Hiraiwa, N., Jenkins, N.A., Gilbert, D.J., Copeland, N.G., and Lowe, J.B. (1995) Molecular cloning, expression, chromosomal assignment and tissue-specific expression of a murine α(1,3)-fucosyltransferase locus corresponding to the human ELAM-1 ligand fucosyl transferase. J. Biol. Chem., 270, 25047–25056.
Gladwin, A.J., Dixon, J., Loftus, S.K., Wasmuth, J.J., and Dixon, M.J. (1996) Genomic organization of the human heparan sulfate-N-deacetylase/N-sulfotransferase gene: exclusion from a causative role in the pathogenesis of Treacher Collins syndrome. Genomics, 32, 471–473.
Habuchi, O. (2000) Diversity and functions of glycosaminoglycan sulfotransferases. Biochim. Biophys. Acta, 1474, 115–127.
Hemmerich, S. and Rosen, S.D. (2000) Carbohydrate sulfotransferases in lymphocyte homing. Glycobiology, 10, 844–856.
Hiraoka, N., Petryniak, B., Nakayama, J., Tsuboi, S., Suzuki, M., Yeh, J.-C., Izawa, D., Tanaka, T., Miyasaka, M., Lowe, J.B., and Fukuda, M. (1999) A novel, high endothelial venule-specific sulfotransferase expresses 6-sulfo sialyl Lewis x, an L-selectin ligand displayed by CD34. Immunity, 11, 79–89.
Hooper, L.V., Manzella, S.M., and Baenziger, J.U. (1996) From legumes to leukocytes: biological roles for sulfated carbohydrates. FASEB J., 10, 1137–1146.
Humphries, D.E., Lanciotti, J., and Karlinsky, J.B. (1998) cDNA cloning, genomic organization and chromosomal localization of human heparan glucosaminyl N-deacetylase/N-sulphotransferase-2. Biochem. J., 332, 303–307.
Kadonaga, J.T., Carner, K.R., Masiarz, F.R., and Tjian, R. (1987) Isolation of cDNA encoding transcription factor Sp1 and functional analysis of the DNA binding domain. Cell, 51, 1079–90.
Kitagawa, H., Fujita, M., Ito, N., and Sugahara, K. (2000) Molecular cloning and expression of a novel chondroitin 6-O-sulfotransferase. J. Biol. Chem., 275, 21705–21080.
Lee, J.K., Bhakta, S., Rosen, S.D., and Hemmerich, S. (1999) Cloning and characterization of a mammalian N-acetylglucosamine-6-sulfotransferase that is highly restricted to intestinal tissue. Biochem. Biophys. Res. Commun., 263, 543–549.
Li, X. and Tedder, T.F. (1999) CHST1 and CHST2 sulfotransferases expressed by human vascular endothelial cells: cDNA cloning, expression and chromosomal localization. Genomics, 55, 345–347.
Mazany, K.D., Peng, T., Watson, C.E., Tabas, I., and Williams, K.J. (1998) Human chondroitin 6-sulfotransferase: cloning, gene structure and chromosomal localization. Biochim. Biophys. Acta, 1407, 92–97.
Nelson, R.M., Venot, A., Bevilacqua, M.P., Linhardt, R.J., and Stamenkovic, I. (1995) Carbohydrate–protein interactions in vascular biology. Annu. Rev. Cell. Dev. Biol., 11, 601–631.
Rosen, S.D., Bistrup, A., and Hemmerich, S. (1999) Carbohydrate sulfotransferases. In Ernst, B., Sinaÿ, P. and Hart, G. (eds.), Oligosaccharides in Chemistry and Biology. Wiley-VCH, Weinheim, vol. 2, pp. 245–260.
Shworak, N.W., Liu, J., Petros, L.M., Zhang, L., Kobayashi, M., Copeland, N.G., Jenkins, N.A., and Rosenberg, R.D. (1999) Multiple isoforms of heparan sulfate D-glucosaminyl 3-O-sulfotransferase. Isolation, characterization and expression of human cDNAs and identification of distinct genomic loci. J. Biol. Chem., 274, 5170–5184.
Smith, P.L., Gersten, K.M., Petryniak, B., Kelly, R.J., Rogers, C., Natsuka, Y., Alford, J.A., Scheidegger, E.P., Natsuka, S., and Lowe, J.B. (1996) Expression of the α(1,3)fucosyltransferase Fuc-TVII in lymphoid aggregate high endothelial venules correlates with expression of L-selectin ligands. J. Biol. Chem., 271, 8250–8259.
Takashima, S., Kurosawa, N., Tachida, Y., Inoue, M., and Tsuji, S. (2000) Comparative analysis of the genomic structures and promoter activities of mouse Siaα(2,3)Galβ(1,3)GalNAc GalNAcα(2,6)-sialyltransferase genes (ST6GalNAc III and IV): characterization of their Sp1 binding sites. J. Biochem. (Tokyo), 127, 399–409.
Thompson, J.D., Higgins, D.G., and Gibson, T.J. (1994) CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res., 22, 4673–80.
Tsutsumi, K., Shimakawa, H., Kitagawa, H., and Sugahara, K. (1998) Functional expression and genomic structure of human chondroitin 6-O-sulfotransferase. FEBS Lett., 441, 235–241.
Uchimura, K., Kadomatsu, K., Fan, Q.W., Muramatsu, H., Kurosawa, N., Kaname, T., Yamamura, K., Fukuta, M., Habuchi, O., and Muramatsu, T. (1998a) Mouse chondroitin 6-sulfotransferase: molecular cloning, characterization and chromosomal mapping. Glycobiology, 8, 489–96.
Uchimura, K., Muramatsu, H., Kadomatsu, K., Fan, Q.W., Kurosawa, N., Mitsuoka, C., Kannagi, R., Habuchi, O., and Muramatsu, T. (1998b) Molecular cloning and characterization of an N-acetylglucosamine-6-O-sulfotransferase. J. Biol. Chem., 273, 22577–83.
Uchimura, K., Muramatsu, H., Kaname, T., Ogawa, H., Yamakawa, T., Fan, Q.W., Mitsuoka, C., Kannagi, R., Habuchi, O., Yokoyama, I., Yamamura, K., Ozaki, T., Nakagawara, A., Kadomatsu, K., and Muramatsu, T. (1998c) Human N-acetylglucosamine-6-O-sulfotransferase involved in the biosynthesis of 6-sulfo sialyl Lewis x: molecular cloning, chromosomal mapping and expression in various organs and tumor cells. J. Biochem. (Tokyo), 124, 670–678.
Werner, W., Kraft, S., Callen, D.F., Bartsch, O., and Hinkel, G.K. (1997) A small deletion of 16q23.1→16q24.2 [del (16) (q23.1q24.2).ish del (16) (q23.1q24.2) (D16S395+, D16S348-, P5432+)] in a boy with iris coloboma and minor anomalies. Am. J. Med. Genet., 70, 371–376.