Nutrient-Gene Expression
Characterization of the Genomic Structure of the Human Vitamin C Transporter SVCT1 (SLC23A2)1 Hans Christian Erichsen,2 Peter Eck,*2 Mark Levine*3 and Stephen Chanock Immunocompromised Host Section, Pediatric Oncology Branch, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892 and *Molecular and Clinical Nutrition Section, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, MD 20892 ABSTRACT Vitamin C (L-ascorbic acid), a critical cofactor for intracellular enzymatic reactions, functions as a scavenger of free oxygen radicals and is an essential micronutrient. Vitamin C is actively transported into cells by one of two closely related sodium-dependent transporters, SVCT1 or SVCT2. In this paper, we report the complete sequencing and gene structure of SLC23A2, the gene encoding SVCT1. The1797-bp cDNA sequence (open reading frame) of the SLC23A2 gene was derived from a compact genomic sequence of 7966 bp [translation initiation codon (ATG) to poly A tail], which is divided into 14 exons. Furthermore, repetitive or masked elements constituted 17.98% of the gene; there were 4 Alu sequences and 5 MIR (Mammalian Interspersed Repetitive element) sequences. A search for common variants in SLC23A2, using current bioinformatic tools and direct resequencing of control populations, failed to identify common single nucleotide polymorphisms. The start of transcription was mapped to a position ⫺47 relative to the ATG; the immediate 5⬘ sequence was determined and analyzed for possible consensus binding sites for known transcription factors. Our findings will serve as the foundation for investigation of the regulation and expression of the tissue-specific sodium-dependent vitamin C transporter, SLC23A2. J. Nutr. 131: 2623–2627, 2001. KEY WORDS:
●
ascorbic acid transporter
●
gene
●
exon
Vitamin C (L-ascorbic acid, ascorbate) is an essential cofactor for eight mammalian enzymatic reactions and functions as an intra- and extracellular scavenger of free oxygen radicals (1). Humans rely upon dietary intake of vitamin C because, unlike most mammals, they do not synthesize vitamin C de novo (2). Vitamin C is accumulated within cells by two distinct mechanisms involving transport of two different substrates (3). In one mechanism, vitamin C is the transported substrate and the process is sodium dependent. In the second mechanism, oxidized vitamin C (dehydroascorbic acid) is transported independently of sodium by one of three different glucose transporters and immediately reduced intracellularly to vitamin C (4 – 6). Recently, the cDNA sequences were described for two distinct human sodium (Na⫹)-dependent vitamin C transporters, SVCT14 and SVCT2 (encoded by SLC23A2 and SLC23A1, respectively) (7–9). These human cDNA sequences were determined using an amplification strategy based upon the previously reported orthologs in rats, svct1 and svct2
●
intron
●
polymorphism
(10). The cDNA sequences were first identified as nucleobase transporters YSPL3 and YSPL2 (SVCT1 and SVCT2, respectively) (11). Sequence homology between the deduced human cDNAs is 65% for the amino acid sequence and 58% for the nucleotide sequence, scattered throughout the predicted coding region. Tissue expression patterns differ between the genes. SLC23A2 is restricted to absorptive intestinal and renal tissues and the liver, whereas SLC23A1 is ubiquitously expressed in most cell types (7,9). Lastly, fluorescent in situ hybridization analysis was used to determine the chromosomal location of the two human transporters. SLC23A2 mapped to the long arm of chromosome 5 (5q31.2–31.3) (7) and SLC23A1 mapped to the short arm of chromosome 20 (20p12.2–12.3) (12). The cDNA sequence of SLC23A2 had been described previously with an open reading frame (ORF) of 1797 bases encoding a 598 amino acid residue protein (7–9). In this paper, we report the gene structure of SLC23A2 and its transcription start point (TSP). We searched for common polymorphisms, known as single nucleotide polymorphisms (SNP) by resequencing the ORF and 5⬘/3⬘ flanking regions of the gene in two control populations, but did not find any.
1
Genbank accession number AF 375875. Indicates that the two first authors contributed equally to this manuscript. 3 To whom correspondence should be addressed. E-mail:
[email protected]. 4 AA, African American; ATG, translation initiation codon; BAC, bacterial artificial clones; CA, Caucasian; EST, expressed sequence tags; MIR, mammalian interspersed repeat; ORF, open reading frame; PCR, polymerase chain reaction; SINE, short interspersed repeat; SNP, single nucleotide polymorphism; SVCT, sodium-dependent vitamin C transporter; TSP, transcription start point; UTR, untranslated region. 2
MATERIALS AND METHODS Screening of genomic BAC library. An amplification-based screening technique was used to identify bacterial artificial clones (BAC) containing the SLC23A2 gene from a commercial human genomic library (GenomeSystems, St. Louis, MO). Two sets of poly-
0022-3166/01 $3.00 © 2001 American Society for Nutritional Sciences. Manuscript received 26 March 2001. Initial review completed 22 May 2001. Revision accepted 20 June 2001. 2623
2624
ERICHSEN ET AL.
merase chain reaction (PCR) primers were used for the initial screening and isolation of a BAC clone. The first set corresponded to a region in the mid-portion of the cDNA, TGTCTACCGCTGGGGCAA (forward) and TGCAATAGCCATGATGTC (reverse), generating a 314-bp fragment. The second set of screening primers corresponded to a region in the carboxyl terminus of the cDNA, GTCTGATACAGTGGAAAG (forward) and TCAGACCTTGGTGCACAC (reverse), generating a 230-bp fragment. Sequence analysis of BAC clones. On the basis of the published cDNA sequence (8), short oligonucleotide primers were generated to amplify regions of the gene to be sequenced directly. Amplification reactions were performed with the following: 0.5 mol/L of each primer, 50 mmol/L KCl, 10 mmol/L Tris-HCl, 2.5 mmol/L MgCl, and 200 mol/L of dNTPs. PCR conditions included a 10-min denaturing step at 95°C followed by 35 cycles of 94°C for 30 s, 55°C for 30 s and 72°C for 30 s. The predicted amplicon sizes were confirmed by gel electrophoresis before direct sequence analysis. Purified PCR products were sequenced bidirectionally on an ABI377XL with ABI Prism dRhodamine terminator cycle sequencing ready reaction kit with AmpliTaq DNA Polymerase, FS (Applied Biosystems, Foster City, CA). Sequence data were analyzed using AssemblyLign, and MacVector version 6.5 (both Oxford Molecular, Madison, WI) and Sequencher 3.1.1 (Applied Biosystems). Repeat Masker2 (version 7/16/00; http://ftp.genome.washington.edu/cgi-bin/RepeatMasker) was used to perform a masked query of the entire gene sequence to determine repetitive sequence elements. Similarly, the program Mat Inspector v2.2 (http://www.gsf.de/cgi-bin/matsearch2.pl) was utilized to identify consensus sequence binding sites in the 320-bp upstream region (13). Southern blot hybridization. The complete gene size and intron/ exon sizes were confirmed by sequential Southern blot analysis of the BAC clone digested with three separate, rare, six-base cutting restriction endonucleases, BamHI, Bgl II and Hind III (New England Biolabs, Beverly, MA). Individual oligonucleotides, end-labeled with P-32 ␥-ATP (Amersham, Arlington Heights, IL), were used to probe the blot under standard conditions. Amplicons were sequenced to confirm all intron/exon borders, all of which conformed to the AG/GT splice junction rule (14). Search for novel single nucleotide polymorphisms. Possible SNP were identified by currently available bioinformatics. Initially, the four separate, published human cDNA sequences were aligned and compared. A publicly available program (SNPpipeline) (http:// lpgws.nci.nih.gov:82/perl/snp/snp_cgi.pl) (15) was used to align and compare all expressed sequence tag (EST) sequences containing portions of the SLC23A2 cDNA sequence. All fragments were aligned and possible SNP identified for validation in genomic DNA assays. As part of a large-scale National Cancer Institute–sponsored effort to identify common SNPs, the Cancer Genome Anatomy Project-Genetic Annotation Initiative (CGAP-GAI), SNPs were identified by direct sequencing of 8 CEPH (Center d’Etude Polymorphisme Humaine) Caucasian (CA) individuals. Separately, cDNA derived from 16 anonymous African American (AA) blood donors were analyzed. These were collected under an IRB approved protocol in the Department of Transfusion Medicine, Clinical Center, NIH (courtesy of Dr. Susan F. Leitman). Assays for validating possible
SNPs were designed for genomic DNA using flanking, unique oligonucleotides. Start of transcription and analysis of 5ⴕ upstream region of SLC23A2 gene. The initiation of the SLC23A2 transcript was estimated by alignment of existing ESTs and comparison with the published sequences (7–9). The start of transcription was confirmed by ribonuclease protection assay. The oligonucleotide probe for the assay was made by PCR using two oligonucleotide primers, TCCACTCCCTCTTTCTCCGC (forward) and GGCACAGGTTTGAGCAGTTCC (reverse), which gave a 334-bp product. This product was cloned (Topo TA Cloning Kit for Sequencing, Invitrogen, Carlsbad, CA), and a new 410-bp oligonucleotide probe, which contained a T7 primer site, was made using a vector specific primer, M13 forward, and the gene-specific reverse primer. The T7 polymerase reaction and labeling of the primer with radioactive nucleotides (P-32 ␣-CTP; Amersham) were performed according to the directions of a commercially available kit (Maxiscript, In vitro Transcription Kit, Ambion, Austin, TX). The ribonuclease protection reaction was done with polyA RNA from human intestine (Clontech, Palo Alto, CA) according to the directions from another kit (RPA III, Ribonuclease Protection Assay, Ambion). Size fractionation was determined by comparing the mobility pattern to radiolabeled DNA standards (Promega, Madison, WI) on a denaturing 6% polyacrylamide gel. Novel sequence corresponding to the 5⬘ upstream was determined by direct sequence analysis of the BAC clone using a modified version of standard cycle sequence technique (provided by GenomeSystems) using a short oligonucleotide primer GTCCTCCTGGGCCCTCAT.
RESULTS The complete SLC23A2 gene was sequenced and is 7966 bp long [from translation initiation codon (ATG) to polyA tail]. The GC content of the gene is 52.64%. and only 17.98% of the gene is comprised of repetitive or masked elements. Of the complete gene, 17.41% consists of SINEs (Short Interspersed Elements), including four Alu elements and five MIR elements (Mammalian Interspersed Repetitive) (16,17). In the reference clone, there are eight repeats of a hexanucleotide repeat (CTGGGG) in intron 8. The gene is organized into 14 separate exons, 13 of which include coding sequence; there is an intron in the 3⬘ untranslated region (UTR) of 1568 bp (Fig. 1 and Table 1). A combination of Southern blot analysis and sequence analysis of the PCR-amplified fragments was used to define the exon/ intron boundaries, which conform to the GT/AG splice junction conventions (14) (Table 1). The sequence of each of the 14 exons was confirmed by direct sequence analysis of fragments amplified from control genomic DNA (Fig. 1). The composite exon sequence data confirm the predicted product of translation, namely, 598 amino acids. The sizes of individual coding exons varied between 68 bp (e.g., exon 4) and 265 bp
FIGURE 1 The genomic structure of SLC23A2. Boxes represent exons with intervening introns represented by connecting lines. Roman numerals above boxes are exon numbers. Numerals above line refer to exon length, numerals below line refer to intron length. The transcription start point (TSP) is at position ⫺47 relative to the first nucleotide position (A) of the initiation codon, ATG (18). Positions of known repeat units, include four Alu elements and the five MIR elements (Mammalian Interspersed Repetitive) are shown below the gene (Repeat Masker; http://ftp.genome. washington.edu/cgi-bin/RepeatMasker). These consensus sequences occur throughout the genome and often cluster around or within known genes. The low percentage of repetitive sequence in SLC23A2 indicates a dearth of redundant sequences, and reflects the possible evolutionary effects on the gene structure.
HUMAN SLC23A2 GENOMIC STRUCTURE
2625
TABLE 1 SLC23A2 intron/exon borders1 Exon no.
Acceptor
I II III IV V VI VII VIII IX X XI XII XIII XIV
ctctgctgcctccag tctccctctgcccag gactcttacccccag tgtttgcccttccag acccccctccttcag acccccctccttcag tccctccgagcacag tccctttccttacag ctggctccccggcag ctcttgcgcccttag tggcctattccacag actatcctcttccag ttctaattcccacag
Exon sequence GTCATCCCC. CATGAAACC. CACTACCTG. AGGAGATCT. GTCCAGGGT. CTCCATTCT. ATCATGCTG. GTCAGTGGG. GGGCATCTT. GTGGGCAGC. GCATGATTA. GCATTCTTG. GGAGCCCAG. GAAGCATGG.
. . . . . . . . . . . . . .
. . . . . . . . . . . . . .
. . . . . . . . . . . . . .
. . . . . . . . . . . . . .
. . . . . . . . . . . . . .
CGGACACAG GGCTTCCAG CCCCGGAAG ATACGGGAG CTCAGCTTG ATGTTTCCT CCTACCCCT TATCAACAG ATTACCAAG CTCTCTTTG TCAATACAG CAGTGCCAG TCCAGGAAG TGAAGAAAT
Donor
Intron length
gtgctgggggctgga gtgggtcccccaggg gtgggtacacttctg gtgggtttgcatgta gtgagcaggcaccag gtgaggacctggcag gtgagcaacacccct gtagcctacctgccc gtgtcgcgacgagcc gtgagtgtctggcgc gtgcctccagctgtt gtatggttcggtcct gtaggatgttgctat
573 442 167 175 131 121 323 522 227 103 477 887 1568
Exon length 83 (⫺47–36) 114 (37–150) 247 (151–397) 68 (398–465) 182 (466–647) 121 (648–768) 157 (769–925) 148 (926–1073) 106 (1074–1179) 130 (1180–1309) 144 (1310–1453) 96 (1454–1549) 267 (1550–1816) 434 (1817–2250)
1 This table presents the sequence boundaries for each exon and intron of SLC23A2, and indicates the exact number of bp in each exon and intron. Splice site consensus sequences of the acceptor (ag on the “right side” of the splice site) and the donor (gt on the “left side” of the splice site) are shown after each boundary of exon/intron border. Splice sites conform to a consensus sequence pattern (14) and lead directly to the removal of intronic sequence at the same time that exonic sequences are assembled into the messenger RNA species. Numbers in parentheses refer to positions in cDNA relative to the first nucleotide (A) of the initiation codon, ATG (18).
(e.g., exon 12). Interestingly, the largest exon of 434 bp was the last one, which contains 3⬘ UTR sequence only. Notably, all introns were relatively small; the largest was 1568 bp and the smallest was 103 bp. The TSP was mapped by ribonuclease protection assay to ⴚ47 bp relative to the number 1 position nucleotide in the initiation codon (ATG) in exon 1 (18) (Fig. 2, Panel A). This site mapped directly to the 5⬘ end of the cDNA sequence reported by Wang et al. (9), whereas the other sequences reported were incomplete, terminating in the region between the start of translation and the newly defined start of transcription. In the region immediately upstream of the start of translation of the SLC23A2 gene, 320 bp were sequenced and analyzed for possible consensus binding sites for transcription binding factors (13) (Fig. 2, Panel B). Relative to the A in the initiation codon, there are several notable features, i.e., a consensus site for a CAAT box between ⫺64 bp and ⫺53 bp; a TATA 1 box between ⫺189 bp and ⫺180 bp; two overlapping AP-1 binding sites between ⫺57 bp and ⫺47 bp and also ⫺52 bp and ⫺42 bp; and a site for GATA1 between ⫺92 bp and ⫺78 bp. Utilizing current bioinformatics tools, five possible candidate SNPs were identified at the following positions in the ORF (relative to the initiation ATG): G/A at bp 12, G/A at bp 31, G/T at bp 455, C/A at bp 825 and C/T at bp 1377. However, we did not validate the presence of common SNPs at each site after direct sequence analysis of cDNA in 24 control individuals and, in addition, in genomic DNA from 90 controls (i.e., 45 AA and 45 CA anonymous blood donors). DISCUSSION In this paper, we report the complete sequencing and gene structure of SLC23A2, the gene encoding SVCT1, one of two sodium-dependent vitamin C transporters. Previously, the chromosomal location of SVCT1 was mapped to 5q31.2–31.3, which we confirmed (data not shown). The SLC23A2 gene is compact, 7966 bp long (from ATG to polyA tail) and contains 14 exons, 13 of which contain coding sequence. The sizes of the introns were exceptionally small, between 103 and 1568
bp in length. Analysis of the complete intronic sequence is remarkable in that only 17.98% of the gene contains repetitive or masked elements, primarily four Alu sequences and five MIR elements (16,17). The start of transcription was mapped to ⴚ47 bp by ribonuclease protection assay. This site correlated exactly with the 5⬘ end of the cDNA clone reported by Wang et al. (9). The 5⬘ sequence immediately upstream to the start of transcription is notable for several features. First, there are several motifs frequently associated with the start site, namely, a TATA box and a CAAT box. There are no apparent binding sites for transcription factors specific to either renal or intestinal expression. By mapping the primary sequence, it is now possible to initiate functional studies targeted at understanding the cis-acting elements that control expression. This is especially important in a gene whose expression profile is restricted to a small number of tissues. With the completion of the first map of the human genome, the opportunity to study the biological and clinical implications of variations in known genes has emerged as a major initiative. In the coming years, there will be intense effort to dissect the contribution of variations in genes to disease susceptibility, outcome and pharmacologic responses (19). To accomplish these goals, characterization of common SNPs is a high priority because the SNP variant is the most common form in the human genome. Currently, the density of SNPs in the vicinity of genes has been estimated to be one in every 1300 base pairs (20 –23). We utilized two different approaches to find potential SNPs in the SLC23A2 gene, i.e., direct sequence analysis of anonymous reference subjects and mining existing genetic databases (e.g., employing the SNPipeline) (15). These approaches did not identify a single common SNP in the coding region in two different sample populations. Possible sites identified by the latter approach were not confirmed in genetic analysis of sample control subjects. It is possible that the discrepancies identified by the alignment of EST sequences identified rarer variants (i.e., those with a frequency of ⬍1% or those that are specific to a small group or population). We did align the
2626
ERICHSEN ET AL. FIGURE 2 Transcription start point (TSP) and 5⬘ sequence analysis for SLC23A2. Panel A: mapping TSP for SLC23A2 by ribonuclease protection assay. Lane M is X174 Hinf I DNA markers. Lane 1 is RNA probe hybridized with human intestinal polyA-RNA and digested with RNase. Lane 2 is RNA probe digested by RNase. Lane 3 is RNA probe alone. The RNA probe is 410 bp long. The protected fragment is 41 bp long when adjusted for running property differences between the DNA marker and RNA. Panel B: sequence analysis of 320 bp immediately upstream of the initiation start codon (ATG); 320 bp are shown in the figure. Possible consensus binding sites for transcription binding factors (AP-1 and GATA1) are indicated by boxes in the 320-bp upstream region of SLC23A2. A CAAT and TATA box are each indicated by boxes (13). An arrow indicates the TSP. Base pair numbers are relative to the first position (A) in the initiation codon, ATG (18).
implications. The combination of a compact genomic organization and absence of common coding SNP in SLC23A2 suggests that the gene provides an essential function and has not undergone significant variation. It is possible that this reflects a conserved function that might be common to similar orthologs. In this regard, comparison to other mammalian orthologs will be informative and perhaps provide insights into the gene, its organization and function. It is surprising that its size is very small (⬍8 kb in total) in relation to the number of exons (14 total) and the observed paucity of repetitive sequence elements (⬃18% overall). Still, SLC23A2 has a highly homologous sodium-dependent transporter, SLC23A1, which has been cloned from human and rat cDNA libraries. Preliminary data indicate that the size and complexity of the related gene for SVCT2, SLC23A1, is significantly larger than SLC23A2. ACKNOWLEDGMENTS We thank the following for providing support and advice during this project: Edward Miller, Charles Foster, James Taylor and Eunhwa Choi.
LITERATURE CITED
published cDNA sequences and found a series of discrepancies between our data and that of Faaland et al. (11), suggesting sequence errors in the latter cDNA clones. Thus, there was no evidence for the presence of a common SNP in the ORF nor the 5⬘ or 3⬘ UTRs of the gene in two reference populations. Our findings have interesting evolutionary and biological
1. Levine, M., Rumsey, S. C., Daruwala, R., Park, J. B. & Wang, Y. (1999) Criteria and recommendations for vitamin C intake. J. Am. Med. Assoc. 281: 1415–1423. 2. Nishikimi, M., Fukuyama, R., Minoshima, S., Shimizu, N. & Yagi, K. (1994) Cloning and chromosomal mapping of the human nonfunctional gene for L-gulono-gamma-lactone oxidase, the enzyme for L-ascorbic acid biosynthesis missing in man. J. Biol. Chem. 269: 13685–13688. 3. Welch, R. W., Wang, Y., Crossman, A., Jr., Park, J. B., Kirk, K. L. & Levine, M. (1995) Accumulation of vitamin C (ascorbate) and its oxidized metabolite dehydroascorbic acid occurs by separate mechanisms. J. Biol. Chem. 270: 12584 –12592. 4. Rumsey, S. C., Kwon, O., Xu, G. W., Burant, C. F., Simpson, I. & Levine, M. (1997) Glucose transporter isoforms GLUT1 and GLUT3 transport dehydroascorbic acid. J. Biol. Chem. 272: 18982–18989. 5. Rumsey, S. C., Welch, R. W., Garraffo, H. M., Ge, P., Lu, S. F., Crossman, A. T., Kirk, K. L. & Levine, M. (1999) Specificity of ascorbate analogs for ascorbate transport. Synthesis and detection of [(125)I]6-deoxy-6-iodo-L-ascorbic acid and characterization of its ascorbate-specific transport properties. J. Biol. Chem. 274: 23215–23222. 6. Rumsey, S. C., Daruwala, R., Al-Hasani, H., Zarnowski, M. J., Simpson, I. A. & Levine, M. (2000) Dehydroascorbic acid transport by GLUT4 in Xenopus oocytes and isolated rat adipocytes. J. Biol. Chem. 275: 28246 –28253. 7. Wang, H., Dutta, B., Huang, W., Devoe, L. D., Leibach, F. H., Ganapathy, V. & Prasad, P. D. (1999) Human Na(⫹)-dependent vitamin C transporter 1 (hSVCT1): primary structure, functional characteristics and evidence for a nonfunctional splice variant. Biochim. Biophys. Acta 1461: 1–9. 8. Daruwala, R., Song, J., Koh, W. S., Rumsey, S. C. & Levine, M. (1999) Cloning and functional characterization of the human sodium-dependent vitamin C transporters hSVCT1 and hSVCT2. FEBS Lett. 460: 480 – 484. 9. Wang, Y., Mackenzie, B., Tsukaguchi, H., Weremowicz, S., Morton, C. C. & Hediger, M. A. (2000) Human vitamin C (L-ascorbic acid) transporter SVCT1. Biochem. Biophys. Res. Commun. 267: 488 – 494. 10. Tsukaguchi, H., Tokui, T., Mackenzie, B., Berger, U. V., Chen, X. Z.,
HUMAN SLC23A2 GENOMIC STRUCTURE Wang, Y., Brubaker, R. F. & Hediger, M. A. (1999) A family of mammalian Na⫹-dependent L-ascorbic acid transporters. Nature (Lond.) 399: 70 –75. 11. Faaland, C. A., Race, J. E., Ricken, G., Warner, F. J., Williams, W. J. & Holtzman, E. J. (1998) Molecular characterization of two novel transporters from human and mouse kidney and from LLC-PK1 cells reveals a novel conserved family that is homologous to bacterial and Aspergillus nucleobase transporters. Biochim. Biophys. Acta 1442: 353–360. 12. Stratakis, C. A., Taymans, S. E., Daruwala, R., Song, J. & Levine, M. (2000) Mapping of the human genes (SLC23A2 and SLC23A1) coding for vitamin C transporters 1 and 2 (SVCT1 and SVCT2) to 5q23 and 20p12, respectively. J. Med. Genet. 37: E20. 13. Quandt, K., Frech, K., Karas, H., Wingender, E. & Werner, T. (1995) MatInd and MatInspector: new fast and versatile tools for detection of consensus matches in nucleotide sequence data. Nucleic Acids Res. 23: 4878 – 4884. 14. Burset, M., Seledtsov, I. A. & Solovyev, V. V. (2000) Analysis of canonical and noncanonical splice sites in mammalian genomes. Nucleic Acids Res. 28: 4364 – 4375. 15. Buetow, K. H., Edmonson, M. N. & Cassidy, A. B. (1999) Reliable identification of large numbers of candidate SNPs from public EST data. Nat. Genet. 21: 323–325. 16. Makalowski, W. (2000) Genomic scrap yard: how genomes utilize all that junk. Gene 259: 61– 67 17. Moyzis, R. K., Torney, D. C., Meyne, J., Buckingham, J. M., Wu, J. R., Burks, C., Sirotkin, K. M. & Goad, W. B. (1989) The distribution of interspersed repetitive DNA sequences in the human genome. Genomics 4: 273–289. 18. Antonarakis, S. E. (1998) Recommendations for a nomenclature system for human gene mutations. Nomenclature Working Group. Hum. Mutat. 11: 1–3.
2627
19. Foster, C. B. & Chanock, S. J. (2000) Mining variations in genes of innate and phagocytic immunity: current status and future prospects. Curr. Opin. Hematol. 7: 9 –15. 20. Sachidanandam, R., Weissman, D., Schmidt, S. C., Kakol, J. M., Stein, L. D., Marth, G., Sherry, S., Mullikin, J. C., Mortimore, B. J., Willey, D. L., Hunt, S. E., Cole, C. G., Coggill, P. C., Rice, C. M., Ning, Z., Rogers, J., Bentley, D. R., Kwok, P. Y., Mardis, E. R., Yeh, R. T., Schultz, B., Cook, L., Davenport, R., Dante, M., Fulton, L., Hillier, L., Waterston, R. H., McPherson, J. D., Gilman, B., Schaffner, S., Van Etten, W. J., Reich, D., Higgins, J., Daly, M. J., Blumenstiel, B., Baldwin, J., Stange-Thomann, N., Zody, M. C., Linton, L., Lander, E. S. & Attshuler, D. (2001) A map of human genome sequence variation containing 1.42 million single nucleotide polymorphisms. Nature (Lond.) 409: 928 –933. 21. Wang, D. G., Fan, J. B., Siao, C. J., Berno, A., Young, P., Sapolsky, R., Ghandour, G., Perkins, N., Winchester, E., Spencer, J., Kruglyak, L., Stein, L., Hsie, L., Topaloglou, T., Hubbell, E., Robinson, E., Mittmann, M., Morris, M. S., Shen, N., Kilburn, D., Rioux, J., Nusbaum, C., Rozen, S., Hudson, T. J., & Lander, E. S. (1998) Large-scale identification, mapping, and genotyping of singlenucleotide polymorphisms in the human genome. Science (Washington, DC) 280: 1077–1082. 22. Halushka, M. K., Fan, J. B., Bentley, K., Hsie, L., Shen, N., Weder, A., Cooper, R., Lipshutz, R. & Chakravarti, A. (1999) Patterns of single-nucleotide polymorphisms in candidate genes for blood-pressure homeostasis. Nat. Genet. 22: 239 –247. 23. Cargill, M., Altshuler, D., Ireland, J., Sklar, P., Ardlie, K., Patil, N., Shaw, N., Lane, C. R., Lim, E. P., Kalyanaraman, N., Nemesh, J., Ziaugra, L., Friedland, L., Rolfe, A., Warrington, J., Lipshutz, R., Daley, G. Q. & Lander, E. S. (1999) Characterization of single-nucleotide polymorphisms in coding regions of human genes. Nat. Genet. 22: 231–238.