pseudogenes, pseudo-RF1 and pseudo-SLC25A5, were derived from processed mRNA transcripts as they were found to be uninterrupted, with no genomic ...
Molecular Psychiatry (2002) 7, 867–873 2002 Nature Publishing Group All rights reserved 1359-4184/02 $25.00 www.nature.com/mp
ORIGINAL RESEARCH ARTICLE
A transcript map encompassing a susceptibility locus for bipolar affective disorder on chromosome 4q35 IP Blair1, LJ Adams1, RF Badenhop1,4, MJ Moses1, A Scimone1, JA Morris2, L Ma2, CP Austin2, JA Donald3, PB Mitchell4 and PR Schofield1 Garvan Institute of Medical Research, Sydney, Australia; 2Merck Research Laboratories, West Point, PA, USA; Macquarie University, Department of Biological Sciences, Sydney, Australia; 4University of New South Wales, School of Psychiatry, and Prince of Wales Hospital, Mood Disorders Unit, Sydney, Australia
1 3
Bipolar affective disorder is one of the most common mental illnesses with a population prevalence of approximately 1%. The disorder is genetically complex, with an increasing number of loci being implicated through genetic linkage studies. However, the specific genetic variations and molecules involved in bipolar susceptibility and pathogenesis are yet to be identified. Genetic linkage analysis has identified a bipolar disorder susceptibility locus on chromosome 4q35, and the interval harbouring this susceptibility gene has been narrowed to a size that is amenable to positional cloning. We have used the resources of the Human Genome Project (HGP) and Celera Genomics to identify overlapping sequenced BAC clones and sequence contigs that represent the region implicated by linkage analysis. A combination of bioinformatic tools and laboratory techniques have been applied to annotate this DNA sequence data and establish a comprehensive transcript map that spans approximately 5.5 Mb. This map encompasses the chromosome 4q35 bipolar susceptibility locus, which localises to a ‘most probable’ candidate interval of approximately 2.3 Mb, within a more conservative candidate interval of approximately 5 Mb. Localised within this map are 11 characterised genes and eight novel genes of unknown function, which together provide a collection of candidate transcripts that may be investigated for association with bipolar disorder. Overall, this region was shown to be very gene-poor, with a high incidence of pseudogenes, and redundant and novel repetitive elements. Our analysis of the interval has demonstrated a significant difference in the extent to which the current HGP and Celera sequence data sets represent this region. Molecular Psychiatry (2002) 7, 867–873. doi:10.1038/sj.mp.4001113 Keywords: chromosome 4q35; bipolar disorder; susceptibility locus; transcript map; candidate genes
Introduction Bipolar affective disorder, also known as manicdepressive illness, is one of the most common mental illnesses with a population lifetime prevalence of approximately 1%.1 The disorder is characterised by aberrant mood swings resulting in periods of mania and depression, with reversion to otherwise normal behavior between episodes. The aetiology of bipolar disorder remains unknown. Family, twin, and adoption studies strongly implicate a hereditary component in bipolar disorder, and the familial clustering of the disorder provides an opportunity to use genetic approaches to identify the predisposing genes. Genetic linkage studies have implicated several chromosomal regions as harbouring genes that predispose to bipolar disorder. Putative sus-
Correspondence: Prof PR Schofield, Garvan Institute of Medical Research, 384 Victoria St, Sydney 2010, Australia. E-mail: p.schofield!garvan.org.au Received 2 August 2001; revised 12 November 2001; accepted 29 November 2001
ceptibility loci for bipolar have now been reported on chromosomes 1q, 4p, 4q, 5p, 6p, 7q, 12q, 13q, 15q, 16p, 18p, 18q, 21q, and 22q (reviewed by Owen et al2,3). Despite this growing evidence, the discrete genetic variations and molecules involved in bipolar susceptibility and pathogenesis remain elusive. Previous efforts to identify candidate bipolar susceptibility genes based upon function alone have, to date, met with little real success (reviewed by Craddock and Jones4). This candidate gene approach requires a significant understanding of disease biology in order to identify genes that may be involved in the disorder. Unfortunately, our understanding of bipolar disorder aetiology is poor, and consequently a plausible hypothesis can be proposed for investigating most genes that are expressed in the brain. Alternatively, the application of positional cloning strategies to identify susceptibility genes has become increasingly attractive and feasible as the data from the public and private genome projects have been made available. This has provided the best opportunity to investigate those regions implicated in bipolar linkage studies, in order to identify transcripts for investigation as candidate susceptibility genes.
Transcript map of a bipolar locus on chromosome 4q35 IP Blair et al
868
We previously reported evidence for a novel bipolar disorder susceptibility locus on chromosome 4q35 following linkage analysis in a large bipolar pedigree (Zmax = 2.39 at D4S1652).5 At least two other groups have now provided support for this region. Friddle et al6 reported a maximum multipoint HLOD score of 2.11 on chromosome 4q35 (between D4S408 and D4S426), following linkage analysis in 50 bipolar families. Willour et al7 also reported linkage analysis in 56 bipolar pedigrees, and identified a broad peak spanning chromosome 4q32–4q35 (maximum NPL = 2.7, P = 0.005). More recently, we have extended linkage analysis to a large cohort of 55 bipolar pedigrees, which includes the pedigree studied by Adams et al,5 and have shown that the lod score at chromosome 4q35 increased (Zmax = 3.2 at D4S1652), strongly supporting the existence of a susceptibility gene at this locus.8 Subsequent haplotype analysis in 16 pedigrees showing suggestive linkage to chromosome 4q35 identified a most probable candidate region of 24 cM flanked by markers D4S3051 and 4qTEL13, which is encompassed within a more conservative candidate interval of 43 cM from D4S1540 to the telomere. By generation of a complete YAC and partial BAC contig, we were able to demonstrate that the ‘most probable’ and ‘more conservative’ candidate intervals spanned approximately 2.3 Mb and 5.0 Mb respectively.8 We have subsequently used the resources of the Human Genome Project (HGP) and Celera Genomics to identify a complete collection of overlapping sequenced BAC clones and sequence contigs that represent the region implicated by linkage analysis. A combination of bioinformatic approaches and laboratory techniques has been applied to annotate these data and establish a comprehensive transcript map that spans approximately 5.5 Mb and encompasses the bipolar disorder susceptibility locus on chromosome 4q35. This map includes 11 known and eight novel transcripts, which together provide a collection of genes that may be investigated for association with bipolar disorder.
Materials and methods Sequence identification Sequence databases searches were performed using the basic local alignment search tool (BLAST)9 available through the National Center for Biotechnology Information (NCBI) (http://www.ncbi.nlm.nih.gov/). HGP genomic sequence assemblies were identified using NCBI Map Viewer and genomic sequences were accessed through DDBJ, EMBL, and GenBank databases including NR (HGP phase 3, finished sequences) and HTGS (high throughput genomic sequences, HGP phase 0, 1 and 2, unfinished sequences).10 Genomic sequences from the Celera Genomics genome project were accessed through the Celera Publication Site (http://www.celera.com/).11 Expressed sequence tag (EST) sequences were accessed through the DDBJ, EMBL, and GenBank database, dbEST. Molecular Psychiatry
Sequence annotation Overlapping sequences were assembled using SeqMan II sequence analysis computer software (DNASTAR Inc, Madison, WI, USA). Known genetic linkage and sequence tagged site (STS) markers were identified using the UniSTS analysis computer program12 available through the NCBI web site. Genomic sequences were screened for interspersed repeats and low complexity DNA sequences using the RepeatMasker2 computer program of Smit and Green (http://ftp.genome. washington.edu/cgibin/RepeatMasker). Transcript maps from the genome-wide sequence annotation projects of Ensembl (Sanger Centre), NCBI human genome map viewer, and UCSC human genome browser were accessed at the following web sites (http://www. ensembl.org/; http://www.ncbi.nlm.nih.gov/cgi-bin/ Entrez/map search; http://genome. ucsc.edu/golden Path/hgTracks.html). Putative genetic transcripts were predicted using the GENSCAN gene identification computer program13 accessed through the Pasteur Institute web site (http://bioweb.pasteur.fr/seqanal/ interfaces/genscan-simple.html). RT-PCR and sequencing RT-PCR was carried out on the following templates: reverse transcribed RNA derived from human lymphoblastoid cell lines, human brain tissue, and human embryonic kidney (HEK293) cell line, as well as on DNA from the following seven cDNA libraries (all from Clontech Laboratories Inc, Palo Alto, CA, USA): human placenta 5!-Stretch Plus cDNA library, human testis cDNA library, human fetal adrenal cDNA library, human pancreas 5!-Stretch cDNA library, human brain-hypothalamus 5! Stretch cDNA library, human hippocampus cDNA library, human brain-amygdala 5! Stretch Plus cDNA library. Total RNA was isolated from 107 cells using TRIZOL reagent (Gibco BRL, Rockville, MD, USA). Reverse transcription was carried out with the SUPERSCRIPT Preamplification system (Gibco BRL) using an oligo d(T) primer. RT-PCR was carried out in 25-l volumes containing 10 mM Tris-Cl, pH 8.3; 50 mM KCI; 2.0 mM MgCl2; 200 "M each dNTP; 20 ng reverse transcribed template; 20 pmole each primer, and 1 U AmpliTaq Gold DNA polymerase (Applied Biosystems, Foster City, CA, USA). Primers to amplify all potential transcripts were designed from putative coding regions, primer sequences and amplification conditions are available from the authors on request. Reactions were cycled with initial denaturation at 94°C for 12 min, followed by 30 cycles of 94°C for 30 s, Ta for 45 s, 72°C for 1 min, and a final extension of 10 min at 72°C. RT-PCR products were visualised on 2% agarose gels. Amplified products were purified (QIAquick PCR purification, Qiagen, Valencia, CA, USA) and subjected to automated sequence analysis using the ABI377 automated sequencer with BigDye terminator sequencing (Applied Biosystems Inc). Radiation hybrid mapping The Genebridge 4 radiation hybrid DNA panel14 was obtained from Research Genetics (Huntsville, AL,
Transcript map of a bipolar locus on chromosome 4q35 IP Blair et al
USA). Radiation hybrid DNAs were screened by PCR and results visualised on agarose gels. Results were analysed in the context of the Whitehead genome wide radiation hybrid framework map. Data were submitted electronically for analysis using the web-based software at http:// carbon.wi.mit.edu:8000/cgi-bin/ contig/rhmapper.pl. Map positions were determined with a lod score of 15 (ie likelihood of 1 × 1015).
Results A sequence based framework map of chromosome 4q35 Previous genetic linkage and haplotype refinement using a pedigree-specific, identity-by-descent, allelesharing method, had led to the identification of a 5.5 Mb region on chromosome 4q35 that contains a bipolar disorder susceptibility gene.8 We have now applied a variety of bioinformatic tools to identify and combine previously unassembled HGP genomic sequence data derived from BAC clones spanning this 5.5 Mb chromosome 4q35 interval. We initially searched the draft sequence of the human genome derived from the HGP (HGP working draft sequence segments) for sequences representing this candidate interval. Five discontinuous assemblies of draft sequences (with each assembly consisting of sequences derived from at least two or more BACs) were identified (NCBI RefSeq accessions, NT 006410, NT 006169, NT 022841, NT 022800, NT 022835). When combined, these assemblies represent a total of 26 BAC clones. Each assembly consists of collections of multiple, unordered sequence fragments (ie draft sequences). In addition, draft sequence segments were identified for seven stand-alone BAC clones (386B13, 219G10, 462G22, 495M19, 215O6, 30B7, 55J24). Draft sequence fragments from all BAC clones were combined using the SeqMan computer program to generate multiple contiguous sequence assemblies. The BLAST tool was used to screen genomic sequence databases with the end sequences of each of these localised sequence contigs, as well as with sequences of genetic markers that define the chromosome 4q35 susceptibility locus. This identified draft sequences from an additional 12 BAC clones (126K10, 1151G9, 301L8, 390N2, 629I23, 713C19, 16L12, 756P10, 553E4, 45F23, AF250324, AF146191). We assembled all identified sequences to generate a framework map that consists of five BAC contigs representing 41 BAC clones, and four stand-alone BACs. This BAC sequence framework (spanning D4S1540 to the telomere) was combined with data derived from the Celera human genome assembly of draft sequences. Thirty-six known genetic and sequence tagged site (STS) markers were identified in the assembled sequences and incorporated to generate a sequence-based framework map of chromosome 4q35 (Figure 1). Figure 1 shows that sequences from the Celera data set do not extend to the telomere, and only encompass approximately one half of the candidate interval for the bipolar disorder susceptibility locus. Therefore, the
telomeric portion of the interval, distal to D4S3200, is represented solely by sequences derived from BAC clones. There are at least two significant gaps in sequence coverage located on either side of BAC 45F23, in the telomeric portion of the interval. Six of the 45 BAC clones in the assembly had previously been incorrectly localised to a variety of other chromosomes (756P10, 553E4, 354H17, 45F23, 215O6, 55J24), and two had not previously been localised to any chromosome (16L12, 462G22). The majority of sequences from BAC clones are in working draft or unfinished form, and are present as collections of multiple, non-overlapping sequence fragments. It is therefore difficult to accurately estimate the physical size of the region that is represented solely by contigs of these clones. However, by extrapolating the approximate physical distance indicated by the proximal Celera draft sequence assembly, the region shown in Figure 1 can be estimated to span approximately 5.5 Mb. The most probable candidate interval for the bipolar disorder susceptibility gene (D4S3051–4qTEL13) can be estimated to span approximately 2.3 Mb, and the more conservative candidate interval (D4S1540-telomere), approximately 5 Mb. These physical distances are consistent with earlier estimates proposed by Badenhop et al.8 A screen of sequences from the region shown in Figure 1 for common repetitive elements identified a high incidence (40–60%) of common interspersed repeat sequence elements from multiple repeat class families including LINE/L1, L2, and CR2; SINE/Alu and MIR; LTR/MaLr, ERV1, and ERVL; MER1 and MER2; tRNA and rRNA; and various low complexity repeats. The region of highest incidence of all repeat elements, common and novel, was identified distal to D4S3200, which corresponds to the region that is not represented by sequences from the Celera data set.
869
Transcript identification A combination of bioinformatic tools and laboratory techniques was used to identify and confirm the existence of transcripts within the interval shown in Figure 1. BLAST searches of GenBank sequence databases were performed using all identified sequence data across the interval, to identify known genes and expressed sequence tag (EST) sequences. In order to manage the large quantity of sequence data for BLAST searches, we successively submitted all individual BAC sequences with common repetitive elements masked (RepeatMasker2). We compared the known transcripts that were identified in this analysis (Figure 1) with the transcript maps of three genomewide sequence annotation projects from the Sanger Centre (Ensembl), NCBI (Map Viewer), and UCSC (Genome Browser). The cohort of transcripts identified in our analysis includes all known genes shown in these three public assemblies. We also analysed all genomic sequences with GENSCAN to predict additional novel transcripts. To manage the large quantity of sequence data, we successively submitted all individual BAC sequences. Because of the highly repetitive and complex nature of the Molecular Psychiatry
Transcript map of a bipolar locus on chromosome 4q35 IP Blair et al
870
Figure 1 Integrated transcript map of chromosome 4q35 encompassing the locus for a bipolar disorder susceptibility gene. The broad bar indicates the ‘most probable’ candidate interval for the 4q35 bipolar susceptibility locus.8 This is encompassed within a more conservative candidate interval that extends from D4S1540 to the telomere. Genetic markers that are grouped above a horizontal bar localise to the same BAC clone but their order could not be determined. Transcripts are represented by horizontal lines. Where possible, an arrowhead indicates direction of transcription (discontinuous draft genomic sequences prevented this being determined for all transcripts). The order of BACs 629I23 and 713C19 remains unresolved although their localisation within the framework is accurate. BAC 55J24 is tentatively localised at the telomere, however its precise location within this framework is yet to be determined.
sequence within the region, this analysis generated a large number of short predicted transcripts (many for partial LINE, ERV, and MER common repeat elements) for which we could not find EST sequences to support transcriptional activity. We similarly found the transcript assemblies of the Sanger Centre, NCBI, and UCSC to contain a remarkable number, and complexity, of predicted novel transcripts and exons. On closer inspection, many of these predicted transcriptional units were also found to be very short and without any supporting evidence for transcriptional activity. We therefore applied a strategy for analysing these presumed transcripts whereby putative genes were only retained in the map when evidence of transcriptional activity was provided by one or both of the following additional criteria: the predicted gene was represented by EST sequences from multiple tissue or library sources, and/or successful amplification of transcripts was achieved by RT-PCR. Primers were designed for each predicted novel transcript to test for transcriptional activity by RT-PCR. RT-PCR products were sequenced to confirm identity and radiation hybrid mapping performed to ensure they were derived Molecular Psychiatry
from a chromosome 4q35 locus. Details of novel transcripts that were retained in the transcript map following this serial analysis are provided in Table 1. Map locations for these transcripts are indicated in Figure 1. Eleven characterised genes, eight novel genes of unknown function, and eight putative pseudogenes were identified. This map represents the most comprehensive analysis to date, of sequences spanning this interval, and confirms the gene-poor nature of the region. A high incidence of pseudogenes has previously been described in an investigation of a 161-kb sequence fragment spanning the telomeric locus D4Z4, on chromosome 4q35.15 We found evidence for the presence of at least eight putative pseudogenes in the interval shown in Figure 1, seven of which localise to the distal half of the region. At least two of these pseudogenes (pseudo-RF1 and pseudo-RPL7a) are present in the public transcript maps/databases as predicted novel transcripts. Sequence analysis indicates that two of the pseudogenes, pseudo-RF1 and pseudo-SLC25A5, were derived from processed mRNA transcripts as they were found to be uninterrupted, with no genomic structure
Transcript map of a bipolar locus on chromosome 4q35 IP Blair et al
Table 1 Details of novel transcripts identified within the candidate region for a bipolar disorder susceptibility gene on chromosome 4q35 Interim gene symbol
Interim gene/protein name
mRNAa
Homology
Transcript evidence
Brain/CNS expressionc
KIAA1430
KIAA1430 protein
Partial, AB037851
EST
Brain, CNS
MSTP043
MSTP043 protein
M. musculus putative proteinsb, BAB30422, BAB32206, BAB32244 D. melanogaster CG1514 C. elegans sel-1 D. melanogaster CG16979 None identified
EST
Brain
EST RT-PCR, EST
Brain, CNS Brain
EST
None identified
RT-PCR, EST RT-PCR, EST
Brain Brain, CNS
RT-PCR, EST
Brain
KIAA1325 FLJ11200 PRO0618 DKFZp564J102 C4q35T1 C4q35T2
Putative full length, NM 031953 KIAA1325 protein Partial, AB037746 Hypothetical protein Putative full length, FLJ11200 NM 018359 PRO0618 protein Putative full length, NM 014133 DKFZP564J102 protein Partial, AL080065 4q35 novel transcript 1 Partial, AK022114
4q35 novel transcript 2 UniGene Cluster Hs.276429
None identified M. musculus putative proteinb, BAB23507; Cytochrome p450 Sjogren syndrome autoantigen
871
GenBank database accession; bEntrez database accession; cEST sources as indicated by UniGene.
a
retained from their functional homologues (on chromosomes 6 and X, respectively). Pseudo-RF1 is represented in the GenBank database under accession XM—004379, and the functional RF1 under accession NM—019041, however both sequences were previously assigned the same interim gene symbol (LOC54516). Pseudo-RF1 differs from RF1 by nine single base substitutions. Sequencing of RT-PCR products from a wide range of tissues indicated that only RF1 from chromosome 6 was transcriptionally active, as no pseudo-RF1 sequences were identified. In contrast, pseudo-SLC25A5 contained multiple nonsense and missense mutations as well as small insertions and deletions that clearly indicate this gene sequence has been rendered inactive. In addition, no EST sequences were found to represent pseudo-RF1 and pseudoSLC25A5 in dbEST. The remaining six putative pseudogene sequences shown in Figure 1 appear to be derived from chromosomal duplication/rearrangement events, with all pseudogenes retaining some level of genomic structure. We observed sequence divergence and variation that clearly indicates that these non-processed pseudogene sequences have been rendered inactive. This was supported by the absence of transcript amplification following RT-PCR using specific primers for all putative pseudogenes shown in Figure 1. In addition, no identified pseudogene was represented by EST sequences in dbEST. At least three of the eight pseudogenes (pseudo-YY1, pseudo-RF1, and pseudo-RPL7a) are derived from functional transcripts that localise to subtelomeric regions of other chromosomes (YY1, 14q32; RF1, 6q25–q26; RPL7a, 9q33–q34). The functional gene FRG1, has previously been shown to be represented by FRG1-like pseudogene sequences from multiple chromosomes,16,17 at least four of which we found to include some level of conserved genomic structure as
determined by our BLAST analysis of the HTGS database. Similarly, we found evidence of related sequences on multiple chromosomes for the transcript FLJ11200 following radiation hybrid mapping. It was not determined whether these related sequences also represent pseudogenes.
Discussion We have established a comprehensive transcript map of chromosome 4q35 that encompasses the bipolar disorder susceptibility locus originally described by Adams et al5 and recently refined by Badenhop et al.8 This map includes 45 sequenced BAC clones, and genomic sequence fragments from the Celera human genome data set, within a framework of 36 genetic linkage markers and sequence tagged sites. Localised within this map are 11 characterised genes and eight novel genes of unknown function. Together, these genes provide a collection of candidate transcripts that may be investigated for association with bipolar disorder. The locus for facioscapulohumeral muscular dystrophy (FSHD) also localises to chromosome 4q35 and has been the subject of a concerted effort to identify the associated molecular defect since linkage was first described in 1990.18 While no specific gene has been identified, FSHD is now known to be associated with large deletions of repeat elements associated with the D4Z4 locus. These deletions do not appear to disrupt a functional gene but are thought to interfere with the expression of an unidentified transcript(s) located proximal to D4Z4 (reviewed by Tawil et al19). In addition, linkage analysis has identified loci on chromosome 4q35 for several other disorders including benign intraepthelial dyskeratosis (DKBI),20 Bietti crystalline corneoretinal dystrophy (BCD),21 Buekes type hip dysplasia (BHD),22 and prelingual nonprogressive Molecular Psychiatry
Transcript map of a bipolar locus on chromosome 4q35 IP Blair et al
872
autosomal dominant nonsyndromic hearing loss (DFNA24).23 The transcript map presented in this study may also provide candidate genes for investigation in these disorders. The size of the most probable candidate region for the bipolar disorder susceptibility gene on chromosome 4q35 was estimated at 2.3 Mb, encompassed within a more conservative interval of approximately 5 Mb. The size of this interval is amenable to positional cloning. The relationship between the physical size of the candidate interval (‘most probable’ and ‘more conservative’) and the genetic distance between flanking markers (24 cM and 43 cM respectively), reflects the increased recombination seen in chromosome 4q35 that is characteristic of telomeric regions.24 The overall cM:Mb ratio for this region can be estimated at between 8.6 and 10.4. Detailed analysis of sequences across the candidate interval identified a high incidence of common repetitive sequences from a wide range of repeat class families. These were particularly prevalent in the 2 Mb region distal to D4S3200 and extending to the telomere. van Geel et al15 reported a similar proportion of repetitive sequences (45%) after analysis of 161 kb of DNA sequence encompassing the FRG1 gene. It was interesting to find that the telomeric portion of the interval (ie distal to D4S3200) was not represented in the Celera draft genomic sequence data set, and corresponds to the region containing the highest incidence of redundant and novel repetitive sequence elements. This suggests that the shotgun sequence strategy used by Celera Genomics for assembling the sequence of the human genome11 is less efficient for assembling the sequences of multiple interspersed and contiguous repeat elements in this telomeric region. The gene-poor nature of the interval under study has been indicated in previous studies to characterise the FSHD locus. If we accept the HGP estimate of 30 000– 40 000 protein coding genes in the human genome,10 then we may expect, on average, that a 5-Mb interval will contain between 50 and 67 genes. We identified only 19 genes within the 5-Mb interval shown in Figure 1, confirming the gene-poor nature of this region. Indeed, only four putative functional genes were identified in the interval spanning approximately 3.5 Mb from D4S2643 to the telomere. This number may actually be lower, as two of the four genes in this interval distal to D4S2643 (TUBB4Q and HSPCAL2) may also represent pseudogene sequences. The TUBB4Q gene is a putative novel member of the betatubulin gene family, which was identified following exon trapping and subsequent computer analysis to predict a full length cDNA.25 van Geel et al25 proposed that TUBB4Q is a pseudogene, as they found no evidence that the gene was transcriptionally active following RT-PCR from a wide range of tissues. No EST sequences for TUBB4Q were identified, supporting the suggestion that this represents a pseudogene. Multiple beta-tubulin pseudogene sequences are distributed throughout the genome.26–29 Cleveland and Sullivan28 suggested that the human genome contains a few auth-
Molecular Psychiatry
entic tubulin genes within a ‘sea of pseudogenes’. The HSPCAL2 gene was identified following a screen for sequences that are homologous to the heat shock protein HSPCA gene.30 This study identified three additional putative HSPCA homologues on various chromosomes, which were labelled HSPCAL2, −3, and −4. To date, only HSPCA and HSPCAL4 have been shown to be functional.31 Our database searches identified no EST sequences for HSPCAL2, suggesting that this may also represent a pseudogene. Transcripts identified in this study can now be prioritised for investigation in bipolar disorder based upon expression or known function. Most of the transcripts that were identified within the bipolar disorder candidate interval show some level of expression in brain and are therefore candidate genes on the basis of expression alone. C4q35T2 is the only gene with evidence for transcriptional activity that we could identify within the ‘most probable’ candidate interval shown in Figure 1. Analysis of publicly available sequences identified no significant homologies, however the sequence does contain a SPRY domain. SPRY domains are of unknown function, although it has been suggested that they may be involved in protein–protein interactions.32 The putative function of at least one other gene, MTNR1A, suggests that it should also be prioritised for investigation in bipolar disorder. MTNR1A, is a high affinity melatonin receptor. Melatonin plays a role in regulation of circadian rhythms and sleep, and is implicated in the regulation of reproduction, tumour growth, aging, and mood (reviewed by Brzezinski33). MTNR1A appears to regulate both the circadian and reproductive effects of melatonin.34 Abnormal circadian rhythms have been described in patients with affective disorders. Several studies have also shown abnormal concentrations of melatonin in bipolar patients and differing levels of plasma melatonin in the manic and depressive phases of the disorder (reviewed by Pacchierotti et al35). The subtelomeric region of chromosome 4q35 has previously proven difficult to characterise, probably due to the complex arrangement of redundant and novel repetitive elements. The map described in this study therefore makes a significant contribution to our knowledge of this region and provides candidate genes for investigation for association with bipolar disorder and other disorders linked to this telomeric region. Acknowledgements This study was supported by the National Health and Medical Research Council (Australia) via grants to the Garvan Institute of Medical Research (Block Grant 993050), the Mood Disorders Unit, Prince of Wales Hospital (Program Grant 993208), the Network of Brain Research into Mental Disorders (Unit Grant 983302), and by a research contract from Merck & Co and Merck Sharpe Dohme, Australia through PsyGene Pty Ltd, and by the New South Wales Health Department Infrastructure Grant Program.
Transcript map of a bipolar locus on chromosome 4q35 IP Blair et al
References 1 Weissman MM, Leaf PJ, Tischler GL, Blazer DG, Karno M, Bruce ML, Florio LP. Affective disorders in five United States communities. Psychol Med 1988; 18: 141–153. 2 Owen MJ, Cardno AG. Psychiatric genetics: progress, problems, and potential. Lancet 1999; 354 Suppl 1: SI11–SI14. 3 Owen MJ, Cardno AG, O’Donovan MC. Psychiatric genetics: back to the future. Mol Psychiatry 2000; 5: 22–31. 4 Craddock N, Jones I. Molecular genetics of bipolar disorder. Br J Psychiatry 2001; 178 (Suppl 41): S128–S133. 5 Adams LJ, Mitchell PB, Fielder SL, Rosso A, Donald JA, Schofield PR. A susceptibility locus for bipolar affective disorder on chromosome 4q35. Am J Hum Genet 1998; 62: 1084–1091. 6 Friddle C, Koskela R, Ranade K, Hebert J, Cargill M, Clark CD et al. Full-genome scan for linkage in 50 families segregating the bipolar affective disease phenotype. Am J Hum Genet 2000; 66: 205–215. 7 Willour VL, Zandi PP, MacKinnon DF, Simpson SG, Potash JB, Gershon ES et al. Genome scan on fifty-six multiplex bipolar pedigrees collected by the NIMH genetics initiative (Bipolar Disorder) [Abstract]. Am J Hum Genet 2001; 69: (Supplement): 527. 8 Badenhop RF, Moses MJ, Scimone A, Adams LJ, Kwok JBJ, Jones A-M et al. Genetic refinement of a probable disease haplotype and physical mapping of a 2.3 Mb region defining a bipolar affective disorder susceptibility locus on chromosome 4q35. Am J Med Genet (submitted). 9 Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. Basic local alignment search tool. J Mol Biol 1990; 215: 403–410. 10 International human genome sequencing consortium. Initial sequencing and analysis of the human genome. Nature 2001; 409: 860–921. 11 Venter CJ, Adams MD, Myers EW, Li PW, Mural RJ, Sutton GG et al. The sequence of the human genome. Science 2001; 291: 1304–1351. 12 Schuler GD. Electronic PCR: bridging the gap between genome mapping and genome sequencing. Trends Biotechnol 1998; 16: 456–459. 13 Burge C, Karlin S. Prediction of complete gene structures in human genomic DNA. J Mol Biol 1997; 268: 78–94. 14 Gyapay G, Schmitt K, Fizames C, Jones H, Vega-Czarny N, Spillett D et al. A radiation hybrid map of the human genome. Hum Mol Genet 1996; 5: 339–346. 15 van Geel M, Heather LJ, Lyle R, Hewitt JE, Frants RR, de Jong PJ. The FSHD region on human chromosome 4q35 contains potential coding regions among pseudogenes and a high density of repeat elements. Genomics 1999; 61: 55–65. 16 van Deutekom JC, Lemmers RJ, Grewal PK, van Geel M, Romberg S, Dauwerse HG et al. Identification of the first gene (FRG1) from the FSHD region on human chromosome 4q35. Hum Mol Genet 1996; 5: 581–590. 17 Grewal PK, van Geel M, Frants RR, de Jong P, Hewitt JE. Recent amplification of the human FRG1 gene during primate evolution. Gene 1999; 227: 79–88. 18 Wijmenga C, Frants RR, Brouwer OF, Moerer P, Weber JL, Padberg GW. Location of facioscapulohumeral muscular dystrophy gene on chromosome 4. Lancet 1990; 336: 651–653.
19 Tawil R, Figlewicz DA, Griggs RC, Weiffenbach B, FSH Consortium. Facioscapulohumeral dystrophy: a distinct regional myopathy with a novel molecular pathogenesis. Ann Neurol 1998; 43: 279–282. 20 Allingham RR, Seo B, Rampersaud E, Bembe M, Challa P, Liu N et al. A duplication in chromosome 4q35 is associated with hereditary benign intraepithelial dyskeratosis. Am J Hum Genet 2001; 68: 491–494. 21 Jiao X, Munier FL, Iwata F, Hayakawa M, Kanai A, Lee J et al. Genetic linkage of Bietti crystallin corneoretinal dystrophy to chromosome 4q35. Am J Hum Genet 2000; 67: 1309–1313. 22 Roby P, Eyre S, Worthington J, Ramesar R, Cilliers H, Beighton P et al. Autosomal dominant (Beukes) premature degenerative osteoarthropathy of the hip joint maps to an 11-cM region on chromosome 4q35. Am J Hum Genet 1999; 64: 905–909. 23 Ha¨ fner FM, Salam AA, Linder TE, Balmer D, Baumer A, Schinzel AA et al. A Novel locus (DFNA24) for prelingual nonprogressive autosomal dominant nonsyndromic hearing loss maps to 4q35-qter in a large Swiss German kindred. Am J Hum Genet 2000; 66: 1437–1442. 24 Yu A, Zhao C, Fan Y, Jang W, Mungall AJ, Deloukas P et al. Comparison of human genetic and sequence-based physical maps. Nature 2001; 409: 951–953. 25 van Geel M, van Deutekom JC, van Staalduinen A, Lemmers RJ, Dickson MC, Hofker MH et al. Identification of a novel beta-tubulin subfamily with one member (TUBB4Q) located near the telomere of chromosome region 4q35. Cytogenet Cell Genet 2000; 88: 316–321. 26 Cowan NJ, Wilde CD, Chow LT, Wefald FC. Structural variation among human beta-tubulin genes. Proc Nat Acad Sci USA 1981; 78: 4877–4881. 27 Lee MG-S, Lewis SA, Wilde CD, Cowan NJ. Evolutionary history of a multigene family: an expressed human beta-tubulin gene and three processed pseudogenes. Cell 1983; 33: 477–487. 28 Cleveland DW, Sullivan KF. Molecular biology and genetics of tubulin. Ann Rev Biochem 1985; 54: 331–365. 29 Floyd-Smith GA, de Martinville B, Francke U. A beta-tubulin expressed gene M40 (TUBB)) is located on human chromosome 6 and two related pseudogenes are located on chromosomes 8 (TUBBP1) and 13 (TUBBP2). Cytogenet Cell Genet 1985; 40: 629. 30 Ozawa K, Murakami Y, Eki T, Soeda E, Yokoyama K. Mapping of the gene family for human heat-shock protein 90-alpha to chromosomes 1, 4, 11, and 14. Genomics 1992; 12: 214–220. 31 Vamvakopoulos NC, Griffin CA, Hawkins AL, Lee C, Chrousos GP, Jabs EW. Mapping the intron-containing human hsp90 alpha (HSPCAL4) gene to chromosome band 14q32. Cytogenet Cell Genet 1993; 64: 224–226. 32 Ponting C, Schultz J, Bork P. SPRY domains in ryanodine receptors (Ca2+-release channels). Trends Biochem Sci 1997; 22: 193–194. 33 Brzezinski A. Melatonin in humans. N Engl J Med 1997; 336: 186–195. 34 Reppert SM, Weaver DR, Ebisawa T. Cloning and characterization of a mammalian melatonin receptor that mediates reproductive and circadian responses. Neuron 1994; 13: 1177–1185. 35 Pacchierotti C, Iapichino S, Bossini L, Pieraccini F, Castrogiovanni P. Melatonin in psychiatric disorders: a review on the melatonin involvement in psychiatry. Front Neuroendocrinol 2001; 22: 18–32.
873
Molecular Psychiatry