DNA RESEARCH 4, 307-313 (1997)
Prediction of the Coding Sequences of Unidentified Human Genes. VIII. 78 New cDNA Clones from Brain Which Code for Large Proteins in vitro
Ken-ichi ISHIKAWA, Takahiro NAGASE, Daisuke NAKAJIMA, Naohiko SEKI, Miki OHIRA, Nobuyuki MIYAJIMA, Ayako TANAKA, Hirokazu KOTANI, Nobuo NOMURA, and Osamu OHARA*
Kazusa DNA Research Institute, 1532-3 Yana, Kisarazu, Chiba 292, Japan
(Received 19 September 1997)
Communicated by Mituru Takanami
* To whom correspondence should be addressed. Tel. +81-438-52-3913, Fax. +81-438-52-3914, E-mail: [email protected]
Downloaded from http://dnaresearch.oxfordjournals.org/ by guest on June 1, 2013

Abstract
As a part of our project for accumulating sequence information on the coding regions of unidentified human genes, we herein report the sequence features of 78 new cDNA clones isolated from human brain cDNA libraries as those which may code for large proteins. The sequence data showed that the average sizes of the cDNA inserts and their open reading frames were 6.0 kb and 2.8 kb (925 amino acid residues), respectively, and these clones produced protein products of the corresponding sizes in an in vitro transcription/translation system. Homology search against the public databases indicated that the predicted coding sequences of 68 genes contained sequences similar to known genes, 69% of which (47 genes) were related to cell signaling/communication, nucleic acid management, and cell structure/motility. The expression profiles of these genes in 14 different tissues have been analyzed by the reverse transcription-coupled polymerase chain reaction method, and 8 genes were found to be predominantly expressed in the brain.
Key words: large proteins; in vitro transcription/translation system; cDNA sequencing; expression profile; chromosomal location; brain

1. Introduction
To accumulate information on the coding sequences of unidentified human genes, we have initiated a project for the sequencing of entire cDNA inserts which correspond to relatively long transcripts.1 In particular, we focused on the analysis of human brain cDNA clones which code for large proteins, since many genes encoding large proteins are known to play an important role in mammals.2,3 In the preceding paper, we developed a screening procedure based on an in vitro assay of protein-coding potential, and successfully determined the coding sequences of 100 new cDNA clones coding for large proteins. The results of that analysis convinced us that this method of selection worked well for the discovery of biologically important genes. As an extension of the preceding report, we herein report the coding sequence features of 78 new cDNA clones which have the potential to code for large proteins in vitro. The growing list of genes encoding large proteins offers us a wealth of information regarding the primary structures of human proteins that are hard to identify by conventional gene discovery methods.

2. Materials and Methods
2.1. The source and screening of cDNA clones
cDNA clones were randomly sampled from fractions 3 to 6 (average insert sizes of 4.5, 5.3, 6.1, and 7.0 kb, respectively) of the size-fractionated human brain cDNA libraries constructed previously,2 and new clones to be sequenced were selected as described.3 In brief, clones carrying unidentified sequences at both termini were first selected by searching the GenBank database (release 93.0), excluding expressed sequence tags, and then the clones producing proteins of 50 kDa or larger (hereafter termed large proteins) in the in vitro transcription/translation system were isolated.
2.2. Other methods
DNA sequencing, homology search of the predicted protein-coding sequences, expression analysis of the sequenced cDNA clones by reverse transcription-coupled PCR (RT-PCR), and chromosomal mapping were carried out as described previously.2,3 During the course of assembling the cDNA sequences, we noted that the deduced sequences for 4 clones (KIAA0440 to KIAA0443) harbor two relatively long open reading frames (ORFs) adjacent to each other. The sequence between the two ORFs was re-determined by direct analysis of the RT-PCR products generated from human brain mRNAs, and a single long ORF was finally deduced for each clone. The result implies that the interruptions in these clones either were generated during the course of cDNA library construction or reflect retained, unspliced introns.

3. Results and Discussion
3.1. Sequence analysis and prediction of the protein-coding regions of cDNA clones
Following the two-step screening strategy previously described, we isolated 50 new clones possibly coding for large proteins and determined their entire sequences. Separately, we evaluated the brain cDNA libraries employed by comparing the sequence patterns of randomly sampled cDNA clones (in the accompanying paper: Seki et al.5). Since 28 clones included in those data were found to code for large proteins in vitro, the sequence features of all 78 clones were analyzed together. The average sizes of the cDNA inserts and their ORFs were 6.0 kb and 2.8 kb (925 amino acid residues), respectively. Physical maps of the analyzed cDNA clones, except for the 28 clones whose maps appear in the accompanying paper by Seki et al. in this issue,5 are shown in Fig. 1, where the ORFs and the first ATG codons in the respective ORFs are indicated by solid boxes and triangles, respectively. In-frame termination codons upstream of the first ATG codon were identified in 26 clones, among which 17 clones carried the ATG codon within a context following Kozak's rule.6 Six genes had 5'-untranslated regions longer than 1 kb. As described in the cautionary note by Kozak,6 we could not completely rule out the possibility that these clones retained an intron upstream of the first ATG codon, since the RNAs used to construct the cDNA libraries contained heterogeneous nuclear RNAs in addition to cytoplasmic mRNAs. Table 1 lists the lengths of the inserts, the ORF lengths, the apparent molecular masses of the largest in vitro products, and the chromosomal locations of the respective clones analyzed here.

3.2. Functional classification of the predicted gene products
The newly predicted gene products were tentatively sorted according to the results of homology/motif searches as described.3,4 Of the predicted gene products, 87% exhibited sequence similarity to known genes, and 60% were classified as functional proteins relating to cell signaling/communication, nucleic acid management, and cell structure/motility on the basis of homology to known gene products or sequence motifs tightly linked to certain functions. The remaining genes showed similarity to functionally unclassified genes. The results of the analysis are given in Table 2, where the degree of overall similarity is indicated as "identical" (> 90% to entries of human proteins), "homologous" (> 90% to non-human protein entries), "related" (30-90% to any entry), or "weakly related" (< 30% to any entry) on the basis of the results of alignment analyses with the GAP program. Other sequence features noted are summarized below.

(1) The prevalence of the C2H2-type zinc finger domain in the newly predicted gene products was 8%, which is close to that previously reported.3 This frequency is considerably higher than that expected from the widely accepted notion that there are several hundred C2H2-type zinc finger genes in the human genome.11

(2) Besides the zinc finger domain described above, two domains were frequently found in the genes discovered in this project: the Dbl homology domain and the Hect domain. The Dbl homology domain is known to be present in many regulators of small GTP-binding proteins.12 KIAA0424 was found to carry this domain. We also noted that some of the previously characterized genes harbor this domain (KIAA0006, KIAA0142, KIAA0294, KIAA0337, KIAA0362, KIAA0380, and KIAA0382). The Hect domain, which appears in the carboxy-terminal regions of several ubiquitin-protein ligases,13 was present in KIAA0439 as well as in the previously reported KIAA0010, KIAA0032, KIAA0045, KIAA0093, KIAA0312, KIAA0317, and KIAA0322.

(3) Three genes exhibited homology to disease genes: KIAA0405 to the gene for the platelet glycoprotein Ib α chain, which is responsible for an autosomal dominant bleeding disorder, platelet-type von Willebrand disease;14 KIAA0425 to DXS6673E, a candidate gene for X-linked mental retardation in Xq13.1;9 and KIAA0458 to the gene for atrophin-1, which is responsible for the neurological disorder dentatorubral and pallidoluysian atrophy.15 The overall identities of amino acid residues were 24%, 46%, and 57%, respectively.

(4) Some gene products exhibited structural similarities, not in a limited region but over the whole molecule, to genes already characterized in our ongoing project. KIAA0411 and KIAA0456 showed similarity to KIAA0131, which was identified as Rho-GAP hematopoietic protein C1,7 with identities of 45% and 55%, respectively. KIAA0407 carried an amino-terminal extension of 648 amino acid residues relative to the SEP protein of the MET-hepatocyte growth factor receptor family,8 and was related to KIAA0315 and KIAA0463 with overall identities of 45% and 38%, respectively. KIAA0425 showed an identity of 46% to KIAA0385, which was identified as DXS6673E.9 KIAA0433 and KIAA0439 showed identities of 64% and 34% to KIAA0377 and KIAA0093, respectively.
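The similarity classes used in Section 3.2 and Table 2 amount to a threshold rule on the overall sequence identity. A minimal Python sketch of that rule follows. The function name `similarity_class`, its arguments, and the handling of values lying exactly on 30% or 90% (assigned here to class R) are illustrative assumptions; the original analysis relied on alignments made with the GAP program and is not reproduced here.

```python
def similarity_class(identity_pct: float, entry_is_human: bool) -> str:
    """Return the similarity class (I, H, R or W) for an overall identity in percent.

    Thresholds follow the text: > 90% to a human entry is "identical" (I),
    > 90% to a non-human entry is "homologous" (H), 30-90% is "related" (R),
    and < 30% is "weakly related" (W). Boundary handling is an assumption.
    """
    if not 0.0 <= identity_pct <= 100.0:
        raise ValueError("identity must be a percentage between 0 and 100")
    if identity_pct > 90.0:
        return "I" if entry_is_human else "H"
    if identity_pct >= 30.0:
        return "R"
    return "W"


if __name__ == "__main__":
    # Overall identities quoted in the text for the three disease-gene homologues
    # (all three disease genes are human entries).
    for gene, identity in [("KIAA0405", 24.0), ("KIAA0425", 46.0), ("KIAA0458", 57.0)]:
        print(gene, similarity_class(identity, entry_is_human=True))
```

Keeping the human/non-human distinction as an explicit argument mirrors the way classes I and H differ only in the origin of the database entry.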
K. Ishikawa et al. [Vol. 4, No. 5]
Figure 1. Physical maps of cDNA clones analyzed. The horizontal scale represents the cDNA length in kb, and the gene numbers corresponding to the respective cDNAs are given on the left. The ORFs and untranslated regions are shown by solid and open boxes, respectively. The positions of the first ATG codons with or without the context of Kozak's rule are indicated by solid and open triangles, respectively. Alu sequences and other repetitive sequences are represented by dotted and hatched boxes, respectively.
[Figure 1: maps of KIAA0394 to KIAA0443 drawn against a horizontal scale of 0 to 8 kb.]
Table 1. Information of sequence data and chromosomal locations of the identified genes.

Columns: Gene number (KIAA); Accession number a); cDNA length (bp) b); ORF length (amino acid residues); Apparent molecular mass (kDa) c); Chromosomal location d)

0394     AB007854  7,979  412    50    17
0395     AB007855  7,453  459    60    20
0396     AB007856  6,419  601    56    15
0397     AB007857  6,629  488    64    17
0398     AB007858  6,203  476    59    18
0399     AB007859  6,310  783    82    17
0400     AB007860  5,711  1,006  >100  2
0401     AB007861  6,454  344    57    2
0402     AB007862  5,618  1,735  >100  21
0403     AB007863  6,467  416    63    6
0404     AB007864  6,299  1,956  >100  11
0405     AB007865  7,527  660    75    14
0406     AB007866  7,335  799    85    20
0407     AB007867  7,308  2,135  >100  3
0408     AB007868  6,581  577    73    6
0409     AB007869  6,469  464    57    11
0410     AB007870  6,356  485    53    13
0411     AB007871  6,879  650    87    3
0412     AB007872  6,530  627    84    19
0413     AB007873  5,457  864    89    12
0414     AB007874  5,725  467    82    9
0415     AB007875  5,743  634    68    7
0416     AB007876  5,572  516    55    5
0417     AB007877  5,721  710    73    10
0418     AB007878  5,504  940    >100  10
0419     AB007879  5,399  991    >100  16
0420     AB007880  5,879  564    59    16
0421     AB007881  5,717  1,302  >100  16
0422     AB007882  5,320  974    94    12
0423     AB007883  5,960  1,696  >100  14
0424     AB007884  5,413  516    58    X
0425     AB007885  6,108  1,224  >100  12
0426     AB007886  5,471  604    81    6
0427     AB007887  5,737  598    78    18
0428     AB007888  5,940  370    56    3
0429     AB007889  5,657  356    55    8
0430     AB007890  6,011  1,056  >100  16
0431     AB007891  5,350  667    >100  16
0432     AB007892  5,962  749    >100  18
0433     AB007893  5,814  1,243  >100  5
0434     AB007894  5,650  1,571  >100  3
0435     AB007895  5,347  777    >100  1
0436     AB007896  4,661  689    72    2
0437     AB007897  5,278  1,042  >100  18
0438     AB007898  4,765  708    >100  5
0439     AB007899  4,879  995    >100  18
0440 e)  AB007900  5,296  1,138  56    14
0441 e)  AB007901  5,597  697    67    6
0442 e)  AB007902  5,393  1,172  80    7
0443 e)  AB007903  5,190  1,395  >100  9
0444 f)  AB007913  6,618  978    >100
0445 f)  AB007914  6,471  1,313  >100
0449 f)  AB007918  6,547  755    >100
0450 f)  AB007919  6,946  425    50
0452 f)  AB007921  6,256  453    50
0453 f)  AB007922  6,375  1,052  >100
0454 f)  AB007923  6,681  1,882  >100
0455 f)  AB007924  6,745  499    60
0456 f)  AB007925  6,305  1,095  >100
0457 f)  AB007926  6,833  845    >100
0458 f)  AB007927  6,642  1,268  >100
0460 f)  AB007929  2,935  903    >100
0461 f)  AB007930  6,148  1,355  >100
0462 f)  AB007931  7,150  2,276  >100
0463 f)  AB007932  6,263  1,963  >100
0465 f)  AB007934  6,282  1,697  >100
0466 f)  AB007935  4,974  700    78
0467 f)  AB007936  6,216  2,055  >100
0468 f)  AB007937  6,400  384    71
0469 f)  AB007938  6,450  539    64
0470 f)  AB007939  6,456  1,460  >100
0471 f)  AB007940  6,846  370    50
0473 f)  AB007942  5,747  913    >100
0476 f)  AB007945  5,525  1,386  >100
0477 f)  AB007946  5,676  1,132  >100
0478 f)  AB007947  6,149  418    56
0480 f)  AB007949  6,111  1,252  >100
0481 f)  AB007950  5,351  483    63

a) Accession numbers of DDBJ, EMBL and GenBank databases.
b) Values excluding poly(A) sequences.
c) Approximate molecular masses of the in vitro products estimated by SDS-PAGE.
d) Chromosome numbers identified by using GeneBridge 4 radiation hybrid panel.
e) cDNA and ORF lengths were revised by direct analysis of the RT-PCR products.
f) Data were taken from the accompanying paper.
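As a consistency check, the average ORF length quoted in Section 3.1 (925 amino acid residues, about 2.8 kb) can be recomputed from the ORF-length column of Table 1. The Python sketch below does this; the variable and function names are illustrative, and the values are transcribed from Table 1 in gene-number order.

```python
# ORF lengths (amino acid residues) from Table 1, KIAA0394 to KIAA0481.
ORF_LENGTHS_AA = [
    412, 459, 601, 488, 476, 783, 1006, 344, 1735, 416,
    1956, 660, 799, 2135, 577, 464, 485, 650, 627, 864,
    467, 634, 516, 710, 940, 991, 564, 1302, 974, 1696,
    516, 1224, 604, 598, 370, 356, 1056, 667, 749,
    1243, 1571, 777, 689, 1042, 708, 995, 1138, 697, 1172,
    1395, 978, 1313, 755, 425, 453, 1052, 1882, 499, 1095,
    845, 1268, 903, 1355, 2276, 1963, 1697, 700, 2055, 384,
    539, 1460, 370, 913, 1386, 1132, 418, 1252, 483,
]


def mean_orf_length(lengths):
    """Arithmetic mean of a non-empty list of ORF lengths."""
    return sum(lengths) / len(lengths)


mean_aa = mean_orf_length(ORF_LENGTHS_AA)
mean_kb = mean_aa * 3 / 1000  # three nucleotides per codon, stop codon ignored

if __name__ == "__main__":
    print(f"{len(ORF_LENGTHS_AA)} clones, mean ORF {mean_aa:.0f} aa ({mean_kb:.1f} kb)")
```

The conversion to kilobases ignores the stop codon, which changes the result by only 3 bp per clone.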
(5) A sequence of 60 amino acid residues encompassing an unconventional zinc finger motif (CXXCX16CXXC, where X is any amino acid) was found to be conserved in KIAA0400 in this work and in the previously reported KIAA0041, KIAA0050, KIAA0148, and KIAA0167. We searched the public databases for other genes carrying this sequence and found that yeast Gcs1 and adenosine diphosphate ribosylation factor-1 (ARF1) GAP, both of which possess GAP activity for ARF1,10 carry this sequence. Thus, this conserved sequence is a hitherto unidentified functional domain tightly linked to GAP activity.
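The motif CXXCX16CXXC can be written directly as a regular expression, which gives a quick way to scan a protein sequence for candidate sites. The Python sketch below illustrates the idea. The function name `find_motif` and the test sequence are hypothetical (the sequence is synthetic, not a KIAA protein), and a pattern match is only a first filter, not evidence of a functional domain.

```python
import re

# C-X2-C-X16-C-X2-C, where X is any amino acid (one-letter code).
ZINC_FINGER_LIKE = re.compile(r"C.{2}C.{16}C.{2}C")


def find_motif(protein: str):
    """Return (start, segment) pairs (0-based start) for non-overlapping matches."""
    return [(m.start(), m.group()) for m in ZINC_FINGER_LIKE.finditer(protein.upper())]


if __name__ == "__main__":
    # Synthetic sequence built only to exercise the pattern.
    synthetic = "MK" + "CAAC" + "G" * 16 + "CAAC" + "LL"
    print(find_motif(synthetic))
```

Because `finditer` reports non-overlapping matches only, closely spaced cysteine clusters would need a lookahead-based pattern if every overlapping candidate had to be listed.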
3.3. Expression profiles of the predicted genes
Figure 2 shows the expression profiles of the genes reported in this study, which could be categorized into the following classes: genes predominantly expressed in the brain (8 genes: KIAA0394, KIAA0408, KIAA0417, KIAA0420, KIAA0434, KIAA0444, KIAA0473, and KIAA0481), genes expressed in a limited number of tissues (17 genes), genes whose expression was specifically suppressed in a few tissues (38 genes), and genes expressed ubiquitously (15 genes).

Acknowledgments: This project was supported by grants from the Kazusa DNA Research Institute. We thank Dr. M. Takanami for his continuous support and encouragement. Thanks are also due to Tomomi Tajino, Keishi Ozawa, Tomomi Kato, Seiko Takahashi, Kazuhiro
Table 2. Functional classifications of the gene products based on homologies to known proteins and sequence motifs.

Columns: Gene number (KIAA); Similarity class b); Accession no. d); Identities (%) e); Overlap (amino acid residues) e); Homologous entry in the database c). Rows are grouped by functional category a).

Cell signaling/communication
0400  R    U92478    33.3   333   SrcSH3 binding protein (M)
0405  W    P07359    25.0   180   platelet glycoprotein Ib alpha chain precursor (H)
0406  W    Z34820    16.4   226   voltage-dependent L-type Ca channel alpha 1 subunit (H)
0407  I    X87904    99.9   1484  SEP (H)
0411  R    P98171    48.1   337   Rho-GAP hematopoietic protein C1 (H)
0413  I    AF013263  99.9   1238  apoptotic protease activating factor 1 (H)
0415  W    U66838    20.3   123   cyclin A1 (H)
0416  W    U66900    35.2   244   acid labile subunit (M)
0418  W    P14598    37.6   141   neutrophil cytosol factor 2 (H)
0421  I    U32581    99.3   703   lambda/iota-protein kinase C-interacting protein (H)
0422  R    P40144    69.1   647   adenylate cyclase, type V (Rb)
0424  W    J03639    22.0   368   DBL oncogene (H)
0440  N    P46062    49.0   394   GTPase-activating protein SPA-1 (M)
0456  R    P98171    35.7   830   Rho-GAP hematopoietic protein C1 (H)
0463  I    X87831    99.5   434   OCT (H)
0466  R    Z33642    30.3   462   leukocyte surface protein V7 (H)
0468  R    U73184    84.1   359   N-syndecan (R)
0470  W    U64442    17.1   525   APC (X)
0473  H    S68983    95.3   909   auxilin (B)

Cell structure/motility
0402  W    P48725    43.8   454   pericentrin (M)
0419  W    S67593    16.3   552   transport protein USO1 (Sc)
0429  W    P23206    24.5   159   collagen alpha 1 (X) chain precursor (B)
0434  W    P02452    18.1   842   procollagen alpha 1 (I) chain precursor (H)
0438  R    D32249    81.2   711   neurodap1 (R)
0449  W    U90946    26.0   215   myosin heavy chain kinase B (Dd)
0450  W    P53420    23.9   201   collagen alpha 4 (IV) chain precursor (H)
0454  W    U63610    15.3   1772  plectin PLEC1 (H)
0457  W    P22793    27.2   158   trichohyalin (S)
0465  W    M18533    17.3   1319  dystrophin DMD (H)
0467  W    U39226    20.7   169   myosin VIIA (H)
0469  R    A45773    30.1   508   ring canal protein kelch (D)
0477  W    P13540    17.9   730   myosin heavy chain cardiac muscle beta isoform (Gh)
0480  W    P30622    23.3   313   restin (H)

Nucleic acid managing
0395  W    JC4863    26.7   225   homeotic protein zhx-1 (M)
0398  R    P32783    31.5   337   ABD1 (Sc)
0410  R    U63839    87.3   479   nucleoporin p58 (R)
0412  R    U66561    54.3   385   kruppel-related zinc finger protein ZNF184 (H)
0414  W    X82018    41.5   123   ZID protein (H)
0426  R    U46186    41.3   455   KRAB-containing zinc finger protein (M)
0432  I    U86753    100.0  749   Cdc5-related protein (H)
0437  W    U48363    16.9   603   transcriptional activator alpha-NAC Naca (M)
0441  R    P18729    46.3   227   zinc finger protein XLCGF57.1 (X)
0444  R    X86691    89.2   231   218 kD Mi-2 (H)
0460  W    P23246    23.8   269   PTB-associated splicing factor (H)
0461  R    U30307    51.5   266   Y chromosome repeat region OY11.1 DNA sequence (S) f)
0476  R    S18878    39.1   192   irlB (H)
0478  W    P18751    32.1   312   zinc finger protein XLCOF7.1 (X)

Protein managing
0436  W    JC4185    26.0   600   proteinase II (Ml)
0439  R    P46934    76.3   531   KIAA0093 (H)

Unclassified
0394  H    U19860    97.3   412   growth arrest specific mRNA, clone 3544 (M)
0396  R    P17221    39.9   343   sex-determining protein FEM-1 (Ce)
0397  W    Z35641    23.4   154   Caenorhabditis elegans cosmid C38H2 (Ce)
0404  W    U41544    20.3   320   Caenorhabditis elegans cosmid M03A8 (Ce)
0409  R    Q10257    37.7   316   hypothetical 36.3 kD protein C56F8.09 (Sp)
0417  W    U40941    27.0   100   Caenorhabditis elegans cosmid F35C8 (Ce)
0420  R    D67029    64.2   586   SEC14L (H)
0423  W    Z71178    31.5   305   Caenorhabditis elegans cosmid B0024 (Ce)
0425  R    X95808    25.2   640   DXS6673E (H)
0428  W    U67957    38.5   187   Caenorhabditis elegans cosmid K02H8 (Ce)
0433  R    AB002375  75.1   989   KIAA0377 (H)
0435  R    P18490    45.0   409   pecanex (D)
0442  R    U44091    24.1   253   atrophin-1 related protein (R)
0445  W    P21249    18.0   1292  major antigen (Ov)
0452  W    U23511    37.2   199   Caenorhabditis elegans cosmid C32D5 (Ce)
0453  W    U29535    29.4   456   Caenorhabditis elegans cosmid C25H3 (Ce)
0458  H    U44091    63.2   969   atrophin-1 related protein (R)
0471  W    U41540    34.4   302   Caenorhabditis elegans cosmid F35H12 (Ce)
0481  W    P18490    18.0   356   pecanex (D)

No homology
0399                              none
0401                              none
0403                              none
0408                              none
0427                              none
0430                              none
0431                              none
0443                              none
0455                              none
0462                              none

a) Classifications based on the annotations of their homologous protein entries in the databases.
b) The gene products were grouped into four similarity classes according to the sequence identities obtained by the GAP program: I, identical to known human gene products (sequence identity, > 90%); H, homologous to known non-human gene products (sequence identity, > 90%); R, related to some known gene products (sequence identity, 30 to 90%); W, very weakly related to known gene products (sequence identity, < 30%). The gene products in class I (> 90%) include alternative splicing products of reported genes.
c) Organisms in which these entries were identified are given in parentheses: B, bovine; C, chicken; Ce, Caenorhabditis elegans; D, Drosophila melanogaster; Dd, Dictyostelium discoideum; Gh, golden hamster; H, human; M, mouse; Ml, Moraxella lacunata; Ov, Onchocerca volvulus; R, rat; Rb, rabbit; Sc, Saccharomyces cerevisiae; S, sheep; Sp, Schizosaccharomyces pombe; X, Xenopus laevis.
d) Accession numbers of homologous entries in the DDBJ/EMBL/GenBank/OWL/SWISS-PROT/PIR databases are shown.
e) The values were obtained by the FASTA program.
f) Classifications based on the sequence motifs.
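The percentages given in the abstract and in Section 3.2 follow from the number of genes in each functional category of Table 2. The short Python sketch below reproduces the arithmetic; the per-category counts were tallied from the gene lists in Table 2, and the dictionary, tuple and function names are illustrative.

```python
# Number of genes per functional category, tallied from Table 2.
CATEGORY_COUNTS = {
    "Cell signaling/communication": 19,
    "Cell structure/motility": 14,
    "Nucleic acid managing": 14,
    "Protein managing": 2,
    "Unclassified": 19,
    "No homology": 10,
}

# The three categories singled out in the abstract and in Section 3.2.
MAIN_CATEGORIES = (
    "Cell signaling/communication",
    "Nucleic acid managing",
    "Cell structure/motility",
)


def percent(part: int, whole: int) -> float:
    """Return part/whole expressed as a percentage."""
    return 100.0 * part / whole


total = sum(CATEGORY_COUNTS.values())
with_homology = total - CATEGORY_COUNTS["No homology"]
main_three = sum(CATEGORY_COUNTS[name] for name in MAIN_CATEGORIES)

if __name__ == "__main__":
    print(f"similar to known genes: {with_homology}/{total} = {percent(with_homology, total):.0f}%")
    print(f"three main categories:  {main_three}/{total} = {percent(main_three, total):.0f}%")
    print(f"share of similar genes: {main_three}/{with_homology} = {percent(main_three, with_homology):.0f}%")
```

The 60% figure in Section 3.2 is taken relative to all 78 clones, whereas the 69% figure in the abstract is taken relative to the 68 genes with similarity to known genes.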
Figure 2. Expression profiles of 50 newly identified genes in 14 different tissues examined by RT-PCR. Electrophoretically resolved bands of the PCR products for individual genes are shown. Gene numbers are given on the left. G3PDH gene expression was analyzed as a positive control. In each set of electrophoretic patterns, lanes 1 to 5 show the PCR products derived from serial 10-fold dilutions of a cDNA clone of interest (from 0.1 fg to 1 pg) for the estimation of the PCR amplification efficiency. Lanes 6 to 19 are electrophoretic patterns of the RT-PCR products from mRNAs of 14 different tissues: lane 6, heart; lane 7, brain; lane 8, placenta; lane 9, lung; lane 10, liver; lane 11, skeletal muscle; lane 12, kidney; lane 13, pancreas; lane 14, spleen; lane 15, thymus; lane 16, prostate; lane 17, testis; lane 18, ovary; lane 19, small intestine.
[Figure 2: gel panels labeled KIAA0394 to KIAA0481 and G3PDH, lanes 1 to 19.]
Sato, Akiko Ukigai, Emiko Suzuki, Kazuko Yamada, and Naoko Suzuki for their technical assistance.

References
1. Nomura, N., Miyajima, N., Sazuka, T. et al. 1994, Prediction of the coding sequences of unidentified human genes. I. The coding sequences of 40 new genes (KIAA0001-KIAA0040) deduced by analysis of randomly sampled cDNA clones from human immature myeloid cell line KG-1, DNA Res., 1, 27-35.
2. Ohara, O., Nagase, T., Ishikawa, K.-I. et al. 1997, Construction and characterization of human brain cDNA libraries suitable for analysis of cDNA clones encoding relatively large proteins, DNA Res., 4, 53-59.
3. Nagase, T., Ishikawa, K.-I., Nakajima, D. et al. 1997, Prediction of the coding sequences of unidentified human genes. VII. The complete sequences of 100 new cDNA clones from brain which can code for large proteins in vitro, DNA Res., 4, 141-150.
4. Devereux, J., Haeberli, P., and Smithies, O. 1984, A comprehensive set of sequence analysis programs for the VAX, Nucleic Acids Res., 12, 387-395.
5. Seki, N., Ohira, M., Nagase, T. et al. 1997, Characterization of cDNA clones in size-fractionated cDNA libraries from human brain, DNA Res., 4, 345-349.
6. Kozak, M. 1996, Interpreting cDNA sequences: some insights from studies on translation, Mammalian Genome, 7, 563-574.
7. Tribioli, C., Droetto, S., Bione, S. et al. 1996, An X chromosome-linked gene encoding a protein with characteristics of a rhoGAP predominantly expressed in hematopoietic cells, Proc. Natl. Acad. Sci. USA, 93, 695-699.
8. Maestrini, E., Tamagnone, L., Longati, P. et al. 1996, A family of transmembrane proteins with homology to the MET-hepatocyte growth factor receptor, Proc. Natl. Acad. Sci. USA, 93, 674-678.
9. van der Maarel, S. M., Scholten, I. H. J. M., Huber, I. et al. 1996, Cloning and characterization of DXS6673E, a candidate gene for X-linked mental retardation in Xq13.1, Hum. Mol. Genet., 5, 887-897.
10. Cukierman, E., Huber, I., Rotman, M., and Cassel, D. 1995, The ARF1 GTPase-activating protein: zinc finger motif and Golgi complex localization, Science, 270, 1999-2002.
11. Bellefroid, E. J., Lecocq, P. J., Benhida, A., Poncelet, D. A., Belayew, A., and Martial, J. A. 1989, The human genome contains hundreds of genes coding for finger proteins of the Krüppel type, DNA, 8, 377-387.
12. Chan, A. M.-L., McGovern, E. S., Catalano, G., Fleming, T. P., and Miki, T. 1994, Expression cDNA cloning of a novel oncogene with sequence similarity to regulators of small GTP-binding proteins, Oncogene, 9, 1057-1063.
13. Huibregtse, J. M., Scheffner, M., Beaudenon, S., and Howley, P. M. 1995, A family of proteins structurally and functionally related to the E6-AP ubiquitin-protein ligase, Proc. Natl. Acad. Sci. USA, 92, 2563-2567.
14. Miller, J. L., Cunningham, D., Lyle, V. A., and Finch, C. N. 1991, Mutation in the gene encoding the α chain of platelet glycoprotein Ib in platelet-type von Willebrand disease, Proc. Natl. Acad. Sci. USA, 88, 4761-4765.
15. Margolis, R. L., Li, S. H., Young, W. S., Wagster, M. V., Stine, O. C., Kidwai, A. S., Ashworth, R. G., and Ross, C. A. 1996, DRPLA gene (atrophin-1) sequence and mRNA expression in human brain, Brain Res. Mol. Brain Res., 36, 219-226.