Genome Informatics 12: 417–419 (2001)
Guide of Putative Alternative Splicing Database
Jiunn-Jei Lai (1,2); Yen-hua Huang (2,3); Yu-Tin Chen (1,2); Ueng-Cheng Yang (1,2,3)
(1) Institute of Biochemistry, National Yang-Ming University, Taipei, Taiwan 112
(2) Bioinformatics Center, National Yang-Ming University, Taipei, Taiwan 112
(3) Bioinformatics Program, National Yang-Ming University, Taipei, Taiwan 112
Keywords: alternative splicing, database, UniGene, EST
1 Introduction
Putative ALternative Splicing Database (PALS db) is a genome-wide database of putative alternative splicing (AS) sites. We collected putative AS information from the sequences in NCBI UniGene clusters and dbEST [2]. AS sites were found by comparing the unique reference sequence of each UniGene cluster with related sequences from the same UniGene cluster or from dbEST. In this way, we obtained a large number of AS sites together with their supporting evidence. Compared with other AS-related databases that discover AS sites by aligning EST sequences to genomic sequence, PALS db covers more known genes. The current release (version 2) contains human and mouse data, built from UniGene Human Build #138 with human dbEST as of 2001/08/12, and from UniGene Mouse Build #93 with mouse dbEST as of 2001/08/12. About half of human genes (49.5%) contain AS site pairs in PALS db, and 31% of mouse genes (5219 of 16615) contain putative AS sites. Biologists can search for AS sites of interest through the web query interface [3] (http://palsdb.ym.edu.tw/) to address biological questions or uncover biological phenomena. The query forms accept queries either by keyword or by number of AS sites. The result pages provide extensive information on the supporting sequences and hyperlinks to many useful online biomedical databases, such as OMIM, dbSNP, SAGE, and GeneCards. These features are described in detail in the following sections.
2 Method and Results
2.1 How to Construct PALS db
We used the reference sequences of UniGene clusters as queries in BLAST [1] searches against dbEST. Each reference sequence was then grouped with the sequences of its UniGene cluster and the similar dbEST sequences, and the members of each group were compared with one another to discover AS sites.
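The exact rule used to call an AS site from these comparisons is not given here. One common approach, assumed for illustration and not confirmed as the PALS db method, is to treat a long insertion or deletion inside an otherwise well-aligned reference/EST pair as a candidate AS site, with the two flanking reference coordinates forming a site pair. The minimal Python sketch below assumes gapped alignment strings (as in BLAST pairwise output, with `-` for gaps); `find_gap_sites`, `GapSite`, and the `min_gap` threshold are hypothetical names and values.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GapSite:
    ref_start: int    # 0-based reference coordinate where the indel begins
    ref_end: int      # exclusive end in reference coordinates (== ref_start for insertions)
    length: int       # indel length in nucleotides
    absent_from: str  # "est" if the EST lacks reference bases, "reference" otherwise


def find_gap_sites(ref_aln: str, est_aln: str, min_gap: int = 20) -> List[GapSite]:
    """Report indels of at least min_gap columns (hypothetical AS criterion)."""
    if len(ref_aln) != len(est_aln):
        raise ValueError("aligned sequences must have equal length")
    sites: List[GapSite] = []
    ref_pos = 0  # current position in ungapped reference coordinates
    i, n = 0, len(ref_aln)
    while i < n:
        if est_aln[i] == "-" and ref_aln[i] != "-":
            # EST lacks a reference segment: candidate skipped exon
            start, j = ref_pos, i
            while j < n and est_aln[j] == "-" and ref_aln[j] != "-":
                j += 1
            ref_pos += j - i
            if j - i >= min_gap:
                sites.append(GapSite(start, ref_pos, j - i, "est"))
            i = j
        elif ref_aln[i] == "-" and est_aln[i] != "-":
            # EST carries extra bases: candidate retained intron or extra exon
            j = i
            while j < n and ref_aln[j] == "-" and est_aln[j] != "-":
                j += 1
            if j - i >= min_gap:
                sites.append(GapSite(ref_pos, ref_pos, j - i, "reference"))
            i = j
        else:
            if ref_aln[i] != "-":
                ref_pos += 1
            i += 1
    return sites


ref = "AAAAACCCCCCCCCCGGGGG"
est = "AAAAA----------GGGGG"
print(find_gap_sites(ref, est, min_gap=5))
```

Here the 10-nt block missing from the EST is reported as one site spanning reference positions 5 to 15. A real pipeline would also require high identity in the flanking regions so that poor alignments are not mistaken for splicing events.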
2.2 Features of Web Interface in PALS db
Biologists can query genes in PALS db either by number of AS sites or by keyword. The keyword search offers five fields (Figure 1); the user chooses one and enters a keyword to search PALS db. In addition, optional parameters help refine the results: researchers can choose the species of interest (Human, Mouse, or both) and lower the thresholds of identity and alignment length to retrieve more results.
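How the server applies these options is not described. As a rough illustration only, the optional parameters can be modeled as filters over supporting alignments. In the sketch below, the record layout `SupportingHit`, the function `filter_hits`, and the threshold values are hypothetical and do not reflect the actual PALS db schema or the names of its five search fields.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass
class SupportingHit:
    gene: str          # hypothetical searchable field: gene name or symbol
    species: str       # "Human" or "Mouse"
    identity: float    # percent identity of the alignment
    align_len: int     # alignment length in nucleotides


def filter_hits(hits: Iterable[SupportingHit], keyword: str, species: Set[str],
                min_identity: float, min_align_len: int) -> List[SupportingHit]:
    """Keep hits matching the keyword, chosen species, and both thresholds."""
    kw = keyword.lower()
    return [h for h in hits
            if kw in h.gene.lower()          # case-insensitive substring match
            and h.species in species
            and h.identity >= min_identity
            and h.align_len >= min_align_len]


hits = [
    SupportingHit("CD44", "Human", 98.0, 400),
    SupportingHit("CD44", "Human", 92.5, 80),
    SupportingHit("Cd44", "Mouse", 97.0, 300),
]
strict = filter_hits(hits, "cd44", {"Human"}, 95.0, 100)
relaxed = filter_hits(hits, "cd44", {"Human"}, 90.0, 50)
print(len(strict), len(relaxed))
```

Because every condition is a lower bound or a membership test, lowering either threshold or adding a species can only keep or add records, never remove them, which is why relaxed settings return more results.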
Figure 1: In keyword search, biologists can select one field and type the keywords to search against PALS db.
Figure 2: A sample graphical display of a query result.
2.3 Display of Results
After a search is submitted, a table briefly describing the query results is returned. Clicking the picture in the "all seq info" field opens the graphical view (Figure 2 shows a sample). The sky-blue block in the graph marks the protein-coding region. Holding the mouse over a bar for about 2 seconds shows a brief description of the tissue and cell status in the browser's status bar. A table of hyperlinks above the graph can be used for further analysis of the gene.
3 Discussion
PALS db has several features that distinguish it from other AS-related databases. First, to avoid information loss, we searched for AS sites using mRNA sequences as references instead of the human genome draft sequence, which is only about 90% complete and half of which is still of draft quality. Some AS-related databases that use genomic sequence as the reference discover AS sites in only a subset of known genes. In the future, we will align our results to the genomic sequence to obtain additional information. Second, the PALS db query interface was designed to let biologists begin their research from the AS sites of genes. Biologists can find the AS sites of genes of interest, examine the supporting sequences, and analyze a gene further through hyperlinks to genetic disorder and marker databases, protein domain databases, and other biomedical databases. We hope that biologists will discover interesting phenomena using PALS db.
Acknowledgements
Yen-hua Huang and Yu-tin Chen were supported by grants from the National Science Council, Taiwan (NSC 89-2323-B-010-003 and NSC 89-2318-B-010-011-M51, respectively). The computational resources were supported by the Ministry of Education, ROC (Program for Promoting Academic Excellence of Universities, 89-B-FA22-2-4).
References
[1] Altschul, S.F., Gish, W., Miller, W., Myers, E.W., and Lipman, D.J., Basic local alignment search tool, J. Mol. Biol., 215(3):403–410, 1990.
[2] Boguski, M.S., Lowe, T.M., and Tolstoshev, C.M., dbEST–database for “expressed sequence tags”, Nat. Genet., 4:332–333, 1993.
[3] http://palsdb.ym.edu.tw/