BMC Bioinformatics
BioMed Central
Open Access
Meeting abstract
LSI based framework to predict gene regulatory information Sujoy Roy1,2, Lijing Xu1,2 and Ramin Homayouni*1,2 Address: 1Department of Biology, University of Memphis, Memphis, TN 38152, USA and 2Bioinformatics Program, University of Memphis, Memphis, TN 38152, USA Email: Ramin Homayouni* -
[email protected] * Corresponding author
from UT-ORNL-KBRIN Bioinformatics Summit 2009 Pikeville, TN, USA. 20–22 March 2009 Published: 25 June 2009 BMC Bioinformatics 2009, 10(Suppl 7):A5
doi:10.1186/1471-2105-10-S7-A5
UT-ORNL-KBRIN Bioinformatics Summit 2009
Eric C Rouchka and Julia Krushkal Meeting abstracts – A single PDF containing all abstracts in this Supplement is available
here. http://www.biomedcentral.com/content/pdf/1471-2105-10-S7-info.pdf
This abstract is available from: http://www.biomedcentral.com/1471-2105/10/S7/A5 © 2009 Roy et al; licensee BioMed Central Ltd.
Background Latent Semantic Indexing (LSI), a vector space model for information retrieval, has shown promise in predicting functional relationships between genes using textual information in MEDLINE abstracts. The underlying principle is that genes may be represented as document vectors in a multi-dimensional hyperspace, and the conceptual relationship between any two genes is determined by the cosine of the angle between their vectors [1]. In this study, we sought to extend this concept for identification of putative transcription factors (TFs) that regulate a group of co-regulated genes. We hypothesized that co-expressed genes identified by microarray experiments are functionally related and that at least some of these genes have previously been linked explicitly or implicitly to TFs in the literature. A transcriptional module is then defined as a set of genes clustered together in LSI space with closely related TFs (Figure 1). We devised a framework using these assumptions to identify transcriptional modules from microarray and promoter motif data (Figure 2). The framework requires as input, co-expressed genes from a microarray dataset and a set of TFs that have consensus motifs in the promoter regions of the co-expressed genes. Usually the set of such motif-derived TFs is large and makes the identification of the critical ones difficult. The framework first identifies functionally related clusters of co-expressed genes based on their latent relationships from literature, and then adds to each cluster TFs that are closely associated with the genes in the cluster. The putative transcriptional modules are ranked based on the
degree of relative literature coherence amongst the entities in them.
Results and discussion The LSI-based algorithm allows prediction of TFs based on latent (implicit) relationships in the literature. A preliminary evaluation of our method using previously published knock-out experiments revealed that it has reasonable recall and precision. A more rigorous evaluation of the method will require several additional TF knock-out microarray experiments. This work provides proof of principle that the combination of motif analysis
Figure Gene clusters 1 and transcriptional modules in LSI space Gene clusters and transcriptional modules in LSI space. Gene document vectors (blue) are clustered together in LSI space based on the closeness of the angle between them. Next, transcription factor vectors (green) are added to the clusters if their cosine value is close to the average cosine of the gene cluster.
Page 1 of 2 (page number not for citation purposes)
BMC Bioinformatics 2009, 10(Suppl 7):A5
http://www.biomedcentral.com/1471-2105/10/S7/A5
Figure 2 of Framework Overview Overview of Framework.
and LSI may be used to identify putative transcriptional modules from microarray data.
References 1.
Homayouni R, Heinrich K, Wei L, Berry MW: Gene clustering by latent semantic indexing of MEDLINE abstracts. Bioinformatics 2005, 21(1):104-115.
Publish with Bio Med Central and every scientist can read your work free of charge "BioMed Central will be the most significant development for disseminating the results of biomedical researc h in our lifetime." Sir Paul Nurse, Cancer Research UK
Your research papers will be: available free of charge to the entire biomedical community peer reviewed and published immediately upon acceptance cited in PubMed and archived on PubMed Central yours — you keep the copyright
BioMedcentral
Submit your manuscript here: http://www.biomedcentral.com/info/publishing_adv.asp
Page 2 of 2 (page number not for citation purposes)