Reconciling Bottom-Up and Top-Down Reconstruction of Regulatory Networks
Markus J. Herrgård, Markus W. Covert, Shankar Subramaniam, and Bernhard Ø. Palsson
Department of Bioengineering, University of California, San Diego, 9500 Gilman Dr., La Jolla, CA 92093-0412, USA
Abstract
Fig. 1: Regulatory network elements studied in this work. Panels: target-regulator unit (TRU), regulon, feedforward loop.
[Figure labels: Yeast; Amino acid biosynth.; Nitrogen util.; Oxygen response]
For each instance of the individual network elements present in the network, we computed a consistency measure between a particular gene expression data set and the network element structure. The measures we used were Pearson correlation coefficients for pairwise interactions, multiple coefficients of determination for TRUs, average within-regulon correlation for regulons, and partial correlation coefficients for feedforward loops. Other types of measures, such as mutual information, were also explored, but in general the results did not depend significantly on the measure chosen. The statistical significance of a particular value of a consistency measure was determined by suitably randomizing the network structure to obtain a null distribution for the measure [2].
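The pairwise case of the procedure above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the dictionary layout of the expression data, and in particular the randomization scheme (replacing the target with a randomly drawn gene) are assumptions, since the poster only says the network structure was "suitably" randomized.

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation coefficient between two expression profiles."""
    return float(np.corrcoef(x, y)[0, 1])

def interaction_consistency(expr, regulator, target, sign, n_random=1000, seed=0):
    """Signed correlation of one regulator-target pair and an empirical P value.

    expr : dict mapping gene name -> 1D array of expression values (hypothetical layout)
    sign : +1 for an activator, -1 for a repressor
    The null distribution is built by pairing the regulator with randomly
    chosen other genes (an assumed randomization scheme).
    """
    rng = np.random.default_rng(seed)
    observed = sign * pearson(expr[regulator], expr[target])
    pool = [g for g in expr if g not in (regulator, target)]
    picks = rng.integers(len(pool), size=n_random)
    null = np.array([sign * pearson(expr[regulator], expr[pool[i]]) for i in picks])
    # One-sided empirical P value with the usual +1 correction.
    p_value = (1 + np.sum(null >= observed)) / (1 + n_random)
    return observed, float(p_value)
```

Multiplying by the sign means an activator is only counted as consistent when the correlation is positive, and a repressor only when it is negative, which is how the sign of the correlation can be taken into account.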
Data sources

Regulatory networks
We utilized the recently published database and literature-derived regulatory network structures for yeast [1] (108 regulatory genes, 414 target genes, 931 interactions) and E. coli [2] (123 regulatory genes, 721 target genes, 1367 interactions).

Gene expression data
For yeast we used five separate gene expression data sets spanning a wide range of experimental conditions [3-7] (641 experiments). For E. coli we combined all the data from the ASAP database (http://www.genome.wisc.edu/functional/microarray.htm) and the Stanford Microarray Database into three separate data sets [8-11] (108 experiments).

1. Guelzim, N. et al. Nat. Genet. 31, 60-3 (2002).
2. Shen-Orr, S.S. et al. Nat. Genet. 31, 64-8 (2002).
3. Spellman, P.T. et al. Mol. Biol. Cell 9, 3273-97 (1998).
4. Hughes, T.R. et al. Cell 102, 109-26 (2000).
5. Roberts, C.J. et al. Science 287, 873-80 (2000).
6. Gasch, A.P. et al. Mol. Biol. Cell 12, 2987-3003 (2001).
7. Gasch, A.P. et al. Mol. Biol. Cell 11, 4241-57 (2000).
8. Khodursky, A.B. et al. Proc. Natl. Acad. Sci. USA 97, 12170-5 (2000).
9. Khodursky, A.B. et al. Proc. Natl. Acad. Sci. USA 97, 9419-24 (2000).
10. Lee, K., Bernstein, J.A. & Cohen, S.N. Mol. Microbiol. 46, 295 (2002).
11. Courcelle, J. et al. Genetics 158, 41-64 (2001).
The percentage of coherent regulons is high, indicating that known co-regulated gene groups can indeed often be identified from gene expression data. In yeast, repressor-controlled regulons are less coherent than activator-controlled ones, whereas in E. coli the situation is reversed.
Fig. 5: Percentage of coherent regulons (in at least one data set) classified by the type of regulator. Panels: Yeast, E. coli.
[Figure labels: Cell cycle; Flagellar biosynth.; Mating; Stress response; Carbon util.]
Yeast central metabolism
Fig. 2: Interactions, TRUs, and regulons consistent with at least one gene expression dataset at P < 0.01.
[Figure labels: Purine biosynth.; Capsule biosynth.; Oxygen response]
Methods
[Fig. 1 panel label: Regulatory interaction]
Regulons
Overall results
The reconstruction of transcriptional regulatory networks is an important step towards building integrated genome-scale models of cells [1]. Primary literature and information in databases for well-studied organisms such as E. coli and S. cerevisiae allow one to reconstruct well-curated models of regulatory networks in a “bottom-up” fashion. Genome-scale experimental approaches such as gene expression profiling and location analysis enable reconstruction directly from data in a “top-down” fashion. However, there have not been any large-scale efforts to integrate the bottom-up and top-down reconstruction strategies. As an initial step towards such integration, we perform a comprehensive evaluation of the consistency of publicly available gene expression data sets with known regulatory networks in S. cerevisiae and E. coli. We also demonstrate the utility of gene expression and location analysis data in reconstructing the transcriptional regulatory network of central metabolism in yeast.
Pairwise interactions
A relatively small number (less than 10% at P < 0.01) of pairwise interactions are supported by one or more of the gene expression data sets in either organism if the sign of the correlation is taken into account. In particular, virtually none of the repressor-target interactions are consistent with gene expression data. Feedforward loops could potentially lead to an overestimation of the number of consistent pairwise interactions. Although there is a significant number of feedforward loops in both networks (240 in yeast, 206 in E. coli), accounting for feedforward loops leads to only a 1-2 percentage point change in the fraction of consistent interactions.
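To make the feedforward-loop correction concrete, the sketch below enumerates feedforward loops (X regulates Y, and both X and Y regulate Z) in a list of directed edges and computes the first-order partial correlation between X and Z given Y. The edge-list format and function names are hypothetical, and the standard first-order partial correlation formula is assumed; the poster does not state which variables were conditioned on.

```python
import numpy as np

def feedforward_loops(edges):
    """Return sorted (x, y, z) triples with edges x->y, y->z and x->z."""
    targets = {}
    for regulator, target in edges:
        targets.setdefault(regulator, set()).add(target)
    loops = []
    for x, x_targets in targets.items():
        for y in x_targets:
            if y == x:  # skip autoregulation
                continue
            for z in targets.get(y, ()):
                if z not in (x, y) and z in x_targets:
                    loops.append((x, y, z))
    return sorted(loops)

def partial_corr(x, z, y):
    """Correlation between x and z after removing the linear effect of y."""
    r = np.corrcoef([x, y, z])
    r_xy, r_xz, r_yz = r[0, 1], r[0, 2], r[1, 2]
    return float((r_xz - r_xy * r_yz) / np.sqrt((1 - r_xy**2) * (1 - r_yz**2)))
```

If Z responds to X only through Y, the plain correlation between X and Z can be high while the partial correlation is close to zero, which is the kind of overestimation the correction is meant to catch.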
Target-regulator units
Accounting for the effects of multiple regulators acting on one gene yielded a higher fraction of consistent network elements. In yeast, the percentage of consistent TRUs varies significantly depending on the number of regulators a target gene has. Classifying the TRUs by the functional class reveals a high degree of variability in consistency depending on the functional class of the target.
Fig. 3: Percentage of consistent TRUs in yeast. TRUs with four regulators are in general best supported by gene expression data.
Fig. 4: Percentage of consistent TRUs in E. coli classified by the functional class of the target.
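For a TRU, the multiple coefficient of determination measures how much of the target's expression variance is explained by all of its regulators together. A minimal sketch, assuming an ordinary least-squares linear fit with an intercept (the function name and input layout are hypothetical):

```python
import numpy as np

def tru_r_squared(target, regulators):
    """Multiple R^2 of a target profile regressed on its regulators' profiles.

    target     : 1D array of expression values for the target gene
    regulators : list of 1D arrays, one per regulator, same length as target
    """
    y = np.asarray(target, dtype=float)
    columns = [np.ones(len(y))] + [np.asarray(r, dtype=float) for r in regulators]
    X = np.column_stack(columns)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    residual = y - X @ coef
    ss_res = residual @ residual
    ss_tot = np.sum((y - y.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)
```

Because R^2 from a least-squares fit cannot decrease when a regulator is added, a raw R^2 value is not comparable across TRUs of different sizes; comparing each TRU against a null distribution built from TRUs with the same number of regulators is one way to keep the comparison fair.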
A literature-based reconstruction of the transcriptional regulation of central metabolism in yeast has been performed (53 transcription factors, 163 regulated genes, and 162 regulatory interactions) [3]. In order to validate and expand this network we utilized expression profiles for 13 yeast strains with a transcription factor coding gene deleted or overexpressed and genome-wide location analysis data for 41 transcription factors [4]. The expression and location data provided support for 60 (37%) of the known regulatory interactions. The data together with known cell physiology could be used to expand the network to include an additional 46 interactions (28% increase).
Fig. 6: Numbers of regulatory interactions (total 208) derived from different data sources: Literature (162), Location analysis (70), Deletion profiles (69). Region counts shown in the diagram: 23, 14, 102, 16, 17, 30, 6.
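The numbers attached to Fig. 6 can be cross-checked against the text. The sketch below assumes the seven counts are the regions of a three-set Venn diagram and searches for assignments that reproduce the stated set sizes; the region names are hypothetical labels, and the resulting assignment is an inference from the totals, not something the poster states.

```python
from itertools import permutations

COUNTS = [23, 14, 102, 16, 17, 30, 6]  # numbers shown in Fig. 6
REGIONS = ("lit_only", "loc_only", "del_only",
           "lit_loc", "lit_del", "loc_del", "all_three")

def consistent_assignments(counts, lit=162, loc=70, dele=69):
    """Assign counts to Venn regions so that each set has its stated size."""
    solutions = []
    for perm in permutations(counts):
        r = dict(zip(REGIONS, perm))
        lit_total = r["lit_only"] + r["lit_loc"] + r["lit_del"] + r["all_three"]
        loc_total = r["loc_only"] + r["lit_loc"] + r["loc_del"] + r["all_three"]
        del_total = r["del_only"] + r["lit_del"] + r["loc_del"] + r["all_three"]
        if (lit_total, loc_total, del_total) == (lit, loc, dele):
            solutions.append(r)
    return solutions
```

Under these assumptions exactly one assignment fits: 102 interactions known from literature only, 60 literature interactions also seen in at least one data type (14 + 30 + 16, which is 37% of 162), and 46 interactions seen only in the data (23 + 6 + 17, a 28% increase over 162), for a total of 208.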
Conclusions
• The consistency between known network structures and gene expression data is in general low, probably due to the lack of variability in the expression levels of transcription factors.
• For specific subsystems, such as nitrogen utilization in yeast, integrating bottom-up reconstruction with top-down approaches appears to be feasible.
• Deletion profile and location analysis data can be used to significantly expand known regulatory networks, even for well-understood subsystems.
References
1. Covert, M.W. & Palsson, B.Ø. Transcriptional Regulation in Constraints-based Metabolic Models of Escherichia coli. J. Biol. Chem. 277, 28058-64 (2002).
2. Herrgård, M.J., Covert, M.W. & Palsson, B.Ø. Reconciling Gene Expression Data with Known Genome-Scale Regulatory Network Structures. Submitted.
3. Herrgård, M.J. & Palsson, B.Ø. Integrated model of S. cerevisiae central metabolism and its transcriptional regulation. In preparation.
4. Lee, T.I. et al. Transcriptional regulatory networks in S. cerevisiae. Science 298, 799-804 (2002).
For more information see http://systemsbiology.ucsd.edu.
Acknowledgments
The authors thank the Finnish Fulbright Center (Graduate scholarship to MH), the National Science Foundation (BES 01-20363), and the National Institutes of Health (GM57089) for their support.