Molecular Systems Biology Peer Review Process File
A framework for mapping, visualisation and automatic model creation of signal transduction network
Carl-Fredrik Tiger, Falko Krause, Gunnar Cedersund, Robert Palmér, Edda Klipp, Stefan Hohmann, Hiroaki Kitano, Marcus Krantz
Corresponding author: Marcus Krantz, Humboldt-Universität zu Berlin
Review timeline:
Submission date: 07 July 2011
Editorial Decision: 17 August 2011
Revision received: 16 November 2011
Editorial Decision: 14 December 2011
Revision received: 08 March 2012
Accepted: 16 March 2012
Transaction Report: (Note: With the exception of the correction of typographical or spelling errors that could be a source of ambiguity, letters and reports are not edited. The original formatting of letters and referee reports may not be reflected in this compilation.)
1st Editorial Decision
17 August 2011
Thank you again for submitting your work to Molecular Systems Biology. We have now heard back from the three referees who agreed to evaluate your manuscript. As you will see from the reports below, the referees find the topic of your study of potential interest. However, they raise substantial concerns on your work, which, I am afraid to say, preclude its publication in its present form. The first two reviewers provided divided recommendations, with the first reviewer rather positive and the second reviewer clearly negative. Nonetheless, the specific points these two reviewers raise are largely congruent, with both clearly indicating that more work is needed to compare this framework to other existing frameworks, clarify the novel aspects, and provide clearer connections to the existing literature on the topic. Some of these concerns can probably be addressed with some additional discussion, but other points are more fundamental. For example, the reviewers had clear concerns about the generality of this framework, i.e. its ability to handle more diverse reaction types and localization information, and convincingly addressing these concerns may require additional analysis and/or further development of this framework. The last reviewer, an expert on yeast MAPK signaling was generally supportive, and recognized the value of the signaling network description included in this work, but also felt that localization information would be an important addition to this model. If you feel you can satisfactorily deal with these points and those listed by the referees, you may wish to submit a revised version of your manuscript. Please attach a covering letter giving details of the way in which you have handled each of the points raised by the referees. A revised manuscript will be once again subject to review and you probably understand that we can give you no guarantee at this stage that the eventual outcome will be favorable.
© European Molecular Biology Organization
*** PLEASE NOTE *** As part of the EMBO Publications transparent editorial process initiative (see our Editorial at http://www.nature.com/msb/journal/v6/n1/full/msb201072.html), Molecular Systems Biology will publish online a Review Process File to accompany accepted manuscripts. When preparing your letter of response, please be aware that in the event of acceptance, your cover letter/point-by-point document will be included as part of this File, which will be available to the scientific community. More information about this initiative is available in our Instructions to Authors. If you have any questions about this initiative, please contact the editorial office ([email protected]).

Yours sincerely,
Editor - Molecular Systems Biology
[email protected]
--------------------------------------------------------------------------
Referee reports:

Reviewer #1 (Remarks to the Author):

I'm very enthusiastic but this manuscript needs revisions, mainly to compare and contrast the present work with earlier related work. The authors introduce new types of graphs and other data structures (e.g., tables for lists of reactions and lists of contingencies). These data structures are useful for visualizing/storing information about protein-protein interactions with consideration of the subunits, domains, linear motifs, and posttranslational modifications involved in the interactions, i.e., the site-specific details. The data structures are all introduced in the various panels of Figure 1. The rest of the figures in the manuscript illustrate how these data structures can be applied to visualize a large collection of information about the site-specific details of interactions of signaling proteins in yeast. Software, called YeastMAP, is provided by the authors for drawing the various types of graphs. The input data for YeastMAP can be used not only to draw graphs but also to automatically construct a specification of a rule-based model in the BioNetGen language. A rule-based model for a cell signaling system is a model in which proteins are treated as agents that interact according to rules, which are specified using a programming-like language and that are consistent with certain physicochemical principles. Thus, the various maps shown in the figures have a direct and unambiguous relationship with an executable model. However, the figures alone do not completely specify an executable model, in part because parameter estimates are needed for a complete specification. Kitano and co-workers are well known for constructing impressively large process diagrams for various cell signaling systems.
One of these diagrams is often included in a slide deck when a speaker wants to impress her audience with the complexity of cell signaling. The graphs/data structures described in this report represent an important advance on process diagram representation of cell signaling systems in that there is now explicit consideration of the parts of proteins responsible for interactions and the contextual requirements that must be satisfied for interactions to occur (i.e., the elements of the authors' contingency list). The authors identify three goals addressed by their work: "...(i) unambiguously describing the network, (ii) visualising it without simplifications or unsupported assumptions, and (iii) automatically generating mathematical models from knowledge in databases." These goals are very similar to those of work recently reported online that is not cited: Chylek LA et al. (2011) Mol. BioSyst. doi:10.1039/c1mb05077j There is also earlier highly related work of Kohn and co-workers that should probably also be acknowledged - see Kohn KW et al. (2006) Mol. Syst. Biol. 2, 51. The authors present some new ideas and I am very enthusiastic about this manuscript but there is inadequate comparison of the authors' ideas with other ideas in the literature. For example, it would be very interesting to see how the graphs used by the authors to visualize their yeast interaction network compare to an extended contact map visualization of the same network (see Chylek et al., 2011). The rule-based model reported by the authors, which can be found in the model.bngl file of the
supplementary material, is closely related to a rule-based model developed by Ty Thomson, which can be found at this URL: http://yeastpheromonemodel.org/wiki/Main_Page This work should probably be cited. The Thomson model is more advanced than the model described in the manuscript, in that the Thomson model provides parameter estimates, whereas the parameter values are all 1 in the model.bngl file of the supplementary material. The Thomson model is also more advanced in that references are given to justify the model specification in a typical scholarly fashion, whereas in this manuscript, only a reading list of >180 papers is provided in the Materials and Methods section. I know that these authors have justified their process diagrams with such reading lists in past publications, but I find it to be regrettable scholarship. It is really impossible to check the thinking that went into the model specification without more information about how the model specification relates to the papers in the reading list. I actually find the reading list worse than nothing at all because the long list of references from the reading list distract from other references cited in the normal scholarly way in the main text. The contingency matrix concept illustrated in Figure 1E seems to be related to the state transition table concept of the MFA formalism of Yang J et al. (2010) IET Syst. Biol. 4, 453-466. The reaction graph concept of Figure 1F is related to the extended contact map concept (Chylek et al., 2011). The regulatory graph concept of Figure 1G seems related to the concepts of an influence map (with reactions added) and/or a story - see Danos V et al. (2007) Lect. Notes Comput. Sci. 4703, 17-41. The authors introduce two input lists, a reaction list (which can be viewed as a list of context-free rules) and a contingency list (which indicates contextual constraints). 
These lists are somehow related to the rules of the model.bngl file in the supplementary material, but the mapping is not explained in sufficient detail. Is the mapping one-to-one? Given a specification of a model in the BioNetGen language, can I unambiguously obtain a reaction list and a contingency list? Or does the mapping only go one way in the other direction? The interactions or reaction types considered by the authors are all reversible association reactions (labeled as ppi reactions) or phosphorylation reactions (labeled as P+ reactions). These are important types of interactions, but there are many other types of reactions that take place in cell signaling systems. The authors should comment on how at least a few other types of reactions should be handled within their framework. How are multiple chemically indistinguishable components, such as the two antigen-combining sites of an IgG antibody, treated in the authors' framework? I'm not sure that such cases can be captured in a contingency matrix (Figure 1E). For example, one can show that for A to bind B, A can't have already bound B, which makes sense. However, what should be done if A can interact with either of two B's? I don't understand the circular layout of components of proteins. Is some unstated convention being followed? It seems to me that a linear layout from the N-terminus to the C-terminus is natural for the component parts of a protein. A circular layout could be a sensible compact alternative if there is, for example, a convention that puts the most N-terminal component at 12 o'clock and then adds other components clockwise. But it seems like the layout of component parts is arbitrary. The authors might want to rethink the following comment: "Similarly, review articles proved useful for orientation but not for mapping, as review papers tend to propagate more than merely solid facts." 
Adding legends to the figures to explain symbols might be a little helpful for some readers even though the symbols are explained in the captions. Typos/grammar issues: p. 15: "This is fully captured be the combined reaction and contingency information." by, not be. p. 4 "This is due to the requirement that all modification and interaction partners of are defined in the specific states..."
Partners of what?

p. 7: "These states in turn correspond to the two next last reactions in the reaction lists (Fig 1C)." Next to last, or what? This sentence is confusing.

Reviewer #2 (Remarks to the Author):

Tiger et al. propose a framework for generating models of signal transduction based on two types of elements, reactions and contingencies, and present various visualization options their approach provides. Modeling signaling networks is a fundamental and active field of systems biology, where novel approaches are required, so the article addresses a relevant question. However, the authors do not clearly place the contribution in the context of the current state of the art, in particular between the (quite active and broad) field of rule-based modeling and the Systems Biology Graphical Notation. While the approach presents interesting solutions to visualize signaling networks, it seems to be based on already existing foundations, rather than a novel conceptual framework. The modeling basis is rule-based modeling. The visualization challenges correspond to those that SBGN tries to address (in particular the Entity-relationship, see below). The authors should place their contribution precisely in the context of these 2 ongoing efforts, as outlined below. This paper has three results: conceptual framework, software, and a curated network. We don't think the conceptual framework is novel; more focus should be put on the tool and/or the network.

Specific comments

Major

* The article contains many confusing sentences, and uses ambiguous terms with overloaded meanings. For example, in the title and abstract, the word "Mapping" is used. This word is overloaded with many meanings, and it is not very clear immediately what is meant exactly by mapping in the context of this paper. The sentence "more concise mapping adapted to experimental data" - does this refer to mapping experimental data onto pathways, such as many other tools do (e.g. GenMAPP)?
If not, in what way is the mapping adapted to experimental data? What type of experimental data is referred to here? This should be explained better. Similarly, the word framework: does this refer to a software framework or a conceptual framework? Also the authors use the term "data structure", but it may not be obvious to a general reader what is meant. They need to explain upfront what they mean by this.
* The mathematical basis of this formalism seems to be rule-based modeling. The authors should describe better how their work relates to the multiple efforts in this field. Basically, what they call reactions seem to us to correspond to rules. The contingencies are the interrelationships among the rules. The work of Faeder/Blinov/Hlavacek, Feret/Danos/Fontana, Conzelmann/Gilles, Kholodenko, etc. addresses this, and how it relates to the framework presented here should be discussed. Some of these groups also develop visual tools to set up these models, and the differences to the proposed approach should be shown. Also, there are online databases of rule-based models. The authors seem to claim the opposite?: "the classical rule-based modeling frameworks lack all the database properties of our framework"
* Why did the authors not include changes in localisation of proteins? This is a fundamental event in signal transduction, and indeed data is lately becoming available to consider this. Novel developments in rule-based modeling are addressing this type of event.
* As the authors state, SBGN is a new standard for graphically representing biological processes. SBGN has three sub-languages, namely process description (PD), activity flow (AF) and entity relationship (ER). The authors contrast their approach to PD and AF, but oddly enough, not to ER.
This is odd because, of the three sub-languages, ER is the one that is most suitable to represent rule-based modelling. ER can represent complex formation, phosphorylation and contingencies without problem. The authors must contrast their approach to ER, and either implement an ER version of the example or at least explain why this is not possible. The work of Kohn et al. on dealing with the combinatorial explosion in MIM (which is very similar to SBGN-ER) could be helpful in this regard (Kohn et al., Mol Syst Biol. 2006;2:51. Epub 2006 Oct 3.)
* Since the tool is named YeastMap, it appears that it's suitable only for yeast networks, but the article aims at presenting a general framework. How hard would it be to adapt the tool to other species, and what are the intentions of the authors in this regard? Was YeastMap specifically created to support the framework presented in this article, or is it a side project?
* Finally, is the software that goes with this article open source? Either way, this should be stated explicitly. We would highly recommend that the algorithms and software used be published under an open source license, otherwise the use of this framework will be extremely limited. What is needed to run the software?

Minor

* Since its first publication, SBGN has been under continual active development. The authors rightfully point to certain problems with representing combinatorial complexity in SBGN, but that doesn't mean that these issues could not be addressed in future versions. It would be nice if the authors acknowledged that SBGN is not set in stone, and that issues brought forward here could be used to improve SBGN in the future.
* Figures 3 and 4 contain too much detail to make them readable. Sup Figs 1,2,3 appear to be provided not as pdf but as xml.
* Introduction: use of (i) and (ii) (iii) and referring to this later on is a bit confusing.
The meaning is clear with some extra thought but that is putting a lot of work on the shoulders of the reader. What would be much clearer is to state explicitly that these requirements will be referred back to later in the text, and then refer to them as "requirement i", "requirement ii", etc.
* In the results section "We first distil the available knowledge" -> distil should be distill

Reviewer #3 (Remarks to the Author):

These authors developed a new framework to organize a large body of information typical of eukaryotic cellular signal transduction. As a demonstration, they applied the method to the budding yeast MAP kinase signaling pathways, which are arguably the best-characterized eukaryotic signaling pathways. Their survey of the literature is comprehensive, and the resulting model depicts the current state of the collective knowledge. From a biologist's point of view, it would have been even more useful if they had also incorporated the "where" aspect. Hog1 in the cytoplasm and Hog1 in the nucleus could be quite different in their function. I presume that it can be easily done in this framework by assigning different states to Hog1, but a more explicit (visual) representation of subcellular localization might be useful for biologists.
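Editorial illustration (not part of the referee reports or of the authors' software): the data structures discussed by the referees, a reaction list, a contingency list and the contingency matrix that combines them, can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions. The class and function names, the state labels such as `B-{P}`, the symbols `!` (required) and `x` (inhibiting), the `.` placeholder and the localisation state `B-{nucleus}` are all hypothetical choices made here; they are not claimed to match the rxncon file format. The localisation state only illustrates Reviewer #3's suggestion of treating location as one more state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reaction:
    """One row of a reaction list: an elemental reaction between two components."""
    source: str  # component that acts
    kind: str    # "ppi" (reversible association) or "P+" (phosphorylation)
    target: str  # component acted upon

    @property
    def name(self) -> str:
        return f"{self.source}_{self.kind}_{self.target}"

    @property
    def state(self) -> str:
        """Elemental state the reaction produces (labels are illustrative)."""
        if self.kind == "ppi":
            return f"{self.source}--{self.target}"
        if self.kind == "P+":
            return f"{self.target}-{{P}}"
        raise ValueError(f"unsupported reaction type: {self.kind}")


def contingency_matrix(reactions, contingencies, extra_states=()):
    """Build a reaction-by-state table; '.' marks cells with no listed contingency."""
    # Columns: one state per reaction, plus any extra states (e.g. localisation).
    states = list(dict.fromkeys([r.state for r in reactions] + list(extra_states)))
    matrix = {r.name: {s: "." for s in states} for r in reactions}
    for reaction_name, symbol, state in contingencies:
        if reaction_name not in matrix:
            raise KeyError(f"unknown reaction: {reaction_name}")
        if state not in states:
            raise KeyError(f"unknown state: {state}")
        matrix[reaction_name][state] = symbol
    return matrix


# Hypothetical toy network; A, B and C are placeholders, not yeast proteins.
reactions = [Reaction("A", "P+", "B"), Reaction("B", "ppi", "C")]
contingencies = [
    ("B_ppi_C", "!", "B-{P}"),       # assumed: binding requires phosphorylated B
    ("A_P+_B", "x", "B-{nucleus}"),  # assumed: phosphorylation blocked for nuclear B
]
matrix = contingency_matrix(reactions, contingencies, extra_states=["B-{nucleus}"])

for name, row in matrix.items():
    print(name, row)
```

Keeping reactions and contingencies in separate lists, as in this sketch, means that a new kind of state adds one column to the table rather than multiplying the number of explicitly enumerated species.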
1st Revision - authors' response
16 November 2011
Please find our revised manuscript entitled: “A framework for mapping, visualisation and automatic model creation of signal transduction network” attached. We have carefully revised the manuscript according to your and the reviewers’ recommendations, as detailed below. In particular, we have:
1) Expanded the comparison to previous work, by completely reworking the introduction to more clearly place our contribution in relation to the state of the art in visualisation and rule based modelling, and by including comparisons of our results to previous work throughout the manuscript.
2) Clarified the novel aspect, and especially that we provide a framework that integrates three critical levels of network analysis: Network definition, visualisation and mathematical modelling.
3) Clarified in the main text how the reactions and contingencies are individually referenced in the network definition (Table S1) by PubMed identifiers. This network is probably among the most stringently referenced reconstructions, as each entry is linked to one or more primary research papers after careful evaluation.
4) Illustrated the generality of the framework. We have included the new Table 1 with the reactions used in the MAP kinase network and explain that the format is flexible and easily extendable. In fact, the list of implemented reactions has already grown since submission, as explained in the reply to reviewer #3. The framework is completely organism independent, and we have renamed the software tool rxncon to avoid giving any other impression.
5) Explained that we decided not to include relocalisation in the MAP kinase network mapping as the information on regulation of (re)localisation was too sparse, but that the format can easily be extended to encompass spatial information. In fact, as detailed in the reply to reviewer #3, we have started to work on localisation reactions and several are already fully implemented.
Taken together, these and the other changes listed below have significantly improved the manuscript, and we would like to thank our reviewers for their constructive feedback. We hope that you will find the revised manuscript suitable for publication in Molecular Systems Biology.
Yours Sincerely,
Marcus Krantz on behalf of the authors
Reviewer #1 (Remarks to the Author):
I'm very enthusiastic but this manuscript needs revisions, mainly to compare and contrast the present work with earlier related work.
Answer: Thank you. We have expanded the comparison to previous efforts as specified below.

The authors introduce new types of graphs and other data structures (e.g., tables for lists of reactions and lists of contingencies). These data structures are useful for visualizing/storing information about protein-protein interactions with consideration of the subunits, domains, linear motifs, and post-translational modifications involved in the interactions, i.e., the site-specific details. The data structures are all introduced in the various panels of Figure 1. The rest of the figures in the manuscript illustrate how these data structures can be applied to visualize a large collection of information about the site-specific details of interactions of signaling proteins in yeast. Software, called YeastMAP, is provided by the authors for drawing the various types of graphs. The input data for YeastMAP can be used not only to draw graphs but also to automatically construct a specification of a rule-based model in the BioNetGen language. A rule-based model for a cell signalling system is a model in which proteins are treated as agents that interact according to rules, which are specified using a programming-like language and that are consistent with certain physicochemical principles. Thus, the various maps shown in the figures have a direct and unambiguous relationship with an executable model. However, the figures alone do not completely specify an executable model, in part because parameter estimates are needed for a complete specification.

Answer: Correct. The end product of this work is a network definition that is stringent enough to automatically convert to a mathematical model, and the framework (and software) that links this network definition to visualisation and modelling by automatic export. We prove this by generation of the BioNetGen input file. Trivial parameters (1) and initial amount (100) are only included to allow test simulations.
We make no attempts to further analyse this qualitative model.

Kitano and co-workers are well known for constructing impressively large process diagrams for various cell signaling systems. One of these diagrams is often included in a slide deck when a speaker wants to impress her audience with the complexity of cell signaling. The graphs/data structures described in this report represent an important advance on process diagram representation of cell signaling systems in that there is now explicit consideration of the parts of proteins responsible for interactions and the contextual requirements that must be satisfied for interactions to occur (i.e., the elements of the authors' contingency list). The authors identify three goals addressed by their work: "...(i) unambiguously describing the network, (ii) visualising it without simplifications or unsupported assumptions, and (iii) automatically generating mathematical models from knowledge in databases." These goals are very similar to those of work recently reported online that is not cited: Chylek LA et al. (2011) Mol. BioSyst. doi:10.1039/c1mb05077j There is also earlier highly related work of Kohn and co-workers that should probably also be acknowledged - see Kohn KW et al. (2006) Mol. Syst. Biol. 2, 51. The authors present some new ideas and I am very enthusiastic about this manuscript but there is inadequate comparison of the authors' ideas with other ideas in the literature. For example, it would be very interesting to see how the graphs used by the authors to visualize their
yeast interaction network compare to an extended contact map visualization of the same network (see Chylek et al., 2011).

Answer: We have completely reworked the introduction to better describe the current state of the art, and expanded the comparisons in the results section as we introduce our framework. Unfortunately, neither extended contact maps nor entity relationship diagrams are supported by software tools with text based network import, which precludes visualisation of the MAP kinase network in these formats. However, we now clearly state that the reaction and contingency information is suitable for visualisation in these formats. The updated text sections now read:

Introduction: “…These advantages are mirrored on the visualisation side by graphical reaction rules, which use the process description format to display individual rules (Blinov et al, 2006). Network level visualisation has used either topological contact maps (Danos, 2007) or entity relationship diagrams (Le Novere et al, 2009), and these complementary visualisation formats have recently been combined in the extended contact map (Chylek et al, 2011). Contact maps have software support, but neither entity relationship diagrams nor extended contact maps can be generated automatically from the rule based models. …”

Results: “We address the second issue; comprehensive visualisation, with two novel forms of visualisation; the contingency matrix and the regulatory graph. These also keep reactions and contingencies separate and hence avoid the combinatorial explosion and implicit assumptions. Both include the complete information about reactions (C1) and contingencies (C2). This data structure is also well suited for visualisation in entity relationship diagrams or extended contact maps, but these cannot be generated automatically (Chylek et al, 2011; Le Novere et al, 2011). …”

Results: “…The full reaction graph displays the domains and residues involved in each reaction.
The protein parts are independent nodes and defined as neighbours (proteins can have domains or residues, domains can have subdomains or residues, subdomains can have residues). The inclusion of domain information makes the reaction graph similar to the (extended) contact maps (Chylek et al, 2011; Danos, 2007). The reaction graph and contact maps are both purely topological and do not include any contextual information, in contrast to the extended contact map which e.g. may show that binding only occurs to phosphorylated residues. …”

Discussion: “…Furthermore, we show that this format is stringent and unambiguously defines both rule based models and graphical formats such as the activity flow diagram (condensed reaction graph) and process description formats of SBGN. We are also convinced that the information would suffice to define entity relationship diagrams and extended contact maps, and that these formats would be suitable to visualise the reaction and contingency information. However, automatic visualisation in these formats would require further software development. …”

The rule-based model reported by the authors, which can be found in the model.bngl file of the supplementary material, is closely related to a rule-based model developed by Ty Thomson, which can be found at this URL: http://yeastpheromonemodel.org/wiki/Main_Page This work should probably be cited. The Thomson model is more advanced than the model described in the manuscript, in that the Thomson model provides parameter estimates, whereas the parameter values are all 1 in the model.bngl file of the supplementary material.

Answer: We now include a direct comparison to this model by translating it to the rxncon format and regenerating the rule based model. This
translation is unambiguous in both directions and should in principle be possible to implement. We have included a section in the results describing the main conclusions on the translation process, a larger section in the supplementary material, the new Table S3 and the Supplementary file 2 that aligns the original and regenerated versions of the model. The added sections read:

Results: “…The expansion to rules is fully defined in our data format and the rxncon software tool automatically generates the input file for the computational tool BioNetGen (Blinov et al, 2004). This file can be used for rule-based modelling, network free simulation and creation of SBML files. The translation to and from the rule based format is unambiguous in both directions, and we illustrate this with translation of a rule based model of the pheromone response pathway (yeastpheromonemodel.org). This model contains lumped reactions which we translate to combinations of elemental reactions, resulting in a different equation structure but the same functionality given appropriate choice of rate constants (Table S3). Furthermore, we cannot distinguish different identical proteins in e.g. homodimers, and can therefore not define strict trans reactions within such dimers. Apart from these issues, we can reproduce the same model with only cosmetic/nomenclature differences (see supplementary material for details). …”

Supplementary material: “To show that the translation process between our format and the rule based format is bidirectionally unambiguous, we translated the yeastpheromonemodel.org model to the rxncon format (Table S3). This network contains 132 elemental reactions and 514 contingencies (compared to 222 and 313, respectively, in the curated MAP kinase network). Rule based reactions corresponding to single elemental reactions could generally be defined accurately in our format. Lumped reactions needed to be split into their composite elemental reactions.
In this particular model, they combined enzymatic modification with enzyme-substrate dissociation, and we consequently implemented this with the catalytic modification (P+/P), the protein-protein interaction (ppi) and a pair of contingencies which required the source state for the interaction and the interaction for modification. We then used this network to regenerate the rule based model and compared the two versions (Supplementary file 2). Note that the names differ between the original and regenerated version for reasons that will be explained below, but we have carefully compared them and annotated functional differences in the model file. We regenerate the same functionalities, with the following exceptions:
1. Lumped reactions remain separate, but the same functionality can be achieved by choosing appropriate rate constants.
2. We lack the ability to distinguish between the different parts of a homodimer. Hence, we cannot enforce trans reactions in the Ste5 scaffold.
There are also cosmetic differences between the models. The yeastpheromonemodel.org model assumes that each binding site can only have a single interaction partner and hence only indicates whether or not a site is occupied. In contrast, our more general format does not infer mutually exclusive binding from the domain names, and hence addresses this with contingencies. This leads to much longer protein definitions and many more contingencies, but should not alter model behaviour. Finally, we observed the following set of equations in the original model:

    Ptp(MAPK_site) + Fus3(docking_site, Y182~PO4) <-> Ptp(MAPK_site!1).Fus3(docking_site!1, Y182~PO4)
    Ptp(MAPK_site!1).Fus3(docking_site!1, Y182~PO4) -> Ptp(MAPK_site) + Fus3(docking_site, Y182~none)

which means that Ptp and Fus3 only interact (and dissociate!!) when Fus3 is phosphorylated on Tyr182, and that dephosphorylation is linked to dissociation. However, the spontaneous dephosphorylation:

    Fus3(Y182~PO4) -> Fus3(Y182~none)

can also occur within the complex. Hence, the system will slowly, depending on rate constants, accumulate an inactive Ptp-Fus3 dimer that cannot dissociate or become rephosphorylated, which is likely to be an oversight in the model creation. Taken together, rule definition directly in the rule based format is still more flexible than the framework we propose here. However, this framework allows definition of a rule based model and the translation is bidirectionally unambiguous. As the rxncon tool is developed, we expect this framework to be a potent tool also for rule based model definition.”

The Thomson model is also more advanced in that references are given to justify the model specification in a typical scholarly fashion, whereas in this manuscript, only a reading list of >180 papers is provided in the Materials and Methods section. I know that these authors have justified their process diagrams with such reading lists in past publications, but I find it to be regrettable scholarship. It is really impossible to check the thinking that went into the model specification without more information about how the model specification relates to the papers in the reading list. I actually find the reading list worse than nothing at all because the long list of references from the reading list distract from other references cited in the normal scholarly way in the main text.

Answer: Unfortunately, we failed to clearly indicate this link in the main text. We have in fact assigned one or more original research papers as reference to each reaction and contingency. The references are specified as PubMedIDs in the “PubMedIdentifier(s)” column of the reaction and contingency lists (Table S1).
Hence, this is one of the most carefully referenced networks available. We have now clarified this in the results and methods sections, which now read: Results: “…The degree of experimental evidence has been evaluated manually and individually for each entry, and references to primary research papers supporting each interaction have been included in the reaction and contingency lists (column “PubMedIdentifier(s)”). …” Methods: “The MAP kinase network map is based on the papers listed below. The specific reference(s) are listed for each reaction and contingency individually in the reaction and contingency lists in the “PubMedIdentifier(s)” column with their PMID number.” The contingency matrix concept illustrated in Figure 1E seems to be related to the state transition table concept of the MFA formalism of Yang J et al. (2010) IET Syst. Biol. 4, 453-466. Answer: There are similarities between the dependency matrix in Yang et al and the contingency matrix in Figure 1E, and we have added a comparison in the corresponding results section. The state transition table of Yang et al. defines the link between source states and target states (via reactions). In our system, one of these is always the elementary state defined in the reaction list and the other is the unmodified complement. Hence, this information is already included in the reaction list. The result section with the comparison now reads: “The contingency matrix integrates the information in the reaction and contingency lists (Fig 1E). The matrix is spanned by the reactions and
their corresponding states (C1) and populated by the contingencies of reactions on states (C2). Each row corresponds to one elemental reaction and each column corresponds to one elemental state. The symbol in each reaction-state intersection specifies how that specific reaction depends on that specific state. Together, one row contains the complete set of rules a reaction follows, and hence describes how it works in every specific state. This is related to a dependency matrix (Yang et al, 2010), although the entries in the contingency matrix are more detailed and unambiguous. …” The reaction graph concept of Figure 1F is related to the extended contact map concept (Chylek et al., 2011). Answer: We now include a direct comparison to extended contact maps in the description of the reaction graph, which reads: “…The full reaction graph displays the domains and residues involved in each reaction. The protein parts are independent nodes and defined as neighbours (proteins can have domains or residues, domains can have subdomains or residues, subdomains can have residues). The inclusion of domain information makes the reaction graph similar to the (extended) contact maps (Chylek et al, 2011; Danos, 2007). The reaction graph and contact maps are both purely topological and do not include any contextual information, in contrast to the extended contact map which e.g. may show that binding only occurs to phosphorylated residues. …” The regulatory graph concept of Figure 1G seems related to the concepts of an influence map (with reactions added) and/or a story - see Danos V et al. (2007) Lect. Notes Comput. Sci. 4703, 17-41. Answer: We now include a direct comparison to influence graphs and “stories” in the section where we introduce the regulatory graph, which now reads: “…The regulatory graph can easily be translated into an influence graph, which can be used for structural analysis of the network (Kaltenbach et al, 2011).
In contrast to the influence graph or “story” (Danos, 2007), the regulatory graph strictly separates the effects of reactions (production or destruction of states) and the modifiers (increase or decrease in reaction rates) via distinct edge types. Furthermore, only the (modified) elemental states are displayed and the (unmodified) complementary source/target state is implicit. Hence, like in the “stories”, cyclic motifs only appear when there is a true feedback in the system. This visualises both the (possible) sequence of events and the feedbacks clearly. However, in contrast to the “story”, the regulatory graph is comprehensive and simultaneously visualises all possible paths or “stories”. …” The authors introduce two input lists, a reaction list (which can be viewed as a list of context-free rules) and a contingency list (which indicates contextual constraints). These lists are somehow related to the rules of the model.bngl file in the supplementary material, but the mapping is not explained in sufficient detail. Answer: We have now moved a table summarising the translation of elemental reactions to basic rules in the BNGL format from the supplement to the main text (Table 2; see below). We have also somewhat expanded the description in the results section and included clear references to the supplementary material. This part of the results and the legend to Table 2 now read: Results: “The contingency matrix is a template for automatic generation of mathematical models (Fig 1J). Each elemental reaction corresponds to a basic (context free) rule in a rule or agent based model (Table 2), or, in
other words, a set of rules that share a reaction centre (Chylek et al, 2011). All contextual constraints on an elemental reaction are defined in a single row in the contingency matrix, and this row defines the elemental reaction’s implementation in the rule based format. The basic rule suffices if there are no known modifiers of a particular elemental reaction (i.e. only “0” and “?” apart from the intersection with its own state(s) (which is always “x” for a product state and “!” for a source state)). Every other contingency splits the expression into two rules: one when that elemental state is true and one when it is false. The number of rules needed only increases with the number of quantitative modifiers (“K+” and “K-”) as the qualitative modifiers set the rate constant to zero in either the “true” (for “x”) or “false” (for “!”) case (see Supplementary methods for details). The expansion to rules is fully defined in our data format and the rxncon software tool automatically generates the input file for the computational tool BioNetGen (Blinov et al, 2004). This file can be used for rule-based modelling, network free simulation and creation of SBML files. The translation to and from the rule based format is unambiguous in both directions, and we illustrate this with translation of a rule based model of the pheromone response pathway (yeastpheromonemodel.org). This model contains lumped reactions which we translate to combinations of elemental reactions, resulting in a different equation structure but the same functionality given appropriate choice of rate constants (Table S3). Furthermore, we cannot distinguish different identical proteins in e.g. homodimers, and can therefore not define strict trans reactions within such dimers. Apart from these issues, we can reproduce the same model with only cosmetic/nomenclature differences (see supplementary material for details).
Hence, the framework addresses the issue of (iii) automatic model generation from the database of biological information.” Table 2: “Implementation of elemental reactions in the rule based format. The table displays how the different elemental reactions in Table 1 are translated to the rule based format. See supplementary methods for additional details.”
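The expansion described in the quoted results text (one contingency-matrix row per elemental reaction, with each quantitative modifier splitting the rule in two) can be pictured with a minimal Python sketch. It is only an illustration of the counting argument, not the rxncon implementation: the function name `expand_row`, the dictionary layout and the state names in the usage note are assumptions introduced here, and the reaction's own source/product state is simply treated like any other qualitative entry.

```python
from itertools import product

# Contingency symbols as named in the quoted text:
# "!" required, "x" inhibitory, "K+"/"K-" quantitative, "0" no effect, "?" unknown.
QUALITATIVE = {"!": True, "x": False}   # state must be true / must be false
QUANTITATIVE = {"K+", "K-"}             # rate differs between true and false


def expand_row(reaction, row):
    """Expand one contingency-matrix row into rule contexts (hypothetical helper).

    `row` maps elemental-state names to contingency symbols. Each returned
    context maps a state name to the truth value that rule variant assumes.
    """
    # Qualitative contingencies fix the state and never multiply the rules.
    fixed = {s: QUALITATIVE[c] for s, c in row.items() if c in QUALITATIVE}
    # Each quantitative modifier needs a "true" and a "false" variant.
    modifiers = sorted(s for s, c in row.items() if c in QUANTITATIVE)
    rules = []
    for values in product([True, False], repeat=len(modifiers)):
        context = dict(fixed)
        context.update(zip(modifiers, values))
        rules.append((reaction, context))
    return rules
```

For instance, under these assumptions a row such as `{"s1": "!", "s2": "K+", "s3": "0", "s4": "?"}` yields two rule variants, both requiring `s1`, one with `s2` true and one with `s2` false, while `s3` and `s4` do not appear in the context at all.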
Is the mapping one-to-one? Given a specification of a model in the BioNetGen language, can I unambiguously obtain a reaction list and a contingency list? Or does the mapping only go one way in the other direction? Answer: We now include such an example via translation of the yeastpheromonemodel.org model to our format and back again. As outlined above, this translation is unambiguous in both directions but has not yet been implemented, and may be difficult to fully automate considering that
rule based models might include lumped reactions. See comments above for text changes. The interactions or reaction types considered by the authors are all reversible association reactions (labeled as ppi reactions) or phosphorylation reactions (labeled as P+ reactions). These are important types of interactions, but there are many other types of reactions that take place in cell signaling systems. The authors should comment on how at least a few other types of reactions should be handled within their framework. Answer: The MAP kinase network map already includes thirteen different reactions. We have now clarified this in the new Table 1. This list will grow as we continue to update the rxncon software, and we have in fact already implemented additional reaction types (see comments to reviewer #3 for a complete list of already implemented reactions). We have also added a statement about the extendibility of the framework in the first paragraph on the data structure, which now reads: Results: “The events in a signal-transduction network can be categorized in four types: (1) catalytic modifications, (2) bindings and interactions, (3) degradation and synthesis, (4) changes in localisation. Due to the limited information on spatial (re)distribution of components, we have focused on types 1-3 here (Table 1). However, the framework is fully capable to include localisation reactions and the rxncon tool will be upgraded to encompass these in the future. …” Table 1: “Thirteen reaction types were used to map the MAP kinase network. The table indicates reaction type and classification. Additional details are provided in the “Reaction Definition” sheet of Table S1 and S2.”
How are multiple chemically indistinguishable components, such as the two antigen-combining sites of an IgG antibody, treated in the authors' framework? I'm not sure that such cases can be captured in a contingency matrix (Figure 1E). For example, one can show that for A to bind B, A can't have already bound B, which makes sense. However, what should be done if A can interact with either of two B's? Answer: The model includes both homodimers and self-interactions and can handle these reaction types. However, as illustrated in the comparison of the yeastpheromonemodel.org model, we cannot distinguish the two components
of a homodimer and hence not enforce e.g. trans phosphorylations. This has been detailed in the text, which now reads: “…Furthermore, we cannot distinguish different identical proteins in e.g. homodimers, and can therefore not define strict trans reactions within such dimers. …” However, the IgG example could be addressed with the following reactions and contingencies:

IgGH ppi IgGH     # Heavy chain dimerization
IgGH ppi IgGL     # Heavy – light chain interaction
Antigen i IgGL    # One chain interacts with antigen

Antigen_i_IgGL !  # Req. Boolean state of assembled IgG
AND IgGH--IgGH    # Defines Boolean: Must have IgGH dimer
AND IgGH--IgGH    # Defines Boolean: Must have IgGL--IgGH

Which means that the antigen can only bind the antibody if it is assembled. I don't understand the circular layout of components of proteins. Is some unstated convention being followed? It seems to me that a linear layout from the N-terminus to the C-terminus is natural for the component parts of a protein. A circular layout could be a sensible compact alternative if there is, for example, a convention that puts the most N-terminal component at 12 o'clock and then adds other components clockwise. But it seems like the layout of component parts is arbitrary. Answer: The layout is done in Cytoscape and can be customised by the user (within the limit of that software). We prioritised layout concerns over linear motif organisation, and tried to minimise edge crossings and maximise readability. The basic layout is given by the data structure, as the protein parts are independent nodes and defined as neighbours (proteins can have domains or residues, domains can have subdomains or residues, subdomains can have residues). This could easily be reshaped and restructured to e.g. give linear sequences of domains or nested boxes where the domains/subdomains/residues are nested inside the protein and each other according to sequence order or 3d structure. But in the MAP kinase case, some domains are vaguely defined and may even be overlapping, which precludes an accurate linear layout. We specify this in the results and the legends to figure 2, which now read: Results: “…The full reaction graph displays the domains and residues involved in each reaction. The protein parts are independent nodes and defined as neighbours (proteins can have domains or residues, domains can have subdomains or residues, subdomains can have residues). ….” Figure 2: “…The domain layout in (A) prioritises readability and domain organisation does not reflect linear sequence or protein structure. 
…” The authors might want to rethink the following comment: "Similarly, review articles proved useful for orientation but not for mapping, as review papers tend to propagate more than merely solid facts." Answer: We have rephrased the statement. It now reads: “…The mapping is based solely on primary research papers and de facto shown data to ensure a high quality network reconstruction. …” Adding legends to the figures to explain symbols might be a little helpful for some readers even though the symbols are explained in the captions.
Answer: We have added labels to Figure 3 and legends to Figures 2, 4 and 5. Typos/grammar issues: p. 15: "This is fully captured be the combined reaction and contingency information." by, not be. Answer: Corrected. p. 4 "This is due to the requirement that all modification and interaction partners of are defined in the specific states..." Partners of what? Answer: We have rephrased this statement. This part now reads: “…The process description could meet each of the three criteria above but its utility is severely affected by the combinatorial explosion. It is based on a specific state description, which means that, for each component, each possible combination of modifications and interaction partners must be accounted for explicitly. Hence, only very simple systems can be described completely and only very few models include the entire state space …” p. 7: "These states in turn correspond to the two next last reactions in the reaction lists (Fig 1C)." Next to last, or what? This sentence is confusing. Answer: We have rephrased this statement. This part now reads: “…The data structure is illustrated with a simplified version of the Sho branch of the HOG-pathway (Fig 1B). The reaction list states that e.g. Hog1 phosphorylates (“P+”) Hot1 (Fig 1C; eighth reaction; on the last row), and the contingency list states that this reaction requires (“!”) that Hog1 is phosphorylated on both Thr174 and Tyr176 (Fig 1D; last two rows). These states in turn correspond to reactions six and seven, respectively (Fig 1C). …”
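As a reading aid for the Hog1/Hot1 example in the rephrased passage, the relation between a reaction list, a contingency list and the resulting contingency matrix can be sketched in a few lines of Python. This is a toy under stated assumptions and does not reproduce Figure 1 or the rxncon file format: the kinase name "Pbs2", all identifier strings, the function `contingency_matrix`, and the choice to fill unlisted cells with "?" are introduced here purely for illustration.

```python
# Hypothetical miniature of the example in the text: Hog1 phosphorylates Hot1,
# which requires Hog1 to be phosphorylated on Thr174 and Tyr176.
# "Pbs2" and every identifier string below are illustrative assumptions.
reactions = [
    # (reaction id, elemental state produced by the reaction)
    ("Pbs2_P+_Hog1_T174", "Hog1-P(T174)"),
    ("Pbs2_P+_Hog1_Y176", "Hog1-P(Y176)"),
    ("Hog1_P+_Hot1", "Hot1-P"),
]
contingencies = [
    # (target reaction, contingency symbol, effector state)
    ("Hog1_P+_Hot1", "!", "Hog1-P(T174)"),
    ("Hog1_P+_Hot1", "!", "Hog1-P(Y176)"),
]


def contingency_matrix(reactions, contingencies):
    """Build a rows=reactions, columns=states table of contingency symbols."""
    states = [state for _, state in reactions]
    # Assumption: cells with no listed contingency are shown as "?" (unknown).
    matrix = {rxn: {s: "?" for s in states} for rxn, _ in reactions}
    for rxn, state in reactions:
        # Per the quoted results text, a reaction's own product state is "x".
        matrix[rxn][state] = "x"
    for target, symbol, effector in contingencies:
        matrix[target][effector] = symbol
    return matrix
```

Reading the row for `"Hog1_P+_Hot1"` in the returned table then gives "!" for both Hog1 phosphorylation states and "x" for its own product state, which is the single-row view of a reaction that the response describes.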
Reviewer #2 (Remarks to the Author): Tiger et al. propose a framework for generating models of signal transduction based on two types of elements, reactions and contingencies, and present various visualization options their approach provides. Modeling signaling networks is a fundamental and active field of systems biology, where novel approaches are required, so therefore the article addresses a relevant question. However, the authors do not clearly place the contribution in the context of the current state of the art, in particular between the (quite active and broad) field of rule-based modeling and the Systems Biology Graphical Notation. Answer: We have expanded this comparison according to the recommendation of both reviewers. In particular, we completely reworked the introduction and added comparison when we describe the framework. See specific points for detailed responses. While the approach presents interesting solutions to visualize signaling networks, it seems to be based on already existing foundations, rather than a novel conceptual framework. The modeling basis is rule-based modeling. The visualization challenges correspond to those that SBGN tries to address (in particular the Entity-relationship, see below). The authors should place their contribution precisely in the context of these 2 ongoing efforts, as outlined below. Answer: The main result of this paper is the integration between network definition, visualisation and mathematical modelling. We provide a format that allows stringent network definition, at the granularity of empirical data, and link this to automatic visualisation in a range of formats and automatic model creation. Some of these formats are established, such as the process diagram and rule based modelling, others are inspired by other efforts in the field, but several aspects are new; in particular the framework that integrates these three levels of network analysis. 
We now state this more clearly in the final part of the introduction, which reads: “…Taken together, we provide a framework that integrates the three levels of network analysis; definition, visualisation and mathematical modelling. The framework and its supporting software tool allow highly stringent reconstructions of complex signal transduction networks in a biologically accurate and intuitive format, which allows automatic visualisation in a range of formats, and which unambiguously defines a mathematical model. We expect this to be highly useful for the community and envision a common framework to bridge different standards as well as experimental and theoretical systems biology efforts.” This paper has three results: conceptual framework, software, and a curated network. We don't think the conceptual framework is novel; more focused should be put on the tool and/or the network. Answer: See point above. We are convinced that the most important contribution is the framework itself, which integrates network definition with automatic visualisation and model export. We envisage this or a similar framework as a standard that would greatly facilitate model/network construction and reusability. Specific comments Major
* The article contains many confusing sentences, and uses ambiguous terms with overloaded meanings. For example, in the title and abstract, the word "Mapping" is used. This word is overloaded with many meanings, and it is not very clear immediately what is meant exactly by mapping in the context of this paper. The sentence "more concise mapping adapted to experimental data" - does this refer to mapping experimental data onto pathways, such as many other tools do (e.g. GenMAPP)? If not, in what way is the mapping adapted to experimental data? What type of experimental data is referred to here? This should be explained better. Answer: We have reconsidered each sentence using mapping to eliminate any ambiguity. All remaining instances refer to mapping in the dictionary sense: “The act or process of making a map”, which in this context means to stringently define the signal transduction network. The adaptation to empirical data refers to the fact that theoretical states (in this framework) correspond to empirical observations. This is very different from the specific states used in most models, and even from more advanced states in a rule based model (although these are generally much closer to – and sometimes at – the granularity of empirical data). We now highlight the importance of this already in the first paragraph of the introduction, and explain further throughout the manuscript. The introduction sections now read: Introduction: “… This explosion refers to the fact that the specific state of each component is determined by multiple covalent modifications or interaction partners, and that these possibilities rapidly combine to a very large number of possible specific states. Experimental data do not generally distinguish between all these specific states, but instead focus mostly on reactions between pairs of components, usually giving no or limited information on other modifications or interaction partners of the reactants.
Hence, there is a discrepancy between the granularity of the empirical data and the highly defined specific states used in most mathematical models. This makes the interpretation and use of empirical data in the context of such model states ambiguous and often arbitrary. …” Introduction: “…The rule definition format is also a significant step towards the granularity of empirical data, as compared to the abstract specific states. …” Introduction: “…The key feature of our framework is the strict separation of elemental reactions (and their corresponding states), which define the possible signalling events in the network, from contingencies, which describe the contextual constraints on these reactions. Importantly, each elemental reaction corresponds directly to a single empirical observation, such as a protein-protein interaction or a specific phosphorylation. The contingencies define the constraints on these elemental reactions in terms of one or more elemental states, e.g., by defining the active state of a protein kinase or the composition of a functional protein complex. Hence, the format directly links model states to empirical observations at the same level of granularity, which pre-empts the need for additional assumptions or extrapolations. …” Similarly, the word framework: does this refer to a software framework or a conceptual framework? Answer: Framework refers to the conceptual framework and its extensions; i.e. the formats and methods we use. We clearly distinguish the framework from the software tool in the text (but the tool is of course an implementation of the framework).
Also, the authors use the term "data structure", but it may not be obvious to a general reader what is meant. They need to explain upfront what they mean by this. Answer: We explain the data structure immediately after we introduce the concept (first part of the results), and only use it in this context (first parts of the results, Figure 1 legend). * The mathematical basis of this formalism seems to be rule-based modeling. The authors should describe better how their work relates to the multiple efforts in this field. Basically, what they call reactions seem to us to correspond to rules. The contingencies are the interrelationships among the rules. Answer: Correct. We export our network to the rule based format as it is the equation format most similar to the data structure of our framework. We have not changed this format in any way; our contribution is the automatic generation of a rule based model from a data format that has many additional advantages. We have now expanded the description of this relationship in the results section and added Table 2, which shows how elemental reactions are implemented as rules. See comments to reviewer #1 above for details. The work of Faeder/Blinov/Hlavacek, Feret/Danos/Fontana, Conzelmann/Gilles, Kholodenko, etc. addresses this, and how it relates to the framework presented here should be discussed. Some of these groups also develop visual tools to set up these models, and the differences to the proposed approach should be shown. Answer: As mentioned above, we have reworked the introduction and added comparisons to the results section where we present our framework. We now compare our work to graphical reaction rules, contact maps, interaction graphs, stories, dependency matrices, extended contact maps and the three SBGN formats. The comparisons in the results section now read:
Results: “…An elemental state is similar to an empirical observation, such as an interaction between two proteins or a specific modification at a specific site on a specific protein. If a protein has been phosphorylated on two sites, this corresponds to two different elemental states. In other words, the elemental states correspond to overlapping (non-disjoint) sets. This is different from the specific states in ordinary state transition models, but analogous to the macroscopic states used in the works by Conzelmann and Kholodenko et al. (Borisov et al, 2008; Conzelmann et al, 2008).…” Results: “…There are six distinct reaction contingencies; the Effector can be absolutely required (!), positive (K+), completely neutral (0), negative (K-), absolutely inhibitory (x), or of unknown effect (?). These overlap partially with the influences of entity relationship diagrams (Le Novere et al, 2011), but distinguish between no effect (0) and no known effect (?). …” Results: “…We address the second issue; comprehensive visualisation, with two novel forms of visualisation; the contingency matrix and the regulatory graph. These also keep reactions and contingencies separate and hence avoid the combinatorial explosion and implicit assumptions. Both include the complete information about reactions (C1) and contingencies (C2). This data structure is also well suited for visualisation in entity relationship diagrams or extended contact maps, but these cannot be generated automatically (Chylek et al, 2011; Le Novere et al, 2011). Instead, we provide export to the reaction graph/activity flow diagram and the process
description, though neither of these can fully and accurately represent the network as discussed below. …” Results: “…The contingency matrix integrates the information in the reaction and contingency lists (Fig 1E). The matrix is spanned by the reactions and their corresponding states (C1) and populated by the contingencies of reactions on states (C2). Each row corresponds to one elemental reaction and each column corresponds to one elemental state. The symbol in each reaction-state intersection specifies how that specific reaction depends on that specific state. Together, one row contains the complete set of rules a reaction follows, and hence describes how it works in every specific state. This is related to a dependency matrix (Yang et al, 2010), although the entries in the contingency matrix are more detailed and unambiguous. …” Results: “…The full reaction graph displays the domains and residues involved in each reaction. The protein parts are independent nodes and defined as neighbours (proteins can have domains or residues, domains can have subdomains or residues, subdomains can have residues). The inclusion of domain information makes the reaction graph similar to the (extended) contact maps (Chylek et al, 2011; Danos, 2007). The reaction graph and contact maps are both purely topological and do not include any contextual information, in contrast to the extended contact map which e.g. may show that binding only occurs to phosphorylated residues. We also use a condensed variant that displays only the central node for each component and collapses multiple reactions of the same kind between a pair of components to a single edge, and hence corresponds to the activity flow diagram of SBGN (Le Novere et al, 2009). …” Results: “…The regulatory graph can easily be translated into an influence graph, which can be used for structural analysis of the network (Kaltenbach et al, 2011). 
In contrast to the influence graph or “story” (Danos, 2007), the regulatory graph strictly separates the effects of reactions (production or destruction of states) and the modifiers (increase or decrease in reaction rates) via distinct edge types. Furthermore, only the (modified) elemental states are displayed and the (unmodified) complementary source/target state is implicit. Hence, like in the “stories”, cyclic motifs only appear when there is a true feedback in the system. This visualises both the (possible) sequence of events and the feedbacks clearly. However, in contrast to the “story”, the regulatory graph is comprehensive and simultaneously visualises all possible paths or “stories”. …” Also, there are online databases of rule-based models. The authors seem to claim the opposite?: "the classical rule-based modeling frameworks lack all the database properties of our framework" Answer: We use “database” in the stricter sense of the word rather than as a synonym for “data repository”. “Database property” here refers to the reaction and contingency lists. The rxncon tool accesses (selected parts of) these data to generate customised network maps or models. In addition, the database provides one-to-one annotation for references (see column “PubMedIdentifier(s)” in the reaction and contingency lists in Table S1). To the best of our knowledge, these properties are not yet implemented for rule based models. We believe that the format here is a more appropriate database format, which directly corresponds to a rule based model. * Why did the authors not include changes in localisation of proteins? This is a fundamental event in signal transduction, and indeed data is becoming lately available to consider this. Novel developments in rule-based modeling are addressing this type of event.
Answer: We agree that localisation is a fundamental aspect of signal transduction. We are in fact implementing localisation in the rxncon format but chose to exclude it in the MAP kinase network map due to the incomplete information on protein (re)localisation available in the literature. See the answer to reviewer #3 below for further details and a complete list of currently implemented reactions. * As the authors state, SBGN is a new standard for graphically representing biological processes. SBGN has three sub-languages, namely process description (PD), activity flow (AF) and entity relationship (ER). The authors contrast their approach to PD and AF, but oddly enough, not to ER. This is odd because, of the three sub-languages, ER is the one that is most suitable to represent rule-based modelling. ER can represent complex formation, phosphorylation and contingencies without problem. The authors must contrast their approach to ER, and either implement an ER version of the example or at least explain why this is not possible. The work of Kohn et al. on dealing with the combinatorial explosion in MIM (which is very similar to SBGN-ER) could be helpful in this regard (Kohn et al., Mol Syst Biol. 2006;2:51. Epub 2006 Oct 3.) Answer: To the best of our knowledge, entity relationship diagrams lack support for text based import, which precludes use of this format for automatic visualisation. We now clearly state this in the text, and also acknowledge that export to the entity relationship diagram format should be possible given text based import. We have also thoroughly reworked the introduction to give a more complete overview of the SBGN formats. In addition, we also stress this in the results and discussion sections. 
These parts of the text now read: Introduction: “…These issues are partially addressed in the entity relationship diagram, or molecular interaction map, which comes in two flavours: explicit and implicit (called heuristic and combinatorial by the author (Kohn et al, 2006)). The explicit version requires all specific states to be displayed and hence shares the limitations of the process description. In contrast, the implicit version displays only the possible reaction types (or elemental reactions, as we will call them below) and hence largely avoids the combinatorial explosion. The entity relationship diagram represents each component as a single node and reactions in a condensed format. While not as intuitive as the other SBGN formats, it has the advantage of concentrating all information on a given protein and works especially well for simple regulatory circuits, as the concentrated information makes it difficult to trace the order of events in more complex networks. Unlike the other SBGN formats, entity relationship diagrams cannot yet be expressed in a standardised document format. …” Results: “… This data structure is also well suited for visualisation in entity relationship diagrams or extended contact maps, but these cannot be generated automatically (Chylek et al, 2011; Le Novere et al, 2011). …” Discussion: “…Furthermore, we show that this format is stringent and unambiguously defines both rule based models and graphical formats such as the activity flow diagram (condensed reaction graph) and process description formats of SBGN. We are also convinced that the information would suffice to define entity relationship diagrams and extended contact maps, and that these formats would be suitable to visualise the reaction and contingency information. However, automatic visualisation in these formats would require further software development.
…”

* Since the tool is named YeastMap, it appears that it's suitable only for Yeast networks, but the article aims at presenting a general framework. How hard would it be to adapt the tool to other species, and what are the intentions of the authors in this regard?

Answer: The tool is completely species independent. The fundamental signal transduction reactions are the same for all organisms, and the list of reactions can easily be extended to include further aspects of signal transduction. The only species specific aspect is the translation between common and systematic gene names (i.e., linking each component to unique standard identifiers). This data is stored in a separate sheet (“(VI) ORF IDs S. cerevisiae”) and can easily be replaced or extended with that of any other organism(s). We now specify this in the text, which reads:

Abstract: “…The framework is species independent and we expect that it will have wider impact in signalling research on any system.”

Table S1: “…Columns G-H and J-K are organism specific; G and J translates the common gene names in columns L and Q to unique IDs (using the information in “ORF IDs S. cerevisiae”). Columns H and K lists the species the components are from. Here, this is always S. cerevisiae, but we included this feature to generalise the format to e.g. Host-Pathogen interactions and to make it compatible with PSICQUIC. …”

Table S2: “…(VI) To change the mapping between unique IDs and common names, paste a two column list of Unique IDs (column A) and common names (column B) into column A and B – and keep the functions in C and D. …”

Was YeastMap specifically created to support the framework presented in this article, or is it a side project?

Answer: Yes, the software tool was developed specifically to support this framework. The framework was developed to enable comprehensive and accurate mapping of signal transduction networks. The name was a heritage from this starting point, but is now inappropriate. Hence, we have renamed it rxncon – for reaction and contingency based mapping – to avoid association with any specific organism.
The name has been changed throughout the manuscript.

* Finally, is the software that goes with this article open source? Either way, this should be stated explicitly. We would highly recommend that the algorithms and software used be published under an open source license, otherwise the use of this framework will be extremely limited. What is needed to run the software?

Answer: The software is released under the LGPL license and can be freely downloaded from rxncon.org, as specified in the supplementary material. We have now clarified this also in the first paragraph of the results section. The software requires only web2py, which is included in the packages provided for download. Some functionalities require external open source software, as specified in the supplementary material. These sections now read:

Results: “…The framework has been implemented in the rxncon software tool that is distributed freely under the open source LGPL license and can be downloaded from www.rxncon.org.”

Supplementary Material: “The rxncon software is released under a LGPL license and can be freely downloaded from www.rxncon.org. rxncon is a browser based software for desktop use. It was created using the web framework web2py (www.web2py.com). The graph visualisation relies on the software Cytoscape (www.cytoscape.org) which is interfaced using the CytoscapeRPC plugin (https://wiki.nbic.nl/index.php/CytoscapeRPC). The models rxncon exports are encoded in the BioNetGen (www.bionetgen.org) modelling language BNGL. Model files can be simulated using the BioNetGen software as well as NFsim (emonet.biology.yale.edu/nfsim/). In addition to that BioNetGen supports the export of Systems Biology Markup Language (SBML) XML documents. The SBML export can be further improved by the rxncon software and visualised using the CellDesigner (www.celldesigner.org) software.”

Minor

* Since its first publication, SBGN has been under continual active development. The authors rightfully point to certain problems with representing combinatorial complexity in SBGN, but that doesn't mean that these issues could not be addressed in future versions. It would be nice if the authors acknowledged that SBGN is not set in stone, and that issues brought forward here could be used to improve SBGN in the future.

Answer: We share the reviewer’s conviction that SBGN is an important standardisation effort, and we make this clearer in the discussion and acknowledge that SBGN is evolving in the introduction. Finally, we explicitly state that we envision this or similar frameworks as standards. These sections now read:

Introduction: “…However, the SBGN standards are under continuous development and these issues will likely be addressed in the future through the SBGN markup language, SBGN-ML.”

Discussion: “… This issue has been widely recognised and substantial efforts have been committed to improve and standardise our tools for visualisation and modelling of cellular networks (Hucka et al, 2003; Le Novere et al, 2009). These standardisation efforts are essential for data exchange and reusability, …”

Discussion: “…Hence, we advocate a more fundamental level of network definition than graphical or mathematical formalism. We envisage this or a similar framework as a standard to greatly facilitate model/network construction, exchange and reusability.”

* Figures 3 and 4 contain too much detail to make them readable.
Answer: The main message of these figures is not in the text (which is not readable in A4 format). Even at size A4, Figure 3 clearly shows that the contingency matrix is by necessity sparsely populated (black fields can never have contingencies), but even so we only have knowledge of a fraction of the possible contingencies (most remaining fields are grey, i.e. unknown). Figure 4 clearly shows the information transfer through the network (edges), and if it is activating or inhibiting (edge colours). This outlines the three pathways much more clearly than e.g. the topological map (Fig 2). Hence, the critical message is conveyed clearly even at A4 size, and details (text) can be accessed at higher magnification. Please remember that these are complete network descriptions. There is currently no way to display this information more compactly than this.

Sup Figs 1,2,3 appear to be provided not as pdf but as xml.

Answer: We have replaced these files with pdf versions.

* Introduction: use of (i) and (ii) (iii) and referring to this later on is a bit confusing. The meaning is clear with some extra thought but that is putting a lot of work on the shoulders of the reader. What would be much clearer is to state explicitly that these requirements will be referred back to later in the text, and then refer to them as "requirement i", "requirement ii", etc.

Answer: We now restate the issue also in words at each instance that these references are used.

* In the results section "We first distil the available knowledge" -> distil should be distill

Answer: Our spell check says otherwise and our dictionary allows for both spellings. We defer to the copy editors.
Reviewer #3 (Remarks to the Author):

These authors developed a new framework to organize a large body of information typical of eukaryotic cellular signal transduction. As a demonstration, they applied the method to the budding yeast MAP kinase signaling pathways, which are arguably the best-characterized eukaryotic signaling pathways. Their survey of literature is comprehensive, and the resulting model depicts the current state of the collective knowledge. From a biologist's point of view, it would have been even more useful if they also incorporated "where"-aspect. Hog1 in the cytoplasm and Hog1 in the nucleus could be quite different in their function. I presume that it can be easily done in this framework by assigning different states to Hog1, but a more explicit (visual) representation of subcellular localization might be useful for biologists.

Answer: We agree with the reviewers that an ideal network map also includes localisation information. Hence, we have implemented localisation reactions (see below for a complete list of implemented reactions). As reviewer #3 correctly assumes, we have done this via localisation states. The visual representation can be achieved with the regulatory graph and creative layout, e.g. by clustering nuclear import and export reactions around an imaginary or displayed nuclear membrane. However, we took the decision not to include spatial information in the MAP kinase network map as we deemed the available information too sparse. Even for the well characterised MAPK signalling network there is limited information on localisation and, more importantly, the regulation of localisation for most components. We would like to stress that the framework fully supports mapping of localisation, and we are currently working on implementing this. We have already implemented a range of transport mechanisms as well as the most common translocation boundaries in a format that can easily be extended by the user (see table below). Hence, it will be easy for users to include localisation aspects they deem important. In addition, the rxncon software will be updated as new reaction types are implemented, and the user will be alerted via the automatic update system when these are available.
2nd Editorial Decision
14 December 2011
Thank you again for submitting your work to Molecular Systems Biology. We have now heard back from the referee who agreed to evaluate your revised manuscript. As you will see, this reviewer (#2 during the initial round of review) still has substantial concerns, which we feel are sufficient to preclude publication of this work. This reviewer already had clear concerns regarding the significance of the conceptual advance presented by this work during the first round of review, and s/he feels that the revised manuscript has not sufficiently addressed this issue. A key part of his/her concern is that the current framework does not fully support existing community standards in the visual representation of biological networks, and in particular lacks support for the SBGN Entity Relationship language. As you note in your Introduction section, SBGN-ER has important advantages for controlling combinatorial explosion when visualizing biological models, a point this reviewer emphasizes with a SBGN-ER diagram of the model you present in Fig. 1 (see attached PDF). Moreover, you write, "the most important contribution is the framework itself, which integrates network definition with automatic visualization and model export." Given that automatic visualization forms an essential part of this framework, we feel this reviewer's concerns are highly-relevant and cannot be disregarded. In addition, we do feel that issues of this nature could affect the degree to which this framework is adopted by a broad range of researchers. Overall, given these concerns and the very clear recommendation by this reviewer that this work is not yet a sufficiently decisive advance for Molecular Systems Biology, we feel that we cannot offer to publish this work in its present form. In most cases, Molecular Systems Biology only allows a single round of major revision. 
The other two reviewers, however, were more supportive during the first round of review, and Reviewer #2 does indicate that addition of SBGN-ER support would fill an important hole in this current work and provide a more substantial advance. As such, we would like to provide you with one more opportunity to address this reviewer's concerns and to submit a final revised manuscript. We do recognize that SBGN-ER visualization support would probably not be trivial to implement, especially given the current dearth of software support for SBGN-ER. Nonetheless, reviewer #2 believes that some type of visualization should be achievable, and clearly automatic generation of SBGN-ER would be a valuable addition to this framework, even if somewhat rudimentary. Other improvements that provide a more complete software solution might also be helpful, and could help to further support this as a framework that will be useful to other researchers (for example, direct output of SBML from rxncon). If you feel you can address these remaining concerns, you may wish to submit a revised version of your manuscript. Please attach a covering letter giving details of the way in which you have handled each of the points raised by the referees. A revised manuscript may once again be subject to review and you probably understand that we can give no guarantee at this stage that the eventual outcome will be favorable. Sincerely, Editor - Molecular Systems Biology
Referee reports:

Reviewer #2 (Remarks to the Author):
Response to authors

The authors have rewritten and updated the manuscript with the aim to answer the questions and concern of these and the other reviewers. We thank them for their efforts, and we believe that the manuscript is now better placed in the context of other works. We consider the work to provide some interesting and useful additions to the state of the art, that we believe is an incremental improvement over existing formalisms and tools. We also feel that at the technical level this work could be more complete: e.g. as the authors acknowledge spatial aspects can be added but are still in progress, and compatibility with SBGN is not present but in our opinion (see below), this should be doable.

p4: “Unlike other SBGN formats, entity relationship diagrams cannot yet be expressed in a standardised document format”. Of course, being a visualization standard, it is perfectly fine to use any standard graphical format such as SVG or PNG to publish SBGN diagrams.

In the same context, the authors mention that “The rxncon tool provides automatic export to established visual formats” (p5, near the bottom). If by this you mean a graphical format such as SVG or PNG, then why can’t SBGN-ER be supported?

In p9, the authors state that “that entity relationship diagrams can not be generated automatically”. This is somewhat different to the statement in p16 “automatic visualization in these formats would require further software development”. The former appears to mean “impossible” but the latter says “requires a lot of effort”. We’ve taken the liberty to manually draw an SBGN-ER diagram corresponding to the example in the article (Use it freely as you see fit). This diagram contains both the reactions and the contingencies, and we believe that it is easier to understand, and more visually appealing than figure 1G. In figure 1G it’s hard to distinguish reaction types (PPI vs. P+), and it’s hard to see which elemental reactions affect the same entity.

We note that the authors consider that automatic conversion to SBGN-ER to be outside the scope of this article, but we feel it can’t be too hard, as long as you don’t expect a nice layout. We see no reason in principle why the drawing of SBGN-ER could not be automated. Doing so would be a major advantage, because SBGN-ER represents a significant standardization effort of more than a decade, and takes into account visual appeal as well as mathematical stringency.
2nd Revision - authors' response
08 March 2012
We have expanded the rxncon tool to support all three SBGN formats. In particular, the rxncon software now includes a function for automatic generation of SBGN-ER. We illustrated this with a panel in Figure 1 showing the automatic export of the example model (compare to reviewer #2’s diagram). Moreover, there is now also a function for direct export to SBGN-PD. We have now added export to SBGN-ER and integrated SBML and SBGN-PD export into the rxncon software tool (the SBML/SBGN-PD export requires BioNetGen, but is completely handled by the rxncon software with no input required from the user).

Reviewer #2 (Remarks to the Author):

The authors have rewritten and updated the manuscript with the aim to answer the questions and concern of these and the other reviewers. We thank them for their efforts, and we believe that the manuscript is now better placed in the context of other works. We consider the work to provide some interesting and useful additions to the state of the art, that we believe is an incremental improvement over existing formalisms and tools. We also feel that at the technical level this work could be more complete: e.g. as the authors acknowledge spatial aspects can be added but are still in progress, and compatibility with SBGN is not present but in our opinion (see below), this should be doable.

Answer: The rxncon software now supports direct export to SBGN-ER and SBGN-PD, and uses the SBGN compatible Biographer tool for visualisation of both SBGN-ER and SBGN-PD. Moreover, the reaction graph contains all information required to generate SBGN-AF, which we illustrate with a new panel in Figure S1. Hence, the framework is now fully compatible with SBGN. We chose to include the reaction graph instead of the SBGN-AF as we strive to avoid any decontextualized “activation” information.
Consequently, and now also stated in the text, visualisation in SBGN-AF generates a network consisting of the components and a large number of (often reciprocal) “modulation” edges, and we are convinced that the reaction graph visualisation provides a more informative network representation. The legend to Figure S1 now includes:

“(B) Activity flow diagram corresponding to the reaction graph in Figure 1F. The activity flow diagram is generated by taking the condensed reaction graph information, replacing all unidirectional edges (P+ in this case) with “modulation” edges, replacing all non- or bidirectional edges (ppi in this case) with two reciprocal “modulation” edges, and eliminating all duplicates. The “modulation” edges were used to avoid generic (decontextualised) statements of activity. This is essential to maintain the stringency, as protein activity may be highly dependent on the context of activation. E.g., the cyclin dependent kinase can be activated by a number of cyclins, but the target sets can vary greatly depending on the specific cyclin.”

p4: “Unlike other SBGN formats, entity relationship diagrams cannot yet be expressed in a standardised document format”. Of course, being a visualization standard, it is perfectly fine to use any standard graphical format such as SVG or PNG to publish SBGN diagrams.

Answer: We have now developed export routines for SBGN-ER, implemented them in the rxncon software, and added a panel to Figure 1 showing the automatically generated (but manually laid out) entity relationship diagram of the example network. The SBGN-ER diagrams are visualized using Biographer, which can be accessed over the web (http://biographer.biologie.hu-berlin.de) or installed locally (http://code.google.com/p/biographer/). Biographer can export the images as SVG, and can also be used for SBGN-PD visualisation.
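The edge rewriting described in the Figure S1 legend quoted above can be illustrated with a short sketch. This is not the rxncon implementation: the function name, the tuple layout and the placeholder components A, B and C are assumptions made purely for illustration. It only shows the three stated steps, namely that directed edges become one "modulation" edge, non- or bidirectional edges become two reciprocal "modulation" edges, and duplicates are removed.

```python
def to_activity_flow(reaction_edges):
    """Convert condensed reaction graph edges to 'modulation' edges.

    reaction_edges: iterable of (source, target, reaction_type, directed)
    tuples. This layout is hypothetical, not the rxncon data format.
    Returns a sorted list of unique (source, target) modulation edges.
    """
    modulation = set()  # a set eliminates duplicate edges automatically
    for source, target, _reaction_type, directed in reaction_edges:
        # Every reaction edge yields a modulation edge source -> target.
        modulation.add((source, target))
        if not directed:
            # Non- or bidirectional edges (e.g. ppi) also yield the
            # reciprocal modulation edge target -> source.
            modulation.add((target, source))
    return sorted(modulation)


# Placeholder network: A binds B (ppi, undirected); B phosphorylates C
# at two different residues (two directed P+ edges between the same pair).
example_edges = [
    ("A", "B", "ppi", False),
    ("B", "C", "P+", True),
    ("B", "C", "P+", True),
]
activity_flow = to_activity_flow(example_edges)
```

In this placeholder network the two P+ edges from B to C collapse into a single modulation edge, which illustrates why the activity flow view carries less context than the reaction graph.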
In the same context, the authors mention that “The rxncon tool provides automatic export to established visual formats” (p5, near the bottom). If by this you mean a graphical format such as SVG or PNG, then why can’t SBGN-ER be supported?

Answer: This referred to established formats for visualization of cellular networks and specifically to SBGN-PD and the reaction graph/SBGN-AF formats. This now also includes SBGN-ER.
In p9, the authors state that “that entity relationship diagrams can not be generated automatically”. This is somewhat different to the statement in p16 “automatic visualization in these formats would require further software development”. The former appears to mean “impossible” but the latter says “requires a lot of effort”. We’ve taken the liberty to manually draw an SBGN-ER diagram corresponding to the example in the article (Use it freely as you see fit). This diagram contains both the reactions and the contingencies, and we believe that it is easier to understand, and more visually appealing than figure 1G. In figure 1G it’s hard to distinguish reaction types (PPI vs. P+), and it’s hard to see which elemental reactions affect the same entity.

Answer: Thank you, this example was helpful. As shown in Figure 1J, the rxncon tool can now generate the corresponding graph automatically (with a little help in the layout, of course). We have also added a brief section on SBGN-ER which discusses the different advantages with SBGN-ER and the regulatory graph. This section reads:

“Finally, the rxncon tool provides export to entity relationship diagrams (Fig 1J). Like the regulatory graph, the entity relationship diagram displays reactions and contingencies separately and hence largely avoids the combinatorial complexity. The entity relationship diagram has the advantage of concentrating all information on a given protein around a central node, which works especially well for simple regulatory circuits. This emphasises the role of each component within the network, in contrast to the regulatory graph which emphasises the information flow through the network.”

We note that the authors consider that automatic conversion to SBGN-ER to be outside the scope of this article, but we feel it can’t be too hard, as long as you don’t expect a nice layout. We see no reason in principle why the drawing of SBGN-ER could not be automated. Doing so would be a major advantage, because SBGN-ER represents a significant standardization effort of more than a decade, and takes into account visual appeal as well as mathematical stringency.

Answer: Done.