Modeling and Estimation of Dynamic EGFR Pathway by Data ...

2 downloads 18545 Views 564KB Size Report
Cell Illustrator is a model building tool based on the Hybrid Functional Petri net ... Both data acquisition and parameter fitting are the basic steps for creating an.
Genome Informatics 17(2): 226–238 (2006)

226

Modeling and Estimation of Dynamic EGFR Pathway by Data Assimilation Approach Using Time Series Proteomic Data Shinya Tasaki1,3∗

Masao Nagasaki2∗

Masaaki Oyama1

[email protected]

[email protected]

[email protected]

Hiroko

Hata1

[email protected]

Tomoyuki

Higuchi4

[email protected] 1 2 3 4



Kazuko

Ueno2

[email protected]

Sumio

Sugano3

[email protected]

Ryo Yoshida2 [email protected]

Satoru Miyano2 [email protected]

Medical Proteomics Laboratory, Institute of Medical Science, the University of Tokyo, 4-6-1 Shirokanedai, Minato-ku, Tokyo 108-8639, Japan Human Genome Center, Institute of Medical Science, the University of Tokyo, 4-6-1 Shirokanedai, Minato-ku, Tokyo 108-8639, Japan Department of Medical Genome Science, Graduate School of Frontier Sciences, the University of Tokyo, 5-1-5, Kashiwanoha, Kashiwa, Chiba 277-8562, Japan Institute of Statistical Mathematics, 4-6-7 Minami-Azabu, Minato-ku, Tokyo, 1068569, Japan These authors contributed equally.

Abstract Cell Illustrator is a model building tool based on the Hybrid Functional Petri net with extension (HFPNe). By using Cell Illustrator, we have succeeded in modeling biological pathways, e.g., metabolic pathways, gene regulatory networks, microRNA regulatory networks, cell signaling networks, and cell-cell interactions. The recent development of tandem mass spectrometry coupled with liquid chromatography (LC/MS/MS) technology has enabled researchers to quantify the dynamic profile of a wide range of proteins within the cell. The proteomic data obtained by using LC/MS/MS has been considerably useful for introducing dynamics to the HFPNe model. Here, we report the first introduction of the time-series proteomic data to our HFPNe model. We constructed an epidermal growth factor receptor signal transduction pathway model (EFGR model) by using the biological data available in the literature. Then, the kinetic parameters were determined in the data assimilation (DA) framework with some manual tuning so as to fit the proteomic data published by Blagoev et al. (Nat. Biotechnol., 22:1139–1145, 2004). This in silico model was further refined by adding or removing some regulation loops using biological background knowledge. The DA framework was used to select the most plausible model from among the refined models. By using the proteomic data, we semi-automatically constructed a well-tuned EGFR HFPNe model by using the Cell Illustrator coupled with the DA framework.

Keywords: EGFR pathway model, proteome data, HFPNe, Petri net, data assimilation, model selection, parameter estimation

1

Introduction

The epidermal growth factor receptor (EGFR) pathway is one of the best-studied signal transduction systems, and regulates important cellular processes, e.g., DNA synthesis and cell proliferation, migration, and adhesion. In many cases, a defective EGFR pathway is known to be a cause of cancer. Recently, the comprehensive EGFR pathway map was constructed based on the information obtained

Modeling and Estimation of Dynamic EGFR Pathway by DA

227

from the literature [28]. This map revealed the entire structure of the EGFR pathway network with its high complexity. The EGFR pathway has been well analyzed by using in silico modeling [37, 38]. The in vivo dynamics of several proteins introduced in these models were obtained from results of biochemical experiments, e.g., western blotting. These protein data were used for parameter estimation to refine the in silico model. Both data acquisition and parameter fitting are the basic steps for creating an in silico simulation model. Thus, improvements of these processes are crucial for achieving precise in silico simulation. One of the major challenges is the acquisition of in vivo data on signal transduction pathways. During intracellular signal transduction, a large number of molecules are regulated by multiple mechanisms, e.g., binding, phosphorylation, dephosphorylation, localization, degradation, and Ca signaling. Thus, more comprehensive measurement methods are required for obtaining global profiles of cellular signaling. Another challenge is the establishment of a link between the in silico model and the in vivo data. As the amount of in vivo data increases, it becomes more difficult to estimate the kinetic parameters of in silico models that represent the in vivo dynamics. Studies aimed at addressing these problems will be challenging because both experimental and computational problems act as bottlenecks in creating a superior in silico EGFR model. In recent years, proteome analysis with tandem mass spectrometry coupled with liquid chromatography (LC/MS/MS) technology has improved greatly [16, 31, 46]. LC/MS/MS can identify a large number of proteins and peptides from a highly complex sample. Moreover, the establishment of stable isotope labeling methods has made it possible to quantitatively measure proteins and peptides in samples [29, 30]. The quantitative results from LC/MS/MS experiments are very useful for creating more complex and precise EGFR models. Since 1999, we have been developing a software tool called Cell Illustrator [24, 25, 26, 47, 49, 51]. For this, we defined a modeling architecture known as the Hybrid Functional Petri net (HFPN) and its extension (HFPNe). By using this software, we have been analyzing models of pathways such as metabolic pathways, signaling pathways, gene regulatory networks, and microRNA regulatory networks, [8, 9, 20, 21, 22, 35, 41, 44, 47, 49, 51]. Kinetic parameters in these models were tuned manually based on biological knowledge available in the literature. However, with the increase in knowledge about the elements involved in a biological pathway, it becomes extremely difficult to manually introduce these parameters into the models. To overcome this difficulty, Geoffey et al. [17] proposed separating the MAPK pathway model with HFPN into six independent subpathways and estimated their parameters by using a genetic algorithm. However, it must be noted that their method can be applied only if all reactions conform to Michaelis-Menten kinetics. As an alternate computational approach, we have developed data assimilation (DA) framework that can automatically estimate these parameters by using in vivo data. In our previous study, we demonstrated that parameter estimation and model selections based on the DA function very well when coupled with the HFPN model of the mammalian circadian clock [27]. However, since this circadian clock model is an extremely simple model that consists of only 12 entities and 28 processes, it would be a major challenge to apply the DA method effectively to more complex systems. In this study, we have compiled the information available in the literature and modeled the EGFR pathway. By using the extensive timeseries quantitative data on proteins reported by Blagoev et al. [2], we semi-automatically performed the estimation of the fine-tuned parameters based on the DA framework. This method also allowed us to select the best-estimate model that matched the in vivo data from among several hypothesized EGFR models. Our model is the first HFPNe model that is compatible with time-series proteomic data measured by LC/MS/MS, and our study indicates that the DA method is applicable to relatively complex systems. This paper is organized as follows. In Section 2.2, we present the EGFR pathway model for the transduction of epidermal growth factor (EGF) signals; this model has been constructed based on biological data available in the literature. In Section 2.3, as in vivo data, we use the time-series data of tyrosine-phosphorylated proteins in EGF-stimulated HeLa cells with LC/MS/MS technology as reported by Blagoev et al. [2]. In Section 2.4, we demonstrate how to combine the above mentioned in

Tasaki et al.

228

vivo data and the parameter estimation method using the DA. The components of the EGFR model are larger than those of the previous circadian model presented in the study by Nagasaki et al. [27]. To overcome the problems caused by this difference, we divide the EGFR model into three sub-pathways and combine these sub-pathways after applying parameter estimations based on the DA framework. The procedures used for division and merging are also described in Section 2.4. In Section 3.1, we compare the in silico model and the in vivo experimental results and discuss its performance. One of the merits of the DA method is that the best model among various hypothesized models can be selected. In Section 3.2, we present the developed in silico models in which hypothesized regulations are added or removed from the original model. The best model is selected from among these models using the DA method and its performance is discussed.

2.1

Hybrid Functional Petri Net with Extension

Petri net is a network that consists of the following elements: place, transition, arc, and token. In this paper, we use entity, process, and connector for intuitive notations. An entity (depicted as a circle) can hold tokens as its content. A process (depicted as a rectangle) has connectors originating from some entities and connectors exiting the process to enter some entities. A process connected with these connectors defines a firing rule in terms of the contents of the entities to/from which the connectors are attached. When modeling with the Petri net, an entity represents the amount/density of a biological molecule/object, and a process defines the speed/condition/mechanism of interaction/reaction/transfer among the entities connected by the connectors. The conventional Petri net can be used to model only the discrete features in biological pathways, e.g., logical regulatory relationships between genes. To overcome this limitation, we have defined the concept of Hybrid Functional Petri net with extension (HFPNe) [24]. When the notion of object corresponds to the Java class and method, a more detailed biological pathway modeling can be realized using systematic methods, e.g., alternative splicing, ribosomal frameshifting, subcellular localization information [24]. HFPNe has three types of entities - discrete, continuous, and generic - and three types of processes - discrete, continuous, and generic. The concepts of discrete entity and discrete process are identical to those in the traditional discrete Petri net. A continuous entity can hold a real number as its content. A continuous process fires continuously at the speed of the parameter assigned to it. A generic entity can hold any object, e.g., an mRNA sequence, or the phosphorylation state of a protein; moreover, a generic process handles complex reactions by updating the state of connected entities, e.g., degradation, translation, and phosphorylation, for complex pathway modeling. Three types of connectors are used in HFPNe, and a specific value is assigned to each connector as a threshold script. When a process connector (depicted as a solid line connector in Figure 1) with a threshold script is attached to a discrete/continuous/generic process (depicted as a single circle/double circle/a single circle with a cross, respectively), a certain number of tokens are transferred throughout the process connector only if the evaluated result of the threshold script is true. The activity rule of an association connector is identical to that of a process connector in terms of the threshold, but the content of the entity at the source of the association connector is not changed by activation. An association connector (depicted as a dashed line connector in Figure 1) can be used to represent enzyme activity since the enzyme itself is not consumed in a reaction. An inhibitory connector (depicted as a line terminating with a small bar in Figure 1) with a threshold script enables the process to remain active only if the evaluated result of the threshold script is false. For example, an inhibitory connector can be used to represent repressive activity in gene regulation. The formal definition is available in the published study of Nagasaki et al. [24].

2.2

In silico EGFR Model Based on Information in Published Literature

Figure 2 shows an HFPNe model that has been constructed using the compiled and interpreted information on the EGFR signal transduction pathways available in the literature [4, 5, 6, 7, 10, 11,

Modeling and Estimation of Dynamic EGFR Pathway by DA

229

Figure 1: Essentially, Petri nets are constructed using three types of symbols for entities, processes, and connectors. In the Cell Illustrator, both sets of entities and processes are classified into discrete, continuous, and generic types, and entities and processes can be replaced with pictures reflecting the biological images. This replacement makes the HFPNe model of a biological pathway more comprehensible for biologists.

Figure 2: HFPNe model of EGFR pathway. For entities and processes, pictures reflecting biological images are used (see Figure 1). Biological meanings of transitions T1–T31 are summarized in Table 1.

Tasaki et al.

230

Table 1: Biological facts obtained from the literature and assigned to processes in the HFPNe model in Figure 2. #1: Corresponding processes in the HFPNe. #2: Corresponding sub-pathways in Figure 2. Biological phenomena from experimental data in literature Binding of EGF to EGFR induces the dimerization of the receptors resulting in autophosphorylation of the receptors. c-Cbl binds to the tyrosine-phosphorylated EGFR and simultaneously c-Cbl is tyrosine phosphorylated. c-Cbl catalyzes ubiquitination of EGFR. The ubiquitinated EGFR is degraded by the proteasome/lysosome. Vav2 binds to tyrosine-phosphorylated EGFR via its SH2 domain and is tyrosine phosphorylated by EGFR. Vav2 activates Rac1 by promoting the exchange of bound GDP for GTP. Activated Rac/Cdc42 induces activation of MEKKs. We modeled one MEKK as representative of MEKKs that mediate p38MAPK phosphorylation. Activated MEKK phosphorylates MKK3/4/6/7. Phosphorylated MKK3/4/6/7 phosphorylate p38MAPK. We modeled MKK3/4/6/7 as one protein for simplification. Grb2 associates with tyrosine-phosphorylated EGFR. Shc binds to the tyrosine-phosphorylated EGFR. Shc is tyrosine phosphorylated and interacts with Grb2 Sos1 binds to Grb2. Complex of Sos1 associated with EGFR catalyzes Ras GTP/GDP exchange. Ras/GTP recruits Raf to the plasma membrane and promotes Raf activation process. Activated Raf activates both MKK1 and MKK2. Activated MKK1 and MKK2 phosphorylate Erk2. Sos1 is phosphorylated by phosphorylated Erk2, resulting in decreased binding affinity to Grb2. Dok binds to phosphorylated EGFR and is phosphorylated. RasGAP binds to phosphorylated Dok and upregulates the RasGTP dephosphorylation. Activated Rac/Cdc42 induces phosphorylation of MKK1 and MKK2 indirectly.

#1 T1 T2 T3 T4 T5 T6 T7

#2 i i i i i i i

Process type Association Association Phosphorylation Association Phosphorylation Ubiquitination Degradation

Reference

T8 T9 T10

ii ii ii

Association Phosphorylation Exchange

T11

ii

Activation

T12

ii

Phosphorylation

T13

ii

Phosphorylation

T14 T15 T16 T17 T18 T19

iii iii iii iii iii iii

Association Association Phosphorylation Association Association Exchange

[18, 19] [34, 36] [39, 40, 45] [4] [20]

T20

iii

Association

[5]

T21 T22 T23 T24 T25 T26

iii iii iii iii iii iii

[5]

T27

iii

T28 T29 T30 T31

iii iii iii iii

Phosphorylation Phosphorylation Phosphorylation Phosphorylation Phosphorylation Dissociation Association Phosphorylation Association Dephosphorylation Phosphorylation Phosphorylation

[28]

[42]

[23, 32, 41] [7]

[10, 11, 18, 42]

[39] [6]

[1, 15] [10, 11, 18, 42]

12, 18, 19, 20, 23, 32, 34, 36, 39, 40, 41, 42, 43, 45]. We have focused on the pathways that are mediated by tyrosine-phosphorylation with EGF stimulation. Since, tyrosine-phosphorylation is the major mechanism of EGFR signal transduction and the proteomic quantification methods for tyrosinephosphorylated proteins have been established (see Section 2.3). Our EGFR model consists of three major sub-pathways: (i) EGFR activation and degradation pathway, (ii) p38 MAPK activation pathway, and (iii) ERK2 activation pathway. The pathway of p38MAPK activation affects the activation of Erk2 but has been poorly-studied in other EGFR simulation models [37, 38]. Therefore, we have

Modeling and Estimation of Dynamic EGFR Pathway by DA

231

modeled sub-pathway (ii) as well as other sub-pathways. Sub-pathway (i) contains processes T1–T7 as shown in Table1. After the phosphorylation of EGFR by EGF stimulation, c-Cbl binds to the phosphorylated EGFR and is phosphorylated. EGFR is ubiquitinated by c-Cbl and then degraded [T1, T2, T3, T4, T5, T6, and T7]. Sub-pathway (ii) contains processes T8–T13. After the phosphorylation of EGFR, Vav2 also binds to the phosphorylated EGFR and activates Rac1/Cdc42 [T8, T9, and T10]. Rac/Cdc42 activates the MAPK cascade, MEKKs, and p38 MAPK [T11, T12, and T13]. Sub-pathway (iii) contains processes T14–T31. After the phosphorylation of EGFR, related proteins - Shc, Grb2, and Sos1 - bind to this receptor and change their state, i.e., into the phosphorylated form [T14, T15, T16, T17, and T18]. Downstream of Sos1, Ras activates the MAPK cascade, Raf, Mkk1/2, and Erk2 [T19, T20, T21, T22, T23, and T24]. The activated Rac/Cdc42 also induces phosphorylation of Erk2 [T30 and T31]. Activated Erk2 then inactivates Sos1 by phosphorylation [T25 and T26]. After the phosphorylation of EGFR, Dok is also recruited to the phosphorylated EGFR and is activated. The activated Dok reduces the Ras activity via dephosphorylation [T27, T28, and T29]. In the EGFR pathway, activated EGFR binds to many components (Shc, Grb2, Dok, Vav, and c-Cbl in this model). Thus, if all possible interactions among the components are presented in the model, the pathway model needs many processes and will become too complicated. To overcome this difficulty, we have applied three hypotheses to simplify the EGFR model without compromising the necessary information. First, each adaptor protein, with the exception of c-Cbl, competitively binds to EGFR. Second, the phosphorylated EGFR is ubiquitinated by c-Cbl and degraded in the proteasome, regardless of the EGFR binding states. Third, the activity of EGFR is not affected by its ubiquitinated state. In this model, phosphorylated EGFR entities are merged to form one entity whether they are ubiquitinated or not. The ubiquitinated EGFR ratio and the degradation speed of the merged entity are calculated independently by using a special continuous entity that sums the total amount of phosphorylated EGFR using a special generic process. This simplification resulted in the reduction of the total elements in our model.

2.3

In vivo Data: Proteome Data Obtained by LC/MS/MS Using the SILAC Method

Several modified LC/MS/MS approaches have been developed for quantitative proteomics using protein labeling methods, e.g., stable isotope labeling with amino acids in cell culture (SILAC), isotopecoded affinity tags (ICAT) [13], isobaric tags for relative and absolute quantitation (iTRAQ) [33] and CDIT (culture-derived isotope tags) [14]. These approaches allow researchers studying signal transduction pathways in a cell to measure the global time-dependent profiles of tyrosine-phosphorylated proteins [2, 48]. From these methods, we used the data from the time-series data based on the SILAC method used by Blagoev et al. [2] as an input for parameter estimations and model selections with DA. In general, antibody-based methods have been used for the quantification of protein amount. Compared to these conventional methods, LC/MS/MS-based quantification methods have various merits with regard to their comprehensiveness and data accuracy. First, LC/MS/MS-based methods can quantify a large number of proteins including novel proteins during a single measurement. In contrast, in antibody-based methods, the number of measurable proteins depends on the number of antibodies. Since the optimal conditions for each antibody are different, several experiments under different conditions are essential. Second, the quantitative data obtained using LC/MS/MS are less biased than those obtained using antibody -based techniques. The quality of the data largely depends on the antibody’s affinity and specificity for its antigen. With regard to the western blotting technique, it is difficult to eliminate researchers’ subjectivity to determine the band range for quantification. On the other hand, quantification by LC/MS/MS is based on the common properties of protein; hence, its data is more reliable. The basis of the SILAC method is the labeling of all cellular proteins by growing cells in media containing stable isotope-containing amino acids [29, 30]. The SILAC method is a better protein

Tasaki et al.

232



4GNCVKXG#OQWPVU

#DUQNWVG#OQWPVU



stimulation

stimulation



V

T3

V

e7

T4

e6

stimulation

e4

e1 phosphorylation

e2

T1

assembly

e3

T2

e5

Figure 3: Summary of the method used for calculating the relative values of phosphorylated proteins with HFPNe. The lower box shows an example pathway model that consists of seven entities [e1–e7] and four processes [T1–T4]. With the e1 stimulation, e2 is phosphorylated. The phosphorylated e2 binds to e4. Using the experimental method described in Section 2.3, the amounts of e3 and e5 are measured relative to the unstimulated state. To calculate the corresponding value with HFPNe, we introduce two continuous entities [e6, e7] and two generic processes [T3, T4]. The values of e3 and e5 at 0 pt are stored in e6 by the T3 process that fires only at the stimulated time point. The T4 process continuously calculates the values of e3, e5, and e6 entities at every time point and stores the ratio of the sum of e3 and e5 to e6 into e7.

labeling method for LC/MS/MS measurement because of its high labeling efficiency and specificity as compared to other methods that are based on chemical incorporation. The actual scheme of measuring the relative quantitative data of tyrosine-phosphorylated proteins by using the SILAC method is as follows. Cells are grown in a medium containing only one form of an amino acid until it is completely incorporated into all the cellular proteins. Each cultured cell is stimulated by EGF for the indicated time interval and lysed. The lysates are mixed and the tyrosine-phosphorylated proteins are affinity purified using an anti-phosphotyrosine antibody. The purified tyrosine-phosphorylated proteins are fragmented by a protease, e.g., trypsin, followed by LC/MS/MS analysis. By using this approach, the time-series data of 81 tyrosine-phosphorylated proteins related to EGFR signaling was measured by Blagoev et al. [2]. Hence, the quantitative data obtained by LC/MS/MS using the SILAC method appears promising. Thus, in this paper, unlike the procedures followed in other studies [37, 38], the results of the SILAC method are used.

2.4

How to Combine the in vivo Data and the in silico EGFR Model

In our previous study, we performed parameter estimations of the mammalian circadian clock model by using the simulation data with the absolute quantitative data on mRNAs and proteins [27]. On the other hand, the time-series data of tyrosine-phosphorylated proteins in the EGFR pathway measured by the SILAC method is the relative quantitative data. Thus, in this simulation model, we needed to calculate the relative quantitative value for all entities, each of which represents the quantity of samples in vivo. In addition, each value of the in vivo data denotes the relative ratio of tyrosinephosphorylated protein among samples. This implies that the same tyrosine-phosphorylated protein having different states, e.g., single or complex, is mixed into one value. Since it is not possible to directly compare an entity of the in silico pathway model and a value of the in vivo data for applying the parameter estimation framework, we had to introduce two types of special continuous entities (e6 and e7) with generic processes (T3 and T4), as shown in Figure 3. The first type of continuous entity retains all related entity values under an unstimulated steady state (e6 in Figure 3). The other stores the ratio of steady state values, retained in the first type of special entities, to the entity values in the current time-step after EGF stimulation (e7 in Figure 3). When carrying out in vivo and in vitro experiments for cell stimulation with growth factors, it is important to adjust cells to a constant

Modeling and Estimation of Dynamic EGFR Pathway by DA

Fold Activation

Fold Activation

50 40

p-EGFR

30 20 10 0 0 10 8 6 4 2 0 0

10

20 [min]

40

233

8

p-c-Cbl

30

6

20

4

10

2

0 0

5 p-Erk2 4 3 2 1 10 20 [min] 0 0

10

20[min]

p-Vav2

0 0 8 6

p-Shc

10

20[min]

p-p38MAPK

4 2 10

20[min] 0 0

10

20[min]

Figure 4: Comparison of simulation results and corresponding in vivo data. The blue diamonds represent the in silico simulation result. The red squares (five points for each plot) represent the result observed in vivo. state before stimulation. The in vivo data in this paper also conform to the cell condition of culturing in serum-free medium for 16 h before EGF stimulation. This condition is usually called the steady state. It is considered that in the steady state, the amount of each molecule, except for the cell cycle elements, involved in the signal transduction pathway is constant. For emulating the steady state of an in vivo model, we applied 5000 pt simulation before EGF stimulation in the in silico model (pt is the virtual time of the HFPNe model; here, 60 pt = 0.1 h).

3.1

Parameter Estimation Procedure and Results

The in vivo data we used as input for parameter estimation with the DA contained six phosphorylated protein profiles (EGFR, Shc, c-Cbl, Vav2, Erk2, and p38MAPK). The proTable 2: Fitting scores of the observed cedure of parameter estimation by our DA method and its phosphorylated proteins. algorithmic details are described in our previous paper [27]. Entity Name Fitting score [%] The EGFR model shown in Figure 2 consists of 53 entities p-EGFR 99.9 and 115 processes. Because the number of free paramep-c-Cbl 98.5 ters is large, it is difficult to estimate the parameters of p-Shc 94.4 the entire network at once. As described in Section 2.2, p-Vav2 98.2 we have divided the model into three sub-pathways with p-Erk2 97.8 respect to the biological aspects and network structure: (i) p-p38MAPK 5.4 EGFR regulation pathway, (ii) p38 MAPK activation pathway, (iii) ERK2 activation pathway. These sub-pathways have following features. Sub-pathway (i) can be treated separately from sub-pathway (ii) and (iii), because all the signal molecules in sub-pathway (i) are not regulated by the molecules in sub-pathway (ii) and (iii). Sub-pathway (ii) is regulated only by the molecules in sub-pathway (i). Sub-pathway (iii) is subject to regulation by both sub-pathway

Tasaki et al.

234

Table 3: Description of the original model in Section 3.1 and the nine additionally hypothesized models. Name model model model model model model model model model model

1 2 3 4 5 6 7 8 9 10

Add/Remove - (original) add add add add add add add add remove(control)

Description Original model. p38MAPK directly inactivates MKK3/4/6/7. Erk2 directly inactivates MKK3/4/6/7. p38MAPK itself activates the phosphatase of p38MAPK. Erk2 directly activates the phosphatase of p38MAPK. The combination of model 2 and model 4. The combination of model 3 and model 5. The combination of model 2 and model 5. The combination of model 3 and model 4. The activation process of Rac/Cdc42 by Vav2 is removed (T22 in Figure 2).

(i) and (iii) but it doesn’t affect other sub-pathways. This pathway decomposition based on network structure will be applicable for other models including many parameters. By effectively using the above features, the parameters of each sub-pathway were estimated and joined according to the following instructions (it must be noted that the estimation of the parameters in each sub-pathway was performed by the DA method after refining the range of each parameter by manual tuning). First, the parameters of sub-pathway (i) were estimated by the DA method and fixed. By fixing the parameters in sub-pathway (i), the influence of sub-pathway (i) on sub-pathway (ii) and (iii) became constant. Second, we estimated the parameters in sub-pathway (ii) and (iii) in that order since sub-pathway (ii) influences sub-pathway (iii) but no influence exists in the reverse direction. Regarding parameter estimation of sub-pathway (ii), some parameter sets provided results that were similar to the in vivo data. However, the influence of sub-pathway (ii) on sub-pathway (iii) is different among these estimated results. Thus, we treated each of these candidate parameter sets as the fixed parameter in the parameter estimation step of sub-pathway (iii), and selected the best parameter set among them. By using the above processes, the estimation of the parameters in the entire model was completed. All the estimated parameters are shown in the supplementary information [50]. A comparison of the simulation results with these estimated parameters and corresponding in vivo data is shown in Figure 4. Table 2 shows the fitting scores calculated using (1 −

N  i

(Si − Ri )2 /

N  i

Ri2 ) × 100, where

N is the number of observed points, and Si and Ri (0 ≤ i ≤ N ) are the values of simulated and observed data at time point i, respectively. The fitting scores of all proteins except for p-p38MAPK were excellent. As in Figure 4, the discrepancy of p-p38MAPK between our simulation result and the in vivo data was the lack of down regulation at 10 to 20 min. For the downregulation of an activated MAPK, some factors that promote dephosphorylation either inhibit phosphorylation or increase degradation are usually necessary. Thus, we assumed that the absence of such factors in our model was the cause of the difference in the dynamics of p-p38MAPK.

3.2

Model Selection

The important function of the DA framework is to objectively compare a number of hypothesized models and to extract the best model from among them. The fitting score of p-p38MAPK was much worse than that of the other proteins examined. To extract models that are better than the previous model (model 1) and fit well with the experimen-

Modeling and Estimation of Dynamic EGFR Pathway by DA

235

tal results for p-p38MAPK, we create eight EGFR models (model 2–9) by introducing additional hypothesized regulations that are related to the p38MAPK phosphorylation and dephosphorylation to the original model. In addition, as a control, one model (model 10) was created by removing the upstream regulator that influences p38MAPK phosphorylation. The details of these models are summarized in Table 3. For modeling regulations that inhibit the phosphorylation of p38MAPK, two models were created. Model 2 introduced the inhibitory regulation to MKK3/4/6/7 activity from p38MAPK. In contrast, model 3 introduced the same effect from Erk2. In order to model regulations that promote dephosphorylation of p38MAPK, two models were created. In model 4, the phosphatase that is directly activated by p38MAPK itself was modeled. Similarly, model 5 introduced the phosphatase directly activated by Erk2. Other models model 6–9 were combinations of these regulations. The best model was selected based on the approximated marginal likelihood [27]. The features of approximated marginal likelihood are as follows: if the p38MAPK fitting score is improved by adding parameters, due to the penalty for the newly added parameters, the score does not always improve. As summarized in Figure 5, five models (model 2, 4, 6, 8, 9) were better than model 1 (original model). The results showed that models 3, 5, and 7, which have additional regulation only from Erk2, were not acceptable. On the other hand, the models with additional regulation from p38MAPK are more applicable than the original one. Especially, model 6 including both direct inactivation of MKK/3/4/6/7 and activation of phosphatase by p38 MAPK was the best among the 10 models. In summery, our in silico model and statistical framework suggested that some p38MAPK inhibitory regulations are derived not from Erk2 activation but from p38MAPK activation itself.

4

Conclusions and Future Work

We have concentrated on how to apply the high throughput proteomic data measured by LC/MS/MS with the SILAC method to in silico model parameter estimations with the DA. As a model system, we selected the EGFR pathway, which is one of the best-studied signal transduction pathways. We modeled the EGFR pathway by organizing the biological literature with the HFPNe architecture, and combined the in vivo data measured by the SILAC method with the parameter estimation method based on the DA. To perform efficient parameter estimation, we devised a method to simplify the complicated in silico model while retaining its generality. In summary, we divided the entire pathway into some sub-pathways (three in this model) in order to reduce interactions among them, and set the parameter estimation order among them depending on the directions of interactions that were shared Figure 5: The approximated marginal likelihoods by them. Our simulation results indicated that for 10 models in Table 3. The model with a higher the presence of unknown additional regulations score in the y-axis is better than the others. Model in the in silico model. Thus, we created eight 6 is the best model among them. models with additional hypothesized regulations and validated the hypothesized models by the DA method. The time-series data acquired by the proteomic approach includes many function-unknown proteins [2, 48]. This indicates that biological knowledge is still insufficient, even in the well-studied EGFR pathway. For modeling these proteins, it is essential to understand their roles in signal transduction. However, the experimental analysis of

236

Tasaki et al.

each protein takes long time for obtaining its detailed functions in the pathway. Thus, based on high throughput computational methods, predicting protein function is one of the goals of in silico analyses and they have been studied extensively by using various approaches [47]. The Bayesian network inference is one of the promising approaches for the estimation of biological network causality. By the Bayesian network analysis, candidate signal transduction pathways including function-unknown proteins can be organized based on the proteomic data [3, 34]. Using information on these candidates, we will be able to refine our in silico model with the DA framework by performing the following steps. First, each candidate will be modeled with the HFPNe architecture and its dynamics will be refined by the parameter estimation function of the DA. Then, the best model from among these candidates will be selected by the model selection function of the DA. Theoretically, all the proteins measured by LC/MS/MS can be modeled with their dynamics through these steps. Our results showed that the DA could work with a relatively complex system. The next challenge would be to evaluate whether the DA framework can be applied to a highly complex system that reflects the events within the cell in detail. We believe that in the future it will be possible to extend our in silico model by the combination of the DA framework and the Bayesian technology.

References [1] Bernards, A. and Settleman, J., GAP control: regulating the regulators of small GTPases, Trends. Cell. Biol., 14(7):377–385, 2004. [2] Blagoev, B., Ong, S.E., Kratchmarova, I., and Mann, M., Temporal analysis of phosphotyrosinedependent signaling networks by quantitative proteomics, Nat. Biotechnol., 22(9):1139–1145, 2004. [3] Bose, R., Molina, H., Patterson, A.S., Bitok, J.K., Periaswamy, B., Bader, J.S., Pandey, A., and Cole, P.A., Phosphoproteomic analysis of Her2/neu signaling and inhibition, Proc. Natl. Acad. Sci. U S A, 103(26):9773–9778, 2006. [4] Chardin, P., Camonis, J.H., Gale, N.W., van Aelst, L., Schlessinger, J., Wigler, M.H., and BarSagi, D., Human Sos1: a guanine nucleotide exchange factor for Ras that binds to GRB2, Science, 260(5112):1338–1343, 1993. [5] Chong, H., Vikis, H.G., and Guan, K.L., Mechanisms of regulating the Raf kinase family, Cell Signal., 15(5):463–469, 2003. [6] Corbalan-Garcia, S., Yang, S.S., Degenhardt, K.R., and Bar-Sagi, D., Identification of the mitogen-activated protein kinase phosphorylation sites on human Sos1 that regulate interaction with Grb, Mol. Cell. Biol., 16(10):5674–5682, 1996. [7] Crespo, P., Schuebel, K.E., Ostrom, A.A., Gutkind, J.S., and Bustelo, X.R., Phosphotyrosinedependent activation of Rac-1 GDP/GTP exchange by the vav proto-oncogene product, Nature, 385(6612):169–172, 1997. [8] Doi, A., Fujita, S., Matsuno, H., Nagasaki, M., and Miyano, S., Constructing biological pathway models with hybrid functional Petri nets, In silico Biol., 4(3):271–291, 2004. [9] Doi, A., Nagasaki, M., Fujita, S., Matsuno, H., and Miyano, S., Abstract Genomic Object Net: II. Modelling biopathways by hybrid functional Petri net with extension, Appl. Bioinformatics, 2(3):185–188, 2003. [10] Du, Y., Bock, B.C., Schachter, K.A., Chao, M., and Gallo, K.A., Cdc42 induces activation loop phosphorylation and membrane targeting of mixed lineage kinase 3, J. Biol. Chem., 280(52):42984–42993, 2005. [11] Fanger, G.R., Johnson, N.L., and Johnson, G.L., MEK kinases are regulated by EGF and selectively interact with Rac/Cdc42, EMBO J., 16(16):4961–4972, 1997. [12] Guan, Z., Buckman, S.Y., Pentland, A.P., Templeton, D.J., and Morrison, A.R., Induction of cyclooxygenase-2 by the activated MEKK1 –> SEK1/MKK4 –> p38 mitogen-activated protein kinase pathway, J. Biol. Chem., 273(21):12901–12908, 1998.

Modeling and Estimation of Dynamic EGFR Pathway by DA

237

[13] Gygi, S.P., Rist, B., Gerber, S.A., Turecek, F., Gelb, M.H., and Aebersold, R., Quantitative analysis of complex protein mixtures using isotope-coded affinity tags, Nat. Biotechnol., 17:994– 999, 1999. [14] Ishihama, Y., Sato, T., Tabata, T., Miyamoto, N., Sagane, K., Nagasu, T., and Oda, Y,. Quantitative mouse brain proteomics using culture-derived isotope tags as internal standards, Nat. Biotechnol., 23:617–621, 2005. [15] Jones, N. and Dumont, D.J., Recruitment of Dok-R to the EGF receptor through its PTB domain is required for attenuation of Erk MAP kinase activation, Curr. Biol., 9(18):1057–1060, 1999. [16] Kaji, H., Saito, H., Yamauchi, Y., Shinkawa, T., Taoka, M., Hirabayashi, J., Kasai, K., Takahashi, N., and Isobe, T., Lectin affinity capture, isotope-coded tagging and mass spectrometry to identify N-linked glycoproteins, Nat. Biotechnol., 21:627–629, 2003. [17] Koh, G., Teong, H.F., Clement, M.V., Hsu, D., and Thiagarajan, P.S., A decompositional approach to parameter estimation in pathway modeling: a case study of the Akt and MAPK pathways and their crosstalk, Bioinformatics, 22(14):272–280, 2006. [18] Lee, Y.N., Malbon, C.C., and Wang, H.Y., G alpha 13 signals via p115RhoGEF cascades regulating JNK1 and primitive endoderm formation, J. Biol. Chem., 279(52):54896–54904, 2004. [19] Lowenstein, E.J., Daly, R.J., Batzer, A.G., Li, W., Margolis, B., Lammers, R., Ullrich, A., Skolnik, E.Y., Bar-Sagi, D., and Schlessinger, J., The SH2 and SH3 domain-containing protein GRB2 links receptor tyrosine kinases to ras signaling, Cell, 70(3):431–442, 1992. [20] Margolis, B. and Skolnik, E.Y., Activation of Ras by receptor tyrosine kinases, J. Am. Soc. Nephrol., 5(6):1288–1299, 1994. [21] Matsuno, H., Doi, A., Nagasaki, M., and Miyano, S., Hybrid Petri net representation of gene regulatory network, Pac. Symp. Biocomput., 341–352, 2000. [22] Matsuno, H., Tanaka, Y., Aoshima, H., Doi, A., Matsui, M., and Miyano, S., Biopathways representation and simulation on hybrid functional Petri net, In silico Biol., 3(3):389–404, 2003. [23] Moores, S.L., Selfors, L.M., Fredericks, J., Breit, T., Fujikawa, K., Alt, F.W., Brugge, J.S., and Swat, W., Vav family proteins couple to diverse cell surface receptors, Mol. Cell. Biol., 20(17):6364–6373, 2000. [24] Nagasaki, M., Doi, A., Matsuno, H., and Miyano, S., A versatile petri net based architecture for modeling and simulation of complex biological processes, Genome Inform., 15(1):180–197, 2004. [25] Nagasaki, M., Doi, A., Matsuno, H., and Miyano, S. Ed. Chen, Y. P., Computational modeling of biological processes with Petri net based architecture, In Bioinformatics Technologies, Springer Press, 179, 2005. [26] Nagasaki, M., Doi, A., Matsuno, H., and Miyano, S., Genomic Object Net: I. A platform for modelling and simulating biopathways, Appl. Bioinformatics, 2(3):181–184, 2003. [27] Nagasaki, M., Yamaguchi, R., Yoshida, R., Imoto, S., Doi, A., Tamada, Y., Matsuno, H., Miyano, S., and Higuchi, T., Genomic data assimilation for estimating hybrid functional petri net from time-course gene expression data, Genome Inform., 17(1):112–123, 2006. [28] Oda, K., Matsuoka, Y., Funahashi, A., and Kitano, H., A comprehensive pathway map of epidermal growth factor receptor signaling, Mol. Syst. Biol., msb4100014:E1–E17, 2005. [29] Ong, S.E., Blagoev, B., Kratchmarova, I., Kristensen, D.B., Steen, H., Pandey, A., and Mann, M., Stable isotope labeling by amino acids in cell culture, SILAC, as a simple and accurate approach to expression proteomics, Mo. Cell. Proteomics, 1:376–386, 2002. [30] Ong, S.E., Foster, L.J., and Mann, M., Mass spectrometric-based approaches in quantitative proteomics, Methods, 29(2):124–130, 2003. [31] Oyama, M., Itagaki, C., Hata, H., Suzuki, Y., Izumi, T., Natsume, T., Isobe, T., and Sugano, S., Analysis of small human proteins reveals the translation of upstream open reading frames of mRNAs, Genome Res., 14:2048–2052, 2004.

238

Tasaki et al.

[32] Pandey, A., Podtelejnikov, A.V., Blagoev, B., Bustelo, X.R., Mann, M., and Lodish, H.F., Analysis of receptor signaling pathways by mass spectrometry: identification of vav-2 as a substrate of the epidermal and platelet-derived growth factor receptors, Proc. Natl. Acad. Sci. USA, 97(1):179– 184, 2000. [33] Ross, P.L., Huang, Y.N., Marchese, J.N., Williamson, B., Parker, K., Hattan, S., Khainovski, N., Pillai, S., Dey, S., Daniels, S., Purkayastha, S., Juhasz, P., Martin, S., Bartlet-Jones, M., He, F., Jacobson, A., and Pappin, D., Multiplexed protein quantitation in Saccharomyces cerevisiae using amine-reactive isobaric tagging reagents, Mol. Cell. Proteomics, 3:1154–1169, 2004. [34] Sachs, K., Perez, O., Pe’er, D., Lauffenburger, D.A., and Nolan, G.P., Causal protein-signaling networks derived from multiparameter single-cell data, Science, 308(5721):523–529, 2005. [35] Saito, A., Nagasaki, M., Doi, A. Ueno, K., Matsuno, H., and Miyano, S., Cell fate simulation model of gustatory neurons with microRNAs double-negative feedback loop by hybrid functional Petri net with extension, Genome Inform., 17(1):100–111, 2006. [36] Sakaguchi, K., Okabayashi, Y., Kido, Y., Kimura, S., Matsumura, Y., Inushima, K., and Kasuga, M., Shc phosphotyrosine-binding domain dominantly interacts with epidermal growth factor receptors and mediates Ras activation in intact cells, Mol. Endocrinol., 12(4):536–543, 1998. [37] Sasagawa, S., Ozaki, Y., Fujita, K., and Kuroda, S., Prediction and validation of the distinct dynamics of transient and sustained ERK activation, Nat. Cell. Biol., 7(4):365–373, 2005. [38] Schoeberl, B., Eichler-Jonsson, C., Gilles, E.D., and Muller, G., Computational modeling of the dynamics of the MAP kinase cascade activated by surface and internalized EGF receptors, Nat. Biotechnol., 20(4):370–375, 2002. [39] Seger, R. and Krebs, E.G., The MAPK signaling cascade, FASEB J., 9(9):726–735, 1995. [40] Skolnik, E.Y., Lee, C.H., Batzer, A., Vicentini, L.M., Zhou, M., Daly, R., Myers, M.J., Jr, Backer, J.M., Ullrich, A., White, M.F., and Schlessinger, J., The SH2/SH3 domain-containing protein GRB2 interacts with tyrosine-phosphorylated IRS1 and Shc: implications for insulin control of ras signaling, EMBO J., 12(5):1929–1936, 1993. [41] Tamas, P., Solti, Z., Bauer, P., Illes, A., Sipeki, S., Bauer, A., Farago, A., Downward, J., and Buday L., Mechanism of epidermal growth factor regulation of Vav2, a guanine nucleotide exchange factor for Rac, J. Biol. Chem., 278(7):5163–5171, 2003. [42] Thien, C.B. and Langdon, W.Y., Cbl: many adaptations to regulate protein tyrosine kinases, Nat. Rev. Mol. Cell. Biol., 2(4):294–307, 2001. [43] Tibbles, L.A., Ing, Y.L., Kiefer, F., Chan, J., Iscove, N., Woodgett, J.R., and Lassam, N.J., MLK-3 activates the SAPK/JNK and p38/RK pathways via SEK1 and MKK3/6, EMBO J., 15(24):7026–7035, 1996. [44] Troncale, S., Tahi, F., Campard, D., Vannier, J.-P., and Guespin, J., Modeling and simulation with hybrid functional Petri nets of the role of interleukin-6 in human early haematopoiesis, Pac. Symp. Biocomput., 11:427–438, 2006. [45] van, der, Geer, P., Wiley, S., Gish, G.D., and Pawson, T., The Shc adaptor protein is highly phosphorylated at conserved, twin tyrosine residues (Y239/240) that mediate protein-protein interactions, Curr. Biol., 6(11):1435–1444, 1996. [46] Washburn, M.P., Wolters, D., and Yates, JR, 3rd, Large-scale analysis of the yeast proteome by multidimensional protein identification technology, Nat. Biotechnol., 19:242–247, 2001. [47] Watson, J.D., Laskowski, R.A., and Thornton, J.M., Predicting protein function from sequence and structural data, Curr. Opin. Struct. Biol., 15(3):275–284, 2005. [48] Zhang, Y., Wolf-Yadlin, A., Ross, P.L., Pappin, D.J., Rush, J., Lauffenburger, D.A., and White, F.M., Time-resolved mass spectrometry of tyrosine phosphorylation sites in the epidermal growth factor receptor signaling network reveals dynamic modules, Cell Proteomics, 4(9):1240–1250, 2005. [49] http://www.csml.org/ [50] http://www.jsbi.org/journal/GIW06/GIW06F032Suppl.html [51] http://www.cellillustrator.org/

Suggest Documents