Bayesian Emulation and Calibration of a Stochastic Computer Model of Mitochondrial DNA Deletions in Substantia Nigra Neurons

Daniel A. HENDERSON, Richard J. BOYS, Kim J. KRISHNAN, Conor LAWLESS, and Darren J. WILKINSON

This article considers the problem of parameter estimation for a stochastic biological model of mitochondrial DNA population dynamics using experimental data on deletion mutation accumulation. The stochastic model is an attempt to describe the hypothesized link between deletion accumulation and neuronal loss in the substantia nigra region of the human brain. Inference for the parameters of the model is complicated by the fact that the model is both analytically intractable and slow to sample from. We show how the stochastic model can be approximated using a simple parametric statistical model with smoothly varying parameters. These parameters are treated as unknown functions and modeled using Gaussian process priors. Several simplifications of our Bayesian model are implemented to ease the computational burden. Throughout the article, we validate our models using predictive simulations. We demonstrate the validity of our fitted model on an independent dataset of substantia nigra neuron survival.

KEY WORDS: Bayesian modeling; Computer experiments; Gaussian process; Intractable likelihood; Inverse problem; Systems biology.

D. A. Henderson is a Lecturer, School of Mathematics & Statistics, Newcastle University, Newcastle upon Tyne, NE1 7RU, U.K. (E-mail: [email protected]). R. J. Boys is Professor of Applied Statistics, School of Mathematics & Statistics, Newcastle University, Newcastle upon Tyne, NE1 7RU, U.K. (E-mail: [email protected]). K. J. Krishnan is a Research Fellow, Mitochondrial Research Group, School of Neurology, Neurobiology, and Psychiatry, Medical School, Newcastle University, Newcastle upon Tyne, NE2 4HH, U.K. (E-mail: [email protected]). C. Lawless is a Research Associate, Institute for Ageing and Health, Newcastle University, Newcastle upon Tyne, NE4 6BE, U.K. (E-mail: [email protected]). D. J. Wilkinson is Professor of Stochastic Modeling, School of Mathematics & Statistics, Newcastle University, Newcastle upon Tyne, NE1 7RU, U.K. (E-mail: [email protected]). The authors are affiliated with the Center for Integrated Systems Biology of Ageing and Nutrition (CISBAN) at Newcastle University, which is supported jointly by the Biotechnology and Biological Sciences Research Council (BBSRC) and the Engineering and Physical Sciences Research Council (EPSRC). This work was funded directly by the BBSRC Bioinformatics and e-Science Program II, grant number BBS/B/16550. Part of this work was carried out during the Program on Development, Assessment and Utilization of Complex Computer Models at the Statistical and Applied Mathematical Sciences Institute (SAMSI), Research Triangle Park, NC 27709-4006. The authors wish to acknowledge Carole Proctor, for her input in developing the biological model, and the Bioinformatics Support Unit, Newcastle University, for use of their services in this project. The authors thank the editor, two anonymous referees, and an associate editor for their insightful comments and suggestions on earlier versions of this article.

© 2009 American Statistical Association. Journal of the American Statistical Association, March 2009, Vol. 104, No. 485, Applications and Case Studies. DOI 10.1198/jasa.2009.0005

1. INTRODUCTION

Neuronal loss in the substantia nigra region of the human brain is associated with the major symptoms of Parkinson’s disease (Hassler 1938; Fearnley and Lees 1991). Deletion mutations—hereafter referred to as deletions—in mitochondrial DNA (mtDNA) have been observed to accumulate with age to high levels in substantia nigra neurons, with deletion levels observed in Parkinson’s disease patients being higher than those in controls (Bender et al. 2006). The role that mtDNA deletions play in neuronal loss is yet to be established. Understanding how mtDNA deletions accumulate is important for testing hypotheses about possible links between deletion accumulation and cell death. In this article, we describe a stochastic kinetic model of mtDNA deletion accumulation in substantia nigra neurons, which incorporates the death of cells with a high mtDNA deletion load. The model attempts to describe some of the features of the underlying biological processes, the ultimate aim being to build a mechanistic model that will enable the accurate assessment of interventions designed to halt or reverse the decline in neurons associated with Parkinson’s disease. Such a model may also provide a means for predicting changes in the incidence of Parkinson’s disease in aging populations.

This article focuses on the calibration of the stochastic biological model of deletion accumulation and cell death so that it may be used for predictive in silico experiments. Here, calibration is used to refer to the estimation of the parameters that define the model. The parameters are estimated based on quantitative experimental data on deletion accumulation in individuals without Parkinson’s disease. We adopt a Bayesian approach to the problem of inferring the unknown values of the parameters of the biological model. This inference problem is made difficult by the fact that the distribution of data sampled from the biological model is analytically intractable. However, forward simulation from the model is possible. Statistical inference based on such simulation models with intractable likelihoods has received considerable attention in the statistics literature; see Diggle and Gratton (1984) for an early account. This problem has been addressed subsequently using, among others, methods of indirect inference (Gourieroux et al. 1993), approximate Bayesian computation (Beaumont et al. 2002), and likelihood-free Markov chain Monte Carlo (MCMC) (Marjoram et al. 2003). In this article, we show how exact Bayesian inference via MCMC can still proceed in theory, using simulations from the biological model. Unfortunately, forward simulation from the biological model is too slow to be used as part of an MCMC scheme and so some form of approximation is required. We propose to emulate the stochastic model, that is, we fit a statistical model to output from the stochastic simulation model. This method of emulating, or modeling, the simulation model is commonplace in the deterministic computer models literature; see Kennedy and O’Hagan (2001) for a description and references. In particular, we construct a parametric statistical model for the biological model whose parameters are smooth nonlinear functions of the biological model parameters. We


model these smooth functions nonparametrically using Gaussian process priors. Gaussian processes are the model of choice for deterministic functions in the computer models literature (Sacks et al. 1989; Santner et al. 2003). The novelty here is that we are attempting to emulate a stochastic rather than a deterministic model. We believe that this is the first serious attempt to do so. Our proposed methodology differs from that of Molina et al. (2005), whose aim was to calibrate a slow stochastic simulation model of a traffic network. Rather than emulate the stochastic model based on samples from it, they were able to directly formulate an approximation to the stochastic model based on knowledge of the model. Such an approximation is unavailable for the biological model in this article. Our proposed approach is generic and it seems likely that this approach will work well when applied to other complex stochastic computer models. The ultimate aim of our analysis is to infer the parameters of the biological model based on our Bayesian model and experimental data. By replacing the simulation model with the emulator, we are able to proceed with an MCMC scheme and thereby simulate values from the posterior distribution of the model parameters. Throughout the article, we use predictive simulations to validate aspects of the modeling process. In particular, we implement several simplifications of the full Bayesian model, which yield considerable savings in computational time. This is achieved by dispensing with some sources of uncertainty in the model, which appear to have little impact on the predictive performance of the model. The remainder of the article is structured as follows. In Section 2, we describe the biological model and the parameters we wish to estimate. In Section 3, we describe the experimental data that are used to calibrate the biological model. We also describe our proposed Bayesian model, which links the biological model to the experimental data. Sections 4 and 5 continue with a description of our Gaussian-process-based emulation model and how it can be fitted to samples from the simulator. Results of our calibration of the model are included in Section 6 together with results of some validatory predictive simulations. In Section 7, we use an independent dataset on neuron survival to provide further validation of our calibrated model. The article concludes in Section 8 with a discussion. Supplementary material associated with this article is available from http://www.amstat.org/publications/jasa/supplemental_materials/. 2. BIOLOGICAL MODEL Our biological model of mtDNA population dynamics has been constructed based on an assumption of random genetic drift (Elson et al. 2001). It describes the random accumulation of deletions in a population of mtDNA within substantia nigra neurons and the deterministic removal or death of neurons, which have a high proportion of mtDNA deletions. The population of mtDNA within a cell is made up of two types or species: Y1 represents the number of copies of healthy mtDNA and Y2 represents the number of copies of mtDNA with deletions. The model assumes that, at any instant in time, one of five reactions can occur at random within the cell. The ith type of reaction is denoted Ri, and each type of reaction occurs at a particular rate; the rate constant for reaction i is denoted by


ci, i = 1, ..., 5. The five chemical reactions are listed in (1) in the standard format for representing a set of chemical reactions; see Wilkinson (2006) for further details of representations of stochastic kinetic models.

R1: Y1 → Y2     (Mutation)
R2: Y1 → 2Y1    (Synthesis)
R3: Y1 → ∅      (Degradation)              (1)
R4: Y2 → 2Y2    (Mutant Synthesis)
R5: Y2 → ∅      (Mutant Degradation)

Note that, although mtDNA mutations occur during mtDNA replication, mutation events (reaction R1) are rare compared with synthesis and degradation and so we have chosen to model R1 as a simple transformation. The hazard of reaction i, hi(Y, ci), depends both on the number of molecules of the species Y = (Y1, Y2)^T and on the stochastic rate constant ci, whose value is unknown. Specifically, the hazards are

h1(Y, c1) = c1 Y1,
h2(Y, c3) = 1000 c3 Y1/(Y1 + Y2),
h3(Y, c3) = c3 Y1,                          (2)
h4(Y, c3) = 1000 c3 Y2/(Y1 + Y2),
h5(Y, c3) = c3 Y2;

the derivation of these hazards is contained in Section 1 of the supplementary material. This model for single cell mtDNA dynamics is a Markov jump process in continuous time with discrete state space. The model is analytically intractable due to the nonlinear hazards (2), but it can be simulated exactly by using Gillespie’s discrete event simulation algorithm (Gillespie 1977), among other algorithms; see Wilkinson (2006) for details. Likelihood-based inference for such models based on partial observations is extremely challenging due to having to average over the distribution of possible sample paths; see Boys et al. (2008) for further details.

The preceding stochastic kinetic model describes a population of mtDNA in a single neuron. A further assumption in the model is that a cell dies when its proportion of mtDNA deletions, p = Y2/(Y1 + Y2), attains or exceeds the lethal proportion threshold, t ∈ (0, 1]. Mitochondrial threshold effects are discussed in Rossignol et al. (2003). Although this is a type of accidental cell death, or necrosis, rather than strictly programmed cell death such as apoptosis, we model it as a deterministic rather than a stochastic mechanism.

In summary, the biological model has three parameters whose values we wish to infer from data: the mutation rate c1, the degradation rate c3, and the lethal threshold t. The parameters c1 and c3 are rates and, therefore, take strictly positive values, so it is convenient to work with the log-transformed parameters u1 = log(c1) and u2 = log(c3). We denote u3 = t, and will refer to u1, u2, and u3 as the calibration parameters. Note that the biological model is sometimes referred to as the computer model, the simulation model, or the simulator depending on the context.
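To make the simulation mechanism concrete, the following is a minimal sketch of Gillespie's direct method for this five-reaction system, including the deterministic death rule. It is illustrative only: the function and variable names are ours, not those of the gillespie2 implementation used in the article, and the initial copy number of 1,000 molecules is an assumption suggested by the factor of 1,000 in the synthesis hazards.

```python
import random

def simulate_neuron(c1, c3, tau, age_days, y1_init=1000, y2_init=0, seed=None):
    """Gillespie simulation of one neuron up to age_days (rates are per day,
    so an age recorded in years would be multiplied by roughly 365 first).

    Returns (y1, y2, survived): copy numbers at age_days and a survival flag.
    The cell is deemed dead if the deletion proportion reaches the threshold tau.
    """
    rng = random.Random(seed)
    t, y1, y2 = 0.0, y1_init, y2_init
    while t < age_days:
        total = y1 + y2
        if total == 0:
            break
        # Hazards (2): mutation, synthesis, degradation, mutant synthesis, mutant degradation
        h = [c1 * y1,
             1000.0 * c3 * y1 / total,
             c3 * y1,
             1000.0 * c3 * y2 / total,
             c3 * y2]
        h0 = sum(h)
        if h0 == 0.0:
            break
        t += rng.expovariate(h0)            # time to the next reaction
        if t >= age_days:
            break
        u = rng.random() * h0               # choose which reaction fires
        if u < h[0]:
            y1 -= 1; y2 += 1                # R1: Y1 -> Y2
        elif u < h[0] + h[1]:
            y1 += 1                         # R2: Y1 -> 2Y1
        elif u < h[0] + h[1] + h[2]:
            y1 -= 1                         # R3: Y1 -> degradation
        elif u < h[0] + h[1] + h[2] + h[3]:
            y2 += 1                         # R4: Y2 -> 2Y2
        else:
            y2 -= 1                         # R5: Y2 -> degradation
        if y1 + y2 > 0 and y2 / (y1 + y2) >= tau:
            return y1, y2, False            # lethal threshold reached: cell death
    return y1, y2, True
```

A single run of this kind corresponds to one forward simulation of the model at a given calibration parameter value and age.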

2.1 Prior Beliefs about Calibration Parameters

Prior distributions for the three calibration parameters were specified based on beliefs elicited from an expert in the



modeling of mtDNA population dynamics. The elicitation was performed using standard techniques, such as those described in Garthwaite et al. (2005). The expert thought that parameter independence was plausible a priori, as the biological mechanisms underlying mutation and degradation are completely different. They chose normal distributions to model their beliefs about the log-transformed rates u1 and u2 and a uniform distribution for the lethal threshold u3, that is,

u1 ~ N(a_u1, b_u1²),   u2 ~ N(a_u2, b_u2²),   u3 ~ U(a_u3, b_u3).

The lower and upper 2.5% points of their prior distribution for c1 = exp(u1) were based on previous investigations of Elson et al. (2001) and stated to be around 10⁻⁶ and 10⁻³ per day, resulting in choices a_u1 = −9 log(10)/2 ≈ −10.4 and b_u1 = 3 log(10)/(2 × 1.96) ≈ 1.8. The expert’s beliefs about u2 were based on previous knowledge of mtDNA half-life in neurons of other organisms (Gross et al. 1969) and transformed into those about the degradation rate by appealing to the relationship between half-life and degradation rate in a simple exponential decay model. Using 15 and 31 days as the lower 2.5% point and median of the prior for the half-life gives values a_u2 = log{log(2)/31} ≈ −3.80 and b_u2 = log(31/15)/1.96 ≈ 0.37. The choice of uniform prior distribution for the lethal threshold was partly a consequence of the expert being adamant that values less than 0.5 were simply not possible and having no strong opinion about which values were more likely than others. These opinions were based to some extent on the work of Hayashi et al. (1991). Therefore, we have taken a_u3 = 0.5 and b_u3 = 1.
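As a quick check of this elicitation arithmetic, the hyperparameters can be recovered from the stated prior quantiles as follows (a sketch; the factor 1.96 is the standard normal 97.5% point).

```python
import math

# u1 = log(c1): 2.5% and 97.5% prior points for c1 of 1e-6 and 1e-3 per day
a_u1 = (math.log(1e-6) + math.log(1e-3)) / 2            # = -9*log(10)/2  ~ -10.36
b_u1 = (math.log(1e-3) - math.log(1e-6)) / (2 * 1.96)   # = 3*log(10)/(2*1.96) ~ 1.76

# u2 = log(c3): half-life prior with lower 2.5% point 15 days and median 31 days
a_u2 = math.log(math.log(2) / 31)                        # ~ -3.80
b_u2 = math.log(31 / 15) / 1.96                          # ~ 0.37

print(a_u1, b_u1, a_u2, b_u2)
```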

Figure 1. Deletion accumulation data. Experimental RT-PCR measurements for 15 individuals using two techniques. The open and closed circles correspond to techniques 1 and 2, respectively.

3. ANALYSIS OF DELETION ACCUMULATION DATA

To calibrate the biological model, quantitative data on the accumulation with age of mtDNA deletions in substantia nigra neurons were obtained (postmortem) from 15 individuals without Parkinson’s disease ranging in age from 19 to 91 years. The real-time polymerase chain reaction (RT-PCR) experiments, which produced these data, were performed by one of us (KJK) in collaboration with colleagues from the Mitochondrial Research Group at Newcastle University. The data take the form of RT-PCR–based measurements, zi, of the quantity yi = −log2(1 − pi), where pi is the true unobservable proportion of mtDNA deletions in a sample of 25 neurons from a slice of brain tissue from the substantia nigra of a particular individual. Samples were taken of 25 neurons to ensure that sufficient DNA was available to perform RT-PCR while leaving enough neurons for multiple samples. Further details justifying the relationship between yi and pi are available in Section 2 of the supplementary material. Multiple independent measurements were made for each of the 15 individuals. Therefore, the index i refers to the ith measurement, not to the ith individual. Measurement i was obtained by using one of two experimental techniques, these corresponding to slightly different experimental procedures; see Krishnan et al. (2007) and Bender et al. (2006) for detailed descriptions of techniques 1 and 2. The measurements, together with other variables such as the age xi (in years) of the individuals from whom they were obtained, are tabulated in Section 2 of the supplementary material.

The data are depicted in Figure 1. They show a general increasing trend, with older individuals tending to have larger measurements and therefore higher proportions of deletions than younger individuals, as one might expect. The data also show high levels of variation at all ages. Some of the measurements are negative, and this is particularly the case for those measurements obtained using technique 1, which tend to be lower than those obtained using the other technique. Clearly, yi must be non-negative. We assume that these negative measurements simply result from the large amount of variability associated with the measurement process rather than from a violation of the fundamental assumptions underlying the use of this procedure.
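As a small illustration of the measurement scale (our addition, not part of the authors' analysis), a raw measurement z can be back-transformed to the deletion proportion it implies via p = 1 − 2^(−z), which makes clear why negative measurements correspond to nominally negative proportions and are attributed to measurement noise.

```python
def implied_proportion(z):
    """Back-transform an RT-PCR measurement z of y = -log2(1 - p) to a proportion."""
    return 1.0 - 2.0 ** (-z)

print(implied_proportion(1.0))    # 0.5
print(implied_proportion(-0.2))   # about -0.15: negative, as seen for some technique-1 measurements
```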

3.1 Bayesian Model

The RT-PCR procedure gives measurements of yi, which are subject to experimental error. Empirical evidence suggests that assuming a normal distribution for this error is appropriate (Larionov et al. 2005). Further information (given in Section 2 of the supplementary material) shows that it is reasonable to assume the same mean but different levels of variability for the two experimental techniques. Therefore, we model the n = 90 measurements as

zi | yi, φ_oi ~ N(yi, φ_oi⁻¹),   i = 1, 2, ..., n,

where φ_oi is the precision of the measurements when using technique oi ∈ {1, 2}. This model does not explicitly account for the repeated measurements available on both individuals and experimental techniques. However, it does capture the main source of variation. Investigations (to be reported elsewhere) show that incorporating additional complexity in the measurement model (using random effects) does not significantly affect the main findings of this article. Prior uncertainty about the two precision parameters, φ1 and φ2, is represented through semiconjugate independent gamma distributions,

φj ~ Gamma(a_φj, b_φj),   j = 1, 2,

with means E(φj) = a_φj/b_φj. We have only fairly weak prior beliefs about the precision parameters and reflect this in the


choice a_φj = 1 and b_φj = 1/16, for j = 1, 2. This corresponds to a prior expectation for the measurement error standard deviation of 0.25, which seems reasonable in light of the data displayed in Figure 1.

The stochastic kinetic model enters the Bayesian model through the distribution of yi = −log2(1 − pi), where pi is the true proportion of deletions, i = 1, ..., n. We denote the marginal distribution of yi by S(u, xi), since it depends on the calibration parameters u = (u1, u2, u3)^T and on the age of the individual, xi. Note that S(u, xi) implicitly incorporates cell survival as, in the simulation of each cell, if at any time up to xi the proportion of deletions attains or exceeds the lethal threshold u3, then that neuron is deemed to have died and is not used in the final sample. Further simulations must be performed to ensure that there are exactly 25 cells in the sample. Thus, if Ȳ1 and Ȳ2 denote the number of copies of normal mtDNA and of mtDNA with deletions, respectively, in the sample of 25 cells at time xi, then a single realization of the proportion of mtDNA with deletions in the sample of 25 cells is pi = Ȳ2/(Ȳ1 + Ȳ2). The realization of yi is obtained by transforming the realized value of pi using yi = −log2(1 − pi). The unobserved quantities yi are crucial to the Bayesian model, as they link the parameters of the stochastic kinetic model u to the observed data z = (z1, ..., zn)^T and x = (x1, ..., xn)^T. In summary, the full Bayesian model is represented in the following hierarchical structure:

zi | yi, φ_oi ~ N(yi, φ_oi⁻¹),   i = 1, ..., 90,
yi | u, xi ~ S(u, xi),   i = 1, ..., 90,
φj ~ Gamma(a_φj, b_φj),   j = 1, 2,
u1 ~ N(a_u1, b_u1²),   u2 ~ N(a_u2, b_u2²),   u3 ~ U(a_u3, b_u3).

The complex nature of the kinetic model means that the distribution of pi, the proportion of deletions, and hence the distribution of the latent data (yi) do not have a known parametric form; essentially, we cannot write or compute directly p(yi | u, xi), the probability density of the yi.

3.2 Simulation-based Bayesian Inference

Inferences about the values of the unknown quantities in the model are based on their joint posterior density:

p(u, φ, y | z, x) ∝ p(u) p(φ) p(y | u, x) p(z | y, φ),   (3)

where φ = (φ1, φ2)^T. The computations (i.e., integrations) necessary to obtain this density are too complex to be performed analytically. However, we can use a sampling-based approach to make inferences—despite the fact that we cannot compute p(y | u, x) directly—as we can sample from it. The numerical scheme uses a Metropolis–Hastings within Gibbs procedure to sample values from the joint posterior density p(u, φ, y | z, x). In particular, it uses independence proposals for the latent data from the stochastic kinetic model to avoid calculation of the density p(y | u, x). Details are provided in Section 3 of the supplementary material.

This MCMC procedure will require many thousands or millions of samples from the computer model, so it is essential that the model is fast to simulate. Regrettably, this is not the case for our stochastic kinetic model. For example, gillespie2 (available from www.basis.ncl.ac.uk/software.html), an efficient ANSI C implementation of Gillespie’s exact discrete event simulation algorithm (Gillespie 1977), takes on average 2 minutes (and sometimes considerably longer than this) to generate a single candidate vector of latent data on a processor running at 2.2 GHz with 8 GB RAM. Not surprisingly, this slow simulation prohibits the use of an MCMC scheme for sampling from the posterior distribution of the parameters u. To progress, some form of approximation of the computer model is required.

4. AN APPROXIMATION TO THE STOCHASTIC KINETIC MODEL

There are several alternative strategies that could be used for approximating output from the stochastic kinetic computer model. One such strategy is to use a fast approximate simulation algorithm in place of the exact Gillespie algorithm. This, of course, would imply a change in the underlying model. Several alternative approximate simulation algorithms are described in Wilkinson (2006). However, it is likely that none of these approximate algorithms would increase the speed of the simulation to the levels necessary to be used in an MCMC scheme. A more promising strategy is to construct a statistical model of the computer model itself. This approach is commonplace in the deterministic computer model literature in which the statistical model of the computer model is called an emulator (Kennedy and O’Hagan 2001). In this article, we adopt the approach of emulating the computer model. The emulator is fitted to simulated data from the stochastic kinetic computer model.

4.1 Simulation Model Output

We denote the inputs to the computer model for run j by the vector uj = (u_j1, u_j2, u_j3, xj)^T, that is, the vector of calibration parameters together with the age (or time) xj at which the output is sought. Suppose nD runs of the simulation model are available to us, and denote the matrix of nD values of the inputs at which the model will be run by D. The design matrix D is an nD × 4 matrix with jth row equal to uj^T. The simulation model is run at each combination of inputs uj in D to obtain output. A single run of our simulation model with input u corresponds to a simulation of the mtDNA dynamics in a single neuron and gives three outputs: Y1(u), the number of normal copies of mtDNA in the neuron at time x; Y2(u), the number of copies of mtDNA with deletions in the neuron at time x; and S(u), a binary variable that is equal to 1 if the neuron survived until time x and is equal to 0 otherwise. Recall that a cell dies if the proportion of deletions attains or exceeds the lethal threshold u3.

4.2 A Flexible Nonparametric Model

We now construct a surrogate probabilistic model (emulator) for the simulation model output. Let y(u) denote a sample of latent data from the simulation model using inputs u. Recall that the latent data are related to the proportion of deletions in 25 cells through y(u) = −log2{1 − p(u)}, where the sampled proportion of deletions is denoted p(u). Clearly, p(u) can



be estimated from the simulator output by the proportion of mtDNA copies with deletions, Ȳ2(u)/{Ȳ1(u) + Ȳ2(u)}, where Ȳ1(u) and Ȳ2(u) are the number of copies of normal mtDNA and mtDNA with deletions in a sample of 25 cells, respectively. Rather than model y(u) directly, we model r(u) = logit{p(u)} = log[p(u)/{1 − p(u)}], which is the logit-transformed proportion of deletions. The logit transformation ensures that our quantity of interest r(u) has unconstrained support. We assume that the logit proportion of deletions is normally distributed with mean h(u) and log standard deviation j(u), for some functions h(·) and j(·), that is,

r(u) ~ N(h(u), exp{2 j(u)}).

The normality assumption was assessed and verified informally by studying the output from multiple runs of the computer model. We assume that the mean h(·) and log standard deviation j(·) are unknown smooth functions of the computer model inputs u = (u1, u2, u3, x)^T. Specifically, we model h and j nonparametrically using independent Gaussian process (GP) priors,

h(·) ~ GP{m_h(·), c_h(·, ·)},   j(·) ~ GP{m_j(·), c_j(·, ·)},

where we use the notation GP{m(·), c(·, ·)} to denote a GP with mean function m(·) and covariance function c(·, ·). Although several other approaches exist for flexible nonlinear/nonparametric modeling of an unknown function, we chose a GP-based model largely because it had been used very effectively in the deterministic computer models literature; see O’Hagan (2006) for a tutorial that highlights several advantages of using GPs for this task. We take a relatively uninformative prior specification for the mean and covariance functions, our main assumption being that h and j are smooth functions of the inputs. The mean and covariance functions for h and j depend on vectors of parameters denoted c_h = (c_h,1, ..., c_h,6)^T and c_j = (c_j,1, ..., c_j,6)^T, respectively. Although it is common to model the mean function as a linear combination of simple functions of the inputs (Kennedy and O’Hagan 2001), we have found a simple constant mean function model to work well in practice and so use m_*(·) ≡ c_*,1. Our choice of covariance function takes the following stationary Gaussian form:

c_*(u, u′) = exp(2 c_*,2) exp{−(u − u′)^T V_* (u − u′)},

where * denotes either h or j. This form of covariance function implies that the functions h(·) and j(·) are smooth and infinitely differentiable. It is widely used in the computer models literature partly because it leads to realizations that vary smoothly over the input space, a property which is often desirable in the context of computer model emulation. For further justification of this choice of covariance function and the implied assumption of stationarity, see Section 2.3 of Santner et al. (2003). Although nonstationary GP-based models have been applied to certain deterministic computer models (Gramacy and Lee 2008), we have found the simple Gaussian covariance function performs well in the context of this application. The function c_*(·, ·) is controlled by exp(c_*,2), a positive dispersion parameter, and by V_* = diag{exp(c_*,3), ..., exp(c_*,6)}, a 4 × 4 diagonal matrix of positive roughness parameters. Smaller values of the roughness parameters lead to smoother realizations from the GP.
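The following sketch shows one way to code these prior ingredients: the stationary Gaussian covariance with log dispersion c_*,2 and log roughness parameters c_*,3, ..., c_*,6, and a draw from the corresponding constant-mean GP. The helper names and the small jitter term are ours, added purely for numerical stability.

```python
import numpy as np

def gauss_cov(U, c):
    """Stationary Gaussian covariance between the rows of U (n x 4 inputs).

    c[0] is the constant mean, c[1] the log dispersion and c[2:6] the log
    roughness parameters, so the covariance is
    exp(2*c[1]) * exp(-(u - u')^T V (u - u')) with V = diag(exp(c[2:6])).
    """
    V = np.exp(c[2:6])
    diff = U[:, None, :] - U[None, :, :]            # all pairwise differences
    Q = np.einsum('ijk,k,ijk->ij', diff, V, diff)   # quadratic form (u - u')^T V (u - u')
    return np.exp(2.0 * c[1]) * np.exp(-Q)

def gp_prior_draw(U, c, rng):
    """Draw the function values at inputs U under the constant-mean GP prior."""
    K = gauss_cov(U, c) + 1e-8 * np.eye(len(U))     # jitter for numerical stability
    return rng.multivariate_normal(np.full(len(U), c[0]), K)
```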

To complete the Bayesian formulation of the emulator, we take independent uniform U(−20, 20) prior distributions for each element of c_h and c_j. This noninformative choice is motivated by the fact that we have very little prior information regarding the values of these parameters. The choice of uniform prior also reflects our desire to treat the emulation task as generically as possible.

4.3 Experimental Design

We now turn to the choice of simulation design, that is, the combinations of parameter values and covariates at which the computer model is run. There is a large literature on the design and analysis of computer experiments, which has almost exclusively focused on designs for modeling deterministic simulators; see Santner et al. (2003) for a recent overview. In contrast, our aim is to calibrate a stochastic simulation model and to use it for predictive simulations. The related design problem is an altogether more complex task, which we approach here in a fairly heuristic manner. Recall that our aim is to model the random variable r(u) by using simulated data from runs of the stochastic kinetic model at the nD input configurations D. To get an idea of the level of stochastic variation in the output, it is natural and potentially quite useful to have multiple independent runs of the model at the same inputs. Multiple simulations increase significantly the amount of CPU time required to obtain the simulator output. With limited CPU time, this results in a tradeoff between the number of simulations, M, and the number of design points, nD. For the design used in this article, values of nD = 250 and M = 1,000 were chosen largely on the basis of required CPU time. Our choice of the nD = 250 design points was motivated by a desire to cover the input space well while having a range of interpoint distances, which may be beneficial when estimating GP parameters, as pointed out by Rougier (2001). Details of our choice of nD-point simulation design are provided in Section 4 of the supplementary material.

For the analyses reported in this article, we chose to run M = 1,000 independent simulations for each input configuration in the design D. There were two main considerations that influenced this choice. The first was that each latent data point must be derived from 25 neurons. This means that we have at most 40 independent realizations of the replicate data for each input configuration. Furthermore, not all 1,000 cells will survive, so for some input configurations we may have considerably fewer than 40 replications and in some cases we will have zero replications. The second limiting factor is the amount of CPU time taken for such simulations. The resulting M = 1,000 simulations of our nD = 250-point simulation design took a total of 10.4 days of CPU time to run on 2.2 GHz processors, with 8 GB RAM. Fortunately, the simulations were farmed out to multiple nodes of an 80-node cluster, which meant that they could be run overnight in a matter of hours.

4.4 Training Data

The data that are used to fit the emulator are derived from the simulation output as follows. For each input configuration uj,


the output from the surviving cells was pooled into nj random samples of size 25. The corresponding logit proportion in each random sample of size 25, wjk = rk(uj), was estimated using the empirical logistic transform as

wjk = log{Ȳ2,k(uj) + 0.5} − log{Ȳ1,k(uj) + 0.5},

where Ȳ1,k(uj) and Ȳ2,k(uj), for k = 1, ..., nj, are the number of normal copies of mtDNA and the number of copies of mtDNA with deletions in the sample of 25 cells, respectively. The empirical logistic transform was used to avoid numerical problems with proportions equal to zero or one. We denote the training data that were used to fit the emulator by D*, the design matrix containing all nD* input configurations for which we have at least one realization of replicate data, and W, the collection of vectors of computer model output wj = (wj1, ..., wjnj)^T corresponding to the input uj from D*. For the design used in this article, we have nD* = 183 out of nD = 250 of the original design points with at least one realization. The fact that we have no training data in some regions of the input space may at first appear to be a source of sampling bias and therefore a cause for concern. However, the regions where we have no training data are the regions where the cells did not survive. The experimental data we have are (by definition) for cells that did survive, and so we are primarily interested in emulating the computer model in those regions in which cells survived. In other words, the regions where we do not have any training data are regions that will have low posterior weight, and so any sampling bias caused by this procedure is likely to be small. The alternative approach of running the model until a prespecified number of cells survive for the given input parameters (as opposed to running a fixed number of cells) is computationally inefficient, because a disproportionate amount of CPU time is spent on input configurations where very few cells survive and therefore on regions with low posterior weight.
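A sketch of how the training summaries wjk might be formed from raw per-cell simulator output is given below; the array names are illustrative, and grouping the shuffled surviving cells into consecutive blocks of 25 is just one way of drawing the random samples described above.

```python
import numpy as np

def training_summaries(y1, y2, survived, group_size=25, rng=None):
    """Pool surviving cells into random samples of `group_size` and return the
    empirical logit of the deletion proportion in each sample.

    y1, y2, survived are length-M arrays from M independent runs at a single
    input configuration.
    """
    rng = np.random.default_rng() if rng is None else rng
    idx = np.flatnonzero(survived)
    rng.shuffle(idx)
    n_groups = len(idx) // group_size
    w = []
    for g in range(n_groups):
        sample = idx[g * group_size:(g + 1) * group_size]
        Y1, Y2 = y1[sample].sum(), y2[sample].sum()
        # empirical logistic transform avoids proportions of exactly zero or one
        w.append(np.log(Y2 + 0.5) - np.log(Y1 + 0.5))
    return np.array(w)
```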

5. BAYESIAN INFERENCE VIA MCMC

Given the training data {D*, W} and the observed data {z, x}, we can in theory fit an integrated Bayesian model for joint emulation and calibration using an MCMC scheme. However, the scheme is computationally extremely challenging, requiring run lengths that effectively prohibit its use. Therefore, we adopt a pragmatic approach, which performs the emulation stage and calibration stage separately, as advocated by Bayarri et al. (2007). The interested reader is referred to Section 5 of the supplementary material for further details of the integrated model for Bayesian emulation and calibration.

5.1 Practical Bayesian Emulation

Given the training data {D*, W}, we can fit the emulation model using MCMC as follows. We let hj = h(uj) and jj = j(uj) for j = 1, ..., nD*, and therefore h = (h1, ..., h_nD*)^T and j = (j1, ..., j_nD*)^T. Here, nD* is the number of design points at which we have training data. We wish to sample from the joint posterior distribution of the GP model parameters c_h and c_j, the vector of means h, and the vector of log standard deviations j, that is, from


p(c_h, c_j, h, j | D*, W) ∝ p(h | c_h, D*) p(j | c_j, D*) p(W | h, j),   (4)

in which the uniform joint densities of the GP parameters c_h and c_j are subsumed into the proportionality sign. Note that the conditional densities of the latent means h and the latent log standard deviations j are multivariate normal (from the definition of a GP), and the conditional density of the emulator training data W is the product of independent normal densities. A Metropolis–Hastings within Gibbs scheme can be used to sample values from a Markov chain with stationary density equal to the joint posterior density (4). However, the emulation model contains a large number of potentially highly correlated parameters—recall that both h and j have nD* = 183 elements. The MCMC algorithm for sampling from this posterior distribution is not only slow to run because of the matrix calculations that are required throughout, but, because of the dimensionality of the parameter space, we require many millions of samples from the Markov chain to satisfy conventional convergence criteria. We therefore have chosen to simplify the emulation model to reduce the computational burden.

The first simplification we make is to fix h and j at sensible values and then perform inference conditional on these values. Specifically, for those j ∈ {1, ..., nD*} with nj > 4, that is, only those design points with more than four samples, we replace hj by the sample mean, ĥj = ∑_{k=1}^{nj} wjk/nj, and jj by the log-transformed sample standard deviation, ĵj = 0.5 log{∑_{k=1}^{nj} (wjk − ĥj)²/nj}, of the training data W. This reduces the number of design points in the training data from nD* = 183 to nD* = 171. The emulation task is then reduced to that of fitting two independent GPs to two independent datasets. The joint posterior density is

p(c_h, c_j | ĥ, ĵ) = p(c_h | ĥ) p(c_j | ĵ) ∝ p(ĥ | c_h) p(ĵ | c_j).   (5)

We also make a slight amendment to the model to account for the fact that we are replacing the unknown h and j by sample-based estimates ĥ and ĵ, respectively. In traditional GP regression, a nugget term is used to account for the fact that the underlying process is observed with error. Here, the error with which we observe the underlying processes h and j is due to sampling error. Therefore, we introduce nugget parameters c_h,7 and c_j,7 and use

ĥ | c_h ~ N(m_ĥ, Σ_ĥ + exp(c_h,7) I)   and   ĵ | c_j ~ N(m_ĵ, Σ_ĵ + exp(c_j,7) I),

where m_ĥ is an nD* vector with each element equal to c_h,1, Σ_ĥ is an nD* × nD* positive definite covariance matrix with (i, j)th element Σ_ĥ(i, j) = c_h(ui, uj), and I is an nD* × nD* identity matrix; and similarly for j replacing h. Although we may be able to formulate some beliefs about the distribution of the unknown quantities c_h,7 and c_j,7 based on considerations of finite sampling error, we chose to assign these parameters the same uninformative U(−20, 20) prior distribution as the other elements of the now seven-dimensional vectors c_h and c_j.
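Up to an additive constant, each factor on the right of (5) is then a multivariate normal log density with the nugget variance added to the diagonal. A self-contained sketch of this log posterior for one surface (say h), assuming the seventh element of c is the log nugget, is:

```python
import numpy as np
from scipy.stats import multivariate_normal

def log_post_gp(c, h_hat, U):
    """Log posterior (up to a constant) for the seven GP parameters of one
    surface, given plug-in estimates h_hat at the nD* x 4 design inputs U."""
    c = np.asarray(c, dtype=float)
    if np.any(np.abs(c) > 20):                      # uniform U(-20, 20) prior support
        return -np.inf
    V = np.exp(c[2:6])                              # roughness parameters
    diff = U[:, None, :] - U[None, :, :]
    K = np.exp(2 * c[1]) * np.exp(-np.einsum('ijk,k,ijk->ij', diff, V, diff))
    K += np.exp(c[6]) * np.eye(len(U))              # nugget accounts for sampling error
    return multivariate_normal.logpdf(h_hat, mean=np.full(len(U), c[0]), cov=K)
```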



Figure 2. Emulator validation. (a)–(e) Estimates of the posterior predictive density of the logit proportion of deletions at design points (a) u†1 = (−13.85, −4.08, 0.974, 18)^T, (b) u†2 = (−11.47, −3.54, 0.921, 103)^T, (c) u†3 = (−12.05, −3.75, 0.632, 46)^T, (d) u†4 = (−11.66, −4.21, 0.796, 67)^T, and (e) u†5 = (−10.58, −3.00, 0.893, 65)^T. The validation data are denoted by the vertical lines. (f) Pointwise posterior predictive means and equal-tailed 95% posterior predictive probability intervals for the proportion of deletions over time based on u = (−10.18, −4.51, 0.962)^T; the validation data are denoted by the circles, and the simulator was run at ages 20, 40, 60, 80, and 100. The solid lines correspond to the emulator fitted in Section 5.1. The dashed lines correspond to the emulator based on fixed GP parameters and the dot-dashed lines to the simplified emulator, both described in Section 5.3.

Sampling from the 14-dimensional posterior distribution (5) is straightforward using two separate Metropolis–Hastings algorithms with multivariate normal random walk proposals centered at the current sampled vectors c_h and c_j. The covariance matrices of the proposal distributions were tuned from pilot runs of the MCMC scheme. Several independent Markov chains with stationary density p(c_h, c_j | ĥ, ĵ) were simulated from different starting points, and the sampled values were observed to settle down to the same distribution. After an initial burn-in period of 50,000 iterations, the simulated values from a single chain were used for making inferences; every 100th sampled value was stored to give an essentially uncorrelated sample of 500 values.
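For concreteness, a minimal random-walk Metropolis sketch of this kind of update is given below; log_post stands for the log of the relevant factor in (5) (for example, the function sketched above), and prop_cov is the proposal covariance tuned from pilot runs. The function name and defaults are ours.

```python
import numpy as np

def rw_metropolis(log_post, c_init, prop_cov, n_iter, rng=None):
    """Random-walk Metropolis with multivariate normal proposals centered at
    the current value; returns the sampled chain as an (n_iter, dim) array."""
    rng = np.random.default_rng() if rng is None else rng
    c = np.asarray(c_init, dtype=float)
    lp = log_post(c)
    chain = np.empty((n_iter, len(c)))
    for i in range(n_iter):
        c_prop = rng.multivariate_normal(c, prop_cov)
        lp_prop = log_post(c_prop)
        if np.log(rng.random()) < lp_prop - lp:     # Metropolis acceptance step
            c, lp = c_prop, lp_prop
        chain[i] = c
    return chain

# e.g. chain = rw_metropolis(lambda c: log_post_gp(c, h_hat, U), np.zeros(7),
#                            0.01 * np.eye(7), 150_000)
# then discard burn-in and thin (the article uses a 50,000-iteration burn-in
# and keeps every 100th value).
```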

for hðuÞ and jðuÞ; and finally, conditional on hðuÞ and jðuÞ, we sample from the resulting normal distribution of the emulator output r. A more detailed description of emulator simulation is provided in Section 6 of the supplementary material. Using this simulation algorithm, a set of independent validation data were simulated from the computer model. The ny ¼ 5 input configurations at which the computer model was run were sampled from the prior distribution for u; and were sampled uniformly from the integers {1, 2, . . ., 110} for x. For each configuration, M ¼ 1,000 cells were simulated independently. The validation data are depicted in Figures 2(a)–(e) together with estimates of the posterior predictive density of the logit transformed proportion of deletions. The estimated densities (solid lines) in Figures 2(a)–(e) are centered on the data (shown by the small vertical lines) and give a good description of the dispersion in the data. Therefore, it would appear that the emulator is a valid model for the simulator over a range of values for the calibration parameters. It is also of interest to assess the validity of the emulator at values of the calibration parameters which are important in terms of calibration, because these are the values at which we require the emulator to be a good model for the simulator. Figure 2(f) shows summaries of predictive simulations from the emulator over time for the input combination u ¼ ð10:18; 4:51; 0:962ÞT , which


is the posterior mean of u (see Section 6.1), together with data simulated from the computer model with this value of u at ages 20, 40, 60, 80, and 100. The match between the simulated data and the emulator is good, although the emulator seems to overestimate the value of the output at age 20. This is not too worrying because most of the experimental data are for individuals over 40 years of age, and for predictive purposes, it is these higher ages that are of most interest. In short, predictive simulations from the emulator match up with independent data sampled from the computer model. Therefore, we can have some confidence that the emulator is a good surrogate for the computer model.

5.3 Simplifying the Emulation Model

Although the emulation model appears to be fit for purpose, we consider two further simplifications to it to speed up computation. The first is that we fix c_h and c_j at their posterior means ĉ_h and ĉ_j, as estimated from the sampled values from the simplified posterior density p(c_h, c_j | ĥ, ĵ). The dashed lines in Figure 2 depict our estimated posterior predictive densities based on this simplification of the emulation model. It is interesting to note that the effect on predictive performance of this simplification is negligible; the estimated densities in Figure 2 are virtually indistinguishable from those based on the full emulation model. This shows that the simplified model is a good surrogate for the simulation model. It also highlights the more general point that not all sources of uncertainty have the same impact on predictive performance and that there can be considerable computational benefit from investigating potential simplifications of this sort.

Another simplification we have found that has little impact on predictive performance is to ignore the inherent variability in the GPs. Specifically, we simplify the emulation model so that simulation from the multivariate normal conditional distributions for h and j is replaced by the conditional mean of the relevant GP. This change gives a simple form for the required posterior predictive distribution, namely,

r†j | ĉ_h, ĉ_j, ĥ, ĵ, D† ~ N(h†j, exp{2 j†j}),   j = 1, ..., n†,

where D† is the prediction design matrix containing the values of the inputs at which output is required. As the parameters of this normal distribution are simple deterministic functions of the calibration parameters, it provides an emulation model from which it is very easy and quick to simulate. The dot-dashed lines in Figure 2 show the effect on the predictive performance of this simplification. There is very little difference in the predictive densities. We note that in regions of parameter space for which validation data are hard to obtain—because not enough cells survive—we have observed that the predictive distributions based on this simplified emulator are considerably narrower than those from the other two more sophisticated emulators. Although there may be room for improvement in the performance of this simplified emulation model in some regions of the parameter space, we believe that this is outweighed by the considerable benefits afforded by its simple probabilistic form and the speed with which we can sample from it. We therefore use this simplified emulation model for calibration.
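Under this simplification the emulator means and log standard deviations at new inputs are just GP conditional (kriging) means given the training summaries. A sketch of that formula with fixed hyperparameters follows; applying it to ĥ gives the h† values and applying it to ĵ gives the j† values. The function names are ours.

```python
import numpy as np

def gp_cond_mean(U_new, U, w_hat, c):
    """Conditional (kriging) mean of a GP with constant mean c[0], Gaussian
    covariance with log dispersion c[1], log roughnesses c[2:6], and log
    nugget c[6], evaluated at U_new given training values w_hat at U."""
    V = np.exp(c[2:6])

    def cov(A, B):
        d = A[:, None, :] - B[None, :, :]
        return np.exp(2 * c[1]) * np.exp(-np.einsum('ijk,k,ijk->ij', d, V, d))

    K = cov(U, U) + np.exp(c[6]) * np.eye(len(U))   # nugget on the training covariance
    k_star = cov(U_new, U)
    return c[0] + k_star @ np.linalg.solve(K, w_hat - c[0])
```

Because this is a deterministic function of the inputs, evaluating it inside the calibration MCMC is very cheap compared with re-simulating the GP surfaces.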


5.4 Calibration

The simplified emulation model described in Section 5.3 is used as a substitute for the simulation model in the calibration scheme. The joint posterior density for the unknown quantities in the Bayesian model using this simplified emulation model is

p_e(u, φ, r | z, x) ∝ p(u) p(φ) p_e(r | u, x) p(z | r, φ),   (6)

where we use the emulator output r = (r1, ..., rn)^T instead of using the simulator output y. Note that the density of the emulator output is simply the product of n = 90 independent normal densities; that is,

p_e(r | u, x) = ∏_{i=1}^{n} [√(2π) exp{j†_I(xi)}]⁻¹ exp{ −(ri − h†_I(xi))² / (2 exp{2 j†_I(xi)}) },   (7)

where the vectors of "predictive" means and log standard deviations, h† = (h†1, ..., h†nx)^T and j† = (j†1, ..., j†nx)^T, respectively, correspond to the nx = 13 unique ages x† = (19, 20, 32, 42, 44, 51, 56, 72, 75, 77, 81, 89, 91)^T at which we have observations. The function I(xi) = {j: xi = x†j} is needed to map individual ages xi to the unique ages x†. The vectors h† and j† are deterministic functions of the calibration parameters u and the ages x, and they are given by the conditional mean of the relevant GP. The other component densities follow from the description of the Bayesian model given in Section 3.1, with r replacing y.

Sampling from the joint posterior distribution (6) is carried out using a Metropolis–Hastings within Gibbs procedure, similar to that used in Section 3.2. We have more flexibility now, however, as we can easily compute the joint density p_e(r | u, x). This allows us to separate out the proposal for u from that of r (equivalent to y previously). Furthermore, the ability to compute the density p_e(r | u, x) means that we need not be restricted to proposals for r from its prior distribution, the emulation model. We have used a mixture transition kernel that provides global updates using the emulator (with probability p_ind) and local moves using a Metropolis random walk with multivariate normal innovations (with probability 1 − p_ind). We have found that this procedure works well in practice. For the results in the article, a value of p_ind = 0.1 was used and appeared to be satisfactory.

6. RESULTS

6.1 Model Calibration

Five independent Markov chains with stationary density p_e(u, φ, r | z, x) were sampled using the scheme outlined in Section 5.2. Each chain was started with different initial values and run for one million iterations after an initial one million iterates were discarded as burn-in. The covariance matrices of the multivariate normal proposal distributions were tuned by using sampled values from pilot runs of the MCMC scheme. Convergence was assessed via graphical and numerical summaries. Based on these summaries, there was no evidence of lack of convergence: the distributions of sampled values from the five chains were practically indistinguishable. Post burn-in, each chain was thinned taking every 1,000th iterate and the



resulting output combined to give a sample of 5,000 (effectively) uncorrelated values from the joint posterior distribution.

The main focus of the calibration exercise is to provide inferences for the calibration parameters: u1, the log mutation rate; u2, the log degradation rate; and u3, the lethal threshold. Figures 3(a)–(c) show the marginal posterior densities of these parameters obtained from the MCMC output, together with their prior densities. Monte Carlo–based estimates of their posterior means and 95% equal-tailed posterior probability intervals are reported in Table 1. It is clear that the data have been reasonably informative about these parameters. For example, uncertainty about u1 and u3 has been reduced, and although the level of uncertainty about u2 remains at a similar level, beliefs about its likely value have changed. Interestingly, the posterior analysis suggests that the lethal threshold u3 is likely to be close to one (and very unlikely to be less than 0.75), indicating that cells die only when their mutation load is very high. This finding is of significant biological interest, because it is not yet widely accepted that neurons can continue to survive despite carrying such high levels of mtDNA deletions. Plausible values for the mutation rate are also of scientific interest, because the rate is currently considered to be very uncertain. However, our analysis suggests that it is very likely to be between 10⁻⁵ and 10⁻⁴ per day. These inferences are based on using the simplified emulation model described in Section 5.3. We note that very similar inferences were obtained when using the more complex emulator that incorporated uncertainty in the GP (using fixed GP parameters); see the dashed lines in Figure 2. For instance, the posterior means for u1 and u2 were the same to one decimal place and those for u3 were the same to two decimal places. The similarity in the posterior distribution obtained using both emulators confirms our earlier intuitive decision to proceed with the simplified emulation model based on its validation performance.

Table 1. Posterior means and 95% equal-tailed posterior probability intervals for the calibration parameters

Parameter   Mean      95% interval
u1          −10.18    (−10.57, −9.79)
u2          −4.51     (−5.09, −3.92)
u3          0.962     (0.868, 0.999)

Figure 3. Posterior summaries from the MCMC output. (a)–(c) Histograms of the marginal posterior distributions of the calibration parameters (solid lines indicate the prior distributions). (d) Posterior means (circles) and equal-tailed 95% posterior probability intervals (lines) for the true proportion of deletions pi, in sample i = 1, ..., 90. (e)–(f) Histograms of the marginal posterior distributions of the measurement error precisions (solid lines indicate the prior distributions).

We can also use the MCMC output to study pi, the true unobservable proportion of deletions in the ith brain sample, i = 1, ..., 90. Posterior summaries of these quantities calculated using Monte Carlo estimates are depicted in Figure 3(d). It shows that the proportion of deletions increases steadily with age, from around 0.2–0.4 at age 20 years, to around 0.55–0.7 at age 90 years. This increasing pattern is entirely consistent with the scientific hypothesis underpinning the model.

Marginal posterior densities for the measurement error parameters φ1 and φ2 are depicted in Figures 3(e) and (f). These figures show that the two experimental techniques have a fairly similar accuracy. Uncertainty about the precision of


experimental technique 2 (φ2) has reduced more than that of experimental technique 1 (φ1) as a consequence of having more measurements made using technique 2. Looking at posterior means suggests that technique 2 is (slightly) more accurate than experimental technique 1. This conclusion is reinforced by the calculation of P(φ2 > φ1 | z) ≈ 0.81 from the MCMC output.

It is important to recognize that the inferences we have reported for the calibration parameters u have been determined using the emulation model and not the simulation model. Thus, it is difficult to make strong statements about these parameters in terms of the simulation model directly. However, we can build confidence in our inferences by studying whether there is a good match between predictive simulations from our calibrated model and the observed data.

6.2 Model Validation

Predictive simulations were used to assess the validity of our Bayesian model and, thereby, our model for the underlying biological mechanism. This involved obtaining K = 5,000 samples from the posterior predictive distribution of z_pred | z, x, o, where o = (o1, ..., on)^T specifies the measurement technique used for each observation, by sampling from, for k = 1, ..., K,

z_pred,i^(k) | r_i^(k), φ1^(k), φ2^(k), oi ~ N( −log2{1 − expit(r_i^(k))}, (φ_oi^(k))⁻¹ ),   i = 1, ..., n,

where r_i^(k), φ1^(k), and φ2^(k) are draws from the MCMC output, that is, from the posterior distribution (6), and expit(x) = exp(x)/{1 + exp(x)} is the inverse of the logit function. If samples from the posterior predictive distribution of the RT-PCR measurements are very dissimilar to the actual observations, then it is indicative that the model is in some way inadequate. Put another way, we can conclude that the model is not valid for the purpose for which it is intended if the actual observations have low posterior predictive probability. This is an example of internal model validation.

Figure 4 shows the observed data together with graphical summaries of their posterior predictive distributions. The observed data appear to be predicted quite well, based on the model, and on the data themselves. Note that, here and throughout, we base our assessments of model fit purely on the basis of graphical summaries such as this. More formal numerical summaries of predictive fit, such as those described in Gelman et al. (2004), for example, may also be used. The model fails to predict the very small observation of −0.918 for one of the 51-year-old individuals. However, the other observations mostly lie within, or fall just outside, the equal-tailed 95% posterior predictive probability intervals, and this is consistent with a well-fitting model. The posterior predictive intervals for measurements made using technique 1 are slightly wider than those for technique 2, a reflection of the larger precision for measurements made using technique 2; see Figures 3(e) and (f). We also note that the measurements made using technique 1 have a tendency to lie toward the lower tail of the posterior predictive distributions, and this may highlight a slight deficiency of the model. The predictive intervals for some of the older individuals are very wide and possibly inappropriately centered. On the whole, however, these predictive simulations would suggest that our Bayesian model (including the emulator) is valid for the purpose for which it is intended.

Figure 4. Monte Carlo estimates of equal-tailed 95% posterior predictive probability intervals (lines) for zi, the RT-PCR measurement for sample i = 1, ..., 90. The squares and dots correspond to measurements made using techniques 1 and 2, respectively. The circles are posterior predictive means.

7. EXTERNAL VALIDATION: NEURON SURVIVAL DATA

A more challenging test of the assumptions underpinning the biological model is to see how well it predicts other (independent) experimental data, that is, to use external validation via out-of-sample prediction. We have chosen to validate the model by using data on age-related substantia nigra neuron survival. These data are directly related to the cell survival aspect of the biological model. Specifically, the data are taken from Table 2(A) in Fearnley and Lees (1991) and comprise the number of neurons surviving in samples from the caudal substantia nigra of 36 individuals without Parkinson’s disease. Note that the data used in this article are a corrected version of that in Fearnley and Lees (1991): the total number of surviving neurons for the 22-year-old individual is 792, not 692. These (corrected) data are tabulated in Section 7 of the supplementary material. Figure 5 shows a scatterplot of number of neurons surviving against the age of these individuals and indicates that the number of neurons probably decreases with age. The figure also includes predictive means and equal-tailed 95% predictive probability intervals that were obtained to validate the biological model. To obtain these validation results, we first had to construct and validate an emulator for neuron survival using a similar procedure to that described in Section 5. Details of this procedure together with details of our proposed binomial measurement error model are given in Section 7 of the supplementary material. In summary, we sampled values of the calibration parameters u from their posterior distribution given the deletion accumulation data; then, sampled from the newly constructed emulator for neuron survival with these calibration parameters as inputs; and finally, sampled from the binomial measurement error model conditional on the sampled number of surviving neurons.

Figure 5. Neuron survival data. The dots are the observed number of neurons, the circles represent predictive means, and the lines represent equal-tailed 95% predictive probability intervals.

From Figure 5, the predictive distributions suggest that the distribution of the observed number of neurons evolves smoothly with age. In particular, the mean of the distribution stays fairly constant to around age 45, after which it decreases smoothly. Also, the variance of the distribution increases smoothly with age, indicating that there is more uncertainty about the observed number of neurons at higher ages than at lower ages. These predictions not only match what we might expect from the underlying biological model, but they are also consistent with the observed survival data themselves. Most of the observed values lie within, or just outside, the estimated equal-tailed 95% prediction intervals. Although there appears to be some bias in the predictions (the majority of observed values lie below their predictive means), the overall impression is that the model explains the observed pattern of neuron loss reasonably well. Thus, we can be reasonably confident that the model calibration has been successful and that we can use our fitted emulation models for performing predictive in silico experiments. Overall, we conclude that the simplified Bayesian model provides a reasonable fit to the independent validation data. This is despite the fact that the biological model is relatively simplistic, having only three unknown parameters.
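Should a numerical counterpart to this graphical assessment be wanted, the empirical coverage of the intervals and the direction of the prediction errors could be summarised along the following lines. This is a sketch only, with y_obs (the observed counts) and pred_draws (the per-individual predictive draws obtained as sketched earlier) as hypothetical inputs.

```python
# Sketch of simple numerical summaries of predictive fit (hypothetical inputs).
import numpy as np

def summarise_predictive_fit(y_obs, pred_draws, level=0.95):
    """Empirical coverage of equal-tailed predictive intervals and the proportion
    of observations falling below their predictive mean."""
    y_obs = np.asarray(y_obs, dtype=float)
    tail = 100.0 * (1.0 - level) / 2.0
    lower = np.array([np.percentile(d, tail) for d in pred_draws])
    upper = np.array([np.percentile(d, 100.0 - tail) for d in pred_draws])
    means = np.array([np.mean(d) for d in pred_draws])
    coverage = np.mean((y_obs >= lower) & (y_obs <= upper))  # fraction inside intervals
    below_mean = np.mean(y_obs < means)                      # a crude bias indicator
    return coverage, below_mean
```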

8. DISCUSSION

We have presented a case study in which we have calibrated a biological model of mtDNA deletion accumulation and cell death based on RT-PCR measurements. From a biological perspective, our analysis suggests that neurons can continue to survive despite carrying very high levels of mtDNA deletions. We have also considerably reduced scientific uncertainty regarding the mutation rate, which we find is very likely to be between 10^-5 and 10^-4 per day. Our general approach involved emulating the stochastic kinetic simulation model, because it was both analytically intractable and too slow to simulate. This was achieved by modeling the output from the simulation model using a simple parametric model (for example, normal or binomial). The parameters of this model were assumed to vary smoothly with the parameters of the biological model, and suitably transformed versions thereof were modeled with GPs. This generic approach was shown to work well, and it seems likely that it will work well for emulating other stochastic simulation models.

In situations in which a simple parametric model with smoothly evolving parameters does not work well, more flexible models may have to be used. Replacing the parametric model by a mixture of parametric models is one such possibility. In this vein, recent work by Griffin and Steel (2006) and Dunson et al. (2007) based on Dirichlet process mixtures, for example, may prove to be useful for flexibly modeling output from stochastic simulation models; this is the subject of ongoing work.

Although the overall predictive performance of the biological model is good, there is still room for improvement. One possible development would be to include the data on neuron survival in the calibration of the stochastic kinetic model. Also, random effects generalizations to the model, which allow each individual to have their own mutation rate and degradation rate, might prove to be useful. This is the subject of current work by the authors.

REFERENCES

Bayarri, M. J., Berger, J. O., Paulo, R., Sacks, J., Cafeo, J. A., Cavendish, J., Lin, C.-H., and Tu, J. (2007), "A Framework for Validation of Computer Models," Technometrics, 49, 138–154.
Beaumont, M. A., Zhang, W., and Balding, D. J. (2002), "Approximate Bayesian Computation in Population Genetics," Genetics, 162, 2025–2035.
Bender, A., Krishnan, K. J., Morris, C. M., Taylor, G. A., Reeve, A. K., Perry, R. H., Jaros, E., Hersheson, J. S., Betts, J., Klopstock, T., Taylor, R. W., and Turnbull, D. M. (2006), "High Levels of Mitochondrial DNA Deletions in Substantia Nigra Neurons in Aging and Parkinson Disease," Nature Genetics, 38, 515–517.
Boys, R. J., Wilkinson, D. J., and Kirkwood, T. B. L. (2008), "Bayesian Inference for a Discretely Observed Stochastic Kinetic Model," Statistics and Computing, 18, 125–135.
Diggle, P. J., and Gratton, R. J. (1984), "Monte Carlo Methods of Inference for Implicit Statistical Models (with Discussion)," Journal of the Royal Statistical Society: Series B, 46, 193–227.
Dunson, D. B., Pillai, N., and Park, J.-H. (2007), "Bayesian Density Regression," Journal of the Royal Statistical Society: Series B, 69, 163–183.
Elson, J., Samuels, D., Turnbull, D., and Chinnery, P. (2001), "Random Intracellular Drift Explains the Clonal Expansion of Mitochondrial DNA Mutations with Age," American Journal of Human Genetics, 68, 802–806.
Fearnley, J. M., and Lees, A. J. (1991), "Ageing and Parkinson's Disease: Substantia Nigra Regional Selectivity," Brain, 114, 2283–2301.
Garthwaite, P. H., Kadane, J. B., and O'Hagan, A. (2005), "Statistical Methods for Eliciting Probability Distributions," Journal of the American Statistical Association, 100, 680–701.
Gelman, A., Carlin, J. B., Stern, H. S., and Rubin, D. B. (2004), Bayesian Data Analysis (2nd ed.), Boca Raton, FL: Chapman and Hall/CRC.
Gillespie, D. T. (1977), "Exact Stochastic Simulation of Coupled Chemical Reactions," Journal of Physical Chemistry, 81, 2340–2361.
Gourieroux, C., Montfort, A., and Renault, E. (1993), "Indirect Inference," Journal of Applied Econometrics, 8, S85–S118.
Gramacy, R. B., and Lee, H. K. H. (2008), "Bayesian Treed Gaussian Process Models with an Application to Computer Modeling," Journal of the American Statistical Association, 103, 1119–1130.
Griffin, J. E., and Steel, M. F. J. (2006), "Order-Based Dependent Dirichlet Processes," Journal of the American Statistical Association, 101, 179–194.
Gross, N. J., Getz, G. S., and Rabinowitz, M. (1969), "Apparent Turnover of Mitochondrial Deoxyribonucleic Acid and Mitochondrial Phospholipids in the Tissues of the Rat," The Journal of Biological Chemistry, 244, 1552–1562.
Hassler, R. (1938), "Zur Pathologie der Paralysis Agitans und des Postenzephalitischen Parkinsonismus," Journal für Psychologie und Neurologie, 48, 387–476.
Hayashi, J.-I., Ohta, S., Kikushi, A., Takemitsu, M., Goto, Y.-I., and Nonaka, I. (1991), "Introduction of Disease-Related Mitochondrial DNA Deletions into HeLa Cells Lacking Mitochondrial DNA Results in Mitochondrial Dysfunction," Proceedings of the National Academy of Sciences of the United States of America, 88, 10,614–10,618.

Kennedy, M. C., and O'Hagan, A. (2001), "Bayesian Calibration of Computer Models (with Discussion)," Journal of the Royal Statistical Society: Series B, 63, 425–464.
Krishnan, K. J., Bender, A., Taylor, R. W., and Turnbull, D. M. (2007), "A Multiplex Real-Time PCR Method to Detect and Quantify Mitochondrial DNA Deletions in Individual Cells," Analytical Biochemistry, 370, 127–129.
Larionov, A., Krause, A., and Miller, W. (2005), "A Standard Curve Based Method for Relative Real Time PCR Data Processing," BMC Bioinformatics, 6, 1–16.
Marjoram, P., Molitor, J., Plagnol, V., and Tavaré, S. (2003), "Markov Chain Monte Carlo Without Likelihoods," Proceedings of the National Academy of Sciences of the United States of America, 100, 15,324–15,328.
Molina, G., Bayarri, M. J., and Berger, J. O. (2005), "Statistical Inverse Analysis of a Network Microsimulator," Technometrics, 47, 388–398.

O'Hagan, A. (2006), "Bayesian Analysis of Computer Code Outputs: A Tutorial," Reliability Engineering & System Safety, 91, 1290–1300.
Rossignol, R., Faustin, B., Rocher, C., Malgat, M., Mazat, J.-P., and Letellier, T. (2003), "Mitochondrial Threshold Effects," The Biochemical Journal, 370, 751–762.
Rougier, J. (2001), "Comment on 'Bayesian Calibration of Computer Models' by Kennedy and O'Hagan," Journal of the Royal Statistical Society: Series B, 63, 453.
Sacks, J., Welch, W. J., Mitchell, T. J., and Wynn, H. P. (1989), "Design and Analysis of Computer Experiments," Statistical Science, 4, 409–423.
Santner, T. J., Williams, B. J., and Notz, W. I. (2003), The Design and Analysis of Computer Experiments, New York: Springer.
Wilkinson, D. J. (2006), Stochastic Modelling for Systems Biology, Boca Raton, FL: Chapman & Hall/CRC.
