Journal of Computational and Applied Mathematics 184 (2005) 50–76
www.elsevier.com/locate/cam

Computational approaches to parameter estimation and model selection in immunology

C.T.H. Baker (a,b,∗), G.A. Bocharov (a,b,c), J.M. Ford (d), P.M. Lumb (b), S.J. Norton (b), C.A.H. Paul (a), T. Junt (e), P. Krebs (e), B. Ludewig (e,f)

a Department of Mathematics, The University of Manchester, Oxford Road, Manchester M13 9PL, UK
b Mathematics Department, University College Chester, Chester, UK
c Institute of Numerical Mathematics, Russian Academy of Sciences, Moscow, Russia
d Royal Liverpool Children's NHS Trust, Liverpool, UK
e Institute of Experimental Immunology, University of Zürich, Switzerland
f Research Department, Kantonsspital St. Gallen, Switzerland

Received 6 May 2004; received in revised form 11 October 2004

Abstract

One of the significant challenges in biomathematics (and other areas of science) is to formulate meaningful mathematical models. Our problem is to decide on a parametrized model which is, in some sense, most likely to represent the information in a set of observed data. In this paper, we illustrate the computational implementation of an information-theoretic approach (associated with a maximum likelihood treatment) to modelling in immunology. The approach is illustrated by modelling LCMV infection using a family of models based on systems of ordinary differential and delay differential equations. The models (which use parameters that have a scientific interpretation) are chosen to fit data arising from experimental studies of virus–cytotoxic T lymphocyte kinetics; the parametrized models that result are arranged in a hierarchy by the computation of Akaike indices. The practical illustration is used to convey more general insight. Because the mathematical equations that comprise the models are solved numerically, the accuracy in the computation has a bearing on the outcome, and we address this and other practical details in our discussion.
© 2005 Elsevier B.V. All rights reserved.

Keywords: Mathematical model; Computational modelling; Parameter estimation; Numerical accuracy; Maximum likelihood; Parsimony; Immune response; Experimental LCMV infection

∗ Corresponding author. Department of Mathematics, The University of Manchester, Oxford Road, Manchester M13 9PL, UK. E-mail address: [email protected] (C.T.H. Baker).

0377-0427/$ - see front matter © 2005 Elsevier B.V. All rights reserved. doi:10.1016/j.cam.2005.02.003


1. Introduction

1.1. A plan of the paper

Echoing the title of this paper, our plan is first to summarize a methodology for computational modelling, and then to apply it to immunological data using a representative set of models. The objective is to decide on a parametrized model which is, in some sense, most likely to represent the information in a set of observed data. We distinguish the preceding objective from that of approximating the actual parameters in an "ideal model". One situation in which an ideal model clearly exists is that in which the data is generated artificially by solving a system of model equations with actual values of the parameters. This strategy may have a rôle in developing computational codes, but in the case of "real" (observed) data, one needs to consider what conclusions are supported by the observations.

The situation that we address is one where neither the generic form of the model, nor the values of the parameters that determine a particular model, are selected in advance. We offer a hierarchy of models, ordered by indices that involve, in part, the number of "abstract" parameters¹ as some measure of complexity. The outcome is an ordering, according to our estimate of a measure of information loss, of particular parametrized models (that is, models with numerical values assigned to the parameters that are determined using the experimental data).

An example that illustrates the situation in a representative manner is provided by models of conventional immune responses in experimental virus infections [25,33,40,56]. The main issue is to ascertain which of various models are more consistent with the information available for the system studied. A maximum likelihood approach provides a general framework for parameter estimation, parsimony analysis and model selection (aspects that are inter-related) for a set of plausible mathematical models that may vary considerably in their "architecture".

1.2. Challenges and general methodology

Our objective is to present and illustrate a computational approach for (i) developing a best-fit mathematical model that provides an accurate approximation to the data, (ii) assessing the confidence in the estimates of the parameters in the model and (iii) characterizing the parsimony of the models.

Those who attempt computational modelling in immunology are faced with substantial challenges due to the enormous complexity of the systemic-, cellular- and molecular-level processes underlying various immunological phenomena [31]. Model formulation is a critical element: it requires selection of essential variables and parameters and specification of the causal relationships between them. The model must be related to the amount of data: small data sets support simple models with few reliably estimated parameters. However, various functional forms can be suggested for the same interaction, even while retaining 'simple' models.

Optimum parameters are generally obtained by a minimization of a measure of the closeness of fit of the solution of the model to the given data. In our case, this is associated with maximum likelihood.

¹ Some parameters may vanish, leaving us with a simplified form of model. Observe that to have more parameters than data points is not compatible with our objectives.


The Kullback–Leibler information-theoretic measure of the quality of a model relies on the concept of an ideal model; it characterizes the information lost when the model is used to approximate reality, and it can be assessed through an indicator such as that of Akaike [1,8,30]. These indices are evaluated by a maximum likelihood approach, which also provides a general framework for parameter estimation. A number of our predecessors working in this area have examined issues similar to, or related to, those that we address (see, e.g., [18,26,54,58]).

It has to be appreciated that the quality of models is limited by the quality of the data and the robustness of the underlying methodology (and its computational implementation). One sometimes encounters in the literature a variety of models which are presented with little comment on the detail of their derivation, as though the journey from data to model was uncomplicated. The current authors have come to use the term sanitized² for expositions that do not reveal the real-life difficulties. We endeavour to adopt a realistic attitude and highlight some of the practical considerations and difficulties that can arise. For example, the models that we consider are solved numerically, to a certain precision; the tolerance imposed on the accuracy has an impact on the model that is selected as 'the best' (Sections 5.4–5.7). Numerical accuracy also affects the success in optimizing the parameters.

1.3. Differential and delay differential equations as models

The general discussion here will be conducted with reference to models based on ordinary differential equations (ODEs) and delay differential equations (DDEs) (see [9,16] etc.). Analytical approaches designed to provide qualitative (but not necessarily quantitative) consistency are usually constrained to simple models. In contrast, the computational approach permits the choice of more realistic (and, if appropriate, quite complex) equations. The ODEs and DDEs considered as potential models have solutions that we denote y(t) = y(t; p) ∈ ℝ^M, with parameter vector p = [p₁, p₂, ..., p_L]^T ∈ ℝ^L; the models often have the following general form:

    y′(t; p) = f(t, y(t; p), y(t − τ; p); p)  for t ∈ [t_start, T],
    y(t; p) = φ(t; p)  for t ∈ [t_start − τ, t_start].    (1.1)

In general, the right-hand side functions f (where f : [t_start, T] × ℝ^M × ℝ^M × ℝ^L → ℝ^M) that arise in biomathematics are continuously differentiable with respect to their arguments, whereas the initial functions φ : [t_start − τ, t_start] → ℝ^M are piecewise continuous, with possible finite jump discontinuities at a finite number of points. The numerical treatment of DDEs (or more general neutral delay differential equations—NDDEs; see [5,23]) needs to take into account any inherent discontinuities of the solution (see, e.g., [14,15,55], and references therein). In our view, essential properties of the solution should be emulated in the numerical solution whenever a numerical solution is calculated. The identification of the initial function is the subject of a separate study recently presented in [11–13]. The potential for ill-posedness, remarked upon in [8], is made transparent in [12].

Where do the mathematical models originate? Our models are deterministic,³ and there is a vast body of expertise in mathematical modelling in immunology, which we cannot survey here.

² Concise Oxford Dictionary: sanitize v.tr. (also -ise): US colloq. render (information, etc.) more acceptable by removing . . . disturbing material.
³ Renshaw's book [51] provides an introduction to deterministic and stochastic models for population growth, birth–death processes, time-lag models of population growth, competition processes, predator–prey processes, spatial predator–prey systems, fluctuating environments, spatial population dynamics, and epidemic processes.
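To make the time-stepping treatment of (1.1) concrete, the following minimal sketch (ours, in Python; not one of the published codes discussed later in Section 5.2) advances a scalar DDE y′(t) = f(t, y(t), y(t − τ)) with a fixed-step Euler rule, storing the computed history and interpolating it linearly to evaluate the delayed argument. Production codes such as dde23 or Archi use adaptive steps, higher-order formulas and discontinuity tracking; the function names here are our own.

    import numpy as np

    def euler_dde(f, phi, t_start, T, tau, h):
        """Fixed-step Euler method for y'(t) = f(t, y(t), y(t - tau)),
        with initial function y(t) = phi(t) on [t_start - tau, t_start].
        A minimal illustration only: no error control, no discontinuity tracking."""
        ts = [t_start]
        ys = [phi(t_start)]
        def y_delayed(t):
            if t <= t_start:
                return phi(t)            # history supplied by the initial function
            return np.interp(t, ts, ys)  # linear interpolation in the stored history
        t, y = t_start, phi(t_start)
        while t < T:
            y = y + h * f(t, y, y_delayed(t - tau))
            t = t + h
            ts.append(t)
            ys.append(y)
        return np.array(ts), np.array(ys)

    # Example: delayed logistic growth, y'(t) = r*y(t)*(1 - y(t - tau)/K)
    r, K, tau = 1.0, 1.0e6, 0.5
    ts, ys = euler_dde(lambda t, y, yd: r * y * (1.0 - yd / K),
                       phi=lambda t: 200.0, t_start=0.0, T=14.0, tau=tau, h=0.01)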


Such expertise is founded upon scientific understanding, and we would counsel against an approach based on what is described by one author as "bits of mathematical machinery that behave in accordance with what is known about a system without constituting any sort of explanation of the behaviour". Mathematical models can be based on various types of mathematical systems. From our perspective, both the underlying science and an understanding of the transient and long-term dynamics associated with different types of model equations should inform the choice of possible models. The choice of a family of putative models can be regarded as a constraint that may reduce the ill-posedness of the mathematical problem.

Baker et al. [9] (see also [27]) have reported on the occurrence of DDEs in the biomathematics literature. The use, based upon a computational approach, of mathematical models featuring ODEs and DDEs⁴ in order to provide quantitative and qualitative consistency with biological observations has been discussed in [6,7]. In the latter papers, the objective has been to estimate scientifically meaningful parameters in DDEs that model phenomena for which experimental observations are available.

We consider a model to be meaningful if it is both descriptive and predictive. Thus, we ask that it be consistent with previous observations and that it can predict future behaviour (e.g., the outcome of further observations or experiments) under moderately changed⁵ conditions. In this sense, a good model will reduce the need to perform certain experiments; however, the production of a good model depends on the availability of sufficient data of sufficient quality. Since one component in the choice of a model is the principle of parsimony, one expects to be able to assess whether (given the observations provided) a benefit from introducing a delay, or time-lag, into a model can be detected in the particular context under investigation.

Baker et al. [8] set down a computational methodology (valid equally for ODEs and for DDEs) that will be illustrated below, in order to obtain and compare models for LCMV infection in mice, for which data has been provided by three of the current authors. The multi-authorship of this report reflects the need for a multi-disciplinary approach to our area of study. The dependence on computational techniques points to the need for numerical analysts with practical skills to be a part of the investigative teams. Numerical methods for DDEs are considered briefly in Section 5.2; numerical methods for minimization are considered briefly in Section 5.3.

2. A maximum likelihood approach to model identification

In this section, we relate the mathematical principles that we shall follow in our model identification. The two components of model identification (which are inter-related) are: (i) selection, from a range of forms, of a parametrized mathematical model, and (ii) computation, with maximum confidence, of the value of the parameter that yields a (numerically computed) best fit. An attempt to reconcile these two components can be sought using modern theories of information complexity [30].

We refer to the components p_ν of the "parameter vector" p in (1.1) as "parameters". The form of f is known, so that f is defined precisely if p is specified; φ is an initial function (possibly parameter-dependent).

⁴ It may be remarked that parameter estimation in DDEs has a long history, certainly going back to work of Banks and his co-workers [16–20].
⁵ A change from 'normal conditions' to pathogen-free conditions would be reflected by a change in the response mechanism (reflected in a need to adjust the model) and the observed data.


For a choice of p, the values y(t_j; p), with components y^i(t_j; p), will be expected to simulate the data {y_j} (observed at times⁶ t_j ∈ [t_start, T]) with components {y_j^i} (i = 1, 2, ..., M; j = 1, 2, ..., N). Specific models are given in Section 4. In (1.1), τ ≥ 0, and if τ > 0 it represents a time-lag. The methodology outlined here gives, inter alia, an indication whether a lag parameter τ > 0 can be justified. We may also be asked to identify parameter values in φ(t; p) as well as in the equations. Throughout the paper, L is the number of parameters, M is the dimensionality of the state vector, and n denotes the sample size (the number of scalar observations, usually n = NM).

⁶ With little amendment we can consider the case where different components {y_j^i}_{j=1}^{N_i} are associated with i-dependent times {t_{ij}}_{j=1}^{N_i}.

2.1. Maximum likelihood parameter estimation

We follow a maximum likelihood approach, for the data sets or for the log-transformed data sets, that has its foundations in an information-theoretic approach to model selection but in practice amounts to a type of least-squares fit. Our technique is based upon the ideas and methodology set out in Baker et al. [8], and we shall not repeat here the arguments and motivation discussed at some length in [8,10]. The maximum likelihood formulation allows one to find the value of a parameter that maximizes the likelihood of obtaining exactly the observed data⁷ (see [3,21,36,44,47], etc.) using the computed parameters. The parameters are regarded as at our disposal; those parameters for which the likelihood is the highest are the "maximum likelihood estimates". We do not pursue a possible link between maximizing likelihood and regularization.

⁷ The data is regarded as fixed (and assumed to have errors of a certain type).

Remark 2.1. Our approach fits into a general framework, as follows. A general statistical framework for parameter estimation is the Bayesian approach. Under the assumption of a uniform prior distribution of parameter values, the Bayesian approach [21,36] reduces to maximum likelihood estimation (MLE) [36]. The widely used least-squares technique (LSQ) is equivalent to MLE under the following set of assumptions: (i) the observational errors are normally distributed; (ii) equivalent positive and negative deviations from the expected values differ by equal amounts; (iii) the errors between samples are independent. Other powers of the deviation between model and data can be used, depending on the error distribution; for example, the first power would correspond to an exponential distribution of the errors [36].

The key aspects of our procedure involve the following:

• Assumptions about the statistical nature of the variation between one set of data and another;
• Least-squares-type fitting, using an appropriate least-squares type of objective function, related to MLE;


• The calculation of indicators, incorporating a measure of parsimony, that provide a score of the merits or demerits of each model (reflecting the information retained or lost when the model in question is used to approximate an—in some sense—'best' model for the data).

We make certain assumptions:

Assumption 2.1. The errors in observations at successive times are independent.

Assumption 2.2. The errors in the observed data are assumed to have a Gaussian distribution about the vectors {y(t_j; p)}_{j=1}^{N}, that is, y_j ∼ N(y(t_j; p), Σ_j), where Σ_j is the jth covariance matrix.

Under Assumption 2.2, the component probability density functions are given by

    H(y_j; p) = (1/√((2π)^M det Σ_j)) exp(−½ [y(t_j; p) − y_j]^T Σ_j^{−1} [y(t_j; p) − y_j]),  j = 1, 2, ..., N.    (2.1)

Under Assumption 2.1, the likelihood function is then given by

    L(p) = Π_{j=1}^{N} H(y_j; p),    (2.2)

where H(y_j; p) appears in (2.1).

Assumption 2.3. The errors in the components of y_j are assumed to be independent.

Define

    Φ_WLS(p) ≡ Σ_{j=1}^{N} [y(t_j; p) − y_j]^T Σ_j^{−1} [y(t_j; p) − y_j].

Under Assumption 2.3, we are led to define

    Σ_j = σ² {diag²[σ_1^{[j]}, σ_2^{[j]}, ..., σ_M^{[j]}]}    (2.3)

(where σ² denotes the data variance coefficient); then

    Φ_WLS(p) ≡ σ^{−2} Φ_LS(p),    (2.4a)

where

    Φ_LS(p) = Σ_{j=1}^{N} ‖diag^{−1}[σ_1^{[j]}, σ_2^{[j]}, ..., σ_M^{[j]}] [y(t_j; p) − y_j]‖².    (2.4b)


The maximum likelihood estimate of the model parameters provides an optimal estimate of the data variance coefficient as

    σ̂² = (1/(NM)) Σ_{j=1}^{N} ‖diag^{−1}[σ_1^{[j]}, σ_2^{[j]}, ..., σ_M^{[j]}] [y(t_j; p̂) − y_j]‖² = (1/(NM)) Φ_LS(p̂),    (2.4c)

where p̂ maximizes the likelihood, and then (denoting the natural logarithm by ln(·))

    ln L(p̂) = −½ {NM ln(2π) + NM + 2 Σ_{i,j} ln(σ_i^{[j]})} − ½ {NM ln(Φ_LS(p̂)) − NM ln(NM)}.    (2.5)

Clearly, L(p̂) is maximized when Φ_LS(p̂) is minimized (equivalently, when Φ_WLS(p̂) is minimized) and σ² is assigned the value σ̂² in (2.4c). We seek a best-fit parameter p̂ for (1.1) for which the corresponding values {y^i(t_j; p̂)} (i = 1:M, j = 1:N) provide a 'best fit' to the given data {y_j^i} (i = 1:M, j = 1:N) in the sense that

    Φ_LS(p̂) = min_p Φ_LS(p).    (2.6)
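Equations (2.4c)–(2.5) translate directly into a few lines of code. The sketch below (Python, with hypothetical names of our own devising; the paper's computations used the FORTRAN/MATLAB codes discussed in Section 5) evaluates Φ_LS, the optimal variance coefficient σ̂² and the maximized log-likelihood from model values and data.

    import numpy as np

    def phi_ls(y_model, y_data, sigma):
        """Weighted least-squares objective (2.4b).
        y_model, y_data, sigma: arrays of shape (N, M); sigma holds sigma_i^[j]."""
        return np.sum(((y_model - y_data) / sigma) ** 2)

    def log_likelihood_at_optimum(y_model, y_data, sigma):
        """Maximized log-likelihood (2.5), with sigma^2 set to its MLE (2.4c)."""
        N, M = y_data.shape
        n = N * M
        phi = phi_ls(y_model, y_data, sigma)
        sigma2_hat = phi / n                                   # (2.4c)
        lnL = -0.5 * (n * np.log(2 * np.pi) + n + 2 * np.sum(np.log(sigma))) \
              - 0.5 * (n * np.log(phi) - n * np.log(n))        # (2.5)
        return lnL, sigma2_hat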

2.2. Least-squares type objective functions

As indicated above, a key element is the least-squares fitting, which involves the informed selection of a least-squares objective function Φ_LS(p) (and this corresponds to a choice of the Σ_j). This entails making the connection between "the most likely" parameters and the "best fit" parameters by ascertaining a criterion for "best fit", that is, by choosing an appropriate Φ. In general least-squares type data fitting one encounters, in particular, three types of objective function Φ(p), which depend upon the given data {t_j; y_j^i}_{j=1}^{N} (for i = 1, ..., M) and the values {y^i(t_j; p)} of the solution y(t; p) of the parametrized model, e.g. (1.1). These three types (ordinary least-squares, weighted least-squares, and log-least-squares objective functions) correspond to:

    Φ_OLS(p) = Σ_{j=1}^{N} Σ_{i=1}^{M} [y^i(t_j; p) − y_j^i]² = Σ_{j=1}^{N} ‖y(t_j; p) − y_j‖²,    (2.7a)

    Φ_WLS(p) ≡ σ^{−2} Φ_LS(p), where Φ_LS(p) = Σ_{j=1}^{N} Σ_{i=1}^{M} {w_i^{[j]} [y^i(t_j; p) − y_j^i]}²    (2.7b)

(when locating p̂, the scaling factor σ^{−2} in the objective function is not relevant), and

    Φ_LogLS(p) = Σ_{j=1}^{N} Σ_{i=1}^{M} [ln(y^i(t_j; p)) − ln(y_j^i)]².    (2.7c)

(Variants of these are possible.) To use (2.7c), it will be assumed that y_j^i > 0 and that y^i(t_j; p) > 0. As observed by Gingerich [37], the objective functions (2.7) above correspond to maximum likelihood functions under differing assumptions.


Thus, (i) (2.7b) (of which (2.7a) is a special case) is arrived at from an assumption of arithmetic normality of observational errors, in which equivalent positive and negative deviations from expected values differ by equal amounts; (ii) (2.7c) is associated with an assumption of geometric normality of observational errors (in which equivalent deviations differ by equal proportions). The use of the term "geometric normality" refers to the errors being "log-normal". If we adopt a weighted least-squares approach, as in (2.7b), the choice of the weights w_i^{[j]} in Φ_LS(p) can be based on a natural assumption:

Assumption 2.4. We take w_i^{[j]} = {y_j^i}^{−1}. This implies that the variance increases with the expected value but the coefficient of variation remains constant.
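The three objective functions (2.7a)–(2.7c) and the weighting of Assumption 2.4 can be summarized in a short sketch (Python; the naming is ours, offered as an illustration rather than as the authors' code):

    import numpy as np

    def phi_ols(y_model, y_data):
        """Ordinary least-squares objective (2.7a)."""
        return np.sum((y_model - y_data) ** 2)

    def phi_wls(y_model, y_data, w):
        """Weighted least-squares objective (2.7b), up to the factor sigma^{-2}."""
        return np.sum((w * (y_model - y_data)) ** 2)

    def phi_logls(y_model, y_data):
        """Log-least-squares objective (2.7c); assumes strictly positive entries."""
        return np.sum((np.log(y_model) - np.log(y_data)) ** 2)

    # Assumption 2.4: weights are the reciprocals of the observed values,
    # so the coefficient of variation is treated as constant.
    def weights_from_data(y_data):
        return 1.0 / y_data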

2.3. A scoring mechanism

Given a family of models (each one with a best-fit set of parameters), the question is how to rank them by giving each a score (thus arriving at a hierarchy of parametrized models). The goodness of fit associated with parameter estimates p̂ can be characterized, when one has confidence in the form of the model, by the size of an objective function Φ(p̂). This is the data-fitting approach, and here p̂ may be an approximation (however obtained) to p such that Φ(p̂) = min_p Φ(p). Thus, one criterion by which to judge a model may be the size of Φ(p̂) (see [28,43]). However, if there is a number of candidate models, our task is not simply to identify one with the smallest objective function but to incorporate other criteria for discriminating between models of differing complexity. There are (information-theoretic) criteria, such as the Akaike, Schwarz, and Takeuchi information criteria⁸ and generalizations related to the informational complexity of models, which depend not only upon the maximum likelihood estimation bias [1,30,52] but incorporate the number of parameters and the number of observations in a quantitative evaluation of different models. Burnham and Anderson [30] review, as a natural basis for model selection, both the concept of K–L information and maximum likelihood.

For the Akaike and the corrected Akaike criteria, the indicators are the size of the measures AIC and cAIC given by

    AIC = −2 ln L(p̂) + 2(L + 1),    (2.8a)

    cAIC = −2 ln L(p̂) + 2(L + 1) + 2(L + 1)(L + 2)/(n − L − 2),  with n = NM,    (2.8b)

respectively (see [30]). These indicators are expressed in terms of the MLE L(p̂). There are L + 1 parameters being estimated, comprising p₁, p₂, ..., p_L and σ, since we currently assume that a single value σ, which we also estimate, characterizes all the variances. The advice quoted by Burnham and Anderson [30] is that (2.8a) is satisfactory if n > 40(L + 1); otherwise (2.8b) is preferred by these authors. As n → ∞, cAIC → AIC.

⁸ The Akaike criterion is based upon the Kullback–Leibler notion of information or distance between two probabilistic models (information loss) [41], approximated using the maximum likelihood estimation [1,30].


Our interest is in the relative size of the indicators; thus (omitting technical details) it is convenient to discard extraneous terms and employ the revised indicators

    μ̆_AIC = n ln(Φ_LS(p̂)) + 2(L + 1),    (2.9a)

    μ̆_cAIC = μ̆_AIC + 2(L + 1)(L + 2)/(n − L − 2),    (2.9b)

where ln(·) denotes the natural logarithm, n = NM (the sample size), and L is the number of parameters estimated.    (2.9c)
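A hedged sketch of the revised indicators (2.9a)–(2.9b), building on the Φ_LS routine sketched earlier (again, the function names are ours):

    import numpy as np

    def akaike_indicators(phi_min, n, L):
        """Revised Akaike indicators (2.9a)-(2.9b).
        phi_min: minimized objective Phi_LS(p_hat); n = N*M; L = number of parameters."""
        aic = n * np.log(phi_min) + 2 * (L + 1)                 # (2.9a)
        caic = aic + 2 * (L + 1) * (L + 2) / (n - L - 2)        # (2.9b)
        return aic, caic

    # e.g. akaike_indicators(phi_min, n=15, L=...) for one of the models of Section 4,
    # with n = 15 as for the data of Table 1.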

2.4. Confidence intervals

A number of approaches exist to characterize the confidence in the best-fit parameter estimates, of which we name only three major ones: (a) the variance–covariance matrix based technique [21, Chapter 7] (which requires that the objective function be 'small'), (b) the profile-likelihood-based method [53], and (c) the bootstrap methods [34] (whose validity has been challenged from a probabilistic aspect [57]).

Approximate confidence regions (e.g. 95% confidence) for every component parameter p_ν of the optimum vector p̂ ≡ [p̂₁, p̂₂, ..., p̂_ν, ..., p̂_L]^T can be obtained using the profile likelihood method [53], as follows. Suppose that we concentrate upon the νth component p̂_ν of the optimal parameter vector p̂ = [p̂₁, p̂₂, ..., p̂_{ν−1}, p̂_ν, p̂_{ν+1}, ..., p̂_L]^T. We search for the interval [p_ν^min, p_ν^max] of maximal width such that, when we define the hyper-plane containing p̂

    S_ν(p) := {[p̂₁, p̂₂, ..., p̂_{ν−1}, p, p̂_{ν+1}, ..., p̂_L]^T | p fixed}

and Ľ_ν(p) = max_{p∈S_ν(p)} L(p), then

    |ln(Ľ_ν(p)) − ln(L(p̂))| ≤ ½ χ²_{1,0.95}  whenever p ∈ [p_ν^min, p_ν^max],    (2.10)

where χ²_{1,0.95} stands for the 0.95th quantile of the χ²-distribution for 1 degree of freedom (see [3]). Using the relationship (2.5) between the MLE and the least-squares estimation, we obtain

    NM |ln(Φ̌_LS(p)) − ln(Φ_LS(p̂))| ≤ χ²_{1,0.95}  whenever p ∈ [p_ν^min, p_ν^max],    (2.11)

where Φ̌_LS(p) denotes the minimum of Φ_LS over the hyper-plane S_ν(p). For our particular data set (see Table 1), the total number of scalar observations (NM in the above formula) is 15, and the tabulated value of χ²_{1,0.95} is 3.841. Hence, the final expression for computing the 95% confidence interval reads

    |ln(Φ̌_LS(p)) − ln(Φ_LS(p̂))| ≤ 3.841/15.    (2.12)

Computational aspects of our study are discussed in Section 5; see in particular Remark 5.3.

Remark 2.2. A comparison of variance–covariance, profile-likelihood and bootstrap methodologies (each of which is justified under particular assumptions) is planned by the authors for a later paper.
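Criterion (2.12) suggests a simple numerical procedure: fix the νth parameter at a trial value, re-optimize the remaining parameters, and accept the trial value while the inequality holds. A minimal sketch follows (ours; here `reoptimize` stands for whatever constrained minimizer is in use, cf. Section 5.3):

    import numpy as np

    def profile_ci(nu, phi_min, reoptimize, grid):
        """Scan criterion (2.12) over a grid of trial values for parameter nu.
        reoptimize(nu, value) must return min Phi_LS over the hyper-plane S_nu
        with p_nu fixed at `value` (i.e. a re-run of the least-squares fit).
        The grid should bracket the optimum p_hat_nu."""
        threshold = 3.841 / 15.0    # chi^2_{1,0.95} / (N*M) for the Table 1 data
        accepted = [v for v in grid
                    if abs(np.log(reoptimize(nu, v)) - np.log(phi_min)) <= threshold]
        return min(accepted), max(accepted)    # [p_nu^min, p_nu^max]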


Table 1
Data set for the virus and cytotoxic T lymphocyte kinetics in the spleen after systemic infection with 200 pfu of LCMV-WE

Time     V(t): Set 1—virus   V(t): Set 2—virus   E(t): Set 1—CTL      E(t): Set 2—CTL
(days)   population (pfu)    population (pfu)    population (cells)   population (cells)

 1       3.55 × 10⁴          1.20 × 10⁴          b.d.l.               b.d.l.
 2       5.0 × 10⁵           1.6 × 10⁶           b.d.l.               b.d.l.
 3       3.8 × 10⁶           3.9 × 10⁶           b.d.l.               b.d.l.
 4       3.2 × 10⁶           2.1 × 10⁶           b.d.l.               b.d.l.
 6       3.7 × 10⁴           1.25 × 10⁵          8.33 × 10⁵           9.85 × 10⁵
 8       3.1 × 10⁴           2.6 × 10⁴           4.75 × 10⁶           4.03 × 10⁶
10       2.5 × 10²           8.0 × 10⁴           4.16 × 10⁶           5.8 × 10⁶
12       2.0 × 10²           7.5 × 10²           3.07 × 10⁶           2.25 × 10⁶
14       b.d.l.              b.d.l.              2.22 × 10⁶           2.89 × 10⁶

b.d.l. means below the detection limit.

3. Experimental LCMV Infection

3.1. Immune response to infection

The infection of a mouse with lymphocytic choriomeningitis virus (LCMV) provides a basic experimental system used in immunology to address fundamental issues of virus–host interaction [59]. The infection results in the activation of immune responses and a clonal burst [29] of virus-specific cytotoxic T lymphocytes (CTL). We note that Ehl et al. [35] observed that "The use of a well-characterized murine infectious disease, which has been shown to be almost exclusively controlled by CTL-mediated perforin-dependent cytotoxicity, provides an exceptionally solid basis for the formulation of [models]". At discrete times, it is possible to measure, experimentally, (i) the amount of the virus, measured in plaque forming units (pfu), and (ii) the virus-specific CTL (measured in the number of cells found per spleen).⁹

3.2. The experimental framework and the observations

In general, it is possible that data comes from a single experiment, or that the data arises from several experiments or a series of observations. Our mathematical models rely upon data being of a certain type: we assume that the mean values of the data are, at each time, normally or log-normally distributed, and independent.

The experimental data is provided in Table 1. It was obtained as follows. A batch of genetically identical C57BL/6 mice were infected with 200 pfu (plaque forming units) of LCMV (WE strain), delivered intravenously.

⁹ Some modellers introduce as a variable the amount of virus-specific memory CTL, a subset of (ii) that is harder to quantify reliably.


Viral titers in spleens were determined at days 1, 2, 3, 4, 6, 8, 10, 12 and 14 post-infection, and the clonal expansion of CTL specific for the gp33 epitope in spleens was assessed using tetramer analysis (see Table 1). The techniques are standard (see, for example, [2,22]). At the indicated time-points after infection, two mice were bled and single-cell suspensions were prepared from spleen, prior to the determination of absolute cell counts using FACS and Neubauer equipment.

An important feature is that the mice were genetically identical, produced by inbreeding. Inbred strains reduce experimental variation; their immune responses can be studied in the absence of variables introduced by individual genetic differences. When the mice are genetically identical, it is argued that large numbers of mice are not required and that the mean obtained represents the mean of a larger set of data. This assertion merits closer examination and testing, but we proceed on the basis that it is correct.

For reliable parameter estimation it is useful to have an idea of the CTL kinetics at times earlier than 6 days post-infection—before the virus population starts to decrease. A quantity of virus-specific CTL below 5000 cells/spleen cannot be detected using the tetramer technique. Our experience (arising from numerous studies with the LCMV system) suggests that after injection of 200 pfu of LCMV the proliferating CTL should reach the detection threshold at about two and a half days. This evidence was taken into account in the parameter estimation by supplementing Table 1 with a CTL reading (representing the least possible detection level) at day 2.5.

The detection threshold for LCMV in the spleen is about 100 pfu. LCMV-WE dropped below the detection threshold by day 14; however, it is believed that the virus still persisted below the detection level for some time. To ensure that the LCMV number in the model remains below the detection threshold between days 12 and 14, we supplement the data with the assumption that the virus quantity on day 14 was 10 pfu/spleen.

4. Hierarchy of mathematical models

A priori immunological and mathematical knowledge enter the models in the form of simplifying assumptions. Potentially, the interaction between virus and immune system can be described by multiple mechanisms and by considering various sets of differential equations. One may thus argue that different mechanisms and their functional forms might equally well describe the data set, and that goodness-of-fit (i.e., the maximized likelihood function) is not sufficient to judge whether the model is correct. It has been observed elsewhere that the maximum likelihood principle leads to choosing the models with the highest possible complexity (corresponding to more parameters) [52]. If there is a number of candidate models, our task is not simply to identify the one with the smallest objective function but to consider the principle of parsimony in model evaluation, and the maximum use of the information implicit in the data.

We suggest a hierarchy of mathematical models that were distilled from the existing literature. The mathematical models for the virus–CTL interaction in LCMV infection are defined within a set of two- or three-dimensional ODEs or DDEs for the evolution of the virus, V(t), and virus-specific CTL (activated and memory cells—E(t), E_m(t)) population dynamics. The equation for the rate of change of the virus population is the same for all the models and is based upon a Verhulst–Pearl logistic growth term and second-order elimination kinetics. The models differ in the way the immune response is described—an issue of some controversy in today's mathematical immunology. Specifically, the models differ with respect to the following building blocks:

(1) virus-dependent CTL proliferation (basic predator–prey versus the Holling type II response);
(2) whether a time-lag in the division of CTL (cell division time) is included;


Table 2
Biological definition of the model parameters for virus–CTL dynamics in the spleen during primary LCMV infection

Parameter (units; the units are d (days), pfu (plaque forming units))       Notation

Virus exponential growth rate (d⁻¹)                                         β
Carrying capacity for the virus (copies/spleen)                             K
Virus elimination rate (1/copy/d)                                           γ
CTL stimulation rate (1/copy/d, M₁; d⁻¹, M₂–M₅)                             b_i
CTL division time (d)                                                       τ
Viral load for half-maximal CTL stimulation (copy/spleen)                   θ_Sat
Death rate of CTL (d⁻¹)                                                     α_E
Specific precursor CTL export from thymus (cell/spleen/d)                   T*
Reversion of activated CTL into the memory state (d⁻¹)                      r_m
Death rate of memory CTL (d⁻¹)                                              α_m

The spleen volume is estimated to be about 0.1 ml.

(3) consideration of homeostasis for naive CTL precursors;
(4) whether a separate equation for the memory CTL is used.

The death rate of CTL is assumed constant. Overall we consider the following five models, which have their counterparts in the literature. See Table 2 for biological definitions of the parameters included in the models.

Model 1 (M₁): Simplest predator–prey consideration of the CTL dynamics:

    dV(t)/dt = β V(t) (1 − V(t)/K) − γ V(t) E(t),    (4.1)
    dE(t)/dt = b₁ V(t) E(t) − α_E E(t).    (4.2)

Model 2 (M₂): Virus-dependent CTL proliferation with saturation (a modification of Model 1):

    dV(t)/dt = β V(t) (1 − V(t)/K) − γ V(t) E(t),    (4.3)
    dE(t)/dt = b₂ V(t) E(t)/(θ_Sat + V(t)) − α_E E(t).    (4.4)

Model 3 (M₃): Virus-dependent CTL proliferation with saturation and time lag (as in Model 2 but incorporating a delay):

    dV(t)/dt = β V(t) (1 − V(t)/K) − γ V(t) E(t),    (4.5)
    dE(t)/dt = b₃ V(t − τ) E(t − τ)/(θ_Sat + V(t)) − α_E E(t).    (4.6)


Model 4 (M₄): Primary CTL homeostasis (includes an additive term):

    dV(t)/dt = β V(t) (1 − V(t)/K) − γ V(t) E(t),    (4.7)
    dE(t)/dt = b₄ V(t − τ) E(t − τ)/(θ_Sat + V(t)) − α_E E(t) + T*.    (4.8)

Model 5 (M₅): Additional equation for the population of memory CTL:

    dV(t)/dt = β V(t) (1 − V(t)/K) − γ V(t) E(t),    (4.9)
    dE(t)/dt = b₅ V(t − τ) E(t − τ)/(θ_Sat + V(t)) − α_E E(t) − r_m E(t) + T*,    (4.10)
    dE_m(t)/dt = r_m E(t) − α_m E_m(t).    (4.11)

Observe that Model 1 is not a special case of Model 2 (the models are not nested); however, Models 2–5 are nested.

Remark 4.1. The parameters in all the above models are meaningful only if they have non-negative values. This can be accommodated by, for example, writing p_ν = p̃_ν² for every non-negative parameter p_ν, where p̃_ν = √(p_ν), in an unconstrained optimization. (The parameters to be recovered remain the original {p_ν}.)

In this study, we take as initial data

    V(t) = 0, t ∈ [−τ, 0), V(0) = V₀;  E(t) = E₀, t ∈ [−τ, 0];  E_m(0) = 0

(with t₀ = 0) and V₀ = 200 and E₀ = 265.
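As an illustration of how such a model is evaluated in practice, the sketch below integrates Model 2 (the pure-ODE member of the hierarchy) with SciPy for a given parameter vector; the parameter values are placeholders, not the best-fit estimates of Section 5, and the squared-parameter device of Remark 4.1 is shown in passing.

    import numpy as np
    from scipy.integrate import solve_ivp

    def model2_rhs(t, y, beta, K, gamma, b2, theta_sat, alpha_E):
        """Right-hand side of Model 2, Eqs. (4.3)-(4.4)."""
        V, E = y
        dV = beta * V * (1.0 - V / K) - gamma * V * E
        dE = b2 * V * E / (theta_sat + V) - alpha_E * E
        return [dV, dE]

    # Remark 4.1: optimize over p_tilde and square it to enforce non-negativity.
    p_tilde = np.sqrt([4.4, 3.2e6, 3.5e-6, 1.9, 2.5e4, 9.0e-2])  # illustrative values only
    beta, K, gamma, b2, theta_sat, alpha_E = p_tilde ** 2

    sol = solve_ivp(model2_rhs, (0.0, 14.0), [200.0, 265.0],     # V0 = 200, E0 = 265
                    args=(beta, K, gamma, b2, theta_sat, alpha_E),
                    t_eval=[1, 2, 3, 4, 6, 8, 10, 12, 14], rtol=1e-8, atol=1e-8)
    V_model, E_model = sol.y   # values to compare with Table 1 via an objective function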

Here, V₀ and E₀ are the initial values for the dose of infection (measured in pfu) and the number of naive CTL (measured in cells). These parameters were considered to be fixed. The problem of identifying initial data is not addressed here (see [11], the references therein, and related work).

5. Computational methodology and numerical results

5.1. Description of numerical techniques

Minimization of a functional dependent on the solution of a system of ODEs or DDEs is based on two components: (i) an ability to find the numerical solution of the ODE and DDE (and thereby evaluate the objective function) and (ii) an ability to find the parameters that provide a global minimum of an objective function when there are constraints on the parameters (e.g., τ ≥ 0).


The availability of codes, and their suitability, depends upon the computer system (e.g., Windows or Linux) and the computer package or language (e.g., MATLAB or a version of FORTRAN). In similar works, a degree of "sanitization" sometimes occurs: it is rarely mentioned that the objective function that is to be minimized is not computed exactly¹⁰ (in our case, the solution of an ODE or DDE is computed using certain prescribed tolerances that govern the accuracy); see Section 5.2. In the illustrations below, we shall present differing results arising from different tolerances.

The user of a "black-box" code generally has the task of selecting tolerances that govern the accuracy of the computation. In practice, it is desirable to choose a tolerance sufficiently small that one has confidence in the parameters (it is possible to compute confidence intervals), but large enough that the calculations are tractable given the demand on computer memory and processing speed. In a well-written code, tolerances are input via a parameter list. In some codes, it could prove necessary to "take the cover off the black box" (thus invalidating any warranty from the author!) in order to exercise the required options.

5.2. Numerical methods for ODEs and DDEs

There is a wide choice of codes available for the numerical solution of ODEs. The situation for DDEs is less satisfactory. Most (if not all) codes for evolutionary problems proceed in an evolutionary mode, computing the solution over successive steps [t_n, t_{n+1}] using a "step" of length h_n = t_{n+1} − t_n. For DDE codes we refer to, e.g.,

• http://www.mathworks.com/access/helpdesk/help/techdoc/ref/dde23.html for MATLAB,
• the code Archi, at www.ma.man.ac.uk/ hris/reports/rep283.pdf, for FORTRAN (and the related codes Archi-L, Archi-N), and
• the code RETARD (the initial version is to be found in [38]).

The parameters in ODE and DDE solvers usually include an "error-per-step" tolerance (eps). Codes for ODEs are usually optimized for "non-stiff" problems or "stiff" problems; a few are type-insensitive. Robust DDE codes that can cope with behaviour akin to "stiffness" are, at the time of writing, wanting. A practical difficulty when solving DDEs arises from the need to store the "history" (an approximation over an interval of the form [t − τ, t]), which imposes large storage demands when the step-size is small relative to τ.

5.3. Numerical methods for optimization

Computer codes available for optimization exist in MATLAB and in FORTRAN; see

(a) http://www.mathtools.net/MATLAB/Optimization/ for MATLAB;
(b) http://www.nag.co.uk/numeric/fortran%5Flibraries.html for NAg routines in FORTRAN;
(c) http://www.sbsi-sol-optimize.com/manuals/SNOPT%20Manual.pdf and http://www.sbsi-sol-optimize.com/asp/sol_product_snopt.htm for SNOPT (invented by Gill, Murray and Saunders), which may be called from a driver program in FORTRAN or MATLAB;
(d) ftp://ftp.numerical.rl.ac.uk/pub/lancelot for the FORTRAN package LANCELOT developed at Rutherford [32] (free to academic users, subject to conditions; see http://www.numerical.rl.ac.uk/lancelot/blurb.html).

¹⁰ It has to be remarked that the determination of error estimates, or confidence limits, can be as computationally expensive as the calculation of the approximation (possibly more expensive).


Table 3

Parameter            Significance of parameter                                   Low accuracy     High accuracy
                                                                                 calculation      calculation

In Archi:
  eps                Error-per-step tolerance in the ODE or DDE solver           10⁻⁶             10⁻¹⁵
In LMDIF1:
  epsfcn             The relative errors in the objective functions ≈ epsfcn     Default (0)      10⁻¹⁵
  ftol               Relative change in the estimated minimum                    10⁻⁶             10⁻¹²
  xtol               Relative change in the estimated parameter value            10⁻⁶             10⁻¹²

Optimization procedures are generally iterative, and the tolerances specified by the user govern the successful conclusion of an iterative process for determining the minimum of an objective function: (i) ftol,¹¹ governing the relative change in the estimated minimum value of the objective function, and (ii) xtol, governing the relative change in the argument at which the estimate of the minimum is attained. A parameter that is sometimes hidden (but can be of prime importance) is that specifying the accuracy to which the objective function is computed. In this respect, one may need to exercise caution about default parameter values. Thus, LMDIF1 is a version¹² of LMDIF in which the calling sequence is simplified through the use of default parameter values (the user is required to specify fewer parameters). In consequence, a call to LMDIF1 employs a default value (zero) of epsfcn. Our experience was transformed beyond recognition when the parameter epsfcn was set to an appropriate (non-default) value. Similar experience is likely with other codes. The calculations in our illustrations are based on the values in Table 3.

Remark 5.1. Codes found in [48] are meant to facilitate parameter estimation. The code Archi-L includes a modified version of the optimization routine LMDIF; Archi-N invokes a NAg constrained minimization routine E04UNF [46] (in previous versions, E04UPF) to find the optimum parameter values. In our view,¹³ it is quite important to find (if possible) an optimization routine that, in searching for a solution to a constrained minimization problem, does not violate constraints that are essential to the code that evaluates the objective function. Thus, (i) DDE-solvers should not be expected to obtain a solution if τ is assigned a negative value, and (ii) a problem in which the parameters can give rise to symptoms of stiffness may require different treatment from one that cannot.

¹¹ We use names employed in LMDIF.
¹² See http://www.netlib.org/minpack/lmdif1.f.
¹³ Others—we believe unwisely—take a more relaxed view.
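The same MINPACK tolerances survive in modern wrappers. For instance, SciPy's leastsq wraps LMDIF/LMDER and exposes ftol, xtol and epsfcn directly, so the low/high accuracy settings of Table 3 can be reproduced along the following lines (a sketch only; the trivial residual function stands in for a routine that solves the model to tolerance eps and returns weighted residuals):

    import numpy as np
    from scipy.optimize import leastsq

    def residuals_fn(p):
        """Stand-in for the real objective: in practice, integrate the model to
        tolerance eps and return the weighted residuals whose sum of squares is
        Phi_LS(p). Here, a trivial quadratic for illustration."""
        return np.array([p[0] - 1.0, 10.0 * (p[1] - p[0] ** 2)])

    p0 = np.array([0.5, 0.5])
    p_low, _ = leastsq(residuals_fn, p0, ftol=1e-6, xtol=1e-6)   # epsfcn left at default
    p_high, _ = leastsq(residuals_fn, p0, ftol=1e-12, xtol=1e-12, epsfcn=1e-15)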


Fig. 1. Contours suggesting that parameter estimation can be achieved successfully.

Where the optimizing routine does not have this feature (and more generally), it is helpful to have good initial estimates of the optimum parameters. These may be obtained from the optimum parameters for a simpler model in the hierarchy, or (a time-consuming process not given to automation) by plotting contour plots corresponding to parameters taken in pairs. Contour plotting is valuable when what is believed to be an optimum value has been located, as it allows a check to be made to ascertain the sensitivity of the objective function in the neighbourhood and to distinguish local and global minima.

Remark 5.2. A multitude of apparent local minima can be a consequence of too large an error tolerance in the ODE/DDE solver. (Compare the minimum of x^{2n} with the minima of x^{2n} + ε sin(mx), where n, m are integers and ε corresponds to the magnitude of a perturbation.)

5.3.1. Contour plotting as an aid: potential problems

In practice, success in identifying parameters is related to the shape of the objective function in terms of the parameters. The examples displayed here relate to actual models for the data presented. A visual assessment can be obtained from graphical displays of the behaviour of Φ(p) (a code sketch follows the caption of Fig. 2 below). Thus, successful minimization is relatively easily obtained when the objective function has a contour plot appearing like that in Fig. 1. However, a valley-type behaviour (Fig. 2, left-hand contour) implies a high correlation between the corresponding parameters. The left-hand contour plot in Fig. 2 indicates that, when θ_Sat and b₄ are appropriately related, the objective function changes very little (the values of the residuals on all the included contours are identical to four decimal places). Indeed, a large change in b₄ with a related large change in θ_Sat can result in a very small change in the objective function.¹⁴ On the other hand, the right-hand contour in Fig. 2 shows a cluster of local minima near the global minimum.

¹⁴ This example illustrates that the two parameters are not separately identifiable. The issue of a priori identifiability is addressed in [4].


Fig. 2. Contour plots. Left: Model 4, log-least-squares (vertical axis θ_Sat, horizontal axis b₄). Any combination of parameter components corresponding to the valley floor is equally effective at providing a small objective function. Right: Model 1, ordinary least-squares. Numerous local minima (which are spurious; see the text) cause a problem in the computation of best-fit parameter estimates. Vertical axis K, horizontal axis β.
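A minimal version of the contour check described above (ours; it assumes an objective function phi(p) built as in the earlier sketches) fixes all but two parameters at their current estimates and maps the objective over a grid:

    import numpy as np
    import matplotlib.pyplot as plt

    def contour_pair(phi, p_hat, i, j, span=0.5, n=41):
        """Contour map of phi over parameters i and j, with the others fixed at p_hat."""
        xs = np.linspace(p_hat[i] * (1 - span), p_hat[i] * (1 + span), n)
        ys = np.linspace(p_hat[j] * (1 - span), p_hat[j] * (1 + span), n)
        Z = np.empty((n, n))
        for a, x in enumerate(xs):
            for b, y in enumerate(ys):
                p = np.array(p_hat, dtype=float)
                p[i], p[j] = x, y
                Z[b, a] = phi(p)          # rows index the vertical (j) axis
        plt.contour(xs, ys, Z, levels=20)
        plt.xlabel(f"p[{i}]"); plt.ylabel(f"p[{j}]")
        plt.show()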

These apparent local minima are spurious, in the sense that they arise from using too large a tolerance when solving the differential equations. Here, it is quite feasible that differing initial estimates of the parameters (employed to start the search for optimal values) could produce any one of the local minima.

5.4. Ordinary least-squares objective function Φ_OLS(p)

Parameter estimation results obtained using the ordinary least-squares approach for Models 1–5 are summarized in Table 4. An increase in the number of model parameters provides a better description of the data in terms of the minimized value of the objective function. However, the increasing values of the corrected Akaike index indicate a gradual information loss for the given data set as the complexity of the models increases. Variation of the best-fit parameter estimates between the models is within ±10%, except for the estimate of θ_Sat. Further, the data set does not provide a biologically correct estimate of the time lag of cell division, τ. (Rather, the delay estimate obtained via ordinary least-squares corresponds to a realistic duration of some stage of the cell cycle.) Visual inspection of graphs of V(t) and E(t) suggests that Model 1 nicely approximates the viral load data, but rather poorly approximates the CTL data (Fig. 3a). The other models describe the CTL kinetics much better, at the expense of a somewhat poorer agreement with the virus data V(t).

5.5. Weighted least-squares objective function Φ_LS(p)

We used Φ_LS(p) (see (2.7b)) with the weights w_i^{[j]} = {ȳ_j^i}^{−1} obtained using the means ȳ_j^i of the data in Table 1. The best-fit parameter estimates are summarized in Table 5.


Table 4
Best-fit parameter estimates for ordinary least-squares and the corrected Akaike indicator

Parameter   M₁            M₂            M₃            M₄            M₅

β           4.44 × 10⁰    4.36 × 10⁰    4.52 × 10⁰    4.52 × 10⁰    4.50 × 10⁰
K           3.99 × 10⁶    3.23 × 10⁶    3.17 × 10⁶    3.17 × 10⁶    3.19 × 10⁶
γ           3.02 × 10⁻⁶   3.48 × 10⁻⁶   3.45 × 10⁻⁶   3.48 × 10⁻⁶   3.63 × 10⁻⁶
b_i         1.23 × 10⁻⁶   1.92 × 10⁰    2.52 × 10⁰    2.41 × 10⁰    2.40 × 10⁰
θ_Sat       —             2.46 × 10⁴    1.34 × 10⁵    1.31 × 10⁵    1.15 × 10⁵
τ           —             —             7.17 × 10⁻²   8.98 × 10⁻²   9.54 × 10⁻²
α_E         0.0           9.14 × 10⁻²   8.62 × 10⁻²   9.1 × 10⁻²    9.31 × 10⁻²
T*          —             —             —             1.24 × 10²    1.40 × 10²
r_m         —             —             —             —             5.17 × 10⁻³
α_m         —             —             —             —             2.55 × 10⁻¹
Φ_OLS       1.05 × 10¹³   4.49 × 10¹²   4.04 × 10¹²   3.91 × 10¹²   3.78 × 10¹²
μ̆_cAIC      472.2         467.0         475.4         488.9         544.4

Calculations are based on LMDIF and the values eps = 10⁻⁶, ftol = 10⁻⁶, xtol = 10⁻⁶, epsfcn = 0.

Again, an increase in the number of model parameters reduces bias in the description of the data but is associated with an increasing value of the Akaike criterion. Interestingly, the weighted least-squares approach ensured biologically correct estimates of the mean cell division time, the delay τ ranging from 5 to 7 h. Qualitatively, Models 1–4 provide a similar approximation of the viral load data, whereas Model 5 fits the CTL data somewhat better than the other models from the set. The best-fit values of the objective function suggest (in each case) that the maximum likelihood estimate of the data variance coefficient (see (2.3) and (2.4c)) is about σ̂² ≈ 0.3–0.4.

5.6. Log-least-squares objective function Φ_LogLS(p)

Parameter estimation results obtained using the log-least-squares approach for Models 1–5 are shown in Table 6. (We fit to data summarized as the means of the log-values rather than the log of the means.) An increase in the number of parameters does not affect the bias in the data description by Models 1–3, but it does provide a better description of the data when applying Models 4 and 5. The increasing value of the corrected Akaike indicator suggests a gradual information loss for the given data set as the complexity of the models increases. The log-least-squares strategy fails to identify reliably the parameters b_i and θ_Sat, as these appear to be highly correlated (see Fig. 2). From another viewpoint, the estimates of r_m and α_m, the memory CTL turnover, appear to be biologically more consistent than those obtained with ordinary or weighted least-squares. Again, the strategy does not support the need to consider a delay in the model for the given data set. All the models with parameters estimated following the log-least-squares strategy seem to capture the overall kinetics of the data quite well.

5.7. Computing with greater accuracy

The calculations represented in Table 4 were checked by refining the parameters that govern the accuracy. The original values eps = 10⁻⁶, ftol = 10⁻⁶, xtol = 10⁻⁶, epsfcn = 0 were replaced by eps = 10⁻¹⁵, ftol = 10⁻¹², xtol = 10⁻¹², epsfcn = 10⁻¹⁵, and the figures in Table 7 were then obtained.



Fig. 3. Ordinary least-squares: (a) Model 1 computed using low accuracy (Table 4); (b) Model 2 computed using high accuracy (Table 7). Shown are the best-fit predictions (solid lines) for the viral load V(t) and for the number of CTLs E(t), together with the mean values (* symbols) of the original data. Horizontal axes: 0 ≤ t ≤ 15; vertical axes: 0 ≤ V(t) ≤ 4 × 10⁶; 0 ≤ E(t) ≤ 6 × 10⁶.

The refined tolerances have a noticeable effect on the parameter values and, in consequence, on the ranking of the parametrized models. With parameters computed to the lower tolerance, the models are ranked with respect to information loss as M₂ (best), M₁, M₃, M₄, M₅; with parameters computed to the higher tolerance we obtain the ranking M₂ (best), M₃, M₁, M₄, M₅. In both cases Model 2 (see Fig. 3b) has the least Akaike index. The parameter θ_Sat (which represents the viral load for half-maximal CTL stimulation) does not occur in Model 1.


Table 5
Best-fit parameter estimates for weighted least-squares and the corrected Akaike indicator

Parameter   Model 1 (M₁)   Model 2 (M₂)   Model 3 (M₃)   Model 4 (M₄)   Model 5 (M₅)

β           4.68 × 10⁰     4.67 × 10⁰     4.60 × 10⁰     4.59 × 10⁰     4.60 × 10⁰
K           2.80 × 10⁶     2.79 × 10⁶     2.86 × 10⁶     2.83 × 10⁶     3.04 × 10⁶
γ           5.19 × 10⁻⁶    7.29 × 10⁻⁶    9.30 × 10⁻⁶    9.99 × 10⁻⁶    1.48 × 10⁻⁵
b_i         1.09 × 10⁻⁶    2.31 × 10⁰     2.48 × 10⁰     2.02 × 10⁰     2.03 × 10⁰
θ_Sat       —              3.12 × 10⁵     1.90 × 10⁵     9.65 × 10⁴     4.64 × 10⁴
τ           —              —              0.291 × 10⁰    0.207 × 10⁰    0.276 × 10⁰
α_E         0.0            10⁻³²          0.0            3.30 × 10⁻⁷    2.2 × 10⁻²
T*          —              —              —              8.27 × 10²     8.22 × 10²
r_m         —              —              —              —              8.73 × 10⁻⁴
α_m         —              —              —              —              1.60 × 10⁻⁴
Φ_LS        6.36 × 10⁰     6.06 × 10⁰     5.99 × 10⁰     4.95 × 10⁰     4.43 × 10⁰
μ̆_cAIC      50.2           57.0           66.9           78.0           132.3

Calculations are based on LMDIF and eps = 10⁻⁶, ftol = 10⁻⁶, xtol = 10⁻⁶, epsfcn = 0.

Table 6
Best-fit parameter estimates for log least-squares and the corrected Akaike indicator

Parameter   M₁            M₂            M₃            M₄            M₅

β           5.49 × 10⁰    5.49 × 10⁰    5.49 × 10⁰    4.70 × 10⁰    4.70 × 10⁰
K           9.45 × 10⁵    9.45 × 10⁵    9.45 × 10⁵    1.31 × 10⁶    1.31 × 10⁶
γ           2.12 × 10⁻⁶   2.12 × 10⁻⁶   2.12 × 10⁻⁶   2.25 × 10⁻⁶   2.24 × 10⁻⁶
b_i         2.05 × 10⁻⁶   4.78 × 10⁶    4.78 × 10⁶    3.88 × 10⁷    3.88 × 10⁷
θ_Sat       —             2.33 × 10¹²   2.33 × 10¹²   3.07 × 10¹³   3.07 × 10¹³
τ           —             —             0.00          0.00          5.60 × 10⁻³
α_E         2.35 × 10⁻²   2.35 × 10⁻²   2.35 × 10⁻²   1.39 × 10⁻²   1.31 × 10⁻²
T*          —             —             —             1.345 × 10³   1.343 × 10³
r_m         —             —             —             —             1.10 × 10⁻²
α_m         —             —             —             —             9.10 × 10⁻³
Φ_LogLS     2.443 × 10⁰   2.443 × 10⁰   2.443 × 10⁰   1.761 × 10⁰   1.764 × 10⁰
μ̆_cAIC      35.9          43.4          53.4          62.5          118.5

Calculations were based on LMDIF and eps = 10⁻⁶, ftol = 10⁻⁶, xtol = 10⁻⁶, epsfcn = 0.

In the high-accuracy figures for Model 2 it is close to zero. Such a small value represents an effectively immediate response to the infection (what is considered to be a "programmed" response in [33]), irrespective of the viral load (see Tables 7 and 8). If we use Model 5, the data does not support a biologically correct estimation of the memory cell life-span α_m. We note that in a similar parameter estimation study [33] the parameter α_m was assigned a value rather than estimated.

Table 9 summarizes the analysis of the confidence in the best-fit parameter estimates for Models 1–3 using high-accuracy solutions and following OLS.


Table 7
Best-fit parameter estimates for ordinary least-squares and the corrected Akaike indicator when the higher-accuracy numerical solution is used. Calculations were based on Archi-N with E04UNF, and eps = 10⁻¹⁵, ftol = 10⁻¹², xtol = 10⁻¹², epsfcn = 10⁻¹⁵

Parameter   M₁            M₂                  M₃            M₄            M₅

β           4.61 × 10⁰    4.51 × 10⁰          4.62 × 10⁰    4.61 × 10⁰    4.61 × 10⁰
K           2.70 × 10⁶    4.69 × 10⁶          5.01 × 10⁶    4.98 × 10⁶    5.07 × 10⁶
γ           1.39 × 10⁻⁶   8.04 × 10⁻⁵         3.29 × 10⁻⁴   2.96 × 10⁻⁴   2.45 × 10⁻⁴
b_i         9.22 × 10⁻⁷   1.42 × 10⁰          1.14 × 10⁰    1.16 × 10⁰    1.22 × 10⁰
θ_Sat       —             0 (3.23 × 10⁻¹⁷⁶)   8.79 × 10⁻⁶   4.59 × 10⁻⁶   2.45 × 10⁻⁴
τ           —             —                   4.38 × 10⁻²   4.15 × 10⁻²   4.38 × 10⁻²
α_E         9.29 × 10⁻²   2.01 × 10⁻¹         1.02 × 10⁻¹   1.02 × 10⁻¹   1.03 × 10⁻¹⁴
T*          —             —                   —             1.09 × 10⁰    1.34 × 10²
r_m         —             —                   —             —             2.12 × 10⁻¹
α_m         —             —                   —             —             2.20 × 10⁻¹
Φ_OLS       6.54 × 10¹²   7.82 × 10¹¹         1.60 × 10¹²   1.60 × 10¹²   1.37 × 10¹²
μ̆_cAIC      465.1         440.8               461.5         475.5         529.2

Table 8
Best-fit parameter estimates for weighted least-squares and the corrected Akaike indicator when the higher-accuracy numerical solution is used. Calculations were based on Archi-N with E04UNF, and eps = 10⁻¹⁵, ftol = 10⁻¹², xtol = 10⁻¹², epsfcn = 10⁻¹⁵

Parameter   Model 1 (M₁)   Model 2 (M₂)       Model 3 (M₃)   Model 4 (M₄)   Model 5 (M₅)

β           5.14 × 10⁰     4.56 × 10⁰         4.57 × 10⁰     4.60 × 10⁰     4.60 × 10⁰
K           1.23 × 10⁵     4.42 × 10⁶         4.46 × 10⁶     3.27 × 10⁶     3.26 × 10⁶
γ           1.90 × 10⁻⁶    5.47 × 10⁻⁵        7.93 × 10⁻⁵    2.14 × 10⁻⁵    2.23 × 10⁻⁵
b_i         1.37 × 10⁻⁵    1.41 × 10⁰         1.41 × 10⁰     2.34 × 10⁰     2.41 × 10⁰
θ_Sat       —              0 (9.19 × 10⁻⁷⁷)   3.49 × 10⁻¹³   4.854 × 10⁴    5.39 × 10⁴
τ           —              —                  1.76 × 10⁻²    5.18 × 10⁻¹    5.04 × 10⁻¹
α_E         2.09 × 10⁻²    1.09 × 10⁻¹        1.09 × 10⁻¹    1.09 × 10⁻¹    2.52 × 10⁻¹⁰
T*          —              —                  —              1.663 × 10³    1.508 × 10³
r_m         —              —                  —              —              1.32 × 10⁻¹
α_m         —              —                  —              —              5.10 × 10⁻¹
Φ_LS        5.23 × 10⁰     5.36 × 10⁰         5.15 × 10⁰     4.18 × 10⁰     4.18 × 10⁰
μ̆_cAIC      47.3           55.2               64.6           75.5           131.5

The ranking of the models according to the Akaike criteria suggests that the least information loss should be the feature of Model 2 as compared to Models 1 or 3. The reader may compare the widths of the confidence intervals for the same parameter in differing models, displayed in Table 9. Other things being equal, a narrower interval is to be preferred. An unduly large confidence interval indicates that the parameter is unidentifiable for practical purposes.


Table 9
Estimates of 95% confidence intervals for the best-fit high-accuracy parameter estimates for Models 1–3 using ordinary least-squares

Parameter   M₁                           M₂                                M₃

β           4.61 × 10⁰                   4.51 × 10⁰                        4.62 × 10⁰
  95% CI    [4.00 × 10⁰, 5.23 × 10⁰]     [4.23 × 10⁰, 4.76 × 10⁰]          [4.43 × 10⁰, 4.85 × 10⁰]
K           2.70 × 10⁶                   4.69 × 10⁶                        5.01 × 10⁶
  95% CI    [2.28 × 10⁶, 3.00 × 10⁶]     [4.20 × 10⁶, 5.20 × 10⁶]          [4.62 × 10⁶, 5.45 × 10⁶]
γ           1.39 × 10⁻⁶                  8.04 × 10⁻⁵                       3.29 × 10⁻⁴
  95% CI    [1.17 × 10⁻⁶, 1.71 × 10⁻⁶]   [7.54 × 10⁻⁵, 8.58 × 10⁻⁵]        [3.20 × 10⁻⁴, 3.31 × 10⁻⁴]
b_i         9.22 × 10⁻⁷                  1.42 × 10⁰                        1.141 × 10⁰
  95% CI    [7.96 × 10⁻⁷, 1.01 × 10⁻⁶]   [1.40 × 10⁰, 1.43 × 10⁰]          [1.134 × 10⁰, 1.143 × 10⁰]
θ_Sat       —                            0 (3.23 × 10⁻¹⁷⁶)                 8.79 × 10⁻⁶
  95% CI    —                            [0, 1.2 × 10⁻¹⁶⁴]                 [5.25 × 10⁻⁶, θ_max], where θ_max ≈ 8 × 10⁻⁵
                                         If θ_Sat = 0, (5.3) results.
τ           —                            —                                 4.38 × 10⁻²
  95% CI    —                            —                                 [4.24 × 10⁻², 4.43 × 10⁻²]
α_E         9.29 × 10⁻²                  2.01 × 10⁻¹                       1.02 × 10⁻¹
  95% CI    [4.84 × 10⁻², 1.73 × 10⁻¹]   [1.19 × 10⁻¹, 2.14 × 10⁻¹]        [1.01 × 10⁻¹, 1.15 × 10⁻¹]
Φ_OLS       6.54 × 10¹²                  7.82 × 10¹¹                       1.6 × 10¹²
μ̆_cAIC      465.1                        440.8                             461.5

Estimates were computed using Archi-N (see Remarks 5.1 and 5.3), with E04UNF, eps = 10⁻¹⁵, ftol = 10⁻¹², xtol = 10⁻¹², epsfcn = 10⁻¹⁵ as in Table 7.

Remark 5.3. The evaluation of confidence intervals for parameter estimates is easily described, but it can be computationally difficult:

• The process is often computationally very expensive. This is particularly the case where the interval is large; asking for 67% confidence rather than 95% confidence reduces the length of the interval.
• Even where the computed maxima in (2.11) are known to vary smoothly as p varies in [p_min, p_max], we have found that the numerical approximations may display spurious non-smooth behaviour. This artefact of the numerical procedures appears to be associated with the properties of the optimization code. Where the derivatives of the objective function are estimated by differencing values of the objective function, one suspicion focusses on the fact that differencing can magnify errors (since the objective function is not evaluated exactly). This can be avoided by customizing the optimization procedure.
• We have found little difficulty when using the NAg minimization routine E04UNF [46] (apart from the demands on computer time), provided the positivity constraints are incorporated as indicated in Remark 4.1. The output parameter IFAIL in E04UNF allows monitoring of "soft failure" as well as "hard failure". The numbers in Table 9 were obtained with this optimization code, in Archi-N. We recalculated those estimates using LMDIF1 in Archi-L, and obtained the same optimum parameters but somewhat different confidence intervals.
• The optimization codes proceed iteratively, and the starting estimate may affect the outcome. Confidence intervals are produced by investigating a range of values p and optimizing over the remaining parameters. It is a reasonable strategy to take the optimum parameters for p as a starting estimate for the optimum values at p + Δp (a schematic sketch of this scan follows the remark). Values in Table 9 were obtained with this strategy. However, one can envisage situations in which the optimization code is not sufficiently robust to compute a global minimum. Further investigation into robust optimization codes, and their usage, appears warranted.
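The scan-with-warm-starts strategy in the final bullet above can be sketched as follows (schematic Python, with scipy.optimize.minimize standing in for E04UNF or LMDIF1; `objective` denotes the least-squares or negative log-likelihood objective, and all names are illustrative):

```python
import numpy as np
from scipy.optimize import minimize

def profile_point(objective, p_start, idx, p_fixed):
    """Minimize the objective over all parameters except the idx-th,
    which is held fixed at p_fixed; p_start supplies the warm start."""
    free0 = np.delete(np.asarray(p_start, dtype=float), idx)
    restricted = lambda free: objective(np.insert(free, idx, p_fixed))
    res = minimize(restricted, free0, method="Nelder-Mead")
    return res.fun, np.insert(res.x, idx, p_fixed)

def profile_scan(objective, p_opt, idx, grid):
    """Walk a grid of fixed values for parameter idx, warm-starting
    each restricted optimization from the preceding optimum (the
    strategy of the final bullet above); the resulting profile values
    are then thresholded against a chi-square quantile to read off a
    confidence interval."""
    profile, current = [], np.asarray(p_opt, dtype=float)
    for p_fixed in grid:
        fval, current = profile_point(objective, current, idx, p_fixed)
        profile.append(fval)
    return np.array(profile)
```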


The Akaike indices suggest that the best model is Model 2 (M2). One might therefore anticipate that M2 would have the smallest confidence intervals for the best-fit parameter estimates. In reality, it appears that M3 is characterized by smaller confidence intervals. This may be a consequence of a number of factors: (i) the Akaike indices provide approximate indicators, and (ii) one needs to check whether the differences in the estimated Akaike indicators are statistically significant, given the errors in the data. As stated above, the values of the Akaike indicators suggest that, for the given data set, the model which ensures the least information loss is Model 2:

d/dt V(t) = β V(t) (1 − V(t)/K) − γ_VE V(t) E(t),    (5.1)

d/dt E(t) = b2 [V(t)/(θ_sat + V(t))] E(t) − α_E E(t).    (5.2)
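A direct simulation of the selected model (5.1)–(5.2) is straightforward; the sketch below uses the best-fit ordinary least-squares estimates of Table 7, with illustrative initial values (the experimental initial data are not restated here) and a small positive θ_sat standing in for the boundary estimate θ_sat = 0:

```python
from scipy.integrate import solve_ivp

# Best-fit OLS estimates for Model 2 (Table 7); time is in days.
beta, K = 4.51, 4.69e6
gamma_VE, b2 = 8.04e-5, 1.42
theta_sat, alpha_E = 1.0e-6, 2.01e-1  # small theta_sat replaces the boundary estimate 0

def model2(t, y):
    # Eqs. (5.1)-(5.2): logistic virus growth with CTL-mediated elimination,
    # and saturating CTL proliferation with linear CTL death.
    V, E = y
    dV = beta * V * (1.0 - V / K) - gamma_VE * V * E
    dE = b2 * V / (theta_sat + V) * E - alpha_E * E
    return [dV, dE]

# Illustrative initial state; tight tolerances matter for parameter fitting.
sol = solve_ivp(model2, (0.0, 20.0), [1.0, 1.0], rtol=1e-8, atol=1e-10)
```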

For this model, the best-fit estimate for the viral load necessary for the half-maximal CTL division rate (θ_sat) is close to zero. If θ_sat = 0 and V(t) ≠ 0, (5.2) becomes

d/dt E(t) = b2 E(t) − α_E E(t).    (5.3)

As long as a virus population is present in the host, the second equation of Model 2 can therefore be replaced by the simplified linear form

d/dt E(t) = b2 H_{V_sat}(V(t)) E(t) − α_E E(t),    (5.4)

where the proliferation term on the right-hand side (the modification suggested by the best-fit estimate) uses the Heaviside-type function

H_{V_sat}(V) = 0 for V < V_sat,    H_{V_sat}(V) = 1 for V ≥ V_sat;

as a characteristic value for V_sat one can take V_sat ≈ 1. It is remarkable that the best-approximating model for the typical data set for the LCMV-CTL population dynamics in primary infection appears to be the one which was introduced elsewhere in an ad hoc manner (see [33]). Biologically, the form of the proliferation term implies that the CTL response to a low-dose LCMV infection is regulated by the virus load in an on/off fashion: full activation above the threshold, and no activation at all below it.
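The on/off variant (5.4) differs from Model 2 only in the proliferation term; a minimal sketch, reusing the constants of the previous fragment and taking V_sat = 1 as suggested above:

```python
def model2_onoff(t, y, V_sat=1.0):
    # Eq. (5.4): proliferation switched on/off by the Heaviside-type term.
    V, E = y
    H = 1.0 if V >= V_sat else 0.0
    dV = beta * V * (1.0 - V / K) - gamma_VE * V * E
    dE = b2 * H * E - alpha_E * E
    return [dV, dE]
```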

6. Conclusions

The practical identifiability of parameters in mathematical models of immune responses depends on the choice of model, on the choice of the data assimilation strategy (the choice of objective function), and


also on the computational implementation. The size of an objective function is an insufficient criterion for accepting a model as "best" in an information-theoretic context. The need to formulate plausible mathematical models indicates a need for an understanding of the possible scientific theories, and for a thorough experimental analysis of the statistical nature of the variation in experimentally measured quantities. Small data sets do not support complex models: once the number of parameters rises above some threshold, the variance in the parameter estimates increases. For the models presented in the above study, this "loss of confidence" is manifested (indirectly) by an increase in the value of the corrected Akaike indicator, suggesting that a loss of information takes place.

We believe that dealing with a set of plausible mathematical models, ranked according to information-theoretic criteria, provides an acceptable basis for model selection and multi-model inference in immunology. However, it must be accepted that in real life the implementation of such a strategy is not without practical difficulties, and there remain questions to be explored. Locating the minimum of an objective function in a higher-dimensional space depends upon the accuracy to which the objective function is computed, and upon the inherent sensitivity of the objective function to variation in the parameters. Accuracy depends upon the versatility and robustness of codes for solving ODEs and DDEs, and it is unfortunate if the minimization routine requires the solution of differential equations with unrealistic parameters. Sensitivity analysis has an important rôle in the modelling process: it provides insight into parameter-observation interdependence, sensitivity to model components, and optimal data-sampling strategy, to name just a few aspects; a small finite-difference sketch is given below. For a general discussion of sensitivity analysis we refer to [50].

Parameter estimation in applied mathematical modelling currently receives considerable attention from numerical analysts, with a special focus on accuracy-matching strategies, robustness of the codes in the presence of multiple local minima, etc. (see, e.g., [39,42,45]). Our experience with the assimilation of real data into mathematical models (as opposed to cases in which simulated noisy data are used) confirms that parameter identification in mathematical immunology is potentially an ill-posed problem; various regularization strategies therefore need to be examined [11]. We mention the earlier work on this important topic, in the context of flow-cytometry data analysis, by Bertuzzi et al. [24].

Our overall view is that modelling in immunology is an interdisciplinary activity that ideally requires expertise in numerical computation as well as in mathematical modelling and frameworks for model discrimination, in addition to scientific understanding and experimental and observational techniques resulting in reliable data. It is an area in which much remains to be achieved.
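As a small illustration of the finite-difference sensitivity computations alluded to above, the following schematic approximates the derivative of the model output with respect to one parameter (here `solve` is assumed to map a parameter vector to the vector of model outputs at the observation times; the step size must be balanced against the accuracy to which the differential equations are solved):

```python
import numpy as np

def sensitivity(solve, p, i, h=1e-6):
    """Central-difference sensitivity of the model outputs with respect
    to parameter i; the step is scaled by the parameter magnitude and
    must respect the accuracy of the ODE/DDE solver."""
    dp = np.zeros_like(np.asarray(p, dtype=float))
    dp[i] = h * max(1.0, abs(p[i]))
    return (solve(p + dp) - solve(p - dp)) / (2.0 * dp[i])
```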

Acknowledgements

We thank the referee and handling editor for their interest, and the Leverhulme Trust, the Russian Foundation for Basic Research (Grant 03-01-00689), and the Alexander von Humboldt Foundation for financial support received.

References

[1] H. Akaike, A new look at the statistical model identification, IEEE Trans. Automat. Control 19 (1974) 716–723.
[2] J.D. Altman, P.A.H. Moss, P.J.R. Goulder, D.H. Barouch, M.G. McHeyzer-Williams, J.I. Bell, A.J. McMichael, M.M. Davis, Phenotypic analysis of antigen-specific T lymphocytes, Science 274 (1996) 94–96.


[3] P. Armitage, G. Berry, J.N.S. Matthews, Statistical Methods in Medical Research, fourth ed., Blackwell Science, Oxford, 2001.
[4] S. Audoly, G. Bellu, L. D'Angio, M. Saccomani, C. Cobelli, Global identifiability of nonlinear biological systems, IEEE Trans. Biomed. Eng. 48 (2001) 55–65.
[5] C.T.H. Baker, Retarded differential equations, J. Comput. Appl. Math. 125 (2000) 309–335.
[6] C.T.H. Baker, G.A. Bocharov, C.A.H. Paul, Mathematical modelling of the Interleukin-2 T-cell system: a comparative study of approaches based on ordinary and delay differential equations, J. Theoret. Med. 2 (1997) 117–128.
[7] C.T.H. Baker, G.A. Bocharov, C.A.H. Paul, F.A. Rihan, Modelling and analysis of time-lags in some basic patterns of cell proliferation, J. Math. Biol. 37 (1998) 341–371.
[8] C.T.H. Baker, G.A. Bocharov, C.A.H. Paul, F.A. Rihan, Computational modelling with functional differential equations: identification, selection, and sensitivity, Appl. Numer. Math., in press. Available online 19 October 2004.
[9] C.T.H. Baker, G.A. Bocharov, F.A. Rihan, A report on the use of delay differential equations in numerical modelling in the biosciences, MCCM Technical Report 343, University of Manchester, 1999, 45pp, ISSN 1360-1725.
[10] C.T.H. Baker, G.A. Bocharov, F.A. Rihan, A report on the models with delays for cell population dynamics: identification, selection and analysis, Part I, MCCM Technical Report 425, University of Manchester, 2003, 28pp, ISSN 1360-1725.
[11] C.T.H. Baker, E.I. Parmuzin, Identification of the initial function for delay differential equations: Parts I, II, III, MCCM Technical Reports 431, 443 & 444, University of Manchester, 2004, ISSN 1360-1725.
[12] C.T.H. Baker, E.I. Parmuzin, Analysis via integral equations of an identification problem for delay differential equations, J. Integral Equations Appl. 16 (2004) 111–135.
[13] C.T.H. Baker, E.I. Parmuzin, Identification of the initial function for nonlinear delay differential equations, Russian J. Numer. Anal. Math. Modelling 20 (2005), to appear.
[14] C.T.H. Baker, C.A.H. Paul, Pitfalls in parameter estimation for delay differential equations, SIAM J. Sci. Comput. 18 (1997) 305–314.
[15] C.T.H. Baker, C.A.H. Paul, Discontinuous solutions of neutral delay differential equations, Appl. Numer. Math., to appear.
[16] H.T. Banks, Delay systems in biological models: approximation techniques, in: V. Lakshmikantham (Ed.), Nonlinear Systems and Applications (Proceedings of the International Conference, University of Texas, Arlington, TX, 1976), Academic Press, New York, 1977, pp. 21–38.
[17] H.T. Banks, Approximation of delay systems with applications to control and identification, in: H.-O. Peitgen, H.-O. Walther (Eds.), Functional Differential Equations and Approximation of Fixed Points (Proceedings of the Summer School and Conference, University of Bonn, Bonn, 1978), Lecture Notes in Mathematics, vol. 730, Springer, Berlin, 1979, pp. 65–76.
[18] H.T. Banks, K.L. Bihari, Modelling and estimating uncertainty in parameter estimation, Inverse Problems 17 (2001) 95–111.
[19] H.T. Banks, J.A. Burns, E.M. Cliff, Parameter estimation and identification for systems with delay, SIAM J. Control Optim. 19 (6) (1981) 791–828.
[20] H.T. Banks, P.K.D. Lamm, Estimation of delays and other parameters in nonlinear functional differential equations, SIAM J. Control Optim. 21 (1983) 895–915.
[21] Y. Bard, Nonlinear Parameter Estimation, Academic Press, New York, 1974.
[22] M. Battegay, S. Cooper, A. Althage, H. Banziger, H. Hengartner, R.M. Zinkernagel, Quantification of lymphocytic choriomeningitis virus with an immunological focus assay in 24- or 96-well plates, J. Virol. Methods 33 (1991) 191–198.
[23] A. Bellen, M. Zennaro, Numerical methods for delay differential equations, Oxford University Press, New York, 2003.
[24] A. Bertuzzi, A. Gandolfi, R. Vitelli, A regularization procedure for estimating cell kinetic parameters from flow-cytometry data, Math. Biosci. 82 (1986) 63–85.
[25] G.A. Bocharov, Modelling the dynamics of LCMV infection in mice: conventional and exhaustive CTL responses, J. Theoret. Biol. 192 (1998) 283–308.
[26] G.A. Bocharov, B. Ludewig, A. Bertoletti, P. Klenerman, T. Junt, P. Krebs, T. Luzyanina, C. Fraser, R.M. Anderson, Underwhelming the immune response: effect of slow virus growth on CD8+ T-lymphocyte responses, J. Virol. 78 (2004) 2247–2254.
[27] G.A. Bocharov, F.A. Rihan, Numerical modelling in biosciences using delay differential equations, J. Comput. Appl. Math. 125 (2000) 183–199.


[28] J.A. Borghans, L.S. Taams, M.H.M. Wauben, R.J. De Boer, Competition for antigenic sites during T cell proliferation: a mathematical interpretation of in vitro data, Proc. Natl. Acad. Sci. USA 96 (1999) 10782–10787.
[29] F.M. Burnet, The Clonal Selection Theory of Acquired Immunity, Cambridge University Press, Cambridge, 1959.
[30] K.P. Burnham, D.R. Anderson, Model Selection and Inference: a Practical Information-Theoretic Approach, Springer, New York, 1998.
[31] A.K. Chakraborty, M.L. Dustin, A.S. Shaw, In silico models for cellular and molecular immunology: successes, promises and challenges, Nat. Immunol. 4 (2003) 933–936.
[32] A.R. Conn, N. Gould, P.L. Toint, LANCELOT: a Fortran Package for Large-scale Nonlinear Optimization, Springer, New York, 1992.
[33] R.J. De Boer, M. Oprea, R. Antia, K. Murali-Krishna, R. Ahmed, A.S. Perelson, Recruitment times, proliferation, and apoptosis rates during the CD8+ T-cell response to lymphocytic choriomeningitis virus, J. Virol. 75 (2001) 10663–10669.
[34] B. Efron, R. Tibshirani, Bootstrap methods for standard errors, confidence intervals, and other measures of statistical accuracy, Statist. Sci. 1 (1986) 54–77.
[35] S. Ehl, P. Klenerman, R.M. Zinkernagel, G. Bocharov, The impact of variation in the number of CD8+ T-cell precursors on the outcome of virus infection, Cell. Immunol. 189 (1998) 67–73.
[36] N.A. Gershenfeld, The Nature of Mathematical Modelling, Cambridge University Press, Cambridge, 2000.
[37] P.D. Gingerich, Arithmetic or geometric normality of biological variation: an empirical test of theory, J. Theoret. Biol. 204 (2000) 201–221.
[38] E. Hairer, S.P. Nørsett, G. Wanner, Solving Ordinary Differential Equations I, Nonstiff Problems, second ed., Springer, Berlin, 1993.
[39] W. Horbelt, J. Timmer, H.U. Voss, Parameter estimation in nonlinear delayed feedback systems from noisy data, Phys. Lett. A 299 (2002) 513–521.
[40] C. Kesmir, R. De Boer, Clonal exhaustion as a result of immune deviation, Bull. Math. Biol. 65 (2003) 359–374.
[41] S. Kullback, R.A. Leibler, On information and sufficiency, Ann. Math. Statist. 22 (1951) 79–86.
[42] G.I. Marchuk, Mathematical Modelling of Immune Response in Infectious Diseases, Kluwer Academic Publishers, Dordrecht, 1997.
[43] A.R. McLean, M.M. Rosado, F. Agenes, R. Vasconcellos, A.A. Freitas, Resource competition as a mechanism for B cell homeostasis, Proc. Natl. Acad. Sci. USA 94 (1997) 5792–5797.
[44] I.J. Myung, Tutorial on maximum likelihood estimation, J. Math. Psychol. 47 (2003) 90–100.
[45] U. Nowak, A. Grah, M. Schreier, Parameter estimation and accuracy matching strategies for 2-D reactor models, ZIB-Report 03-52, 2003, 13pp.
[46] Numerical Algorithms Group, The NAg FORTRAN Library, http://www.nag.co.uk/numeric/Fortran_Libraries.asp.
[47] M.A. Pascual, P. Kareiva, Predicting the outcome of competition using experimental data: maximum likelihood and Bayesian approaches, Ecology 77 (1996) 337–349.
[48] C.A.H. Paul, A User Guide to Archi, MCCM Report 283, University of Manchester, http://www.maths.man.ac.uk/~chris/reports/rep283.pdf.
[50] H. Rabitz, Chemical sensitivity analysis theory with applications to molecular dynamics and kinetics, Comput. Chem. 5 (1981) 167–180.
[51] E. Renshaw, Modelling Biological Populations in Space and Time, Cambridge University Press, Cambridge, 1993.
[52] G. Schwarz, Estimating the dimension of a model, Ann. Statist. 6 (1978) 461–464.
[53] D.J. Venzon, S.H. Moolgavkar, A method for computing profile-likelihood-based confidence intervals, Appl. Statist. 37 (1988) 87–94.
[54] D. Verotta, F. Schaedeli, Non-linear dynamics models characterizing long-term virological data from AIDS clinical trials, Math. Biosci. 176 (2002) 163–183.
[55] D.R. Willé, C.T.H. Baker, The tracking of derivative discontinuities in systems of delay-differential equations, Appl. Numer. Math. 9 (1992) 209–222.
[56] D. Wodarz, P. Klenerman, M.A. Nowak, Dynamics of cytotoxic T-lymphocyte exhaustion, Proc. R. Soc. (London) Ser. B 265 (1998) 191–203.
[57] D.H. Wolpert, The bootstrap is inconsistent with probability theory, in: K. Hanson, R. Silver (Eds.), Maximum Entropy and Bayesian Methods, 1995.


[58] L.M.M. Wolters, B.E. Hansen, H.G.M. Niesters, R.S.L. Levi-Drummer, A.U. Neumann, S.W. Schalm, R.A. de Man, The influence of baseline characteristics on viral dynamic parameters in chronic hepatitis B patients treated with lamivudine, J. Hepatol. 37 (2002) 253–258.
[59] R.M. Zinkernagel, Lymphocytic choriomeningitis virus and immunology, Curr. Top. Microbiol. Immunol. 263 (2002) 1–5.

Further reading

[49] C.A.H. Paul, Archi FORTRAN listing, University of Manchester, http://www.maths.man.ac.uk/~chris/software/.