Forcing Function Diagnostics for Nonlinear Dynamics

DOI: 10.1111/j.1541-0420.2008.01172.x

Biometrics 65, 928–936 September 2009

Forcing Function Diagnostics for Nonlinear Dynamics

Giles Hooker
Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York 14850, U.S.A.
email: [email protected]

Summary. This article investigates the problem of model diagnostics for systems described by nonlinear ordinary differential equations (ODEs). I propose modeling lack of fit as a time-varying correction to the right-hand side of a proposed differential equation. This correction can be described as being a set of additive forcing functions, estimated from data. Representing lack of fit in this manner allows us to graphically investigate model inadequacies and to suggest model improvements. I derive lack-of-fit tests based on estimated forcing functions. Model building in partially observed systems of ODEs is particularly difficult, and I consider the problem of identification of forcing functions in these systems. The methods are illustrated with examples from computational neuroscience.

Key words: Diagnostics; Goodness of fit; Neural dynamics; Nonlinear dynamics.

1. Introduction
Recent research has seen a significant increase in interest in fitting nonlinear differential equations to data. Many systems of differential equations used to describe real-world phenomena are developed from first principles by a combination of conservation laws and rough guesses. Such a priori modeling has led to proposed models that mimic the qualitative behavior of observed systems but have poor quantitative agreement with empirical measurements. There has been relatively little literature on the development of interpretable differential equation models from an empirical point of view. The need for good diagnostics has been cited a number of times, for example, in Ramsay et al. (2007) and associated discussions.

This article considers the problem of performing goodness-of-fit diagnostics and model improvement when a set of deterministic nonlinear differential equations has been proposed to explain observed data. I describe the state of a system by a vector of k time-varying quantities x(t) = {x1(t), ..., xk(t)}. An ordinary differential equation (ODE) describes the evolution of x(t) by relating its rate of change to the current state of the system:

dx/dt = f(x; t, θ).    (1)

Here f : R^k → R^k is a known vector-valued function that depends on a finite set of unknown parameters θ. The inclusion of t as an argument to f in equation (1) allows this dependence to vary with time. I assume that the nature of this variation is deterministic and specified up to elements of θ. This variation is included to allow known external inputs to affect x, for example. Such systems have been used in many areas of applied mathematics and science. It has been shown that even quite simple systems can produce highly complex behavior. They also represent a natural means of developing systems from first principles or conservation laws and frequently represent the appropriate scale on which to describe how a system responds to an external input. I describe such systems as being modeled on the derivative scale, as opposed to on the scale of the observations.

The central observation in this article is that because differential equations are modeled on the derivative scale, lack of fit for these equations is most interpretably measured on the same scale. I illustrate this with a thought experiment. Suppose we have continuously observed a system y(t) without error and have postulated a model of the form (1) to describe it. A directly interpretable measure of lack of fit is g(t) = dy/dt − f(y; t, θ). Throughout this article, I will maintain the convention that y indicates observations of a system, while x represents solutions to a differential equation used to describe it. In practice, we do not have continuous, noiseless observations of a system, but instead need to estimate y from a discrete set of data. Rearranging the equation above,

dy/dt = f(y; t, θ) + g(t).    (2)
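To make this rearrangement concrete, the following sketch applies it to an invented scalar example (the model dy/dt = −y and the constant disturbance g = 0.5 are illustrative, not taken from any data analyzed here): differencing a noiseless trajectory and subtracting the proposed right-hand side recovers the forcing.

```python
import numpy as np

# Proposed model: dy/dt = f(y) = -y.  True dynamics: dy/dt = -y + g, g = 0.5.
# Analytic solution of the true system: y(t) = 0.5 + (y0 - 0.5) * exp(-t).
y0, g_true = 2.0, 0.5
t = np.linspace(0.0, 10.0, 401)
y = g_true + (y0 - g_true) * np.exp(-t)

# Estimate dy/dt by central differences, then read off the empirical
# forcing function on the derivative scale: g(t) = dy/dt - f(y).
dydt = np.gradient(y, t)
g_hat = dydt - (-y)

# Away from the endpoints the constant disturbance is recovered.
print(g_hat[50], g_hat[200])
```

With real data, y must first be estimated from noisy discrete observations, which is where the basis-expansion machinery of Section 2 comes in.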

g(t), representing lack of fit, appears as a collection of what may be termed empirical forcing functions. The term empirical is applied here to distinguish them from known inputs already accounted for in the specification of f. These act as an external influence on y(t), which responds in terms of dy(t)/dt. The basic task of diagnostics now becomes the estimation of g(t). This may be complicated by knowledge (or lack of knowledge) about unmeasured components of y(t) and where fit may be lacking. Once g(t) has been estimated, the usual diagnostic procedures may be undertaken: the statistical significance of g(t) may be assessed, and it may then be examined graphically for relationships with various measured variables. Any apparent relationships may then be accounted for by modifications to the model structure and the modified system reestimated.

We note that equation (2) represents a stochastic differential equation if g(t) is regarded as being random. Stochastic processes represent a violation of the model specification (1), and make up part of the alternative hypothesis for the goodness-of-fit tests I describe below. However, distinguishing between g(t) being the realization of a stochastic process and being purely due to a deterministic misspecification of f is beyond the scope of this article.

© 2009, The International Biometric Society

1.1 Neural Spike Data
To demonstrate the ideas from this article in practice, I examine the results of a common experiment in neurobiology. Figure 1 gives readings taken from a voltage clamp experiment in which transmembrane potential is recorded on the axon of a single zebrafish motor neuron in response to an electric stimulus. When a stimulus exceeds a threshold value, neurons respond by initiating the sharp oscillations that can be seen in Figure 1. There is a long literature on models that aim to describe neural behavior; see Wilson (1999) or Clay (2005) for an overview. These models range from very simple FitzHugh–Nagumo type systems (FitzHugh, 1961; Nagumo, Arimoto, and Yoshizawa, 1962) to highly complex models originating with Hodgkin and Huxley (1952) involving many ion channels that appear as dynamical components. The task I pose is to develop a tractable modification of simple models that provides a reasonable agreement with the shape of the spikes found in the data.

As a first modeling step, I use a polynomial whose terms are motivated from equation (9.7) in Wilson (1999). There, the model was developed with the aim of mimicking the turning points of a physiological system. Our equations take the form:

dV/dt = p1 + p2 V + p3 V^2 + p4 V^3 + p5 R + p6 V R,    (3)

dR/dt = p7 + p8 V + p9 R,    (4)

for unknown parameters p1, ..., p9. I observe that the form of equation (3) is invariant under linear transformations of V and R. I have used this property to derive initial guesses for the parameters by taking the parameters given in Wilson (1999) and transforming them to approximately match the peaks and troughs found in the data. Figure 1 presents a plot of a short sequence of these data along with the best fit from these equations. It is clear that while there is qualitatively good agreement between the model and the data, some systematic lack of fit is evident. This article develops tests to determine the significance of lack of fit and visual tools for model improvement.

Figure 1. Zebrafish data. Dots provide observed values of transmembrane potential, lines the results of a least-squares fit of equations (3) and (4) to these data.
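The system (3)–(4) is easy to explore numerically. The sketch below integrates it with a fixed-step fourth-order Runge–Kutta scheme; the parameter values are illustrative FitzHugh–Nagumo-style settings (with the remaining p's set to zero), not the values fitted to the zebrafish data.

```python
import numpy as np

# Equations (3)-(4) with FitzHugh-Nagumo-style parameter values; all other
# p's are zero.  These values are illustrative only.
p2, p4, p5 = 3.0, -1.0, 3.0
p7, p8, p9 = 1.0 / 15, -1.0 / 3, -1.0 / 15

def rhs(state):
    V, R = state
    dV = p2 * V + p4 * V**3 + p5 * R      # nonzero terms of equation (3)
    dR = p7 + p8 * V + p9 * R             # equation (4)
    return np.array([dV, dR])

def rk4(x0, dt, n_steps):
    """Fixed-step fourth-order Runge-Kutta integration."""
    out = np.empty((n_steps + 1, 2))
    out[0] = x0
    for i in range(n_steps):
        x = out[i]
        k1 = rhs(x)
        k2 = rhs(x + 0.5 * dt * k1)
        k3 = rhs(x + 0.5 * dt * k2)
        k4 = rhs(x + dt * k3)
        out[i + 1] = x + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
    return out

traj = rk4(np.array([-1.0, 1.0]), 0.01, 2000)   # t in [0, 20]
V = traj[:, 0]
print(V.min(), V.max())   # sustained oscillation over a bounded range
```

With these settings the resting state is unstable and the cubic term bounds the trajectory, so V settles onto a limit cycle of spike-like oscillations.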

1.2 Outline
This article begins with a discussion of estimating lack of fit, illustrated with examples, in Section 2. I develop computationally inexpensive approximate goodness-of-fit tests to be used as initial diagnostics in Section 3. Some identifiability problems and possible solutions to them are discussed in Section 4. I provide an example in the form of model development for the spiking behavior of zebrafish neurons in Section 5. I suggest further directions for research in Section 6.

2. Estimating Lack of Fit
The central technique in the article is the estimation of a set of forcing functions g(t) as in equation (2). For real-world data, y(t) and its derivatives are not known exactly; rather, we have access to a set of noisy observations y^obs = (y^obs_ij), i = 1, ..., k, j = 1, ..., n_i, where y^obs_ij is a measurement of the ith component of the system at time t_ij. The vector t_i will be taken to be the collection of times at which the ith component of y is measured. We take y(t) = {y_i(t_ij)} to be the corresponding estimated values of the system, and ||y^obs − y(t)|| to be a (possibly weighted) norm between them. Where appropriate, y^obs and y(t) will be regarded as vectors formed by concatenating the observations for each component of y(t). We note that it is common that some of the components of y have no observations.

g(t) is represented as a basis function expansion:

g(t)^T = Φ(t) D,    (5)

where Φ(t) is a row vector containing the evaluation of a set of m basis functions {φ1(t), ..., φm(t)} at time t. In the examples below, the φi are given by cubic B-splines with equispaced knots. D is then an m × k matrix providing the coefficients of each of the m basis functions for each of the k forcing components. It will frequently be convenient to use d = vec(D) to denote the vectorized version of D formed by concatenating its columns. d may be estimated via any parameter estimation scheme for differential equations. As is the case for other methods based on basis expansions, the variance resulting from estimating a large number of coefficients d may be regulated either by limiting the number of basis functions or by directly penalizing the smoothness of g(t). The goodness-of-fit tests developed below correspond to the latter approach. However, in many systems, the numerical challenges in estimating g(t) already limit the number of basis functions that can be used.
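As a sketch of the representation in equation (5), the helper below (hat_basis is an invented name) evaluates a linear B-spline (hat-function) basis Φ(t) on equispaced knots — the examples in this article use cubic B-splines, and Section 5 uses linear ones; the linear case keeps the construction short — and forms g(t) = Φ(t)D for a two-component system.

```python
import numpy as np

def hat_basis(t, knots):
    """Evaluate a linear B-spline (hat function) basis at times t.

    Returns Phi with Phi[n, j] = phi_j(t[n]); one basis function per knot,
    so Phi @ coef reproduces any piecewise-linear function on the knots.
    """
    t = np.asarray(t)[:, None]
    k = np.asarray(knots)[None, :]
    # Left and right support widths of each hat; guard the end knots.
    left = np.diff(knots, prepend=knots[0] - 1.0)
    right = np.diff(knots, append=knots[-1] + 1.0)
    up = (t - (k - left)) / left       # rising edge of each hat
    down = ((k + right) - t) / right   # falling edge of each hat
    return np.clip(np.minimum(up, down), 0.0, 1.0)

knots = np.linspace(0.0, 20.0, 21)
tt = np.linspace(0.0, 20.0, 401)
Phi = hat_basis(tt, knots)

# Coefficient matrix D (m x k): here k = 2 forcing components.
D = np.column_stack([np.sin(knots / 3.0), knots / 10.0])
g = Phi @ D                # g(t)^T = Phi(t) D, as in equation (5)
print(Phi.sum(axis=1)[:3]) # hat bases form a partition of unity
```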


2.1 Estimating Parameters in a Differential Equation
Parameter estimation in differential equation models is a difficult topic with a large associated literature. Solutions to equation (1) typically do not have analytic expressions, necessitating the use of sometimes computationally intensive numerical approximation methods. Further, such solutions are given only up to a set of initial conditions x0 = x(t0). When x0 is unknown, the set of parameters must be augmented to include it. This article largely aims to be generic in its approach, and the diagnostic prescriptions here may be tied to any parameter estimation scheme. The initial tests for goodness of fit described in this article make use of a straightforward implementation of nonlinear least squares, described below.

In this implementation, solutions x(t; θ, x0) to equation (1) are approximated for any parameter vector θ and initial conditions x0. A Gauss–Newton method is then used to estimate both. To make use of a Gauss–Newton solver, gradients are calculated by solving an enlarged system that includes the sensitivity equations (see Atherton, Schainker, and Ducot, 1975; Leis and Kramer, 1988):

d/dt [ x ; vec(dx/dθ) ; vec(dx/dx0) ] = [ f(x; t, θ) ; vec( (∂f/∂x)(dx/dθ) + ∂f/∂θ ) ; vec( (∂f/∂x)(dx/dx0) ) ].    (6)

Each row of equation (6) may be derived by a straightforward chain rule, and the following initial conditions may also be given:

x(0) = x0,    dx/dθ |_{t=0} = 0,    dx_i/dx_{0,j} |_{t=0} = 1 if i = j and 0 otherwise.    (7)

Both x(t; θ, x0) and its derivatives with respect to θ and x0 may be obtained by solving the expanded system (6)–(7). There are many methods for approximating solutions to ODEs, and their accuracy can depend on the nature of the system to be estimated; see Deuflhard and Bornemann (2000). The simulations below have been undertaken using a Runge–Kutta method implemented in Matlab. However, these were insufficiently accurate for the real-data example in Section 5, where details are given for the specific methods that were employed.

When the θ are replaced by d in equation (5), the second line of equation (6) becomes:

d/dt (dy/dd) = (∂f/∂x)(dy/dd) + Φ̃(t),    (8)

where Φ̃(t) is the block diagonal matrix with Φ(t) on the diagonal blocks.

For the purposes of visualization, it is frequently useful to smooth the estimate of g(t). When a standard roughness penalty is placed on g, the penalized nonlinear least squares objective becomes

||y^obs(t) − y(t)||^2 + λ d^T P d,    (9)

where P is a penalty matrix. This can be achieved simply by augmenting the set of squared errors in the Gauss–Newton solver.

This estimation scheme ignores the many numerical problems associated with parameter estimation in differential equations. These include controlling for numerical error in the Runge–Kutta scheme and the problem of minimizing over rough response surfaces. The author's experience is that the scheme above provides good estimates when initial parameter estimates are quite close to the true value. However, it frequently breaks down when started from parameter estimates that are moderately far from the truth and either finds a local optimum or a set of parameters for which the ODE system is not numerically solvable. There are several methods available that may be used to overcome these problems. Stochastic search techniques such as simulated annealing (Jaeger et al., 2004) or Markov chain Monte Carlo methods (Huang, Liu, and Wu, 2006) have been employed to try to find global minima in the response surface. Other methods include the estimation of parameters while attempting to solve equation (1) at the same time; see, for example, Tjoa and Biegler (1991). This last approach has been used in Section 5. Alternative methods rely on presmoothing steps. Varah (1982) proposes estimating a smooth ŷ(t) from the data and choosing θ to minimize:

Σ_{i=1}^{k} ∫ { dŷ_i(t)/dt − f_i(ŷ(t); t, θ) }^2 dt.    (10)

This method requires all components of y(t) to be measured with high resolution and precision and may still suffer from bias due to the smoothing method used. More recent techniques include intermediate methods such as Ramsay et al. (2007), which replace the penalty in equation (9) with equation (10), and Ionides, Bretó, and King (2006), based on particle filtering techniques.

2.2 Example: A Forced Linear System
As an initial experiment, I use a two-component linear differential equation forced by a step disturbance; the system is given below. I assume the equation is known exactly and attempt to reproduce g nonparametrically from noisy data. The system was run with initial conditions (x1, x2) = (−1, 1), observations were taken every 0.05 time units on the interval [0, 20], and Gaussian noise with variance 0.25 was added to the observed system. Figure 2 provides the result of estimating g(t) using the methods described above. To regularize the resulting smooth, I used a second derivative roughness penalty, λ ∫ g″(t)^2 dt, as in equation (9). λ was chosen by minimizing the squared discrepancy between the estimated differential equations at the observation times and the true (noiseless) values of the system. Explicitly, the system is:

d/dt (x1, x2)^T = [−4 8; −4 4] (x1, x2)^T + (g(t), 0)^T,

in which g(t) takes the form of a step:

g(t) = 5 if t < 5, and 0 otherwise.
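The setup of this example can be reproduced in a few lines. The sketch below integrates the forced system with RK4 and reads the forcing off the derivative scale by finite differences on the noiseless trajectory; this shortcut stands in for the penalized Gauss–Newton estimate used for Figure 2 and ignores observation noise.

```python
import numpy as np

A = np.array([[-4.0, 8.0], [-4.0, 4.0]])
g = lambda t: 5.0 * (t < 5.0)          # step forcing applied to x1

def rhs(t, x):
    return A @ x + np.array([g(t), 0.0])

# RK4 on [0, 20] with steps matching the 0.05 observation spacing.
dt, n = 0.05, 400
t = np.linspace(0.0, 20.0, n + 1)
x = np.empty((n + 1, 2))
x[0] = [-1.0, 1.0]
for i in range(n):
    k1 = rhs(t[i], x[i])
    k2 = rhs(t[i] + dt / 2, x[i] + dt / 2 * k1)
    k3 = rhs(t[i] + dt / 2, x[i] + dt / 2 * k2)
    k4 = rhs(t[i] + dt, x[i] + dt * k3)
    x[i + 1] = x[i] + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

# Empirical forcing on the derivative scale: g_hat = dx/dt - A x, read off
# the first component.  (With noisy data this difference must be smoothed or
# penalized as in equation (9); here the trajectory is noiseless.)
dxdt = np.gradient(x, t, axis=0)
g_hat = (dxdt - x @ A.T)[:, 0]
print(g_hat[40], g_hat[300])   # near 5 before t = 5, near 0 after
```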

Figure 2. Experiments with a forced linear system. The true forcing component (solid) with the estimated component (dashed) and a forcing component derived from a generic smooth of the data (dotted).

Setting y_λ(t) to be the reconstruction of the system, I choose

λ_opt = argmin_λ Σ_{i=1}^{2} ∫ {y_{λ,i}(t) − x_i(t)}^2 dt.    (11)

These results have been compared to the simple expedient of generating a smooth ŷ(t) of the data and plotting

dŷ(t)/dt − f(ŷ(t); t, θ).

In this case, the smooth was estimated by a spline with a third derivative penalty, following the recommendations of Ramsay and Silverman (2005), and the smoothing parameter chosen to minimize equation (11). The second method provides rough results due to the difficulty of estimating derivatives accurately from discrete, noisy data. We can understand this by observing that the bias in a smooth is related to the size of the derivative that is penalized as well as the penalty parameter. Nonlinear systems produce trajectories that can have large derivatives, reducing the optimal value of λ and increasing the roughness of the resulting estimate.

In the first method, the choice of basis functions and roughness penalty has been deliberately made without reference to the structure of the experiment. These results could be improved significantly with the knowledge that the system disturbance was a square wave. Similarly, prior knowledge of the structure of functional misspecification, such as cyclic behavior, can be incorporated explicitly either in the selection of basis functions or in the smoothness penalty that is used.

3. Goodness-of-Fit Tests for Linear Differential Equations
I consider a system f(x, u) = Ax(t) + u(t), where A is a known k × k matrix and u(t) a known k-valued function. Under these conditions, exact solutions to equation (1) are given by:

x(t) = e^{At} x0 + e^{At} ∫_{t0}^{t} e^{−As} u(s) ds,    (12)

where e^{At} represents a matrix exponential. Including a set of lack-of-fit forcing functions of the form D^T Φ(t)^T now modifies equation (12) to

y(t)^T = e^{At} x0 + e^{At} ∫_{t0}^{t} e^{−As} u(s) ds + e^{At} ∫_{t0}^{t} e^{−As} D^T Φ(s)^T ds
       = X(t) x0 + ũ(t) + D^T Φ̃(t)^T.    (13)

Thus, for an additive observational error process, we may express the estimation problem for d in terms of a linear model with parametric effects for the initial conditions x0:

y^obs = ũ(t) + X(t) x0 + Φ̃(t) d + ε.    (14)

Here, y^obs is taken as a vector, ũ(t) is formed by concatenating the ũ_i(t_i), X(t) is a block matrix with the (i, j)th block being X_ij(t_i), and Φ̃(t) is a block diagonal matrix with diagonal blocks Φ(t_i). ε represents a vector of independent observational errors.

Because the size of d may be expected to grow with the number of observations, I regard equation (14) as a mixed-effects model with d being random effects, adding the distributional assumptions

d ∼ N(0, τ^2 P),    ε ∼ N(0, σ^2 I),    (15)

to equation (14). Here, P may be given by a roughness penalty for a basis expansion as in equation (9), or simply taken to be the identity. This is a direct extension of the random-effects treatment of smoothing splines (e.g., Gu, 2002). In practice, I have found that the numerical challenges involved in solving equation (6) restrict us to a sufficiently coarse basis that this choice of P makes little difference. A test for the significance of d is then equivalent to testing the hypothesis H0: τ^2 = 0. This process was described in Crainiceanu et al. (2005). Exact finite-sample distributions for a likelihood-ratio test can be derived under the null hypothesis; these are detailed in the Appendix.

A simulation study is presented in Figure 3. I used the same linear system as presented in Section 2.2. The height of the perturbation was varied between 0 and 2.5 and I simulated Gaussian errors with variance 0.5. Goodness of fit was assessed using a second-order B-spline basis on 21 knots across the interval [0, 20]. I compared the power of the test when the basis was taken as representing a forcing function as in equation (2) against the model

y(t)^T = x(t; θ, x0)^T + D^T Φ(t)^T = ũ(t) + X(t) x0 + D^T Φ(t)^T,    (16)

with distributional assumptions given by equation (15). I describe this model as representing lack of fit on the scale of the observations. Power was calculated using 1000 simulations in each case. It is apparent that the representation in terms of forcing functions does noticeably better.

One common aspect of nonlinear dynamical systems is that only some components of x are measured. Moreover, the measured components may not be those for which system misspecification is most severe. In these situations, examining residuals (model 16) only allows us to examine those components

of x which are measured. Representing lack of fit as a forcing function allows it to be estimated for all components, where it frequently has a simpler form. Moreover, it is possible to explore lack of fit for different components to estimate where it may be most severe. Power for a system in which x1 is the only observed component is also given in Figure 3.

Figure 3. Power comparisons for testing lack of fit in a linear differential equation for lack of fit represented as forcing functions (solid lines) and as additive disturbances on the scale of the observations (dashed) as a forcing perturbation is varied (horizontal axes). Lines with circles represent power when both x1 and x2 are observed; lines without represent power when only x1 is observed.

3.1 Tests with Unknown Parameters
Typically, some or all of the parameters in a differential equation model need to be estimated from data before a goodness-of-fit test may be applied. Moreover, in nonlinear dynamics, the coefficients of a basis representation for a forcing function do not influence the model linearly, even when the parameters are fixed. Crainiceanu and Ruppert (2004) suggest extensions of mixed-effects likelihood ratio tests to nonlinear models by linearizing the model about the estimated parameter values. In that paper, a nonlinear model was assumed to be of the form Y = g(t, θ) + Φ(t)c, placing lack of fit on the scale of the observations. The test proceeds by linearizing g about the estimated θ̂, providing the approximation

Y ≈ g(t, θ̂) + (d g(t, θ)/dθ)|_{θ̂} δ + Φ(t) c + ε,    (17)

where δ and c are then fit to Y − g(t, θ̂) via a standard linear mixed-effects model. A test for the variance of the random effects can then be carried out. The notion is that the variance associated with estimating θ̂ is partially accounted for by including the fixed effects δ in the mixed model estimation scheme.

In differential equation models, the procedure may be carried out by estimating dx/dθ and dx/dx0 from the sensitivity equations (6). Setting θ̃ to be the full vector (x0, θ), the approximation (17) becomes

y^obs − x(t; x̂0, θ̂) ≈ X_{x̂0}(t) δ_{x0} + X_{θ̂}(t) δ_θ + Φ̃(t) d + ε,    (18)

where X_{x̂0}(t) and X_{θ̂}(t) represent the concatenations of dx_i(t_i)/dx0 and dx_i(t_i)/dθ row-wise across i when these derivatives are taken at x0 = x̂0 and θ = θ̂. They may be calculated from equation (6). Φ̃(t) is the block diagonal matrix with Φ(t_i) on the diagonal blocks, and x(t; x̂0, θ̂) is the vector formed by concatenating the x_i(t_i; x̂0, θ̂). Equation (18) fits into the framework above. We note that representing the disturbance in this manner is not dissimilar to the Gaussian process errors model suggested in Kennedy and O'Hagan (2001).

To use a model of the form (2), a further linearization is required. A generalization of the approximation (17) is given by

y = g(t, θ, c) + ε ≈ g(t, θ̂, 0) + (dg/dθ) δ + (dg/dc) c + ε.    (19)

For linear differential equations, this is equivalent to replacing Φ(t) with Φ̃(t) in equation (18). For nonlinear models it becomes

y^obs − x(t; x̂0, θ̂) ≈ X_{x̂0}(t) δ_{x0} + X_{θ̂}(t) δ_θ + (dy/dd)|_{d=0} d + ε,    (20)

where dy/dd is the concatenation of the dy_i(t_i)/dd, which can be estimated by solving the sensitivity equation (8). Treating the linearized model (20) as a mixed-effects model with distributional assumptions (15), in the same manner as equation (14), now allows us to apply the methodology above to test that the variance of d is zero.

To examine the behavior of this approximate test, a simulation study was conducted using data taken from equations (3)–(4) with parameters set to zero apart from (p2, p4, p5, p7, p8, p9) = (3, −1, 3, 1/15, −1/3, −1/15). These were chosen to provide a system that resembled a sine curve. For each simulation, a linear differential equation was fit to the data and goodness of fit tested using the methodology above. Figure 4 presents the power of this proposed test as a function of the variance of the observational noise. As in Figure 3, this is compared to conducting the test using model (17), and we observe a noticeable improvement in power at moderate noise levels.

Linearization distorts the null distribution of the likelihood ratio test statistic and should be treated with some care. We note, however, that Wald-type tests and confidence regions based on nonlinear least squares have a very similar form (e.g., Bates and Watts, 1988). The advantage of the tests suggested here is that they may be performed before running an expensive optimization scheme to estimate g(t). Moreover, they allow us to investigate which components of x may suffer most from lack of fit. This may allow us to reduce the number of components for which g(t) is estimated, both reducing the size of the optimization problem and the complexity of visually exploring them.
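The linearized tests above rely on sensitivities such as dx/dθ computed from equation (6). As a minimal check of that machinery, the following sketch augments the invented scalar model dx/dt = −θx with its sensitivity equation and compares the result with the closed form dx/dθ = −t x0 e^{−θt}.

```python
import numpy as np

theta, x0 = 1.5, 2.0

def rhs(state):
    # Augmented system in the spirit of equation (6) for dx/dt = -theta * x:
    # the sensitivity s = dx/dtheta satisfies ds/dt = -theta * s - x, s(0) = 0.
    x, s = state
    return np.array([-theta * x, -theta * s - x])

dt, n = 0.001, 2000            # integrate to t = 2 with RK4
state = np.array([x0, 0.0])
for _ in range(n):
    k1 = rhs(state)
    k2 = rhs(state + dt / 2 * k1)
    k3 = rhs(state + dt / 2 * k2)
    k4 = rhs(state + dt * k3)
    state = state + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

t_end = dt * n
analytic = -t_end * x0 * np.exp(-theta * t_end)  # d/dtheta of x0 * exp(-theta t)
print(state[1], analytic)      # agree to RK4 accuracy
```

The same augmentation, applied componentwise, supplies the columns of X_{θ̂}(t) in model (18) without resorting to finite-difference gradients.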

Figure 4. Power comparisons for testing lack of fit in a linear differential equation for lack of fit represented as forcing functions (solid lines) and as additive disturbances on the scale of the observations (dashed) as the amount of observational noise around a solution to the nonlinear differential equations (3)–(4) is increased.

4. Unmeasured Components and Lack-of-Fit Identifiability
In nonlinear dynamics, it is typical that only some of the components of a system are measured. Data from voltage clamp experiments, for example, are only available for the variable V in equations (3)–(4). For deterministic systems, the lack of measurements for some components does not present a problem for parameter estimation because the unmeasured components have a direct impact on those that are measured. However, the presence of missing components does become a problem for the estimation of forcing functions.

Consider a system described by states (y, x), where y has been continuously measured and x has not been measured. Suppose a differential equation f(y, x; t) has been proposed to describe it and f(y, ·; t): R^k → R^k is invertible for all (y, t). It can be shown that it is only possible to identify the same number of forcing functions as there are observed components; this is proved formally in the Appendix. Note that identification here assumes a fixed y(t) that is known continuously. This would be the case if we were to first smooth data and then perform diagnostics as in Varah (1982). The problem of identification for discretely observed data involves the number of observations and the number of basis functions used and will not necessarily be easy to analyze.

For a single observed component, it is possible to independently estimate a forcing function for each component in the system and perform model building by including the strongest observed relationships at each step. However, when b out of k components are observed, estimating the (k choose b) possible combinations of forcing functions becomes computationally intractable and analyzing them becomes intellectually infeasible. It may therefore be preferable to impose some identifiability constraints and estimate all the components jointly. A standard method of doing this is to augment equation (9) with a ridge-type penalty, λ Σ τ_i ∫ g_i(t)^2 dt, where the τ_i are chosen according to the scale of the variables x_i, or according to prior information about which components of x_i are likely to be misspecified. We may regard this as the fully nonlinear equivalent of the model (13) and use the restricted maximum likelihood (REML) estimate for λ as a guide to the amount of penalization required. In the author's experience, diagnostic plots are insensitive to the choice of λ and very similar to those provided by estimating each g_i individually.

5. Zebrafish Data
Figure 5 presents lack of fit as estimated by nonlinear least squares for the zebrafish data. For this model and data, a naive implementation of nonlinear least squares proved insufficient to estimate parameters or forcing functions. Instead, they have been estimated via a constrained optimization technique for collocation methods, implemented in the IPOPT routines (Arora and Biegler, 2004; Wächter and Biegler, 2006). A complete description of the use of these methods for this application is given in Hooker and Biegler (2007).

Figure 5. Diagnostics for the Zebrafish data. Lack-of-fit forcing functions for V, plotted against estimated trajectories for V (left) and R (right), after fitting equations (3) and (4) to the patch clamp data in Figure 1.

To estimate lack of fit, I used 200 linear B-splines in both variables. I have plotted the lack-of-fit functions for the observed variable V as a function of their estimated position in both variables. These plots make the difficulties of manually assessing lack of fit clear. The most evident relationship is between g_V and (V, R), where there is a clear nonlinear dependence. However, because g_V is only measured for (V, R)


Figure 6. A fit to the zebrafish data using the system (3)–(4) including the modification (21).

on a one-dimensional curve in (V, R) space, proposing any form to fit this relationship inherently requires extrapolation. I propose that the flat line evident in the right-hand panel of Figure 5 indicates a regime transition: the model provides a relatively good fit on one side of the regime, while different parameters are required on the other. Formally, this model was implemented by modifying equation (3) to

dV/dt = p1 + p2 V + p3 V^2 + p4 V^3 + p5 R + p6 V R + logit{(c1 + c2 R − V) c3} × (q1 + q2 V + q3 V^2 + q4 V^3 + q5 R + q6 V R).    (21)

The logistic term provides a smooth transition between the two regimes. The results of an implementation of this model, with additional parameters estimated in an ad hoc manner by fitting them to the estimated forcing functions, are given in Figure 6. Here, the general shape of the spike is clearly better resolved. However, it is also evident from this figure that the timings of the spikes in the data are not exact, sometimes occurring earlier and sometimes later than a perfectly cyclic system would produce. Optimizing parameters within this system produced a 100-fold decrease in total squared error. However, we note that the modified equations, with the estimated parameters, occur in a chaotic region. Small changes to c3, which controls the steepness of the transition, produce a system that alternates between an eventual fixed point near zero and a system that eventually diverges. Thus, additional analysis is required to satisfy the requirements of a realistic system.

6. Conclusions
The most interpretable representation for a model's lack of fit is on the scale that the model is given. For differential equations, it is appropriate to measure lack of fit on the scale of derivatives. This observation leads to the estimation of forcing functions as diagnostic tools. This article has shown that these tools provide useful insight into where a system of differential equations may be misspecified. I have also derived approximate tests for the significance of lack of fit and the need for further modeling.

Exploratory model development is particularly difficult in dynamical systems. Lack of fit suffers from identifiability problems when the posited model has unmeasured components. This makes it important to exercise caution in exploratory model development. As exemplified by our attempts to improve simple neurological models, it is also important to consider the dynamical properties of the proposed modification. In particular, care needs to be taken to preserve the stability of limit cycles, bifurcation points, and other dynamical features of the model. Much further work is therefore warranted. Similarly, interpolation tools that allow lack of fit to be estimated in such a way that it maintains the stability of limit cycles would prove very helpful for visual diagnostics. In the realm of stochastic systems, it would also be useful to develop tests for independence between estimated forcing functions and the trajectory of the model. I anticipate work in this field to be challenging and exciting for a long time.

7. Supplementary Materials
Matlab code for the simulations in this article is available from the Paper Information link at the Biometrics website http://www.biometrics.tibs.org. A full description of the fitting procedures and further diagnostics for the neural data are given in Hooker and Biegler (2007), available from http://www.bscb.cornell.edu/∼ hooker.

Acknowledgements
The zebrafish data used in this study were provided by Joe Fetcho's lab at the Biology Department in Cornell University. I am greatly indebted to Larry Biegler for the time and effort he spent in assisting me with the IPOPT system.

References

Arora, N. and Biegler, L. T. (2004). A trust region SQP algorithm for equality constrained parameter estimation with simple parametric bounds. Computational Optimization and Applications 28, 51–86.
Atherton, R. W., Schainker, R. B., and Ducot, E. R. (1975). On the statistical sensitivity analysis of models for chemical kinetics. AIChE Journal 3, 441–448.
Bates, D. M. and Watts, D. G. (1988). Nonlinear Regression Analysis and Its Applications. New York: Wiley.
Bellman, R. (1953). Stability Theory of Differential Equations. New York: Dover.
Clay, J. R. (2005). Axonal excitability revisited. Progress in Biophysics and Molecular Biology 88, 59–90.
Crainiceanu, C. M. and Ruppert, D. (2004). Likelihood ratio tests for goodness-of-fit of a nonlinear regression model. Journal of Multivariate Analysis 91, 35–52.
Crainiceanu, C. M., Ruppert, D., Claeskens, G., and Wand, M. P. (2005). Exact likelihood ratio tests for penalized splines. Biometrika 92, 91–103.
Deuflhard, P. and Bornemann, F. (2000). Scientific Computing with Ordinary Differential Equations. New York: Springer-Verlag.
FitzHugh, R. (1961). Impulses and physiological states in models of nerve membrane. Biophysical Journal 1, 445–466.

Gu, C. (2002). Smoothing Spline ANOVA Models. New York: Springer.
Hodgkin, A. L. and Huxley, A. F. (1952). A quantitative description of membrane current and its application to conduction and excitation in nerve. Journal of Physiology 133, 444–479.
Hooker, G. and Biegler, L. (2007). IPOPT and neural dynamics: Tips, tricks and diagnostics. Technical Report BU-1676-M, Department of Biological Statistics and Computational Biology, Cornell University.
Huang, Y., Liu, D., and Wu, H. (2006). Hierarchical Bayesian methods for estimation of parameters in a longitudinal HIV dynamic system. Biometrics 62, 413–423.
Ionides, E. L., Bretó, C., and King, A. A. (2006). Inference for nonlinear dynamical systems. Proceedings of the National Academy of Sciences 103, 18438–18443.
Jaeger, J., Blagov, M., Kosman, D., Kolsov, K., Manu, Myasnikova, E., Surkova, S., Vanario-Alonso, C., Samsonova, M., Sharp, D., and Reinitz, J. (2004). Dynamical analysis of regulatory interactions in the gap gene system of Drosophila melanogaster. Genetics 167, 1721–1737.
Kennedy, M. and O'Hagan, A. (2001). Bayesian calibration of computer models. Journal of the Royal Statistical Society, Series B 63, 425–464.
Leis, J. R. and Kramer, M. A. (1988). The simultaneous solution and sensitivity analysis of systems described by ordinary differential equations. ACM Transactions on Mathematical Software 14, 45–60.
Nagumo, J. S., Arimoto, S., and Yoshizawa, S. (1962). An active pulse transmission line simulating a nerve axon. Proceedings of the IRE 50, 2061–2070.
Ramsay, J. O., Hooker, G., Campbell, D., and Cao, J. (2007). Parameter estimation in differential equations: A generalized smoothing approach. Journal of the Royal Statistical Society, Series B (with discussion) 69, 741–796.
Ramsay, J. O. and Silverman, B. W. (2005). Functional Data Analysis. New York: Springer.
Rheinbolt, W. C. (1984). Differential-algebraic systems as differential equations on manifolds. Mathematics of Computation 43, 473–482.
Tjoa, I.-B. and Biegler, L. (1991). Simultaneous solution and optimization strategies for parameter estimation of differential-algebraic equation systems. Industrial Engineering and Chemical Research 30, 376–385.
Varah, J. M. (1982). A spline least squares method for numerical parameter estimation in differential equations. SIAM Journal on Scientific Computing 3, 28–46.
Wächter, A. and Biegler, L. T. (2006). On the implementation of an interior-point filter line-search algorithm for large-scale nonlinear programming. Mathematical Programming 106, 25–57.
Wilson, H. R. (1999). Spikes, Decisions and Actions: The Dynamical Foundations of Neuroscience. Oxford, U.K.: Oxford University Press.

Received October 2007. Revised July 2008. Accepted August 2008.

Appendix

Tests for Random Effects

Crainiceanu et al. (2005) derive distributions for the restricted likelihood ratio test of a random-effect variance under the null hypothesis that the variance is zero. Specifically, consider the linear mixed model

$$ Y = X\beta + Zb + \epsilon, \qquad \operatorname{cov}\begin{pmatrix} b \\ \epsilon \end{pmatrix} = \begin{pmatrix} \sigma_b^2 \Sigma & 0 \\ 0 & \sigma^2 I_n \end{pmatrix}, $$


where $Y$ is an $n$-vector of responses to be regressed on the $n \times p$ predictor matrix $X$, with a further set of random effects contained in the $n \times K$ matrix $Z$. I assume that the random effects $b$ have covariance matrix $\sigma_b^2 \Sigma$ for known $\Sigma$ and that the errors $\epsilon$ are independent $N(0, \sigma^2)$. Defining $\lambda = \sigma_b^2/\sigma^2$, we have $\operatorname{cov}(Y) = \sigma^2 V_\lambda$ for $V_\lambda = I_n + \lambda Z \Sigma Z^{T}$. Twice the negative restricted log likelihood is then

$$ \operatorname{REL}(\beta, \sigma^2, \lambda) = (n - p - 1)\log \sigma^2 + \log\{\det(V_\lambda)\} - \log\left\{\det\left(X^{T} V_\lambda^{-1} X\right)\right\} + \frac{(Y - X\beta)^{T} V_\lambda^{-1} (Y - X\beta)}{\sigma^2}, $$

and the restricted likelihood ratio test statistic for $\lambda = 0$ is

$$ \operatorname{RLRT}_n = \sup_{\beta, \sigma^2, \lambda} \operatorname{REL}\left(\beta, \sigma^2, \lambda\right) - \sup_{\beta, \sigma^2} \operatorname{REL}\left(\beta, \sigma^2, 0\right). $$
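As a concrete illustration, the criterion above can be evaluated numerically. The Python sketch below simulates data under the null, computes $\operatorname{REL}$ profiled over $\beta$ and $\sigma^2$, and forms the statistic on a grid of $\lambda$ values; since $\operatorname{REL}$ here is twice a *negative* restricted log likelihood, the nonnegative statistic is the criterion at $\lambda = 0$ minus its minimum. All dimensions, design matrices, and the grid are hypothetical choices, not quantities from the article.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions, design matrices, and data (illustration only).
n, p, K = 100, 2, 10
X = np.column_stack([np.ones(n), rng.normal(size=n)])
Z = rng.normal(size=(n, K))
Sigma = np.eye(K)                                  # known random-effects covariance
Y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)  # generated under the null (no b)

def REL(lam):
    """Twice the negative restricted log likelihood, profiled over beta and sigma^2."""
    V = np.eye(n) + lam * Z @ Sigma @ Z.T
    Vinv = np.linalg.inv(V)
    beta = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ Y)   # GLS estimate of beta
    r = Y - X @ beta
    quad = r @ Vinv @ r
    s2 = quad / (n - p - 1)                                  # profiled variance
    _, logdetV = np.linalg.slogdet(V)
    _, logdetXVX = np.linalg.slogdet(X.T @ Vinv @ X)
    return (n - p - 1) * np.log(s2) + logdetV + logdetXVX + quad / s2

# The criterion is twice a negative log likelihood, so the (nonnegative) test
# statistic is its value at lambda = 0 minus its minimum over a lambda grid.
lams = np.linspace(0.0, 5.0, 101)
rlrt = REL(0.0) - min(REL(l) for l in lams)
print(rlrt)
```

In practice the grid over $\lambda$ would be replaced by a proper one-dimensional optimization, but the profiling of $\beta$ and $\sigma^2$ shown here mirrors the structure of the criterion directly.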

Let $P_0 = I_n - X(X^{T} X)^{-1} X^{T}$, and let $\mu_s$ and $\xi_s$ be the eigenvalues of $\Sigma^{1/2} Z^{T} P_0 Z \Sigma^{1/2}$ and $\Sigma^{1/2} Z^{T} Z \Sigma^{1/2}$, respectively. Letting $u_s$ for $s = 1, \ldots, K$ and $w_t$ for $t = 1, \ldots, n - p - 1$ be independent $N(0, 1)$ variables and

$$ N_n(\lambda) = \sum_{s=1}^{K} \frac{\lambda \mu_s w_s^2}{1 + \lambda \mu_s}, \qquad D_n(\lambda) = \sum_{s=1}^{K} \frac{w_s^2}{1 + \lambda \mu_s} + \sum_{s=K+1}^{n-p-1} w_s^2, $$

$\operatorname{RLRT}_n$ is equal in distribution to

$$ \sup_{\lambda \geq 0} \left[ (n - p - 1) \log\left\{ 1 + \frac{N_n(\lambda)}{D_n(\lambda)} \right\} - \sum_{s=1}^{K} \log(1 + \lambda \mu_s) \right]. $$
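Concretely, the null distribution can be simulated by drawing the $w_s^2$, evaluating the expression over a grid of $\lambda$ values, and taking the maximum. In the Python sketch below, $n$, $p$, $K$, the eigenvalues $\mu_s$, and the $\lambda$ grid are all hypothetical placeholders; in practice the $\mu_s$ come from the eigendecomposition above.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical quantities: in practice mu_s are the eigenvalues of
# Sigma^{1/2} Z^T P_0 Z Sigma^{1/2}; placeholder values are used here.
n, p, K = 100, 2, 10
mu = np.linspace(0.1, 5.0, K)
lams = np.linspace(0.0, 10.0, 101)[:, None]    # grid over which the sup is taken

def rlrt_draw():
    """One draw from the simulated null distribution of RLRT_n."""
    w2 = rng.normal(size=n - p - 1) ** 2       # the w_s^2
    Nn = (lams * mu * w2[:K] / (1 + lams * mu)).sum(axis=1)
    Dn = (w2[:K] / (1 + lams * mu)).sum(axis=1) + w2[K:].sum()
    vals = (n - p - 1) * np.log1p(Nn / Dn) - np.log1p(lams * mu).sum(axis=1)
    return vals.max()                          # lambda = 0 contributes 0, so max >= 0

draws = np.array([rlrt_draw() for _ in range(2000)])
# The null distribution has a point mass at zero; report it with a critical value.
print((draws == 0).mean(), np.quantile(draws, 0.95))
```

The point mass at zero corresponds to draws for which the supremum is attained at $\lambda = 0$, which is why comparing the observed statistic to a fixed chi-squared quantile would be misleading.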

Although this appears complex, the distribution is easily simulated.

Lack-of-Fit Identifiability

Section 4 explored identifiability of lack-of-fit functions for systems described by two variables. The extension to more complex settings cannot be made so directly. I examine this setting by breaking the system into components that are or are not measured, and that are or are not forced. In the following theorem these components are given distinct labels.

Theorem 1: Let $x_0(t)$ and $j_0(t)$ have dimension $k_0$; let $x_1(t)$, $y(t)$, and $j(t)$ have dimension $k_1$ for each $t \in [0, 1]$; and let $z(t)$ have dimension $k_2$. Let $f_0(x_0, x_1, y, z; t)$, $f_1(x_0, x_1, y, z; t)$, $g(x_0, x_1, y, z; t)$, and $h(x_0, x_1, y, z; t)$ be Lipschitz continuous in their arguments, let $h$ be bounded, and let $f_1(x_0, x_1, y, z; t)$ have a Lipschitz continuous inverse $f_y^{-1}(x_0, x_1, w, z; t)$ in $y$ for each $x_0, x_1, z, t$. For fixed, continuously differentiable $x_0(t)$ and $x_1(t)$, and $y_0, z_0$ such that $\dot{x}_1^0 = \frac{d}{dt} x_1 \big|_0 = f_1(x_0^0, x_1^0, y_0, z_0; 0)$, the equations

$$ \frac{d}{dt} x_0 = f_0(x_0, x_1, y, z; t) + j_0(t), $$
$$ \frac{d}{dt} x_1 = f_1(x_0, x_1, y, z; t), $$
$$ \frac{d}{dt} y = g(x_0, x_1, y, z; t) + j(t), $$
$$ \frac{d}{dt} z = h(x_0, x_1, y, z; t) $$

have unique continuous solutions $y(t)$, $z(t)$, $j_0(t)$, and $j(t)$, with $y(t)$ and $z(t)$ continuously differentiable.



Proof. Solutions may be constructed directly by noting that any solution $z(t)$ must satisfy

$$ \frac{d}{dt} z = h\left(x_0, x_1, f_y^{-1}(x_0, x_1, \dot{x}_1, z; t), z; t\right), \qquad z(0) = z_0, \qquad \text{(A.1)} $$

(note that $y_0 = f_y^{-1}(x_0, x_1, \dot{x}_1^0, z_0; 0)$ by assumption). Under the assumption of Lipschitz continuity, the right-hand side of (A.1) is also Lipschitz continuous. This provides local existence and uniqueness of solutions of (A.1); see, for example, Bellman (1953). Boundedness of $h$ guarantees a global solution. We can now reconstruct $y(t)$, $j_0(t)$, and $j(t)$ as

$$ y(t) = f_y^{-1}(x_0, x_1, \dot{x}_1, z; t), $$
$$ j_0(t) = \frac{d}{dt} x_0 - f_0(x_0, x_1, y, z; t), $$
$$ j(t) = \frac{d}{dt} y - g(x_0, x_1, y, z; t). $$

Continuity follows directly. We remark that this construction is only possible when $y(t)$ has the same dimension as $x_1(t)$; otherwise no such $f_y^{-1}$ will exist. Rather than insisting on this inverse, we could instead solve the differential-algebraic system

$$ 0 = \frac{d}{dt} x_1 - f_1(x_0, x_1, y, z; t), $$
$$ \frac{d}{dt} z = h(x_0, x_1, y, z; t) $$

in terms of $y(t)$ and $z(t)$ and proceed with the substitution. This allows slightly different conditions on $f_1$; see, for example, Rheinbolt (1984). The division of components into $x_0$, $x_1$, $y$, and $z$ is arbitrary here; the point is that for any division for which $f_y^{-1}$ exists, solutions may be found.
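The construction underlying this appendix is easiest to see in the fully observed, unforced-in-$y$ case, where the forcing is simply the difference between the observed derivative and the model's right-hand side. The following Python sketch recovers a known forcing on a toy FitzHugh–Nagumo-style system; all constants, the step size, and the forcing itself are hypothetical illustrations, not the article's fitted values.

```python
import numpy as np

# Toy FitzHugh-Nagumo-style system (hypothetical constants).
def model_rhs(V, R):
    return V - V**3 / 3.0 - R, 0.1 * (V + 0.5 - 0.8 * R)

def forcing(t):
    return 0.5 * np.sin(t)      # "true" lack of fit, to be recovered

def rhs(t, v, r):
    """Right-hand side of the forced system."""
    dv, dr = model_rhs(v, r)
    return dv + forcing(t), dr

# Integrate the forced system with a fixed-step RK4 scheme, recording the
# state at the start of each step.
dt, T = 0.01, 20.0
ts = np.arange(0.0, T, dt)
V, R = 1.0, 0.0
Vs, Rs = [], []
for t in ts:
    Vs.append(V)
    Rs.append(R)
    k1 = rhs(t, V, R)
    k2 = rhs(t + dt / 2, V + dt / 2 * k1[0], R + dt / 2 * k1[1])
    k3 = rhs(t + dt / 2, V + dt / 2 * k2[0], R + dt / 2 * k2[1])
    k4 = rhs(t + dt, V + dt * k3[0], R + dt * k3[1])
    V += dt / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
    R += dt / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
Vs, Rs = np.array(Vs), np.array(Rs)

# Estimated forcing: numerical derivative minus the unforced model's RHS.
j_hat = np.gradient(Vs, dt) - (Vs - Vs**3 / 3.0 - Rs)
print(np.max(np.abs(j_hat - forcing(ts))))   # small: the forcing is recovered
```

With real data the finite-difference step would be replaced by a smoothing estimate of the trajectory and its derivative, and the partially observed case requires solving (A.1) for the unmeasured components first.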