Using surrogate data to test for nonlinearity in experimental data
Michael Small
Kevin Judd
Centre for Applied Dynamics and Optimization, Department of Mathematics, University of Western Australia, Nedlands, Perth, WA 6907.
Abstract: The technique of surrogate data has been used to test for membership of particular classes of linear systems. Existing algorithms provide non-parametric methods to generate surrogates similar to the data and consistent with a given hypothesis. These non-parametric methods allow a wide range of test statistics to be utilized. We suggest an obvious extension of this to classes of nonlinear parametric models. To do so it is necessary to restrict the statistics employed to a relatively broad class. We demonstrate that correlation dimension provides a suitable statistic and apply these methods, together with existing surrogate tests, to respiratory data from sleeping infants. Although our data are clearly distinct from the different classes of linear systems, we are unable to distinguish between our data and surrogates generated by nonlinear models. Hence we conclude that our data cannot be explained by linearly filtered noise but are consistent with the noisy periodic orbit of a nonlinear system.
I. Introduction

Surrogate methods proceed by comparing the value of (nonlinear) statistics for the data with their approximate distribution for various classes of linear systems; by doing so one can test whether the data has characteristics that are distinct from stochastic linear systems. Surrogate analysis provides a regime to test specific hypotheses about the nature of the system responsible for the data, whereas nonlinear measures provide an estimate of some quantitative attribute of the system¹. Surrogate analysis enables us to test whether the dynamics are consistent with linearly filtered noise or a nonlinear dynamical system.

Surrogate data analysis is not, however, entirely straightforward. Theiler's original work on surrogate methods [9] suggested a "hierarchy" of hypotheses that should be tested with a "battery" of test statistics. More recent work [11] has demonstrated that not all test statistics are equally good. Furthermore, not all hypotheses are as straightforward, or interesting, as they may appear. It is possible that one of the surrogate generating algorithms is flawed [3], and the choice of test statistic and surrogate generation algorithm should be made very carefully [5, 6, 10].

Existing surrogate methods are largely nonparametric and concerned with rejecting the hypothesis that a given data set is generated by some form of linear system. We suggest a new type of surrogate generation method which is both parametric and nonlinear. We model the data using methods described in [2, 4] and generate noise driven simulations from that model. Using correlation dimension (or another nonlinear statistic) we are then able to determine which properties are common to both data and model.

First we discuss linear surrogate generation methods suggested by Theiler [9, 10, 11]. In section II we discuss the use of parametric nonlinear modeling to test the hypothesis that the data was generated by a noise driven nonlinear system; for a more complete discussion see [5, 6, 7]. Finally we present some numerical results of applying our methods to experimental recordings of infant respiration.

¹ Because nonlinear measures are of particular interest they are often used as the discriminating statistic in surrogate data hypothesis testing.

A. Linear surrogate data

Different types of surrogate data are generated to test membership of specific dynamical system classes, referred to as hypotheses. The three types of surrogates described by [9], referred to as algorithms 0, 1 and 2, address the three hypotheses: (0) linearly filtered noise; (1) linear transformation of linearly filtered noise; (2) monotonic nonlinear transformation of linearly filtered noise. Cycle shuffled surrogates have also been suggested to test for correlation between different cycles of periodic data [11]. In this paper we test for the most general type of linear dynamics: a monotonic nonlinear transformation of linearly filtered noise.
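To make the linear surrogate types concrete, the following is a minimal sketch of the constructions usually associated with them in the surrogate data literature: a random shuffle, Fourier phase randomization, and an amplitude adjusted Fourier transform. It is not code from [9]; the function names are hypothetical, NumPy is assumed to be available, and the mapping of each function onto algorithms 0, 1 and 2 should be checked against the original references.

```python
import numpy as np

def shuffle_surrogate(x, rng):
    """Random permutation: keeps the amplitude distribution, destroys temporal order."""
    return rng.permutation(x)

def phase_randomized_surrogate(x, rng):
    """Keep the Fourier amplitudes (power spectrum) and randomize the phases."""
    n = len(x)
    spectrum = np.fft.rfft(x)
    phases = rng.uniform(0.0, 2.0 * np.pi, size=spectrum.shape)
    phases[0] = 0.0            # leave the zero-frequency (mean) term untouched
    if n % 2 == 0:
        phases[-1] = 0.0       # the Nyquist term must stay real when n is even
    return np.fft.irfft(spectrum * np.exp(1j * phases), n=n)

def aaft_surrogate(x, rng):
    """Amplitude adjusted Fourier transform surrogate (assumed standard recipe)."""
    x = np.asarray(x, dtype=float)
    ranks = np.argsort(np.argsort(x))
    # Gaussian series that follows the rank order of the data
    gaussian = np.sort(rng.standard_normal(len(x)))[ranks]
    randomized = phase_randomized_surrogate(gaussian, rng)
    # Rescale back to the original amplitudes, following the new rank order
    return np.sort(x)[np.argsort(np.argsort(randomized))]

rng = np.random.default_rng(0)
data = np.cumsum(rng.standard_normal(512))     # illustrative data only
surrogates = [aaft_surrogate(data, rng) for _ in range(50)]
```

Each surrogate is then passed through the same test statistic as the data, and the value for the data is compared with the distribution of values obtained from the surrogates.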
II. Noise driven nonlinear system surrogates
Hypothesis testing with surrogate data is essentially a modeling process. To test whether the data is consistent with a particular hypothesis, one first builds a model that is consistent with that hypothesis and has the same properties as the original data; one then generates surrogate data from the model and checks that the original data is typical under the hypothesis by comparing it to the surrogate data. For surrogates generated by algorithm 0, 1 or 2 the model used is linear. Each of these surrogate tests addresses a hypothesis that the data is either linear, or some (linear, or monotonic nonlinear) transformation of a linear process.

To address the hypothesis that the data comes from a noise driven nonlinear system, we build a nonlinear model and generate surrogate data (noise driven simulations). The nonlinear model that we build from the data is a cylindrical basis model, built by the methods of [2, 4]. Cylindrical basis models are a generalization of radial basis models that allow for a variable embedding. Cylindrical basis models are used because they are known to be effective in modeling a variety of nonlinear dynamical systems and the authors have at their disposal a sophisticated software implementation of this modeling method. The hypothesis we wish to test is that the data is consistent with a nonlinear system that can be described by a cylindrical basis model and that the data of such a system can be modeled adequately using the algorithms we use. Rejection of the hypothesis could imply that the data cannot be described by a cylindrical basis model, or that the modeling algorithm failed to build an accurate model.

We return to discuss this hypothesis in section D. In the following sections we describe our nonlinear modeling algorithm, our choice of test statistics, and the estimation of correlation dimension.

A. Modeling

We present a concise description of the form of the model; for more information see [2, 4]. From a scalar time series $y_t$ we build a model of the form

$$y_{t+1} = f(v_t) + \epsilon_t,$$

where $v_t$ is a $d$-dimensional time lag embedding of the scalar time series and $\epsilon_t$ are Gaussian random variates. Observe that by using a time-delay embedding the only new component of $v_{t+1}$ that the model needs to predict is $y_{t+1}$. The function $f$ is of the form

$$f(v_t) = a_0 + \sum_{i=0}^{d} b_i y_{t-i} + \sum_{j=1}^{n} \lambda_j \exp\left( \frac{1-\nu_j}{\nu_j} \left( \frac{\| P_j (v_t - c_j) \|}{r_j} \right)^{\nu_j} \right),$$

where $a_0$, $b_i$, $\lambda_j$, $\nu_j$ and $r_j$ are scalar constants, $c_j$ are arbitrary points in $\mathbb{R}^d$ and $P_j$ are projections onto arbitrary subsets of coordinate components. Such a model is called a cylindrical basis model; the projections $P_j$ are the distinction between this and a standard radial basis model².

² For a radial basis model $P_j : \mathbb{R}^d \to \mathbb{R}^d$ is the identity for all $j$.

B. Test statistics

To compare the data to its surrogates a suitable test statistic must be selected. A useful statistic must measure a non-trivial invariant of a dynamical system that is independent of the way surrogates are generated.

In a recent paper Theiler [10] suggests that there are two fundamentally different types of test statistics: pivotal and non-pivotal. A test statistic $T$ is pivotal if the probability distribution of $T$ is the same for all processes $F$ consistent with the hypothesis. Otherwise it is non-pivotal. Similarly there are two different types of hypotheses: simple hypotheses and composite hypotheses. A hypothesis is simple if the set $\mathcal{F}$ of all processes consistent with the hypothesis is a singleton. Otherwise the hypothesis is composite and the problem is not only to generate surrogates consistent with $F$ (a particular process) but also to estimate $F \in \mathcal{F}$.

Theiler argues that it is highly undesirable to use a non-pivotal test statistic if the hypothesis is composite. In the case when the null hypothesis is composite one must specify $F$, unless the test statistic $T$ is pivotal, in which case the distribution of $T$ is the same for all $F \in \mathcal{F}$. In cases when non-pivotal statistics are to be applied to composite hypotheses (as most interesting hypotheses are) Theiler suggests that a constrained realization scheme be employed. That is, instead of generating surrogates that are typical realizations of a model of the data, ensure that the surrogates are realizations of a system consistent with the hypothesis that gives identical estimates of the parameters (of that system) to the estimates of those parameters from the data. In other words, if $\hat{F} \in \mathcal{F}$ is the process estimated from the data $z$, and $z_i$ is a surrogate data set generated from $F_i \in \mathcal{F}$, then $\hat{F}_i \in \mathcal{F}$, the process estimated from $z_i$, must be the same as $\hat{F}$.

Unfortunately, not all interesting test statistics are pivotal, and constrained realization schemes can be extremely non-trivial³. Furthermore, the nonlinear surrogate generation method we introduce in this section is a parametric modeling method that utilizes a stochastic search algorithm; it is definitely not a constrained realization method, and no related constrained method is evident. In the following section we discuss our test statistic.

³ Theiler [10] gives examples of constrained realization schemes for linear hypotheses, namely algorithm 0, 1 and 2.

We have chosen to use correlation dimension because it is a measure of great significance and has been the subject of much attention. To estimate correlation dimension we follow the method described by Judd [1]; for completeness correlation dimension is defined in the following section.

[Figure 1 plot: abdominal movement over 1600 data points.]

Figure 1: Experimental data. The abdominal movement measured with inductance plethysmography for a 2 month old male child in quiet (stage 3-4) sleep. The 1600 data points were sampled at 12.5 Hz, and digitized using a 12 bit analogue to digital converter during a sleep study at Princess Margaret Hospital for Children, Subiaco, Western Australia.

[Figure 2 plots: panels (i) and (ii), correlation dimension against −log(epsilon).]

Figure 2: Probability distribution for correlation dimension estimates for linear surrogates of experimental data. Shown are contour plots which represent the probability distribution of the correlation dimension estimate for various values of $\varepsilon_0$. The data used in this calculation is illustrated in figure 1. The figures are probability density estimates for surrogates from: (i) constrained realization algorithm 2 generated surrogates; (ii) a monotonic nonlinear transformation of a parametric linear model. In each calculation 50 realizations of 1600 points were calculated, and their correlation dimension calculated for $d_e = 3$. The value ...

C. Dimension Estimation

Let $\{v_t\}_{t=1}^{N}$ be an embedding of a time series in $\mathbb{R}^{d_e}$. Define the correlation function, $C_N(\varepsilon)$, by

$$C_N(\varepsilon) = \binom{N}{2}^{-1} \sum_{0 < i < j \le N} I(\|v_i - v_j\| < \varepsilon).$$
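As a rough illustration of the correlation function defined above, the sketch below counts the fraction of distinct pairs of embedded points that lie closer than a given scale. It evaluates only $C_N(\varepsilon)$ and is not the dimension estimation method of Judd [1]. The helper names, the Euclidean norm, the forward-in-time ordering of the delay vectors, and the test signal are all assumptions made for the example.

```python
import math
from itertools import combinations

def delay_embed(y, dim, lag=1):
    """Delay vectors (y[t], y[t+lag], ..., y[t+(dim-1)*lag]) for every valid t."""
    n = len(y) - (dim - 1) * lag
    return [tuple(y[t + k * lag] for k in range(dim)) for t in range(n)]

def correlation_function(vectors, eps):
    """C_N(eps): fraction of distinct pairs i < j with ||v_i - v_j|| < eps."""
    n = len(vectors)
    # Euclidean distance is an assumption; the norm is not fixed by the definition
    close = sum(1 for a, b in combinations(vectors, 2) if math.dist(a, b) < eps)
    return close / math.comb(n, 2)

# Illustrative signal only; the embedding dimension 3 mirrors d_e = 3 in figure 2
signal = [math.sin(0.3 * t) for t in range(200)]
embedded = delay_embed(signal, dim=3)
for eps in (0.1, 0.3, 1.0):
    print(eps, correlation_function(embedded, eps))
```

The pairwise loop costs on the order of $N^2$ distance evaluations, which is manageable for series of the length used here (1600 points) but would need a neighbour search structure for much longer recordings.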