Testing Time Series for Nonlinearity Michael Small, Kevin Judd and Alistair Mees Centre for Applied Dynamics and Optimization, Department of Mathematics and Statistics, University of Western Australia. February 4, 1999
Abstract The technique of surrogate data analysis may be employed to test the hypothesis that an observed data set was generated by one of several specific classes of dynamical system. Current algorithms for surrogate data analysis enable one, in a generic way, to test for membership of the following three classes of dynamical system: (0) independent and identically distributed noise, (1) linearly filtered noise, and (2) a monotonic nonlinear transformation of linearly filtered noise. We show that one may apply statistics from nonlinear dynamical systems theory, in particular those derived from the correlation integral, as test statistics for the hypothesis that an observed time series is consistent with each of these three linear classes of dynamical system. Using statistics based on the correlation integral we show that it is also possible to test much broader (and not necessarily linear) hypotheses. We illustrate these methods with radial basis models and an algorithm to estimate the correlation dimension. By exploiting some special properties of this correlation dimension estimation algorithm we are able to test very specific hypotheses. Using these techniques we demonstrate that the respiratory control of human infants exhibits a quasi-periodic orbit (the obvious inspiratory/expiratory cycle) together with cyclic amplitude modulation. This cyclic amplitude modulation manifests as a stable focus in the first return map (equivalently, in the sequence of successive peaks).
Running Title: Nonlinear surrogates for hypothesis testing. Correspondence: Michael Small, Department of Physics, Heriot-Watt University, Riccarton, Edinburgh EH14 4AS, United Kingdom. Tel: +44 131 451 3030. Fax: +44 131 451 3136. Email:
[email protected]
Keywords: Surrogate data analysis, nonlinear surrogates, correlation dimension, infant respiratory patterns, Floquet theory, first return map.
1 Introduction Nonlinear measures such as correlation dimension, Lyapunov exponents, and nonlinear prediction error are often applied to time series with the intention of identifying the presence of nonlinear, possibly chaotic behavior (see for example [Casdagli et al. 1996; Schmid and Dunki 1996; Small et al. 1999; Vibe and Vesin 1996] and the references therein). Estimating these quantities and making unequivocal classification can prove difficult, and the method of surrogate data [Theiler et al. 1992] is a version of bootstrapping which is often employed to clarify and quantify statements about the presence of nonlinear effects. Surrogate methods compare the value of (nonlinear) statistics for the data with the approximate distribution for various classes of linear systems, so as to test if the data has some characteristics which are distinct from stochastic linear systems. Surrogate analysis provides a regime to test specific hypotheses about the nature of the system responsible for data; nonlinear measures are often used as the discriminating statistic in this hypothesis testing. In this paper we demonstrate that statistics derived from the correlation integral provide a natural choice of test statistic for surrogate data analysis. Using such statistics it is possible to test a broad range of linear and nonlinear hypotheses. In particular, one may use correlation integral based statistics to test the hypothesis that the data came from one of many classes of nonlinear dynamical system. We illustrate these methods with an application to the analysis of human
respiration. In the following section, we introduce some terminology and review some common methods of generating linear surrogates. Following this we introduce the correlation integral and discuss reconstruction from experimental data. In section 4 we derive some results regarding the usefulness of correlation integral based statistics for nonlinear surrogate data analysis. Finally, we demonstrate the application of these methods to experimental data.
2 The rationale and language of surrogate data The general procedure of surrogate data methods has been described by Theiler [Theiler 1995; Theiler et al. 1992; Theiler and Prichard 1996; Theiler and Rapp 1996] and Takens [1993]. One first assumes that the data comes from some specific class of dynamical process, possibly fitting a parametric model to the data. One then generates surrogate data from this hypothetical process and calculates various statistics of the surrogates and original data. The surrogate data will have some distribution of values of each statistic and one can check that the statistic of the original data is typical. If the original data has atypical statistics, we reject the hypothesis that the process that generated the original data is of the assumed class. One always progresses from simple and specific assumptions to broader and more sophisticated models. Let $\mathcal{H}$ be a specific hypothesis and $\mathcal{F}$ the set of all processes (or systems) consistent with that hypothesis. Let $z \in \mathbb{R}^N$ be a time series (consisting of $N$ scalar measurements) under consideration, and let $T : \mathbb{R}^N \to U$ be a statistic which we will use to test the hypothesis that $z$ was generated by some process $F \in \mathcal{F}$. Generally $U$ will be $\mathbb{R}$ and one can discriminate between the data $z$ and surrogates $z_i$ consistent with the hypothesis given the approximate probability density $p_{T,F}(t)$, i.e. the probability density of $T$ given $F$. In a recent paper, Theiler and Prichard [1996] suggest that there are two fundamentally different types of test statistics: pivotal and non-pivotal.
Definition 1: A test statistic $T$ is pivotal if the probability distribution $p_{T,F}$ is the same for all processes $F$ consistent with the hypothesis; otherwise it is non-pivotal. Similarly there are two different types of hypotheses: simple hypotheses and composite hypotheses.
Definition 2: A hypothesis is simple if the set of all processes consistent with the hypothesis, $\mathcal{F}$, is a singleton. Otherwise the hypothesis is composite. When one has a composite hypothesis the problem is not only to generate surrogates consistent with $F$ (a particular process) but also to estimate $F \in \mathcal{F}$. Theiler argues that it is highly desirable to use a pivotal test statistic if the hypothesis is composite. In the case when the hypothesis is composite, one must specify $F$ unless the test statistic $T$ is pivotal, in which case $p_{T,F}$ is the same for all $F \in \mathcal{F}$. In cases when non-pivotal statistics are to be applied to hypotheses which are composite (as most interesting hypotheses are), Theiler suggests that a constrained realization scheme be employed.
Definition 3: Let $\hat{F} \in \mathcal{F}$ be the process estimated from the data $z$, and let $z_i$ be a surrogate data set generated from $F_i \in \mathcal{F}$. Let $\hat{F}_i \in \mathcal{F}$ be the process estimated from $z_i$; then a surrogate $z_i$ is a constrained realization if $\hat{F}_i = \hat{F}$. Otherwise it is non-constrained.
That is, as well as generating surrogates that are typical realizations of a model of the data, one should ensure that the surrogates are realizations of a process that gives identical estimates of the parameters (of that process) to the estimates of those parameters from the data. In Small and Judd [1998c] we discuss constrained realizations in more detail.
2.1 Linear surrogates Different types of surrogate data are generated to test membership of specific dynamical system classes, referred to as hypotheses. The three types of surrogates described by Theiler et al. [1992], referred to as algorithms 0, 1 and 2, address the three hypotheses: (0) independent and identically distributed noise; (1) linearly filtered noise; (2) a monotonic nonlinear transformation of linearly filtered noise.
Constrained realizations consistent with each of these hypotheses can be generated by (0) shuffling the data, (1) randomizing (or shuffling) the phases of the Fourier transform of the data, and (2) applying a phase randomizing (shuffling) procedure to amplitude adjusted Gaussian noise.
Algorithm 0: The surrogate $z_i$ is created by shuffling the order of the data $z$. Generate an i.i.d. Gaussian data set $y$ and reorder $z$ so that it has the same rank distribution as $y$.
Algorithm 1: An algorithm 1 surrogate $z_i$ is produced by applying algorithm 0 to the phases of the Fourier transform of $z$. Calculate $Z$, the Fourier transform of $z$. Either randomize the phases of $Z$ (preserving the complex conjugate pairs) or shuffle them by applying algorithm 0. Take the inverse Fourier transform to produce the surrogate $z_i$.
Algorithm 2: The procedure for generating surrogates consistent with algorithm 2 is the following [Theiler et al. 1992]: start with the data set $z$, generate an i.i.d. Gaussian data set $y$ and reorder $y$ so that it has the same rank distribution as $z$. Then create an algorithm 1 surrogate $y_i$ of $y$ (either by shuffling or randomizing the phases of the Fourier transform of $y$). Finally, reorder the original data $z$ to create a surrogate $z_i$ which has the same rank distribution as $y_i$. Algorithm 2 surrogates are also referred to as amplitude adjusted Fourier transformed (AAFT) surrogates. Surrogates generated by these three algorithms have become known as algorithm 0, 1 and 2 surrogates. Each of these hypotheses should be rejected for data generated by a nonlinear system. However, rejecting these hypotheses does not necessarily indicate the presence of a nonlinear system, only that it is unlikely that the data is generated by a monotonic nonlinear transformation of linearly filtered noise. The system could, for example, involve a non-monotonic transformation or non-Gaussian or state dependent noise. In the case of an approximately periodic signal it would be useful to be able to determine the presence of temporal correlation between cycles. In recent papers Theiler [1995] and Theiler
and Rapp [1996] address this problem and propose that a logical choice of surrogate for strongly periodic data should also be periodic. To achieve this Theiler decomposes the signal into cycles and shuffles the individual cycles. Theiler's hypothesis for strongly periodic signals is rather simple, but in many ways powerful. Theiler proposes that surrogates generated by shuffling the cycles address the hypothesis that there is no dynamical correlation between cycles.
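The three linear surrogate algorithms, together with the cycle-shuffled surrogates just described, can be sketched in a few lines of Python. This is a minimal illustration assuming numpy; in particular, the upward zero-crossing rule used below to delimit cycles is a hypothetical choice made for the example, not the decomposition of the papers cited above.

```python
import numpy as np

def algorithm0(z, rng):
    """Shuffled surrogate: destroys all temporal structure but keeps
    the amplitude distribution (hypothesis: i.i.d. noise)."""
    return rng.permutation(z)

def algorithm1(z, rng):
    """Phase-randomized surrogate: keeps the power spectrum
    (hypothesis: linearly filtered noise)."""
    n = len(z)
    Z = np.fft.rfft(z)
    ph = rng.uniform(0, 2 * np.pi, len(Z))
    ph[0] = 0.0                      # keep the mean (DC component) real
    if n % 2 == 0:
        ph[-1] = 0.0                 # keep the Nyquist component real
    return np.fft.irfft(np.abs(Z) * np.exp(1j * ph), n)

def algorithm2(z, rng):
    """AAFT surrogate (hypothesis: monotonic nonlinear transformation
    of linearly filtered noise)."""
    ranks = np.argsort(np.argsort(z))
    y = np.sort(rng.standard_normal(len(z)))[ranks]  # Gaussian, same ranks as z
    yi = algorithm1(y, rng)
    return np.sort(z)[np.argsort(np.argsort(yi))]    # z reordered to follow yi

def cycle_shuffle(z, rng):
    """Cycle-shuffled surrogate for strongly periodic data: cut the
    series at upward zero crossings and permute the interior cycles."""
    cuts = np.flatnonzero((z[:-1] < 0) & (z[1:] >= 0)) + 1
    segs = np.split(z, cuts)         # first and last segments may be partial
    mid = segs[1:-1]
    rng.shuffle(mid)
    return np.concatenate([segs[0], *mid, segs[-1]])

rng = np.random.default_rng(0)
t = np.arange(1024)
z = np.sin(2 * np.pi * t / 64) + 0.1 * rng.standard_normal(1024)
s1, s2 = algorithm1(z, rng), algorithm2(z, rng)
assert np.allclose(np.abs(np.fft.rfft(s1)), np.abs(np.fft.rfft(z)))  # same spectrum
assert np.allclose(np.sort(s2), np.sort(z))                          # same histogram
```

Each generator is applied many times, the test statistic is computed for data and surrogates, and atypical values of the statistic for the data lead to rejection of the corresponding hypothesis.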
3 The correlation integral The correlation integral is a property of the spatial distribution of the system variables of a dynamical system. To estimate the correlation integral from an experimental time series it is necessary to reconstruct the dynamics of the dynamical system which generated the time series. To do this we employ the reconstruction technique of time delay embedding. In this section we briefly review time delay embedding, the correlation integral, and an algorithm which we employ to estimate the correlation dimension.
3.1 Reconstruction Attractor reconstruction using the method of time delays is now widely applied; we will briefly describe the key points of this technique and the methods we utilize to select an appropriate embedding strategy. Let $M$ be a compact $m$ dimensional manifold, $Z : M \to M$ a $C^2$ vector field on $M$, and $h : M \to \mathbb{R}$ a $C^2$ function (the measurement function). The vector field $Z$ gives rise to an associated evolution operator (flow) $\phi_\tau : M \to M$. If $z_t \in M$ is the state at time $t$ then the state at some later time $t + \tau$ is given by $z_{t+\tau} = \phi_\tau(z_t)$. Observations of this state can be made so that at time $t$ we observe $h(z_t) \in \mathbb{R}$ and at time $t + \tau$ we can make a second measurement $h(\phi_\tau(z_t)) = h(z_{t+\tau})$. Takens' embedding theorem [Takens 1981] guarantees that given the above
situation, the system generated by the map $\Phi_{Z,h} : M \to \mathbb{R}^{2m+1}$, where

$\Phi_{Z,h}(z_t) := (h(z_t), h(\phi_\tau(z_t)), \ldots, h(\phi_{2m\tau}(z_t)))$   (1)
$\phantom{\Phi_{Z,h}(z_t) :}= (h(z_t), h(z_{t+\tau}), \ldots, h(z_{t+2m\tau}))$

is an embedding. By embedding we mean that the asymptotic behavior of $\Phi_{Z,h}(z_t)$ and $z_t$ are diffeomorphic. We can apply this result to reconstruct from a time series of experimental observations $\{y_t\}_{t=1}^N$ (where $y_t = h(z_t)$) a system which is (asymptotically) diffeomorphic to that which generated the underlying dynamics (subject to the usual restrictions of finite data and observational error). We produce from our scalar time series
$y_1, y_2, y_3, \ldots, y_N$ a $d_e$-dimensional vector time series via the embedding (1)

$y_t \mapsto v_t = (y_{t-\tau}, y_{t-2\tau}, \ldots, y_{t-d_e\tau}) \qquad \forall t > d_e\tau.$
To perform this transformation one must first identify the embedding lag $\tau$ and the embedding dimension $d_e$. A sufficient condition on $d_e$ is that it must exceed $2m + 1$ where $m$ is the attractor dimension. However, to estimate $m$, one must already have embedded the time series. We describe the selection of suitable values of these parameters in the following paragraphs. An embedding depends on two parameters, the lag $\tau$ and the embedding dimension $d_e$. For an embedding to be suitable for successful estimation of dimension and modeling of the system dynamics, one must choose suitable values of these parameters. The following two subsections discuss some commonly used methods to estimate the embedding lag $\tau$ and embedding dimension $d_e$. Takens' embedding theorem [Noakes 1991; Takens 1981] and more recently the work of Ding et al. [1993] give sufficient conditions on $d_e$. Ding et al. give a sufficient condition on the value of $d_e$ necessary to estimate the correlation dimension of an attractor, not to avoid all possible self-intersections. Unfortunately, the conditions require a prior knowledge of the fractal dimension of the object under study. In practice one could guess a suitable value for $d_e$ by successively embedding in higher dimensions and looking for consistency of results; this is the method that is generally employed. However, other methods, such as the false nearest neighbor technique [Farmer et al. 1983; Theiler 1990; Kennel et al. 1992], are now available to suggest the value of $d_e$. Any value of $\tau$ is theoretically acceptable, but the shape of the embedded time series will depend critically on the choice of $\tau$ and it is wise to select a value of $\tau$ which separates the data as much as possible. Studies in nonlinear time series [Abarbanel et al. 1993] suggest the first minimum of the mutual information criterion [Rissanen 1989], the first zero of the autocorrelation function [Priestly 1989] or one of several other criteria to choose $\tau$. Our experience and numerical experiments suggest that selecting a lag approximately equal to one quarter of the quasi-period of the time series produces comparable results to the autocorrelation function but is more expedient. Note that the first zero of the autocorrelation function will be approximately the same as one quarter of the quasi-period if the data is almost periodic. Numerical experiments with infant respiratory data [Small et al. 1999] have shown that either of these methods produces superior results to the mutual information criterion (MIC).
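As a concrete illustration, the delay reconstruction and the quarter-period rule for $\tau$ can be sketched as follows (a sketch assuming numpy; the crude autocorrelation zero-crossing search is for illustration only):

```python
import numpy as np

def delay_embed(y, de, tau):
    """Map y_t to v_t = (y_t, y_{t-tau}, ..., y_{t-(de-1)tau})."""
    n = len(y) - (de - 1) * tau
    return np.column_stack([y[k * tau : k * tau + n]
                            for k in range(de - 1, -1, -1)])

def first_zero_autocorr(y):
    """Embedding lag chosen as the first zero crossing of the
    autocorrelation function; for nearly periodic data this is close
    to one quarter of the quasi-period."""
    y = y - y.mean()
    for lag in range(1, len(y)):
        if np.dot(y[:-lag], y[lag:]) <= 0:
            return lag
    return 1

t = np.arange(2000)
y = np.sin(2 * np.pi * t / 100)      # quasi-period of 100 samples
tau = first_zero_autocorr(y)
assert 20 <= tau <= 30               # roughly one quarter of the period
v = delay_embed(y, de=3, tau=tau)
assert v.shape == (2000 - 2 * tau, 3)
```

Each row of `v` is one reconstructed state; these vectors are the input both to the correlation integral of the next section and to the models of section 5.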
3.2 Correlation dimension and the correlation integral To define the correlation dimension in a meaningful way we generalize the concept of integer dimension to fractal objects with non-integer dimension. In dimensions of one, two, or three it is easily established, and intuitively obvious, that a measure of volume $V$ (e.g. length, area or volume) varies as
$V \propto \varepsilon^d,$   (2)

where $\varepsilon$ is a length scale (e.g. the length of a cube's side or the radius of a sphere) and $d$ is the dimension of the object. For a general fractal it is natural to assume a relation like equation (2)
holds true, in which case its dimension is given by

$d = \frac{\log V}{\log \varepsilon}.$   (3)
Let $\{v_t\}_{t=1}^N$ be an embedding of a time series in $\mathbb{R}^{d_e}$. Define the correlation function $C_N(\varepsilon)$ by

$C_N(\varepsilon) = \binom{N}{2}^{-1} \sum_{i<j} I(\|v_i - v_j\| < \varepsilon).$

… the embedded time series will "fill" the embedding space. If the time series is of infinite length then the dimension $d_c$ of the embedded time series will then be equal to $d_e$. If the time series is finite then the dimension $d_c$ of the embedded time series will be less than $d_e$. This is particularly likely for a short time series and large embedding dimension. For a moderately
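A direct, O(N^2) numerical sketch of $C_N(\varepsilon)$ follows, with the dimension read off as the slope of $\log C_N(\varepsilon)$ against $\log \varepsilon$ over a range of length scales. This illustrates the definition only; the scale-dependent estimator $d_c(\varepsilon_0)$ used in this paper is more refined.

```python
import numpy as np

def correlation_integral(v, eps):
    """C_N(eps): fraction of distinct pairs of embedded points closer than eps."""
    d = np.linalg.norm(v[:, None, :] - v[None, :, :], axis=-1)
    return np.mean(d[np.triu_indices(len(v), k=1)] < eps)

def correlation_dimension(v, eps_lo, eps_hi):
    """Slope of log C_N(eps) versus log eps over [eps_lo, eps_hi]."""
    eps = np.geomspace(eps_lo, eps_hi, 10)
    logC = np.log([correlation_integral(v, e) for e in eps])
    slope, _ = np.polyfit(np.log(eps), logC, 1)
    return slope

rng = np.random.default_rng(2)
theta = rng.uniform(0, 2 * np.pi, 400)
v = np.column_stack([np.cos(theta), np.sin(theta)])  # a 1-D set in R^2
dc = correlation_dimension(v, 0.05, 0.5)
assert 0.8 < dc < 1.2                # recovers d = 1 for points on a circle
```

For points uniformly distributed on a circle the number of pairs within $\varepsilon$ grows linearly with $\varepsilon$, so the fitted slope is close to one.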
small embedding dimension this difference is typically not great and is dependent on the estimation algorithm and the length of the time series, and independent of the particular realization. Hence, if the correlation dimension $d_c$ of all surrogates consistent with the hypothesis under consideration exceeds $d_e$ then correlation dimension is a pivotal test statistic for that value of $d_e$. An examination of the "pivotalness" of the correlation integral (and therefore correlation dimension) can be found in a recent paper of Takens [Takens 1993]. Takens' approach is to observe that, if $\rho$ and $\rho'$ are two metrics on the embedded space $X$ and $k$ is some constant such that for all $x, y \in X$

$k^{-1}\rho(x,y) \le \rho'(x,y) \le k\,\rho(x,y),$   (6)

then the correlation integral $\lim_{N\to\infty} C_N(\varepsilon)$ with respect to either metric is similarly bounded and hence the correlation dimension with respect to each metric will be the same. This result is independent of the conditions of Takens' embedding theorem (i.e. that $n > 2d_c + 1$ for $X = \mathbb{R}^n$). Hence if we (for example) embed a stochastic signal in $\mathbb{R}^n$ the correlation dimension will have the same value with respect to the two different metrics $\rho$ and $\rho'$. To show that $d_c$ is pivotal for the various linear hypotheses addressed by algorithms 0, 1 and 2 it is only necessary to show that various transformations can be applied to a realization of such processes which have the effect of producing i.i.d. noise and are equivalent to a bounded change of norm as in (6). Our approach is to show that surrogates consistent with each of the three standard linear hypotheses are at most a $C^2$ function from Gaussian noise $N(0,1)$. A $C^2$ function on a bounded set (a bounded attractor or a finite time series) distorts distance only by a bounded factor (as in equation (6)) and so the correlation dimension is invariant. We therefore have the following result.
Proposition 1: The correlation dimension $d_c$ is a pivotal test statistic for a hypothesis $\mathcal{H}$ if $\forall F_1, F_2 \in \mathcal{F}$ and embeddings $\Pi_1 : \mathbb{R} \to X_1$, $\Pi_2 : \mathbb{R} \to X_2$, there exists a $C^2$ function $f : X_1 \to X_2$ such that $\forall t$, $f(\Pi_1(F_1(t))) = \Pi_2(F_2(t))$.

Proof: The proof of this proposition is in outline as follows. Let $F_1, F_2 \in \mathcal{F}$ be particular processes consistent with a given hypothesis and $F_1(t)$ and $F_2(t)$ realizations of those processes. We have that $\forall t$, $f(\Pi_1(F_1(t))) = \Pi_2(F_2(t))$, and so if $\Pi_1(x_1), \Pi_1(y_1) \in X_1$ and $\Pi_2(x_2), \Pi_2(y_2) \in X_2$ are points on the embeddings $\Pi_1$ and $\Pi_2$ of $F_1(t)$ and $F_2(t)$ respectively, then $f(\Pi_1(x_1)) = \Pi_2(x_2)$ and $f(\Pi_1(y_1)) = \Pi_2(y_2)$. Let $\rho_2$ be a distance function on $X_2$; then define

$\rho_1(\Pi_1(x_1), \Pi_1(y_1)) := \rho_2(f(\Pi_1(x_1)), f(\Pi_1(y_1))) = \rho_2(\Pi_2(x_2), \Pi_2(y_2)).$

Clearly (6) is satisfied and so $\lim_{N\to\infty} C_N(\varepsilon)$ on $X_1$ and $X_2$ are similarly bounded, and therefore the correlation dimensions of $X_1$ and $X_2$ are identical.
Hence, if any particular realization of a surrogate consistent with a given hypothesis is a $C^2$ function from i.i.d. noise (which in turn is a $C^2$ function from Gaussian noise) then correlation dimension is a pivotal statistic for that hypothesis.
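The practical content of this result is easy to check numerically: a smooth monotonic rescaling of the data (a bounded change of metric on a bounded set) leaves the correlation dimension estimate essentially unchanged, whereas the estimate does distinguish a deterministic signal from i.i.d. noise. The following sketch (assuming numpy, with a deliberately crude slope estimator and illustrative thresholds) makes the point:

```python
import numpy as np

def corr_dim(y, de=3, tau=5, eps=(0.1, 1.0)):
    """Crude correlation dimension: slope of log C(eps) vs log eps over a
    fixed range of scales, after normalizing the series."""
    y = (np.asarray(y, dtype=float) - np.mean(y)) / np.std(y)
    n = len(y) - (de - 1) * tau
    v = np.column_stack([y[k * tau : k * tau + n] for k in range(de)])
    d = np.linalg.norm(v[:, None, :] - v[None, :, :], axis=-1)
    d = d[np.triu_indices(n, k=1)]
    es = np.geomspace(eps[0], eps[1], 8)
    logC = np.log([max(np.mean(d < e), 1e-12) for e in es])
    return np.polyfit(np.log(es), logC, 1)[0]

rng = np.random.default_rng(4)
y = np.sin(0.17 * np.arange(800))         # a closed (1-D) orbit
d1 = corr_dim(y)                          # dimension near 1
d2 = corr_dim(y ** 3)                     # after a C^2 monotonic transformation
dn = corr_dim(rng.standard_normal(800))   # i.i.d. noise "fills" R^3
assert abs(d1 - d2) < 0.6                 # estimate is (approximately) invariant
assert dn - max(d1, d2) > 0.5             # noise looks much higher dimensional
```

The transformed series gives nearly the same slope as the original, while the shuffled-noise-like series approaches the embedding dimension, which is exactly the behavior the pivotalness argument requires.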
5 Examples Figure 2: about here. Figure 3: about here. Figure 4: about here. If a set of data is inconsistent with each of the three linear hypotheses addressed by algorithms 0, 1 and 2, one may wish to ask more specific questions: is the data consistent with (for example) a noise driven periodic orbit? In particular, a hypothesis similar to this is treated by Theiler and Rapp [Theiler 1995; Theiler and Rapp 1996]. We have applied this method elsewhere [Small and Judd 1998b]. In this section we focus on more general hypotheses. In Small et al. [1999] we test the hypothesis that infant respiration during quiet sleep is distinct from a noise driven (or chaotic) quasi-periodic or toroidal attractor (with at least two identifiable periods). Such an apparently abstract hypothesis can have real value: these results have been confirmed with observations of
cyclic amplitude modulation in the breathing of sleeping infants [Small et al. 1996; Small et al. 1999] during quiet sleep and in the resting respiration of adults at high altitude [Waggener et al. 1984]. To apply such complex hypotheses we build cylindrical basis models using a minimum description length selection criterion [Judd and Mees 1995; Small and Judd 1998a] and generate noise driven simulations (surrogate data sets) from these models. This modeling scheme has been successful in modeling a wide variety of nonlinear phenomena. However, it involves a stochastic search algorithm. This method of surrogate generation does not produce surrogates that can be used with a constrained realization scheme (the modeling algorithm described in [Judd and Mees 1995; Small and Judd 1998a] is partially stochastic), and so a pivotal statistic is needed. It is important to determine if the data is generated by a system consistent with a specific model or a general class of models. To do this we need to determine exactly how representative a particular model is for a given test statistic: how big is the set $\mathcal{F}$ for which $T$ is pivotal? By comparing a data set and surrogates generated by a specific model are we just testing the hypothesis that a system consistent with this specific model generated the data, or can we infer a broader class of models? In either case (unlike constrained realization linear surrogates), it is likely that the hypothesis being tested will be determined by the results of the modeling procedure and therefore depend on the particular data set one has. The hypothesis one can test will be as broad as the class of all systems with metric bounded by equation (6) (in the case of correlation integral based test statistics). In particular, Proposition 1 holds. We wish for $T$ to be a pivotal test statistic for the hypothesis $\mathcal{H}$. But $\mathcal{F}$ is a broad class of nonlinear dynamical systems.
For example, if $\mathcal{F}$ is the set of all noise driven processes then $d_c(\varepsilon_0)$ will not be pivotal. However, if we are able to restrict ourselves to $\tilde{\mathcal{F}} \subset \mathcal{F}$ where $T$ is pivotal on $\tilde{\mathcal{F}}$ then the problem is resolved. To do this we simply rephrase the hypothesis to be that the data is generated by a noise driven nonlinear function (modeled by a cylindrical basis model) of dimension $d$. For example this allows us to test if the data is generated by a periodic orbit with 2 degrees of freedom driven by Gaussian noise. Furthermore the scale dependent properties of our
estimate of $d_c(\varepsilon_0)$ allow some sensitivity to the size (relative to the size of the data) of structure of a particular dimension. This is a much more useful hypothesis than that the system is noisy and nonlinear; if this were our hypothesis, then what would be the alternative?
5.1 Calculations We wish to test the hypothesis that a data set is consistent with a particular class of nonlinear dynamical system. First we must check that a particular model is representative of the general class of nonlinear models by calculating probability estimates of the test statistics for the particular model we wish to test and for the general class of models. Then, we need to compare the value of the test statistic for the data to that probability distribution. Figures 3 and 4 give examples of this method for the experimental data in figure 2. We clearly reject the linear hypothesis associated with the calculations of figure 3, but are unable to reject the nonlinear hypothesis of figure 4. In Small et al. [1999] we have performed similar calculations for 27 observations from 10 infants, and in Small and Judd [1998b] for 14 observations from 14 infants. These calculations concluded that the simulations produced by cylindrical basis models have distributions of correlation dimension estimates which match that of the data. The data is clearly distinct from the linear surrogates but consistent with the nonlinear surrogates. However, the probability density function of the correlation dimension estimate is the same for the constrained (algorithm 2) surrogates and simple parametric surrogates. The parametric surrogates are generated by rescaling the data to be Gaussian, building a reduced autoregressive model (as described in [Judd and Mees 1995; Small and Judd 1999]) from the data, generating a noise driven simulation of that model and rescaling it to have the same distribution as the data. This essentially requires a parameterized estimate of the monotonic nonlinear transformation (parameterized by the data) and a parametric linear model.
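The parametric surrogate generation just described can be sketched as follows. This is a simplified stand-in assuming numpy: an ordinary least-squares AR fit replaces the reduced autoregressive model of Judd and Mees, and a rank-ordering trick implements both rescaling steps.

```python
import numpy as np

def rank_remap(x, target):
    """Reorder the sorted values of `target` so they follow the ranks of x."""
    return np.sort(target)[np.argsort(np.argsort(x))]

def parametric_surrogate(z, order, rng):
    """Rescale z to be Gaussian, fit an AR(order) model by least squares,
    run a noise-driven free simulation of the model, then rescale the
    simulation to have the same distribution as z."""
    n = len(z)
    g = rank_remap(z, rng.standard_normal(n))       # Gaussianized data
    # design matrix for g_t = a_1 g_{t-1} + ... + a_p g_{t-p} + e_t
    X = np.column_stack([g[order - k - 1 : n - k - 1] for k in range(order)])
    a, *_ = np.linalg.lstsq(X, g[order:], rcond=None)
    sigma = np.std(g[order:] - X @ a)               # residual noise level
    sim = list(g[:order])
    for _ in range(n - order):
        past = sim[-1 : -order - 1 : -1]            # (g_{t-1}, ..., g_{t-p})
        sim.append(float(np.dot(a, past)) + sigma * rng.standard_normal())
    return rank_remap(np.array(sim), z)             # back to the data's histogram

rng = np.random.default_rng(3)
e = rng.standard_normal(600)
z = np.tanh(np.convolve(e, [1.0, 0.8, 0.4], mode="same"))  # transformed filtered noise
s = parametric_surrogate(z, order=3, rng=rng)
assert np.allclose(np.sort(s), np.sort(z))   # identical amplitude distribution
```

By construction the surrogate has exactly the data's amplitude distribution while its temporal structure comes from the fitted linear model, which is the content of the monotonic-transformation hypothesis.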
5.2 Inference We have established that radial basis models are consistent with the data. The next important issue is to determine which properties are exhibited by these models, and (preferably) only by models which exhibit this distribution of statistic values. If a property is exhibited by models with the observed distribution of statistic values then we can infer that this property is consistent with the data (just as the model is). If a property is exhibited only by models with the observed distribution of statistic values then we can conclude that this property is necessary for a system to generate data consistent with this hypothesis. The most obvious feature is correlation dimension. The correlation dimension of a system should be evident in an estimate of the correlation dimension of data from that system. However, because of the scale dependent properties of the estimate of correlation dimension which we employ, we are able to make more specific observations. For example, Figure 5 shows the distributions of correlation dimension estimates for two monotonic nonlinear transformations of linearly filtered noise. The shapes of these distributions are quite different and are a characteristic property of the estimates of correlation dimension. The transformation $g(x) = x^3$ has the effect of compressing the blob of points produced by realizations of the linear stochastic process: this decreases the scale $\varepsilon_0$ and increases the correlation dimension at this scale to almost the embedding dimension. The second transformation $g(x) = \mathrm{sign}(x)|x|^{1/4}$ stretches the data out, creating a "shell" of points and a relatively empty interior. Hence the two distinct values of correlation dimension: for large length scales the structure is low dimensional (it is only the "shell"), while for smaller observation scales one observes the $d_e$-dimensional behavior within the surface of that shell. Figure 5: about here. A property characteristic of these models is the periodic orbit they exhibit.
The number of degrees of freedom of that periodic orbit is characterized by the correlation dimension $\lim_{\varepsilon_0 \to 0} d_c(\varepsilon_0)$. For large values of $\varepsilon_0$, $d_c(\varepsilon_0)$ tells us more about the general structure and distribution of the attractor: for example, the stability of this periodic orbit. Although other models with different
probability distributions of correlation dimension may exhibit periodic orbits with similar stability properties, these properties are exhibited by all the models we have built from this data. Hence this is a property of these models, but not only of these models. To infer the stability of the respiratory motion we apply Floquet theory to analyze the stability of the periodic orbit of the models.
5.3 Floquet theory From a data set we can build a map F , an approximation to the dynamics of respiration. This is the (cylindrical basis) model. Let z be a point on a periodic orbit of period p, that is
$z = F^p(z) = \underbrace{F \circ F \circ \cdots \circ F}_{p \text{ times}}(z).$
Hence $z$ is a fixed point of the map $F^p$ and we can calculate the eigenvectors and eigenvalues of that fixed point. These eigenvectors and eigenvalues correspond exactly to the linearized dynamics of the periodic orbit: one eigenvector will be in the direction of the flow at $z$ and will have associated eigenvalue 1, the others will be determined by the dynamics [Guckenheimer and Holmes 1983]. To calculate these eigenvectors and eigenvalues we must first linearize $F^p$ at $z$. We have that
$D_z F^p(z) = D_{F^{p-1}(z)}F \cdot D_z F^{p-1}(z)$
$\phantom{D_z F^p(z)} = D_{F^{p-1}(z)}F \cdot D_{F^{p-2}(z)}F \cdots D_z F$
$\phantom{D_z F^p(z)} = \prod_{k=0}^{p-1} D_{F^k(z)}F.$   (7)

One may then calculate the eigenvalues of the matrix $\prod_{k=0}^{p-1} D_{F^k(z)}F$ to determine the stability of the periodic orbit through $z$. Unfortunately the application of this method has several problems. To calculate (7) one must first be able to identify a point $z$ on a periodic orbit. In practice a model built by the methods we employ will typically have been embedded in an approximately 20 dimensional space. In this situation, we limit ourselves to the study of stable periodic orbits. Fortunately this is a common feature of these models. However, the periodic orbit may not
be exactly periodic. The map $F$ is an approximation to the dynamics of a flow and it is unlikely that the "periodic orbit" of interest will be periodic with exactly period $p$ ($p$ will be of the order of the embedding dimension). In most cases it is only possible to find a point $z$ of an approximately periodic orbit. By this we mean that $z$ and $F^p(z)$ are close. If the map $F$ is not chaotic then one can choose a point $z$ such that $\{F^p(z)\}_{p=1}^{\infty}$ is bounded, and $p$ will be chosen to be the first local minimum of $\|F^p(z) - z\|$ for $p > 1$.
Figure 6: about here. Having found a point $z$ such that $\{z, F(z), F^2(z), \ldots, F^{p-1}(z)\}$ form points of an "almost periodic" orbit, the expression (7) may be evaluated. However, for our data $p$ is approximately 20 and the periodic orbit $\{z, F(z), F^2(z), \ldots, F^{p-1}(z)\}$ is (presumably) stable; hence the calculation of the eigenvalues of (7) will be numerically highly sensitive. The eigenvalues will be close to zero and the matrix $\prod_{k=0}^{p-1} D_{F^k(z)}F$ will be nearly singular. By embedding the data in a lower dimension (perhaps not using a variable embedding strategy) this calculation becomes more stable. However, as the calculation of $\prod_{k=0}^{p-1} D_{F^k(z)}F$ becomes more stable the periodic orbit itself will be more "approximate", and the model will possibly provide a worse fit of the data. Figure 6 demonstrates some of the common features of models with a low embedding dimension. This is clearly problematic. The probability distribution of such models is typically different to the data. Models that predict a short time (less than $\frac{1}{4}$ of the quasi-period) ahead by only using the immediately preceding values provide a poor fit of the data. However, if we embed using a uniform embedding strategy such as $(y_t, y_{t-\tau}, y_{t-2\tau})$, where $\tau \approx \frac{1}{4}$ (quasi-period), we can build a model $y_{t+1} = f(y_t, y_{t-\tau}, y_{t-2\tau})$. However, it is impossible to iterate a model of this form to produce a free run prediction (the delay coordinates at time $t+1$ are not determined by those at time $t$). Models of the form $y_{t+\tau} = f(y_t, y_{t-\tau}, y_{t-2\tau})$ are not likely to produce periodic orbits as it is unlikely that the relationship $4\tau =$ (quasi-period of data) will hold exactly. For a given embedding lag $\tau$ and embedding dimension $d$ determined by the methods discussed earlier, we have applied this technique to models of the form $y_{t+1} = f(y_t, y_{t-\tau}, \ldots, y_{t-d\tau})$ (effectively producing periodic orbits with period $\approx d\tau$). Note that, using time delay embeddings
one will have that $F(z_t) = F(y_t, y_{t-1}, \ldots, y_{t-d\tau}) = (y_{t+1}, y_t, \ldots, y_{t-d\tau+1})$, where $y_{t+1} = f(y_t, y_{t-\tau}, \ldots, y_{t-d\tau})$. From these models we calculate the eigenvalues and eigenvectors of the periodic orbits. We compare the 6 largest eigenvalues of an "almost" periodic orbit of the map $F$ generated by models of 38 data sets from 14 infants. These maps are an approximation to a (presumably) periodic orbit of the flow of the original data. In almost all cases the 6 largest eigenvalues include complex conjugate pairs: evidence of a stable focus in the first return map. Most of these models produce complex eigenvalues with the magnitude of the real part less than one. This indicates that the map $F^p$ has a stable focus, or that trajectories will spiral towards the periodic orbit. This provides additional evidence for the presence of cyclic amplitude modulation (CAM).
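The whole procedure (locate an almost periodic point, accumulate the product of Jacobians in (7), examine the eigenvalues) can be illustrated on a toy two-dimensional map. The contracting rotation below is a hypothetical stand-in for the fitted model $F$, and central differences stand in for the model's analytic Jacobian:

```python
import numpy as np

def find_almost_period(F, z, pmax=60):
    """p chosen as the first local minimum of ||F^p(z) - z|| for p > 1."""
    x, dists = z.copy(), []
    for _ in range(pmax):
        x = F(x)
        dists.append(np.linalg.norm(x - z))
    for p in range(2, pmax):
        if dists[p - 1] < dists[p - 2] and dists[p - 1] < dists[p]:
            return p
    return int(np.argmin(dists)) + 1

def floquet_multipliers(F, z, p, h=1e-6):
    """Eigenvalues of D_z F^p(z) = prod_k DF(F^k(z)), each Jacobian
    estimated by central differences along the orbit."""
    d = len(z)
    M, x = np.eye(d), z.copy()
    for _ in range(p):
        J = np.empty((d, d))
        for j in range(d):
            dx = np.zeros(d)
            dx[j] = h
            J[:, j] = (F(x + dx) - F(x - dx)) / (2 * h)
        M = J @ M          # accumulate the product of Jacobians along the orbit
        x = F(x)
    return np.linalg.eigvals(M)

theta = 2 * np.pi / 5
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
F = lambda x: 0.9 * (R @ x)          # contraction plus rotation: a stable focus
z = np.array([1.0, 0.0])
p = find_almost_period(F, z)         # the orbit nearly closes after 5 steps
mu = floquet_multipliers(F, z, p)
assert p == 5
assert np.all(np.abs(mu) < 1)        # trajectories spiral in towards the orbit
```

For this linear toy map the multipliers can be checked by hand: the product is $(0.9 R)^5 = 0.9^5 I$, so both eigenvalues have magnitude $0.9^5 \approx 0.59$, well inside the unit circle.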
6 Conclusion We have shown that statistics based on the correlation integral will often be pivotal statistics for surrogate data analysis. Utilizing this property we may apply (for example) correlation dimension estimates as a test statistic for the hypotheses addressed by Theiler's algorithm 0, 1 and 2 surrogates. Because our test statistic is pivotal there is no requirement to ensure that the surrogate generation method we employ is constrained. Therefore, we are not bound to use these three algorithms to generate surrogate data. More importantly, this greatly expands the scope of surrogate data testing. Because we have precise conditions on the "pivotalness" of correlation dimension, it is possible to extend surrogate data methods to nonlinear hypothesis testing. With the help of minimum description length cylindrical basis modeling techniques [Judd and Mees 1995; Small and Judd 1998a], correlation dimension provides a useful statistic to test membership of particular classes of nonlinear dynamical processes. The hypothesis being tested is influenced by the results of the modeling procedure and cannot be determined a priori. After checking that all models have the same distribution of test statistic values and are representative of the data (in the sense that the models produce simulations that have qualitative features of the data), one is able
to build a single nonlinear model of the data and test the hypothesis that the data was generated by a process in the class of dynamical processes that share the characteristics (such as periodic structure) of that model. In general one may take a data set, build nonlinear models of that data set, generate many noise driven simulations from each of these models, and compare the distributions of a test statistic for each model and for broader groups of models (based on qualitative features, such as fixed points or periodic orbits, of these models). By comparing the value of the test statistic for the data to each of these distributions (for groups of models) one may either accept or reject the hypothesis that the data was generated by a process with the same qualitative features as the models used to generate a given probability density function. We have demonstrated these methods with an application to infant respiratory patterns. We built models of these data to estimate the probability distribution of correlation dimension estimates for groups of models. By comparing this distribution to the value of correlation dimension for the data we concluded that these data are consistent with a noise driven simulation of these models. Having established that these models generate dynamics consistent with the respiration of sleeping infants, we calculated the eigenvalues of a fixed point of the first return map. This fixed point corresponds to a periodic orbit of the model, and the eigenvalues of this periodic orbit demonstrate that it behaves as a stable focus. As trajectories converge to this periodic orbit they "spiral" around it.
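The model-based surrogate procedure just described can be sketched schematically. The one-step model and the summary statistic below are placeholders (the paper fits radial basis models and uses correlation dimension); they are chosen only to show the Monte Carlo comparison of the data statistic against the distribution over noise driven simulations.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(step, n, noise, burn=100):
    """Noise-driven free-run simulation of a fitted one-step model."""
    y = [0.0, 0.0]
    for _ in range(n + burn):
        y.append(step(y[-1], y[-2]) + noise * rng.standard_normal())
    return np.asarray(y[-n:])

def statistic(x):
    # Placeholder summary statistic; the paper uses correlation dimension.
    return np.std(np.diff(x)) / np.std(x)

# Hypothetical fitted model (a stable linear map for illustration).
step = lambda a, b: 1.2 * a - 0.5 * b

data = simulate(step, 1600, 0.1)  # stand-in for the observed series
stats = sorted(statistic(simulate(step, 1600, 0.1)) for _ in range(50))
rank = sum(s < statistic(data) for s in stats)
# Reject the hypothesis only if the data statistic falls in the
# extreme tails of the Monte Carlo distribution of model statistics.
consistent = 0 < rank < len(stats)
```

Because the test statistic is pivotal in the sense established above, its distribution is (approximately) the same for every model in the class, so an ensemble generated from one representative model suffices.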
Acknowledgments This research was supported by grants from the Australian Research Council.
References

Abarbanel, H. D. I., R. Brown, J. J. Sidorowich, and L. S. Tsimring (1993). The analysis of observed chaotic data in physical systems. Reviews of Modern Physics 65, 1331–1392.
Casdagli, M. C., L. D. Iasemidis, J. C. Sackellares, S. N. Roper, R. L. Gilmore, and R. S. Savit (1996). Characterizing nonlinearity in invasive EEG recordings from temporal lobe epilepsy. Physica D 99, 381–399.
Ding, M., C. Grebogi, E. Ott, T. Sauer, and J. A. Yorke (1993). Plateau onset for correlation dimension: when does it occur? Physical Review Letters 70, 3872–3875.

Farmer, J. D., E. Ott, and J. A. Yorke (1983). The dimension of chaotic attractors. Physica D 7, 153–180.

Galka, A., T. Maaß, and G. Pfister (1998). Estimating the dimension of high-dimensional attractors: A comparison between two algorithms. Physica D 121, 237–251.

Grassberger, P. and I. Procaccia (1983). Measuring the strangeness of strange attractors. Physica D 9, 189–208.
Guckenheimer, J. and P. Holmes (1983). Nonlinear oscillations, dynamical systems, and bifurcations of vector fields, Volume 42 of Applied mathematical sciences. New York: Springer-Verlag.
Ikeguchi, T. and K. Aihara (1997). Estimating correlation dimensions of biological time series with a reliable method. Journal of Intelligent and Fuzzy Systems 5, 33–52.

Judd, K. (1992). An improved estimator of dimension and some comments on providing confidence intervals. Physica D 56, 216–228.

Judd, K. (1994). Estimating dimension from small samples. Physica D 71, 421–429.

Judd, K. and A. Mees (1995). On selecting models for nonlinear time series. Physica D 82, 426–444.

Kennel, M. B., R. Brown, and H. D. I. Abarbanel (1992). Determining embedding dimension for phase-space reconstruction using a geometric construction. Physical Review A 45, 3403–3411.

Noakes, L. (1991). The Takens embedding theorem. International Journal of Bifurcation and Chaos 1, 867–872.
Priestley, M. B. (1989). Non-linear and non-stationary time series analysis. London: Academic Press.

Rissanen, J. (1989). Stochastic complexity in statistical inquiry. Singapore: World Scientific.

Schmid, G. B. and R. M. Dünki (1996). Indications of nonlinearity, intraindividual specificity and stability of human EEG: the unfolding dimension. Physica D 93, 165–190.

Schreiber, T. and A. Schmitz (1996). Improved surrogate data for nonlinearity tests. Physical Review Letters 77, 635–638.
Small, M. and K. Judd (1997). Using surrogate data to test for nonlinearity in experimental data. In International Symposium on Nonlinear Theory and its Applications, Volume 2, pp. 1133–1136. Research Society of Nonlinear Theory and its Applications, IEICE.

Small, M. and K. Judd (1998a). Comparison of new nonlinear modelling techniques with applications to infant respiration. Physica D 117, 283–298.

Small, M. and K. Judd (1998b). Detecting nonlinearity in experimental data. International Journal of Bifurcation and Chaos 8, 1231–1244.
Small, M. and K. Judd (1998c). Pivotal statistics for non-constrained realizations of composite null hypotheses in surrogate data analysis. Physica D 120, 386–400.

Small, M. and K. Judd (1999). Detecting periodicity in experimental data using linear modeling techniques. Physical Review E 59 (2). In press.

Small, M., K. Judd, M. Lowe, and S. Stick (1999). Is breathing in infants chaotic? Dimension estimates for respiratory patterns during quiet sleep. Journal of Applied Physiology 86, 359–376.

Small, M., K. Judd, and S. Stick (1996). Linear modelling techniques detect periodic respiratory behaviour in infants during regular breathing in quiet sleep. American Journal of Respiratory and Critical Care Medicine 153, A79. (abstract).
Takens, F. (1981). Detecting strange attractors in turbulence. Lecture Notes in Mathematics 898, 366–381.

Takens, F. (1993). Detecting nonlinearities in stationary time series. International Journal of Bifurcation and Chaos 3, 241–256.
Theiler, J. (1990). Estimating fractal dimension. Journal of the Optical Society of America A 7, 1055–1073.

Theiler, J. (1995). On the evidence for low-dimensional chaos in an epileptic electroencephalogram. Physics Letters A 196, 335–341.

Theiler, J., S. Eubank, A. Longtin, B. Galdrikian, and J. D. Farmer (1992). Testing for nonlinearity in time series: the method of surrogate data. Physica D 58, 77–94.

Theiler, J. and D. Prichard (1996). Constrained-realization Monte-Carlo method for hypothesis testing. Physica D 94, 221–235.

Theiler, J. and P. Rapp (1996). Re-examination of the evidence for low-dimensional, nonlinear structure in the human electroencephalogram. Electroencephalography and Clinical Neurophysiology 98, 213–222.
Vibe, K. and J.-M. Vesin (1996). On chaos detection methods. International Journal of Bifurcation and Chaos 6, 529–543.
Waggener, T. B., P. J. Brusil, R. E. Kronauer, R. A. Gabel, and G. F. Inbar (1984). Strength and cycle time of high-altitude ventilatory patterns in unacclimatized humans. Journal of Applied Physiology 56, 576–581.
Captions

Figure 1: Correlation dimension from the distribution of interpoint distances. The logarithm of the distribution of interpoint distances, and an approximation to its derivative, for one of our sets of data embedded in three dimensions. The approximate derivative is a smoothed numerical difference. This calculation used a recording of infant abdominal movement during natural sleep. The data was embedded in 3 dimensions with a lag of 19 data points (380 ms). Even with well behaved data and a smooth, approximately monotonic distribution of interpoint distances, the choice of scaling region is still subjective.

Figure 2: Experimental data. The abdominal movement measured with inductance plethysmography for a 2 month old male child in quiet (stage 3–4) sleep. The 1600 data points were sampled at 12.5 Hz and digitized using a 12 bit analogue to digital converter during a sleep study at Princess Margaret Hospital for Children, Subiaco, Western Australia.

Figure 3: Probability distribution for correlation dimension estimates for linear surrogates of experimental data. Shown are contour plots which represent the probability distribution of correlation dimension estimates for various values of ε0. The data used in this calculation is illustrated in figure 2. The figures are probability density estimates for surrogates generated from: (i) constrained realization algorithm 2 surrogates; (ii) a monotonic nonlinear transformation of a parametric linear model. In each calculation 50 realizations of 1600 points were computed, and their correlation dimension calculated for de = 3. The value of dc(ε0) for the data is shown on each plot as a dashed line. The two distributions are practically identical, despite the fact that only one of them comprises constrained realizations. The correlation dimension of the data is clearly distinct from this probability distribution.
Figure 4: Probability distribution for correlation dimension estimates for nonlinear surrogates of experimental data. Shown are contour plots which represent the probability distribution of correlation dimension estimates for various values of ε0. The data used in this calculation is illustrated in figure 2. The figures are probability density estimates for surrogates generated from: (i) realizations of distinct models; (ii) realizations of the one model used in (i) with the maximum value of correlation dimension (dc(ε0) for −log ε0 = −1.8). In each calculation 50 realizations of 1600 points were computed, and their correlation dimension calculated for de = 3. The value of dc(ε0) for the data is shown on each plot as a dashed line. The two distributions are practically identical, despite the fact that the model used in panel (ii) had the highest correlation dimension estimate of the distribution of models in (i). The correlation dimension of the data is clearly similar to this probability distribution.

Figure 5: Probability distribution for correlation dimension estimates for two monotonic nonlinear transformations of linearly filtered noise. Shown are contour plots which represent the probability distribution of correlation dimension estimates for various values of ε0. Panels (i), (ii) and (iii) are probability density estimates of correlation dimension for linearly filtered noise with a monotonic nonlinear transformation given by g(x) = x^3, embedded in R^3, R^4 and R^5, respectively. Panels (iv), (v) and (vi) show similar plots of probability density estimates for realizations of the same linear process and the same embedding dimensions. The monotonic nonlinear transformation in this case is g(x) = sign(x)|x|^(1/4). Note that the shapes of these estimates are completely different; the horizontal scale is different on each plot.

Figure 6: Free run prediction from a model with uniform embedding. The top plot shows a free run prediction of a model y_{t+tau} = f(y_t, y_{t-tau}, y_{t-2tau}), where tau is the closest integer to 1/4 of the quasi-period of the data. The bottom two panels show an embedding (x1, x2, x3) = (y_t, y_{t-tau}, y_{t-2tau}) of that free run prediction. The plot on the left shows that the free run prediction is not periodic; the one on the right demonstrates that it does have a bounded 1 dimensional attractor. The problem with this model is that the quasi-period of the model and 4 tau do not agree precisely.
Figures

[Figure 1: distribution of interpoint distances; log(occupancy) and its smoothed derivative versus distance.]

[Figure 2: abdominal movement time series, 1600 points.]

[Figure 3: contour plots, panels (i) and (ii); correlation dimension versus −log(epsilon).]

[Figure 4: contour plots, panels (i) and (ii); correlation dimension versus −log(epsilon).]

[Figure 5: contour plots, panels (i)–(vi); dc versus −log(epsilon).]

[Figure 6: free run prediction time series, and two embeddings with axes (x1, x2, x3).]