applying the method of small–shuffle surrogate data - Semantic Scholar

1 downloads 0 Views 3MB Size Report
embedding parameters unlike by Cellucci et al. [2003]. Although we will present ...... Paul E. Rapp (Drexel University) for providing us the EEG data. The work ...
January 13, 2007

16:19

01699

International Journal of Bifurcation and Chaos, Vol. 16, No. 12 (2006) 3581–3603 c World Scientific Publishing Company 

APPLYING THE METHOD OF SMALL–SHUFFLE SURROGATE DATA: TESTING FOR DYNAMICS IN FLUCTUATING DATA WITH TRENDS TOMOMICHI NAKAMURA∗ and MICHAEL SMALL† Department of Electronic and Information Engineering, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong ∗[email protected][email protected] Received June 27, 2005; Revised January 6, 2006 Recently, a new surrogate method, the Small–Shuffle (SS) surrogate method, has been proposed to investigate whether there is some kind of dynamics in irregular fluctuations, even if they are modulated by long term trends or periodicities. This situation is theoretically incompatible with the assumption underlying previously proposed surrogate methods. We apply the SS surrogate method to a variety of simulated data with known dynamics and actual time series with unknown dynamics. Keywords: Surrogate method; Small–Shuffle surrogates; trends and periodicities.

1. Introduction There are many occasions to encounter irregular time series that look random. Often we want to know whether there is some kind of dynamics behind the data or not. For investigating such features of some data, a method of surrogate data exists [Theiler et al., 1992; Schreiber & Schmitz, 1996], and the method has become a central tool for validating the results of dynamical analysis and is widely applied as a form of hypothesis testing [Theiler & Rapp, 1996]. The result is a useful indication to understand data. For example, if it was found that irregular fluctuations are not independent identically distributed (IID) random variables, this implies that they are due to some kind of dynamical structure. We then expect that it might

be possible to build models (or model systems) from the time series. Clearly, such models are of immense value for both understanding and predicting the time series. Hence, the result obtained by the method of surrogate data becomes the first step to understand the given data and the underlying system. Recently Nakamura and Small proposed an improved algorithm, the Small–Shuffle (SS) surrogate method, to investigate whether there is some kind of dynamics in irregular fluctuations, even if they are modulated by long term trends1 [Nakamura & Small, 2005]. In this paper, we apply the SS algorithm to several actual data. The results and analysis of these results are the major novel contributions of this paper.

1

We use the term “trend” to describe long-term slow variation in the magnitude of time series data. Although the concept of “trend” is often used interchangeably with nonstationary, we are not concerned with the problem of whether data are stationary or nonstationary, largely because the definition of nonstationary strongly depends on implicit assumption about the extent of deterministic dynamics. 3581

January 13, 2007

3582

16:19

01699

T. Nakamura & M. Small

In Sec. 1.1, we introduce some data we can see in the real world. In Sec. 1.2, we briefly review surrogate methods that have already been proposed. In Sec. 2 the SS algorithm and the hypothesis will be discussed. In Sec. 3 we present numerical examples of the application of the SS algorithm and in Sec. 4, based on the result of these studies, we apply the SS algorithm to actual data.

1.1. Diverse behaviors of data in the real world We often see irregular fluctuations like that shown in Fig. 1(a). To investigate some features of such irregular fluctuations, three surrogate algorithms have been proposed by Theiler et al. [1992]. In an attempt to address some of the problems of these techniques, an improved surrogate method has also been proposed by Shreiber and Schmitz [1999]. All these techniques are linear surrogate methods, because they are based on a linear process and address a linear null hypothesis.2 We will give more details on linear surrogate methods in the next section. Also, we often observe data with obvious periodicity3 [see Fig. 1(b)]. Time series exhibiting strong periodicities are clearly not consistent with the hypothesis of linear noise.4 That is, linear surrogate methods are not useful. To tackle this case, some algorithms and null hypotheses have been proposed [Theiler, 1995; Small et al., 2001; Luo et al., 2005]. Furthermore, there are more complicated time series in the real world. For such time series irregular fluctuations may be modulated by long term trends or periodicities [see Figs. 1(c) and 1(d)]. The assumptions or null hypothesises of the above mentioned algorithms are theoretically incompatible with such cases [Theiler et al., 1992; Theiler,

1995; Schreiber & Schmitz, 1999; Small et al., 2001; Luo et al., 2005]. Until lately, there was no method to tackle such situations. Recently Nakamura and Small proposed an improved algorithm, the SS surrogate method, to overcome such difficult situations [Nakamura & Small, 2005]. We will describe the details of the method in Sec. 2.

1.2. Linear surrogate methods: The current technology We briefly review the methodology of surrogate data and previously proposed linear surrogate methods. The essential feature of the methodology of surrogate data is that one must have some algorithms that generate surrogates that preserve certain properties of the data and destroy others, and is also consistent with a specified null hypothesis. It is also important that the surrogate data are sufficiently similar to the original. One then applies some discriminating statistics to both the original and surrogates data. Generically stated, the procedure can be reduced to four steps [Rapp et al., 2001]. 1. A discriminating statistic is applied to the original data. 2. Artificial surrogate data that are both “like” the original data and consistent with some null hypothesis are constructed using the original data. 3. The discriminating statistic which was applied to the original data is applied to the surrogates. 4. The discriminating statistic value for the original data is compared with the ensemble of values estimated for the surrogates. If there is sufficient difference between them, the surrogate null hypothesis is rejected. In this case,

2 The term “linear surrogate” has been widely used by, for example Schreiber and Schmitz [1997, 2000]. It also has been widely used that the null hypothesis of the methods proposed by Theiler et al. and Schreiber et al. is a linear null hypothesis and a linear stochastic process, for example [Bhattacharya, 2001] and [Dolan & Spano, 2001]. We follow this convention. 3 By-and-large our use of various terminology in this paper is intended to convey the commonly accepted meanings of the terms. Any incompatibilities with specific nomenclature is unintended. “Periodicity” means approximately (pseudo-) periodic fluctuations in time series data, “perturbation” means noise or other external influence, “modulated” means regular change in system parameters, “determinism” means that things are decided by fixed rules, and “time-scale (small and large)” means the period of time that it takes for something to happen or be completed. 4 It might be possible to describe periodic behavior by a special case of linear autoregressive moving average (ARMA) process. However, time series exhibiting such regular persistent fluctuations are inconsistent with linear noise. We use the term strong periodicities in this general sense. As far as annual sunspot numbers used as one of the periodic data example are concerned, they are not linear, because annual sunspot numbers increase more rapidly than they decrease. This is in itself an indication of the nonlinearity of this time series. This feature cannot be extracted by linear models. Hence, we consider that linear model is not the best model to describe the annual sunspot numbers [Judd & Mees, 1995; Nakamura et al., 2003].

January 13, 2007

16:19

01699

Applying the Method of Small–Shuffle Surrogate Data NMR laser data

200

2000

3583

Annual sunspot numbers from 1700 to 2004

150

0

100

−2000

50

−4000

0 0

50

100

150

200

0

Iteration

100

200

300

Years

(a)

(b)

Daily highest temperature (˚C) in Tokyo

EEG data (eyes closed) 40

40

20

30

0 20

−20 −40

10

−60

0 0

2000

4000

6000

8000

10000

Iteration (c)

0

500

1000

1500

2000

2500

Days (d)

Fig. 1. Four time series: (a) nuclear magnetic resonance (NMR) laser data, (b) annual sunspot numbers, (c) healthy human electroencephalography (EEG) data, and (d) daily highest temperature in Tokyo.

we consider that the original and the surrogate data would not come from the same population. If there is no significant difference, one may not reject the null hypothesis. In this case, we consider that the original and the surrogate data may come from the same population. For investigating some features of irregular fluctuations, three algorithms have been proposed by Theiler et al. [1992]. The null hypotheses addressed are: (0) IID random variables, tested by the random shuffle (RS) algorithm; (1) linearly filtered IID noise, tested by the Fourier transform (FT) algorithm; and (2) a static monotonic nonlinear transformation of linearly filtered noise, tested by the amplitude adjusted Fourier transform (AAFT) algorithm. These three algorithms are also known as algorithms 0, 1, and 2. Algorithm 0, the RS surrogate method, generates surrogates by shuffling the data in the original time series. In this way the surrogates have the same probability distribution as the data, but there is no temporal correlation and therefore the surrogates provide a test of the hypothesis of

IID random variables. Algorithm 1, the FT surrogate method, generates surrogates by shuffling the phases of the Fourier transform of the original data. Hence, although linear correlation (the power spectrum) is preserved, any additional (nonlinear) structure is destroyed. However, in particular, the probability distribution of the original data is not preserved in the surrogate data. To avoid any statistical bias, it is preferable for surrogate data to have the same probability distribution as the original data. Algorithm 2, the AAFT surrogate method, generates surrogates to preserve both the probability distribution and the power spectrum of the data. Algorithm 2 is the more general method than algorithm 1. However, although the surrogate data generated by algorithm 2 have the same probability distribution as the original data, the data do not have exactly the same power spectrum as the original data unlike the FT surrogate data. Hence, an improved surrogate method, iterative AAFT (IAAFT) algorithm, has been proposed by Schreiber and Schmitz [1999]. They proposed a method which iteratively corrects deviations in

January 13, 2007

3584

16:19

01699

T. Nakamura & M. Small

spectrum and distribution from the goal set by the original data. In an alternating method, the surrogate is filtered towards the more correct Fourier amplitudes and rank-ordered to the correct distribution. See more details concerning linear surrogate methods and the relevant problems elsewhere [Theiler & Rapp, 1996; Schreiber & Schmitz, 2000; Galka, 2000; Kugiumtzis, 2001; Small, 2005]. It should be noted that these methods are useful when data is stationary and without any long term trend or periodicity.

2. The Small–Shuffle Surrogate Method In this section we will introduce the new surrogate generation algorithm, the Small–Shuffle (SS) surrogate method. The method can investigate whether there are dynamics in irregular fluctuations (short term variability), even if the fluctuations are modulated by trends or periodicities. In Sec. 2.1 we describe the SS algorithm and the hypothesis, and in Sec. 2.2 we present our choice of discriminating statistic.

2.1. The algorithm and the null hypothesis As stated above, several surrogate methods have already been proposed. To investigate whether the data can be fully described by IID random variables or not, the RS surrogate method is useful. The method is effective for time series like shown in Fig. 1(a). However, the algorithm is ineffective for data exhibiting slow trends or periodicities like shown in Figs. 1(c) and 1(d). These cases are theoretically incompatible with the assumption of the RS surrogate method as well as other linear surrogate tests [Theiler et al., 1992; Schreiber & Schmitz, 1996]. Recently, Nakamura and Small proposed the SS surrogate method which can determine whether irregular fluctuations are random or not, even if the fluctuations are modulated by trends or periodicities [Nakamura & Small, 2005]. The basic premise of the technique is that if irregular fluctuations are not random, there is some kind of underlying dynamical system: modulated by whatever trending or periodicity is contaminating the data. In other 5

words, when data have some kind of dynamics, it is accompanied by correlations at small time scales. In such a case, the index (order) of the data itself has important implications irrespective of whether the time series are linear or nonlinear. Hence, whenever the index of the data changes, the flow of information also changes and the resultant time series no longer reflects the original dynamics. We focus our attention on this point and propose a new surrogate method using this idea. The purpose of the method is to distinguish between irregular fluctuations with or without dynamics. This is similar to the RS surrogate method proposed by Theiler et al. [1992]. However, in RS surrogate data, any structure (local and global) is destroyed. Hence, the null hypothesis which the RS algorithm can test is that data are “independent and identically distributed” (IID) random variables. However, we note that the SS surrogate method includes the possibility of trending data in the null hypothesis, because the SS surrogate method can shuffle data on a small scale. In other words, in SS surrogate data, although local structures are destroyed, long term behaviors are preserved. Hence, the null hypothesis which the SS algorithm can test is that irregular fluctuations (short term variability) are “independently distributed” (ID) (temporally uncorrelated) random variables (in other words, there is no short term dynamics or determinism in irregular fluctuations). That is, the major difference between the RS and SS surrogate methods is that the SS surrogate method removes the requirement for “identically distributed” random variates.5 To investigate irregular fluctuations (especially when they are modulated by long term trends or periodicities), we wish to destroy local structures or correlations in irregular fluctuations (short term variability) and preserve the global behaviors (trends or periodicities). To generate surrogate data that can fulfill such conflicting conditions, we shuffle the index of a given data on a “small” (local) scale: this is in contrast to the RS surrogate method, where the index is shuffled on a “large” (global) scale and any structure of the original data is completely destroyed. We generate surrogate data as follows; Let the original data be x(t), let i(t) be the index of x(t) (that is, i(t) = t, and so x(i(t)) = x(t)), let g(t)

To show more clearly similarity and distinction between the RS and SS surrogate methods, we show results of applying the RS algorithm to real world data in the appendix.

January 13, 2007

16:19

01699

Applying the Method of Small–Shuffle Surrogate Data

be Gaussian random numbers and s(t) will be the surrogate data. (i) Obtain i (t) = i(t) + Ag(t),

(1)

where A is an amplitude (we add Gaussian random numbers to the index of the original data). Note that, the index i(t) will be a sequence of integers whereas the perturbed sequences i (t) will not. (ii) Sort i (t) by the rank-order and let the index (order) of i (t) be ˆi(t) (rank-order6 the perturbed index, thereby generating a slightly perturbed index of the original data). (iii) Obtain the surrogate data s(t) = x(ˆi(t)) (reorder the original data with the perturbed index. A simple example is given in Table 1). When the amplitude A in Eq. (1) is selected appropriately, the data is shuffled only on a small scale. Hence, we call the method the Small–Shuffle (SS) surrogate method. The SS surrogate data have the same probability distribution as the original data. As is obvious, the SS surrogate data are influenced primarily by the amplitude A. Broadly speaking, if A is too small, the data are shuffled very little or not at all, and then the SS surrogate data are almost identical to the original data. Conversely, if A is too large, the data are shuffled a lot, and the SS surrogate data are almost random like the RS surrogate data. That is, in other words, smaller values are better for preserving any structure and correlation in the original data, however, they are not effective at destroying the local structures. Larger values are better at destroying any structure and correlation of the original data, however, they are not effective at preserving the long term behaviors (trending). That is, for data with trends, large values are not appropriate, because the global behaviors of the original data are lost 6

3585

and the influence of contaminated trends or periodicities may be larger than that of irregular fluctuations. Hence, the smaller the value of A the better, provided the value can destroy local structures and preserve the long term behaviors. The influence of A has been investigated and it is found that A = 1.0 is appropriate and more than adequate for nearly all purposes [Nakamura & Small, 2005]. Also, it is found that the SS surrogate data are very similar to the original data [Nakamura & Small, 2005]. We show the influence of the amplitude A in Fig. 2. Panel (a) shows that as A increases, the number of data points which do not move decreases and the ratio of the maximum move distance (MMD) increases, where the MMD is the maximum distance among all distances from the position of a data point in the original data to the shifted position in the SS surrogate data. To show the influence of the amplitude visually, we directly compare the original data and the SS surrogate data at different amplitude A, where the values of A are 0.25, 0.5, 1.0, 2.0, 5.0 and 10.0. Panel (b) shows that until A is about 2.0, the behavior of s(t) is almost the same as the original data (A = 0), as A increases, the behavior of s(t) becomes more stochastic. We note that although we expect A = 1.0 is appropriate in most cases from our experience, the value of A probably depends on features of data, and smaller or larger values may be justified in some situations.7

2.2. The discriminating statistics A dynamical measure is often used as the discriminating statistic for surrogate tests. The correlation sum [Kantz & Schreiber, 1997] or a Lyapunov exponent [Abarbanel, 1990] are popular choices. The correlation sum is a measure of the self-similarity, and the Lyapunov exponent is a measure of the orbital instability, both measures are estimated on

By rank-order we mean√the sequence in which the values of different relative magnitude occur. For example, the rank-order of the sequence {π, 0, e, 2} is {4, 1, 3, 2}. That is, rank-order is the index of the magnitude of the data. 7 The SS surrogate method use only data index, although linear surrogate methods need linear autoregressive (AR) or autoregressive moving average models (ARMA) models and power spectrum estimate [Theiler et al., 1992; Theiler & Prichard, 1997]. It has been reported that there are many difficulties and problems to build models [Judd & Mees, 1998]. Furthermore, building linear AR or ARMA models and nonlinear models for data exhibiting irregular fluctuations and trends is still a large problem and it is in infancy now. Also, it is observed that wrap around effects of the Fourier transform may lead to spurious high frequency content in the surrogates [Theiler & Rapp, 1996]. In the SS algorithm, we do not need them and it does not depend on features of data because we use only data index to generate the surrogate. Hence, we believe that the SS algorithm is very powerful, because we can avoid these drawbacks. However, there may well be significant practical generalizations of the technique we present in this paper. This will be the subject of future research.

January 13, 2007

16:19

01699

T. Nakamura & M. Small

3586

Table 1.

The simple example of the SS surrogate method

Iteration t

Index i(t)

Data x(t)

Perturbed Index i (t)

Sorted i (t) by the Rank-Order

Index of Sorted i (t) ˆi(t)

1 2 3 4 5

1 2 3 4 5

13 12 14 11 15

0.13 −1.33 3.25 4.58 2.71

−1.33 0.13 2.71 3.25 4.58

2 1 5 3 4

SS Surrogate Data s(t)(= x(ˆi(t))) 12 13 15 14 11

(= x(2)) (= x(1)) (= x(5)) (= x(3)) (= x(4))

the ratio of the same data points the ratio of max move distance 0

−1

Ratio

10

Small shuffle surrogate data s(t)

10

10

10

original data (A=0)

−2

A=0.25 A=0.5 A=1.0 A=2.0 A=5.0 A=10.0

−3

10

−1

10

0

10

1

10

2

10

3

Amplitude A (a)

0

20

40

60

80

100

Index of data (b)

Fig. 2. Relationship of the amplitude A of Gaussian random numbers and the index. Panel (a) illustrates (as a function of shift amplitude A): the proportion of points that are unperturbed by the SS algorithm (•); and, the maximum distance that any point in the original data is perturbed in the surrogate (, expressed as a fraction of the data length). Panel (b) illustrates the effect of different values of A. The original data is generated by x(t) = t, 1 ≤ t ≤ 100. If the SS surrogate and original data are identical, then the curve should be a straight line (as in the case of A = 0). If the SS surrogate data is equivalent to an ordinary RS surrogate data set, then the curve should be IID.

the reconstructed attractor. There is a strong link between chaotic (nonlinear) dynamical systems and these measures. Hence, these are often used for surrogate tests, because it is expected that there should be a difference between the original data and the surrogate data if the original data is nonlinear. Furthermore, it is preferable to use discriminating 8

statistics which can reflect features of the surrogate method. The SS surrogate method changes the flow of information in data. Hence, we choose to use the auto-correlation function (AC) and the average mutual information (AMI) as discriminating statistics.8 The AC; an estimate of linear correlation in

We use the AC and AMI as a purely “mechanical” statistic for discrimination purpose. Our purpose is not to find good embedding parameters unlike by Cellucci et al. [2003]. Although we will present the details later, in Sec. 3.3, when we use the Logistic map data, the AC does not work well, although the Logistic map has clear dynamics. This indicates that only one statistic (AC or AMI) is not enough for some cases, and this is our major reason why we adopt two discriminating statistics, the AC and AMI. We note that when data are essentially random, both the AC and AMI of the data must fall within the distribution of the SS surrogate data theoretically, and actually they do.

January 13, 2007

16:19

01699

Applying the Method of Small–Shuffle Surrogate Data

the data; and AMI; a general nonlinear version of AC on a time series; can answer the question: on average how much does one learn about the future from the past [Abarbanel, 1990]. After calculation of these statistics, we need to inspect whether a null hypothesis shall be rejected or not. We employ Monte Carlo hypothesis testing and check whether estimated statistics of the original data fall within or outside the statistics distribution of the surrogate data [Theiler & Prichard, 1996, 1997]. When the statistics fall within the distribution of the surrogate data, the surrogate null hypothesis may not be rejected. We then consider that the original and the surrogate data may come from the same population. We generate 39 SS surrogate data and hence the significance level is between 0.025 and 0.049 for a one-sided test with two nonindependent statistics.

by data length and intervals. Otherwise, we use the term “data with trends”. In all cases, we use A = 1.0 for generating SS surrogate data as we described in the previous section. We generate 39 SSS data, the number of data points is 5000 and the data we use is both noise free and subsequently contaminated by 20 dB (10%) Gaussian observational noise.

3.1. Data with no trend The first application is to time series with no trend. As IID random variables (that is, there is no dynamics) we use Gaussian random numbers. As dynamics, we use a linear auto-regressive (AR) model, the Ikeda map and the Logistic map. Also, we use simulated 1/f noise.9 The linear AR model is given by xt = a1 xt−1 + a6 xt−6 + ηt ,

3. Numerical Examples We now demonstrate the application of our algorithm to various simulated time series data, and confirm our theoretical arguments with several examples. We use two types of time series, data with no trend (which can be adequately addressed with the standard surrogate methods) likes Fig. 3(a) and data with trends (which are not consistent with existing surrogate techniques) like in Fig. 3(b). Here we note that broadly speaking, we use the term “data with no trend”, when calculated values of mean and standard deviations are not influenced

4

Gaussian random numbers

xt = axt−1 (1.0 − xt−1 ),

(4)

where we use a = 4.0 [May, 1976].

Gaussian random numbers with Rössler data

5

0

0

−2 −4

(2)

where we use a1 = 0.3, a6 = 0.2 and ηt is Gaussian dynamical noise with standard deviation 1.0 [Small & Judd, 1999]. The Ikeda map is given by   1 + µ (x cos θ − y sin θ) f (x, y) = , (3) µ (x sin θ + y cos θ)   where θ = a − b/ 1 + x2 + y 2 with µ = 0.83, a = 0.4 and b = 6.0 [Ikeda, 1973]. The Logistic map is given by

10

2

3587

−5 0

500

1000

Iteration (a)

1500

2000

0

500

1000

1500

2000

Iteration (b)

Fig. 3. (a) Gaussian random numbers as a data with no trend, and (b) Gaussian random numbers with x component of the R¨ ossler equations as a data with trends. 9

We use 1/f noise as a different dynamics. In the linear AR model, the Ikeda map and Logistic map, an output of the system is used as the input to the system during the next iteration. However, in the system for generating 1/f noise, an output is not used as the input. This is a major difference between this system and the former three systems.

January 13, 2007

3588

16:19

01699

T. Nakamura & M. Small

To generate 1/f noise, a number of different mechanisms have been proposed for the emergence of 1/f , for example intermittency [Manneville, 1980] and self-organized criticality [Bak et al., 1987]. Some of these proposed mechanisms clearly underlie the 1/f behavior of certain physical systems. However, the origin of 1/f noise in many systems still remains unknown. Hence, as more general mechanisms, we adopt an idea proposed by Hausdorff and

Peng [1996]. This is a model whose output is the summation of multiple random inputs (that is, different regulatory mechanisms). The output xt of the model at any time step m t is the sum of m random inputs ut (i): xt = i=1 Ai ut (i). Each input u(i) takes on a Gaussian distributed random value (the current state of this input) and is amplified by a constant Ai which represents the relative effect of each input on the output. At each time step, the

Gaussian random numbers

0.04

0.72

0.02

0.7

AMI

AC

Gaussian random numbers

0

0.68 0.66

−0.02

0.64

−0.04 0

5

10

15

0.62 0

20

5

10

15

20

Time lag

Time lag (a)

(b)

Linear AR model

0.74

Linear AR model 0.74

0.3

0.72

0.72

0.7

0.68

0.7

AC

AMI

0.2 0.1

0.66 0.64

0.68

1

2

3

0.66 0 0

5

10

15

0.64 0

20

5

(c)

0.4

2.5

20

15

20

Ikeda map

2.0

AMI

0.2

AC

15

(d)

Ikeda map

0 −0.2 −0.4 0

10

Time lag

Time lag

1.5 1.0

5

10

Time lag (e)

15

20

0.5 0

5

10

Time lag (f)

Fig. 4. A plot of the AC and AMI: (a) and (b) Gaussian random numbers, (c) and (d) a linear AR model, (e) and (f) the Ikeda map, (g) and (h) the Logistic map, and (i) and (j) 1/f noise, where we use 5000 data points, A = 1.0 and 39 SS surrogate data. The solid line is the original data and dotted lines are the SS surrogate data. A smaller window inside the panels (d), (i) and (j) is an enlargement.

January 13, 2007

16:19

01699

Applying the Method of Small–Shuffle Surrogate Data

0.06

Logistic map

5.0

0.04

AMI

AC

Logistic map

4.0

0.02 0

3.0

−0.02

2.0

−0.04

1.0

−0.06 0

5

10

15

0.0 0

20

5

10

15

20

Time lag

Time lag (g)

(h)

1/f noise

1/f noise

0.8

1.3

0.85

1.3

0.80

0.7

1.2

1.2

1.1

1.1

0.75 0.70

0.6

0.65

1

2

3

4

5

AMI

AC

3589

1.0

1.0

1

2

3

4

5

0.9

0.5

0.8 0.4 0

5

10

15

20

0

(i)

10

15

20

(j) Fig. 4.

state of u(i) changes with probability 1/τi , where τi is the time constant for input u(i). The process with m time scales, with each input following the same scaling relation is defined by RT and RA , that is, τ2 /τ1 = τ3 /τ2 · · · = τn /τn−1 = RT and A2 /A1 = A3 /A2 = · · · = An /An−1 = RA . We use m = 20, RT = 2.0, RA = 1.0 and An = τn = 1.0. For more details see [Hausdorff & Peng, 1996]. In all cases, we use xt as the observational data. Figure 4 shows the results for these data. The result shows that when there is no dynamics (that is, data are Gaussian random numbers, see Figs. 4(a) and 4(b)), the AC and AMI do not show any significant difference, and the AC and AMI falls within the distributions of the SS surrogate data. According to the criterion mentioned previously, we cannot reject the hypothesis. That is, we consider that the data are ID random variables and there is no dynamics in the data. However, the AC or AMI 10

5

Time lag

Time lag

(Continued)

or both fall outside the distributions of the SS surrogate data and are distinct in other cases.10 Hence, we reject the hypothesis. We then consider that the data are not ID random variables and have some kind of dynamics. It should be noted that some differences clearly appear when the time lag is relative small, because the information in the systems is not retained for longer periods of time. When the data is contaminated by 20 dB observational noise, the results obtained are essentially the same. Also, since these data have no trend, as an extra investigation, we apply the SS surrogate method using larger values of A (that is, A is 2.0, 5.0 and 10.0). The results obtained are also essentially the same.

3.2. Data with trends In the previous section, we found that the SS surrogate algorithm can detect whether there is

We note the case of the linear AR model. Figure 4(c) shows that the AC indicates significant difference, however, Fig. 4(d) shows that the AMI does not, although the AMI shows small difference when the time-lag is around 1. When we use more data points the AMI can show the difference more clearly. This is a specific feature of linear AR models, because linear AR models are perturbed by dynamical noise, which are Gaussian random numbers. Hence, although linear AR models have deterministic dynamics, it seems to be not as strong as nonlinear dynamics.

January 13, 2007

3590

16:19

01699

T. Nakamura & M. Small

dynamics or not irrespective of whether the time series are linear or nonlinear. In this section, we consider data with trends.11 We combine the Gaussian random numbers, a linear AR model, the Ikeda map, the Logistic map and 1/f data sets, which are the same as the models used in the previous section, and the R¨ ossler system, where these dynamics are independent. That is, irregular fluctuations generated by these models are modulated by the R¨ ossler system. The R¨ossler equations are given by dx = −(y + z), dt dy = x + ay, dt dz = b + z(x − c), dt

0.78

where a = 0.3909, b = 2.0, c = 4.0, when calculated using the fourth order Runge–Kutta method with sampling interval 0.02. The equations when using these parameters exhibit period 6 behavior [Small et al., 2001]. We use the x component of the equations and the level of additional data (irregular fluctuations) to the R¨ ossler data is equivalent to 5 dB (56.2%) observational noise for each case. See the behavior in Fig. 3(b). Figure 5 shows the results for these data. The result again shows that when there is no dynamics in irregular fluctuations (that is, the data are Gaussian random numbers), the AC and AMI do not show any significant difference and both the AC and AMI fall within the distributions of the SS surrogate data. Hence, as we cannot reject the hypothesis, we consider that the irregular fluctuations are ID random variables and have no dynamics. However, the

Gaussian random numbers and Rössler data

Gaussian random numbers and Rössler data 1.15

0.74

AMI

AC

0.76

0.72

1.1 1.05

0.7 0.68 0

5

10

15

1 0

20

5

(a)

0.85

10

15

20

15

20

Time lag

Time lag

(b)

Linear AR model and Rössler data

Linear AR model and Rössler data 1.3

AMI

AC

0.8 0.75

1.1

0.7 0.65 0

1.2

5

10

Time lag (c)

15

20

1 0

5

10

Time lag (d)

Fig. 5. A plot of the AC and AMI: (a) and (b) Gaussian random number, (c) and (d) linear AR model, (e) and (f) Ikeda map, (g) and (h) the Logistic map, and (i) and (j) 1/f noise, with x component of the R¨ ossler equations, where we use 5000 data points, A = 1.0 and 39 SS surrogate data. The solid line is the original data and dotted lines are the SS surrogate data. A smaller window inside the panels (i) and (j) is an enlargement. 11

When irregular fluctuations are modulated by long trends or periodicities, even if the dynamics of long trends or periodicities do not influence dynamics of irregular fluctuations (that is, they are independent), such time series could be regarded as nonstationary from the viewpoint of dynamics of irregular fluctuations.

January 13, 2007

16:19

01699

Applying the Method of Small–Shuffle Surrogate Data

0.85

Ikeda map and Rössler data

Ikeda map and Rössler data 1.5 1.4

AMI

AC

0.8 0.75 0.7

1.3 1.2 1.1

0.65 0

5

10

15

1 0

20

5

(e)

15

20

15

20

(f)

Logistic map and Rössler data

1.8

0.76

Logistic map and Rössler data

1.6

0.74

AMI

AC

10

Time lag

Time lag

0.78

3591

0.72

1.4 1.2

0.7 0.68 0

1 5

10

15

0

20

5

(g)

(h)

1/f noise and Rössler data 0.95

2.0

1/f noise and Rössler data

0.96

2.0

AC

0.90

1.8 1.7

0.92

1

2

3

4

5

0.85

AMI

0.90

1.9

1.8

0.94

1.6

1.6 1.5

1

2

3

4

5

1.4

0.80 0

10

Time lag

Time lag

1.2 5

10

15

20

0

5

10

15

20

Time lag

Time lag (i)

(j) Fig. 5.

AC or AMI or both are distinct and fall outside the distributions of the SS surrogate data when there is dynamics. Hence, as we can reject the hypothesis, we consider that the irregular fluctuations are not ID random variables and have some kind of dynamics. We note that in all case, behaviors of the AC and AMI of the SS surrogate data are very similar to that of the original data, especially when the time lag is larger. This indicates that the local structures are destroyed and the global structures are preserved in the SS surrogate data. When the

(Continued)

data is contaminated by 20 dB observational noise, the results are essentially the same.

3.3. Some observations Figures 4 and 5 show that when irregular fluctuations are Gaussian random numbers (that is, there are no dynamics in the data), both the AC and AMI do not show difference between the original data and the SS surrogate data, but when there are dynamics in the data, the AC or AMI or both show some difference.

January 13, 2007

3592

16:19

01699

T. Nakamura & M. Small

We note two results, the Logistic map and 1/f noise in both cases of data with and without trends. It is well known that the Logistic map is a nonlinear system and the randomness is equivalent to that of IID random variables. That is, in other words, it can be considered that the Logistic map is linearly equivalent to IID random variables. In panel (g) in Figs. 4 and 5 the AC does not show difference between the original data and the SS surrogate data sets. However, since the Logistic map is a nonlinear system, in panel (h) in Figs. 4 and 5, the AMI shows significant difference between them. These results are consistent with the known features of the Logistic map. The components of the 1/f noise are independent Gaussian random numbers, which are IID random variables, and the state changes with probability 1/τi , where the probability is constant. From the result shown in Figs. 4(i) and 4(j) we reject the null hypothesis of the SS surrogate method, where the null hypothesis is that irregular fluctuations are independently distributed (ID) random variables. That is, we consider that the data are not ID random variables. The result seems to reflect that the system has a deterministic structure regarding the state change. When we obtain a result that the data are not ID random variables, we often expect that it is possible to build predictive models (for example, pseudo-linear models [Judd & Mees, 1995]). However, to generate 1/f noise, the output is not used as the input. This is different from the linear AR model, the Ikeda map and the Logistic map. That is, this result implies that even if data are not ID random variables, it does not always mean that it is possible to build predictive models that can reflect the system. In a subsequent paper we shall conduct an investigation of predictability of 1/f time series. Based on the result of these computational studies, we apply the SS surrogate method to actual data in the next section.

4. Applications In this section we apply the SS surrogate algorithm to several actual systems. For each time series that exhibits irregular fluctuation, we apply this algorithm to test the null hypothesis that the irregular fluctuations are independently distributed (ID) random variables (in other words, there are no dynamics in irregular fluctuations). In all cases, we use A = 1.0 and generate 39 SS surrogate data.

There are five actual data, cobalt data, nuclear magnetic resonance (NMR) laser data, electroencephalography (EEG) data, daily temperature data in Tokyo and global average temperature. The first two actual data, cobalt and NMR laser data, seemingly have no trend and periodicity, and they have been investigated. The other three data, EEG data, daily temperature data in Tokyo and global average temperature, seem to exhibit trends and periodicities, and they have not been investigated yet.

4.1. Cobalt data We use time intervals of γ-ray emissions of cobalt, which has been recognized as IID random variables [Ikeguchi & Aihara, 1997]. Figure 6 shows segments of the cobalt data and one of the SS surrogate data. There is not much difference in behavior between these data, although slight difference can be seen in Figs. 6(b) and 6(d). Figure 7 shows the result of applying the SS surrogate method, where we use 10 000 data points. The figure shows that both the AC and AMI of the original data fall within the distributions of SS surrogate data. According to the criterion mentioned previously, we cannot reject the hypothesis. These behaviors are almost the same as that shown in Figs. 4(a) and 4(b), where the data are Gaussian random numbers. Hence, we consider that the cobalt data would be ID random variables, that is, there is no dynamics in the irregular fluctuations. This is in agreement with the previously obtained results.

4.2. NMR laser data This data set contains data from a stroboscopic cross-section of the output power of a NMR laser. The data are known to be nonlinear [Kantz & Schreiber, 1997]. Figure 8 shows segments of the NMR data and one of the SS surrogate data. We can see the slight difference in between them. Although Figs. 8(a) and 8(c) show that amplitude increases and decreases over time are preserved, Figs. 8(b) and 8(d) clearly show differences. Figure 9 shows the result of applying the SS surrogate method, where we use 10 000 data points. The figure shows that both the AC and AMI clearly show significant difference and both the AC and AMI of the original data fall outside the distributions of SS surrogate data. Hence, we consider that the NMR laser data would not be ID random variables and there is some kind of dynamics. This is

January 13, 2007

16:19

01699

Applying the Method of Small–Shuffle Surrogate Data

0.2

Cobalt data

3593

Cobalt data 0.1

0.1

0.05

0

0 0

500

1000

1500

2000

0

Iteration

20

40

(a)

0.2

60

80

100

80

100

Iteration (b)

SS surrogate data of Cobalt data

SS surrogate data of Cobalt data 0.1

0.1

0.05

0

0 0

500

1000

1500

2000

0

20

40

Iteration

60

Iteration

(c)

(d)

Fig. 6. Segments of (a) and (b) cobalt data, and (c) and (d) one of the SS surrogate data. Panels (b) and (d) are the enlargement of (a) and (b).

0.04

Cobalt data

Cobalt data 0.34

AMI

AC

0.02 0

−0.02 −0.04 0

0.32

0.3 5

10

15

20

Time lag (a)

0

5

10

15

20

Time lag (b)

Fig. 7. A plot of the AC and AMI for the cobalt data: (a) AC and (b) AMI, where we use 10 000 data points, A = 1.0 and 39 SS surrogate data. The solid line is the original data and dotted lines are the SS surrogate data.

in agreement with the previously obtained results [Kantz & Schreiber, 1997].

4.3. EEG data The EEG signal was recorded from a healthy human adult in a shield room. The EEG data we use are

obtained from CZ of the unipolor 10–20 Jasper registration scheme [Jasper, 1958] during the resting state, that is, eyes closed and resting. The data was digitized at 1024 Hz using a 12-bit digitizer. The EEG impedances were less than 5 KΩ. The data were amplified, Gain = 18 000, and amplifier frequency cut-off settings of 0.03 Hz and 200 Hz were used.

January 13, 2007

16:19

01699

T. Nakamura & M. Small

3594

NMR laser data

NMR laser data

2000

2000

0

0

−2000

−2000

−4000

−4000

0

100

200

300

400

500

0

20

40

60

Iteration

Iteration

(a)

(b)

SS surrogate data of NMR laser data

80

100

80

100

SS surrogate data of NMR laser data

2000

2000

0

0

−2000

−2000

−4000

−4000 0

100

200

300

400

500

0

20

40

60

Iteration

Iteration

(c)

(d)

Fig. 8. Segments of (a) and (b) NMR data, and (c) and (d) one of the SS surrogate data. Panels (b) and (d) are the enlargement of (a) and (b).

1.0

NMR laser data

NMR laser data 3

AMI

AC

0.5 0.0

1

−0.5 −1.0 0

2

5

10

15

20

Time lag (a)

0 0

5

10

15

20

Time lag (b)

Fig. 9. A plot of the AC and AMI for the NMR laser data: (a) AC and (b) AMI, where we use 10 000 data points, A = 1.0 and 39 SS surrogate data. The solid line is the original data and dotted lines are the SS surrogate data.

Figure 10 shows the EEG data and one of the SS surrogate data. The EEG data seems to have long term trends. Panels (a) and (c) show that the SS surrogate data also have trends as well as the original EEG data and there is not much difference in behavior between them, although panels (b) and (d) show some difference. These figures indicate

that the SS surrogate method generates data in which long term behaviors are preserved and local structures or correlations are destroyed. Figure 11 shows the result of applying the SS surrogate method, where we use 10 000 data points (that is, 10 sec). The figure shows that although the AC shows small difference and the AMI clearly show

January 13, 2007

16:19

01699

Applying the Method of Small–Shuffle Surrogate Data EEG data

−30

40 20

3595

EEG data

−40

0 −20

−50

−40 −60 0

2000

4000

6000

8000

10000

−60

0

50

Iteration

100

150

200

150

200

Iteration

(a)

(b)

SS surrogate data of EEG data

−30

40 20

SS surrogate data of EEG data

−40

0 −20

−50

−40 −60 0

2000

4000

6000

8000

10000

−60

0

50

Iteration

100

Iteration

(c)

(d)

Fig. 10. A plot of (a) and (b) EEG data, and (c) and (d) one of the SS surrogate data. Panels (b) and (d) are the enlargement of (a) and (b).

EEG data

EEG data

1.00

4.0

1.00

4.0

0.98

AC

0.96

1

2

3

4

5

AMI

3.0 0.99

3.0 2.0

1

2

3

4

5

2.0 0.94 0

5

10

15

20

Time lag (a)

0

5

10

15

20

Time lag (b)

Fig. 11. A plot of the AC and AMI for the EEG data: (a) AC and (b) AMI. We use 10 000 data points, A = 1.0 and 39 SS surrogate data. The solid line is the original data and dotted lines are the SS surrogate data. A smaller window inside the panels (a) and (b) is the enlargement of itself.

significant difference, both the AC and AMI of the original data fall outside the distributions of SS surrogate data. Hence, we consider that the irregular fluctuations of EEG data would not be ID random variables and there is some kind of dynamics behind the irregular fluctuations. However, this result is not directly linked to the premise or hypothesis that

neurons act chaotically. Because measurable EEG data are considered macroscopic spatial ensemble mean of many neurons’ activity in the brain, which is similar to the case of 1/f in Secs. 3.1 and 3.2, as discussed in Sec. 3.3, we cannot conclude whether each neuron’s signal is IID random variable or deterministic.

January 13, 2007

3596

16:19

01699

T. Nakamura & M. Small

4.4. Daily temperature data in Tokyo The temperature data we use are the daily highest temperature in Tokyo from 1 January 1998 to 31 March 2005. Figure 12 shows the temperature data

and one of the SS surrogate data. The temperature data seems to have long term periodicities. The time series should have at least one long term periodicity, one year (365 days) periodicity. Panels (a) and (c) show that the SS surrogate data also have trends

Daily highest temperature (˚C)

Daily highest temperature (˚C) 40

40

30

30

20

20

10

10

0

0

500

1000

1500

2000

2500

0

(b)

30

30

20

20

10

10

1000

1500

2000

150

SS surrogate data of daily highest temperature (˚C) 40

500

100

(a)

SS surrogate data of daily highest temperature (˚C)

0

50

Days

40

0

0

Days

2500

0

0

50

100

Days

Days

(c)

(d)

150

Fig. 12. A plot of (a) and (b) temperature data, and (c) and (d) one of the SS surrogate data. Panels (b) and (d) are the enlargement of (a) and (b).

Daily highest temperature in Tokyo

1.9

Daily highest temperature in Tokyo

0.92

0.90

1.9

0.90

1.8

1.8

0.85

0.86 0.84

1

2

3

4

5

1.7

1.7 1.6 1

2

3

4

5

1.6

0.80 0.75 0

AMI

AC

0.88

1.5 5

10

Time lag (a)

15

20

0

5

10

15

20

Time lag (b)

Fig. 13. A plot of the AC and AMI for the daily highest temperature data in Tokyo: (a) AC and (b) AMI. We use 2647 data points, A = 1.0 and 39 SS surrogate data. The solid line is the original data and dotted lines are the SS surrogate data. A smaller window inside the panels (a) and (b) is an enlargement.

January 13, 2007

16:19

01699

Applying the Method of Small–Shuffle Surrogate Data

as well as the original temperature data and there is not much difference in behavior between them, although panels (b) and (d) show some difference. Figure 13 shows the result of applying the SS surrogate method, where the number of data points are 2647. The figure shows that the AC and AMI show small difference and both the AC and AMI of

1

3597

the original data fall outside the distributions of SS surrogate data. Hence, we consider that the irregular fluctuations of the temperature data would not be ID random variables and there is some kind of dynamics behind the irregular fluctuations. When we use the daily lowest temperature in Tokyo, the result obtained is essentially the same.

Monthly global average temperature

0.5

0.5

Monthly global average temperature

0

0 −0.5

−0.5

−1

−1 0

500

1000

1500

2000

0

20

40

Months (a)

1

60

80

100

Months (b)

SS surrogate data of monthly global average temperature

0.5

0.5

SS surrogate data of monthly global average temperature

0

0 −0.5

−0.5

−1

−1 0

500

1000

1500

0

2000

20

40

Months

60

80

100

Months

(c)

(d)

Fig. 14. A plot of (a) and (b) monthly global average temperature, and (c) and (d) one of the SS surrogate data. Panels (b) and (d) are the enlargement of (a) and (b).

Monthly global average temperature

Monthly global average temperature 2.1

2.1

0.85

0.80

2.0

0.75 1

2

3

4

5

AMI

AC

0.80

2.0

1.9

1.9

1

2

3

4

5

0.70 1.8 0.60 0

5

10

Time lag (a)

15

20

1.7 0

5

10

15

20

Time lag (b)

Fig. 15. A plot of the AC and AMI for the monthly global average temperature data: (a) AC and (b) AMI. We use 1794 data points, A = 1.0 and 39 SS surrogate data. The solid line is the original data and dotted lines are the SS surrogate data. A smaller window inside the panels (a) and (b) is an enlargement.

January 13, 2007

3598

16:19

01699

T. Nakamura & M. Small

4.5. Monthly global average temperature The temperature data we use are the monthly deviations from monthly global average air temperature from January 1856 to June 2005. Figure 14 shows the data and one of the SS surrogate data. We can see that this data have changed or increased rapidly after around 500 months (this corresponds around the year 1900). That is, this data show irregular fluctuations and trends (or somewhat nonstationary). Panels (a) and (c) show that the SS surrogate data also have trends as well as the original temperature data and there is not much difference in behavior between them, although panels (b) and (d) show some difference. Figure 15 shows the result of applying the SS surrogate method, where the number of data points are 1794. The figure shows that the AC shows small difference and the AC of the original data fall outside the distributions of SS surrogate data, although the AMI of the original data fall within the distributions of SS surrogate data. Hence, we consider that the irregular fluctuations of this data would not be ID random variables and the irregular fluctuations have some kind of dynamics.

5. Summary and Conclusion The algorithm we have described in this paper provides a robust method to test irregular fluctuations, even if they are modulated by long term trends or periodicities. We have shown that the SS algorithm is capable of testing against the null hypothesis of irregular fluctuations are ID (independently distributed) random variables (in other words, there are no dynamics in irregular fluctuations). We have demonstrated the application of this algorithm to a variety of numerical and actual systems using the AC and AMI as discriminating statistics. Significantly, we concluded that human EEG and global average temperature with trends are not consistent with ID random variables and the daily highest temperature in Tokyo with periodicities is also not consistent with ID random variables. These results suggest possible future treatment regimes (EEG) and improved prediction and understanding (temperature).

Acknowledgments The authors would like to acknowledge Professor Paul E. Rapp (Drexel University) for providing

us the EEG data. The work described in this paper was supported by a grant from the Research Grants Council of Hong Kong (Project No. PolyU 5216/04E).

References Abarbanel, H. D. I. [1996] Analysis of Observed Chaotic Data (Springer-Verlag, NY). Bak, P., Tang, C. & Wiesenfeld, K. [1987] “Selforganized criticality: An explanation of the 1/f noise,” Phys. Rev. Lett. 59, 381–384. Bhattacharya, J. [2001] “Detection of weak chaos in infant respiration,” IEEE Trans. Syst. Man Cybern. 31, 637–642. Cellucci, C. J., Albano, A. M. & Rapp, P. E. [2003] “Comparative study of embedding methods,” Phys. Rev. E 67, 066210. Dolan, K. T. & Spano, M. L. [2001] “Surrogate for nonlinear time series analysis,” Phys. Rev. E 64, 046128. Galka, A. [2000] Topics in Nonlinear Time Series Analysis — With Implications for EEG Analysis (World Scientific, Singapore) Hausdorff, J. M. & Peng, C. K. [1996] “Multiscaled randomness: A possible source of 1/f noise in biology,” Phys. Rev. E 54, 2154–2157. Ikeda, K. [1973] “Multiple-valued stationary state and its instability of the transmitted light by a ring cavity system,” Opt. Commun. 30, 257–261. Ikeguchi, T. & Aihara, K. [1997] “Lyapunov spectrum analysis on random data,” Int. J. Bifurcation and Chaos 7, 1267–1282. Jasper, H. H. [1958] “The ten-twenty electrode system of the International Federation,” Electroenceph. Clin. Neurophysiol. 10, 371–375. Judd, K. & Mees, A. I. [1995] “On selecting models for nonlinear time series,” Physica D 82, 426–444. Judd, K. & Mees, A. I. [1998] “Embedding as modeling problem,” Physica D 120, 273–286. Kantz, H. & Schreiber, T. [1997] Nonlinear Time-Series Analysis (Cambridge University Press, Cambridge). Kugiumtzis, D. [2001] “On the reliability of the surrogate data test for nonlinearity in the analysis of noisy time series,” Int. J. Bifurcation and Chaos 11, 1881–1896. Luo, X., Nakamura, T. & Small, M. [2005] “Surrogate test to distinguish between chaotic and pseudoperiodic time series,” Phys. Rev. E 71, 026230. Manneville, P. [1980] “Intermittency, self-similarity, and 1/f noise in dissipative dynamical systems,” J. Physique (Paris) 41, 1235–1243. May, R. M. [1976] “Simple mathematical models with very complicated dynamics,” Nature 261, 459–467. Nakamura, T., Mees, A. I. & Judd, K. [2003] “Refinements to model selection for nonlinear time series,” Int. J. Bifurcation and Chaos 13, 1263–1274.

January 13, 2007

16:19

01699

Applying the Method of Small–Shuffle Surrogate Data

Nakamura, T. & Small, M. [2005] “Small–Shuffle surrogate data: Testing for dynamics in fluctuating data with trends,” Phys. Rev. E 72, 056216. Rapp, P. E., Cellucchi, C. J., Watanabe, T. A. A., Albano, A. M. & Schmah, T. I. [2001] “Surrogate data pathologies and the false-positive rejection of the null hypothesis,” Int. J. Bifurcation and Chaos 11, 983– 997. Schreiber, T. & Schmitz, A. [1996] “Improved Surrogate data for nonlinearity tests,” Phys. Rev. Lett. 77, 635– 638. Schreiber, T. & Schmitz, A. [1997] “Discrimination power of measures for nonlinearity in a time series,” Phys. Rev. E 55, 5443–5447. Schreiber, T. & Schmitz, A. [2000] “Surrogate time series,” Physica D 142, 346–382. Small, M. & Judd, K. [1999] “Detecting periodicity in experimental data using linear modeling techniques,” Phys. Rev. E 59, 1379–1385. Small, M., Yu, D. & Harrison, R. G. [2001] “Surrogate test for pseudoperiodic time series data,” Phys. Rev. Lett. 87, 188101. Small, M. [2005] Applied Nonlinear Time Series Analysis: Applications in Physics, Physiology and Finance, World Scientific Series on Nonlinear Science, Series A — Vol. 52. Theiler, J., Eubank, S., Longtin, A., Galdrikian, B. & Farmer, J. D. [1992] “Testing for nonlinearity in time series: The method of surrogate data,” Physica D 58, 77–94. Theiler, J. [1995] “On the evidence for low-dimensional chaos in an epileptic electroencephalogram,” Phys. Lett. 196, 335–341. Theiler, J. & Prichard, D. [1996] “Constrainedrealization Monte-Carlo method for hypothesis testing,” Physica D 94, 94–235. Theiler, J. & Rapp. P. E. [1996] “Re-examination of the evidence for low-dimensional, nonlinear structure

0.2

3599

in the human electroencephalogram,” Electroenceph. Clin. Neurophysiol. 98, 213–222. Theiler, J. & Prichard, D. [1997] “Using surrogate surrogate data to calibrate the actual rate of false positives in tests for nonlinearity in time series,” in Nonlinear Dynamics and Time Series: Building a Bridge between the Natural and Statistical Sciences, eds. Cutler, C. D. & Kaplan, D. T. Fields Institute Communications, Vol. 11, (American Mathematical Society, Boston) pp. 99–113.

Appendix A In this appendix, to show similarity and distinction between the Random–Shuffle (RS) and Small– Shuffle (SS) surrogate methods, we show that the result of applying the RS surrogate method to actual data (Cobalt data, NMR laser data, EEG data, daily highest temperature in Tokyo and global average temperature) are examined in the previous sections. We note that the meaning of the results of applying the RS and SS surrogate methods is different. When data have irregular fluctuations and no trend, both the RS and SS surrogate methods can indicate that “the data” have some kind of dynamics. However, when data have irregular fluctuations and trends, the RS surrogate method can indicate that “the data” have some kind of dynamics, and the SS surrogate method can indicate that “the irregular fluctuations” have some kind of dynamics.

A.1. Cobalt data Figures 16(a) and 16(b) show that the RS surrogate data are similar to the original cobalt and SS

RS surrogate data of Cobalt data

RS surrogate data of Cobalt data 0.15 0.1

0.1 0.05

0

0

0

500

1000

Iteration (a)

1500

2000

0

20

40

60

80

100

Iteration (b)

Fig. 16. A plot of (a) and (b) one of the RS surrogate data, and (c) AC and (d) AMI for the cobalt data. Panel (b) is a enlargement of (a). We use 10 000 data points and 39 RS surrogate data. The solid line is the original data and dotted lines are the RS surrogate data.

January 13, 2007

16:19

01699

T. Nakamura & M. Small

3600

0.04

0.34

AMI

AC

0.02 0

0.32

−0.02

0.3 −0.04 0

5

10

15

0

20

5

10

15

20

Time lag

Time lag (c)

(d) Fig. 16.

surrogate data, see Figs. 6(a) and 6(c). However, although there is one large peak at almost the same position (around 5) in Figs. 6(b) and 6(d), there is no such peak in Fig. 16(b). Figures 16(c) and 16(d) show that both the AC and AMI of the original data fall within the distribution of the RS surrogate data. This result is essentially the same as that when the SS surrogate method is applied. However, Fig. 16(c) shows that the AC distribution has no slow oscillation, although Fig. 7(a) shows it.

(Continued)

These results indicate that although the SS algorithm shuffles data on a small scale, the RS algorithm shuffles data on a large scale and does not preserve global behavior.

A.2. NMR laser data Figures 17(a) and 17(b) show that the RS surrogate data are no longer similar to the original NMR and SS surrogate data. For example, amplitude does

RS surrogate data of NMR laser data

RS surrogate data of NMR laser data

2000

2000

0

0

−2000

−2000

−4000

−4000 0

100

200

300

400

500

0

20

40

60

Iteration

Iteration

(a)

(b)

1.0

80

100

3

AMI

AC

0.5 0.0

1

−0.5 −1.0 0

2

5

10

Time lag (c)

15

20

0

5

10

15

20

Time lag (d)

Fig. 17. A plot of (a) and (b) one of the RS surrogate data, and (c) AC and (d) AMI for the NMR laser data. Panel (b) is a enlargement of (a). We use 10 000 data points and 39 RS surrogate data. The solid line is the original data and dotted lines are the RS surrogate data.

January 13, 2007

16:19

01699

Applying the Method of Small–Shuffle Surrogate Data

not increase and decrease over time like the original and SS surrogate data. Figures 17(c) and 17(d) show that both the AC and AMI of the original data fall outside the distribution of the RS surrogate data. This result is essentially the same as that when the SS surrogate method is applied. However, although Fig. 9 shows that the AMI distribution of SS surrogate data decreases slowly, Fig. 17(d) shows that the AMI distribution of RS surrogate data is almost constant. This result indicates that the RS surrogate data are presumably IID and no longer preserve any feature of NMR laser data, although the SS surrogate data have.

A.3. EEG data Figures 18(a) and 18(b) show that the RS surrogate data have no trend and are no longer similar to the original EEG and SS surrogate data. Figures 18(c) and 18(d) show that both the AC and AMI of the original data fall outside the distribution of the RS surrogate data. This result is essentially the same as that when the SS surrogate method is

3601

applied. However, the behavior of the AC and AMI is different from that of SS surrogate data. The AC and AMI distribution of SS surrogate data decreases slowly, the AC and AMI distribution of RS surrogate data is almost constant.

A.4. Daily highest temperature in Tokyo Figures 19(a) and 19(b) show that one long term periodicity (one year periodicity) and any trend disappear. Also, as well as the result of EEG data in the previous example, the AC and AMI distribution of RS surrogate data is almost constant. Although this result and that obtained by applying the SS surrogate method show that both the AC and AMI of the original data fall outside the distribution of surrogate data, the meaning of the results is different. Although the result by applying the SS surrogate method indicates that “the irregular fluctuations” have some kind of dynamics, the result by applying the RS surrogate method indicates that “the data” have some kind of dynamics.

RS surrogate data of EEG data

RS surrogate data of EEG data

40

40 20

20

0

0

−20

−20

−40

−40

−60

−60 0

2000

4000

6000

8000

10000

0

50

Iteration

100

150

200

15

20

Iteration

(a)

(b)

4.0

1

AMI

AC

3.0

0.5

2.0 1.0

0 0

5

10

Time lag (c)

15

20

0.0 0

5

10

Time lag (d)

Fig. 18. A plot of (a) and (b) one of the RS surrogate data, and (c) AC and (d) AMI for the EEG data. Panel (b) is a enlargement of (a). We use 10 000 data points and 39 RS surrogate data. The solid line is the original data and dotted lines are the RS surrogate data.

January 13, 2007

3602

16:19

01699

T. Nakamura & M. Small RS surrogate data of daily highest temperature (˚C)

RS surrogate data of daily highest temperature (˚C)

40

40

30

30

20

20

10

10

0

0

500

1000

1500

2000

2500

0

0

(a)

(b)

1.9 1.8 1.7 1.6 1.5 1.4 1.3 1.2 1.1 0

0.8

AMI

0.6

AC

100

Days

1.0

0.4 0.2 0.0 −0.2 0

50

Days

5

10

15

20

5

10

150

15

20

Time lag

Time lag

(c)

(d)

Fig. 19. A plot of (a) and (b) one of the RS surrogate data, and (c) AC and (d) AMI for the daily highest temperature in Tokyo. Panel (b) is a enlargement of (a). We use 2647 data points and 39 RS surrogate data. The solid line is the original data and dotted lines are the RS surrogate data.

1

RS surrogate data of monthly global average temperature

1

0.5

0.5

0

0

−0.5

−0.5

−1 0

500

1000

1500

2000

−1

RS surrogate data of monthly global average temperature

0

20

40

Days

1.0

2.1

0.8

2.0

0.6

1.9

0.4

1.7

0.0

1.6 10

Time lag

(c)

100

1.8

0.2

5

80

(b)

AMI

AC

(a)

−0.2 0

60

Months

15

20

1.5 0

5

10

15

20

Time lag

(d)

Fig. 20. A plot of (a) and (b) one of the RS surrogate data, and (c) AC and (d) AMI for the monthly global average temperature. Panel (b) is a enlargement of (a). We use 1794 data points and 39 RS surrogate data. The solid line is the original data and dotted lines are the RS surrogate data.

January 13, 2007

16:19

01699

Applying the Method of Small–Shuffle Surrogate Data

A.5. Monthly global average temperature Figures 20(a) and 20(b) show that the RS surrogate data have no trend and are no longer similar to the original and SS surrogate data. Also, as well as the results in the previous two examples, the

3603

AC and AMI distribution of RS surrogate data is almost constant. Figure 20(d) shows that the AMI falls outside the distribution of the RS surrogate data, although the AMI falls within the distribution of the SS surrogate data.

Suggest Documents