CHAOS 18, 023125 (2008) DOI: 10.1063/1.2927388
Testing for intra-cycle determinism in pseudo-periodic time series Mara C. S. Coelho∗ Programa de P´ os-Gradua¸ca˜o em Engenharia El´etrica, Universidade Federal de Minas Gerais, Av. Antˆ onio Carlos 6627, 31.270-901 Belo Horizonte, Minas Gerais, Brazil Eduardo M. A. M. Mendes† and Luis A. Aguirre‡ Departamento de Engenharia Eletrˆonica, Universidade Federal de Minas Gerais, Av. Antˆ onio Carlos 6627, 31.270-901 Belo Horizonte, Minas Gerais, Brazil
Abstract A determinism test is proposed based on the well-known method of the surrogate data. Assuming predictability to be a signature of determinism, the proposed method checks for intra-cycle (e.g. short-term) determinism in pseudo-periodic time series for which standard methods of surrogate analysis do not apply. The approach presented is composed of two steps. First, the data are pre-processed to reduce the effects of seasonal and trend components. Second, standard tests of surrogate analysis can then be used. The determinism test is applied to simulated and experimental pseudo-periodic time series and the results show the applicability of the proposed test. Keywords: Surrogate Method, Periodicity, Nonlinear Time Series Analysis
∗ †
[email protected] [email protected],Phone:55+(31)3409-4862,FAX:+55(31)34094848
‡
[email protected]
1
In the analysis of a time series, one of the first tests to be performed is to determine whether there is any signature of determinism underlying the data. The presence of determinism will suggest the construction of deterministic models or other ways of analysis, whereas the lack of determinism demands a statistical approach to modeling and analysis. In cases where the original time series exhibits a pseudo-periodic component, short-time intra-cycle determinism cannot generally be determined using standard surrogate methods. In this work a test is proposed to check for possible intra-cycle determinism in pseudo-periodic time series.
I.
INTRODUCTION
The interest in detecting determinism in time series arose out of the need to discriminate a chaotic time series from a random time series in cases where the chaotic behavior was noiselike. In the eighties, the distinction between chaotic determinism and randomness was made by calculating well-known dynamic invariants such as correlation dimension and Lyapunov exponents1–5 . Later, several other methods were proposed for the analysis of determinism in time series6–9 . Among such methods, those based on surrogate data8 have become quite popular. The surrogate time series is a set of data artificially generated from the original time series under investigation. Once the property under investigation (for instance nonlinearity, determinism, and so on) is defined, a method is sought to generate the surrogate data that preserves a specific characteristic of the original time series. A discriminant factor is then used to differ the original time series from the artificially set of time series generated by the chosen surrogate method. Several algorithms have been proposed to generate the surrogate data in the literature. The most commonly used methods are: the RS (Random Shuffle) algorithm, the FT (Fourier Transform) algorithm and the AAFT (Amplitude Adjusted Fourier Transform) algorithm8 . These methods were designed to detect determinism in nonperiodic and stationary time series6,8,9 . Economic, physics and biological real time series often exhibit periodic or pseudo-periodic characteristics that hide any possible underlying dynamics. See10 for an interesting discus2
sion of the importance of determining such dynamics in the context of load forecasting. At first, a visual inspection or a linear technique of analysis of time series, such as the autocorrelation function (ACF) and the power spectrum, could be attempted. However it is not always possible to check for the existence of deterministic dynamics in addition to the pseudo-periodic behavior. The algorithm SSS (Small Shuffled Surrogate) proposed by Nakamura and Small11 is perhaps the sole method available for such a purpose. The goal of this paper is then to propose an alternative approach to detect intra-cycle determinism in pseudo-periodic time series. In this work, the definition of intra-cycle determinism is based on whether a time series can be predicted or not6 . This definition was also used in the work by Gomes et al 12 . The background material, some definitions and the techniques borrowed from the statistical analysis of time series are given in Section II. In a statistical framework, the null hypothesis, H0 , to be tested is that the pseudo-periodic time series is random apart from its pseudo-periodic characteristic. H0 will be rejected if the time series exhibits a deterministic characteristic in the sense just defined. This will be made more precise in Section III. Experiment results are given in Section IV and the conclusions in Section V.
II.
BACKGROUND MATERIAL
In this section, a simple example is given to illustrate the problem of distinguishing the underlying dynamics in a given pseudo-periodic time series. A brief review of the surrogate method is also provided. In order to pre-process the data for further analysis, methods to remove seasonal and trend components are listed.
A.
Statement of the problem
The sole use of the autocorrelation function (ACF) or the power spectrum to detect if there is deterministic dynamics undelying the pseudo-periodic behavior of a time series is not always successful. For instance, compare time series y1 (k) and y2 (k) in Figures 1(a) and 2(a). At first sight, there is no obvious qualitative difference between them. Could the autocorrelation function or the power spectrum be used to distinguish the underlying dynamics of such time series? To answer this question, consider the pseudo-periodic component s(k)
3
with seasonal periods d1 = 13 and d2 ≈ 314.16 defined as follows s(k) = 5sin(ω1 k) − sin(ω2 k), where ω1 = 2π/d1 and ω2 = 2π/d2 . The deterministic component of one of the time series shown in the Figure 1(a) is given by the following auto-regressive, AR(3), model: x1 (k) = 0.9x1 (k − 1) + 0.02x1 (k − 2) + 0.03x1 (k − 3) + 0.2e(k), √ where e(k) is zero-mean, Gaussian white noise with variance equal to 0.04, N (0.0, 0.04). The resulting pseudo-periodic time series with short-term deterministic signature can be obtained by adding the pseudo-periodic and deterministic components as follows y1 (k) = s(k) + x1 (k).
(1)
The other pseudo-periodic time series, y2 (k), with underlying random behavior can be √ generated by adding a random time series x2 (k), with distribution N (0.0, 0.2), to the same pseudo-periodic component as shown below y2 (k) = s(k) + x2 (k).
(2)
The auto-correlation function and the power spectrum of time series y1 (k) and y2 (k) are shown in Figures 1(c)-(d) and Figures 2(c)-(d), respectively. Note that nothing can be said about the underlying dynamic behavior of the series. The linear techniques used in this example were not adequate to distinguish the underlying dynamic behavior. The aim of this work is to detect intra-cycle deterministic dynamics. In other words, the proposed method should distinguish between y1 (k) and y2 (k). A review of the surrogate method on which the method is based is given next.
B.
Hypothesis test based on surrogate data
The hypothesis test based on surrogate data is widely used to detect nonlinearity and determinism in time series13,14 . The hypothesis test consists of generating a set of artificial time series that preserves some statistical characteristics of the time series and destroy other statistical characteristics14 . The surrogate data set should be consistent with the null
4
(a)
(b) 10
10
10
y 1 (k)
8
1 1 3
10
5
1 3 1 4 .1 5 6
10
0 4
10
ï5
ï10 0
2
10
0
100
200
300
k
400
10
0
0.05
0.1
H z
0.15
0.2
0.25
(c) 1 0.8 0.6 0.4 0.2 0 ï0.2 ï0.4 ï0.6 ï0.8 ï1 0
314
τ
628
FIG. 1: (a) Pseudo-periodic time series with short-term deterministic dynamics, y1 (k) in (1). (b) Power spectrum of y1 (k) with periods d1 = 13 and d2 = 100π . . .. (c) ACF of y1 (k).
hypothesis H0 of interest.1 To carry on the hypothesis test, some statistical discriminant must be applied to the time series and to the surrogate data set. If the discriminant factor of the time series is significantly different from that one obtained from the set of surrogate data, the null hypothesis H0 can be rejected as the generating mechanism of the original time series. If the opposite happens, then nothing can be said about the generating mechanism of the time series. Few methods were developed to detect determinism in pseudo-periodic time series. The 1
H0 attempts to define the nature of the underlying process that generated the original time series under analysis. The statistical framework under H0 establishes whether the process can or cannot be appropriately explained by the available data.
5
(a)
(b) 10
10
10
y 2 (k)
8
10
5
1 3 1 4 .1 5
1 1 3
6
10
0 4
10
ï5
ï10 0
2
10
0
100
200
300
k
400
10
0
0.05
0.1
H z
0.15
0.2
0.25
(c) 1 0.8 0.6 0.4 0.2 0 ï0.2 ï0.4 ï0.6 ï0.8 ï1 0
314
τ
628
FIG. 2: (a) Pseudo-periodic time series with underlying random behavior y2 (k) in (2). (b) Power spectrum of y2 (k) with periods d1 = 13 and d2 = 100π . . .. (c) ACF of y2 (k).
algorithm CSS (Cycle Shuffled Surrogate) proposed by Theiler et al.8 searches for inter-cycle determinism by shuffling the seasonal cycle within a time series. The algorithm PPS (PseudoPeriodic Surrogate) proposed by Small et. al15 also searches for inter-cycle determinism, but the authors use different embeddings of the time series to generate the surrogate data. The null hypothesis, H0 , in those methods is different from H0 used in this work because they search for inter-cycle determinism rather than intra-cycle determinism. The algorithm SSS (Small Shuffled Surrogate) proposed by Nakamura and Small 11 aims to detect intra-cycle determinism by shuffling of the pseudo-periodic time series locally; the same goal of the method proposed in this work. As shown in paper11 the algorithm SSS has been successfully applied to several cases such as the cobalt data and daily sunspots numbers. However, SSS can fail to detect determinism in pseudo-periodic time series with 6
different underlying behavior. For instance, the algorithm SSS is applied to the time series y1 (k) and y2 (k) shown in Figures 1(a) and 2(a), respectively. For the time series y1 (k) with 4000 observations, a set of 400 surrogate data was generated using the algorithm SSS11 . Figure 3 (a) shows one example of such surrogate series. One of the discriminant factors used in11 is the ACF. In the same paper, the authors show that if ACF of the original series fall within the distributions of the surrogate data they consider that the original and the surrogate data may come from the same population and then the surrogate null hypothesis may not be rejected. This procedure was also followed in this work. The ACF of y1 (k) and the ACF of the 400 surrogate time series are shown in Figure 3(b). Note that the ACF of the time series y1 (k) is different from the ACF of the surrogate data sets. Therefore, it can be concluded that the pseudo-periodic time series y1 (k) exhibits short-term determinism. For the pseudo-periodic time series with underlying random behavior y2 (k) shown in Figure 2(a), a set of 400 surrogate data was generated. The ACF of both y2 (k) and the surrogate data are shown in Figure 4(b). It can be noticed that the ACF of y2 (k) is also different from the ACF of surrogate data sets which implies that the underlying behavior of y2 (k) is different from the surrogate data. In other words, it can be inferred that y2 (k) has short-term determinism in addition to the pseudo-periodic behavior. This result is not in agreement with what is known about y2 (k). It is worth pointing out that, in order to detect determinism in pseudo-periodic time series, the three aforementioned methods - CSS, PPS and SSS - generate surrogate data in a different fashion when compared to the work of Theiler8 . However these methods were developed to be directly applied to the pseudo-periodic time series avoiding the necessity of pre-processing the time series under investigation. In this work, the test of determinism combines some available and well established methods in the literature to generate the surrogate data8 . In order to use the proposed algorithm, the pseudo-periodic time series must be pre-processed, since the aforementioned methods cannot be applied to periodic time series. In particular, it is necessary to extract pseudo-periodic and trend components of the pseudo-periodic time series. In the literature of time series, traditional approaches for modeling seasonal time series are to remove the seasonal components using seasonal adjustment methods and then the models are scaled back using the estimated seasonal components for forecasting purposes. 7
(a)
(b)
8
1.5
original 6
ï x ï original
surrogate
ï o ï surrogate
1
4 2
0.5
0 0
ï2 ï4
ï0.5
ï6 ï8 0
5
10
15
observations
20
25
30
ï1 0
10
20
lags
30
40
50
FIG. 3: (a) Pseudo-periodic time series with short-term deterministic dynamics, y1 (k) in (1) in (-x-), one of surrogate data (-o-). (b) ACF of y1 (k) and of the surrogate data sets. (a)
(b) 1.5
8
ï x ï original
original surrogate
6
ï o ï surrogate
1
4 0.5
2 0
0
ï2 ï0.5
ï4 ï6 0
5
10
15
observations
20
25
30
ï1 0
10
20
lags
30
40
50
FIG. 4: (a) The pseudo-periodic time series with underlying random behavior y2 (k) in (2) in (-x-) and, one of the surrogate data (-o-). (b) ACF of y2 (k) and of the surrogate data sets.
To the best of our knowledge there is no mathematical proof that preprocessing does not affect the underlying dynamics. For a recent debate on whether these adjustment methods affect directly the deterministic components or not, please see16 . One of the most used techniques in financial time series to extract seasonal and trend components is known as X-11-ARIMA17 and its extended version X-12-ARIMA18 . Ano8
ther well-known method used to reduce seasonal components is the moving average seasonal filter19 . This filter will be used in this work and will be reviewed in Section II C.
C.
Reduction of seasonal and trend components
A time series may be represented as the combination of four distinct components: trend, cyclical, seasonal and residual19 . In order to isolate these components, several procedures to extract them from a time series have been developed. This section aims at briefly describing a method to estimate and reduce trend and seasonal components. As a result a residual time series to which the determinism test will be applied is obtained. Quite possibly (the degree of uncertainty is imposed by the user) the original pseudo-periodic time series has intra-cycle determinism behavior if determinism is found in the residual time series. In the statistical framework, the null hypothesis can be safely rejected.
D.
The seasonal moving average filter method (SMAF)
The pseudo-periodic time series can be represented by the following classic additive decomposition model19 : y(k) = m(k) + s(k) + x(k), k = 1, . . . , n,
(3)
where n is the number of observations, m(k) is the trend component, s(k) is the seasonal component with seasonal period d, and x(k) is the residual component. The components m(k), s(k) and, x(k) in (3) will be estimated by following a sequence of steps, detailed below. Considering the original time series y(k) with n observations, the trend component over the sliding window, m ˆ 0 (k), can be estimated by applying a moving average filter as follows (m(k) ˆ will be estimated later) m , for d even, then q = d/2, ˆ o (k) = [0.5y(k−q)+y(k−q+1)+...+y(k+q−1)+0.5y(k+q)] d q % 1 ˆ o (k) = 2q+1 y(k − i), for d odd, then q = (d − 1)/2, m i=−q
where q < k ≤ (n − q).
9
(4)
The next step consists of removing from the data the trend component, that is, α(k) = y(k) − m ˆ 0 (k), q < k ≤ (n − q). Because of edge effects due to the moving average filter (4), m ˆ 0 (k) and α(k) are pruned at each end, notice that such series exist for q < k ≤ (n − q). Subsequently, it is desired to find the “average cycle” among all the complete cycles within α(k). This average is a sample average, that is, we take the average of all the first values of each cycle in α(k), then the second values are considered, and this is repeated until the dth values (the last values) of each complete cycle in α(k). Mathematically, this can be expressed as jc 1& w(k ˆ 0) = α((j − 1)d + k0 ), k0 = 1 . . . d, jc j=1
(5)
where jc is the number of complete cycles in α(k). Because w(k ˆ 0 ) can have an offset, w, this is removed thus β(k0 ) = w(k ˆ 0 ) − w,
k0 = 1, . . . , d.
(6)
Now a periodic time series, composed of the average zero-mean cycle β(k0 ), has to be produced. This is done as sˆ(k0 ) = β(k0 ), k0 = 1, . . . , d sˆ(d + k0 ) = β(k0 ), k0 = 1, . . . , d sˆ(2d + k0 ) = β(k0 ), k0 = 1, . . . , d sˆ((j − 1)d + k0 ) = β(k0 ), k0 = 1, . . . , d and j = 1, . . . , jc .
(7)
The time series without the seasonal component is obtained as ˆ = y − sˆ, D
(8)
where it is important that the data y and sˆ be in phase. Now we are able to estimate m(k). ˆ Notice that m ˆ 0 (k) is the trend component along the ˆ sliding window and not of the time series as a whole. m(k) can be estimated from D(k) as m(k) ˆ = a0 + a1 k + a2 k 2 + . . . + aj k j . 10
(9)
where j is the degree of the polynomial chosen by visual inspection. Finally, the residual time series can be obtained as: xˆ = y − m ˆ − sˆ.
(10)
In equations (8) and (10) the time index k was omitted to call attention to two facts. First, y is longer than sˆ. Although m(k) ˆ can be, in principle, evaluated at any value of k, it is ˆ which is shorter strongly recommended that k should range only within the limits of D, than y. Second, y and sˆ must be in phase. In practice, these remarks become obvious and are easily implemented.
E.
Application of the SMAF method to reduce seasonal and trend components
The SMAF method described in the section II C is applied to the two simulated pseudoperiodic time series, y1 (k) and y2 (k), shown in Figures 1(a) and 2(a) respectively, and to four experimental pseudo-periodic time series, namely: • yC (k) is the reported monthly cases of mumps during 1928-1972 in New York City, with seasonal period d = 12. The corresponding residual time series is represented by xˆC (k), shown in Figure 6(a) and (b), respectively. • yL (k) is the monthly observations of milk production during 1962-1975. This time series has the seasonal period d = 12. The corresponding residual time series is represented by xˆL (k), shown in Figures 7(a) and (b), respectively. • yQ (k) is the hourly consumption of electric power (electric load) with seasonal period d = 24. The corresponding residual time series is represented by x ˆQ (k), shown in Figures 8(a) and (b), respectively. • yE (k) is the monthly consumption of electric power (electric load) with seasonal period d = 12. The corresponding residual time series is represented by x ˆE (k), shown in Figures 9(a) and (b), respectively. For the simulated pseudo-periodic time series, y1 (k) and y2 (k), depicted in Figure 1 and 2 respectively, the corresponding residual time series, x ˆ1 (k) and xˆ2 (k) are shown in Figure 5(a) and (d), respectively. 11
From Figures 5(a) and (d) it can be noticed that the SMAF method has successfully removed the seasonal component. Therefore, the resulting residual time series, xˆ1 (k) and xˆ2 (k), do not show seasonal behavior with respect to d1 = 13 and d2 ≈ 314.15, as can be seen in the power spectrum and the ACF of residual time series (Figures 5(b)–(c) and Figures 5(e)–(f), respectively.). Therefore these results show that the procedure of removing the cycles did not affect the deterministic component. The SMAF method for reduction of pseudo-periodic component has also a good performance when used to reduce seasonal and trend components of the experimental pseudoperiodic time series yC (k), yL (k), yQ (k) and yE(k). From the analysis of the ACF and power spectrum shown in Figure 6 (c) and (d), it can be verified that the seasonal component with d1 = 12 and d2 = 31 of the mumps time series, yC (k), were removed. The same occurs with the time series of milk, yL (k), with with d = 12, as shown in Figure 7 (c) and (d). Likewise, in the case of the load time series yQ (k), the seasonal component with d1 = 168 was strongly reduced. However a lesser reduction of the seasonal component with d2 = 24 can be observed (see Fig. 8 (c) and (d)). In principle, unremoved portions of the seasonal component can affect the performance of the proposed method, although this seems not to have been the case in the examples given here, as argued in Section IV. The reduction of the pseudo-periodic component in the time series yE (k) also had a good performance as it can be seen from the analysis of the ACF and power spectrum shown in Figure 9 (c) and (d), it can be verified that the seasonal component with d = 12 of the electric load time series, yE (k), were removed.
III.
METHODOLOGY
In order to evaluate whether there is a deterministic feature in the original time series a discriminating statistic must be chosen. Since the objective of this work is to verify the existence of intra-cycle determinism in pseudo-periodic time series based on the existence of some predictability, the original time series and the surrogate data set are submitted to NARMA (Nonlinear Autoregressive and Moving Average model) polynomial modeling20–23 and subsequent prediction. To quantify the forecasting performance obtained, each time series used in the test (ori12
(a)
(b)
(c)
10
6
10
4
10
8
2
1.2
x ˆ1 (k)
1
1 1 3 1 3 1 4 .1 5
6
0.8
10
0.6
0
4
10
0.4
ï2
2
10
0.2
ï4
0
0
10
ï6 0
100
200
300
k
0
400
0.05
0.1
(d)
H z
0.15
0.2
0.25
ï0.2 0
(e)
314.15
τ
628.31
(f)
10
4
10
3
1.2
x ˆ2 (k)
8
10
1 3 1 4 .1 5
2
1
1 1 3
0.8
6
10
1
0.6
0
4
10
0.4
ï1 2
10
ï2 ï3 ï4 0
0.2 0
0
10
100
FIG. 5:
200
k
300
400
500
0
0.05
0.1
H z
0.15
0.2
0.25
ï0.2 0
314.15
τ
628.31
Simulated pseudo-periodic time series. (a) Residual time series x ˆ1 (k). (b) Power spectrum of the residual time series
x ˆ1 (k) and, in (c) ACF of x ˆ1 (k). (d) Residual time series x ˆ2 (k). (e) Power spectrum of the residual time series x ˆ2 (k) and, (f) ACF of x ˆ2 (k). The interval indicated by dashed lines corresponds to the 95% confidence level.
ginal times series and the surrogates) is divided into Nw intervals (windows) of equal length. The forecast error index mRSE (Mean Root Square Error) is calculated over each window as follows Nw ' (yw (i) − yˆw (i))2 1 & ' , mRSE(i) = Nw w=1 (yw (i) − y w )2
(11)
where i is the forecasting step and can range from 1 to h, where h is the maximum forecasting horizon. w = 1 . . . Nw is the window used for forecasting, where Nw is the maximum number of windows for which the average is calculated. In this work, Nw = 10. yw (i) refers to time series in the interval (window) w in the instant i, yˆw is the predicted value of time series in the interval w obtained from the model and y¯w is the mean value over the window w.
13
(a)
(b) 1500
yC (k)
2500
x ˆC (k)
1000 2000 500
1500
1000
0
500 ï500 0 0
100
200
300
400
500
0
100
200
300
k
k
(c)
(d)
1.5
400
500
12
10
- . - yC(k) –o– x ˆC(k)
1
x ˆC (k)
yC (k)
10
10
8
10
0.5 6
10
0 4
10
ï0.5 0
2
50
100
150
200
10
o
FIG. 6:
0
0.1
0.2
Hz
0.3
0.4
0.5
Experimental pseudo-periodic: (a) time series of the monthly cases of mumps with d = 12 yC (k), and (b) corre-
sponding residual series x ˆC (k). (c) ACF and (d) power spectrum of the time series yC (k) (- -) and x ˆc (k) (—).
A.
The proposed test for detecting determinism
The proposed method for testing determinism in time series is based on the idea that if the time series is predictable to some extent, it exhibits some determinism12 . In other words, predictability (even limited) is a signature of determinism. In paper12 the seasonal component of the time series was not extracted; The FT and AAFT algorithms8 were used to generate the surrogate data set and; the index RMSE were used and it is calculated for every window. The proposed algorithm for testing intra-cycle determinism can be summarized as: 1 - from y(k), estimate and remove the seasonal and trend components, yielding xˆ(k); 2 - from xˆ(k), generate the surrogate time series;
14
(a) 1100
(b) 50
yL (k)
1050
x ˆL (k)
40
1000 30
950 900
20
850 10 800 0
750 700
ï10
650 ï20
600 550 0
20
40
60
80
k
100
120
140
160
ï30 0
20
40
60
(c)
80
k
100
120
140
160
(d) 10
10
1.4
x ˆL (k)
- . - yL(k)
1.2
yL (k) 8
–o– x ˆL(k)
1
10
0.8 6
10
0.6 0.4
4
10
0.2 0
2
ï0.2
10
ï0.4 0
50
100
150
0
o
FIG. 7:
0.1
0.2 H
z
0.3
0.4
0.5
(a) Experimental pseudo-periodic time series of the monthly milk production with d = 12 yL (k), and (b) corre-
sponding residual series x ˆL (k). In (c) ACF and (d) power spectrum of the time series yL (k) (- -) and x ˆL (k) (—).
3 - build mathematical models from xˆ(k) and the surrogate time series; 4 - using the models just built, forecast from 1-to-h steps ahead and calculat the forecasting error index mRSE given by equation (11); 5 - test of the null hypothesis H0 . In this paper, step 1 is performed using the seasonal moving average filter (SMAF) method described in Section II C.
15
(a) 7000 6500
(b) 1500
yQ (k)
x ˆQ (k)
1000
6000 5500
500 5000 4500 0 4000 3500
ï500
3000 2500 0
1000
2000
3000
k
4000
5000
ï1000 0
1000
2000
(c)
3000
k
4000
5000
(d)
1.5
14
10
- . - yQ(k)
x ˆQ (k)
yQ (k)
12
10
–o– x ˆQ(k)
1
10
10
0.5
8
10
6
10
0
4
10
ï0.5 0
2
50
100
150
200
250
300
10
o
FIG. 8:
0
0.1
0.2
Hz
0.3
0.4
0.5
(a)Experimental pseudo-periodic time series of electric load time series with d = 24 yQ (k), and (b) corresponding
residual series x ˆQ (k). (c) ACF and (d) power spectrum of the time series yQ (k) (- -) and x ˆq (k) (—).
The second step is performed using the Shuffled Algorithm2 to generate the surrogate time series described in paper8 . In the third step, xˆ(k) and the surrogate time series are divided into two windows: one for identification, another for validation. This step is performed as described in12 . It should be noticed that other modelling techniques can be used here. After building the models from both the original time series and the surrogate series, 1-to-h step-ahead predictions are determined over the validation data. The mRSE in (11) is calculated for both the original time series and the surrogate data set with respect to each 2
It is important to emphasize that the Random Shuffled Algorithm proposed in8 was not developed for time series with pseudo-periodic behavior. Therefore, to use this algorithm in pseudo-periodic time series the seasonal component of the time series must be reduced as much as possible.
16
(a)
(b)
0.1
0.1
0.08
yE (k)
0.08
0.06
x ˆE (k)
0.06
0.04
0.04
0.02 0.02 0 0 ï0.02 ï0.02
ï0.04
ï0.04
ï0.06
ï0.06
ï0.08 ï0.1 0
20
40
60
k
80
100
ï0.08 0
20
40
(c)
60
80
k
100
(d) 2
1.5
10
1
- . - yE (k)
10
–o– x ˆE(k)
10
x ˆE (k) yE (k)
1
0
ï1
0.5
10
ï2
10
0 ï3
10
ï0.5 0
ï4
20
40
60
80
100
10
o
FIG. 9:
0
0.1
0.2
Hz
0.3
0.4
0.5
(a) Experimental pseudo-periodic time series of the monthly eletrical load with d = 12 yE (k), and (b) corresponding
residual series x ˆE (k). In (c) ACF and (d) power spectrum of the time series yE (k) (- -) and x ˆE (k) (—).
prediction step. The mRSE and the confidence interval obtained from the predictions for surrogate data sets are plotted and compared with the performance of the predictions over the validation window of xˆ(k). In the determinism framework established in this paper, and assuming that pseudoperiodic component was removed from the time series under investigation, y(k), the null and alternative hypotheses can be defined with respect to the index mRSE as follows: • H0 : xˆ(k) is random. H0 cannot be rejected if mRSE ≈ 1. • H1 : xˆ(k) displays short-term determinism. H0 can be rejected and H1 accepted if mRSE1.
B.
Confidence Intervals
In order to define the degree of confidence of the test, confidence intervals are calculated. To this end, it is necessary to know the distribution of the mRSE index. Several examples show that the mRSE index does not have a normal distribution for short prediction horizons. See for instance Figure 10. In this particular case, a set of 1000 surrogate sequences was generated and a model was fitted to each one of the surrogate time series. Each model was used to predict from i = 1, 2, 3, 4 steps ahead over windows of length h. All our examples use h = 10. At each forecasting step the prediction error of each model was computed and. the mRSE index was calculated. For i = 1 and i = 2 the distribution of the index mRSE(i) is closer to a chi-squared distribution than to a normal distribution. See Figures 10 (a) and (b), respectively. As the prediction horizon increases, the distribution of the index mRSE(i) approaches a Normal distribution as shown in Figures 10 (c) and (d). This is an interesting result. The change in the distribution of mRSE(i) as i increases seems to indicate the loss of predictability when the distribution gets closer to a Normal distribution. Since the distribution of the mRSE(i) changes for different prediction horizons, the confidence interval in this work is calculated using the interval among percentis24 and ignoring the underlying distribution. The maximum confidence level is established gradually by increasing the confidence level of the determinism test starting from the initial value of 50% up to the point for which H0 cannot be rejected anymore. It should be noticed that to reject H0 at 50% confidence is equivalent to decide between the null and the alternative hypotheses based on a coin toss. So, it is expected that H0 should be rejected at a much higher confidence level than 50%.
C.
Performance of intra-cycle determinism test
The proposed determinism test was applied to the two simulated pseudo-peridodic time series, that is, y1 (k) given by equation (1) and y2 (k) given by equation (2), shown in Figures 1(a) and Figures 2(a), respectively.
18
(a)
(b)
70
(c)
180
(d)
250
350
160
60
300 200
140
50
250 120 150
40
100
30
80
200 150
100 60
20
100 40
10
50 50
20
0
1
1.5
2
mRSE(1)
2.5
0
0.8
1
1.2
1.4
1.6
1.8
2
0
0.8
1
1.2
1.4
mRSE(3)
mRSE(2)
1.6
0
0.8
1
1.2
1.4
1.6
1.8
mRSE(4)
FIG. 10: Histogram of mRSE(i) calculated for the forecasting horizon of i = 1 . . . 4 steps-ahead from surrogate data derived from the deterministic time series.
The results of intra-cycle determinism in y1 (k) and y2 (k) are shown in Figure 11 (a) and (b), respectively. As stated before, the proposed method applies a random shuffle surrogate test to the residual time series xˆ1 (k) and xˆ2 (k). From Figure 11 (a) it can be noticed that the index mRSE from xˆ1 (k) is smaller than the mean mRSE obtained from surrogate data set. The test suggests that H0 can be rejected and H1 accepted at the confidence level of 96%. It should be noticed that in Fig. 11(a) the dotted lines show the limitting case, that is, if the confidence interval is increased any further, H0 cannot be rejected anymore. In Fig. 11(a) the limitting value is 96%. This means that xˆ1 (k) is likely to have short-term determinism and consequently y1 (k) has an intra-cycle deterministic characteristic. The determinism test for the residual time series xˆ2 (k) shows that the mRSE of xˆ2 (k) is closer to the mRSE of the set of surrogate time series and H0 cannot be rejected at the standard level of confidence (95%). Therefore it is very unlikely that y2 (k) has any intra-cycle deterministic signature. These results show that the proposed procedure works well in this simulated example. Next section the procedure will be tested in real data sets.
IV.
EXPERIMENTAL RESULTS WITH PSEUDO-PERIODIC TIME SERIES
In this section, four experimental pseudo-periodic time series are used: the time series of production of milk (N=168 observations)25 ; the time series of indexes of mumps occurrence (N=528 observations)26 ; the hourly time series of electric power consumption (electric load,
19
5 0 ï5 ï10
(b) 5 original
original
(a)
500
0 ï5
1000 1500 2000 2500 3000 3500 4000
500
1000 1500 2000 2500 3000 3500 4000
5 0 ï5 ï10
k
5
surrogate
surrogate
k
500
1000 1500 2000 2500 3000 3500 4000
0 ï5
500
1000 1500 2000 2500 3000 3500 4000
k
k
2 RMSE
mRSE
2 1 0
2
4
i
6
8
1 0
10
2
4
i
6
8
10
FIG. 11: Result of the determinism test applied the residual series: in (a) x ˆ1 (k), where the test suggests that H0 can be rejected (and H1 accepted) at a confidence level of 96%. The dotted lines show the 96% confidence bands. In (b) x ˆ2 (k), H0 cannot be rejected even at the standard confidence level of 95%. The upper plot shows the residual time series, the middle plot shows one of the surrogate data, and the under plot shows the mRSE(h): where mRSE(h) for residual time (—), mean mRSE(h) for surrogate data (- -), and confidence interval of the mean mRSE for the surrogate data (· · · ).
N=5616 observations)3 , and the monthly time series of electric load (N=116 observations) (COLCOAR A REFERENCIA). The application of the SMAF filter to these data was described in Section II C.
A.
Monthly cases of mumps
The determinism test based on surrogates data is applied to the residual series x ˆC (k) shown in Figure 6(b) and the result is shown in Figure 12(a). The determinism test indicates that H0 can be rejected (and H1 accepted) at a confidence level of virtually 100%. This result implies that the residual time series xˆC (k) is predictable up to 2 steps ahead when the seasonality has been removed from yC (k). Consequently, assuming the seasonality has been removed yC (k) is very likely to have intra-cycle deterministic features. 3
The National Electricity Market Management Company (NEMMCO).
20
B.
Monthly milk production time series
The determinism test based on surrogates data is applied to the residual series x ˆL (k) shown in Figure 7(b), and the result of the determinism test shown in Figure 12(b) indicates that H0 can be rejected and H1 accepted at a confidence level of practically 100%. This result implies that such residual time series xˆL (k) is predictable up to 5 steps ahead and consequently yL(k) is very likely to have intra-cycle deterministic behavior.
C.
Hourly Electric load
The result of the determinism test applied to the residual time series xˆQ (k) depicted in Figure 8(b) is shown in Figure 12(c) and suggests that H0 can be rejected and that H1 accepted with practically 100% confidence. This result implies that the residual time series xˆQ (k) is predictable and consequently xˆQ (k) is very likely to have intra-cycle deterministic features when the seasonality is removed.
D.
Monthly eletric load
The result of the determinism test applied to the residual time series xˆE (k) depicted in Figure 9(b) is shown in Figure 13 and suggests that H0 can not be rejected with practically 100% confidence. This result implies that the residual time series xˆE (k) is not predictable and consequently xˆE (k) is not have intra-cycle deterministic features when the seasonality is removed.
V.
CONCLUSIONS
In this work, the intra-cycle determinism test has been proposed. The method deals with the problem of finding dynamics underlying a pseudo-periodic component. To this end, the time series under investigation should have the seasonality and trend removed. When the method was applied to simulated data with known characteristics determinism was successfully detected. In real data, where short-term intra-cycle determinism seems a reasonable hypothesis but cannot be asserted, the proposed method was also able to detect determinism. In the examples given in this work the indication of determinism intra-cycle 21
2000 1000 0 ï1000
(b) 50 original
original
(a)
100
200
300
400
0 ï50
500
20
40
60
80
2000 1000 0 ï1000
100
140
160
200
300
400
100
120
140
160
0 ï50
500
20
40
60
80 k
2 mRSE
2 mRSE
120
50
k
1 0
100
k
surrogate
surrogate
k
2
4
i
6
8
1 0
10
2
4
i
6
8
10
(c) original
500 0 ï500 4000
4050
4100
4150
4200
4150
4200
surrogate
k
1000 500 0 ï500 4000
4050
4100
mRSE
k
2 1 0
2
4
i
6
8
10
FIG. 12: (a) Result of the determinism test applied the residual series: in (a) of the x ˆC (k); in (b) of the x ˆL (k); and in (c) of x ˆQ (k). For the three time series —ˆ xC (k), x ˆL (k), and x ˆQ (k)— the dotted lines indicate the 95% confidence bands. The determinism test indicates that H0 can be rejected and H1 accepted with practically 100% confidence. In both the figures, the upper plot shows the residual time series x ˆL (k), the middle plot shows one of the surrogate data set, and the under plot shows the mRSE(h): in (- -) mean mRSE(h) for surrogate data set, in (· · · ) confidence interval of 95% of the mean mRSE for the surrogate data set, and in (—) mRSE(h) for both residual time series x ˆC (k), x ˆQ (k) and x ˆL (k).
implies that, in real time series such as milk and mumps, the best predictor is not the mean cycle model (known as the trivial model) and therefore modeling techniques are worth investigating. Consequently a tool to help decide when is modeling worthwhile and when is not is of great practical importance. Finally the proposed method comes as an addition to
22
Série original
0.1 0 ï0.1
20
40
60
80
100
80
100
Série subïrogada
k
0.1 0 ï0.1
20
40
60
indice RMSE
k
2 1 0
2
4
h
6
8
10
FIG. 13: (a) Result of the determinism test applied the residual series of x ˆE (k), where the dotted lines indicate the 95% confidence bands. The determinism test indicates that H0 can not be rejected with practically 100% confidence. In figure, the upper plot shows the residual time series x ˆE (k), the middle plot shows one of the surrogate data set, and the under plot shows the mRSE(h): mRSE(h) for x ˆE (k) (—), mean mRSE(h) for surrogate data set (- -), and confidence interval of 95% of the mean mRSE for the surrogate data set (· · · ).
the few methods available in the literature for dealing with pseudo-periodic time series.
1
P. Grassberger and I. Procaccia (1983). Physical Review Letters, 50(5):346–349.
2
J. P. Eckmann and D. Ruelle (1985). Reviews of Modern Physics, 57(3):617–656.
3
J. P. Eckmann and S. O. Kamphorst and D. Ruelle and S. Ciliberto (1986). Physical Review A, 34(6):4971–4979.
4
L. Trulla and A. Giuliani and J. P. Zbilut and C. L. Webber Jr (1996). Physics Letters A, 223(4):255–260.
5
N. Marwan and M. C. Romano and M. Thiel and J. Kurths (2007). Physics Reports, 438:237– 329.
6
G. Sugihara and R. M. May (1990). Nature, 344(6268):734–741.
7
M. Kennel and S. Isabelle (1992). Physical Review A, 46(6):3111–3118.
8
J. Theiler and S. Eubank and A. Longtin and B. G. J. D. Farmer (1992). Physica D: Nonlinear Phenomena, 58:77–94.
9
D. T. Kaplan and L. Glass (1992). Physical Review of Letters, 68(4):427–430.
23
10
L. A. Aguirre and D. D. Rodrigues and S. T. Lima and C. B. Martinez (2008). Electrical Power and Energy Systems, 30:73–82.
11
T. Nakamura and M. Small (2005). Physical Review E, 72(056216):1–5.
12
M. E. D. Gomes and A. V. P. Souza and H. N. Guimar˜ aes and L. A. Aguirre (2000). Chaos, 10(2):398–410.
13
T. Schreiber (2000). Physica D: Nonlinear Phenomena, 142(3-4) pages 346–382.
14
M. Small (2005). Nonlinear Science Series A, vol. 52. World Scientific.
15
M. Small and D. Yu and R. G. Harrison (2001). Physical Review Letters, 87(18):188101.
16
G. P. Zhang and M. Qi (2005). European Journal of Operational Research, 160, 501-514.
17
J. Shiskin and A. H. Young and J. C. Musgrave (1967). Technical paper no. 15, Department of Commerce, Bureau of Economic Analysis.
18
D. F. Findley and B. C. Monsell and W. R. Bell and M. C. Otto and B. C. Chen (1998). program. Journal of Business & economic statistics, 16(2):127–152.
19
P. J. Brockwell and R. A. Davis (2002). second edition , Springer-Verlag, New York.
20
I. J. Leontaritis and S. A. Billings. (1985a). Int. J. Control, 41(2):303–328.
21
I. J. Leontaritis and S. A. Billings (1985b). Int. J. Control, 41(2):329–344.
22
S. Chen and S. A. Billings (1989). Int. J. Control, 49(3):1013–1032.
23
S. Chen and S. A. Billings and W. Luo (1989). Int. J. Control, 50(5):1873–1896.
24
J. D. Petruccelli and B. Nandram and M. Chen (1999). Upper Saddle River - NJ, USA: Prentice Halll, Inc.
25
J. D. Cryer (1986). Duxbury Press, Boston.
26
K. W. Hipel and A. I. McLeod (1994). Elsevier.
24