Extracting Driving Signals from Non-Stationary Time Series

M. I. Széliga, P. F. Verdes, P. M. Granitto, H. A. Ceccatto
Instituto de Física Rosario, CONICET-UNR, Boulevard 27 de Febrero 210 Bis, 2000 Rosario, República Argentina

Abstract

We propose a simple method for the reconstruction of slow dynamical perturbations from non-stationary time series records. The method traces the evolution of the perturbing signal by simultaneously learning the intrinsic stationary dynamics and the time dependence of the changing parameter. For this purpose, an extra input unit is added to a feedforward artificial neural network and a suitable error function is minimized in the training process. Tests of our algorithm on synthetic data show its efficacy and allow us to extract general criteria for applications to real-world problems. Finally, a preliminary study of the well-known sunspot time series recovers particular features of this series, including recently reported changes in solar activity during the last century.
1. Introduction

Most real-world time series have some degree of non-stationarity due to external perturbations and/or changes in the internal parameters of the observed system. Furthermore, natural dynamics are often complex enough to comprise multiple time scales, so that for short observational periods the effective degrees of freedom with the largest scales act as external perturbations on the fastest observed modes. In spite of this, nonlinear methods for time series analysis, including most of the techniques developed from the theory of dynamical systems [1], mostly rely on the stringent condition of stationarity. In recent years, however, increasing effort has been devoted to devising methods for non-stationary time series analysis [2,3], a problem that presents many theoretical and practical challenges. Some recent works have addressed the question of the proper characterization of non-stationarity [4], caused either by slow continuous perturbations (driving forces) or by abrupt discrete changes in the dynamics on very short time intervals (change-point detection [5]). Very recent works have also extended delay-embedding ideas and used them to cope with non-stationarity [6]. In addition, methods have been proposed for applications that range from monitoring physiological and mechanical signals [7] to extracting messages from a chaotic
background [8]. Non-stationary time series analysis is also of major relevance for ecosystem modeling [9] and population dynamics [10] under changing environmental conditions. In this work we focus on the accurate reconstruction of the driving parameters responsible for non-stationary behavior in time series. This problem has already been discussed by Casdagli [2] using recurrence plots but, as this author himself states, his method is not very accurate. Moreover, it is in general difficult to extract parameter variations from recurrence plots, although qualitative information can often be inferred. A more efficient method, proposed by Schreiber [3], is based on the calculation of a cross-prediction error dissimilarity matrix and a clustering algorithm to define coordinates in parameter space. In a previous work [11], we introduced a simpler method that produced very accurate results on computer-generated data and was able to predict interesting features of real-world data. Here we propose yet another method to trace slow parameter variations from the non-stationary dynamics of complex systems, and assess its performance by applying it to synthetic data. Furthermore, we present results of a concrete application to the real-world sunspot time series that agree qualitatively with observed changes in solar dynamics during the last century [12].
2. Learning Driving Signals

Consider an observational record D = {x_t, t = 1, ..., N} corresponding to a process generated by a deterministic system. We model this process in a d-dimensional pseudo-phase space according to

x_{t+1} = f(x_t, α_t) + ε_t.

Here x_t = (x_t, x_{t−1}, ..., x_{t−d+1}), time t is measured in units of the lag τ between observations, and ε_t is some residual additive noise of zero mean. The parameter α_t accounts for the effects of either a slow external perturbation acting on the system or internal parameters or degrees of freedom varying on large time scales T >> τ not modeled by f. We want to reconstruct both the intrinsic dynamical function f and the non-stationary signal α_t from the available data D. To this end we propose the following algorithm:

1. Train the artificial neural network (ANN) shown in Fig. 1, minimizing with respect to the weights and biases w the normalized error

E_mod = (1/(N σ_x²)) Σ_{t=1}^{N−1} [x_{t+1} − o(x_t, α_t = 0; w)]².

Here o(·; w) is the network’s output and σ_x² the data variance. Notice that in this step the input α_t corresponding to the extra neuron in the input layer is set to 0. Furthermore, although some validation may be required to avoid overfitting, for simplicity we will assume here that the chosen ANN architecture is not flexible enough to account for noise, so that the training process can be carried out until convergence.
Figure 1. Feedforward artificial neural network with the extra input unit used to model the parameter change. The inputs are the delay coordinates x_t, x_{t−1}, …, x_{t−d+1} together with the extra unit α_t, and the output o_{t+1} approximates f(x_t, α_t).
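For concreteness, the following is a minimal Python/NumPy sketch of the network of Fig. 1 and of the step-1 training. It is not the authors’ original code: the layer sizes, weight initialization, learning rate and epoch count are illustrative choices only.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class DrivenANN:
    """Network of Fig. 1: d delay inputs plus one extra input
    (the last weight column) for the driving parameter alpha_t."""

    def __init__(self, d, n_hidden, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.5, size=(n_hidden, d + 1))
        self.b1 = np.zeros(n_hidden)
        self.w2 = rng.normal(0.0, 0.5, size=n_hidden)
        self.b2 = 0.0

    def forward(self, x, alpha):
        """x = (x_t, ..., x_{t-d+1}); alpha is the scalar extra input."""
        z = np.append(x, alpha)
        self.h = sigmoid(self.W1 @ z + self.b1)   # sigma_k in the text
        return float(self.w2 @ self.h + self.b2)  # the output o_{t+1}

def train_weights(net, X, y, alpha, lr=0.05, epochs=500):
    """Stochastic backpropagation on E_mod (weights and biases only)."""
    for _ in range(epochs):
        for t in range(len(y)):
            o = net.forward(X[t], alpha[t])
            err = y[t] - o                         # x_{t+1} - o_{t+1}
            grad_h = err * net.w2 * net.h * (1.0 - net.h)
            net.w2 += lr * err * net.h
            net.b2 += lr * err
            net.W1 += lr * np.outer(grad_h, np.append(X[t], alpha[t]))
            net.b1 += lr * grad_h

Step 1 then amounts to calling train_weights(net, X, y, np.zeros(len(y))): with the extra input clamped to 0, its incoming weights receive no gradient, and only the stationary part of the dynamics is learned.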
2. Switch on the non-stationary perturbation α_t, and retrain the network minimizing the error E_tot = E_mod + λE_smooth with respect to w and the unknown α_t’s. Here
E_smooth = (1/(N σ_x²)) Σ_{t=1}^{N−1} (α_{t+1} − α_t)²
is a smoothing error term introduced to penalize sudden α_t variations (notice that we normalize it using the data variance σ_x², like E_mod). The parameter λ sets the appropriate scale between E_mod and E_smooth, as discussed below. This new error term is necessary for two reasons: First, we assumed that the non-stationary driving signal is slow in comparison with the lag τ between observations. Secondly, the smoothing error prevents a wrong adjustment of α_t that, instead of reconstructing the actual parameter drift, merely compensates the residual error left at each point by fitting f while disregarding the α_t variation.

3. Repeat the last step for different values of the scale parameter λ, and plot the minimum E_mod and E_smooth obtained in each case as a function of this relative scale. The expectation is that: i) for λ too small one will obtain rough α_t variations (highly correlated with the residual errors of step 1) and, consequently, large E_smooth and very small E_mod values (the inputs α_t will be tuned to produce ANN outputs that exactly match the targets); ii) for λ too large, E_smooth will be almost zero, α_t will be nearly constant and, consequently, E_mod will not vary much from the values obtained in step 1; iii) for intermediate values of λ there should be plateaus in E_mod and E_smooth, indicating that the results become insensitive to λ. The curves α_t vs. t in this region are practically identical to each other, and any one of them can be taken as the optimal reconstruction.

Some comments are in order at this point. First, in the implementation of the stochastic backpropagation rule the adjustment of α_t simply requires the calculation of the gradients ∂E_smooth/∂α_t = 2(2α_t − α_{t+1} − α_{t−1}) and ∂E_mod/∂α_t = −(x_{t+1} − o_{t+1}) Σ_k w_{ok} σ_k(1 − σ_k) w_{kα}, up to constant factors absorbed into the learning rate. Here k runs over hidden-layer units, w_{ok} are the weights from these units to the output one, and w_{kα} are the connections between the extra neuron for the α input and the hidden units. As usual, σ denotes the sigmoid activation function. Secondly, by a convenient rescaling of weights and shift of biases it is always possible to modify the function α_t so that its mean value is E[α] = 0 and its variance E[α²] − E²[α] = 1. Actually, this reflects the fact that the dynamical perturbation can only be reconstructed up to a linear transformation. In practice, we have determined α_t according to step 2 starting from α_t = 0 and normalizing the inputs after each epoch to fulfill these two conditions. This speeds up the learning process and sets an absolute unit in which to measure the scale λ. Notice that, due to this renormalization of α, even for very large values of λ the α_t = constant solution will never be reached; instead, one obtains flat curves with a high spike at some random point. Third, we performed step 1, in which the ANN approximately learns the global dynamics before starting to adjust α_t, to help avoid a wrong tuning of the α inputs to the errors the ANN makes while learning the targets.
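The α_t update of step 2 can then be sketched as follows, reusing the network class above. The learning rate and the boundary handling at t = 0 and t = N−1 are our own choices, and, as in the text, the 1/(N σ_x²) factors in the gradients are absorbed into the learning rate.

def update_alphas(net, X, y, alpha, lam, lr=0.05):
    """One gradient step on the extra inputs alpha_t (step 2), using the
    gradient expressions quoted above."""
    N = len(alpha)
    g = np.zeros(N)
    for t in range(N):
        o = net.forward(X[t], alpha[t])
        err = y[t] - o
        # dE_mod/d alpha_t: backpropagate through the alpha weight column
        g_mod = -err * np.sum(net.w2 * net.h * (1.0 - net.h) * net.W1[:, -1])
        # dE_smooth/d alpha_t: discrete Laplacian of the smoothness penalty
        left = alpha[t] - alpha[t - 1] if t > 0 else 0.0
        right = alpha[t + 1] - alpha[t] if t < N - 1 else 0.0
        g[t] = g_mod + lam * 2.0 * (left - right)
    alpha = alpha - lr * g
    # renormalize to E[alpha] = 0 and unit variance after each epoch,
    # as prescribed in the text
    alpha -= alpha.mean()
    alpha /= alpha.std()
    return alpha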
3. Applications

In order to check the algorithm described above, we have applied it in a controlled situation: we considered the chaotic logistic map with additive noise, y_{t+1} = r_t y_t (1 − y_t), x_{t+1} = y_{t+1} + ε_{t+1}, where ε_t is Gaussian noise with a noise-to-signal ratio of 0.1. The parameter r was slowly driven according to the law r_t = r_0 − A cos[(2πt/T) e^{−t/T}], with r_0 = 3.8, A = 0.045 and T = 50 (see Fig. 2), keeping the map in the interesting chaotic regime. Since in this artificial case we know the exact α_t ≡ r_t, we can assess the efficacy of the proposed algorithm by monitoring the reconstruction error

E_rec = (1/(N σ_r²)) Σ_{t=1}^{N} (r_t − α_t)².
To make the determination of α_t more demanding, in addition to the noise incorporated into the logistic map we considered only N = 100 iterates in the data set D. We used simple 2:3:1 architectures for the ANN and trained them with the standard backpropagation rule. Fig. 3 shows a typical evolution of the different errors as a function of the number of training epochs.
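A sketch of this synthetic setup, again reusing the helpers above: the drift law and the constants follow the text, while the seed, the initial condition y_0 and the exact noise convention (noise scaled by the signal’s standard deviation) are our assumptions.

def make_driven_logistic(N=100, r0=3.8, A=0.045, T=50.0, nsr=0.1, seed=1):
    """Logistic map y_{t+1} = r_t y_t (1 - y_t) with the slowly drifting
    parameter of the text; the observation x_t adds Gaussian noise."""
    rng = np.random.default_rng(seed)
    t = np.arange(N + 1)
    r = r0 - A * np.cos((2.0 * np.pi * t / T) * np.exp(-t / T))
    y = np.empty(N + 1)
    y[0] = 0.5
    for i in range(N):
        y[i + 1] = r[i] * y[i] * (1.0 - y[i])
    x = y + nsr * y.std() * rng.standard_normal(N + 1)
    return x, r

def embed(x, d=2):
    """Delay vectors (x_t, ..., x_{t-d+1}) and targets x_{t+1}."""
    X = np.array([x[t - d + 1:t + 1][::-1] for t in range(d - 1, len(x) - 1)])
    return X, x[d:]

def e_rec(r, alpha):
    """E_rec against the normalized true drift; it equals 1 for alpha = 0
    by construction, since alpha is fixed only up to a linear map."""
    rn = (r - r.mean()) / r.std()
    return float(np.mean((rn - alpha) ** 2))

For d = 2 the recovered inputs align with the drift values r[1:-1], so the check reads e_rec(r[1:-1], alpha).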
Figure 2. Driving parameter for the logistic map (full line) and its best reconstruction (dashed line). The gray area indicates the dispersion of 10 independent runs of our algorithm.
Figure 3. Evolution of different errors during the training process for the case of the logistic map.
Let’s discuss these evolutions in detail:
• With λ = 0, the total error E_tot decreases during training until settling at a low value. Then, at epoch 500, when we switch on the extra input and make λ ≠ 0, it jumps to a larger value that depends on λ and rapidly decreases until reaching a plateau. After epoch 1100 the training process forces a crossover to a final situation in which α_t becomes too smooth.
• E_mod is equivalent to E_tot for λ = 0. At epoch 500 the network starts making use of the new input to model the dynamics, and this error drops approximately one order of magnitude. Finally, when α_t becomes too smooth it increases up to a value slightly below the minimum error without the extra input (the final reconstructed perturbation, even if not completely correct, still helps in modeling the non-stationary dynamics).
• E_smooth is only defined after epoch 500. It starts from a large value and quickly decreases to reach a first plateau, which corresponds to the correct reconstruction of α_t. Then, for the chosen value of λ (0.1 in this case) the training process smoothes this curve too much, and E_smooth shows the final crossover to this situation.
• The error E_rec in the reconstruction of α_t, only available in this controlled situation, has the expected behavior: for λ = 0 it is equal to one by construction (we initialize α_t = 0). Then, while the algorithm seeks the appropriate parameter variation, E_rec drops by more than one order of magnitude, until after epoch 1100 it starts increasing because of the excessive smoothing of α_t discussed above.
What we have described here is the typical situation for intermediate to large values of λ. In the case of small λ’s, the training errors follow different evolutions that can be understood in a similar way. Instead of discussing these curves, notice that in Fig. 3 the profile of E_mod as a function of the training epochs mimics very closely the behavior of E_rec. This is not surprising, since one expects good reconstructions of α_t to lead to small modeling errors and vice versa. Consequently, as a practical criterion to identify the right curve in a real situation we will take the ‘optimal’ α_t’s to be the values of this extra input at the minimum of E_mod.
In Fig. 4 we show the behavior of all the errors at the minimum of E_mod as a function of λ. For instance, according to Fig. 3 the error values for λ = 0.1 correspond to the network trained for approximately 1000 epochs. Actually, the results in Fig. 4 have been obtained by training 10 different ANNs and choosing the best network (with the smallest E_mod) for each value of λ. Let’s discuss again the errors’ behaviors in detail (a sketch of this λ scan is given after this list):
• The error E_mod shows a steady rise with λ that was to be expected: for very small λ the network tunes the extra input to cancel the modeling errors, without paying much attention to the term λE_smooth since its value is negligible with respect to E_mod. On the contrary, for very large λ the parameter variation becomes too smooth, in order to reduce E_smooth as much as possible, and E_mod tends to the value obtained without the extra input. In between, there is an uninteresting and almost featureless increase of this error.
• E_smooth starts from a large value at small λ, and decreases with λ until it reaches a very flat and extended plateau. In this region this error is practically insensitive to λ, corresponding to the best reconstruction of α_t (in passing, notice that the plateau extends for almost 5 decades). Then, for λ ~ 70 this error shows a sudden drop, indicating that α_t becomes too smooth. The final increase in E_smooth is simply caused by the rescaling of the input, which leads to the flat solutions with high random spikes mentioned before.
• In accordance with the behavior of E_smooth, E_tot presents an approximately linear rise with λ for the intermediate λ values corresponding to the plateau in E_smooth (E_mod is much smaller and can be neglected).
• Finally, the reconstruction error E_rec confirms the previous analysis, showing a similar quality in the curves for α_t obtained with λ values that, remarkably, may differ by up to 5 orders of magnitude.
In Fig. 2 we present the best reconstruction of α_t, corresponding to λ = 50, together with the ±σ error bounds estimated from the standard deviation of the 10 experiments performed at this λ value.
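The λ scan and the best-network selection just described can be sketched as follows, building on the previous snippets. The epoch budget and the λ grid are placeholders, and alternating one weight epoch with one α_t update is one possible reading of the joint minimization.

def scan_lambda(X, y, lambdas, d=2, n_hidden=3, n_nets=10, epochs=2000):
    """For each lambda, train several networks, keep the run with the
    smallest E_mod, and look for the plateau of lambda-insensitive
    reconstructions."""
    curves = {}
    for lam in lambdas:
        best = None
        for seed in range(n_nets):
            net = DrivenANN(d, n_hidden, seed=seed)
            alpha = np.zeros(len(y))
            train_weights(net, X, y, alpha)             # step 1: alpha = 0
            for _ in range(epochs):                     # step 2: retrain jointly
                train_weights(net, X, y, alpha, epochs=1)
                alpha = update_alphas(net, X, y, alpha, lam)
            e_mod = np.mean([(y[t] - net.forward(X[t], alpha[t])) ** 2
                             for t in range(len(y))]) / y.var()
            if best is None or e_mod < best[0]:
                best = (e_mod, alpha.copy())
        curves[lam] = best
    return curves

A scan such as scan_lambda(X, y, np.logspace(-4, 4, 17)) then reproduces the kind of sweep shown in Fig. 4, with the plateau located by plotting the stored errors against λ.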
Figure 4. Different errors at the minimum of E_mod as a function of λ for the logistic map.
Finally, we present a real-world application of the reconstruction method. We have considered the sunspot time series, which is frequently used as a benchmark in the statistical literature and is known to have an intrinsic dynamics with several time scales [13]. Since this is a very noisy time series, a thorough investigation would require optimizing the ANN architecture and controlling the possibility of overfitting. In this preliminary study we work instead with a very small 3:2:1 ANN; though not optimal in terms of modeling error, training it to convergence does not lead to serious overfitting. In Fig. 5 we show the results obtained by the same procedure used in the case of the logistic map. We see that the behavior of the different errors is quite similar to what we found previously (compare Figs. 4 and 5). The only difference is the reduction of the plateau for the sunspot time series, although there is still a consistent behavior (i.e., α_t roughly independent of λ) for nearly a decade of λ values (20-150). Fig. 6 shows the best reconstruction of α_t, obtained with λ = 100, and the corresponding bounds obtained as before. Fine details of this curve might be artifacts due to the poor modeling of the solar dynamics by the small ANN used. However, the perturbation with a period of approximately 100 years (the Gleissberg cycle [13]) and the important rise during the last century [12] are certainly characteristics in agreement with the results of other independent studies. In particular, this last effect has been attributed to chaotic changes in the dynamo that generates the solar field.
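For completeness, this is how the same pipeline could be applied to a yearly sunspot record. The array name, the standardization and the λ grid are hypothetical; the data themselves are assumed to have been loaded elsewhere.

# Hypothetical application to a yearly sunspot record. `sunspots` is
# assumed to be a 1-D array of yearly values obtained elsewhere (e.g.
# from the SIDC archive).
s = (sunspots - sunspots.mean()) / sunspots.std()  # standardize the record
X, y = embed(s, d=3)                               # 3 delay inputs, as in the 3:2:1 net
curves = scan_lambda(X, y, lambdas=[20, 50, 100, 150], d=3, n_hidden=2)
e_mod, alpha_best = curves[100]                    # lambda = 100: the text's best curve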
Figure 5. Different errors at the minimum of E_mod as a function of λ for the sunspot time series.

Figure 6. Best reconstruction of α_t for the sunspot problem (full line). The gray area indicates the dispersion of 10 independent runs of our algorithm.
4. Summary and Conclusions

We have proposed a simple method for the reconstruction of perturbing signals from non-stationary time series. The algorithm simply incorporates an extra input unit in a feedforward ANN, and adjusts the corresponding input values during the training phase to learn simultaneously the intrinsic dynamics and the temporal profile of the driving parameter. Using synthetic data from a forced logistic map, we have shown that our algorithm is able to perform this task with good accuracy, allowing a better modeling of the whole system. Moreover, a concrete application to the analysis of the sunspot time series reveals changes in the solar dynamics that are in agreement with other recent studies. In principle, the algorithm proposed here could be extended to trace the simultaneous variation of several parameters, something that has not been fully explored in the literature. Furthermore, these ideas can also be applied to general regression problems where the underlying phenomenon changes in time. Work in these directions is in progress.
References
[1] H. Kantz and T. Schreiber, Nonlinear Time Series Analysis, Cambridge Nonlinear Science Series 7 (Cambridge Univ. Press, Cambridge, 1997).
[2] M. Casdagli, Physica D 108, 12 (1997), and references therein.
[3] T. Schreiber, Phys. Rep. 308, 1 (1999).
[4] R. Manuca and R. Savit, Physica D 99, 134 (1996); T. Schreiber, Phys. Rev. Lett. 78, 843 (1997).
[5] F. Lombard and J.D. Hart, in Change-Point Problems, eds. E. Carlstein, H.-G. Muller and D. Siegmund, IMS Lecture Notes – Monograph Series, Vol. 23 (Inst. Math. Statist., Hayward, CA, 1994); M.B. Kennel, Phys. Rev. E 56, 316 (1997).
[6] J. Stark, J. Nonlinear Sci. 9, 255 (1999); R. Hegger, H. Kantz, L. Matassini, and T. Schreiber, Phys. Rev. Lett. 84, 4092 (2000).
[7] J. Stark and B.V. Arumugan, Int. J. Bifurcation and Chaos 2, 413 (1992); L.M. Hively, P.C. Gailey and V.A. Protopopescu, Phys. Lett. A 258, 103 (1999).
[8] K.M. Short, Int. J. Bifurcation and Chaos 7, 1579 (1997).
[9] D. Summers, J.G. Cranford, and B.P. Healey, Chaos, Solitons and Fractals 11, 2331 (2000).
[10] D.J.D. Earn et al., Science 287, 667 (2000).
[11] P.F. Verdes, P.M. Granitto, H.D. Navone, and H.A. Ceccatto, Phys. Rev. Lett. 87, 124101 (2001).
[12] M. Lockwood, R. Stamper and M.N. Wild, Nature 399, 437 (1999).
[13] W. Gleissberg, Solar Phys. 2, 231 (1967); A. García and Z. Mouradian, Solar Phys. 180, 495 (1998).