Optimal Selection of Embedding Parameters for Time Series Modelling

Michael Small∗ and Chi K. Tse∗

Abstract — Time delay embedding is the first step in the reconstruction of deterministic nonlinear dynamics from a time series. Unfortunately, there is no generic way to select the best time delay embedding. We show that for time series modelling it is possible to apply information theoretic arguments which lead to an optimal selection of the embedding window. Our results show that selection of embedding dimension and embedding lag should be considered not as part of the embedding process but as part of the modelling procedure. Nonlinear time series modelling results show qualitative and quantitative improvement in both long term and short term dynamics.

∗ Department of Electronic and Information Engineering, Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong, China. E-mail: [ensmall,encktse]@polyu.edu.hk; tel.: +852 2766 4744; fax: +852 2362 8439.
1 INTRODUCTION
Takens’ embedding theorem [1] is very often invoked as the motivation for applying a time delay embedding to reconstruct multi-dimensional dynamics from a scalar variable. Let x_t be the scalar observable observed at integer times t = 0, 1, 2, 3, . . . , N. The usual incarnation of the time delay embedding is to obtain vector variables v_t such that

$v_t = (x_t, x_{t-\tau}, x_{t-2\tau}, \ldots, x_{t-(d_e-1)\tau})$   (1)

and, by appealing to the theorem of Takens, one claims that for suitable τ and sufficiently large d_e and N the evolution of v_t is topologically equivalent to the underlying dynamical system.
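As a concrete illustration, the following is a minimal sketch of the embedding (1) in Python/numpy; the function name and interface are ours, for illustration only, and are not from the original paper.

```python
import numpy as np

def delay_embed(x, d_e, tau):
    """Time delay embedding (1): row k is v_t for t = (d_e - 1) * tau + k."""
    x = np.asarray(x, dtype=float)
    window = (d_e - 1) * tau  # how far each vector reaches into the past
    # stack the lagged copies x_{t - i*tau}, i = 0, ..., d_e - 1, as columns
    return np.column_stack(
        [x[window - i * tau : len(x) - i * tau] for i in range(d_e)]
    )

# example: a noisy sine wave embedded with d_e = 3 and tau = 2
x = np.sin(0.1 * np.arange(1000)) + 0.05 * np.random.randn(1000)
V = delay_embed(x, d_e=3, tau=2)  # V[k] = (x_t, x_{t-2}, x_{t-4})
```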
Unfortunately, N will normally be constrained and there is no generic rule for the selection of d_e and τ. Within the dynamical systems community, methods such as minimum mutual information, false nearest neighbours and plateau onset of dynamical invariants are commonly applied [2]. Engineers would be more familiar with the Nyquist limit, which implies an absolute criterion in the case of systems which exhibit finite bandwidth (this is not strictly applicable to deterministic aperiodic nonlinear systems). Very often, the aim of reconstructions such as (1) is to successfully estimate dynamic invariants of the underlying system (such as the correlation dimension and the leading Lyapunov exponent) [2].
In this case one can appeal to theoretical results that suggest d_e > 2d_c + 1 [1], d_e > d_c [3], or that only suitable selection of τ is significant [4]. Contradictory numerical results have also shown that the crucial parameter is actually the embedding window d_e τ [5]. In this paper we ask a slightly different question, and naturally arrive at a different answer. We are interested not in correct estimation of dynamic invariants but only in optimal reconstruction of the underlying dynamics for a specific finite noisy time series. We find that in this situation the choice of embedding lag τ should be left to the modelling algorithm. In fact, our results suggest that embedding and modelling are two parts of the same process and it is generally not possible to find the optimal embedding parameters without first building a model (and vice versa!). We derive an expression for the optimal embedding window as a function of the underlying dynamics, the system noise and the observation length N. Using this measure we provide an algorithm which can be used to estimate this embedding window and show that this method can produce superior modelling results. In section 2 we discuss the necessary theoretical framework. Section 3 describes the numerical modelling algorithm and in section 4 we present some modelling results.
2 THE CRITERION
We first need to define what we mean by the “best” model. Suppose that a time series x_t of N observations has been observed and that we wish to construct an embedding such that

$z_t = (x_{t-\ell_1}, x_{t-\ell_2}, x_{t-\ell_3}, \ldots, x_{t-\ell_n})$   (2)
where the embedding lags ℓ_i satisfy $0 \le \ell_1 \le \ell_i < \ell_{i+1} \le \ell_n = d_w$. Notice that (2) represents a slight generalisation of (1). Equation (2) is completely defined by d_w and a binary vector $a = (a_1, a_2, \ldots, a_{d_w}) \in \{0,1\}^{d_w}$ such that a_j = 1 ⇐⇒ j = ℓ_i for some i. Our objective is to obtain a model f of the underlying dynamics from (2) such that

$x_{t+1} = f(z_t) + e_t$   (3)

where the prediction errors e_t are minimal.
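Before continuing, here is a sketch of the generalised embedding (2) and the alignment of each z_t with its target x_{t+1} in (3); as before, the helper names and interfaces are illustrative rather than the authors' own.

```python
import numpy as np

def lagged_embed(x, lags):
    """Generalised embedding (2): row k is z_t, t = d_w + k, for the given lag set."""
    x = np.asarray(x, dtype=float)
    d_w = max(lags)  # the embedding window, l_n = d_w
    return np.column_stack(
        [x[d_w - l : len(x) - l] for l in sorted(lags)]
    )

# pair each z_t with the target x_{t+1} of the model (3)
x = np.random.randn(500)            # placeholder series
lags = [1, 2, 5]                    # i.e. a = (1, 1, 0, 0, 1) and d_w = 5
Z = lagged_embed(x, lags)           # z_t = (x_{t-1}, x_{t-2}, x_{t-5})
y = x[max(lags) + 1 :]              # x_{t+1} for t = d_w, ..., N - 1
Z = Z[: len(y)]                     # the final z_t has no target; drop it
```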
DL(x) = DL(x|f ) + DL(f )
(4)
is minimised. The first term, DL(x|f), is the description length of the data given the model; this is the description length of the model prediction errors e_t:

$DL(x|f) = -\ln P(x \,|\, N(0, \sigma^2))$   (5)
where we approximate e_t ∼ N(0, σ²). The second term, DL(f), is the cost of specifying the particular model we use: the model parameters and the initial conditions of that model. The term DL(f) is dependent on the embedding parameters a and d_w (defined above). The description length of the data is (roughly) the number of bits needed to describe the data to some fixed precision. One can either specify the data completely, or specify a model of that data together with the model prediction errors. The rationale of minimum description length is that a compact (efficient) description of the data is best. Previously, Judd and Mees [7] provided an algorithm to achieve an approximate solution to (4) for fixed a and d_w. In this contribution we investigate the optimisation of (4) with respect to a and d_w. If we restrict our attention to the variation of (4) with the embedding parameters, we can write the second term as
$DL(f) = -\ln P(x_0, x_1, \ldots, x_{d_w-1} \,|\, N(\mu_X, \sigma_X^2)) + DL(a) + DL(\mathcal{P}).$   (6)
The first term on the right hand side is the description length of the model initial conditions, which we approximate as d_w Gaussian random variables with mean µ_X and variance σ_X². The second term is the description length of the embedding parameter a, and clearly DL(a) = d_w. The third term is the description length of the model parameters P. Finally, combining (4), (5) and (6), and expanding the negative log likelihood terms, we obtain

$DL(x) = N \ln \sqrt{2\pi}\,\sigma + d_w\left(1 + \ln \frac{\sigma_X}{\sigma}\right) + \frac{N}{2} + DL(\mu_X) + DL(\mathcal{P}).$   (7)

For a given model all but the last term are readily computable, and this last term may be estimated using the methodology described in [7].
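Equation (7) is then straightforward to evaluate numerically. The following is a minimal sketch under the stated Gaussian approximations; the terms DL(µ_X) and DL(P) are passed in as a single constant, which suffices when comparing embedding windows (and is zero for the parameter free model of the next section). The function name and interface are ours.

```python
import numpy as np

def description_length(e, x, d_w, dl_rest=0.0):
    """Evaluate equation (7) for prediction errors e and data x.

    e       : model prediction errors e_t, approximated as N(0, sigma^2)
    x       : observed time series; its first d_w values are the initial
              conditions, modelled as N(mu_X, sigma_X^2) as in (6)
    dl_rest : the remaining DL(mu_X) + DL(P) terms, treated as a constant
    """
    N = len(e)
    sigma = np.std(e)    # scale of the prediction errors, cf. (5)
    sigma_X = np.std(x)  # scale of the data, cf. (6)
    return (
        N * np.log(np.sqrt(2.0 * np.pi) * sigma)  # DL(x|f), the errors
        + d_w * (1.0 + np.log(sigma_X / sigma))   # initial conditions and DL(a)
        + N / 2.0
        + dl_rest
    )
```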
Figure 1: Computation of description length as a function of embedding window d_w. The solid line is proportional to ln σ and is non-increasing (and constant for d_w ≥ 34). The upper line is an evaluation of (7), which exhibits a minimum at d_w = 15.

Notice that equation (7) does not depend on the actual embedding vector a. Therefore the description length is not dependent on the particular embedding chosen, and all candidate embedding lags may be presented to the modelling algorithm, which selects the optimal set. Hence, prior to modelling, it is only necessary to determine the maximum embedding lag d_w.

3 THE ALGORITHM
In an effort to make the minimisation of (7) tractable we make one substantial simplification. Instead of optimising over all possible models f, or over a large class of parameterised nonlinear functions, we restrict ourselves to local constant nonlinear models [8]. We choose s ≠ t such that ‖z_s − z_t‖ is minimal; then the “model” is given by

$f(z_t) = x_{s+1}.$   (8)
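A brute force sketch of this local constant predictor, using the illustrative lagged_embed convention above (row k of Z corresponds to time t = d_w + k):

```python
import numpy as np

def local_constant_errors(x, Z, d_w):
    """Prediction errors of the local constant model (8).

    For each z_t, the prediction of x_{t+1} is x_{s+1}, where z_s is the
    nearest neighbour of z_t with s != t.
    """
    x = np.asarray(x, dtype=float)
    n = len(Z) - 1  # restrict to rows with a known target x_{t+1}
    e = np.empty(n)
    for k in range(n):
        dist = np.linalg.norm(Z[:n] - Z[k], axis=1)  # ||z_s - z_t|| for all s
        dist[k] = np.inf                             # exclude s == t
        s = np.argmin(dist)                          # nearest neighbour
        e[k] = x[d_w + k + 1] - x[d_w + s + 1]       # e_t = x_{t+1} - x_{s+1}
    return e
```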
Clearly, this model is not predictive. But this formulation has the advantage that it is extremely robust and also provides a good characterisation of the topological properties of a chosen embedding. One further advantage of (8) is that this modelling scheme is entirely parameter free¹, and we may set DL(P) = 0. To choose the optimal d_w we must compute (7) as a function of d_w. To achieve this we employ the following algorithm (a sketch in code follows the description):

• Let d = 0 and let a = ∅ be the set of selected lags. Initialise the model prediction errors so that e_t = x_t.
• Repeat until a minimum of (7) is reached:
  – Compute the description length for d_w = d according to (7).
  – Compute the prediction error of the local constant model with time delay embedding (2) such that ℓ_i ∈ a ∪ {d + 1} for all i. If this is smaller than the current best, let a = a ∪ {d + 1} and update the current best model prediction error.
  – Increment d.

¹Alternatively, one could legitimately argue that the data are the parameters. In either case DL(P) is constant.
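Putting the pieces together, the loop below sketches our reading of this algorithm in terms of the illustrative helpers defined earlier (lagged_embed, local_constant_errors and description_length). The stopping rule, which detects that (7) has passed through a minimum, is a simple heuristic not specified in the paper.

```python
import numpy as np

def select_embedding(x, d_max=50):
    """Greedy selection of the lag set a and the embedding window minimising (7)."""
    a = []                                 # selected lags, initially empty
    e = np.asarray(x, dtype=float)         # initial prediction errors: e_t = x_t
    best = np.sum(e ** 2)
    dl = []                                # description length as a function of d_w
    for d in range(1, d_max + 1):
        dl.append(description_length(e, x, d_w=d))
        trial = sorted(a + [d])            # tentatively admit the next lag
        Z = lagged_embed(x, trial)
        e_new = local_constant_errors(x, Z, d_w=max(trial))
        if np.sum(e_new ** 2) < best:      # keep the lag only if it helps
            a, e, best = trial, e_new, np.sum(e_new ** 2)
        if len(dl) >= 3 and dl[-1] > dl[-2] > dl[-3]:
            break                          # (7) has been increasing: past its minimum
    d_w_opt = int(np.argmin(dl)) + 1
    return a, d_w_opt, dl
```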
At each iteration of the algorithm we recompute (7) for the new value of d_w, but only update the model to include the largest lag if it actually improves the result. This dependence on a is necessary only because the modelling algorithm is not sophisticated enough to select the optimal embedding parameters itself. Figure 1 demonstrates that this added complication is sufficient to achieve the required results. The curve depicting the model prediction error is smooth and a decreasing function of d_w, as expected. In figure 1 we also note that the optimal embedding window, in the sense of minimal prediction error, occurs at d_w = 34. For this data set we obtain a value of τ ≈ 8 by examining the autocorrelation curve, and the embedding dimension is known to be d_e = 3 or 4. Therefore we see good agreement between the new algorithm and the expected results. In the next section we describe the application of this algorithm to the selection of model lags for certain simple test systems and for experimental time series data.

4 THE MODEL
To test the application of this algorithm we consider data generated by the chaotic Rössler system and the Ikeda map. Both systems are affected by 5% noise (i.e. σ_X/σ = 20, the noise standard deviation being one twentieth of that of the signal) and we vary the length of the data. Figure 2 depicts the results of these calculations. As expected, figure 2 shows that the value of d_w which minimises the prediction error σ is an upper bound on the value which minimises (7). For the Rössler time series we see that the optimal embedding window increases with time series length. This is consistent with our intuition: as the amount of available data increases we become more confident of our predictive ability and look further into the past to predict the future. Furthermore, we see that the selected values correspond to significant time lags in the data: according to a calculation of the autocorrelation, its first zero occurs at a lag of about 8 and the period is approximately 31.
Figure 2: Variation of d_w with N for the Rössler (top panel) and Ikeda (bottom panel) systems with 5% observational noise. In each panel the horizontal axis is ln(time series length N) and the vertical axis is the embedding window d_w. The bar chart and asterisks depict, respectively, results for the algorithm described here and according to the local model.
Our calculation for the Ikeda time series shows very little variation with time series length, and no distinction between the value which minimises (5) and that which minimises (7). Because this is a low dimensional chaotic map, the best embedding lag is τ = 1. Although the map is 2 dimensional, it is highly “twisted” and requires 3 dimensions for a time delay embedding to have no self intersections. This is consistent with the optimal embedding window d_w = 3. Higher values of d_w for longer data sets are the result of additional information from previous time series observations being available to “smooth” the noise in the system. We have conducted similar calculations for various noise levels and found the results to be consistent with those presented here.

In this communication we also present results for experimental data: the sunspot time series and chaotic laser dynamics. The experimental origin of the data and further examples are given in [9]. Figure 3 depicts the experimental sunspot time series and nonlinear models built using the minimum description length nonlinear modelling scheme described in [10]. We compare the typical dynamic behaviour of models built using embedding parameters estimated in the standard way (d_e = 6 and τ = 3) and using the optimal embedding window (d_w = 6). The optimal embedding window actually
Figure 3: The top panel shows the sunspot time series data. The middle and bottom panels show representative iterated (noise-free) predictions from models built either with a standard embedding (centre) or with an embedding window estimated by the method described here (bottom).
uses less information about the system (in the sense that the embedding does not look as far into the past), but its dynamic performance is superior. The asymptotic dynamics exhibited by the model built using a standard embedding are a stable focus; for the window embedding technique we observe chaos. We also observed similar improvement in dynamic performance for models built from the Rössler system and the chaotic laser time series [9]. Finally, table 1 presents a summary of the modelling results for each of the three continuous systems considered in this communication. As the model building scheme is stochastic, the results quoted are the mean of 50 nonlinear models. We see that in each case the model built using the embedding window suggested by our new algorithm produced superior results: lower description length and lower prediction error. The model size (number of nonlinear terms in the model) was comparable in each case.

ACKNOWLEDGEMENTS

This work was supported by a Hong Kong Polytechnic University Research Grant (No. G-YW55).
model                        MDL      size    RMS
Rössler  (d_e = 4, τ = 8)    −655     15.6    0.158
Rössler  (d_w = 15)          −716     21.1    0.151
sunspots (d_e = 6, τ = 3)    1267.9   7.32    13.16
sunspots (d_w = 6)           1230.1   6.96    12.31
laser    (d_e = 5, τ = 2)    5753.6   100.8   2.405
laser    (d_w = 10)          5239.8   109.5   1.767
Table 1: Comparison of model performance with a standard constant lag embedding and with embedding over the window suggested by the new algorithm. MDL is the model description length, size the number of nonlinear terms in the model, and RMS the root mean square prediction error; each value is the mean over 50 models.

References

[1] F. Takens. Detecting strange attractors in turbulence. Lecture Notes in Mathematics, 898:366–381, 1981.
[2] H.D.I. Abarbanel. Analysis of Observed Chaotic Data. Institute for Nonlinear Science. Springer-Verlag, New York, 1996.
[3] M. Ding et al. Plateau onset for correlation dimension: when does it occur? Physical Review Letters, 70:3872–3875, 1993.
[4] Y.C. Lai and D. Lerner. Effective scaling regime for computing the correlation dimension from chaotic time series. Physica D, 115:1–18, 1998.
[5] H.S. Kim, R. Eykholt, and J.D. Salas. Delay time window and plateau onset of the correlation dimension for small data sets. Physical Review E, 58:5676–5682, 1998.
[6] J. Rissanen. Stochastic Complexity in Statistical Inquiry. World Scientific, Singapore, 1989.
[7] K. Judd and A. Mees. On selecting models for nonlinear time series. Physica D, 82:426–444, 1995.
[8] G. Sugihara and R.M. May. Nonlinear forecasting as a way of distinguishing chaos from measurement error in time series. Nature, 344:734–741, 1990.
[9] M. Small and C.K. Tse. Optimal embedding parameters: A modelling paradigm.
[10] M. Small and C.K. Tse. Minimum description length neural networks for time series prediction. Physical Review E, 66:066706, 2002.