Studies in Nonlinear Dynamics and Econometrics, Quarterly Journal, Volume 4, Number 3. The MIT Press.
© 2001 by the Massachusetts Institute of Technology
On Nonlinear, Stochastic Dynamics in Economic and Financial Time Series

Christian Schittenkopf
Austrian Research Institute for Artificial Intelligence
[email protected]
Georg Dorffner
Department of Medical Cybernetics and Artificial Intelligence, University of Vienna
[email protected]
Engelbert J. Dockner
Department of Business Administration, University of Vienna
[email protected]
Abstract. The search for deterministic chaos in economic and financial time series has attracted much interest over the past decade. Evidence of chaotic structures is usually blurred, however, by large random components in the time series. In the first part of this paper, a sophisticated algorithm for estimating the largest Lyapunov exponent with confidence intervals is applied to artificially generated and real-world time series. Although the possibility of testing empirically for positivity of the estimated largest Lyapunov exponent is an advantage over other existing methods, the interpretability of the obtained results remains problematic. For instance, it is practically impossible to distinguish chaotic and periodic dynamics in the presence of dynamical noise, even for simple dynamical systems. We conclude that the notion of sensitive dependence on initial conditions, as it has been developed for deterministic dynamics, can hardly be transferred into a stochastic context. Therefore, the second part of the paper aims to measure the dependencies of stochastic dynamics on the basis of a distributional characterization of the dynamics. For instance, the dynamics of financial return series are essentially captured by heteroskedastic models. We adopt a sensitivity measure proposed in the literature and derive analytical expressions for the most important classes of stochastic dynamics. In practice, the sensitivity measure for the a priori unknown dynamics of a system can be calculated after estimating the conditional density of the system's state variable.
Keywords. chaos, Lyapunov exponents, stochastic dynamics, time series
Acknowledgments. This work was supported by the Austrian Science Fund (FWF) within the research project "Adaptive Information Systems and Modelling in Economics and Management Science" (SFB 010). The Austrian Research Institute for Artificial Intelligence is supported by the Austrian Federal Ministry of Education, Science, and Culture.

Studies in Nonlinear Dynamics and Econometrics, 4(3): 101-121

1 Introduction
The enormous interest in deterministic chaotic systems stems from two opposite characteristics of these dynamical systems. First, they are unpredictable in the long term: any measurement error in the initial conditions grows exponentially in time. This sensitive dependence on initial conditions is usually quantified by a positive Lyapunov exponent (Eckmann and Ruelle 1985; Oseledec 1968), which is just the exponential growth rate of errors.1 An equivalent, information theory-based characterization of dynamical systems is based on the Kolmogorov-Sinai entropy (Kolmogorov 1958; Sinai 1959), which can be interpreted as the amount of uncertainty in predictions about the state2 of the system one time step ahead.3 Second, if the dynamics of a system that has so far been considered stochastic is shown to be chaotic but deterministic, short-term predictions may be possible. This means that a time series that looks random at first glance may in fact be predictable on short time scales.

There has been an intensive debate in the literature as to whether economic and financial time series exhibit low-dimensional chaos (see, e.g., Jaditz and Sayers 1993 for a review). An application of basic concepts of chaos theory to capital markets and economics can be found, for instance, in Dockner, Prskawetz, and Feichtinger 1997; Elsner 1996; Frank, Gençay, and Stengos 1988; Hsieh 1991; and Peters 1991. Nowadays most specialists in the field would probably say that there is no conclusive evidence of chaos in financial data. The detection of chaotic structures in financial data is usually complicated by the large noise component inherent in the underlying dynamical system. A test to detect nonlinear departures from random walk behavior is described in Scheinkman and LeBaron 1989. One approach to testing a time series for chaos is based on entropy and dimension estimates on the raw series and on the residual series of a linear model, in combination with the BDS (Brock, Dechert, and Scheinkman) test (see Brock 1986; Brock, Dechert, and Scheinkman 1987; Brock, Hsieh, and LeBaron 1991). Another approach is to estimate the largest Lyapunov exponent and to test for positivity empirically,4 as in the first part of this paper.

Lyapunov exponents are global quantities; that is, they are average values over the whole attractor. Therefore they do not provide information about the local rate of divergence of initially nearby trajectories, which can even be negative; that is, the distance between trajectories may decrease in certain areas of the attractor. In terms of prediction, there are thus areas where reliable short-term forecasts are possible and areas where they are impossible because of the exponential growth rate of errors. This phenomenon of locally changing divergence rates has been termed effective Lyapunov exponents (Grassberger, Badii, and Politi 1988), local Lyapunov exponents (Eckhardt and Yao 1993; Wolff 1992), and predictability portraits (Doerner, Hübinger, and Martienssen 1991) in the literature. The interpretation of local and global Lyapunov exponents, however, becomes difficult for stochastic systems, both at the theoretical and at the practical level. The deterministic and stochastic (random) components of a dynamical system can interact in a very complicated manner, which makes a reasonable definition of sensitive dependence on initial conditions difficult (Nychka et al. 1992).
This type of noise is called dynamical noise, in contrast to measurement noise, which does not influence the dynamics of a system.5 The term noisy
1. Of course, measurement errors can grow exponentially only for finite time because of the finite size of attractors. This boundedness of trajectories is, besides trajectories' sensitive dependence on initial conditions, central to the definition of deterministic chaos.
2. The state of the system is the position on the attractor, the coordinates of the quantity defining the dynamical system, or, loosely speaking, the value of the observed time series.
3. This concept can be generalized to stochastic systems (Deco, Schittenkopf, and Schürmann 1997b) and to multi-step predictions (Deco, Schittenkopf, and Schürmann 1997a; Schittenkopf and Deco 1996).
4. There is no asymptotic distribution theory for Lyapunov exponent estimates.
5. In the presence of measurement noise, one can always try to reveal the underlying deterministic system by applying methods for noise reduction (see, e.g., Kostelich and Schreiber 1993 for a survey of common methods).
chaos, which can be found in the literature to denote chaotic systems disturbed by dynamical noise, is problematic: in principle, the standard assumption of Gaussian noise permits stochastic components of infinite size for any (positive) variance. A deterministic chaotic system with a stable attractor can thus easily become unstable if it is disturbed by stochastic components whose noise distributions have infinite support (Chan and Tong 1994).6

The reasons for studying stochastic dynamical systems with dynamical noise are diverse. An attractor may admit infinitely many invariant measures; the physically relevant measure should be close to the invariant measure of the corresponding dynamical system with small dynamical noise (Eckmann and Ruelle 1985).7 Even if a dynamical system that we want to model is deterministic, there is no guarantee that the system is contained in our particular class of models; a perhaps more robust approach is thus to model the observed data by a nonlinear dynamical system disturbed by dynamical noise (Chan and Tong 1994). Finally, in economics and finance one is often confronted with dynamical systems whose observed trajectories (e.g., stock indices) show negligible correlations in the conditional mean, which corresponds to the deterministic component, but a rich structure in the conditional variance, which characterizes the stochastic component. In fact, for these systems one is particularly interested in the properties of the stochastic component, that is, the dynamics of the conditional variance. In time series analysis the time dependence of conditional variances is referred to as heteroskedasticity. Especially in finance, the literature on heteroskedastic time series models is enormous (see, e.g., Bollerslev, Chou, and Kroner 1992 for a review).

Although the concept of Lyapunov exponents is defined only for deterministic systems, algorithms for the estimation of all or at least the largest Lyapunov exponent (Eckmann et al. 1986; Gençay and Dechert 1992; Kadtke, Brush, and Holzfuss 1993; Sano and Sawada 1985; Wolf et al. 1985) have been applied to time series generated and observed under substantial influence of noise. For stochastic dynamical systems, however, a naive application of these algorithms can give spurious positive exponents (Tanaka, Aihara, and Taki 1998). For instance, Dämmig and Mitschke (1993) report that for a time series that is white noise, the algorithm proposed in Wolf et al. 1985 may produce any positive value for the Lyapunov exponent, depending on computational parameters. If Lyapunov exponents are estimated from embedded dynamics, which is usually the case in practice, it is possible that spurious Lyapunov exponents are obtained that are larger than the largest Lyapunov exponent of the true dynamics, even for purely deterministic systems (Dechert and Gençay 1996; Gençay and Dechert 1997). This is usually avoided by starting with a small embedding dimension, which is then increased step by step. Some of the algorithms proposed for estimation of Lyapunov exponents have also been applied to financial time series such as returns of stocks and stock indices and exchange rates (Dechert and Gençay 1993; Eckmann et al. 1988; Elsner 1996). The empirical results of these studies are similar in the sense that the estimated Lyapunov exponents are only slightly positive.
One possible explanation is that the underlying process has a unit root (Dechert and Gençay 1993).8 Another way to interpret small but positive Lyapunov exponents is to assume the existence of an underlying high-dimensional chaotic system that generates random-looking behavior because of its hidden dimensionality (Elsner 1996). Yet another reason for exponents close to zero could simply be numerical artifacts resulting from inappropriately chosen parameters in the estimation algorithm.

In the first part of this paper we investigate a framework introduced in Gençay 1996 in which the largest Lyapunov exponent of a time series generated by a dynamical system can be estimated with confidence intervals. In contrast to the algorithms mentioned above, the method discussed here is based on the
6. Under the assumption of Gaussian noise, even simple nonlinear systems such as polynomial autoregressive processes of degree higher than one are transient; that is, any trajectory almost surely explodes to infinity.
7. In computer experiments, roundoff errors play the role of small dynamical noise.
8. Loosely speaking, a unit root means that the time series should be differenced in order to obtain a more stationary series.
calculation of the empirical distribution of Lyapunov exponents using a bootstrapping technique. It may therefore be assumed that this method provides a much more reliable tool for analyzing deterministic and stochastic dynamical systems. In particular, the well-known logistic map for a chaotic parameter value without noise and for a periodic and a chaotic parameter value with and without noise, a return series of the Austrian stock market index ATX, and a time series without temporal dependencies obtained by shuffling the ATX series are investigated.

The results of the empirical analysis lead us, in the second part of the paper, to a concept of (sensitive) dependence on initial conditions that is well defined for stochastic dynamical systems. We adopt the idea of Yao and Tong (1994) to describe the interaction between the deterministic and stochastic components by an information-theoretic measure of the distance between probability density functions. This sensitivity measure is a natural extension of the concept of Lyapunov exponents, which is based on the distance between trajectories. In contrast to the concept of Lyapunov exponents, this sensitivity measure is local, that is, state-dependent. As an extension of Yao and Tong 1994, we derive analytical expressions for the sensitivity measure for stochastic dynamics with constant or state-dependent variance, which include the dynamics of many models applied in nonlinear time series analysis. Among them are homo- and heteroskedastic models such as autoregressive (AR) models, autoregressive conditional heteroskedastic (ARCH) models (Engle 1982), and generalized ARCH (GARCH) models (Bollerslev 1986).

In Section 2 the statistical framework for estimating the largest Lyapunov exponent of a dynamical system with confidence intervals is described and applied to deterministic and stochastic dynamics. Section 3 gives the definition of the sensitivity measure for stochastic dynamics. In Section 4 analytical expressions for this measure are derived for different types of dynamics. Additionally, it is demonstrated that the sensitivity measure can be estimated empirically if the dynamics are unknown. In Section 5 the sensitivity measure is applied to various models of financial return series. Section 6 concludes.

2 Reliable Estimation of Lyapunov Exponents?
In this section we first recall a method for estimating the empirical distribution of the Lyapunov exponents of a dynamical system (Gençay 1996). Then numerical experiments with several artificially generated data sets and one real-world financial time series are performed.

2.1 Estimation of empirical distributions
In general, the dynamics of an n-dimensional dynamical system

$$X_{t+1} = f(X_t) \qquad (2.1)$$

is not accessible to an observer of this system, who usually records a single one-dimensional time series of finite size $N$: $\{x_t\} = (x_1, \ldots, x_N)$. The common method for reconstructing the real dynamics from the time series $\{x_t\}$ is to embed the time series into an m-dimensional embedding space (Packard et al. 1980):

$$x_t^m = (x_{t-m+1}, \ldots, x_t), \quad t = m, \ldots, N \qquad (2.2)$$
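To make the embedding concrete, here is a minimal NumPy sketch of Equation (2.2); the helper name `embed` is ours, not the paper's:

```python
import numpy as np

def embed(x, m):
    """Delay-embed a 1-D series (Equation (2.2)): row t holds
    x_t^m = (x_{t-m+1}, ..., x_t) for t = m, ..., N."""
    x = np.asarray(x)
    return np.column_stack([x[i:len(x) - m + 1 + i] for i in range(m)])

x = np.arange(1, 8)      # x_1, ..., x_7
print(embed(x, 3))       # 5 rows; first row is (x_1, x_2, x_3)
```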
The theoretical foundations for this technique can be found in Mañé 1981; Sauer, Yorke, and Casdagli 1991; and Takens 1981. The evolution of the dynamical system may therefore be formulated as

$$\tilde{f}: \begin{pmatrix} x_{t-m+1} \\ x_{t-m+2} \\ \vdots \\ x_t \end{pmatrix} \longmapsto \begin{pmatrix} x_{t-m+2} \\ \vdots \\ x_t \\ g(x_t^m) \end{pmatrix} \qquad (2.3)$$
where g is an unknown map that must be estimated. In this paper we estimate g by neural networks, or more precisely, by multilayer perceptrons.9 The mapping of the networks is given by

$$\hat{x}_{t+1} = g(x_t^m) \qquad (2.4)$$

$$\hphantom{\hat{x}_{t+1}} = \sum_{i=1}^{H} v_i \tanh\left(\sum_{j=1}^{m} w_{ij}\, x_{t-j+1} + c_i\right) + b \qquad (2.5)$$
where H denotes the number of hidden units, $v_i$ ($w_{ij}$) the components of the weight vector (matrix) of the second (first) layer, b the bias of the second layer, and $c_i$ the components of the bias vector of the first layer. The first layer is nonlinear because of the sigmoid activation function in the hidden units, whereas the second layer is linear (with the identity as the activation function). In principle, networks of this type can approximate any smooth, nonlinear function, and simultaneously its unknown derivatives, with an arbitrary degree of accuracy as the number of hidden units goes to infinity (Hornik, Stinchcombe, and White 1990). Throughout this paper the neural networks are trained by minimizing the mean squared error (MSE) function

$$\mathrm{MSE} = \frac{1}{N-m} \sum_{t=m}^{N-1} (x_{t+1} - \hat{x}_{t+1})^2 \qquad (2.6)$$
where $x_{t+1}$ is the target value for the m-dimensional input vector $x_t^m$. After training a neural network on an embedded time series, the Lyapunov exponents of the dynamical system defined by Equation (2.3) can be estimated. By linearizing the function $\tilde{f}$ we get

$$\Delta x_{N+1}^m = \prod_{i=m}^{N} (D\tilde{f})_i \, \Delta x_m^m \qquad (2.7)$$
where the matrices $(D\tilde{f})_i$, $i = m, \ldots, N$, are given by

$$(D\tilde{f})_i = \begin{pmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & & & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \\ h_{i,m} & h_{i,m-1} & h_{i,m-2} & \cdots & h_{i,1} \end{pmatrix} \qquad (2.8)$$
The partial derivatives $h_{i,j} = \frac{\partial g}{\partial x_{t-j+1}}(x_i^m)$ can be calculated analytically from the neural network. A similar approach is described in Dockner and Woehrmann 1995. The application of a sequence of QR decompositions (Press et al. 1988) to the matrix product in Equation (2.7) then yields the spectrum of Lyapunov exponents as the average value of the logarithm of the diagonal elements of the upper triangular matrices R (see Eckmann et al. 1986 for details).

In the rest of this section we will concentrate on the estimation of the largest Lyapunov exponent λ. The value estimated by the procedure described above is denoted λmax. It is clear that this procedure contains several pitfalls that may have dramatic effects on the obtained results. The length of the recorded time series, the random initialization of the neural networks, and local minima of the error function may all influence the estimated value λmax. It is therefore much more reliable to estimate confidence intervals for the largest Lyapunov exponent (Gençay 1996), for example, by extending the bootstrapping technique to stationary sequences of time series (Künsch 1989).
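A minimal sketch of the QR step (our illustration, not the authors' code; it assumes the Jacobians $(D\tilde{f})_i$ have already been evaluated, e.g., from the network derivatives $h_{i,j}$):

```python
import numpy as np

def lyapunov_spectrum(jacobians):
    """Lyapunov exponents from a sequence of Jacobians via repeated QR
    decomposition (cf. Eckmann et al. 1986). Returns nats per step;
    divide by log(2) for bits, the unit used in Table 1."""
    q = np.eye(jacobians[0].shape[0])
    log_sums = np.zeros(q.shape[0])
    for J in jacobians:
        q, r = np.linalg.qr(J @ q)
        log_sums += np.log(np.abs(np.diag(r)))
        q = q * np.sign(np.diag(r))   # keep a consistently oriented frame
    return log_sums / len(jacobians)

# Sanity check on the logistic map with r = 4 (Jacobian f'(x) = r(1 - 2x)):
r, x, Js = 4.0, 0.3, []
for _ in range(10000):
    Js.append(np.array([[r * (1 - 2 * x)]]))
    x = r * x * (1 - x)
print(lyapunov_spectrum(Js) / np.log(2))   # close to 1.0 bit per step
```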
9. Similar approaches to calculating the largest Lyapunov exponent and the whole spectrum of Lyapunov exponents are described in Gençay and Dechert 1992 and Nychka et al. 1992, respectively.
As described in Gençay 1996, the resampling is done with replacement from the set of embedded vectors $\{x_t^m : t = m, \ldots, N\}$, with each vector $x_t^m$ equally likely to be drawn. For each resampled set of embedded vectors $S_i$,10 a feedforward network is trained,11 and the largest Lyapunov exponent $\lambda_i$ is estimated. Finally, one can compare the estimated exponent λmax and the empirical distribution of the $\lambda_i$. In particular, confidence intervals for the estimated Lyapunov exponent are available from the empirical distribution.

Another important issue with respect to the reliable estimation of the largest Lyapunov exponent is the appropriate application of the semi-nonparametric estimation procedure. Feedforward networks fit noise easily, which heavily affects the estimated derivatives entering the Lyapunov exponent calculation. Therefore, the method of stopped training is applied: the idea is to estimate the model parameters on one data set (the training set) and to evaluate the model performance on a different data set (the validation set) in parallel. The training procedure is stopped as soon as the error on the validation set increases, which indicates overfitting of the training data. In fact, we train the networks for a fixed number of iterations, store the network weights at each iteration, and finally choose the weights that yield the minimal error on the validation set. This variation of the original method avoids stopping too early. A rough sketch of the resulting bootstrap loop is given below.
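This is our schematic reading of the procedure, not the authors' implementation; `train` and `largest_exponent` are placeholders for the stopped-training fit and the QR-based estimate sketched above:

```python
import numpy as np

def bootstrap_lyapunov(x, m, train, largest_exponent, n_boot=100, seed=0):
    """Empirical distribution of the largest Lyapunov exponent
    (after Gencay 1996, with same-size resamples as in footnote 10)."""
    rng = np.random.default_rng(seed)
    X = np.column_stack([x[i:len(x) - m + i] for i in range(m)])  # inputs x_t^m
    y = x[m:]                                                     # targets x_{t+1}
    lams = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(X), size=len(X))  # resample with replacement
        model = train(X[idx], y[idx])               # network with stopped training
        lams.append(largest_exponent(model, X))     # QR-based estimate
    lams = np.sort(lams)
    # 90% interval from the 6th- and 95th-smallest of 100 exponents (Sec. 2.2);
    # the indices below assume n_boot = 100.
    return lams.mean(), (lams[5], lams[94])
```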
2.2 Experimental results

In all numerical experiments, feedforward networks with m inputs (where m denotes the embedding dimension), five hidden units, and one output unit, as specified in Equation (2.5), are used. For each dynamical system, 100 sets $S_i$ of embedded vectors are generated by resampling with replacement, and one network is trained on each set. The mean of the resulting values $\lambda_1, \ldots, \lambda_{100}$ is denoted $\bar{\lambda}$. An empirical 90% confidence interval for λmax is obtained by selecting $\tilde{\lambda}_6$ and $\tilde{\lambda}_{95}$, where the values $\tilde{\lambda}_i$ are obtained by sorting the estimated exponents $\lambda_i$.

The first example is the well-known logistic map

$$x_{t+1} = r x_t (1 - x_t) \qquad (2.9)$$
for the parameter value r = 4.0, which almost surely gives rise to a chaotic time series for a random starting point in the unit interval.12 We generated a time series of length 1,000 (the training set) and applied the method described in the previous subsection13 for embedding dimensions m = 1, 5, 10, 15. The results are summarized in Table 1. For all embedding dimensions, the estimated Lyapunov exponent λmax and the mean $\bar{\lambda}$ are close to the true Lyapunov exponent λ = 1.0, which is contained in the corresponding confidence intervals. Figure 1 is a plot of λmax and $\bar{\lambda}$ with confidence intervals as a function of m. Obviously, the algorithm yields reliable results for this deterministic chaotic system even if the embedding dimension is much larger than the true dimension of the system (m = 1).

The next experiment deals with dynamical systems disturbed by dynamical noise. We choose two logistic maps with very similar parameter values: for r = 3.56, the deterministic (undisturbed) system defined by Equation (2.9) converges, after a transient phase, to a sequence of points with period eight. The Lyapunov exponent λ is equal to −0.111. For r = 3.59, the logistic map is chaotic, with λ = 0.207. Both systems are disturbed by dynamical noise of constant variance σ² = 0.001:

$$x_{t+1} = f(x_t) + e_t \qquad (2.10)$$

$$e_t \sim N(0; \sigma^2) \qquad (2.11)$$

where $f(x_t)$ is given by the term on the right-hand side of Equation (2.9) and $N(0; \sigma^2)$ denotes the probability distribution of a zero-mean Gaussian random variable of variance σ².
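A minimal simulation of Equations (2.10)-(2.11) for the two skeletons (our sketch; the clipping to [0, 1] is an ad hoc expedient on our part, needed because Gaussian noise of infinite support would otherwise eventually drive the trajectory off to infinity, cf. footnote 6):

```python
import numpy as np

def noisy_logistic(r, n=1000, sigma2=0.001, x0=0.3, seed=0):
    """x_{t+1} = r x_t (1 - x_t) + e_t, e_t ~ N(0, sigma2), Eqs. (2.10)-(2.11).
    States are clipped to [0, 1] to keep the trajectory bounded (an ad hoc
    choice of ours; see footnote 6 on unbounded noise)."""
    rng = np.random.default_rng(seed)
    x = np.empty(n)
    x[0] = x0
    for t in range(n - 1):
        noise = rng.normal(0.0, np.sqrt(sigma2))
        x[t + 1] = np.clip(r * x[t] * (1 - x[t]) + noise, 0.0, 1.0)
    return x

series_periodic = noisy_logistic(3.56)  # period-8 skeleton, lambda = -0.111
series_chaotic = noisy_logistic(3.59)   # chaotic skeleton,  lambda =  0.207
```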
10. Our algorithm differs from the algorithm in Gençay 1996 in that the resampled sets are of the same size as the set of embedded vectors of the time series.
11. For each resampled set, the training starts with a different random initialization.
12. The set of points leading to a periodic cycle has to be excluded.
13. The validation set consisted of 200 points. The error on the validation set steadily decreased, meaning that the networks did not overfit the training data. This also holds for the experiments with other parameter values r and with dynamical noise described below.
Table 1 Statistics for estimations of the largest Lyapunov exponent of different dynamical systems

Dynamics              λ          m   λmax     Mean     Std. dev.  Conf. interval
log. map (r = 4.0)    1.0        1    0.983    0.982   0.040      [0.914, 1.038]
                                 5    0.976    0.973   0.038      [0.914, 1.037]
                                10    0.976    0.984   0.043      [0.914, 1.056]
                                15    0.966    0.980   0.044      [0.905, 1.043]
log. map (r = 3.56)   (−0.111)   1    0.177    0.189   0.053      [0.099, 0.267]
log. map (r = 3.59)   (0.207)    1    0.224    0.215   0.043      [0.152, 0.287]
ATX                              1   −1.407   −1.392   0.023      [−1.431, −1.359]
                                 5   −0.793   −0.692   0.105      [−0.900, −0.563]
                                10   −0.189   −0.223   0.034      [−0.275, −0.176]
                                15   −0.185   −0.215   0.038      [−0.263, −0.144]
shuffled ATX                     1   −5.669   −5.207   0.404      [−5.857, −4.678]
                                 5   −1.264   −1.190   0.175      [−1.421, −0.842]
                                10   −0.515   −0.533   0.079      [−0.662, −0.426]
                                15   −0.332   −0.334   0.049      [−0.433, −0.271]

Note: All exponents are measured in bits. See text for detailed explanations.
Figure 1 Estimated Lyapunov exponents λmax (circles) and means λ¯ with 90% confidence intervals for different embeddings of the logistic map with r = 4.0. The dotted line, which corresponds to a Lyapunov exponent equal to zero, marks the border between nonchaotic (below) and chaotic (above) behavior.
The deterministic and stochastic systems are depicted in Figure 2. The stochastic systems look, of course, very similar. From the clouds of points it is practically impossible to determine in which case the skeleton (the deterministic part) is periodic and in which case it is chaotic. The estimated Lyapunov exponents and confidence intervals (see Table 1) are also very similar. In other words, the algorithm estimates the Lyapunov exponent very reliably for the disturbed chaotic system, but it also returns a positive value for the disturbed periodic system. A similar phenomenon has also been reported in Sugihara 1994, where the corresponding dynamical system, with a skeleton of period two, is termed stochastically chaotic. This notion expresses the fact that the estimated Lyapunov exponent of the disturbed system is positive.
Figure 2 (Upper left) 1,000 points of the deterministic logistic map with r = 3.56 (period eight). (Lower left) The same map with dynamical noise. (Upper right) 1,000 points of the deterministic logistic map with r = 3.59 (chaotic). (Lower right) The same map with dynamical noise.
In other words, the system shows the same behavior as a chaotic system with dynamical noise. In this sense, it is in our opinion a mainly philosophical question whether one should divide stochastic dynamical systems into stochastically chaotic and stochastically nonchaotic ones.

Next, we applied the algorithm to a real-world, stochastic dynamical system, namely the time series of continuously compounded returns $r_t$ of the daily closing values of the Austrian stock market index ATX from 7 January 1986 until 14 June 1996. The 2,575 returns are divided into two sets: the training set comprises the first 2,060 returns, whereas the last fifth of the data (515 returns) forms the validation set. The time series of continuously compounded returns (in percent) is depicted at the top of Figure 3. One of the characteristics of time series of returns is volatility clustering; that is, large shocks (deviations from the mean) of either sign tend to be followed by large shocks of either sign. Several of these volatility clusters are clearly visible in Figure 3. Since the data are recorded over a period of more than 10 years, the assumption of stationarity of the data is hardly valid. Indeed, the absolute returns of the training set are much larger than those of the validation set, which seems to cover a rather calm period. On the other hand, the autocorrelation functions on the training set and on the validation set are similar, which indicates that at least the linear structure in the data is not very time dependent.

The largest Lyapunov exponent of the ATX data set is estimated (with confidence intervals) for embedding dimensions m = 1, 5, 10, 15. The results are summarized in Table 1 and depicted on the left-hand side of Figure 4. The estimated exponent initially increases with the embedding dimension but then "converges" to a maximum value of about −0.2. Looking at the confidence intervals, the hypothesis of a positive Lyapunov exponent is clearly rejected. For this data set, the application of the stopped-training procedure turns out to
Figure 3 (Top) The return series of the Austrian stock market index ATX from 7 January 1986 until 14 June 1996. The training (validation) set is given by the returns to the left (right) of the vertical dashed line. (Bottom) The same time series after shuffling.
Figure 4 (Left) The estimated Lyapunov exponents λmax (circles) and the mean values λ¯ with 90% confidence intervals for the ATX return series for different embedding dimensions. The dotted line represents a Lyapunov exponent equal to zero. (Right) The results of the same calculations for the shuffled time series.
be crucial to obtaining meaningful results. If this procedure is not applied, the neural networks tend to heavily overfit the training data, which drives the estimated Lyapunov exponents much closer to zero.14

Finally, the bootstrapping algorithm is tested on a time series without any temporal dependencies, that is, on a so-called surrogate data set (Deco, Schittenkopf, and Schürmann 1997c; Theiler et al. 1992). The new time series is generated by shuffling the values of the original ATX return series.15 Thereby, the mean, the variance
14. The hypothesis of a positive largest exponent cannot be rejected in this case.
15. The training set and the validation set are shuffled separately.
and all higher-order moments of the empirical distribution of the original time series are preserved, whereas all linear and nonlinear temporal correlations are destroyed. The new time series is depicted at the bottom of Figure 3. The largest positive return, for instance, which is roughly at t = 1,000 in the original time series, is now close to the beginning of the time series. On the right-hand side of Figure 4 the estimated Lyapunov exponent λmax and the mean $\bar{\lambda}$ with 90% confidence intervals are plotted as a function of the embedding dimension m. For this time series, there is no clear convergence of λmax with increasing m. The confidence intervals are sufficiently small, however, to correctly reject the hypothesis of a positive exponent.

Altogether, we conclude that the algorithm correctly identifies the "extreme" cases of deterministic chaos and purely stochastic dynamics (without any statistical dependencies). It also rejects the hypothesis of a positive largest Lyapunov exponent for a financial return series with significant correlations, which is probably correct. It cannot, however, distinguish a simple periodic system from a chaotic system if both are blurred by dynamical noise. More precisely, the estimated Lyapunov exponent is positive for both systems. In other words, even this sophisticated method, which tries to approximate the distribution of Lyapunov exponents by using a bootstrapping technique, can produce misleading results for time series observed from stochastic dynamical systems.

3 A Local Sensitivity Measure
When applying the algorithm of the previous section (or any of the algorithms mentioned in Section 1) to dynamical systems with a stochastic component, the problematic interpretation of results stems from the fact that the definition of Lyapunov exponents is a definition for deterministic systems. Consequently, the idea of measuring the divergence (or convergence) of two initially nearby trajectories can hardly be transferred to stochastic dynamical systems. Following Yao and Tong (1994), however, studying the distance between the probability density functions (pdfs) of the state conditioned on two nearby initial conditions is a well-defined extension of the concept of Lyapunov exponents to the stochastic case.

In information theory the most important measure of the distance between two pdfs p(x) and q(x) is the relative entropy or Kullback-Leibler (KL) distance $D(p\|q)$ (Cover and Thomas 1991; Kullback 1967), which is defined by

$$D(p\|q) = \int p(x) \log \frac{p(x)}{q(x)}\, dx \qquad (3.1)$$

The KL distance is always non-negative and equal to zero if and only if p(x) = q(x) almost everywhere. It is not symmetric in p and q, however, which is not desirable for a measure of distance. Symmetry is easily obtained by adding the KL distance between q and p:

$$K(p\|q) = D(p\|q) + D(q\|p) = \int (p(x) - q(x)) \log \frac{p(x)}{q(x)}\, dx \qquad (3.2)$$

which we call the symmetric KL distance. This measure can now be used to investigate stochastic dynamical systems as defined by Equation (2.10), where, in general, $e_t$ may denote any zero-mean random variable. The Gaussian stochastic component in Equation (2.11) is thus only a special case. An equivalent definition of a stochastic dynamical system may be given in terms of the pdf of the next state $x_{t+1}$ conditioned on the present state $x_t$, which is denoted $\rho(x_{t+1}|x_t)$. The symmetric KL distance of the pdfs conditioned on two nearby initial conditions $x_t$ and $x_t + \delta$ is consequently given by

$$K(x_t; \delta) = \int \{\rho(x_{t+1}|x_t + \delta) - \rho(x_{t+1}|x_t)\} \log\{\rho(x_{t+1}|x_t + \delta)/\rho(x_{t+1}|x_t)\}\, dx_{t+1} \qquad (3.3)$$

For small δ, $K(x_t; \delta)$ has the approximation (Kullback 1967)

$$K(x_t; \delta) \approx I(x_t)\, \delta^2 \qquad (3.4)$$
with

$$I(x_t) = \int \frac{1}{\rho(x_{t+1}|x_t)} \left(\frac{d\rho(x_{t+1}|x_t)}{dx_t}\right)^2 dx_{t+1} \qquad (3.5)$$
$I(x_t)$ is thus a measure of the sensitive dependence of the pdf of $x_{t+1}$ on the initial condition $x_t$: the larger $I(x_t)$, the more sensitively the pdf depends on the initial condition. If we treat $x_t$ as a parameter of the pdf, $I(x_t)$ is the so-called Fisher information16 on $x_t$ contained in $x_{t+1}$. We want to reemphasize that for stochastic dynamics, it is more natural to measure the distance between pdfs conditioned on nearby initial values than to measure the distance between initially nearby trajectories, which are just realizations of random variables.

Before we calculate the measure $I(x_t)$ explicitly for different dynamics, we remark that it can be generalized in at least two directions. First, one may consider pdfs of states not only one but s steps ahead,17 that is, $\rho(x_{t+s}|x_t)$. Because of the accumulation of noise through the time evolution, one may expect that for large s the state of the system does not really depend on the initial conditions. The corresponding measure will be denoted $I_s(x_t)$. Second, the state of the dynamical system might depend not only on the present state but also on past states. In this case Equation (2.10), which describes the evolution of the system, has to be replaced by

$$x_{t+1} = f(x_t, x_{t-1}, \ldots, x_{t-m+1}) + e_t \qquad (3.6)$$

where m is called the order of the process. The corresponding conditional pdf $\rho(x_{t+1}|x_t, \ldots, x_{t-m+1})$ is abbreviated $\rho(x_{t+1}|\bar{x}_t)$. In order to derive the sensitivity measure in the general case m > 1, we must replace $x_t$ by $\bar{x}_t$ in Equation (3.3) and think of δ as a vector of small disturbances. The approximation (3.4) now reads

$$K(\bar{x}_t; \delta) \approx \delta^T I(\bar{x}_t)\, \delta \qquad (3.7)$$
with the Fisher information matrix

$$I(\bar{x}_t) = \int \frac{1}{\rho(x_{t+1}|\bar{x}_t)}\, \frac{\partial \rho(x_{t+1}|\bar{x}_t)}{\partial \bar{x}_t} \left(\frac{\partial \rho(x_{t+1}|\bar{x}_t)}{\partial \bar{x}_t}\right)^T dx_{t+1} \qquad (3.8)$$

where

$$\frac{\partial \rho(x_{t+1}|\bar{x}_t)}{\partial \bar{x}_t} = \begin{pmatrix} \dfrac{\partial \rho(x_{t+1}|\bar{x}_t)}{\partial x_t} \\ \vdots \\ \dfrac{\partial \rho(x_{t+1}|\bar{x}_t)}{\partial x_{t-m+1}} \end{pmatrix} \qquad (3.9)$$
$I(\bar{x}_t)$ is now a measure of the information on $(x_{t-m+1}, \ldots, x_t)$ contained in $x_{t+1}$.
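For Gaussian conditionals with equal variance (the homoskedastic case treated in Section 4), the symmetric KL distance has the simple closed form $K(x_t; \delta) = (f(x_t + \delta) - f(x_t))^2 / \sigma^2$, so the quadratic approximation (3.4) is easy to check numerically. A small sketch of ours using the noisy logistic map:

```python
import numpy as np

r, sigma2 = 3.59, 0.001
f = lambda x: r * x * (1 - x)

def sym_kl(xt, delta):
    """Symmetric KL distance (3.3) between rho(.|xt) and rho(.|xt + delta)
    for equal-variance Gaussian conditionals: (f(xt+delta) - f(xt))^2 / sigma2."""
    return (f(xt + delta) - f(xt)) ** 2 / sigma2

xt, delta = 0.3, 1e-3
I = r**2 * (1 - 2 * xt) ** 2 / sigma2      # Fisher information, cf. Eq. (4.2) below
print(sym_kl(xt, delta), I * delta**2)     # nearly equal, confirming (3.4)
```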
4 Analysis of Stochastic Dynamics

4.1 Analytical results

The first type of stochastic dynamics that we investigate is defined by Equations (2.10) and (2.11) with arbitrary deterministic f. In this case the variance σ² of the Gaussian dynamical noise does not depend on the state $x_t$ of the system,18 or equivalently, the conditional pdf of the next state is given by

$$\rho(x_{t+1}|x_t) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x_{t+1} - f(x_t))^2}{2\sigma^2}\right) \qquad (4.1)$$
16. The Fisher information is a measure of the minimum error in estimating a parameter of a distribution (Cover and Thomas 1991).
17. In this situation the look-ahead is equal to s.
18. Time series obtained from systems of this type are called homoskedastic.
Figure 5 (Left) 1,000 points of the AR(1) process xt+1 = 0.5xt + et , et ∼ N (0; 0.001). (Right) The sensitivity measure Is (xt ), s = 1, 2, 3, for this process.
Inserting $\rho(x_{t+1}|x_t)$ into Equation (3.5), some straightforward calculations lead to

$$I_1(x_t) = \frac{1}{\sigma^2} \left(\frac{df(x_t)}{dx_t}\right)^2 \qquad (4.2)$$
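Equation (3.5) can also be evaluated numerically for any conditional density (central difference in $x_t$, quadrature over $x_{t+1}$). A sketch of ours that recovers the closed form (4.2) for the noisy logistic map of Equation (4.1):

```python
import numpy as np

def fisher_info(rho, xt, h=1e-5, grid=None):
    """Numerical evaluation of Equation (3.5) for a conditional pdf
    rho(x_next, x_t), using a central difference of step h in x_t."""
    if grid is None:
        grid = np.linspace(-1.0, 2.0, 20001)
    p = rho(grid, xt)
    dp = (rho(grid, xt + h) - rho(grid, xt - h)) / (2 * h)
    keep = p > 1e-300                      # avoid 0/0 in the far tails
    return np.trapz(dp[keep] ** 2 / p[keep], grid[keep])

r, sigma2 = 3.59, 0.001
f = lambda x: r * x * (1 - x)
rho = lambda xn, xt: np.exp(-(xn - f(xt))**2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)

xt = 0.3
print(fisher_info(rho, xt))               # numerical integral
print(r**2 * (1 - 2 * xt)**2 / sigma2)    # closed form (4.2), about 2062
```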
This measure of sensitive dependence on initial conditions naturally includes the derivative of $f(x_t)$ with respect to the state $x_t$ (as does the definition of the Lyapunov exponent for one-dimensional systems) and the (constant) variance of the noise. For σ² → ∞, $I(x_t) \to 0$, which is reasonable, since in the limit of infinite disturbances the state of the system does not depend on the previous state. For finite σ², $I(x_t)$ is a monotonically increasing function of $|df(x_t)/dx_t|$. One of the simplest dynamics matching Equations (2.10) and (2.11) is that of an autoregressive process of first order (AR(1) process):

$$x_{t+1} = a x_t + b + e_t \qquad (4.3)$$
that is, the next state $x_{t+1}$ is the sum of a linear function of the present state $x_t$ and a random component $e_t$. Stationarity is guaranteed for |a| < 1. In this case, the measure $I_s(x_t)$ can be calculated explicitly for all look-aheads s, since all conditional pdfs $\rho(x_{t+s}|x_t)$ are Gaussians of constant variance:

$$I_s(x_t) = \frac{(1 - a^2)\, a^{2s}}{\sigma^2 (1 - a^{2s})} \qquad (4.4)$$
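Equation (4.4) is easy to tabulate; the following lines (ours) reproduce the geometric decay of $I_s$ with the look-ahead s for the parameters used in Figure 5:

```python
def ar1_sensitivity(a, sigma2, s):
    """I_s(x_t) for a stationary AR(1) process, Equation (4.4);
    constant in x_t because the dynamics are linear."""
    return (1 - a**2) * a**(2 * s) / (sigma2 * (1 - a**(2 * s)))

a, sigma2 = 0.5, 0.001
for s in (1, 2, 3):
    print(s, ar1_sensitivity(a, sigma2, s))   # 250.0, 50.0, about 11.9
```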
On the left-hand side of Figure 5, 1,000 points of an AR(1) process with a = 0.5, b = 0, and σ² = 0.001 are displayed. On the right-hand side of this figure, $I_s(x_t)$, s = 1, 2, 3, is plotted for this process. Because of the linear structure of $f(x_t)$, the measures $I_s(x_t)$ are constant functions of $x_t$. This means that any current state influences the conditional pdf of future states in the same way. From Equation (4.4), it also follows that $I_s(x_t)$ tends to zero with increasing s. In other words, the farther we look into the future, the less the future state depends on the current state, because of the accumulation of random components.

As an example of nonlinear dynamics, the logistic map with Gaussian dynamical noise (see Equations (2.9)-(2.11)) is analyzed. Inserting the dynamics into Equation (4.2) immediately gives the sensitivity measure of the next state:

$$I_1(x_t) = \frac{r^2 (1 - 2x_t)^2}{\sigma^2} \qquad (4.5)$$
For r = 3.59 and σ² = 0.001 (see the lower right-hand side of Figure 2), this function is depicted in Figure 6. The sensitivity measure is close to zero around $x_t = 0.5$, where the gradient of the skeleton $f(x_t)$ vanishes, and it increases rapidly towards the "edges" of the data set.
Figure 6 The sensitivity measures I1 (xt ) (dotted), I2 (xt ) (dashed), and I3 (xt ) (solid) for Gaussian approximations of the conditional pdf of the logistic map (r = 3.59; dynamical noise of variance σ 2 = 0.001). Since the noise is normally distributed, the depicted curve of I1 (xt ) is not an approximation but the exact sensitivity measure.
In order to calculate $I_s(x_t)$ for larger look-aheads s, more general dynamics have to be considered. An obvious generalization of the stochastic dynamics defined by Equations (2.10) and (2.11) is to allow the variance of the Gaussian dynamical noise to be state-dependent, that is, to consider conditional pdfs

$$\rho(x_{t+1}|x_t) = \frac{1}{\sqrt{2\pi\sigma_t^2}} \exp\left(-\frac{(x_{t+1} - f(x_t))^2}{2\sigma_t^2}\right) \qquad (4.6)$$
with $\sigma_t^2 = g(x_t)$ for some deterministic function g. For dynamical systems of this type, the measure of sensitive dependence on initial conditions can be calculated as

$$I_1(x_t) = \frac{1}{\sigma_t^2} \left(\frac{df(x_t)}{dx_t}\right)^2 + \frac{1}{2\sigma_t^4} \left(\frac{dg(x_t)}{dx_t}\right)^2 \qquad (4.7)$$
The extension with respect to the homoskedastic case in Equation (4.2) consists of the state-dependent variance $\sigma_t^2$ in the first term and the whole second term, which includes the derivative $dg(x_t)/dx_t$. Therefore, the dependence of the conditional variance on the current state is also taken into account. As an example we go back to the logistic map disturbed by Gaussian dynamical noise described above. For s ≥ 2, the conditional pdf $\rho(x_{t+s}|x_t)$ is not Gaussian, because of the interaction of deterministic and stochastic components. For small noise levels σ² and small look-aheads s, however, one can roughly think of a Gaussian conditional pdf with state-dependent variance. For s = 2, for instance, we have

$$x_{t+2} = f(x_{t+1}) + e_{t+1} = f(f(x_t) + e_t) + e_{t+1} \qquad (4.8)$$

and therefore, by Taylor expansion,

$$x_{t+2} \approx f(f(x_t)) + \tilde{e}_t, \qquad \tilde{e}_t \sim N\left(0;\; \sigma^2\left(1 + \left(df(f(x_t))/dx\right)^2\right)\right) \qquad (4.9)$$
with f (x) = r x(1 − x). Similar expressions can be derived for s = 3. Using these Gaussian approximations, the measures I2 (xt ) and I3 (xt ) are depicted in Figure 6. Obviously, the sensitivity measure is decreasing in s for all states xt , which is due to the accumulation of the dynamical noise. This means that future states
depend less and less on the current state $x_t$. Furthermore, the sensitivity measure for look-ahead s vanishes at $x_t$ if the (s−1)-fold application of f to $x_t$ yields 0.5, where the gradient of f vanishes. For instance, $I_2(x_{t,1}) = I_2(x_{t,2}) = 0$ for $x_{t,1} \approx 0.17$ and $x_{t,2} \approx 0.83$, which are characterized by $f(x_{t,1}) = 0.5$ and $f(x_{t,2}) = 0.5$, respectively.

As mentioned in Section 2.2, it is rather pointless to divide stochastic dynamics into chaotic and nonchaotic dynamics. In fact, time series sampled from slightly disturbed (σ² = 0.001) chaotic and nonchaotic systems are practically indistinguishable. This observation is reflected in the sensitivity measure for the logistic map given by Equation (4.5). The shape of the sensitivity measure is a parabola centered at 0.5 with a quadratic dependence on the parameter r. Therefore, small changes in r will lead to very similar sensitivity measures in the stochastic case, whereas they may result in very different dynamics (periodic or chaotic) and Lyapunov exponents (negative or positive) for the deterministic systems.

4.2 Practical determination of the sensitivity measure
In practice, one is usually confronted with the situation that the underlying dynamics (including the noise distribution) is unknown. In this case, it is necessary to extract the dynamics of the system from the observed time series, which can be done by applying any powerful conditional density estimator. A semi-nonparametric estimator that has been successfully applied by the neural networks community is the mixture density network (MDN) (Bishop 1995). The main idea is to approximate the conditional density by a mixture of Gaussians whose parameters are the outputs of multilayer perceptrons (MLPs). This approach is fully general, since Gaussian mixture models can approximate any density with arbitrary accuracy (McLachlan and Basford 1988). In our context, the MLPs receive the current state $x_t$ as a one-dimensional input that is then transformed by means of nonlinear mappings into the parameters of a mixture of Gaussians representing the conditional density $\rho(x_{t+1}|x_t)$. More details on MDNs can be found in Bishop 1995.

We apply the concept of MDNs to estimate the conditional densities $\rho(x_{t+s}|x_t)$, s = 1, 2, 3, for the logistic map of the previous section (r = 3.59, σ² = 0.001).19 For s = 1, the mixture of Gaussians is replaced by a single Gaussian, which is sufficient to capture the dynamics (see Equations (2.10) and (2.11)). In this case, the sensitivity measure can be calculated analytically from Equation (4.7), because the derivatives of the network outputs (the mean and the variance) with respect to the network input $x_t$ are given by closed-form expressions. For look-aheads s = 2 and s = 3, the conditional density is approximated by a mixture of two Gaussians. In this case, the sensitivity measure must be estimated by numerical integration. Since the derivative of the conditional density with respect to $x_t$ is given by a closed-form expression, however, numerical integration of Equation (3.5), where t + 1 must be substituted by t + s, is easily performed by any standard integration routine.

In Figure 7, the sensitivity measures with respect to the conditional densities estimated by the MDNs are depicted. The curves are very similar to the curves in Figure 6 obtained through a Taylor series approximation. We remark that differences between the curves do not necessarily mean that the dynamics have not been fully extracted by the neural networks, since the true conditional density is not given analytically for s ≥ 2. However, the Taylor series approximation is assumed to be more precise because it is independent of finite-sample effects. This is supported by the fact that the differences between the curves increase toward the "edges" of the data set, where fewer observations are available.20 Nevertheless, the main characteristics extracted from the dynamical system (from the time series) are the same. For instance, the sensitivity measures are minimal at $x_t = 0.5$ for all look-aheads s. For s ≥ 2, the other minima are also detected. Altogether, it is found that the sensitivity measure can be estimated empirically from observed time series.
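As a stand-in for the MDN (training one is beyond a short sketch), the same recipe of estimating the conditional density and then integrating Equation (3.5) numerically can be illustrated with a kernel density estimator, the nonparametric alternative mentioned in Section 5 (Silverman 1986). Only the qualitative shape should be trusted here; the kernel bandwidth smooths the density and biases the absolute scale:

```python
import numpy as np
from scipy.stats import gaussian_kde

# Data: the noisy logistic map of Section 4.1 (r = 3.59, sigma^2 = 0.001).
rng = np.random.default_rng(1)
r, sigma = 3.59, np.sqrt(0.001)
x = np.empty(10000); x[0] = 0.3
for t in range(len(x) - 1):
    x[t + 1] = np.clip(r * x[t] * (1 - x[t]) + rng.normal(0, sigma), 0, 1)

# rho(x_{t+1}|x_t) = joint density / marginal density, both estimated by KDE.
joint = gaussian_kde(np.vstack([x[:-1], x[1:]]))
marginal = gaussian_kde(x[:-1])
rho = lambda xn, xt: joint(np.vstack([np.full_like(xn, xt), xn])) / marginal([xt])

# I_1 via the numerical recipe of Section 4.1: large where |f'| is large,
# near zero at x_t = 0.5, in line with Figures 6 and 7.
grid, h = np.linspace(0, 1, 2001), 1e-3
for xt in (0.3, 0.5, 0.7):
    p = rho(grid, xt)
    dp = (rho(grid, xt + h) - rho(grid, xt - h)) / (2 * h)
    keep = p > 1e-12
    print(xt, np.trapz(dp[keep] ** 2 / p[keep], grid[keep]))
```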
19. The training set and the validation set consist of 8,000 and 2,000 points, respectively.
20. The 5th and the 95th percentiles of the training sample are 0.286 and 0.913, respectively.
Figure 7 The sensitivity measures I1 (xt ) (dotted), I2 (xt ) (dashed), and I3 (xt ) (solid) for MDN approximations of the conditional pdf of the logistic map (r = 3.59; dynamical noise of variance σ 2 = 0.001).
5 Application to Asset Return Series Models
We now turn to real-world dynamical systems in which the underlying dynamics is not known a priori, namely financial markets. A typical example of "trajectories" observable in the markets are asset price series. Because of their nonstationary behavior, price series are usually transformed into return series. Since the seminal works of Engle (1982) and Bollerslev (1986) it has become clear that return series are not white noise but show some structure in the conditional variances. It is thus interesting to analyze these time series as realizations of dynamical systems with a dominating stochastic component by means of the sensitivity measure.

Since the true dynamics of the conditional variance, or volatility, is not known and not observable in the market, any parametric volatility model, such as an ARCH or GARCH model, is in fact an assumption concerning the underlying dynamics that is based on empirical findings and usually also theoretically motivated. In the last decade a large number of extensions of these standard volatility models have been proposed in the literature, incorporating different concepts (e.g., the leverage effect; Glosten, Jagannathan, and Runkle 1993). By pinpointing the stochastic dynamics via a parametric, heteroskedastic specification, the shape of the sensitivity measure is determined as well. In other words, the sensitivity measure is a measure of (sensitive) dependence with respect to the parametric specification.

In the remainder of this section we fit ARCH and GARCH models to the return series $r_t$ of the Austrian stock market index ATX, which is depicted on the left-hand side of Figure 8. We study the stochastic dynamics of the models by calculating the sensitivity measure $I_1(r_t)$. First, we fit the ARCH(1) model (Engle 1982)

$$r_{t+1} \sim N(b;\, \sigma_t^2) \qquad (5.1)$$

$$\sigma_t^2 = \alpha_0 + \alpha_1 (r_t - b)^2 \qquad (5.2)$$
with constant conditional mean b to the data set (b = 0.0054, $\alpha_0$ = 0.8301, $\alpha_1$ = 0.5481). The parameter space for $\alpha_0$ and $\alpha_1$ is restricted by $\alpha_0 > 0$ and $\alpha_1 \geq 0$. For stationarity, it is required that $\alpha_1 < 1$ hold. The unconditional variance of this model is given by $\alpha_0 / (1 - \alpha_1)$. From Equation (4.7), the sensitivity measure $I_1(r_t)$ can be calculated as

$$I_1(r_t) = \frac{2\alpha_1^2 (r_t - b)^2}{(\alpha_0 + \alpha_1 (r_t - b)^2)^2} \qquad (5.3)$$

which is plotted on the right-hand side of Figure 8.
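Given the reported fit, Equation (5.3) is immediate to evaluate; a small sketch of ours that also locates the two maxima analytically at $r_t = b \pm \sqrt{\alpha_0/\alpha_1}$, with peak value $\alpha_1/(2\alpha_0) \approx 0.33$, consistent with the right panel of Figure 8:

```python
import numpy as np

b, a0, a1 = 0.0054, 0.8301, 0.5481     # ARCH(1) fit reported in the text

def sensitivity_arch1(r):
    """I_1(r_t) for the ARCH(1) model with constant mean, Equation (5.3)."""
    u2 = (np.asarray(r) - b) ** 2
    return 2 * a1**2 * u2 / (a0 + a1 * u2) ** 2

r_grid = np.linspace(-10, 10, 2001)    # same return range as Figure 8
I = sensitivity_arch1(r_grid)
r_peak = b + np.sqrt(a0 / a1)          # location of the right maximum
print(r_peak, sensitivity_arch1(r_peak), a1 / (2 * a0))  # peak = alpha1/(2 alpha0)
```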
Figure 8 The returns of the ATX time series (left) and the measure I1 (rt ) of an ARCH(1) model with constant conditional mean (right).
Because of the variance specification, the sensitivity measure is symmetric around $r_t = b$. It is minimal, that is, zero, at $r_t = b$, increases with increasing $|r_t - b|$, has a maximum at $r_t = b \pm \sqrt{\alpha_0/\alpha_1}$, and tends to zero for large positive and negative returns. This means that the conditional pdf $\rho(r_{t+1}|r_t)$ is least sensitive to the current state at $r_t = b$, expressing the fact that the conditional variance specified in Equation (5.2) has a (local and global) minimum at $r_t = b$. For other values of $r_t$, the sensitivity measure $I_1(r_t)$ is strictly positive, and its shape is determined by the variance specification.21

Since the ATX return series has a significant autocorrelation at lag 1, it is reasonable to include an autoregressive term in the ARCH model specified above, which is now given by

$$r_{t+1} \sim N(a r_t + b;\, \sigma_t^2) \qquad (5.4)$$

$$\sigma_t^2 = \alpha_0 + \alpha_1 (r_t - a r_{t-1} - b)^2 \qquad (5.5)$$
The conditional mean is thus modeled by a linear function of the last return $r_t$. The estimated parameter values of this volatility model are a = 0.3393, b = 0.0069, $\alpha_0$ = 0.7730, and $\alpha_1$ = 0.4798. From Equations (5.4) and (5.5) we see that $r_{t+1}$ now depends on both $r_t$ and $r_{t-1}$. Consequently, the sensitive dependence of the pdf on initial conditions is quantified by an information matrix, which can be calculated from Equation (3.8):

$$I_1(r_t, r_{t-1}) = \frac{1}{\alpha_0 + \alpha_1 (r_t - a r_{t-1} - b)^2} \begin{pmatrix} a^2 & 0 \\ 0 & 0 \end{pmatrix} + \frac{2\alpha_1^2 (r_t - a r_{t-1} - b)^2}{(\alpha_0 + \alpha_1 (r_t - a r_{t-1} - b)^2)^2} \begin{pmatrix} 1 & -a \\ -a & a^2 \end{pmatrix} \qquad (5.6)$$
Following Equation (3.7), an "error" δ in the initial condition $(r_t, r_{t-1})$ "evolves" after one step into $\delta^T I_1(r_t, r_{t-1})\, \delta$. It is thus tempting to analyze the dependence of the eigenvalues of $I_1(r_t, r_{t-1})$ on the "parameters" $r_t$ and $r_{t-1}$. The results for the ARCH(1) model fitted to the ATX data are depicted in Figure 9. The larger eigenvalue (solid line) is roughly 50 times larger than the smaller eigenvalue (dashed line). For clarity, both eigenvalues are plotted as functions of the error $r_t - a r_{t-1} - b$. The characteristics of the curves are very similar to those of the curve depicted on the right-hand side of Figure 8.

ARCH models typically cannot capture long-term dependencies in the conditional variance unless the order of the model is chosen sufficiently high. To overcome this problem, Bollerslev (1986) introduced the family of GARCH models, which are characterized by a recursive specification of the conditional variances.
21. It is worth pointing out that the two maxima of $I_1(r_t)$ do not have an immediate economic interpretation. They arise from the particular specification of the conditional variance. If one defined the conditional variance, for example, by $\hat{\sigma}_t^2 = \exp(\sigma_t^2)$, with $\sigma_t^2$ given by Equation (5.2), the sensitivity measure would be $I_1(r_t) = \alpha_0 + \alpha_1 (r_t - b)^2$, which is also minimal for $r_t = b$ but increases as a quadratic function of $r_t - b$.
Figure 9 The larger eigenvalue (solid line) and the smaller eigenvalue (dashed line) of the information matrix of an ARCH(1) model with a linear model for the conditional mean fitted to the ATX return series. The eigenvalues are plotted as functions of rt −art−1 −b.
The simplest GARCH model is the GARCH(1,1) model with constant conditional mean, which is given by

$$r_{t+1} \sim N(b;\, \sigma_t^2) \qquad (5.7)$$

$$\sigma_t^2 = \alpha_0 + \alpha_1 (r_t - b)^2 + \beta_1 \sigma_{t-1}^2 \qquad (5.8)$$
Besides the parameter restrictions of the ARCH(1) model, $\beta_1 \geq 0$ and $\alpha_1 + \beta_1 < 1$ are required for stationarity. The unconditional variance of the model is then given by $\alpha_0 / (1 - \alpha_1 - \beta_1)$. The parameter values estimated for the ATX return series are b = −0.0076, $\alpha_0$ = 0.0404, $\alpha_1$ = 0.3404, and $\beta_1$ = 0.7169, which corresponds to a nonstationary model. For stationary models, the conditional variance can be rewritten as

$$\sigma_t^2 = \frac{\alpha_0}{1 - \beta_1} + \alpha_1 \sum_{i=0}^{\infty} \beta_1^i (r_{t-i} - b)^2 \qquad (5.9)$$
which expresses the fact that all previous returns influence the next conditional variance. The weight of the influence of previous returns, however, diminishes exponentially fast (at a rate of $\beta_1$). Since the conditional pdf of this stochastic process depends on all previous returns, the information matrix is of infinite size.22 Some tedious but straightforward calculations lead to the result that the element in row i and column j of the information matrix is given by

$$I_{i,j}(r_t, r_{t-1}, \ldots) = \frac{1}{2\sigma_t^4}\, \frac{\partial \sigma_t^2}{\partial r_{t-i}}\, \frac{\partial \sigma_t^2}{\partial r_{t-j}} \qquad (5.10)$$

with

$$\frac{\partial \sigma_t^2}{\partial r_{t-i}} = 2\alpha_1 \beta_1^i (r_{t-i} - b) \qquad (5.11)$$

22. This holds for all GARCH models.
Because of the special structure of this matrix, the eigenspectrum can be calculated analytically. Standard matrix manipulations lead to the result that all eigenvalues are equal to 0 except one, which is given by

$$\lambda = \frac{2\alpha_1^2}{\sigma_t^4} \sum_{i=0}^{\infty} \beta_1^{2i} (r_{t-i} - b)^2 \qquad (5.12)$$
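Because of the geometric weights $\beta_1^i$, Equation (5.12) (and the variance (5.9) it requires) can be evaluated to high accuracy with a modest truncation. A sketch of ours with the fitted GARCH(1,1) parameters and a hypothetical short history of returns:

```python
import numpy as np

def garch_eigenvalue(returns, b, a0, a1, b1, n_lags=200):
    """The single nonzero eigenvalue of the information matrix,
    Equation (5.12), truncated after n_lags terms; `returns` holds
    r_t, r_{t-1}, ... with the most recent value first."""
    u2 = (np.asarray(returns[:n_lags], dtype=float) - b) ** 2
    w = b1 ** np.arange(len(u2))
    sigma2 = a0 / (1 - b1) + a1 * np.sum(w * u2)       # Equation (5.9)
    return 2 * a1**2 * np.sum(w**2 * u2) / sigma2**2

# Hypothetical recent returns (most recent first); ATX-fit parameters from the text.
r_hist = [0.8, -1.2, 0.3, 0.1, -0.4] + [0.0] * 195
print(garch_eigenvalue(r_hist, b=-0.0076, a0=0.0404, a1=0.3404, b1=0.7169))
```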
The dependence of this eigenvalue on the previous returns is an infinite-dimensional generalization of the dependence depicted in Figure 8: the minimum (0) is obtained for $r_{t-i} = b$, $i \geq 0$; λ is strictly positive elsewhere and tends to 0 for $|r_{t-i}| \to \infty$.

Since the sensitivity measures obtained for the return series depend on the model specification, one might call for a model-free characterization of the dependencies in the time series. This can be accomplished, for instance, by fitting the semi-nonparametric models described in Section 4.2 to the return series (Schittenkopf, Dorffner, and Dockner 1998) and by calculating the sensitivity measure via numerical integration. Another possibility is to apply nonparametric methods such as kernel-based density estimators (Silverman 1986).

6 Conclusion
In the first part of the paper we performed extensive numerical experiments in order to analyze how reliably the largest Lyapunov exponent of a (stochastic) dynamical system can be estimated within a statistical framework using a bootstrapping technique. Although the results are satisfactory for deterministic chaotic systems and for dynamical systems with a dominating stochastic component (such as the return series of a stock index), the interpretation of the results for a simple logistic map with dynamical noise is difficult, mainly because the concept of Lyapunov exponents is well defined for deterministic dynamics only.

Consequently, in the second part of the paper, the question was raised whether a more natural approach to characterizing stochastic dynamics should not be based on a sensitivity measure that is well defined for stochastic dynamics. We adopted an information-theoretic measure of the distance between probability density functions that locally measures the dependence of the future state of a dynamical system on the previous states. We derived analytical expressions for this sensitivity measure for a variety of stochastic dynamical systems, among them the frequently hypothesized and analyzed classes of dynamical systems disturbed by Gaussian noise of constant or state-dependent variance.

Examples of particular interest in our study are the stochastic dynamics of asset return series. Since many of these time series are close to uncorrelated noise with some structure in the conditional variance, one cannot expect pronounced dependencies to be detected. For parametric models such as ARCH and GARCH models, the shape of the sensitivity measure (as a function of the previous states) is dominated by the parametric specification. This observation may motivate the application of (semi-)nonparametric density estimators.

An important question, independent of the degree of parametrization of the return series model, is how the information provided by the sensitivity measure can be interpreted from an applied point of view. If we assume, for instance, that the return series is best described by a GARCH model, which is characterized by its time-dependent conditional variance, the sensitivity measure should contain information about the predictability of the volatility of the time series. Since the sensitivity measure is a local measure, the notion of predictability is also a local one, that is, dependent on the current state.23 At this stage it is not clear whether this information on predictability is relevant for financial time series analysis in practice.
²³ One might think that a promising approach would be to select the conditional variance $\sigma_t^2$ as the state variable and to work with conditional densities $\rho(\sigma_{t+s}^2 \mid \sigma_t^2)$. It turns out, however, that this approach is infeasible: from Equation (5.8), it follows immediately that the conditional density $\rho(\sigma_{t+1}^2 \mid \sigma_t^2)$ is nonzero only for $\sigma_{t+1}^2 \ge \alpha_0 + \beta_1 \sigma_t^2$ (which depends on $\sigma_t^2$). Therefore, the symmetric KL-distance between $\rho(\sigma_{t+1}^2 \mid \sigma_t^2)$ and $\rho(\sigma_{t+1}^2 \mid \sigma_t^2 + h)$ is infinite, which prevents a further discussion as in Section 3.
References

Bishop, C. M. (1995). Neural Networks for Pattern Recognition. Oxford: Clarendon Press.
Bollerslev, T. (1986). "Generalized autoregressive conditional heteroskedasticity." Journal of Econometrics, 31: 307–327.
Bollerslev, T., R. Y. Chou, and K. F. Kroner. (1992). "ARCH modelling in finance: A review of the theory and empirical evidence." Journal of Econometrics, 52: 5–59.
Brock, W. A. (1986). "Distinguishing random and deterministic systems: Abridged version." Journal of Economic Theory, 40: 168–195.
Brock, W. A., W. D. Dechert, and J. A. Scheinkman. (1987). "A test for independence based on the correlation dimension." Working paper no. 8702, Social Systems Research Institute, University of Wisconsin–Madison.
Brock, W. A., D. Hsieh, and B. LeBaron. (1991). Nonlinear Dynamics, Chaos, and Instability: Statistical Theory and Economic Evidence. Cambridge: MIT Press.
Chan, K. S., and H. Tong. (1994). "A note on noisy chaos." Journal of the Royal Statistical Society B, 56: 301–311.
Cover, T. M., and J. A. Thomas. (1991). Elements of Information Theory. New York: Wiley.
Dämmig, M., and F. Mitschke. (1993). "Estimation of Lyapunov exponents from time series: The stochastic case." Physics Letters A, 178: 385–394.
Dechert, W. D., and R. Gençay. (1993). "Lyapunov exponents as a non-parametric diagnostic for stability analysis." In M. H. Pesaran and S. M. Potter (eds.), Nonlinear Dynamics, Chaos and Econometrics. Chichester, U.K.: Wiley, pp. 33–52.
Dechert, W. D., and R. Gençay. (1996). "The topological invariance of Lyapunov exponents in embedded dynamics." Physica D, 90: 40–55.
Deco, G., C. Schittenkopf, and B. Schürmann. (1997a). "Information flow in chaotic symbolic dynamics for finite and infinitesimal resolution." International Journal of Bifurcation and Chaos, 7: 97–105.
Deco, G., C. Schittenkopf, and B. Schürmann. (1997b). "Determining the information flow of dynamical systems from continuous probability distributions." Physical Review Letters, 78: 2345–2348.
Deco, G., C. Schittenkopf, and B. Schürmann. (1997c). "Dynamical analysis of time series by statistical tests." International Journal of Bifurcation and Chaos, 7: 2629–2652.
Dockner, E. J., A. Prskawetz, and G. Feichtinger. (1997). "Non-linear dynamics and predictability in the Austrian stock market." In C. Heij, J. M. Schumacher, B. Hanzon, and C. Praagman (eds.), System Dynamics in Economic and Financial Models. Chichester, U.K.: Wiley, pp. 45–72.
Dockner, E. J., and P. Woehrmann. (1995). "Die Instabilität des Deutschen Aktienmarktes: Eine Schätzung des größten Ljapunov-Exponenten mittels neuronaler Netze" [The instability of the German stock market: An estimation of the largest Lyapunov exponent using neural networks]. Working paper, Department of Business Studies, University of Vienna.
Doerner, R., B. Hübinger, and W. Martienssen. (1991). "Predictability portraits for chaotic motions." Chaos, Solitons & Fractals, 1: 553–571.
Eckhardt, B., and D. Yao. (1993). "Local Lyapunov exponents in chaotic systems." Physica D, 65: 100–108.
Eckmann, J.-P., S. O. Kamphorst, D. Ruelle, and S. Ciliberto. (1986). "Lyapunov exponents from time series." Physical Review A, 34: 4971–4979.
Eckmann, J.-P., S. O. Kamphorst, D. Ruelle, and J. Scheinkman. (1988). "Lyapunov exponents for stock returns." In The Economy as an Evolving Complex System. SFI Studies in the Science of Complexity. Reading, MA: Addison-Wesley, pp. 301–304.
Eckmann, J.-P., and D. Ruelle. (1985). "Ergodic theory of chaos and strange attractors." Reviews of Modern Physics, 57: 617–656.
Elsner, J. (1996). Chaos und Zufall am deutschen Aktienmarkt [Chaos and Chance in the German Stock Market]. Heidelberg, Germany: Physica-Verlag.
Engle, R. F. (1982). "Autoregressive conditional heteroskedasticity with estimates of the variance of U.K. inflation." Econometrica, 50: 987–1008.
Frank, M., R. Gençay, and T. Stengos. (1988). "International chaos?" European Economic Review, 32: 1569–1584.
Gençay, R. (1996). "A statistical framework for testing chaotic dynamics via Lyapunov exponents." Physica D, 89: 261–266.
Gençay, R., and W. D. Dechert. (1992). "An algorithm for the n Lyapunov exponents of an n-dimensional unknown dynamical system." Physica D, 59: 142–157.
Gençay, R., and W. D. Dechert. (1997). "The identification of spurious Lyapunov exponents in Jacobian algorithms." Studies in Nonlinear Dynamics and Econometrics, 1: 145–154.
Glosten, L. R., R. Jagannathan, and D. E. Runkle. (1993). "On the relation between the expected value and the volatility of the nominal excess return on stocks." Journal of Finance, 48: 1779–1801.
Grassberger, P., R. Badii, and A. Politi. (1988). "Scaling laws for hyperbolic and nonhyperbolic attractors." Journal of Statistical Physics, 51: 135–178.
Hornik, K., M. Stinchcombe, and H. White. (1990). "Universal approximation of an unknown mapping and its derivatives using multilayer feedforward networks." Neural Networks, 3: 535–549.
Hsieh, D. A. (1991). "Chaos and nonlinear dynamics: Applications to financial markets." Journal of Finance, 46: 1839–1877.
Jaditz, T., and C. L. Sayers. (1993). "Is chaos generic in economic data?" International Journal of Bifurcation and Chaos, 3: 745–755.
Kadtke, J. B., J. Brush, and J. Holzfuss. (1993). "Global dynamical equations and Lyapunov exponents from noisy chaotic time series." International Journal of Bifurcation and Chaos, 3: 607–616.
Kolmogorov, A. N. (1958). "A metric invariant of transient dynamical systems and automorphisms in Lebesgue spaces." Doklady Akademii Nauk SSSR, 119: 861–864.
Kostelich, E. J., and T. Schreiber. (1993). "Noise reduction in chaotic time-series data: A survey of common methods." Physical Review E, 48: 1752–1763.
Kullback, S. (1967). Information Theory and Statistics. New York: Dover.
Künsch, H. R. (1989). "The jackknife and the bootstrap for general stationary observations." Annals of Statistics, 17: 1217–1241.
Mañé, R. (1981). "On the dimension of the compact invariant sets of certain nonlinear maps." In Dynamical Systems and Turbulence. Lecture Notes in Mathematics no. 898. Berlin: Springer, pp. 230–242.
McLachlan, G. J., and K. E. Basford. (1988). Mixture Models: Inference and Applications to Clustering. New York: Marcel Dekker.
Nychka, D., S. Ellner, A. R. Gallant, and D. McCaffrey. (1992). "Finding chaos in noisy systems." Journal of the Royal Statistical Society B, 54: 399–426.
Oseledec, V. I. (1968). "A multiplicative ergodic theorem: Lyapunov characteristic numbers for dynamical systems." Transactions of the Moscow Mathematical Society, 19: 197–231.
Packard, N. H., J. P. Crutchfield, J. D. Farmer, and R. S. Shaw. (1980). "Geometry from a time series." Physical Review Letters, 45: 712–716.
Peters, E. E. (1991). Chaos and Order in the Capital Markets. New York: Wiley.
Press, W. H., S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery. (1988). Numerical Recipes in C. Cambridge: Cambridge University Press.
Sano, M., and Y. Sawada. (1985). "Measurement of the Lyapunov spectrum from a chaotic time series." Physical Review Letters, 55: 1082–1085.
Sauer, T., J. A. Yorke, and M. Casdagli. (1991). "Embedology." Journal of Statistical Physics, 65: 579–616.
Scheinkman, J. A., and B. LeBaron. (1989). "Nonlinear dynamics and stock returns." Journal of Business, 62: 311–337.
Schittenkopf, C., and G. Deco. (1996). "Exploring the intrinsic information loss in single-humped maps by refining multi-symbol partitions." Physica D, 94: 57–64.
Schittenkopf, C., G. Dorffner, and E. J. Dockner. (1998). "Volatility prediction with mixture density networks." In L. Niklasson, M. Bodén, and T. Ziemke (eds.), ICANN 98—Proceedings of the Eighth International Conference on Artificial Neural Networks. Berlin: Springer, pp. 929–934.
Silverman, B. W. (1986). Density Estimation for Statistics and Data Analysis. London: Chapman and Hall.
Sinai, Y. G. (1959). "On the concept of entropy for a dynamic system." Doklady Akademii Nauk SSSR, 124: 768–771.
Sugihara, G. (1994). "Nonlinear forecasting for the classification of natural time series." Philosophical Transactions of the Royal Society London A, 348: 477–495.
Takens, F. (1981). "Detecting strange attractors in turbulence." In Dynamical Systems and Turbulence. Lecture Notes in Mathematics no. 898. Berlin: Springer, pp. 366–381.
Tanaka, T., K. Aihara, and M. Taki. (1998). "Analysis of positive Lyapunov exponents from random time series." Physica D, 111: 42–50.
Theiler, J., S. Eubank, A. Longtin, B. Galdrikian, and J. D. Farmer. (1992). "Testing for nonlinearity in time series: The method of surrogate data." Physica D, 58: 77–94.
Wolf, A., J. B. Swift, H. L. Swinney, and J. A. Vastano. (1985). "Determining Lyapunov exponents from a time series." Physica D, 16: 285–317.
Wolff, R. C. (1992). "Local Lyapunov exponents: Looking closely at chaos." Journal of the Royal Statistical Society B, 54: 353–371.
Yao, Q., and H. Tong. (1994). "On prediction and chaos in stochastic systems." Philosophical Transactions of the Royal Society London A, 348: 357–369.
Advisory Panel
Jess Benhabib, New York University
William A. Brock, University of Wisconsin-Madison
Jean-Michel Grandmont, CREST-CNRS—France
Jose Scheinkman, University of Chicago
Halbert White, University of California-San Diego
Editorial Board
Bruce Mizrach (editor), Rutgers University
Michele Boldrin, University of Carlos III
Tim Bollerslev, University of Virginia
Carl Chiarella, University of Technology-Sydney
W. Davis Dechert, University of Houston
Paul De Grauwe, KU Leuven
David A. Hsieh, Duke University
Kenneth F. Kroner, BZW Barclays Global Investors
Blake LeBaron, University of Wisconsin-Madison
Stefan Mittnik, University of Kiel
Luigi Montrucchio, University of Turin
Kazuo Nishimura, Kyoto University
James Ramsey, New York University
Pietro Reichlin, Rome University
Timo Terasvirta, Stockholm School of Economics
Ruey Tsay, University of Chicago
Stanley E. Zin, Carnegie-Mellon University
Editorial Policy
The SNDE is formed in recognition that advances in statistics and dynamical systems theory may increase our understanding of economic and financial markets. The journal will seek both theoretical and applied papers that characterize and motivate nonlinear phenomena. Researchers will be encouraged to assist replication of empirical results by providing copies of data and programs online. Algorithms and rapid communications will also be published.

ISSN 1081-1826