BAYESIAN ESTIMATION OF SELF-SIMILARITY EXPONENT
Natallia Makarava and Matthias Holschneider
University of Potsdam, Interdisciplinary Center for Dynamics of Complex Systems, Karl-Liebknecht-Str. 24, 14476 Potsdam, Germany (e-mail: [email protected])
ABSTRACT
In this study, we propose a Bayesian approach for the estimation of the Hurst exponent in terms of linear mixed models. We assume the underlying process to be fractional Brownian motion. The method is applicable even to unevenly sampled signals and signals with gaps. Furthermore, we provide a comparison with the detrended fluctuation analysis technique. The Rosenblatt process is used to test the Bayesian approach on an H-self-similar process with non-Gaussian finite-dimensional distributions.
1 INTRODUCTION
Self-similar processes are important examples of stochastic processes. Estimating the Hurst exponent, which measures the intensity of self-similarity, is a problem in several areas of science. Many estimators of the Hurst exponent have been proposed in the literature: Flandrin (1992), Geweke and Porter-Hudak (1983), Hurst (1951), Whittle (1953), McCoy and Walden (1996), and many more. However, most existing methods provide either a point estimate of the Hurst exponent or a confidence interval around such an estimate. In contrast, we propose Bayesian inference for the estimation of the Hurst exponent. This naturally provides a degree of belief about the assumed model and, as a by-product, the associated confidence intervals. The proposed Bayesian method is formulated in terms of a linear mixed model
$$X_t = \lambda B_t^H + a t + b, \tag{1}$$
where the Hurst exponent $H$ parametrizes the model, $\lambda > 0$ is the amplitude, and $a \in \mathbb{R}$ and $b \in \mathbb{R}$ are the slope and offset, respectively. We assume that the underlying process $B_t^H$ is a fractional Brownian motion (Mandelbrot and Van Ness (1968)). We also show that random gaps in the data only weakly influence the estimation of the parameters. Moreover, the proposed Bayesian method outperforms the most popular technique for Hurst exponent estimation, the detrended fluctuation analysis. The performance of the proposed method on non-Gaussian data is further tested on a Rosenblatt process.
2 DEFINITION OF THE MODEL
Let $D$ denote the set of $N$ data observations, $D = \{(t_k, Y_k = X_{t_k})\}$, $k = 1, \dots, N$, where $X_{t_k}$ are the data values at the time points $t_k$. Let $F$ be the $N \times 2$ system matrix with entries $F_{k,1} = 1$, $F_{k,2} = t_k$, and let $\beta = [b, a]^T$. Then the model can be written in the form of a linear mixed-effects model (Demidenko (2004)):
$$Y = F\beta + \lambda u, \qquad u \sim N(0, \Sigma(H)), \tag{2}$$
where the Hurst exponent plays the role of a hyperparameter. Here, $u$ is a Gaussian random variable, independent of the other parameters, with zero mean and covariance matrix with entries
$$\Sigma_{i,j}(H) = E(B_{t_i}^H B_{t_j}^H) = \frac{1}{2}\left(|t_i|^{2H} + |t_j|^{2H} - |t_i - t_j|^{2H}\right). \tag{3}$$
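As an illustration of Eq. (3), the following minimal sketch builds the covariance matrix and draws one realization of model (1) by Cholesky factorization. It assumes NumPy; the function names `fbm_covariance` and `sample_model` are hypothetical and not taken from the original work, and the Cholesky approach is only one simple way to simulate the process.

```python
import numpy as np

def fbm_covariance(t, hurst):
    """Covariance matrix of fractional Brownian motion at times t, Eq. (3)."""
    t = np.asarray(t, dtype=float)
    ti, tj = t[:, None], t[None, :]
    p = 2.0 * hurst
    return 0.5 * (np.abs(ti) ** p + np.abs(tj) ** p - np.abs(ti - tj) ** p)

def sample_model(t, hurst, lam, a, b, rng):
    """Draw one realization of X_t = lam * B_t^H + a * t + b, Eq. (1)."""
    t = np.asarray(t, dtype=float)
    chol = np.linalg.cholesky(fbm_covariance(t, hurst))
    u = chol @ rng.standard_normal(t.size)   # u ~ N(0, Sigma(H))
    return lam * u + a * t + b

rng = np.random.default_rng(0)
# Times must be nonzero and distinct: B_0^H = 0 would make Sigma singular.
t = np.arange(1.0, 65.0)
y = sample_model(t, hurst=0.3, lam=0.8, a=0.2, b=10.0, rng=rng)
```

For $H = 0.5$ the covariance reduces to $\min(t_i, t_j)$, the covariance of ordinary Brownian motion, which is a convenient sanity check.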
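For fixed parameters $\beta$ and observation points $\{t_k\}$, the vector of outcomes is distributed as $Y \sim N(F\beta, \lambda^2 \Sigma(H))$. Hence, the likelihood of observing the data for a fixed set of parameters and observation points is
$$L(Y \mid H, \lambda, \beta, \{t_k\}) = \frac{1}{(2\pi)^{N/2} \lambda^N |\Sigma(H)|^{1/2}} \exp\left(-\frac{(Y - F\beta)^T \Sigma(H)^{-1} (Y - F\beta)}{2\lambda^2}\right). \tag{4}$$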
Since no prior information about the parameters of the model is available, we use a Jeffreys prior for the amplitude, $P(\lambda) \sim \lambda^{-1}$, and flat priors for the other parameters, i.e. $P(\beta) \sim 1$ and $P(H) = \chi_{[0,1]}$. Applying Bayes' theorem, we obtain the following posterior distribution (Makarava et al. (2011)):
$$P(H, \lambda, \beta \mid D) = C \lambda^{-N-1} |\Sigma(H)|^{-1/2} \exp\left(-\frac{R^2}{2\lambda^2}\right) \exp\left(-\frac{(\beta - \beta^*)^T F^T \Sigma(H)^{-1} F (\beta - \beta^*)}{2\lambda^2}\right) \tag{5}$$
with $C$ being a normalization constant and
$$\beta^* = (F^T \Sigma(H)^{-1} F)^{-1} F^T \Sigma(H)^{-1} Y, \tag{6}$$
$$R^2 = (Y - F\beta^*)^T \Sigma(H)^{-1} (Y - F\beta^*). \tag{7}$$
Here $\beta^*$ is the most likely value of the trend parameters $\beta$, and $R^2$ is the residuum of the data around the corresponding linear trend. In this form, the Gaussian part of the posterior distribution can be clearly identified. Therefore, the marginal distribution of $H$, obtained by integrating over $\beta$ and $\lambda$, can be computed using Gaussian integrals. Up to a normalization constant,
$$P(H \mid D) \propto |\Sigma(H)|^{-1/2} \, |F^T \Sigma(H)^{-1} F|^{-1/2} \, R^{2-N}. \tag{8}$$
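The marginalization behind Eq. (8) can be sketched in three steps, valid for $N > 2$. The shorthand $A = F^T \Sigma(H)^{-1} F$ is introduced here for brevity and is not part of the original notation.

```latex
\begin{align*}
(Y - F\beta)^T \Sigma(H)^{-1} (Y - F\beta)
  &= R^2 + (\beta - \beta^*)^T A \,(\beta - \beta^*),
  \qquad A = F^T \Sigma(H)^{-1} F, \\
\int_{\mathbb{R}^2}
  \exp\!\left(-\frac{(\beta - \beta^*)^T A \,(\beta - \beta^*)}{2\lambda^2}\right) d\beta
  &= 2\pi \lambda^2 \, |A|^{-1/2}, \\
\int_0^\infty \lambda^{-N-1} \, \lambda^2 \,
  \exp\!\left(-\frac{R^2}{2\lambda^2}\right) d\lambda
  &= 2^{(N-4)/2} \, \Gamma\!\left(\frac{N-2}{2}\right) R^{2-N}.
\end{align*}
```

The first line holds because $F^T \Sigma(H)^{-1} (Y - F\beta^*) = 0$ by Eq. (6), so the cross term vanishes. The last integral follows from the substitution $s = R^2 / 2\lambda^2$. Collecting the factors that depend on $H$ gives the right-hand side of Eq. (8).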
The point estimator of $H$ can be taken at the location of the maximum of the posterior density: $\hat{H} = \arg\max_H P(H \mid D)$. The posterior distributions of the other parameters of the model can be estimated in the same way. We quantify the bias of the maximum posterior estimator, $E(\hat{H}) - H$, as a function of the number of data points in the case of evenly sampled data. Here, the expectation value is estimated from 500 random realizations for each number of data points $N$, with a true Hurst exponent $H = 0.3$. The bias decays quickly with increasing $N$ (see Figure 1).
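The maximum posterior estimate can be sketched numerically by evaluating the logarithm of Eq. (8) on a grid of $H$ values. The sketch below assumes NumPy; the function names, the grid, and the synthetic test signal are illustrative assumptions rather than the authors' implementation. Working with whitened quantities (via the Cholesky factor of $\Sigma(H)$) avoids forming $\Sigma(H)^{-1}$ explicitly.

```python
import numpy as np

def fbm_covariance(t, hurst):
    """Covariance of fractional Brownian motion at times t, Eq. (3)."""
    t = np.asarray(t, dtype=float)
    ti, tj = t[:, None], t[None, :]
    p = 2.0 * hurst
    return 0.5 * (np.abs(ti) ** p + np.abs(tj) ** p - np.abs(ti - tj) ** p)

def log_posterior_hurst(t, y, hurst):
    """Unnormalized log of the marginal posterior P(H | D), Eq. (8)."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    n = t.size
    F = np.column_stack([np.ones(n), t])        # columns: offset b, slope a
    chol = np.linalg.cholesky(fbm_covariance(t, hurst))
    Fw = np.linalg.solve(chol, F)               # whitened system matrix
    yw = np.linalg.solve(chol, y)               # whitened data
    A = Fw.T @ Fw                               # F^T Sigma^-1 F
    beta_star = np.linalg.solve(A, Fw.T @ yw)   # Eq. (6)
    resid = yw - Fw @ beta_star
    r2 = resid @ resid                          # Eq. (7)
    logdet_sigma = 2.0 * np.sum(np.log(np.diag(chol)))
    logdet_a = np.linalg.slogdet(A)[1]
    return -0.5 * logdet_sigma - 0.5 * logdet_a + 0.5 * (2 - n) * np.log(r2)

def estimate_hurst(t, y, grid):
    """Grid maximum of the posterior and the normalized posterior density."""
    logp = np.array([log_posterior_hurst(t, y, h) for h in grid])
    dens = np.exp(logp - logp.max())            # subtract max for stability
    dens /= dens.sum() * (grid[1] - grid[0])    # Riemann-sum normalization
    return grid[np.argmax(logp)], dens

# Synthetic example: H = 0.3, lambda = 0.8, a = 0.2, b = 10 (assumed values).
rng = np.random.default_rng(1)
t = np.arange(1.0, 201.0)
chol = np.linalg.cholesky(fbm_covariance(t, 0.3))
y = 0.8 * (chol @ rng.standard_normal(t.size)) + 0.2 * t + 10.0

grid = np.linspace(0.05, 0.95, 46)
h_hat, density = estimate_hurst(t, y, grid)
```

Because $\beta$ and $\lambda$ are integrated out, adding a linear trend to `y` leaves the log posterior unchanged, and rescaling `y` only shifts it by a constant that does not depend on $H$.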
The sensitivity of the estimator to data loss is tested by introducing randomly chosen gaps, which leads to irregular sampling steps. In this way, we lose from 0% up to 100% of the information about the model parameters. We performed 100 realizations and calculated the posterior density $P(H \mid D)$ for each of them. As shown in Figure 2, even when just 27% of the information from the original data set is left, the maximum of the averaged posterior densities falls in the interval that contains ≥ 75% of the distribution. Thus, the proposed method appears to be an efficient tool for the estimation of the Hurst exponent and is relatively robust to data loss.
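Since the covariance in Eq. (3) is defined for arbitrary time points, gaps require no special treatment: the missing samples are simply dropped from $t$ and $Y$. The following sketch illustrates this on a single realization with the parameters of the Figure 2 model. It assumes NumPy, a series of 300 points, and a uniformly random choice of the retained 27%; these choices and the function names are illustrative and not the exact protocol of the experiment above.

```python
import numpy as np

def fbm_covariance(t, hurst):
    """Covariance of fractional Brownian motion at times t, Eq. (3)."""
    ti, tj = t[:, None], t[None, :]
    p = 2.0 * hurst
    return 0.5 * (np.abs(ti) ** p + np.abs(tj) ** p - np.abs(ti - tj) ** p)

def log_posterior_hurst(t, y, hurst):
    """Unnormalized log of the marginal posterior P(H | D), Eq. (8)."""
    n = t.size
    F = np.column_stack([np.ones(n), t])
    chol = np.linalg.cholesky(fbm_covariance(t, hurst))
    Fw, yw = np.linalg.solve(chol, F), np.linalg.solve(chol, y)
    A = Fw.T @ Fw
    resid = yw - Fw @ np.linalg.solve(A, Fw.T @ yw)
    return (-np.sum(np.log(np.diag(chol)))
            - 0.5 * np.linalg.slogdet(A)[1]
            + 0.5 * (2 - n) * np.log(resid @ resid))

def map_estimate(t, y, grid):
    """Grid location of the posterior maximum."""
    return grid[np.argmax([log_posterior_hurst(t, y, h) for h in grid])]

# One realization of X_t = 0.8 B_t^{0.1} + 0.2 t + 10 on an even grid.
rng = np.random.default_rng(2)
t_full = np.arange(1.0, 301.0)
chol = np.linalg.cholesky(fbm_covariance(t_full, 0.1))
y_full = 0.8 * (chol @ rng.standard_normal(t_full.size)) + 0.2 * t_full + 10.0

# Keep a random 27% of the samples (81 of 300); the rest are treated as gaps.
keep = np.sort(rng.choice(t_full.size, size=81, replace=False))
t_gap, y_gap = t_full[keep], y_full[keep]

grid = np.linspace(0.02, 0.96, 48)
h_full = map_estimate(t_full, y_full, grid)   # estimate from all samples
h_gap = map_estimate(t_gap, y_gap, grid)      # estimate from gappy samples
```

Comparing `h_full` and `h_gap` over many realizations and retention rates would give a curve of the kind summarized in Figure 2.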
[Figure 1: plot with axes "Bias" and "Time points, N".]
Figure 1. The validation test for Bayesian estimation with 500 realizations for Hurst exponent H = 0.3.
[Figure 2: plot with axes "Hurst exponent, H" and "Time points, N"; legend: "Maxima averaged posterior densities", "True value".]
Figure 2. The maxima of the averaged posterior densities for the model $X_{t_k} = 0.8 B_{t_k}^{0.1} + 0.2 t_k + 10$ with 100 realizations.
3 COMPARISON OF THE BAYESIAN ESTIMATION AND DETRENDED FLUCTUATION ANALYSIS
We chose the detrended fluctuation analysis (DFA) as the point estimator against which to compare the point estimator proposed in this work. In order to assess the bias and the variance of each estimator, we perform Monte Carlo simulations for a fixed true Hurst exponent H = 0.3. In particular, we generate 25000 realizations of fractional Brownian motion and apply the DFA algorithm to each of them to produce an estimate of H. We then repeat the analysis with the maximum posterior estimator $\hat{H}$ of our method. Figure 3 shows a comparison of both methods. The Bayesian estimation method proposed here significantly outperforms the DFA algorithm: not only is its expectation value closer to the true value, but its variance is also considerably smaller. The intervals containing above 95% of the distribution are [0.2748, 0.3342] for the Bayesian method and [0.159, 0.3247] for the DFA technique.
In order to test the performance of the Bayesian estimator of the Hurst exponent on an H-self-similar process whose finite-dimensional distributions are non-Gaussian, we consider the Rosenblatt process (Rosenblatt (1961), Taqqu (1975)). For the simulations, we used the wavelet-based synthesis of the Rosenblatt process proposed and provided in Abry and Pipiras (2005). As before, we generated 25000 realizations of the Rosenblatt process of length N = 1025 (due to the generation procedure) and applied the DFA and the Bayesian approach as point estimators of the Hurst exponent in order to compare their results. We report the results in Figure 4. The intervals that contain above 95% of the distribution for each estimator are [0.7629, 0.8919] for the Bayesian method and [0.5047, 0.745] for the DFA technique, which shows that the variance of the Bayesian point estimator is clearly smaller than that of DFA. Moreover, DFA yields values far from the true Hurst exponent H = 0.8, whereas the Bayesian method estimates H more reliably.
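For reference, a generic first-order DFA estimator can be sketched as follows. This is a textbook-style variant and not necessarily the implementation used for the comparison above; the function name `dfa_exponent`, the window sizes, and the use of non-overlapping windows are assumptions. The function expects the increments of the process, so for an evenly sampled path `x` one would pass `np.diff(x)`. The demonstration uses white noise, i.e. the increments of ordinary Brownian motion with H = 0.5.

```python
import numpy as np

def dfa_exponent(increments, scales, order=1):
    """Scaling exponent from detrended fluctuation analysis.

    increments : 1-D array of increments of the process under study
    scales     : window sizes (integers) used for the fluctuation function
    order      : degree of the polynomial removed in each window
    """
    x = np.asarray(increments, dtype=float)
    profile = np.cumsum(x - x.mean())            # integrated, mean-free series
    flucts = []
    for s in scales:
        n_win = profile.size // s
        # Non-overlapping windows as columns of an (s, n_win) array.
        segments = profile[: n_win * s].reshape(n_win, s).T
        k = np.arange(s)
        coeffs = np.polyfit(k, segments, order)  # one fit per window
        trend = np.vander(k, order + 1) @ coeffs
        flucts.append(np.sqrt(np.mean((segments - trend) ** 2)))
    # Slope of log F(s) against log s is the DFA exponent.
    slope, _ = np.polyfit(np.log(scales), np.log(flucts), 1)
    return float(slope)

rng = np.random.default_rng(3)
noise = rng.standard_normal(4096)                # increments for H = 0.5
scales = np.array([8, 16, 32, 64, 128, 256])
alpha = dfa_exponent(noise, scales)
```

The estimate depends on the chosen range of window sizes, which is one practical difference from the Bayesian estimator, where no such tuning parameter appears.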
[Figures 3 and 4: plots with axes "Posterior distribution" and "Hurst exponent, H"; legend: "DFA", "Bayesian method".]
Figure 3. The distribution of the estimator of the Hurst exponent for H = 0.3, N = 1000 with 25000 realizations by DFA (solid line) and Bayesian method (dashed line).
Figure 4. The probability density function for a point estimator of the Hurst exponent H = 0.8 of length N = 1025 with 25000 realizations by DFA (solid line) and direct Bayesian method (dashed line).
4 CONCLUSIONS
In this work, we proposed a Bayesian approach to estimate the Hurst exponent of self-similar processes. We used a formulation in terms of linear mixed models that keeps the trend in the model instead of removing it in a first step. We have shown that the Bayesian approach to Hurst exponent estimation outperforms the well-known method of detrended fluctuation analysis, both as a point estimator and in the sense of information content. Moreover, the Bayesian method does not depend on the data sampling scheme, which is a considerable advantage, especially in a real-world context where data sets usually contain gaps. The proposed approach was also applied to non-Gaussian data from the Rosenblatt process. The results show that in this case, too, the Bayesian method performs significantly better than standard DFA.
REFERENCES
ABRY, P., PIPIRAS, V. (2005): Wavelet-based synthesis of the Rosenblatt process. Signal Process, 86, 2326–2339.
DEMIDENKO, E. (2004): Mixed Models: Theory and Applications. Wiley Series in Probability and Statistics, New Jersey.
FLANDRIN, P. (1992): Wavelet analysis and synthesis of fractional Brownian motion. IEEE Transactions on Information Theory, 38, 910.
GEWEKE, J., PORTER-HUDAK, S. (1983): The estimation and application of long memory time series models. Journal of Time Series Analysis, 4, 221–238.
HURST, H.E. (1951): Long-term storage capacity of reservoirs. Transactions of the American Society of Civil Engineers, 116, 770–799.
MAKARAVA, N., BENMEHDI, S., HOLSCHNEIDER, M. (2011): Bayesian estimation of self-similarity exponent. Phys Rev E, submitted.
MANDELBROT, B.B., VAN NESS, J.W. (1968): Fractional Brownian Motions, Fractional Noises and Applications. SIAM Review, 10, 422–437.
MCCOY, E.J., WALDEN, A.T. (1996): Wavelet analysis and synthesis of stationary long-memory processes. Journal of Computational and Graphical Statistics, 5, 25–26.
ROSENBLATT, M. (1961): Independence and dependence. In Proceedings of the Fourth Berkeley Symposium on Mathematics, Statistics and Probability, 2, 431–443.
TAQQU, M.S. (1975): Weak convergence to fractional Brownian motion and to the Rosenblatt process. Z. Wahrscheinlichkeitstheorie verw. Gebiete, 31, 287–302.
WHITTLE, P. (1953): The Analysis of Multiple Stationary Time Series. Journal of the Royal Statistical Society, Series B, 15, 125–139.