Wavelet-Based Estimation for Seasonal Long-Memory Processes Brandon Whitcher∗ Geophysical Statistics Project, National Center for Atmospheric Research P.O. Box 3000, Boulder, CO 80307-3000 United States
[email protected] March 9, 2001
Abstract We introduce the multiscale analysis of seasonal persistent processes; i.e., time series models with a singularity in their spectral density function at one or more frequencies in [0, 1/2]. The discrete wavelet packet transform (DWPT) and a non-decimated version of it known as the maximal overlap DWPT (MODWPT) are introduced as alternative methods to spectral techniques for analyzing time series that exhibit seasonal long-memory. The approximate log-linear relationship between the wavelet packet variance and frequency is utilized to produce a least-squares estimator of the fractional difference parameter. Approximate maximum likelihood estimation is performed by replacing the variance/covariance matrix with a diagonalized matrix based on the DWPT. Simulations are performed to compare the wavelet-based techniques with those based on the spectral estimates, for both least-squares and maximum likelihood procedures. We find that least-squares estimation is quite poor when assuming a log-linear relationship between the spectral density function (wavelet packet variance) and frequency (scale) around the Gegenbauer frequency for spectral (wavelet-based) procedures. Maximum likelihood is more robust when using either the true spectral density function or the log-linear approximation. An application of this methodology to atmospheric CO2 measurements is also presented. Keywords. Discrete wavelet packet transform, Gegenbauer process, maximum likelihood estimation, ordinary least squares, time series, wavelet variance.
∗
This work was performed while the author was a Research Scientist at EURANDOM, P.O. Box 513, 5600 MB Eindhoven, The Netherlands.
1
Introduction
A simple generalization of the fractional difference process (or fractional ARIMA model) was mentioned, in passing, by Hosking (1981) and allows the singularity in the spectral density function (SDF) of the process to be located at any frequency 0 < f < 1/2. Such a process has been referred to as a Gegenbauer process (Gray et al. 1989) and also a seasonal persistent (SP) process (Andˇel 1986). We prefer the latter term because it more accurately and concisely describes the content of the time series. That is, a sinusoid of particular frequency is associated with the singularity present in the spectrum thus causing a persistent oscillation in the process. The autocovariance sequence for such a process is a hyperbolically decaying oscillation. While Gray et al. (1989) fit an SP process to the Wolfer sunspot data, where short-range dependence was allowed through fitting ARMA components, recent attention with respect to seasonal long-memory has also appeared in, e.g., Ray (1993), Ooms (1995), Lobato (1997) and Arteche and Robinson (1999). In recent years the discrete wavelet transform (DWT) has become a useful alternative to the discrete Fourier transform with applications in a wide variety of disciplines – especially in time series analysis. While both are orthonormal transforms, the basis functions for the DWT are dilated and translated versions of a compactly supported function in time, and thus correspond to approximate band-pass filters. In contrast, the sine and cosine waves which form the basis functions for the Fourier transform have infinite support in time and cover the so-called ‘Fourier frequencies’ in the spectral domain. By giving up frequency resolution, wavelets have the ability to capture timevarying features across different frequency intervals. This is useful when one considers an observed time series to come from a superposition of ‘oscillations’ not necessarily constant in amplitude or phase over time. Here we are interested in developing estimation procedures for covariancestationary time series models, and therefore ignore the inherent time-localization property of the DWT. Instead we make use of the fact that wavelets, as approximations to band-pass filters, are able to adapt to a wide variety of time series models (specifically SP processes) and produce parameter estimates that are comparable, and in some cases superior, to those obtained via Fourier-based methodology. The DWT has been widely used in the analysis of time series which exhibit long-range dependence; usually characterized by a slowly decaying autocovariance sequence or SDF with a singularity at the frequency f = 0. The DWT is efficiently implemented through a series of filtering and downsampling operations via the pyramid algorithm (Mallat 1989). Let {hl } and {gl } denote the unit 1
scale wavelet (high-pass) and scaling (low-pass) filters, respectively. Let H(f ) and G(f ) denote their transfer or gain functions (Fourier transforms), respectively. Arbitrary scale wavelet filters {hj,l } may be obtained through the inverse discrete Fourier transform of Hj (f ) = H(2j−1 f )
j−2 Y
G(2l f ),
for |f | ≤ 1/2.
(1.1)
l=0
The key feature of these wavelet filters is that Hj (0) = 0 for all j, and hence, the effect of the singularity in the spectrum of a fractional difference (FD) process is essentially eliminated for each scale of wavelet coefficients. The downsampling operation, represented by the multiplicative factor of frequency in (1.1), corresponds to stretching and folding the SDF of the filtered coefficients thereby ‘flattening’ the spectra of the wavelet coefficients even further (Whitcher et al. 1998; Percival and Walden 2000, Sec. 9.1). The result of these operations is that the wavelet coefficients of a long memory process are approximately uncorrelated. Although the DWT works very well for long-memory processes, it fails to approximately decorrelate processes with more general SDFs. An example of such a process is the MA(1) time series model Xt = ²t − θ²t−1 where its SDF varies rapidly in the higher frequencies as θ → −1. The orthonormal basis associated with the DWT corresponds to a partitioning of the frequency axis that is inappropriate for this MA(1) process. Figure 1 shows the spectrum for an FD process with parameter d = 0.4 along with the natural frequency partition produced by the DWT. This partition is so effective in producing approximately uncorrelated wavelet coefficients because it takes finer and finer partitions of the spectrum as f → 0. The spectrum of an MA(1) process with θ = −0.99, also shown in Figure 1, is rapidly changing as f → 1/2 – the opposite direction to the FD process. The DWT coefficients will exhibit correlation structure because of this mismatch. A generalization of the DWT – the discrete wavelet packet transform (DWPT) – involves a redundant partitioning of the frequency axis depicted as rectangles in the frequency partition plots in Figure 1 (an explanation of the partitioning of the DWPT is provided in Section 3.1). An orthonormal transform may be thought of as a subset of basis functions (disjoint frequency partitions) of the DWPT that cover [0, 1/2]. The DWT is one example of such a subset and the ‘reversed’ DWT in Figure 1 is another example more appropriate for decorrelating the spectrum from the MA(1) process shown here. The DWPT allows one to develop an orthonormal transform that adapts to the underlying SDF of a given process, thus producing approximately uncorrelated wavelet coefficients. We will use this property to develop estimation techniques for a particular class of time series models. The model of interest here, the SP process, is introduced in Section 2. Its spectrum and 2
FDP(0.4) Spectrum
DWT Freqency Partition
15
0 1 2 3
5
dB
Level
10
4
5
6
0
0.1
0.2
0.3
0.4
0.5
0.0
0.1
0.2
0.3
0.4
0.5
Frequency
Frequency
MA(1, θ = -0.99) Spectrum
‘Reverse’ DWT Freqency Partition
0
0.0
0
-10
1 2
-20
Level
3
-30
dB
4
5
-40
0.0
0.1
0.2
0.3
0.4
0.5
6
Frequency
0.0
0.1
0.2
0.3
0.4
0.5
Frequency
Figure 1: Spectral density functions and ‘appropriate’ frequency partitions using the DWT and DWPT. For an FD process with d = 0.4 the frequency partitioning induced by the DWT splits up the frequency axis into finer pieces as f → 0 – where the SDF is rapidly changing. For a moving-average (MA) process with θ = −0.99 the frequency partition of the DWT is not able to adapt to its SDF. An alternative frequency partition is a reversal of the DWT frequency partition available through the DWPT.
asymptotic expression of the autocorrelation function are provided. Section 3.1 gives a brief introduction to the DWPT and a non-decimated version of it – the maximal overlap DWPT. Parameter estimation for SP processes, through frequency and wavelet domain techniques, are provided in Section 4. For least-squares regression, we utilize an approximate log-linear relationship between the periodogram, or wavelet packet variance, and a translated frequency axis to estimate the FD parameter. For maximum likelihood estimation, we take advantage of the transforms, both Fourier and wavelet packet, to approximately diagonalize the variance/covariance matrix of an SP process. Results from simulations using both least-squares regression and maximum likelihood estimation
3
are given in Section 5. We analyze monthly atmospheric CO2 measurements in Section 6, fitting a 2-factor SP process via maximum likelihood and provide confidence intervals. Conclusions are provided in Section 7.
2
Seasonal Persistent Processes
Let {Yt } be a stochastic process such that ¡ ¢δ 1 − 2φB + B 2 Yt = ²t
(2.1)
is a stationary process, then {Yt } is a seasonal persistent (SP) process. Two parameters characterize an SP process: the long-memory or fractional difference parameter δ, which measures the rate of increase of the singularity in the spectrum; and the location of the spectral singularity φ. Gray et al. (1989) showed that {Yt } is stationary and invertible for |φ| = 1 and −1/4 < δ < 1/4 or |φ| < 1 and −1/2 < δ < 1/2. Clearly, the definition of an SP process also includes an FD process. When φ = 1 we have that {Yt } is an FD process (Hosking 1981) with parameter d = 2δ. If {²t } is a Gaussian white noise process, then {Yt } is also called a Gegenbauer process since (2.1) may be written as an infinite moving-average process via Yt =
∞ X
(δ)
Ck,φ ²t−k ,
k=0 (δ)
where Ck,φ is a Gegenbauer polynomial (Rainville 1960, Ch. 17). The SDF of {Yt } is given by SY (f ) = σ²2 {2| cos(2πf ) − φ|}−2δ ,
for |f | < 1/2,
(2.2)
so that SY (f ) becomes unbounded at frequency fG ≡ (cos−1 φ)/(2π), sometimes called the Gegenbauer frequency. Figure 2 shows the spectra of four example SP processes with varying FD parameters and Gegenbauer frequencies. These are identical to the example processes found in Andˇel (1986). The first two SP processes correspond to an annual oscillation at two different levels of persistence observed through monthly samples. The third SP process exhibits a persistent highfrequency oscillation and the fourth SP process has a very low-frequency oscillation. The autocovariance sequence {sτ } of an SP process may be expressed via the relationship between it and (2.2)
Z sτ =
1/2
−1/2
SY (f ) cos(2πf τ ) df.
4
(2.3)
20 15 0.2
0.3
0.4
0.5
0.0
0.1
0.2
0.3
Frequency
Frequency
Figure 10
Figure 11
0.4
0.5
0.4
0.5
20 15 10 5 0 -5
-5
0
5
10
dB
15
20
25
0.1
25
0.0
dB
10
dB
-5
0
5
10 -5
0
5
dB
15
20
25
Figure 9
25
Figure 8
0.0
0.1
0.2
0.3
0.4
0.5
0.0
0.1
0.2
Frequency
0.3
Frequency
Figure 2: Spectral density functions (in decibels) for SP processes with (8) δ = 0.4, fG = 1/12, (9) δ = 0.2, fG = 1/12, (10) δ = 0.3, fG = 0.352 and (11) δ = 0.3, fG = 0.016. These are Figures 8-11 in Andˇel (1986).
An explicit solution for {sτ } is known only for special cases (Andˇel 1986). Gray et al. (1994) showed that the autocorrelation sequence of an SP process is given by rτ ∼ τ 2δ−1 cos(2πfG τ ) as τ → ∞.
(2.4)
An obvious extension of (2.1) would be to allow multiple singularities to appear in the SDF of the process. Consider the zero mean k-factor SP process given by k Y ¡ ¢δ 1 − 2φi B + B 2 i Yt = ²t ,
(2.5)
i=1
exhibiting k singularities located at the frequencies fi = (cos−1 φi )/(2π), i = 1, . . . , k, in its spec-
5
trum (k)
SY (f ) ≡ σ²2
k Y {2| cos(2πf ) − φi |}−2δi ,
for |f | < 1/2.
i=1
This process was introduced by Woodward et al. (1998) under the name ‘k-factor Gegenbauer process.’ A 2-factor SP process will be used to model monthly atmospheric CO2 measurements in Section 6.
3
Discrete Wavelet Packet Transforms
The orthonormal discrete wavelet transform (DWT) has a very specific band-pass structure that partitions the spectrum of a long-memory process finer and finer as f → 0, where the spectrum is unbounded. This is done through a succession of low-pass and high-pass filtering operations; see, e.g., Percival and Walden (2000, Ch. 4) for an in depth introduction to the DWT. In order to exploit the approximate decorrelation property for SP processes we need to generalize the partitioning scheme of the DWT. This is easily done by performing the discrete wavelet packet transform (DWPT) on the process (Wickerhauser 1994, Ch. 7). Instead of one particular filtering sequence, the DWPT executes all possible filtering combinations to obtain a wavelet packet tree, denoted by T = {(j, n) : j = 0, . . . , J; n = 0, . . . , 2j − 1}. An orthonormal basis B ⊂ T is obtained when a collection of DWPT coefficients is chosen, whose ideal band-pass frequencies are disjoint and cover [0, 1/2].
3.1
The Discrete Wavelet Packet Transform
We start off with a vector of observations X, and let h0 , . . . , hL−1 be the unit scale wavelet (highpass) filter coefficients from a Daubechies compactly supported wavelet family (Daubechies 1992) of even length L. In the future, we will denote the Daubechies family of extremal phase compactly supported wavelets with D(L) and the Daubechies family of least asymmetric compactly supported wavelets with LA(L). An interesting family of wavelets that better approximate band-pass filters, for comparable lengths, is the minimum-bandwidth discrete-time wavelets of Morris and Peravali (1999). We will denote them by MB(L). The unit scale scaling (low-pass) coefficients g0 , . . . , gL−1 may be computed via the quadrature mirror relationship gl = (−1)l+1 hL−l−1 ,
6
l = 0, . . . , L − 1.
j=0
W0,0 ≡ X ↓ G(Nk ) ↓2
j=1
↓
H(Nk ) ↓2
W1,0
W1,1
↓ G(2k N) ↓2
j=2
↓ H(2k N) ↓2
W2,0
j=3
↓ G(2k N) ↓2
W2,1
↓ H(2k N) ↓2
W2,2
W2,3
↓ G(4k N) ↓2
↓ H(4k N) ↓2
↓ H(4k N) ↓2
↓ G(4k N) ↓2
↓ G(4k N) ↓2
↓ H(4k N) ↓2
↓ H(4k N) ↓2
↓ G(4k N) ↓2
W3,0
W3,1
W3,2
W3,3
W3,4
W3,5
W3,6
W3,7
0
1 16
1 8
3 16
1 4
5 16
3 8
7 16
Frequency Figure 3: Graphical representation of the DWPT applied to a dyadic length vector X. The vector of wavelet coefficients associated with scale j and interval n is denoted by Wj,n , where W0,0 ↓2
is equivalent to the original data vector X. The notation X → H(f ) → W means that the length N vector X has been convolved with the filter {hl }, whose Fourier transform is H(f ), and downsampled by two in order to produce a new vector W of length N/2. Now define
½ un,l ≡
gl , if n mod 4 = 0 or 3; hl , if n mod 4 = 1 or 2,
(3.1)
to be the appropriate filter at a given node of the wavelet packet tree. This ordering is necessary to force the frequency intervals to be monotonically increasing; it is also called a sequency ordering by Wickerhauser (1994). Let Wj,n,t denote the tth element of the length Nj ≡ N/2j wavelet coefficient vector Wj,n , (j, n) ∈ T . Given the DWPT coefficient vector Wj−1,b n2 c , of length Nj−1 , we can directly compute Wj,n,t via Wj,n,t ≡
L−1 X
un,l Wj−1,b n2 c,2t+1−l
mod Nj−1 ,
t = 0, 1, . . . , Nj − 1.
l=0
Note, the recursion is started off with the data such that W0,0 ≡ X. This is one possible formulation of the DWPT, we may also directly filter the observations by generating unique filter coefficients at each level or apply a series of matrix operations; see Percival and Walden (2000, Ch. 6) for more details on this formulation and others. As with the DWT, the DWPT is most efficiently computed using a pyramid algorithm (Mallat 1989) of filtering and downsampling steps. Figure 3 provides a 7
graphical representation of the filtering operations required to construct the vectors of a wavelet ↓2
packet tree down to level 3. The notation X → H(f ) → W means that the length N vector X has been convolved with the filter {hl }, whose Fourier transform is H(f ), and downsampled by two in order to produce a new vector W of length N/2. The algorithm has O(N log N ) operations. An analysis (decomposition) of variance of the original time series may be performed via the DWPT by selecting an orthonormal basis B; i.e., kXk2 =
X
kWj,n k2 .
(j,n)∈B
Let us define the wavelet packet variance ν 2 (λj,n ) associated with frequencies in the interval λj,n ≡ [−(n + 1)/2j+1 , −n/2j+1 ) ∪ (n/2j+1 , (n + 1)/2j+1 ] to be the variance of the DWPT coefficients Wj,n . The wavelet packet variance is related to the variance of the original process through the squared gain function of its corresponding wavelet packet filter {uj,n } via
Z 2
ν (λj,n ) ≡
1/2
−1/2
|Uj,n (f )|2 SX (f ) df,
(3.2)
where Uj,n (f ) is the transfer (or gain) function associated with Wj,n . A unique sequence of filtering and downsampling steps are required to produce Uj,n (f ), following the flow-chart presented in Figure 3, and are not presented here. The interested reader is referred to Percival and Walden (2000, Ch. 6). From (3.2) the wavelet packet variance may be interpreted as a piecewise constant approximation to the underlying SDF over the frequency interval λj,n . The specific frequency bands used by ν 2 (λj,n ) are determined by the choice of orthonormal basis B. An unbiased DWPT-based estimator is given by Nj /2−1 X 1 2 νˆ (λj,n ) ≡ j Wj,n,t , 2 (Nj /2 − L0j ) 0 2
t=Lj
where L0j ≡ d(L − 2)(1 − 1/2j )e. When computing the DWPT in practice, the time series is assumed to be periodic. The estimator νˆ2 (λj,n ) is unbiased because all wavelet coefficients that are affected by the boundary assumption have been removed.
3.2
The Maximal Overlap Discrete Wavelet Packet Transform
Definition of the maximal overlap DWPT (MODWPT) is straightforward, given that of the DWPT. Simply define a new filter u ˜n,l ≡ un,l /21/2 , replace it with the filter for computing DWPT coefficients 8
in (3.1) and do not downsample the filtered output. Hence, the vector of MODWPT coefficients f j,n is computed recursively via W fj,n,t ≡ W
L−1 X
fj−1,b n c,t−2j−1 l u ˜n,l W 2
l=0
mod N ,
t = 0, 1, . . . , N − 1,
f 0,0 ≡ where each vector of MODWPT coefficients has length N (to begin the recursion define W X). This formulation leads to efficient computation using a pyramid-type algorithm (Percival and Walden 2000, Ch. 6). As with the DWPT, the MODWPT is an energy preserving transform and we may define an unbiased MODWPT-based estimator of the wavelet packet variance ν 2 (λj,n ) to be
N −1 X 1 2 fj,n,t ν˜ (λj,n ) ≡ W , N − Lj + 1 2
t=Lj −1
where Lj = (2j − 1)(L − 1) + 1. As with the DWPT-based estimator, all coefficients affected by the boundary have been removed for the calculation. Given a particular orthonormal basis B, we may also reconstruct X by projecting the MODWPT coefficients back onto the filter coefficients via Xt =
j −1 X LX
fj,n,t+l u ˜j,n,l W
mod N ,
t = 0, . . . , N − 1.
(3.3)
(j,n)∈B l=0
ej,n be the wavelet packet detail associated with the frequency interval λj,n , then we have Let D Lj −1
ej,n,t ≡ D
X
fj,n,t+l u ˜j,n,l W
mod N ,
t = 0, . . . , N − 1,
l=0
and an additive decomposition in (3.3) may be rewritten as Xt =
P (j,n)∈B
ej,n,t . These details are D
associated with zero-phase filters and therefore line up perfectly with features in the original time series X at each time t (Percival and Walden 2000, Ch. 6).
4
Parameter Estimation for Seasonal Persistence
Common techniques for estimating the long-memory parameter in a fractional ARIMA model have recently been extended to SP processes, including log-periodogram and Gaussian semi-parametric analysis (Arteche and Robinson 2000). As an alternative to the periodogram, the wavelet variance has proven quite effective in estimating the long-memory parameter in fractional ARIMA models (McCoy and Walden 1996; Jensen 1999b; Jensen 1999a). Here we provide an estimation procedure 9
for the FD parameter δ in a SP processes based on ordinary least squares (OLS) regression of the wavelet packet variance and a maximum likelihood procedure for the joint estimation of the FD parameter δ and Gegenbauer frequency fG .
4.1
Ordinary Least Squares Regression
Log-periodogram regression is a very popular method for estimating the FD parameter for longmemory processes; see, e.g., Geweke and Porter-Hudak (1983) and Robinson (1995). Let us derive an OLS estimator for the FD parameter δ of an SP process. Applying the logarithmic transform to both sides of (2.2) yields log SY (f ) = −2δ log 2| cos(2πf ) − cos(2πfG )|,
(4.1)
(p)
and suggests an OLS regression of log SbY (f ) on log 2| cos(2πf ) − cos(2πfG )| in order to estimate the FD parameter δ, where (p) SbY (f )
1 ≡ N
¯N ¯2 ¯X ¯ ¯ ¯ Yt exp(−i2πf t)¯ ¯ ¯ ¯
(4.2)
t=1
is the periodogram. The FD parameter may then be obtained from the slope of this regression β¯ ¯ via δ¯ = −β/2. One alternative estimator to the periodogram is the multitaper spectral estimator (mt) SbY (f ) which applies K orthogonal data tapers to Y0 , . . . , YN −1 , computes the periodogram of each
new series (called the direct spectral estimator), and then averages these direct spectral estimators over all Fourier frequencies; for more details see Thomson (1982) and Percival and Walden (1993, Ch. 7). A simple modification to (4.1) was suggested by Arteche and Robinson (1999) and consists of replacing the explicit parametric form of the spectrum for an SP process with a simple log-linear relationship, yielding log SY (f ) ≈ −2δ log 2|f − fG |.
(4.3)
This relationship may be seen in Figure 4, where the logarithm of the SDF for various SP processes is plotted against the logarithm of frequencies centered at the Gegenbauer frequency. While the loglinear relationship for long-memory processes is quite good for frequencies 0 < f < 1/4 (McCoy and Walden 1996), the relationship for SP processes depends heavily on the location of the Gegenbauer frequency. Looking at the first two curves, corresponding to processes with the same Gegenbauer frequency fG = 1/12 but different FD parameters, we see that the slope of the curves are linear at the lower frequencies and then diverge from this line dramatically as f → 1/2. When looking at 10
100.0 50.0 10.0
Spectrum
5.0
SPP(1/12, 0.4) 1.0
SPP(1/12, 0.2) SPP(0.35, 0.3)
0.5
SPP(0.016, 0.3)
0.00050.0010
0.00500.0100 0.05000.1000 2 * Frequency
0.50001.0000
Figure 4: Approximate log-linear relationship between the spectral density function of SP processes, given in Figure 2, and frequency. There are two curves for each process, the longer is log SY (f ) versus log 2|f − fG | for frequencies fG < f < 1/2 while the shorter curve is log SY (f ) versus log 2|f − fG | for frequencies 0 < f < fG .
the last two curves, corresponding to processes with the same FD parameter δ = 0.3, but different Gegenbauer frequencies, the two curves again start out with similar slope at the lower frequencies, but the SP process with fG = 0.352 bends downward and parallels the first curve with FD parameter δ = 0.4. Estimation based on the full range of frequencies for the SP(δ = 0.3, fG = 0.352) process would therefore produce a biased estimate of the FD parameter. From this brief list of possible SP processes, the range of frequencies included in the regression will seriously effect the estimation of δ in practice. We return to this issue in our simulation study (Section 5.1). McCoy and Walden (1996) observed that the wavelet variance of a long-memory process obeys a log-linear relationship with the level of the wavelet transform and went on to introduce a maximum likelihood estimator for the FD parameter. Jensen (1999b) studied the ability of an OLS regression procedure, based on the wavelet variance, to estimate the FD parameter of a fractional ARIMA process and compared it to the log-periodogram method of Geweke and Porter-Hudak (1983) and the McCoy-Walden wavelet estimator. We extend Jensen’s previous results to formulate a MODWPT variance estimator for the FD
11
parameter of SP processes. If we use the argument that the wavelet packet variance is a piecewise constant approximation to the true SDF, then an equivalent expression to (4.1) for ν 2 (λj,n ) is given by log ν 2 (λj,n ) = −2δ log 2| cos(2πµj,n ) − cos(2πfG )|,
(4.4)
where µj,n is the midpoint of the frequency interval λj,n . In other words, the (j, n)th wavelet packet variance covers the entire interval of frequencies λj,n but it suffices to represent this interval by its midpoint here. As in (4.1) the slope from the OLS regression of log ν˜2 (λj,n ) on log 2| cos(2πµj,n ) − cos(2πfG )|, appropriately normalized, provides an estimate of the FD parameter of an SP process. Simplifying (4.4) to just be a function of frequency, and not the full parametric form of the SDF, yields log ν 2 (λj,n ) ≈ −2δ log 2|µj,n − fG |.
(4.5)
For both formulations we prefer to utilize the MODWPT coefficients in order to estimate ν 2 (λj,n ) since Percival (1995) showed that a non-decimated wavelet estimator of variance is more efficient than its decimated counterpart.
4.2
Approximate Maximum Likelihood Estimation
With current computing power, maximum likelihood (ML) estimation is a viable alternative to the least-squares methods previously proposed. An obvious starting point for approximate ML estimation is a discrete version of the Whittle likelihood (Beran 1994, Ch. 6). A popular estimator for the FD parameter of long-memory processes, here we develop a simple likelihood based on the periodogram estimate of an SP process. The Whittle likelihood for an SP process Y with parameters θ ≡ (δ, fG , σ²2 ) is given by " # N (p) SbY (fk ) 1 X Q(θ | Y) ≡ log SY (fk ; θ) + N SY (fk ; θ) k=1 ¸ N · 1 X 4δ | cos(2πfk ) − φ|2δ b(p) = log σ²2 − 2δ log[2| cos(2πfk ) − φ|] + S (f ) k Y N σ²2 k=1
= log σ²2 −
N N 4δ X 2δ X (p) log[2| cos(2πfk ) − φ|] + 2 | cos(2πfk ) − φ|2δ SbY (fk ), (4.6) N σ² k=1
k=1
(p) where φ = cos(2πfG ) and SbY (f ) is the periodogram from (4.2). Substituting σ²2 with its ML
b b estimate (a function of ϑ = (δ, fG )) yields the reduced likelihood Q(ϑ), and minimizing Q(ϑ) with ˆ If we replace the parametric form of the SDF respect to ϑ will yield approximate ML estimates ϑ. 12
in (4.6) by a log-linear relationship between the true spectrum and frequency, as in (4.3), then we obtain Q∗ (θ | Y) ≡ log σ²2 −
N N 2δ X 4δ X (p) log[2|fk − fG |] + 2 |fk − fG |2δ SbY (fk ), N σ² k=1
(4.7)
k=1
which is related to the symmetric version of the so-called Gaussian semi-parametric (GS-P) likelihood of Arteche and Robinson (2000) (although (4.7) utilizes all Fourier frequencies versus only a b∗ (ϑ), and minimizsubset m). Substituting σ²2 with its ML estimate yields the reduced likelihood Q b∗ (ϑ) with respect to ϑ will yield approximate ML estimates where the true SDF is assumed to ing Q be linear around the Gegenbauer frequency on a log scale. Arteche and Robinson (2000) propose a ‘trimmed’ version of (4.7) which we investigate in Section 5.2. McCoy and Walden (1996) and Jensen (1999a) have both provided an approximate waveletbased ML estimator to the FD parameter for long-memory time series models. The DWT provides a simple and effective method for approximately diagonalizing the variance/covariance of the original time series. We extend their results to the case of SP processes, where the parameters δ, fG and σ²2 define the SDF in (2.2). As with long-memory processes, we utilize the DWPT under a particular orthonormal basis B to approximately diagonalize the variance/covariance matrix of an SP process. Let Y be a realization of a zero mean stationary SP process with unknown parameters θ = (δ, fG , σ²2 ). The likelihood function for Y, under the assumption of multivariate Gaussianity, is given by L(θ | Y) ≡
¡ ¢ 1 T −1 exp −Y Σ Y/2 , Y (2π)N/2 |ΣY |1/2
where ΣY is the variance/covariance matrix of Y and |ΣY | is the determinant of ΣY . As previously alluded to, we avoid computing the exact ML estimators of the parameters of interest and instead b Y ≡ W T ΩN WB , where WB is an use the DWPT to approximately diagonalize ΣY ; i.e., ΣY ≈ Σ B N × N orthonormal matrix defining the DWPT through the basis B and ΩN is a diagonal matrix containing the band-pass variances of Y. It is convenient to work with a rescaled band-pass variance 2 ≡ σ2ω 2 ωj,n ² ¯ (λj,n ) such that
Z 2 ω ¯ j,n
≡
n+1 2j+1 n 2j+1
2j+1 df, {4[cos(2πf ) − cos(2πfG )]2 }δ
for all (j, n) ∈ B. The approximate log-likelihood function is now h i b b L(θ | Y) ≡ −2 log L(θ | Y) − N log(2π) = N log(σ²2 ) +
X
2 Nj,n log(¯ ωj,n )+
(j,n)∈B
T W 1 X Wj,n j,n , 2 2 σ² ω ¯ j,n (j,n)∈B
13
(4.8)
using the fact that the DWPT of Y is given by WB = WB Y. Differentiating (4.8) with respect to σ 2 and setting the result equal to zero, the ML estimator of σ 2 is equal to σ ˆ²2 (δ, fG )
T W 1 X Wj,n j,n ≡ . 2 N ω ¯ j,n (j,n)∈B
Replacing σ²2 with its ML estimate, we reduce the complexity of (4.8) to obtain b | Y) = N log(ˆ L(ϑ σ²2 (δ, fG )) +
X
2 Nj,n log(¯ ωj,n ),
(4.9)
(j,n)∈B
where ϑ = (δ, fG ). The reduced log-likelihood in (4.9) is now a function of only two parameters b δ and fG , whose space of possible solutions lives on Ξ = (−1/2, 1/2) × (0, 1/2). Minimizing L(ϑ) with respect to ϑ over Ξ will yield approximate ML estimates based on the DWPT.
4.3
Basis Selection Procedure
Given that we are working with time series that exhibit a wide range of characteristics, through rather loose assumptions on their SDFs, selecting the orthonormal basis for the wavelet transform is important. We want to adapt as best as possible to the underlying SDF, but only have the observations to help us with this task. For long-memory processes, the DWT works extremely well at approximately decorrelating the process (McCoy and Walden 1996). Whitcher et al. (1998) related this ability to the fact that the SDFs of the wavelet coefficient vectors are essentially flat; e.g., only varying by 3 dB for the unit scale DWT coefficients when the long-memory parameter is associated with stationary and invertible fractional ARIMA models (−1/2 < d < 1/2); see also Percival and Walden (2000, Sec. 9.1). The criterion of ‘approximately flat’ SDFs was used to select an appropriate orthonormal basis with which to simulate SP processes in Whitcher (2001). R A constant SDF is associated with a white noise process, where S(f ) df = σ 2 . Several methods have been proposed in order to test for white noise in time series, such as the cumulative periodogram and portmanteau test; see, e.g., Brockwell and Davis (1991). A cumulative sum of squares (CSS) test statistic was proposed by Brown et al. (1975) for testing the constancy of regression relationships over time and successfully applied to output from the DWT by Whitcher et al. (1998). Figure 5 shows results from a small simulation study comparing the three proposed methods (cumulative periodogram, portmanteau and CSS) for selecting an orthonormal basis. Realizations from an SP(fG = 1/12, δ = 0.4) process of length N = 1024 were generated using an exact timedomain method (Hosking 1984). A partial DWPT (J = 6) was applied using the MB(8) wavelet 14
Portmanteau Test
Cumulative Periodogram Test
0
0
1
1
2 3
2
Level
Level
3
4
4
5
5
6
6 0.0
0.1
0.2
0.3
0.4
0.5
Frequency
0.0
0.1
0.2
0.3
0.4
0.5
Frequency
Cumulative Sum of Squares Test 0
1 2
Level
3
4
5 6
0.0
0.1
0.2
0.3
0.4
0.5
Frequency
Figure 5: Wavelet packet table (J = 6) for three white noise tests applied to an SP process (N = 1024) with δ = 0.4 and fG = 1/12, summed over 500 simulations. The frequency a particular node of the wavelet packet tree T was chosen in the orthonormal basis is given by the shade of the rectangle – darker shades correspond to higher frequencies. All hypothesis tests were performed at the α = 0.05 level of significance.
filter. As seen from Figure 5, all methods capture the general shape of the SDF (the true SDF is in the upper left-hand plot of Figure 2). This is not surprising since the goal is to partition the spectrum finer as it changes more rapidly and reflects the fact that the basis B is providing a piecewise constant approximation to the true spectrum. However, the portmanteau and cumulative periodogram tests appear to select a basis which closely matches the true SDF more often than the CSS test (this is seen by the higher number of ‘dark’ rectangles in the portmanteau and cumulative periodogram wavelet packet tables). In the following simulations (Section 5), the preferred method for selecting an orthonormal basis B is the portmanteau test (α = 0.05). One disadvantage of the cumulative periodogram test is that, by applying the DFT to each 15
vector of wavelet packet coefficients, the number of values used in the test is halved. Given the inherent downsampling of the DWPT, each level j of the DWPT has only Nj = N/2j coefficients, and hence, the cumulative periodogram test will only contain Nj /2 periodogram ordinates. This is quite restrictive on the depth of the DWPT for a given sample size. Using reflection at the boundaries when performing the DWPT, as opposed to assume a periodic time series, and therefore effectively doubling the number of wavelet coefficients at each node of the wavelet packet table may be an effective way of overcoming this difficulty.
5
Simulations
Here we provide simulation studies on OLS regression and maximum likelihood based procedures in order to fit SP process time series models to observed data. All SP processes were generated using an exact time-domain technique (Hosking 1984). The estimation procedures here were implemented in the Ox programming language (Doornik 1999). The autocovariance sequences for SP processes were computed using numeric integration via (2.3) for all lags instead of using the available asymptotic approximation (2.4).
5.1
OLS Estimates using all Fourier Frequencies
We first explore estimators based on an OLS regression over all available frequencies (wavelet scales), similar to the original method proposed by Geweke and Porter-Hudak (1983) for longmemory processes. Realizations were calculated from the four models given in Figure 2, where the estimated Gegenbauer frequency f¯G is given by the Fourier frequency associated with the maximum periodogram ordinate. Table 1 summarizes the results of OLS estimates based on (4.1) using the periodogram and multitaper (K = 5 sine data tapers, Riedel and Sidorenko 1995) spectral estimators, and based on (4.4) using the unbiased MODWPT-based wavelet variance estimator. These results come with the assumption of a specific parametric form for the underlying SDF (2.2). With respect to the wavelet-based estimator, an MB(8) wavelet filter was used and B was selected using a portmanteau test on the wavelet coefficient vectors with α = 0.05. If we define a piecewise constant SDF on the frequency intervals λj,n to be proportional to ν˜Y2 (λj,n ), then we obtain a type of histogram spectral estimator. When an explicit parametric form for the underlying SDF is used, through (4.1) and (4.4), along with a relatively short time series (N = 128) the log-periodogram regression outperforms the MODWPT variance regression for the SP(δ = 0.4, fG = 1/12) and SP(δ = 0.3, fG = 0.016) 16
processes, by exhibiting less bias and a smaller standard deviation (SD). In the SP(δ = 0.2, fG = 1/12) process the MODWPT variance regression outperforms the log-periodogram regression, while for the SP(δ = 0.3, fG = 0.352) process the MODWPT variance regression shows smaller bias but a slightly larger SD. The log-multitaper regression exhibits a larger bias for all four models, but a smaller SD. The bias may be due to the fact that the multitaper spectral estimator spreads out spectral peaks while reducing the overall variation of the estimated spectrum. For all simulations, the Gegenbauer frequency is nicely estimated by using the Fourier frequency associated with the maximum in the periodogram. A duplicate simulation study was performed for slightly longer time series of length N = 512. All three methods show improvement in estimating the FD parameter δ. The estimated Gegenbauer frequencies are identical since the same random seed was utilized in the pseudo-random number generator. When comparing the log-periodogram and MODWPT variance regression estimators, no method outperforms the other. The empirical bias is roughly the same with the log-periodogram regression showing a slightly smaller SD for all four models. The log-multitaper regression still exhibits elevated bias for all models, but reduced SD. Table 2 uses the approximate log-linear relationship between the true SDF and translated frequency axis given in (4.3) and (4.5) for the Fourier-based spectral estimators and wavelet packet variance estimator, respectively. The results are clearly much worse when we assume a log-linear relationship between estimates of the SDF and the translated frequency axis versus the true parametric form for the SDF. When the approximate log-linear relationship between the SDF and frequency axis is used, through (4.3) and (4.5), the performance of all three procedures depends heavily on the particular time series model. For SP(δ = 0.4, fG = 1/12) and SP(δ = 0.3, fG = 0.016) processes, all three methods produce biases in the range of 0.10–0.20 with increased standard deviations. All three methods show increased bias for the SP(δ = 0.2, fG = 1/12) process, less than those found for the previous SP process models but still large. The least affected model is the SP(δ = 0.3, fG = 0.352) process, where there is no serious increase in bias or SD. The results here are not surprising when looking back at Figure 4, which displays how well the approximate log-linear relationship holds for the SDFs of these SP process. The problem appears to be that at higher frequencies, this relationship breaks down – a well-known result from longmemory processes. However, the log-linear relationship depends on the Gegenbauer frequency and therefore varies across all possible models. Modifications, such as trimming the number of Fourier
17
Model (δ, fG )
(0.4, 0.083)
(0.2, 0.083)
(0.3, 0.352)
(0.3, 0.016)
(0.4, 0.083)
(0.2, 0.083)
(0.3, 0.352)
(0.3, 0.016)
Mean Bias SD RMSE Mean Bias SD RMSE Mean Bias SD RMSE Mean Bias SD RMSE Mean Bias SD RMSE Mean Bias SD RMSE Mean Bias SD RMSE Mean Bias SD RMSE
MODWPT Periodogram ¯ ¯ δ fG δ¯ f¯G N = 128 0.4344 0.0830 0.4262 0.0837 0.0344 −0.0003 0.0262 0.0004 0.0735 0.0090 0.0753 0.0086 0.0811 0.0090 0.0797 0.0086 0.2037 0.0809 0.1844 0.0866 0.0037 −0.0024 −0.0156 0.0032 0.0726 0.0445 0.0835 0.0432 0.0727 0.0445 0.0850 0.0434 0.3040 0.3555 0.2827 0.3552 0.0040 0.0035 −0.0173 0.0032 0.1067 0.0230 0.0922 0.0253 0.1068 0.0232 0.0938 0.0255 0.3159 0.0177 0.3011 0.0167 0.0159 0.0017 0.0011 0.0007 0.0627 0.0088 0.0569 0.0075 0.0647 0.0090 0.0569 0.0075 N = 512 0.4066 0.0835 0.4129 0.0837 0.0066 0.0002 0.0129 0.0004 0.0412 0.0021 0.0342 0.0019 0.0417 0.0021 0.0365 0.0019 0.1933 0.0812 0.1946 0.0805 −0.0067 −0.0021 −0.0054 −0.0028 0.0495 0.0185 0.0346 0.0165 0.0500 0.0186 0.0350 0.0168 0.2989 0.3523 0.3005 0.3522 −0.0011 0.0003 0.0005 0.0002 0.0489 0.0057 0.0399 0.0052 0.0489 0.0057 0.0399 0.0052 0.3046 0.0153 0.3010 0.0150 0.0046 −0.0007 0.0010 −0.0010 0.0345 0.0035 0.0257 0.0039 0.0348 0.0035 0.0257 0.0040
Multitaper ¯ δ f¯G 0.4674 0.0674 0.0674 0.0953 0.2103 0.0103 0.0608 0.0617 0.3366 0.0366 0.0838 0.0915 0.3300 0.0300 0.0437 0.0530
0.0837 0.0004 0.0087 0.0087 0.0853 0.0020 0.0417 0.0417 0.3558 0.0038 0.0235 0.0238 0.0174 0.0014 0.0086 0.0087
0.4249 0.0249 0.0287 0.0380 0.2033 0.0033 0.0290 0.0292 0.3137 0.0137 0.0350 0.0375 0.3108 0.0108 0.0216 0.0242
0.0834 0.0001 0.0020 0.0020 0.0820 −0.0013 0.0184 0.0185 0.3528 0.0008 0.0062 0.0062 0.0149 −0.0011 0.0038 0.0039
Table 1: Simulation results for OLS estimators of the FD parameter δ¯ and periodogram-based estimators of the Gegenbauer frequency f¯G for various SP processes, using 500 simulations with N observations. The regression model utilizes the true SDF for seasonal persistent processes in (2.2), hence, the regression equation is given by (4.1) for the spectral estimators, and (4.4) for the MB(8) wavelet packet variance estimator. The orthonormal basis was adaptively chosen using a portmanteau test (α = 0.05) on the wavelet packet vectors.
18
Model (δ, fG )
(0.4, 0.083)
(0.2, 0.083)
(0.3, 0.352)
(0.3, 0.016)
(0.4, 0.083)
(0.2, 0.083)
(0.3, 0.352)
(0.3, 0.016)
Mean Bias SD RMSE Mean Bias SD RMSE Mean Bias SD RMSE Mean Bias SD RMSE Mean Bias SD RMSE Mean Bias SD RMSE Mean Bias SD RMSE Mean Bias SD RMSE
MODWPT Periodogram ¯ ¯ δ fG δ¯ f¯G N = 128 0.5271 0.0830 0.5137 0.0837 0.1271 −0.0003 0.1137 0.0004 0.0874 0.0090 0.0905 0.0086 0.1542 0.0090 0.1453 0.0086 0.2539 0.0809 0.2217 0.0866 0.0539 −0.0024 0.0217 0.0032 0.0980 0.0445 0.0967 0.0432 0.1119 0.0445 0.0991 0.0434 0.3184 0.3555 0.2946 0.3552 0.0184 0.0035 −0.0054 0.0032 0.1167 0.0230 0.0942 0.0253 0.1181 0.0232 0.0943 0.0255 0.4847 0.0177 0.4561 0.0167 0.1847 0.0017 0.1561 0.0007 0.1034 0.0088 0.0898 0.0075 0.2117 0.0090 0.1801 0.0075 N = 512 0.4630 0.0835 0.4896 0.0837 0.0630 0.0002 0.0896 0.0004 0.0506 0.0021 0.0413 0.0019 0.0808 0.0021 0.0986 0.0019 0.2294 0.0812 0.2316 0.0805 0.0294 −0.0021 0.0316 −0.0028 0.0592 0.0185 0.0388 0.0165 0.0661 0.0186 0.0500 0.0168 0.2973 0.3523 0.3073 0.3522 −0.0027 0.0003 0.0073 0.0002 0.0511 0.0057 0.0418 0.0052 0.0512 0.0057 0.0424 0.0052 0.4585 0.0153 0.4551 0.0150 0.1585 −0.0007 0.1551 −0.0010 0.0582 0.0035 0.0410 0.0039 0.1689 0.0035 0.1604 0.0040
Multitaper ¯ δ f¯G 0.5665 0.1665 0.0806 0.1849 0.2552 0.0552 0.0719 0.0906 0.3520 0.0520 0.0811 0.0963 0.4965 0.1965 0.0705 0.2087
0.0837 0.0004 0.0087 0.0087 0.0853 0.0020 0.0417 0.0417 0.3558 0.0038 0.0235 0.0238 0.0174 0.0014 0.0086 0.0087
0.5062 0.1062 0.0345 0.1117 0.2425 0.0425 0.0336 0.0542 0.3219 0.0219 0.0354 0.0416 0.4705 0.1705 0.0329 0.1737
0.0834 0.0001 0.0020 0.0020 0.0820 −0.0013 0.0184 0.0185 0.3528 0.0008 0.0062 0.0062 0.0149 −0.0011 0.0038 0.0039
Table 2: Simulation results for OLS estimators of the FD parameter δ¯ and periodogram-based estimators of the Gegenbauer frequency f¯G for various SP processes, using 500 simulations with N observations. The regression model utilizes the log-linear approximation log 2|f − fG | to the true SDF of an SP process, hence, the regression equation is given by (4.3) for the spectral estimators, and (4.5) for the MB(8) wavelet packet variance estimator. The orthonormal basis was adaptively chosen using a portmanteau test (α = 0.05) on the wavelet packet vectors.
19
frequencies, are needed in order to improve the properties of these OLS estimators. Determining the ‘correct’ number of frequencies to use in log-periodogram regression has been an important topic in recent time series literature. Robinson (1995) proposed selecting two parameters, l and m, which determine the starting frequency and subsequent number of frequencies to be used in the linear regression, respectively. No specific recommendations are given on determining the values of these parameters. Hurvich and Deo (1999) recently determined a ‘plug-in’ method for selecting the number of frequencies to be used. The optimal number of frequencies is given by m(opt) = Cn4/5 (Hurvich et al. 1998), where C depends on the unknown SDF. The ‘plug-in’ method provides an estimate of C based on a Taylor series expansion around the origin for log S(f ). Instead of concentrating on this issue, we prefer to investigate the finite sample performance of maximum likelihood-based procedures and may return to this research topic in the future.
5.2
Maximum Likelihood Estimates
To assess the performance of the approximate wavelet-based ML methodology, we simulate the four time series in Figure 2 as in Section 5.1 using numeric integration of (2.2) to compute their autocovariance sequences. Table 3 summarizes the results of this simulation study for 500 iterations. The average ML estimates δˆ and fˆG are given along with their empirical bias, standard deviation (SD) and empirical RMSE. These time series models provide an adequate representative sample of SP processes. The first two provide an annual periodicity with two different levels of persistence. The third exhibits high-frequency oscillations with a moderate level of persistence, while the fourth model exhibits a very low-frequency oscillation with Gegenbauer frequency very close to the origin. An orthonormal basis B was chosen by applying the portmanteau test to the squared wavelet coefficients for all vectors in the wavelet packet table. The first set of results in Table 3 covers wavelet-based ML estimation of ϑ when the parametric form of the SDF is used and N = 128. For the SP(δ = 0.4, fG = 1/12) process, the ML estimate δˆ for all three wavelet filters exhibits a slight negative bias while the estimated Gegenbauer frequency is very close to its true value. Reducing the level of persistence to δ = 0.2 does not diminish the ability of the method to accurately estimate the either parameter, in fact both the δˆ and fˆG show reduced (positive) bias but increased standard deviation versus δ = 0.4. The likelihood surface when δ = 0.2 is probably more diffuse than for δ = 0.4, thus producing increased RMSE. The ML estimates from the SP(δ = 0.3, fG = 0.352) process are negatively biased in the case of δˆ and negligible bias in the Gegenbauer frequency. This pattern of empirical bias and RMSE is similar for 20
Model (δ, fG )
(0.4, 0.083)
(0.2, 0.083)
(0.3, 0.352)
(0.3, 0.016)
(0.4, 0.083)
(0.2, 0.083)
(0.3, 0.352)
(0.3, 0.016)
MB(8) δˆ Mean Bias SD RMSE Mean Bias SD RMSE Mean Bias SD RMSE Mean Bias SD RMSE
0.3791 −0.0209 0.0418 0.0467 0.2070 0.0070 0.0606 0.0610 0.2827 −0.0173 0.0582 0.0607 0.2991 −0.0009 0.0469 0.0469
Mean Bias SD RMSE Mean Bias SD RMSE Mean Bias SD RMSE Mean Bias SD RMSE
0.3818 −0.0182 0.0262 0.0319 0.1969 −0.0031 0.0293 0.0294 0.2840 −0.0160 0.0309 0.0347 0.2929 −0.0071 0.0228 0.0239
LA(16) ˆ ˆ fG δ fˆG N = 128 0.0892 0.3828 0.0896 0.0059 −0.0172 0.0063 0.0252 0.0488 0.0266 0.0258 0.0517 0.0274 0.0954 0.2071 0.0969 0.0120 0.0071 0.0135 0.0497 0.0623 0.0503 0.0511 0.0627 0.0521 0.3587 0.2851 0.3585 0.0067 −0.0149 0.0065 0.0338 0.0574 0.0337 0.0344 0.0593 0.0343 0.0303 0.2953 0.0282 0.0143 −0.0047 0.0122 0.0289 0.0456 0.0299 0.0322 0.0459 0.0323 N = 512 0.0838 0.3950 0.0840 0.0005 −0.0050 0.0006 0.0049 0.0297 0.0049 0.0049 0.0301 0.0049 0.0860 0.1996 0.0854 0.0027 −0.0004 0.0020 0.0295 0.0272 0.0237 0.0296 0.0272 0.0238 0.3579 0.2843 0.3578 0.0059 −0.0157 0.0058 0.0096 0.0319 0.0097 0.0113 0.0356 0.0113 0.0180 0.2937 0.0176 0.0020 −0.0063 0.0016 0.0089 0.0230 0.0087 0.0092 0.0238 0.0089
MB(16) ˆ δ fˆG 0.3830 −0.0170 0.0468 0.0498 0.2065 0.0065 0.0602 0.0605 0.2912 −0.0088 0.0571 0.0578 0.2943 −0.0057 0.0451 0.0454
0.0883 0.0050 0.0280 0.0285 0.0978 0.0145 0.0466 0.0488 0.3558 0.0038 0.0335 0.0338 0.0254 0.0094 0.0272 0.0288
0.3910 −0.0090 0.0278 0.0292 0.2010 0.0010 0.0289 0.0290 0.2887 −0.0113 0.0309 0.0329 0.2963 −0.0037 0.0219 0.0222
0.0835 0.0002 0.0050 0.0051 0.0838 0.0005 0.0272 0.0272 0.3554 0.0034 0.0104 0.0109 0.0183 0.0023 0.0091 0.0094
Table 3: Simulation results of 500 DWPT-based approximate ML estimates δˆ and fˆG using the MB(8), LA(16) and MB(16) wavelet filters. An initial parameter estimate of δ was obtained by leastsquares estimation where fG was chosen to be the Fourier frequency with the largest contribution to the periodogram. The portmanteau test (α = 0.001) was applied to each wavelet packet vector in order to select the orthonormal basis B ⊂ T .
21
Model (δ, fG )
(0.4, 0.083)
(0.2, 0.083)
(0.3, 0.352)
(0.3, 0.016)
(0.4, 0.083)
(0.2, 0.083)
(0.3, 0.352)
(0.3, 0.016)
MB(8) δˆ Mean Bias SD RMSE Mean Bias SD RMSE Mean Bias SD RMSE Mean Bias SD RMSE
0.4076 0.0076 0.0303 0.0312 0.2375 0.0375 0.0525 0.0645 0.2956 −0.0044 0.0531 0.0533 0.3913 0.0913 0.0336 0.0973
Mean Bias SD RMSE Mean Bias SD RMSE Mean Bias SD RMSE Mean Bias SD RMSE
0.4186 0.0186 0.0261 0.0321 0.2299 0.0299 0.0263 0.0398 0.2957 −0.0043 0.0293 0.0296 0.3953 0.0953 0.0244 0.0984
LA(16) fˆG δˆ fˆG N = 128 0.0815 0.4116 0.0830 −0.0018 0.0116 −0.0003 0.0250 0.0323 0.0271 0.0251 0.0344 0.0271 0.0735 0.2381 0.0798 −0.0098 0.0381 −0.0035 0.0445 0.0547 0.0459 0.0456 0.0666 0.0460 0.3661 0.2952 0.3658 0.0141 −0.0048 0.0138 0.0330 0.0541 0.0321 0.0359 0.0543 0.035 0.0338 0.3915 0.0324 0.0178 0.0915 0.0164 0.0296 0.0347 0.0282 0.0345 0.0979 0.0326 N = 512 0.0789 0.4250 0.0782 −0.0044 0.0250 −0.0051 0.0065 0.0273 0.0068 0.0079 0.0369 0.0085 0.0751 0.2317 0.0746 −0.0082 0.0317 −0.0088 0.0295 0.0250 0.0273 0.0306 0.0403 0.0287 0.3635 0.2972 0.3641 0.0115 −0.0028 0.0121 0.0093 0.0291 0.0099 0.0148 0.0292 0.0156 0.0228 0.3960 0.0233 0.0068 0.0960 0.0073 0.0093 0.0242 0.0099 0.0116 0.0990 0.0123
MB(16) δˆ
fˆG
0.4124 0.0124 0.0317 0.0341 0.2359 0.0359 0.0545 0.0653 0.2995 −0.0005 0.0524 0.0524 0.3931 0.0931 0.0337 0.0990
0.0814 −0.0019 0.0282 0.0283 0.0789 −0.0045 0.0438 0.0440 0.3634 0.0114 0.0314 0.0334 0.0321 0.0161 0.0293 0.0334
0.4241 0.0241 0.0257 0.0352 0.2340 0.0340 0.0263 0.0430 0.3010 0.0010 0.0280 0.0281 0.3973 0.0973 0.0242 0.1003
0.0782 −0.0052 0.0064 0.0082 0.0747 −0.0086 0.0288 0.0301 0.3609 0.0089 0.0094 0.0129 0.0233 0.0073 0.0098 0.0122
Table 4: Simulation results of 500 DWPT-based approximate ML estimates δˆ and fˆG using the MB(8), LA(16) and MB(16) wavelet filters. The procedure used here is similar to that of Table 3 except that the log-linear approximation to the spectrum is used instead of its true parametric form.
22
Model (δ, fG )
(0.4, 0.083)
(0.2, 0.083)
(0.3, 0.352)
(0.3, 0.016)
(0.4, 0.083)
(0.2, 0.083)
(0.3, 0.352)
(0.3, 0.016)
Mean Bias SD RMSE Mean Bias SD RMSE Mean Bias SD RMSE Mean Bias SD RMSE Mean Bias SD RMSE Mean Bias SD RMSE Mean Bias SD RMSE Mean Bias SD RMSE
Whittle GS-P I ˆ ˆ ˆ δ fG δ fˆG N = 128 0.4090 0.0842 0.6045 0.0802 0.0090 0.0008 0.2045 −0.0031 0.0586 0.0049 0.2268 0.0257 0.0593 0.0050 0.3053 0.0259 0.1724 0.0852 0.3615 0.0847 −0.0276 0.0018 0.1615 0.0014 0.0498 0.0202 0.2162 0.0478 0.0570 0.0203 0.2698 0.0478 0.2833 0.3548 0.5132 0.3511 −0.0167 0.0028 0.2132 −0.0009 0.0649 0.0074 0.2255 0.0581 0.0670 0.0079 0.3103 0.0581 0.2683 0.0206 0.4733 0.0212 −0.0317 0.0046 0.1733 0.0052 0.0448 0.0103 0.1321 0.0267 0.0549 0.0113 0.2179 0.0272 N = 512 0.4034 0.0835 0.4329 0.0824 0.0034 0.0002 0.0329 −0.0010 0.0270 0.0012 0.0964 0.0040 0.0272 0.0012 0.1019 0.0042 0.1913 0.0834 0.2129 0.0826 −0.0087 0.0001 0.0129 −0.0008 0.0243 0.0052 0.0918 0.0227 0.0258 0.0052 0.0927 0.0227 0.2982 0.3526 0.3373 0.3524 −0.0018 0.0006 0.0373 0.0004 0.0294 0.0023 0.0992 0.0121 0.0295 0.0024 0.1060 0.0121 0.2941 0.0164 0.4478 0.0160 −0.0059 0.0004 0.1478 0.0000 0.0193 0.0022 0.0667 0.0016 0.0202 0.0023 0.1622 0.0016
GS-P II δˆ
fˆG
0.5502 0.1502 0.0951 0.1778 0.2492 0.0492 0.0917 0.1040 0.2854 −0.0146 0.1175 0.1184 0.4415 0.1415 0.0839 0.1645
0.0816 −0.0017 0.0057 0.0059 0.0791 −0.0043 0.0175 0.0181 0.3549 0.0029 0.0430 0.0430 0.0193 0.0033 0.0061 0.0070
0.4728 0.0728 0.0603 0.0946 0.2268 0.0268 0.0496 0.0563 0.2964 −0.0036 0.0551 0.0552 0.4589 0.1589 0.0440 0.1649
0.0830 −0.0004 0.0013 0.0013 0.0816 −0.0017 0.0056 0.0059 0.3532 0.0012 0.0029 0.0031 0.0161 0.0001 0.0017 0.0017
Table 5: Simulation results of 500 approximate ML estimates δˆ and fˆG using the Whittle and Gaussian semi-parametric (GS-P) methods. Initial parameter estimates were obtained as in Table 3. GS-P I refers to the Gaussian semi-parametric method with m = 20 for N = 128 and m = 60 for N = 512, while GS-P II refers to the Gaussian semi-parametric method with m = 40 for N = 128 and m = 120 for N = 512.
23
the SP(δ = 0.3, fG = 0.016) process, which is characterized by a very low-frequency (long period) oscillation. When the sample size is increased to N = 512, we observe a reduction in both bias and SD across all combinations of SP process and wavelet filter. There is a slight improvement in Table 3 when using a wavelet filter that better approximates a band-pass filter, the MB(16), but not overwhelming evidence with respect to these specific time series models. It is recommended that the practitioner utilize several wavelet filters of different lengths and compare the resulting estimates. The longer the wavelet filter in the time domain, the better frequency resolution it will exhibit. Regardless, the ML estimates should be quite robust to the choice of wavelet filter, however, as noted in Whitcher (2001) longer wavelet filters L > 8 are needed to adequately filter SP processes. The frequency-domain equivalent to the approximate ML estimation in Table 3 is the Whittle likelihood given by (4.6). A duplicate simulation study was performed across sample sizes N ∈ {128, 512} and all four SP processes by minimizing the Whittle likelihood; see the first two columns of Table 5. For N = 128, the Whittle estimates of (δ, fG ) have smaller bias and standard deviation than the DWPT-based estimates for SP(δ = 0.4, fG = 1/12) and SP(δ = 0.3, fG = 0.016) processes. When longer time series are considered, the DWPT-based estimates are comparable to those obtained from the Whittle likelihood. For the SP(δ = 0.3, fG = 0.352) process either method estimates the parameters well, regardless of sample size. In the case of the SP(δ = 0.2, fG = 1/12) process, the DWPT-based method estimates δ better when N = 128 and as well as the Whittle method when N = 512. The DWPT-based estimates of fG for the SP(δ = 0.2, fG = 1/12) process exhibit a much larger standard deviation, most likely because the method is using a histogram approximation to the spectrum whereas the Whittle likelihood has much better frequency resolution (the Fourier frequencies). If we relax our assumption of knowing the explicit form of the spectrum when formulating the likelihood, and instead use a log-linear approximation, the ability to estimate (δ, fG ) suffers. Table 4 summarizes the simulation study, with the same setup as Table 3, but with the approximate log-linear SDF instead of the parametric spectrum of an SP process. On the whole, for N = 128, there is an increase in magnitude for estimates of the FD parameter. For the first two models, this results in a slight positive bias where there was a negative bias in Table 3. No noticeable affect is seen for the SP(δ = 0.3, fG = 0.352) process, but a substantial positive bias is observed for the SP(δ = 0.3, fG = 0.016) process in both the FD parameter and Gegenbauer frequency. A reduction
24
in both bias and standard deviation is seen when the sample size is increased to N = 512. The results are comparable to those in Table 3 for all SP processes except the last (δ = 0.3, fG = 0.016). The DWPT-based method has a difficulty estimating the parameters for this particular time series model. Just as the Whittle likelihood was the frequency-based equivalent to the parametric estimation via the DWPT, Gaussian semi-parametric estimation via (4.7) is the frequency-based equivalent to the results in Table 4. The third through sixth columns of Table 5 provide simulation results for GS-P estimation of (δ, fG ) for the four SP processes of interest. Trimming of the periodogram was performed through the parameter m, by only using Fourier frequencies [fˆG − m/2, fˆG + m/2] in constructing the likelihood. With respect to the parameter m, two values were used depending on the sample size of the original time series. The smaller value performed as well as the larger value for N = 512, but exhibited a drastically larger bias and standard deviation when N = 128. This suggests that there is a minimum number of frequencies to be included, regardless of sample size, but after a certain point no extra information is being included in the likelihood by the frequencies further away from the Gegenbauer frequency. When compared to the DWPT-based results in Table 4, the GS-P estimates of the FD parameter have larger bias and standard deviation for all SP processes when N = 128. GS-P estimates of the Gegenbauer frequency show less variability. When the sample size is increase to N = 512, the DWPT-based estimates of δ exhibit less bias and standard deviation for all SP processes except the SP(δ = 0.2, fG = 1/12) process. For this lesser persistent annual process the GS-P estimates are less biased but exhibit greater standard deviation. It should be noted that neither method estimated the FD parameter for the SP(δ = 0.3, fG = 0.016) process very well. This indicates that the log-linear approximation is worse for this process than the other three (see also Figure 4).
6
Application to Atmospheric CO2 Data
Woodward et al. (1998) analyzed monthly atmospheric CO2 measurements from the Mauna Loa Observatory, Hawaii. We analyze an extended version of these data obtained from the Carbon Dioxide Information Analysis Center (CDIAC) website.1 The current record of CO2 measurements is from 1958 through 1999, but contains several missing values in the early years. The longest continuous record begins in June 1964 with N = 428; see Figure 6. We observe an obvious annual 1
http://cdiac.esd.ornl.gov/ndps/ndp001.html
25
concentration (ppmv)
370 360 350 340 330 320 1970
1980
1990
2000
Figure 6: Time series plots of the raw Mauna Loa CO2 measurements.
periodicity and time-dependent mean structure in the series. Exploratory analysis was performed via a multiresolution analysis of the data; see Figure 7. For simplicity, a partial MODWPT (J = 6) using the standard orthonormal basis B = {(j, 1) : j = 1, . . . , 6} ∪ {(6, 0)} was applied to these data with an MB(8) wavelet filter. Although this particular orthonormal basis is not adapted to the SDF of this time series, the strongest periodicity appears e3,1 which is associated with the frequency interval λ3,1 = (1/16, 1/8] or 8–16 month oscillations. in D e3,1 appears to have a single dominant frequency (f = 1/12) with relatively conThe oscillation in D e2,1 , capturing 4–8 month oscillations, also contains stant amplitude and phase. The wavelet detail D a modest periodicity with frequency around twice that of the annual frequency. Other oscillations, of both higher and lower frequencies, are present but the process appears to be dominated by the annual and its first harmonic. We focus our model fitting on these two periodicities. Before fitting a model to the process, the time-dependent mean should be removed. This was accomplished in Woodward et al. (1998) by taking the second difference of the raw series and modeling the residuals. This second-differenced series (not shown) did not appear to contain the obvious periodicity of the original series and instead had a very rough structure. To see why this occurred, let SCO2 (f ) denote the true SDF for the CO2 measurements and let D(f ) ≡ 4 sin2 (πf ) be the squared gain function for the first order backward difference filter. Hence, the true SDF for the second-differenced series is SCO2 (f )/D2 (f ) and differs from SCO2 (f ) at every frequency! As an alternative to traditional differencing, consider the multiresolution analysis in Figure 7. By definition of the MODWPT, all wavelet details are guaranteed to have mean zero as long as
26
D1,1
D2,1
D3,1
D4,1
D5,1
D6,1
D6,0
1970
1980
1990
2000
Figure 7: MODWT-based multiresolution analysis (J = 6) of the Mauna Loa CO2 series using the MB(8) wavelet filter and standard orthonormal basis. The same vertical scale is used for the six e1,1 –D e6,1 (which each have mean zero), but not for the wavelet smooth D e6,0 . wavelet details D
the dth order backward difference of the original process is stationary. For Daubechies families of wavelet filters with even length L, they correspond to differences of order L/2. Therefore, the LA(8) wavelet filter produces mean zero wavelet details for processes whose 4th order difference is stationary. This appears to be a reasonable assumption for the Mauna Loa CO2 measurements. e6,0 , which is associated with the frequency interval λ6,0 = [0, 1/128], appears The wavelet smooth D to be capturing the time-dependent mean of the time series quite well. Hence, we may produce a low-pass filtered version of the series by summing over the first six wavelet details and ignoring 27
Estimates CIs
δˆ1 0.438 [0.425, 0.450]
fˆ1 0.080 [0.078, 0.083]
δˆ2 0.270 [0.248, 0.291]
fˆ2 0.163 [0.160, 0.167]
Table 6: Wavelet-based maximum likelihood estimates for the 2-factor SP process fit to monthly atmospheric CO2 measurements (N = 428). e 6 = P6 D e the wavelet smooth – corresponding to the wavelet rough R j=1 j,1 . Filtering in this way only affects frequencies in the range f ∈ [0, 1/128] (note, this is approximate since all compactly supported wavelet filters are approximations to band-pass filters). Using a multiresolution analysis in this way, the periodicity is well preserved in both amplitude and phase when compared to the second-differenced series, making wavelet-based filtering an interesting alternative to traditional differencing. e6,0,t is zero padded up to length N = 512 and decomposed The detrended time series Yt − D using the DWPT. All wavelet packet coefficients related to the zero padding are removed when constructing the likelihood. An appropriate orthonormal basis B is selected using a portmanteau test on the wavelet packet coefficient vectors. The wavelet variances ω ¯ j,n may then be computed via numeric integration and optimization of the concentrated likelihood proceeds. Allowing for two asymptotes in the SDF of our process, corresponding to a 2-factor SP process in (2.5), the ML estimates for the wavelet-based seasonal long-memory model are given in Table 6 and suggest the following model e6,0,t ) = ²t (1 − 1.75B + B 2 )0.44 (1 − 1.01B + B 2 )0.27 (Yt − D for the monthly atmospheric CO2 measurements. The asymptotic distributions of δˆ and fˆG under the conditional sum of squares method were worked out by Chung (1996a, 1996b). Maximizing the conditional sum of squares function is asymptotically equivalent to ML estimation, so we utilized these distributional statements to construct confidence intervals (CIs) for the parameters in Table 6. ˆ fˆG )-pair corresponds to the strong annual component in the data. This is apparent The first (δ, e3,1 , corresponding to the frequency in Figure 7 where the majority of energy is contained in D ˆ fˆG )-pair is associated with the first harmonic of the interval λ3,1 = (1/16, 1/8]. The second (δ, annual frequency and contributes less, as indicated by its smaller FD parameter. The CI for δˆ1 is smaller than the corresponding CI for δˆ2 , since the large sample variance of ML estimates of δ decrease as the Gegenbauer frequency associated with it moves away from f = 1/4 (Chung 28
1996a). When compared with the model obtained in Woodward et al. (1998), the estimated FD parameters differ while the Gegenbauer frequencies are very similar. This may be due to the fact that Woodward et al. (1998) obtained initial parameter estimates through a limited search in fourdimensional space and then increased the estimated FD parameters until the residuals were free from a 12-month cycle. The methodology presented here is ‘unrestricted’ in that the entire space may be efficiently searched through the simple representation of the wavelet-based likelihood.
7
Discussion
Research on wavelet methods for time series analysis has focused primarily on so-called long-memory processes, this is because the DWT of a long-memory process produces approximately uncorrelated wavelet coefficients within each level and between levels. By using all combinations of filtering and downsampling through the DWPT, we compute a redundant selection of wavelet coefficient vectors. Recursively performing tests for white noise, an orthonormal basis may be selected which produces an approximately uncorrelated set of wavelet coefficients. This fact is used to develop estimation techniques for the parameters of a seasonal long-memory process. Given the simplicity of regression analysis in semi-parametric estimation of long-memory models, we first investigated an OLS estimation procedure using the wavelet packet variance. It compares quite well to estimation using the classical periodogram and also a multitaper spectral estimator when explicit knowledge of the spectral density function is incorporated into the estimation. If we assume only a log-linear relationship between the spectrum and frequency, all of the procedures show inflated bias and variance depending on the specific SP process being estimated. As an alternative to OLS estimates, wavelet-based approximate maximum likelihood estimation is also proposed. When compared with the Whittle likelihood and Gaussian semi-parametric estimation, the wavelet-based procedure does as well as the Whittle likelihood when the parametric form of the spectrum is used in constructing the likelihood and outperforms GS-P estimation – especially for small sample sizes. Wavelet-based maximum likelihood also gets around the issue of frequency subset selection required by GS-P estimation since the orthonormal basis automatically adapts to the underlying spectrum. The estimation procedures presented here provide an obvious extension to those of long-memory processes. Although we have chosen to concentrate on the seasonal long-memory model, the wavelet-based procedures may be applied to time series models with piecewise continuous spec-
29
tra. We selected the seasonal long-memory model for its flexibility; e.g., short-memory parameters may be easily incorporated allowing for a much richer class of spectra to be modeled. It is hoped that these procedures open up wavelet methods for further research in time series analysis.
References Andˇel, J. (1986). Long memory time series models. Kybernetika 22 (2), 105–123. Arteche, J. and P. M. Robinson (1999). Seasonal and cyclical long memory. In S. Ghosh (Ed.), Asymptotics, Nonparametrics, and Time Series, Volume 128 of STATISTICS: Textbooks and Monographs, pp. 115–148. New York: Marcel Dekker. Arteche, J. and P. M. Robinson (2000). Semiparametric inference in seasonal and cyclical long memory processes. Journal of Time Series Analysis 21 (1), 1–25. Beran, J. (1994). Statistics for Long-Memory Processes, Volume 61 of Monographs on Statistics and Applied Probability. New York: Chapman & Hall. Brockwell, P. J. and R. A. Davis (1991). Time Series: Theory and Methods (2 ed.). New York: Springer-Verlag. Brown, R. L., J. Durbin, and J. M. Evans (1975). Techniques for testing the constancy of regression relationships over time. Journal of the Royal Statistical Society B 37, 149–163. Chung, C.-F. (1996a). Estimating a generalized long memory process. Journal of Econometrics 73, 237–259. Chung, C.-F. (1996b). A generalized fractionally integrated autoregressive moving-average process. Journal of Time Series Analysis 17 (2), 111–140. Daubechies, I. (1992). Ten Lectures on Wavelets, Volume 61 of CBMS-NSF Regional Conference Series in Applied Mathematics. Philadelphia: Society for Industrial and Applied Mathematics. Doornik, J. A. (1999). Object-oriented Matrix Programming using Ox version 2.1. West Wickham, Kent, UK: Timberlake Consultants Limited. Geweke, J. and S. Porter-Hudak (1983). The estimation and application of long memory time series models. Journal of Time Series Analysis 4 (4), 221–238. Gray, H. L., N.-F. Zhang, and W. A. Woodward (1989). On generalized fractional processes. Journal of Time Series Analysis 10 (3), 233–257. 30
Gray, H. L., N.-F. Zhang, and W. A. Woodward (1994). On generalized fractional processes – a correction. Journal of Time Series Analysis 15 (5), 561–562. Hosking, J. R. M. (1981). Fractional differencing. Biometrika 68 (1), 165–176. Hosking, J. R. M. (1984). Modeling persistence in hydrological time series using fractional differencing. Water Resources Research 20 (12), 1898–1908. Hurvich, C. M. and R. S. Deo (1999). Plug-in selection of the number of frequencies in regression estimates of the long memory parameter of long-memory time series. Journal of Time Series Analysis 20 (3), 331–341. Hurvich, C. M., R. S. Deo, and J. Brodsky (1998). The mean squared error of Geweke and PorterHudak’s estimator of a long memory time series. Journal of Time Series Analysis 19 (1), 19–46. Jensen, M. J. (1999a). An approximate wavelet MLE of short and long memory parameters. Studies in Nonlinear Dynamics and Economics 3 (4), 239–253. Jensen, M. J. (1999b). Using wavelets to obtain a consistent ordinary least squares estimator of the long-memory parameter. Journal of Forecasting 18 (1), 17–32. Lobato, I. N. (1997). Semiparametric estimation of seasonal long-memory models: Theory and application to modeling of exchange rates. Investigaciones Econ´ omicas 21 (2), 273–295. Mallat, S. (1989). A theory for multiresolution signal decomposition: The wavelet representation. IEEE Transactions on Pattern Analysis and Machine Intelligence 11 (7), 674–693. McCoy, E. J. and A. T. Walden (1996). Wavelet analysis and synthesis of stationary long-memory processes. Journal of Computational and Graphical Statistics 5 (1), 26–56. Morris, J. M. and R. Peravali (1999). Minimum-bandwidth discrete-time wavelets. Signal Processing 76 (2), 181–193. Ooms, M. (1995). Flexible seasonal long memory and economic time series. Technical Report 9515/A, Econometric Institute, Erasmus University. Percival, D. B. (1995). On estimation of the wavelet variance. Biometrika 82 (3), 619–631. Percival, D. B. and A. T. Walden (1993). Spectral Analysis for Physical Applications: Multitaper and Conventional Univariate Techniques. Cambridge, England: Cambridge University Press.
31
Percival, D. B. and A. T. Walden (2000). Wavelet Methods for Time Series Analysis. Cambridge, England: Cambridge University Press. Rainville, E. D. (1960). Special Functions. New York: The Macmillan Company. Ray, B. K. (1993). Long-range forecasting of IBM product revenues using a seasonal fractionally differenced ARMA model. International Journal of Forecasting 9 (2), 255–269. Riedel, K. S. and A. Sidorenko (1995). Minimum bias multiple taper spectral estimation. IEEE Transactions on Signal Processing 43 (1), 188–195. Robinson, P. M. (1995). Log-periodogram regression of time series with long range dependence. Annals of Statistics 23 (3), 1048–1072. Thomson, D. J. (1982). Spectrum estimation and harmonic analysis. IEEE Proceedings 70 (9), 1055–1096. Whitcher, B. (2001). Simulating Gaussian stationary processes with unbounded spectra. Journal of Computational and Graphical Statistics 10 (1), 112–134. Whitcher, B., S. D. Byers, P. Guttorp, and D. B. Percival (1998). Testing for homogeneity of variance in time series: Long memory, wavelets and the Nile River. Technical Report 9, National Research Center for Statistics and the Environment. Wickerhauser, M. V. (1994). Adapted Wavelet Analysis from Theory to Software. Wellesley, Massachusetts: A K Peters. Woodward, W. A., Q. C. Cheng, and H. L. Gray (1998). A k-factor GARMA long-memory model. Journal of Time Series Analysis 19 (4), 485–504.
32