Bootstrap-based ARMA order selection

Livio Fenga*, Dimitris N. Politis†

November 4, 2009

* University of California San Diego, Department of Applied Physics and Mathematics; email: [email protected]
† University of California San Diego, Department of Applied Physics and Mathematics; email: [email protected]
Abstract

Modeling the underlying stochastic process is one of the main goals in the study of many dynamic phenomena, such as signal processing, system identification and time series analysis. The issue is often addressed within the ARMA framework, so that the related task of identifying the "true" order is crucial. As is well known, the effectiveness of such an approach may be seriously compromised by misspecification errors, since they may affect the model's ability to capture the dynamic structure of the process. As a result, inference and empirical outcomes may be heavily misleading. Despite the large number of available approaches for determining the order of an ARMA model, the issue is still open. In this paper we cast the problem in the framework of bootstrap theory in conjunction with the information-based criterion of Akaike (AIC), and a new method for ARMA model selection is presented. A theoretical justification for the proposed approach, as well as an evaluation of its small-sample performance via a simulation study, is given.

Keywords: AIC, ARMA processes, order selection, sieve bootstrap.
1 Introduction

Signal modeling and system identification are often carried out in the framework of the ARMA processes popularized by Box and Jenkins (1970). Both theoretical and empirical results have proven the effectiveness of this class of models, under certain conditions, in capturing and adequately modeling the dynamic structure of a system. In this context, parameter estimation is a crucial task, and much effort has been devoted to building, developing and refining computationally efficient algorithms. In addition, the inference process has strongly benefited from computer power combined with data-driven strategies, an approach that is becoming more and more frequent in many areas of statistics. Model-checking procedures, too, are straightforward nowadays thanks to the wide range of ad hoc routines available in common statistical software.
However, all these techniques are based on the unrealistic assumption of a priori knowledge of the "true" model structure. Such an assumption is particularly misleading, as it cuts off the possibility of embodying model uncertainty in the inference process. In addition, the true theoretical order may be infinite, whereas in practice we normally make inference on the parameters using a finite realization of the process at hand. In the "real world", in fact, the unknown model order must be estimated from the data and, consequently, a number of ad hoc methods have been developed: in addition to heuristic approaches (Box-Jenkins, genetic algorithms, exhaustive search algorithms), a variety of frequency-based methods (parametric and non-parametric polyspectra), hypothesis-testing methods and information-based methods, among others, are available. The purpose of this paper is to present a method for the structure determination of a Gaussian stationary ARMA model given a set of observations, based upon a bootstrap version of the information criterion proposed by Akaike (1973). The choice of a selector belonging to the class of information criteria is due to their simple formulation and because they are generally easy to apply, compared both with classical model selection methods (based on testing multiple hypotheses) and with the more recent and more sophisticated approaches partially mentioned above. As pointed out by Chatfield (1995), however, such an automated, computer-assisted approach carries a danger for the analyst, in that he "will choose a model to fit the software rather than vice versa".
2 Akaike Information Criterion for the selection of ARMA models

Let $X = \{X_t, t \in \mathbb{Z}\}$ be a real second-order stationary process; $X$ is said to admit an ARMA(p,q) representation ($p, q \in \mathbb{Z}^+$) if, for some constants $a_1, \ldots, a_p$, $b_1, \ldots, b_q$,

$$\sum_{j=0}^{p} a_j X_{t-j} = \sum_{j=0}^{q} b_j \varepsilon_{t-j}, \qquad t \in \mathbb{Z}, \quad a_0 = b_0 = 1, \tag{1}$$

under the following conditions:

$$E\{\varepsilon(t) \mid \mathcal{F}_{t-1}\} = 0, \qquad E\{\varepsilon^2(t) \mid \mathcal{F}_{t-1}\} = \sigma^2, \qquad E\varepsilon^4(t) < \infty,$$

$$\sum_{j=0}^{p} a_j z^j \neq 0, \quad \sum_{j=0}^{q} b_j z^j \neq 0 \quad \text{for } |z| \le 1.$$

Here $\mathcal{F}_t$ denotes the sigma-algebra induced by $\{\varepsilon(j), j \le t\}$, and the polynomials $\sum_{j=0}^{p} a_j z^j$ and $\sum_{j=0}^{q} b_j z^j$ are assumed to have no common zeros.
The Akaike Information Criterion (henceforth AIC) is one of the most common selectors for ARMA models. It is based on evaluating the "distance" between the estimated model and the true (unknown) model, and it is defined as follows:

$$AIC = -2 \max \log L(\hat\theta \mid x) + 2K \tag{2}$$

where $K$ is the dimension of the model and $L(\hat\theta \mid x)$ is the likelihood function. AIC can be considered as a criterion obtained by correcting the asymptotic bias of the log-likelihood of the estimated model or, according to a heuristic interpretation, as the sum of two terms: the first is a measure of goodness of fit, while the second measures the unreliability of the model (a penalty for the number of model parameters). The former, measured by $-2 \max \log L(\hat\theta \mid x)$, decreases as the number of parameters in the model is increased; the latter, $2K$, increases with the number of parameters. Akaike (1973) showed that $K$ is an asymptotic approximation of the bias term; the validity of this approximation, however, holds under the condition that at least one model in the set of candidate models is a good approximation of the "true" model. Unfortunately, the AIC penalty term does not fulfill the consistency conditions of Hannan and Quinn (1979), so that, by adding more and more structure to the candidate models, the criterion becomes negatively biased and, as a result, overfitting arises with non-zero probability.

The AIC selection strategy, commonly called MAICE (minimum AIC estimate), is a procedure aimed at extracting, among the candidate models, a model order, say $(\hat p, \hat q)$, satisfying

$$(\hat p, \hat q) = \arg\min_{p \le P,\, q \le Q} AIC(p, q).$$

The MAICE procedure requires the definition of upper bounds for both $p$ and $q$, $(P, Q)$, as a maximum order a given process can reach; the choice of these two constants is a priori and arbitrary.
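To fix ideas, a minimal R sketch of the MAICE grid search could look as follows; arima() from base R returns the Gaussian-likelihood AIC directly, while the helper maice(), the series x and the bounds P = Q = 3 are our illustrative assumptions, not objects from the paper.

```r
# Minimal MAICE sketch: exhaustive AIC search over ARMA(p,q), p <= P, q <= Q.
# 'x' is a stationary zero-mean series; P and Q are the arbitrary upper bounds.
maice <- function(x, P = 3, Q = 3) {
  best <- list(aic = Inf, order = c(NA, NA))
  for (p in 0:P) {
    for (q in 0:Q) {
      fit <- try(arima(x, order = c(p, 0, q), include.mean = FALSE), silent = TRUE)
      if (!inherits(fit, "try-error") && fit$aic < best$aic) {
        best <- list(aic = fit$aic, order = c(p, q))
      }
    }
  }
  best
}

set.seed(1)
x <- arima.sim(list(ar = c(-0.7, -0.7), ma = -0.5), n = 200)  # an ARMA(2,1)
maice(x)$order  # ideally recovers (2, 1)
```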
3 Bootstrap for dependent data

Efron's original bootstrap procedure (Efron, 1979) was designed for independent and identically distributed observations, so that, when applied to data displaying serial correlation, it ceases to be valid. In fact, a random resampling of the data, which by definition ignores their given order, destroys the original dependence structure and causes the statistics computed in this way to be inconsistent. Extensions to dependent data are not straightforward, and a number of modifications of the original procedure, aimed at preserving the dependence structure in the bootstrap samples, have been proposed.

Freedman (1984) (but see also Bose, 1988; Berkowitz and Kilian, 2000; Li and Maddala, 1997) proposed a model-based procedure under the assumption of i.i.d. innovations in a linear autoregressive model; usually these methods reduce time-dependent data to an i.i.d. structure by generating error terms with a pattern of dependence close to that of the real errors. In particular, Efron and Tibshirani (1986) proposed to generate bootstrap series by drawing bootstrap innovations independently with replacement from the set of mean-adjusted residuals. Generalizations of this approach have been proposed by Kreiss and Franke (1992) for ARMA models and by Franke and Wendel (1992) for nonlinear autoregressive processes. Applications of this bootstrap method to Markov chains are also available: Kulperger and Prakasa Rao (1989), Basawa, Green, McCormick and Taylor (1989) and Neumann (1997) devised methods for finite-state Markov chains, whereas Paparoditis and Politis (2001) discussed a Markov chain bootstrap without explicit nonparametric estimation of the transition probabilities. Finally, a bootstrap extension to the case of a general state space, with nonparametric estimates of the transition probabilities, was proposed by Rajarshi (1990). By construction, the model-based bootstrap loses much of its appeal because it relies on the crucial assumption of correct specification of the underlying model; simulations conducted by Chatterjee (1986) confirm the poor performance of this bootstrap approach when the selected order is not correct. However, even in the case of correct structure determination, model-based procedures tend to ignore the model uncertainty induced by the model selection process. A way to overcome this problem, the "endogenous lag order" algorithm, has been proposed by Kilian (1998).

Alternatively, non-parametric, purely model-free bootstrap algorithms are available; among them, the moving block bootstrap is one of the best known. This method was independently proposed by Kunsch (1989) and Liu and Singh (1992), even though the idea can be traced back to Hall (1985). In this procedure, bootstrap replications are obtained by randomly resampling (with replacement) blocks of consecutive observations from the original time series and re-assembling them by joining the blocks together in random order. In more detail, given a set of observations $x_t$, $t = 1, 2, \ldots, T$, blocks of data of length $\delta$ are formed (the number of blocks is $h = T - \delta + 1$). Let $q_i \equiv (x_i, x_{i+1}, \ldots, x_{i+\delta-1})$ be the generic block, and $Q \equiv \{q_1, q_2, \ldots, q_h\}$ the set formed by all the contiguous (moving) blocks. From $Q$, blocks are drawn with replacement, with starting points chosen from a uniform distribution on the integers $(1, \ldots, h)$, and stored in a matrix $W_{T,B}$ so that each of the $B$ columns of $W$ contains a pseudo-series $\{x_t^*\}$, $t = 1, \ldots, T$, $b = 1, \ldots, B$. The method requires $\delta$ to grow slowly with the sample size.
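The block-joining mechanics are simple; the following R sketch (our illustration, with a user-supplied block length delta, not code from the paper) builds B moving-block pseudo-series from a series x.

```r
# Moving block bootstrap sketch: draw overlapping blocks of length delta with
# replacement and join them to form B pseudo-series of the same length as x.
mbb <- function(x, delta, B) {
  n <- length(x)
  h <- n - delta + 1            # number of available (moving) blocks
  k <- ceiling(n / delta)       # blocks needed to cover one pseudo-series
  W <- matrix(NA_real_, nrow = n, ncol = B)
  for (b in 1:B) {
    starts <- sample.int(h, k, replace = TRUE)   # uniform starting points on 1..h
    joined <- unlist(lapply(starts, function(i) x[i:(i + delta - 1)]))
    W[, b] <- joined[1:n]       # truncate the joined blocks to length n
  }
  W                             # each column is one pseudo-series {x_t*}
}
```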
A modification of this method has been discussed by Hall (1985), Carlstein (1986) and Shi (1986): their proposal consists in resampling from non-overlapping blocks of increasing length. Within the overlapping scheme, a number of modified versions have been proposed, among them the circular block bootstrap of Politis and Romano (1992) and the matched block bootstrap of Carlstein, Do, Hall, Kunsch and Hesterberg (1996). The idea underlying block resampling schemes is to preserve the original time series pattern within a block. Maintaining the original pattern of dependence requires the block length to be as long as possible; on the other hand, for a good estimate of the distribution of the statistic of interest, the blocks should not be too long. The block length is therefore a critical tuning parameter: it is difficult to estimate and its interpretation is not straightforward. The blockwise bootstrap shows additional drawbacks, in that it often produces resampled series affected by artifacts caused by the random joining process. Finally, mimicking complex, long-range dependence structures may be difficult for this approach, especially if the original time series is not long. In order to mitigate the problems arising from the choice of the block length, Politis and Romano (1992) proposed the "blocks of blocks" bootstrap; Monte Carlo evidence for time series data (see Davison and Hinkley, 1997) suggests that this method is less sensitive to the choice of block length than alternative blocking methods. Furthermore, Politis and Romano (1994) proposed the stationary bootstrap, where the block length $\delta$ is random rather than fixed. The asymptotic validity of these methods for a number of important statistics has been proved (see Kunsch, 1989; Buhlmann, 1994; Gotze and Kunsch, 1996).

Another non-parametric, quasi-model-free procedure is the sieve bootstrap. This method relies on the idea of sieve approximation (Grenander, 1985) of the data generating process $(X_t)_{t \in \mathbb{Z}}$ by a family of parametric models. The method was introduced by Kreiss (1992) and analyzed by Buhlmann (1997), Bickel and Buhlmann (1999) and Choi and Hall (2000); recent studies (Kapetanios and Psaradakis, 2006) focused on its properties for processes with long-range dependence. Being a two-step procedure, fitting a parametric model first and then resampling from the residuals, it may be considered close to the parametric scheme; this time, however, the residuals are obtained from a sequence of finite-dimensional autoregressive models fitting an ∞-dimensional, non-parametric model. More details are given in Section 3.1. Finally, a hybrid approach between the sieve and the moving block bootstrap, the post-blackening bootstrap, was introduced by Davison and Hinkley (1997) and studied by Srinivas and Srinivasan (2000): in this procedure, dependence structures possibly still affecting the residuals coming from the AR(p) filter are removed by resampling them via the moving block bootstrap.
3.1 The employed scheme: the sieve bootstrap

The sieve bootstrap is the scheme adopted in the present work; as already pointed out, even if it is based on a parametric model, it can be considered a non-parametric method, given the simple role played by the AR(p) model. In essence, the true underlying stationary process is fitted by an autoregressive model of order p, and the conditionally stationary replications are obtained starting from the residuals generated by the fitting procedure.

Consistency of the sieve bootstrap for statistics based on linear processes of the type $X_t = \sum_{j=0}^{\infty} \psi_j \varepsilon_{t-j}$, $t \in \mathbb{Z}$, $\psi_0 = 1$, with fast-decaying coefficients $\{\psi_j\}_{j=0}^{\infty}$, has been discussed by Kreiss (1988, 1992), with further theoretical improvements given by Buhlmann (1997). The method relies on the crucial assumption that the time series is the realization of an AR(∞) process of the type

$$X_t - \mu_x = \sum_{j=1}^{\infty} \varphi_j (X_{t-j} - \mu_x) + \varepsilon_t, \qquad t \in \mathbb{Z}, \tag{3}$$

with $\mu_x = E[X_t]$ and $(\varepsilon_t)_{t \in \mathbb{Z}}$ a sequence of i.i.d. variables with $E[\varepsilon_t] = 0$ and $\sum_{j=1}^{\infty} \varphi_j^2 < \infty$. The class of causal ARMA models, the object of our study, is a subset of the AR(∞) processes (3), whereas other classes (e.g. nonlinear time series) are not. In the sieve bootstrap, the resampling scheme generating the bootstrap residuals relies on a $p$-order autoregressive approximation of the type $X_t - \mu_x = \sum_{j=1}^{p} \varphi_j (X_{t-j} - \mu_x) + \varepsilon_t$, $t \in \mathbb{Z}$, with $\mu_x$ and $(\varepsilon_t)_{t \in \mathbb{Z}}$ satisfying the same conditions as in (3). In general, the order of the filter, $\hat p = p(T)$, can be selected by a wide range of criteria in both the time and the frequency domain, provided that $p(T) \to \infty$ with $p(T) = o(T)$ as the sample size $T$ goes to infinity. The estimation of the $p$-vector of coefficients $(\hat\varphi_1, \ldots, \hat\varphi_{\hat p})$ is based upon the Yule-Walker equations. The sieve bootstrap procedure is summarized in Section 5.1, steps 2 to 7.
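As an illustration of those steps, a minimal R sketch of the sieve bootstrap is given below. It assumes, for simplicity, that the AR filter order is chosen by AIC inside ar() with Yule-Walker estimation, whereas the paper selects it via the spectral flatness of the residuals; sieve_boot() and its defaults are our naming, not the paper's.

```r
# Sieve bootstrap sketch: fit an AR(p) filter by Yule-Walker, resample the
# centered residuals i.i.d., and regenerate B pseudo-series by the AR recursion.
sieve_boot <- function(x, B, p = NULL) {
  n <- length(x)
  fit <- if (is.null(p)) ar(x, method = "yule-walker")         # AR order by AIC
         else ar(x, method = "yule-walker", aic = FALSE, order.max = p)
  res <- fit$resid[!is.na(fit$resid)]
  res <- res - mean(res)                        # centered empirical distribution
  replicate(B, {
    eps <- sample(res, n + 100, replace = TRUE)          # i.i.d. innovations
    xb  <- filter(eps, fit$ar, method = "recursive")     # AR recursion
    as.numeric(xb[101:(n + 100)]) + fit$x.mean           # drop 100 burn-in values
  })
}
```

The 100-value burn-in plays the role of the arbitrary starting values $X_t^* = 0$, $\varepsilon_t^* = 0$ used in step 7 of Section 5.1.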
3.2 Definition of AIC* and the proposed B-MAICE method

A bootstrap estimate of the Kullback-Leibler information for the selection of nested models was proposed by Shibata (1997), whereas Zoubir (1999) discussed bootstrap methods for model selection in linear and non-linear models. Shao (1996) focused on bootstrap model selection in the regression and autoregressive frameworks, whereas a bootstrap application to structure determination was proposed by Veall (1992) as a way to cope with the data mining problem (which is likely to arise when data are subjected to extensive search, as in the case of model selection methods applied to a large set of candidate models).

As already pointed out in Section 2, the dimension of the model, $K$, is only an asymptotically unbiased correction, making AIC biased in small samples. We propose a bootstrap version of AIC, denoted AIC*, obtained by bootstrapping both the likelihood and the bias term of AIC. In symbols:

$$AIC^* = -2 \log L(\hat\varphi^*, \hat\theta^* \mid x^*) + (2K)^* \tag{4}$$

with $K$ the number of parameters; in the case of ARMA models this becomes

$$AIC^*(p, q) = -2 \log \hat\sigma^{2*}_{\varepsilon}(\varphi^*, \theta^* \mid x^*) + (2K)^*, \qquad K = p + q. \tag{5}$$
Our procedure, denoted B-MAICE, consists in fitting a set of ARMA(p,q) models to each sieve bootstrap replication $X_b^*$, $b = 1, \ldots, B$, and, for each of them, computing the AIC using the estimated variance of the residuals. This results in a sequence of $B$ matrices (one per bootstrap replication) of dimension $P \times Q$, whose elements are the "pseudo-AICs" computed for each order $(p = 1, 2, \ldots, P;\ q = 1, 2, \ldots, Q)$. As explained in Section 5, the minimum-AIC procedure is applied to each of these $B$ matrices, so that a winning model is selected for each of them. The final model is then chosen on the basis of its relative frequency in the set of all bootstrap replications.
4 On the asymptotic distribution of the orders selected via AIC and AIC* for ARMA processes

Here we assess the model selection performance of AIC* through a comparison with its non-bootstrap counterpart, via Monte Carlo simulations performed on different sets of processes with varying parameters and sample sizes. For this comparison to be valid, and in order to provide a theoretical justification for the practical use of AIC* and the related B-MAICE procedure, we need to prove its asymptotic equivalence with AIC. In Section 4.1 we briefly recall two important results, from Shibata (1976) and Hannan (1980), on the asymptotic distribution of the order chosen by AIC in the AR(p) and ARMA(p,q) cases respectively. The extension of Shibata's results to the ARMA case appears complicated: much of the complexity arises from the simultaneous presence of autoregressive and moving average parameters (among other difficulties), which makes ARMA models impossible to nest. We address this problem by proving, in Section 4.2, the asymptotic equivalence of AIC and AIC* with respect to the chosen orders in the ARMA case. The result is achieved by using results from Allen and Datta (1999), who drew important conclusions starting from the work of Kreiss and Franke (1992).
4.1 Two important results of Shibata and Hannan

The asymptotic distribution of the order selected by AIC for a zero-mean AR(p) process $\{x_t,\ t = 1, \ldots, T\}$ is given in Shibata (1976). In the scalar case he showed that, letting $\hat p$ be the minimizer of AIC(p) over $p = 0, 1, \ldots, P$, with $0 \le p_0 \le P$, the asymptotic distribution of the selected order is

$$\lim_{T \to \infty} P(\hat p = p) = \begin{cases} 0 & 0 \le p < p_0 \\ \pi_{p-p_0}\, \pi'_{P-p} & p_0 \le p \le P \end{cases}$$

where

$$\pi_t = \sum_{A} \left\{ \prod_{i=1}^{t} \frac{1}{r_i!} \left( \frac{\alpha_i}{i} \right)^{r_i} \right\}, \qquad \pi'_t = \sum_{A} \left\{ \prod_{i=1}^{t} \frac{1}{r_i!} \left( \frac{1-\alpha_i}{i} \right)^{r_i} \right\},$$

with $\pi_0 = \pi'_0 = 1$ and $\alpha_i = P(\chi^2_i > 2i)$, $\chi^2_i$ being a random variable with a chi-squared distribution with $i$ degrees of freedom; the summation $\sum_A$ is taken over all groups $(r_1, r_2, \ldots, r_t)$ of non-negative integers satisfying $r_1 + 2r_2 + \cdots + t\,r_t = t$.

For ARMA processes, Hannan (1980) gave a general proof of the asymptotic distribution of the AIC-selected orders, without specifying the individual probabilities attached to the chosen orders. He showed that, under the conditions stated in Hannan (1980, Section 1), the following holds for AIC(p,q): if the upper bound on the autoregressive order is set at the true value, $P = p_0$, then

$$\lim_{T \to \infty} P(\hat p = p_0, \hat q = q) = \pi_{q-q_0,\, Q-q}, \qquad \pi_{q-q_0,\, Q-q} = 0 \ \text{for} \ q < q_0,$$

and, symmetrically, if $Q = q_0$, then

$$\lim_{T \to \infty} P(\hat p = p, \hat q = q_0) = \pi_{p-p_0,\, P-p}, \qquad \pi_{p-p_0,\, P-p} = 0 \ \text{for} \ p < p_0.$$
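Shibata's limiting probabilities are easy to evaluate numerically: $\pi_t$ is the coefficient of $x^t$ in $\exp\{\sum_{i \ge 1} (\alpha_i/i)\, x^i\}$ (and $\pi'_t$ the analogue with $1-\alpha_i$), so the standard power-series exponential recurrence gives the whole asymptotic distribution of the selected AR order. The following R sketch is our reading of the partition sum above, offered as an illustration rather than code from the paper.

```r
# Shibata's limiting probabilities for AIC order selection in AR models:
# pi_t is the coefficient of x^t in exp(sum_i (alpha_i / i) x^i), computed via
# the series-exponential recurrence b_n = (1/n) sum_{k=1..n} k a_k b_{n-k}.
series_exp <- function(a) {           # a[k] = coefficient of x^k, k = 1..N
  N <- length(a)
  b <- numeric(N + 1); b[1] <- 1      # b[n+1] = coefficient of x^n
  for (n in 1:N) b[n + 1] <- sum((1:n) * a[1:n] * rev(b[1:n])) / n
  b
}

N <- 10
alpha <- pchisq(2 * (1:N), df = 1:N, lower.tail = FALSE)  # alpha_i = P(chi2_i > 2i)
pi_t <- series_exp(alpha / (1:N))         # pi_0, ..., pi_N
pi_p <- series_exp((1 - alpha) / (1:N))   # pi'_0, ..., pi'_N

# Asymptotic P(p_hat = p), p = p0..P, for true order p0 and upper bound P:
p0 <- 2; P <- 5
round(pi_t[(p0:P) - p0 + 1] * pi_p[P - (p0:P) + 1], 3)
```

For $p_0 = 2$ and $P = 5$, for instance, these probabilities sum to one over $p = p_0, \ldots, P$, with the mass at $p > p_0$ quantifying the asymptotic overfitting of AIC.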
4.2 On the asymptotic equivalence of the distribution of the orders chosen by AIC and AIC* for ARMA models

As is well known, the maximum likelihood estimators used in the estimation of ARMA models belong to the class of M-estimators. The asymptotic validity of the bootstrap for this class of estimators in ARMA models has been proved by Kreiss and Franke (1992; Theorems 2.3, 4.1 and 4.2), and further conclusions have been drawn by Allen and Datta (1999) (A-D henceforth). In particular, A-D (Theorem 3.1) proved the convergence in distribution of the bootstrap parameter estimates to the true vector of parameters for ARMA models.
Now, using this result, we can tackle the problem of the asymptotic equivalence of AIC and AIC* by showing the following

Theorem: Let $X = (X_t)_{t \in \mathbb{Z}}$ be a second-order real process with a stationary and invertible ARMA(p,q) representation ($p, q \in \mathbb{Z}^+$), with no common zeros and i.i.d. residuals with variance $\sigma^2_\varepsilon$. Then the information criteria and their bootstrapped versions have the same asymptotic distribution of the chosen orders.

Remark: We refer to information criteria of the type (6) and to their sieve bootstrap counterparts (7):

$$IC = -2 \max \log L(\hat\varphi, \hat\theta \mid x) + c(p, q, T) \tag{6}$$

$$IC^* = -2 \max \log L(\varphi^*, \theta^* \mid x^*) + c(p^*, q^*, T) \tag{7}$$

where the second term in both equations is the penalty, expressed as a function of the number of parameters (in the non-bootstrap and bootstrap worlds respectively) and of the sample size.

Sketch of proof: The standard ML estimation of the ARMA parameters and its bootstrap version, the latter obtained by applying the ML estimator recursively to each bootstrap replication of the observed data, maximize respectively

$$\max \log L(\hat\varphi_{ML}, \hat\theta_{ML} \mid x) \tag{8}$$

$$\max \log L(\hat\varphi^*, \hat\theta^* \mid x^*) \tag{9}$$

As is well known (see, among others, Anderson and Burnham, 1999), for ARMA processes a consistent estimator of (8) is a function of the estimated ARMA parameters, namely $\hat\sigma^2_{\hat\varepsilon}(\hat\varphi_{ML}, \hat\theta_{ML})$; by the above-mentioned asymptotic validity of the bootstrap parameters, $\hat\sigma^{2*}_{\hat\varepsilon^*}(\hat\varphi^*, \hat\theta^*)$ can be employed as a consistent estimator for (9). So we can rewrite (6) and (7) as follows:

$$IC(p, q) = -2 \log \hat\sigma^2_{\hat\varepsilon}(\hat\varphi_p^{ML}, \hat\theta_q^{ML}) + c(p, q, T), \qquad IC^*(p, q) = -2 \log \hat\sigma^{2*}_{\hat\varepsilon^*}(\hat\varphi_p^*, \hat\theta_q^*) + c(p^*, q^*, T).$$
Now, by A-D's theorem, as the sample size $T$ goes to infinity we know that

$$\sqrt{T}\,\big[(\hat\varphi_{ML} - \varphi), (\hat\theta_{ML} - \theta)\big] \xrightarrow{d} N(0, \Sigma)$$

and

$$\sqrt{T}\,\big[(\varphi^* - \varphi), (\theta^* - \theta)\big] \xrightarrow{d} N(0, \Sigma),$$

where $\Sigma$ is as defined in formula (3.10) of that theorem. Supposing that $\sigma^2_\varepsilon(\varphi, \theta)$ is differentiable in $(\varphi, \theta)$ with non-vanishing first derivative, an application of the delta method yields

$$\sqrt{T}\,\big[\hat\sigma^2_{\hat\varepsilon}(\hat\varphi_{ML}, \hat\theta_{ML}) - \sigma^2(\varphi, \theta)\big] \xrightarrow{d} N(0, \tau^2)$$

for some $\tau^2 > 0$, so that

$$\hat\sigma^2_{\hat\varepsilon}(\hat\varphi_{ML}, \hat\theta_{ML}) = \sigma^2(\varphi, \theta) + O_p(1/\sqrt{T}).$$

In an analogous way, in the bootstrap world,

$$\hat\sigma^{2*}_{\hat\varepsilon^*}(\hat\varphi^*, \hat\theta^*) = \sigma^2(\varphi, \theta) + O_p(1/\sqrt{T}),$$

and therefore

$$IC(p, q) = IC^*(p, q) + O_p(1/\sqrt{T}). \tag{10}$$

The result in (10) implies that, with probability tending to one,

$$\arg\min_{(p,q)} IC(p, q) = \arg\min_{(p,q)} IC^*(p, q).$$

This asymptotic equivalence proves the asymptotic validity of the B-MAICE procedure, and hence our theorem.
5 The B-MAICE method and empirical results

Our identification procedure is iterative and consists in systematically fitting ARMA structures of increasing order to the original data and to the bootstrap replications. The method is exhaustive and computationally burdensome, because all $(P+1)(Q+1)$ candidate models, up to a certain predefined maximum order, must be evaluated for the original time series and for each bootstrap replication. In this section the procedure is illustrated step by step and the results of our simulations are given.
5.1 The procedure

Let $\{X_t\}$ be a real-valued stationary process generated by an ARMA(p,q) with expectation $E[X_t] = \mu_X$, as in equation (1). Our procedure proceeds as follows:

1. Determine a maximum order $(P, Q)$ for the ARMA model, chosen so that it is likely to encompass the true model;

2. fit an approximating AR(p) model to $\{X_t\}$. The order of the filter, say $\hat p_0$, is selected by iteratively estimating the spectral density of the residuals of tentative autoregressive fits until closeness to a constant is reached; given $\hat p_0$, further orders $\hat p_i$, $i = 1, 2, \ldots$, for increasing sample sizes $T_i$, $i = 1, 2, \ldots$, are determined according to $c\,(T_i)^{1/3}$;

3. construct the estimators $(\hat\varphi_1, \ldots, \hat\varphi_{\hat p})$ of the autoregressive coefficients by the Yule-Walker method;

4. compute the residuals

$$\hat\varepsilon_t = \sum_{j=0}^{\hat p} \hat\varphi_j (X_{t-j} - \bar X), \qquad t \in \mathbb{Z}, \quad \hat\varphi_0 = 1; \tag{11}$$

5. define $\hat F_\varepsilon$, the centered empirical distribution function of the residuals:

$$\hat F_\varepsilon(x) = \hat P[\varepsilon_t \le x] = (T - \hat p)^{-1} \sum_{t = \hat p + 1}^{T} \mathbf{1}_{[S_t - \bar S \le x]},$$

with $S_t = X_t - \sum_{j=1}^{\hat p} \hat\varphi_j X_{t-j}$ and $\bar S$ the mean value of the available residuals $S_t$, $t = \hat p + 1, \ldots, T$;

6. draw a resample of i.i.d. variables $\varepsilon_t^*$, $1 - \max(p, q) \le t \le T$, from $\hat F_\varepsilon$;

7. generate the bootstrap sample $X_T^* = (X^*_{1 - \hat p}, \ldots, X^*_T)$ by the recursion

$$\sum_{j=0}^{\hat p} \hat\varphi_j (X^*_{t-j} - \bar X) = \varepsilon^*_t, \qquad t \in (\hat p, \ldots, T), \tag{12}$$

with starting values $X_t^* = 0$ and $\varepsilon_t^* = 0$ for $t \le -\max(p, q)$. Here the vector of parameters is replaced by its estimate $\hat\varphi$, and the roles of $(X_t, \ldots, X_{t-p})$ and $(\varepsilon_t, \varepsilon_{t-1}, \ldots, \varepsilon_{t-q})$ are played by $(X_t^*, \ldots, X^*_{t-p})$ and $(\varepsilon_t^*, \varepsilon^*_{t-1}, \ldots, \varepsilon^*_{t-q})$ respectively, for $t = 1, \ldots, T$;

8. for each bootstrap replication $X_t^*$:

(a) an exhaustive set of $(P+1)(Q+1)$ tentative ARMA(p,q) models is fitted recursively, up to the maximum order chosen in step 1, so that AIC* is computed for all the "candidate" pairs $(p, q)$;

(b) among all these pairs, the minimum AIC* value is extracted according to

$$(\hat p^*, \hat q^*) = \arg\min_{p \le P,\, q \le Q} AIC^*(p, q)$$

and the winning model, ARMA$(\hat p^*, \hat q^*)$, is identified accordingly;

9. out of the group of $B$ winning models (one for each bootstrap replication) identified in the previous step, the overall winner is chosen on the basis of its relative frequency; that is, the procedure picks the model that wins the greatest number of times over the set of bootstrap replications. More formally, the final model $m_0$ is the one maximizing

$$\#\big[ AIC^*(m_{0,b}) < AIC^*(m_{j,b}),\ j = 1, \ldots, M-1 \big], \qquad b = 1, 2, \ldots, B.$$

In steps 2 through 7 our procedure coincides with the sieve bootstrap, which is in turn theoretically justified by the AR(∞) representation (3) of ARMA(p,q) models; the remaining steps are specific to our proposal (a compact code sketch of the whole procedure is given below). Step 1 shows that, unfortunately, our procedure does not solve the problem of the arbitrary choice of the maximum ARMA order $(P, Q)$.
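Putting the steps together, a compact R sketch of the whole B-MAICE procedure might read as follows; it reuses the illustrative sieve_boot() and maice() helpers sketched in Sections 3.1 and 2, and selects the final order by its frequency among the B bootstrap winners (steps 8 and 9).

```r
# B-MAICE sketch: run the MAICE search on each sieve bootstrap replication
# and pick the (p, q) pair that wins most often.
b_maice <- function(x, B = 125, P = 3, Q = 3) {
  Xstar <- sieve_boot(x, B)                         # steps 2-7: replications
  winners <- apply(Xstar, 2, function(xb)
    paste(maice(xb, P, Q)$order, collapse = ","))   # step 8: one winner each
  tab <- sort(table(winners), decreasing = TRUE)    # step 9: relative frequency
  as.integer(strsplit(names(tab)[1], ",")[[1]])     # modal (p, q)
}

set.seed(2)
x <- arima.sim(list(ar = c(-0.7, -0.7), ma = -0.5), n = 200)
b_maice(x, B = 25)  # small B for speed here; the simulations use B = 125
```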
5.2 Some remarks on the method

Our procedure relies on AIC*, which is derived by re-estimating, for each bootstrap replication, the tentative models over all the pairs $(p, q)$ up to a prefixed maximum order. As a result, $B$ winning models are selected by applying a minimum-AIC* search to each bootstrap replication; out of this group of winners, the final model is chosen on the basis of its most frequent occurrence.

Many reasons motivate the use of the bootstrap. It is a powerful tool requiring very little in terms of modeling and assumptions; it demands heavy computational support, but its application is automatic and entirely computer controlled. Resorting to bootstrap techniques is fully justified (and sometimes the only feasible choice) when the available data set is small and cannot support methods based on the central limit theorem. In addition, such an approach can address the model uncertainty induced by standard model selection procedures. The act of model selection may have unwanted distributional implications: Potscher (1991), for example, shows that the distributions of estimators and test statistics are dramatically affected by the act of model selection. Ideally, these departures from standard theory should not be ignored in inference; while in some cases the problem may affect inference only mildly, extensive structure-determination searches involve an important loss of quality of inference (Lovell, 1983). In this framework, reducing the number of competing candidate ARMA models can be seen as a way to cope with model uncertainty (and with computational time!). In this direction goes the strategy suggested by Broersen and de Waele (2004), who developed a selection method based on a nested sequence of ARMA(p, p-1) models; the obvious drawback of their procedure is that the best approximating ARMA model may not be among the chosen set of models. Conversely, bootstrapping the model identification procedure is a simple method of incorporating model selection uncertainty into inference, since it applies the selection procedure independently to each resample (Buckland, Burnham and Augustin, 1997). Thus, extensive search and quality of inference can be reconciled with our method, where bootstrapping the selector means mimicking the data mining procedure on each artificial sample.

It is worth pointing out that model selection is usually followed by parameter estimation and inference. Inference based on the bootstrap has proved to be asymptotically more accurate than methods based on the Gaussian assumption; it is therefore preferable to use the bootstrap both for model selection and for inference on the selected model. This involves no extra cost, because the observations generated by the bootstrap for model selection can be reused for inference. Unfortunately, the bootstrap approach does not solve one big drawback of model selection: with a small or moderate number of observations, models close to each other are hard to distinguish, since the information criteria tend to yield values that are close to each other as well. As a result, the MAICE procedure becomes unstable, in that a slight change in the data is likely to lead to the choice of a different model; in such a situation the selected model may produce forecasts with high variability (Yang and Zou, 2001).
5.3 Results and discussion

Our simulation study aims at giving empirical evidence on two issues: the asymptotic theory presented in Section 4.2, and the better performance offered by AIC* in comparison with its non-bootstrap counterpart. The improvement is mainly due to the ability of AIC* to counteract AIC's tendency, already emphasized, to overparametrize. In order to evaluate these empirical properties, a comparative study based on the frequency of selection of the true model, by both the classical MAICE procedure and our method, has been carried out via Monte Carlo experiments. These experiments have been conducted on sets of S = 100 simulated time series, all realizations of an ARMA process whose DGP has been kept fixed throughout the simulations, with true orders $p_0 = 2$ and $q_0 = 1$. The investigated parameter space, detailed in Table 1 and Table 3, is assumed to fulfill both the stationarity and the invertibility conditions for an ARMA model. Simulations have been carried out both for a constant sample size T and for increasing T (T = 100, 200, 500, 1000). Each simulated time series has been replicated B = 125 times, and each replication has been subjected to an exhaustive search up to ARMA order P = Q = 3 (16 models in total). We chose a "low" ARMA order both for the true DGP (p = 2, q = 1) and for the maximum order (P,Q) of the exhaustive search, mainly for computational reasons; the empirical study remains valid from a practical point of view, especially with reference, for instance, to economic time series (often studied as realizations of ARMA processes with orders p and q not greater than 3). The order of the sieve AR filter was determined according to the method described in Section 5.1 (step 2), and set to 6, 8, 10 and 13 respectively for the considered sample sizes. The whole simulation process was implemented in the software R (version 6.2) and performed using the hardware resources of the University of California San Diego (in particular the computer EULER, maintained by the Mathematics Department, and the supercomputer IBM-TERAGRID). The computational time was approximately 10 hours for moderate sample sizes (T = 100, 200), and up to 22 hours for T = 1000.
5.3.1 Empirical evidence

We first focus our attention on the frequency of selection of the true model achieved by AIC* and AIC as a function of the sample size. We chose different combinations of the ARMA parameters and, for each of them, generated a set of S = 500 time series. Table 1 reports the results of four of these simulations for the sample sizes T = 100, 200, 500, 1000. From this table two important conclusions can be drawn:

• the effectiveness of our method for small and moderate sample sizes;
• the closeness of the two methods' performances as the sample size increases.

          ϕ1     ϕ2     ϑ1      T=100        T=200        T=500        T=1000
                                AIC   AIC*   AIC   AIC*   AIC   AIC*   AIC   AIC*
model 1   -0.7   -0.7   -0.50   48    64     60.2  68.6   70.2  73.8   74.2  76.6
model 2   -0.6   -0.6   +0.50   44.5  58.2   58.2  65.6   68.0  73.5   72.7  76.5
model 3   +0.9   -0.6   -0.50   42.5  52.1   55.2  61.4   62.3  67.8   66.7  69.9
model 4   -0.6   -0.7   +0.50   46.5  67.1   62.4  69.7   67.8  74.6   73.9  75.1

Table 1: Frequency (in %) of selection of the right model for different sample sizes, achieved by the two methods.
The asymptotic equivalence of the two methods, shown in Section 4.2, is confirmed by our empirical results: for large values of T, only small discrepancies (on average 2.4%) were recorded. To give a better idea of these conclusions, Figure 1 displays the results of Table 1. In this simulation study we also investigated the distributions of the orders picked by the two procedures for two sample sizes, T = 100 and T = 1000. We found that, for the small sample size, the distributions of the chosen orders produced by the two methods exhibited noticeably different patterns, and wrong ARMA orders were selected (at the expense of the right one) quite often. Conversely, for the large sample size the two distributions show the same pattern and assign probability > 0 starting from the true order $(p_0, q_0)$ and covering all the orders with $p \ge p_0$. Such behavior seems to be an empirical extension of Shibata's theoretical conclusions (see Section 4.1) to the ARMA case: also here, the AR structure seems to drive the probability pattern across the tentative models.
[Figure 1: Performances of the method. Frequency (%) of selection of the true model by AIC and AIC* as a function of the sample size (T = 100, 200, 500, 1000), for models 1 to 4 of Table 1.]
For a graphical visualization of these conclusions, Figure 2 shows the distributions of the orders chosen by the two procedures on a set of S = 500 simulated time series (T = 100, 1000) from a process with parameters $\varphi_1 = -0.7$, $\varphi_2 = -0.6$, $\vartheta_1 = +0.5$. Due to the non-nested structure of ARMA models, the graph shows on the X-axis the tentative ARMA structures, from order (0,0) to order (3,3), according to the codification reported in Table 2 (under this convention the true model is number 10); as usual, the Y-axis measures the relative frequencies (in %). By inspecting Figure 2 it can be seen that, for T = 100, both lower and higher orders are selected, and the true model is chosen with probabilities p(AIC) = 0.41 and p(AIC*) = 0.55. Conversely, for T = 1000 these probabilities rise to 0.78 and 0.79 respectively, while models 1 to 9 show a zero probability of being selected.
ord. AR \ ord. MA   q=0        q=1        q=2        q=3
p=0                 1 (0,0)    2 (0,1)    3 (0,2)    4 (0,3)
p=1                 5 (1,0)    6 (1,1)    7 (1,2)    8 (1,3)
p=2                 9 (2,0)    10 (2,1)   11 (2,2)   12 (2,3)
p=3                 13 (3,0)   14 (3,1)   15 (3,2)   16 (3,3)

Table 2: Codification of the candidate models: the number of the model and, in brackets, the corresponding ARMA parametrization.
[Figure 2: Order distribution for T = 100 (left graph) and T = 1000 (right graph): relative frequencies of selection of the 16 candidate models of Table 2 by AIC and AIC*.]
Furthermore, as the candidate models increase their order starting from the true model, a non-zero probability of selecting models with a higher AR part (with respect to the true one) arises.

We carried out a final experiment, again dealing with the frequency of selection of the correct model; this time, however, we present the experimental results and discuss their significance focusing on the performance of the proposed method from a more practical point of view. The sample size has been kept fixed at T = 200, a choice based on the fact that, in many empirical applications, economic time series of this size are generally available. We simulated data from 56 different combinations of ARMA parameters (detailed in Table 3) and, for each of them, generated a set of S = 125 series. Regarding the choice of parameters, while different combinations of the autoregressive parameters have been tried, the MA(1) parameter takes only the values ±0.5, with sign opposite to that of the AR(1) parameter. Table 3 shows the frequency of selection of the true model achieved by AIC and AIC*, as well as their absolute difference. By inspecting this table, the quality of our identification procedure can be assessed: our method performs better than its non-bootstrap counterpart for every selected combination of parameters, with an average gain of around 10%. The method seems to amplify AIC's better performance in the case of more structured processes: when the MA parameter was increased to ±0.75, the selection frequencies of both methods increased as well, but with a proportion in favour of AIC*. Finally, Figure 3 gives a graphical representation of the figures shown in Table 3.

We would now like to give some insight into why our method offers such performance. As already pointed out, the MAICE procedure selects the best model on the basis of the minimum AIC over the set of candidate models. This implies that the whole procedure is based on just one realization of the minimum AIC. Such an approach embodies at least two sources of uncertainty: one related to the inference on the vector of ARMA parameters, and the other arising from the intrinsic structure of the data, which might lead to two or more candidate models showing AIC values very close to each other. According to the already mentioned theorem of Shibata and our conclusions, the minimum AIC is a random variable with a chi-squared-type distribution driven by the AR part, such that a considerable amount of selection probability is spread over the P-1 overestimated orders (with P the maximum order considered).
Basing the selection of the winning model on just one decision (one realization of the minimum AIC), as in the MAICE procedure, is likely to increase the probability of choosing the wrong order. Our B-MAICE procedure, on the other hand, by repeating the structure determination B times, generates a set of B independent decisions, so that the winning model is selected according to its most frequent occurrence. Such a frequency-based approach, by reducing the amount of uncertainty present in the standard MAICE procedure, increases the probability of selecting the "true" model.
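A toy numerical illustration of this point (ours, with made-up probabilities) treats each bootstrap decision as an independent draw from a multinomial over the candidate models: if a single MAICE decision picks the true model with probability 0.45, with the remaining mass split over several wrong orders, a majority vote over B such decisions picks it almost surely.

```r
# Toy illustration: majority vote over B independent selections versus a
# single selection, when one draw picks the true model with probability 0.45.
vote_prob <- function(p_true, p_wrong, B, nsim = 10000) {
  probs <- c(p_true, p_wrong)              # cell 1 is the true model
  mean(replicate(nsim, {
    counts <- rmultinom(1, B, probs)       # B independent model choices
    which.max(counts) == 1                 # does the true model win the vote?
  }))
}
vote_prob(0.45, rep(0.55 / 5, 5), B = 125)  # close to 1, against 0.45 for B = 1
```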
5.3.2 Conclusions

In this paper we proposed a new method, based on a bootstrap version of the Akaike Information Criterion (AIC) and denoted AIC*, for the order selection of causal ARMA models. We showed the asymptotic equivalence of this method with its non-bootstrap counterpart from a theoretical point of view. Monte Carlo evidence corroborated this point: simulations based on the frequency of selection of the true model by the standard MAICE procedure and by our B-MAICE method show that, as the sample size increases, the probabilities of choosing the right model become approximately equal. Finally, the better small-sample performance of our method was evaluated via Monte Carlo simulations: all the available results show that the proposed method chooses the correct order more frequently and gives lower order estimates. The latter feature is of interest in that it shows the capability of the method to counteract AIC's "natural" tendency to overestimate the true ARMA order.
[Table 3: Performance of the two methods for varying values of the ARMA parameters. For each of the 56 combinations of ($\varphi_1$, $\varphi_2$, $\vartheta_1$), the table reports the frequency (in %) of selection of the true model achieved by AIC and AIC*, together with their absolute difference; AIC* outperforms AIC for every combination, with gains ranging from 5.6% to 22.4%.]
[Figure 3: Performances of the method. Frequency of selection (%) of the true model for different combinations of the AR parameters, with MA parameter = -0.5 (black) and +0.5 (red).]
References

H. Akaike. A new look at the statistical model identification. IEEE Transactions on Automatic Control, AC-19:716-723, 1974.

H. Akaike. Statistical predictor identification. Annals of the Institute of Statistical Mathematics, 22:203-217, 1970.

H. Akaike. Fitting autoregressive models for prediction. Annals of the Institute of Statistical Mathematics, 21:243-247, 1969.

M. Allen and S. Datta. A note on bootstrapping M-estimators in ARMA models. Journal of Time Series Analysis, 20(4):365-379, 1999.

A. M. Alonso, D. Peña, and J. Romo. Introducing model uncertainty in time series bootstrap. Statistica Sinica, 14:155-174, 2004.

F. Battaglia. Metodi di previsione statistica. Springer-Verlag Italia, 2007.

F. Battaglia. Selection of a linear interpolator for time series. Statistica Sinica, 3:255-259, 1993.

J. Berkowitz and L. Kilian. Recent developments in bootstrapping time series. Econometric Reviews, 19(1):1-48, 2000.

R. J. Bhansali and D. Y. Downham. Some properties of the order of an autoregressive model selected by a generalization of Akaike's FPE criterion. Biometrika, 64(3):547-551, 1977.

G. E. P. Box and G. M. Jenkins. Time Series Analysis: Forecasting and Control. Holden-Day, San Francisco, 1970.

P. M. Broersen and S. de Waele. Finite sample properties of ARMA order selection. IEEE Transactions on Instrumentation and Measurement, 53(3):645-651, 2004.

S. T. Buckland, K. P. Burnham, and N. H. Augustin. Model selection: an integral part of inference. Biometrics, 53(2):603-618, 1997.

P. Buhlmann. Sieve bootstrap for time series. Bernoulli, 3:123-148, 1997.

E. Carlstein, K.-A. Do, P. Hall, T. Hesterberg, and H. R. Kunsch. Matched-block bootstrap for dependent data. Bernoulli, 4:305-328, 1998.

C. Chatfield. Model uncertainty and forecast accuracy. Journal of Forecasting, 15(7):495-508, 1996.

A. C. Davison and D. V. Hinkley. Bootstrap Methods and Their Application. Cambridge University Press, 1997.

B. Efron. Bootstrap methods: another look at the jackknife. The Annals of Statistics, 7(1):1-26, 1979.

B. Efron and R. Tibshirani. Bootstrap methods for standard errors, confidence intervals, and other measures of statistical accuracy. Statistical Science, 1:54-77, 1986.

J. Franke and M. Wendel. A bootstrap approach for nonlinear autoregressions: some preliminary results. In Bootstrapping and Related Techniques, Lecture Notes in Economics and Mathematical Systems 376. Springer-Verlag, Berlin, 1992.

E. J. Hannan and B. G. Quinn. The determination of the order of an autoregression. Journal of the Royal Statistical Society, Series B, 41:190-195, 1979.

C. M. Hurvich and C. Tsai. Regression and time series model selection in small samples. Biometrika, 76(2):297-307, 1989.

C. K. Ing and C. Z. Wei. Order selection for same-realization predictions in autoregressive processes. The Annals of Statistics, 33(5):2423-2474, 2005.

S. Konishi and G. Kitagawa. Generalised information criteria in model selection. Biometrika, 83(4):875-890, 1996.

S. G. Koreisha and T. Pukkila. A comparison between different order-determination criteria for identification of ARIMA models. Journal of Business & Economic Statistics, 13(1):127-131, 1995.

J. P. Kreiss. Bootstrap procedures for AR(∞) processes. Journal of Time Series Analysis, 13(4):297-317, 1992.

J. P. Kreiss and J. Franke. Bootstrapping stationary autoregressive moving-average models. Journal of Time Series Analysis, 13(4):297-317, 1992.

S. N. Neftci. Specification of economic time series models using Akaike's criterion. Journal of the American Statistical Association, 77(379), 1982.

T. Ozaki. On the order determination of ARIMA models. Applied Statistics, 26(3):290-301, 1977.

W. Pan. Bootstrapping likelihood for model selection with small samples. Journal of Computational and Graphical Statistics, 8(4):687-698, 1999.

D. N. Politis. Computer-intensive methods in statistical analysis. IEEE Signal Processing Magazine, 15(1):39-55, 1998.

D. N. Politis. Resampling time series with seasonal components. In Frontiers in Data Mining and Bioinformatics: Proceedings of the 33rd Symposium on the Interface of Computing Science and Statistics, Orange County, California, June 13-17, 2001, pages 619-621, 2001.

D. N. Politis and J. P. Romano. A circular block-resampling procedure for stationary data. In Exploring the Limits of Bootstrap (R. LePage and L. Billard, eds.), pages 263-270, 1992.

D. N. Politis and J. P. Romano. The stationary bootstrap. Journal of the American Statistical Association, 89(428):1303-1313, 1994.

D. N. Politis. The impact of bootstrap methods on time series analysis. Statistical Science, 18(2):219-230, 2003.

T. Pukkila, S. Koreisha, and A. Kallinen. The identification of ARMA models. Biometrika, 77(3):537-548, 1990.

J. Shao. Bootstrap model selection. Journal of the American Statistical Association, 91(434):655-665, 1996.

R. Shibata. Asymptotically efficient selection of the order of the model for estimating parameters of a linear process. The Annals of Statistics, 8(1), 1980.

R. Shibata. Bootstrap estimate of Kullback-Leibler information for model selection. Statistica Sinica, 7:375-394, 1997.

R. Shibata. Selection of the order of an autoregressive model by Akaike's information criterion. Biometrika, 63:117-126, 1976.

R. Shibata. Consistency of model selection and parameter estimation. Journal of Applied Probability, 23:127-141, 1986.

M. R. Veall. Bootstrapping the process of model selection: an econometric example. Journal of Applied Econometrics, 7:93-99, 1992.

A. M. Zoubir. Model selection: a bootstrap approach. In Proceedings of the 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '99), volume 3, pages 1377-1380, 1999.