COMMUN. STATIST.—THEORY METH., 30(12), 2499–2515 (2001)

MULTI-STAGE KERNEL-BASED CONDITIONAL QUANTILE PREDICTION IN TIME SERIES

Jan G. De Gooijer,1 Ali Gannoun,2 and Dawit Zerom3

1 Department of Economic Statistics, University of Amsterdam, Roetersstraat 11, 1018 WB Amsterdam, The Netherlands. E-mail: [email protected]
2 Laboratoire de Probabilités et Statistique, Université Montpellier II, Place Eugène Bataillon, 34095 Montpellier Cédex 5, France
3 Tinbergen Institute and Department of Economic Statistics, University of Amsterdam, Roetersstraat 11, 1018 WB Amsterdam, The Netherlands

ABSTRACT

We present a multi-stage conditional quantile predictor for time series of Markovian structure. It is proved that at any quantile level $p \in (0, 1)$, the asymptotic mean squared error (MSE) of the new predictor is smaller than that of the single-stage conditional quantile predictor. A simulation study confirms this result in a small sample situation. Because the improvement by the proposed predictor increases for quantiles at the tails of the conditional distribution function, the multi-stage predictor can be used to compute better predictive intervals with smaller variability. Applying this predictor to the changes in the U.S. short-term interest rate, rather smooth out-of-sample predictive intervals are obtained.


Key Words: Conditional quantile; Kernel; Markovian; Mean squared error; Multi-stage predictor; Single-stage predictor; Time series

1. INTRODUCTION

Over the past two decades there has been a growing interest in prediction methods that can accommodate a broad class of time series and apply when the usual assumptions of linearity and Gaussianity no longer hold. In this respect, nonparametric methods are of interest because they do not rely heavily on a priori time series assumptions and instead base statistical inference mainly on the data; see, e.g., Härdle et al. (1) for a review. While nonparametric prediction using the conditional mean has dominated the literature (see, e.g., Eubank (2) and Härdle (3)), attention has recently also focused on nonparametric estimation of other features of the conditional distribution. For example, Matzner-Løber et al. (4) and De Gooijer and Zerom (5) applied kernel estimates of the conditional median and the conditional mode to real time series.

In this paper the problem of quantile-based multi-step prediction is considered for Markovian-type time series processes. One unattractive feature of most nonparametric quantile prediction methods is that, when making more than one-step-ahead predictions, not all the information contained in the past is used. Thus a substantial loss in prediction accuracy is likely to occur. To deal with this shortcoming, we directly exploit the Markovian property of the time series. It turns out that one or more of the unused data can be easily incorporated in a recursive manner while improving prediction efficiency. Motivated by this recursion idea, we propose a multi-stage kernel smoother for conditional quantiles. We also show theoretically that the asymptotic performance of the new predictor is superior to the corresponding single-stage conditional quantile estimator in terms of mean squared error (MSE).

The remainder of the paper is structured as follows. In Section 2 we clarify the difference in information content used by the estimators of the single-stage and the multi-stage conditional quantiles. Section 3 contains the main result, stating that the estimator of the two-stage conditional quantile has a smaller asymptotic MSE than the estimator of the single-stage conditional quantile. An empirical comparison of the single-stage and multi-stage predictors of the conditional quantile via a simulation study is carried out in Section 4.


In Section 5 we evaluate the two prediction approaches in an application to the changes in the U.S. monthly interest rate series. Conditions and proofs are collected in the Appendix.

2. SINGLE-STAGE VERSUS MULTI-STAGE PREDICTION

Let $\{W_t;\, t \geq 1\}$ be a strictly stationary real-valued Markovian process of order $m$, i.e. $\mathcal{L}(W_t \mid W_{t-1}, \ldots, W_1) = \mathcal{L}(W_t \mid W_{t-1}, \ldots, W_{t-m})$, where $\mathcal{L}$ denotes the law. From the set of observations $W_1, \ldots, W_N$, we are interested in making predictions of $W_{N+H}$, where $H$ $(1 \leq H \leq N - m)$ denotes the prediction horizon. For that purpose, we construct the associated strictly stationary $\mathbb{R}^m \times \mathbb{R}$-valued process $(X_t, Z_t)$ defined by
$$X_t = (W_t, W_{t+1}, \ldots, W_{t+m-1}), \qquad Z_t = W_{t+H+m-1}, \qquad t \in \mathbb{N}. \qquad (2.1)$$

Let $\{(X_t, Z_t);\, t \geq 1\}$ be a sequence of $\mathbb{R}^m \times \mathbb{R}$-valued strictly stationary random variables with common probability density function with respect to the Lebesgue measure $\mu_{m+1}$ on $\mathbb{R}^{m+1}$. Further, we suppose that the conditional distribution function of $Z_t$ given $X_t = x$, $F(\cdot \mid x)$, has a unique quantile of order $p \in (0, 1)$ at a point $q(x)$, defined by
$$F(q(x) \mid x) = p. \qquad (2.2)$$

Now, given the observations $(X_1, Z_1), \ldots, (X_n, Z_n)$, where $n = N - H - m + 1$, an estimator $q_n(x)$ of $q(x)$ can be defined as the root of the equation $F_n(\cdot \mid x) = p$, where $F_n(\cdot \mid x)$ is an estimator of $F(\cdot \mid x)$. Thus a predictor of the $p$th conditional quantile of $W_{N+H}$ is given by $q_n(X_{N-m+1})$. Of course, in practice a nonparametric estimate of the conditional distribution function is needed. It is known that estimating $F(z \mid x)$, with $z \in \mathbb{R}$, can be regarded as a nonparametric regression problem, i.e. $F(z \mid x) = E(\mathbf{1}_{\{Z_t \leq z\}} \mid X_t = x)$. Accordingly, Collomb (6) defined the following empirical kernel-based estimate of $F(z \mid x)$,
$$\tilde{F}_n(z \mid x) = \frac{\sum_{t=1}^{n} K\{(x - X_t)/h_n\}\, \mathbf{1}_{\{Z_t \leq z\}}}{\sum_{t=1}^{n} K\{(x - X_t)/h_n\}}, \qquad (2.3)$$
where $\mathbf{1}_{\{A\}}$ denotes the indicator function for the set $A$, $K(\cdot)$ is a nonnegative density function (kernel), and $h_n$ a smoothing parameter called the bandwidth. Other kernel smoothers which are always a distribution function like (2.3), but with better bias characteristics, have also been proposed in the literature.
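To make the estimator concrete, here is a minimal Python sketch of (2.3) together with the root-finding step that turns the estimated conditional distribution function into a conditional quantile. It assumes a product Gaussian kernel and a simple grid search over the observed values of $Z_t$; the function names are illustrative and this is not the authors' own implementation.

```python
import numpy as np

def kernel_weights(x, X, h):
    """Product Gaussian kernel weights K((x - X_t)/h), one weight per observation."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:                         # scalar covariate (m = 1)
        X = X[:, None]
    u = (X - np.ravel(x)) / h
    return np.exp(-0.5 * np.sum(u * u, axis=1))

def single_stage_cdf(z, x, X, Z, h):
    """Kernel estimate F~_n(z | x) of P(Z_t <= z | X_t = x), cf. (2.3)."""
    w = kernel_weights(x, X, h)
    return np.sum(w * (np.asarray(Z) <= z)) / np.sum(w)

def single_stage_quantile(p, x, X, Z, h):
    """Single-stage p-th conditional quantile: smallest observed z with F~_n(z | x) >= p."""
    grid = np.sort(np.asarray(Z, dtype=float))
    cdf = np.array([single_stage_cdf(z, x, X, Z, h) for z in grid])
    return grid[np.searchsorted(cdf, p)]
```

Because $\tilde F_n(\cdot \mid x)$ is itself a distribution function (non-decreasing and between 0 and 1), the grid search always returns a well-defined root for any $p \in (0, 1)$.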


We shall refer to the solution of the equation
$$\tilde F_n(z \mid x) = p \qquad (2.4)$$
as the single-stage conditional quantile predictor and denote this by $\tilde q_n(x)$. Note that the conditional quantile predictor in (2.4) uses only the information in the pairs $(Z_t, X_t)$ $(t = 1, \ldots, n)$ and ignores the information contained in
$$Y_t^{(1)} = X_{t+1}, \quad Y_t^{(2)} = X_{t+2}, \quad \ldots, \quad Y_t^{(H-1)} = X_{t+(H-1)}. \qquad (2.5)$$

Below we illustrate the impact of the data contained in (2.5) on multi-step prediction accuracy. Let $g_1(y) = E(\mathbf{1}_{\{Z_t \leq z\}} \mid Y_t^{(H-1)} = y)$. For $j = 2, \ldots, H-1$, also define $g_j(y) = E(g_{j-1}(Y_t^{(H-(j-1))}) \mid Y_t^{(H-j)} = y)$. It is well known that for a pair of random variables $(B, C)$, $\mathrm{Var}(C) = E[\mathrm{Var}(C \mid B)] + \mathrm{Var}[E(C \mid B)]$. Hence, $\mathrm{Var}[g_j(Y_t^{(H-j)})] = \mathrm{Var}[E(g_j(Y_t^{(H-j)}) \mid Y_t^{(H-j-1)})] + E[\mathrm{Var}(g_j(Y_t^{(H-j)}) \mid Y_t^{(H-j-1)})]$. But, for $j = 1, \ldots, H-2$, we have $g_{j+1}(Y_t^{(H-j-1)}) = E(g_j(Y_t^{(H-j)}) \mid Y_t^{(H-j-1)})$. Thus
$$\mathrm{Var}\big[g_{j+1}(Y_t^{(H-j-1)})\big] \leq \mathrm{Var}\big[g_j(Y_t^{(H-j)})\big]. \qquad (2.6)$$
Similarly, it is also easy to see that
$$\mathrm{Var}\big(g_1(Y_t^{(H-1)}) \mid X_t = x\big) \leq \mathrm{Var}\big(\mathbf{1}_{\{Z_t \leq z\}} \mid X_t = x\big). \qquad (2.7)$$

Now, directly exploiting the Markovian property of $W_t$, we can rewrite $E(\mathbf{1}_{\{Z_t \leq z\}} \mid X_t = x)$ in such a way that the information in (2.5) is incorporated, i.e.
$$\begin{aligned} E\big(\mathbf{1}_{\{Z_t \leq z\}} \mid X_t = x\big) &= E\big(g_1(Y_t^{(H-1)}) \mid X_t = x\big) \\ &= E\big(g_2(Y_t^{(H-2)}) \mid X_t = x\big) \\ &\;\;\vdots \\ &= E\big(g_{H-1}(Y_t^{(1)}) \mid X_t = x\big). \end{aligned} \qquad (2.8)$$
Observe that as we go down each line in (2.8), more and more information is utilized. Recalling the two previous inequalities, (2.6) and (2.7), we can see that as more information is used, the prediction variance gets smaller and hence prediction accuracy, in terms of MSE, improves. Thus, at least in theory, it pays off to use all the ignored information.

Based on the above recursive set-up, we now introduce a kernel-based estimator of $F(z \mid x)$.


First the estimators of $g_1(y)$ and $g_j(y)$ $(j = 2, \ldots, H-1)$ are defined, respectively, as follows:
$$\text{Stage 1:} \quad \hat g_1(y) = \frac{\sum_{i=1}^{n} K\big\{(y - Y_i^{(H-1)})/h_{1,n}\big\}\, \mathbf{1}_{\{Z_i \leq z\}}}{\sum_{i=1}^{n} K\big\{(y - Y_i^{(H-1)})/h_{1,n}\big\}},$$
$$\text{Stage } j\text{:} \quad \hat g_j(y) = \frac{\sum_{s=1}^{n} K\big\{(y - Y_s^{(H-j)})/h_{j,n}\big\}\, \hat g_{j-1}\big(Y_s^{(H-(j-1))}\big)}{\sum_{s=1}^{n} K\big\{(y - Y_s^{(H-j)})/h_{j,n}\big\}}.$$
Then, using $\hat g_{H-1}(y)$, we compute $\hat F_n(z \mid x)$ by
$$\text{Stage } H\text{:} \quad \hat F_n(z \mid x) = \frac{\sum_{k=1}^{n} K\big\{(x - X_k)/h_{H,n}\big\}\, \hat g_{H-1}\big(Y_k^{(1)}\big)}{\sum_{k=1}^{n} K\big\{(x - X_k)/h_{H,n}\big\}}. \qquad (2.9)$$

We shall refer to the root of the equation $\hat F_n(z \mid x) = p$ as the multi-stage $p$-conditional quantile predictor $\hat q_n(x)$. The above procedure is easy to implement. A MATLAB code is available upon request from the authors.
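Although the authors' implementation is in MATLAB (available on request, as noted above), the recursion of Stages 1 through $H$ is short enough to sketch in Python. The sketch below does not reproduce that code; the Gaussian kernel, the helper names, and the grid-based root search are illustrative assumptions, and the lag matrices $Y^{(H-1)}, \ldots, Y^{(1)}$ of (2.5) are passed in explicitly.

```python
import numpy as np

def _as2d(A):
    A = np.asarray(A, dtype=float)
    return A[:, None] if A.ndim == 1 else A

def _nw(responses, regressors, points, h):
    """Nadaraya-Watson smooth of `responses` on `regressors` (product Gaussian
    kernel, bandwidth h), evaluated at each row of `points`."""
    R, P, y = _as2d(regressors), _as2d(points), np.asarray(responses, dtype=float)
    out = np.empty(len(P))
    for i, pt in enumerate(P):
        w = np.exp(-0.5 * np.sum(((R - pt) / h) ** 2, axis=1))
        out[i] = np.sum(w * y) / np.sum(w)
    return out

def multi_stage_cdf(z, x, X, Z, Y_list, bandwidths):
    """Multi-stage estimate F^_n(z | x) of (2.9).

    X          : (n, m) array of X_t
    Z          : (n,)   array of Z_t
    Y_list     : [Y^(H-1), Y^(H-2), ..., Y^(1)] as in (2.5), each (n, m)
    bandwidths : [h_{1,n}, ..., h_{H,n}], one per stage
    """
    g = np.asarray(np.asarray(Z) <= z, dtype=float)       # indicators entering Stage 1
    for Y, h in zip(Y_list, bandwidths[:-1]):              # Stages 1, ..., H-1
        g = _nw(g, Y, Y, h)                                # fitted values g^_j(Y_s^(H-j))
    return _nw(g, X, _as2d(np.ravel(x)).T, bandwidths[-1])[0]   # Stage H at the point x

def multi_stage_quantile(p, x, X, Z, Y_list, bandwidths):
    """Multi-stage predictor q^_n(x): smallest observed z with F^_n(z | x) >= p."""
    grid = np.sort(np.asarray(Z, dtype=float))
    cdf = np.array([multi_stage_cdf(z, x, X, Z, Y_list, bandwidths) for z in grid])
    return grid[np.searchsorted(cdf, p)]
```

Passing an empty list of lag matrices (and a single bandwidth) reduces the function to the single-stage estimator (2.3), which can serve as a quick sanity check of the recursion.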

3. ASYMPTOTIC MSEs

Before comparing the asymptotic MSE of the multi-stage conditional quantile smoother with the asymptotic MSE of the single-stage conditional quantile smoother, we first present some results on the asymptotic properties of conditional quantiles. To this end, we assume for simplicity of notation that $H = 2$, $m = 1$. From the process $(W_i)$, let us construct the associated process $(X_i, Y_i, Z_i)$ defined by
$$X_i = W_i, \qquad Y_i = Y_i^{(1)} = W_{i+1}, \qquad Z_i = W_{i+2}.$$

Assume that the $\mathbb{R} \times \mathbb{R} \times \mathbb{R}$-valued $(X_i, Y_i, Z_i)$ is a sequence of independent and strictly stationary random vectors with the same distribution as a vector $(X, Y, Z)$ defined on a probability space $(\Omega, \mathcal{F}, P)$. We suppose that the conditional distribution function $F(\cdot \mid x)$ of $Z$ given $X = x$, where $x \in \mathbb{R}$, admits a unique conditional quantile (of order $p \in (0, 1)$) at a point $q(x)$. We also suppose that the random variables $(X, Y)$ (respectively $(Y, Z)$) have a joint density $p_{X,Y}(\cdot, \cdot)$ (respectively $p_{Y,Z}(\cdot, \cdot)$). Let $p_X(\cdot)$, $p_Y(\cdot)$ and $p_Z(\cdot)$ be the marginal densities of $X$, $Y$, and $Z$, and let $p_{Z|X}(\cdot \mid x) = p_{X,Z}(x, \cdot)/p_X(x)$ be the conditional density function. Furthermore, we define
$$F^{(i,j)}(t \mid s) = \frac{\partial^{\,i+j} F(t \mid s)}{\partial s^i\, \partial t^j}, \qquad \text{and} \qquad p_X^{(1)}(x) = \frac{d\, p_X(x)}{dx}.$$


Given the above set-up, and using the assumptions given in the Appendix, the following Lemma and Theorems can be stated.

Lemma (Collomb (6); Hall et al. (7)): Let $(X, Z)$ be $\mathbb{R}^2$-valued random variables. For $z \in \mathbb{R}$, define $\sigma^2(z, x) = \mathrm{Var}(\mathbf{1}_{\{Z \leq z\}} \mid X = x)$. Assume that Assumptions (A.1)–(A.3) given in the Appendix are satisfied. If $nh_n \to \infty$, then
$$E\{\tilde F_n(z \mid x) - F(z \mid x)\}^2 = \frac{h_n^4}{4}\, D_1(z, x) + \frac{1}{nh_n}\, D_2(z, x) + o\Big(h_n^4 + \frac{1}{nh_n}\Big), \qquad (3.1)$$
where
$$D_1(z, x) = k_1^2 \left\{ F^{(2,0)}(z \mid x) + \frac{2 F^{(1,0)}(z \mid x)\, p_X^{(1)}(x)}{p_X(x)} \right\}^2, \qquad D_2(z, x) = \frac{k_2\, \sigma^2(z, x)}{p_X(x)},$$
and where $k_1$ and $k_2$ are constants defined respectively as $k_1 = \int_{\mathbb{R}} u^2 K(u)\, du$ and $k_2 = \int_{\mathbb{R}} K^2(u)\, du$.

Theorem 1: Assume that Assumptions (A.1)–(A.5) given in the Appendix are satisfied. If $nh_n \to \infty$ as $n \to \infty$, then for all $x \in \mathbb{R}$ where $p_{Z|X}(q(x) \mid x) \neq 0$, the asymptotic pointwise MSE of $\tilde q_n(x)$ is given by
$$E\{\tilde q_n(x) - q(x)\}^2 = \frac{1}{p_{Z|X}^2(q(x) \mid x)} \left( \frac{h_n^4}{4}\, D_1(q(x), x) + \frac{1}{nh_n}\, D_2(q(x), x) \right) + o\Big(h_n^4 + \frac{1}{nh_n}\Big). \qquad (3.2)$$
Furthermore, under the same assumptions mentioned above and if $D_1(q(x), x) \neq 0$, the asymptotically optimal value $h_n^*$, say, of $h_n$ minimizing (3.2) is given by
$$h_n^* = \left( \frac{D_2(q(x), x)}{D_1(q(x), x)} \right)^{1/5} n^{-1/5},$$
and the corresponding best possible MSE of $\tilde q_n(x)$ is given by
$$\mathrm{MSE}^*\{\tilde q_n(x)\} \simeq \frac{5\, n^{-4/5}}{4\, p_{Z|X}^2(q(x) \mid x)}\, D_2^{4/5}(q(x), x)\, D_1^{1/5}(q(x), x). \qquad (3.3)$$
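The form of $h_n^*$ follows from the usual bias-variance trade-off. For completeness, differentiating the bracketed leading terms of (3.2) with respect to $h_n$ (the common factor $1/p_{Z|X}^2(q(x) \mid x)$ does not affect the minimizer, and the arguments $(q(x), x)$ are suppressed) and setting the derivative to zero gives
$$\frac{\partial}{\partial h_n}\left\{ \frac{h_n^4}{4}\, D_1 + \frac{1}{n h_n}\, D_2 \right\} = h_n^3\, D_1 - \frac{D_2}{n h_n^2} = 0 \quad \Longrightarrow \quad h_n^5 = \frac{D_2}{n\, D_1},$$
which is $h_n^*$ above. Substituting $h_n^*$ back, the two leading terms become $\tfrac{1}{4} n^{-4/5} D_2^{4/5} D_1^{1/5}$ and $n^{-4/5} D_2^{4/5} D_1^{1/5}$; their sum produces the factor $5/4$ appearing in (3.3).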

Remark 1: Theorem 1 is a reformulation of Theorem 5.1 of Berlinet et al. (8), established for the double kernel smoothing estimator. A similar result can be found in Jones and Hall (9). Details of the proof are left to the reader.


The main result of the paper is stated as follows.

Theorem 2: Assume that Assumptions (A.1)–(A.5) given in the Appendix are satisfied, and that $nh_{1,n} \to \infty$ as $n \to \infty$ and $h_{1,n} = o(h_{2,n})$. For $z \in \mathbb{R}$, let $v_1(z, x) = \mathrm{Var}(g_1(Y) \mid x)$ ($g_1(Y)$ is as defined in Section 2). Then for all $x \in \mathbb{R}$ such that $p_{Z|X}(q(x) \mid x) \neq 0$, the asymptotic MSE of the two-stage estimator $\hat q_n(x)$ is given by
$$E\{\hat q_n(x) - q(x)\}^2 = \frac{1}{p_{Z|X}^2(q(x) \mid x)} \left( \frac{h_{2,n}^4}{4}\, D_1(q(x), x) + \frac{1}{nh_{2,n}}\, D_3(q(x), x) \right) + o\Big(h_{2,n}^4 + \frac{1}{nh_{2,n}}\Big),$$
where
$$D_3(q(x), x) = \frac{k_2\, v_1(q(x), x)}{p_X(x)}.$$
Furthermore, it follows that the asymptotically optimal value $h_{2,n}^*$, say, of $h_{2,n}$ is given by
$$h_{2,n}^* = \left( \frac{D_3(q(x), x)}{D_1(q(x), x)} \right)^{1/5} n^{-1/5},$$
and the corresponding best possible MSE is
$$\mathrm{MSE}^*\{\hat q_n(x)\} \simeq \frac{5\, n^{-4/5}}{4\, p_{Z|X}^2(q(x) \mid x)}\, D_3^{4/5}(q(x), x)\, D_1^{1/5}(q(x), x).$$

Corollary: Let $v_2(z, x) = E[\mathrm{Var}(\mathbf{1}_{\{Z \leq z\}} \mid Y) \mid x]$. Then under the assumptions of Theorems 1 and 2, the ratio of the asymptotic best possible MSEs of the single-stage estimator $\tilde q_n(x)$ and the two-stage estimator $\hat q_n(x)$ is given by
$$r(q(x), x) = \left( 1 + \frac{v_2(q(x), x)}{v_1(q(x), x)} \right)^{4/5} \geq 1.$$

Remark 2: Note that the asymptotic results are insensitive to the choice of the bandwidth $h_{1,n}$, provided $nh_{1,n} \to \infty$ and $h_{1,n} = o(h_{2,n})$.

Remark 3: It can be noticed from the proof of the Corollary that the asymptotic MSE of the two-stage smoother is smaller because $v_1(q(x), x) \leq \sigma^2(q(x), x)$; see also the discussion in Section 2.

Remark 4: It can easily be verified that $\sigma^2(q(x), x) = p(1-p)$. Thus we express the asymptotic ratio $r(q(x), x)$ as a function of $p$: $r(q(x), x) = \{p(1-p)/(p(1-p) - v_2(q(x), x))\}^{4/5}$. Note that $v_2(q(x), x) \leq p(1-p)$.

Figure 1. Ratio of asymptotic best possible MSEs (r) versus the quantile level p.

Figure 1 shows a plot of $r$ versus $p$ $(0.1 \leq p \leq 0.9)$ for, say, $v_2 = 0.05$. Clearly $r$ increases sharply as we go towards the edges of the conditional distribution. This illustrates theoretically that the improvement achieved by the multi-stage conditional quantile estimator is more pronounced for quantiles in the tails of the conditional distribution.

Remark 5: The asymptotic results in Theorem 2 and the Corollary can be shown to hold when the observations are dependent. Proofs can be obtained under some assumptions on mixing coefficients and by replacing classical inequalities by Bernstein inequalities for mixing processes; see also De Gooijer et al. (10).
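The curve in Figure 1 is easy to reproduce from the formula in Remark 4. The few lines of Python below, using the same illustrative value $v_2 = 0.05$ as in the figure, print the ratio on an arbitrary grid of quantile levels.

```python
import numpy as np

# Ratio of asymptotic best possible MSEs from Remark 4:
#   r(p) = {p(1-p) / (p(1-p) - v2)}^{4/5},  valid as long as v2 < p(1-p).
# v2 = 0.05 is the illustrative value used for Figure 1.
v2 = 0.05
p = np.linspace(0.1, 0.9, 9)
r = (p * (1.0 - p) / (p * (1.0 - p) - v2)) ** 0.8
for pi, ri in zip(p, r):
    print(f"p = {pi:.1f}  r = {ri:.2f}")
```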

4. PRACTICAL PERFORMANCE

We have shown that the multi-stage conditional quantile estimator $\hat q_n(x)$ has a better prediction performance than the single-stage conditional quantile estimator $\tilde q_n(x)$ in terms of asymptotic MSE. In this section a simulated example is used to illustrate the finite sample performance of the new predictor.

Note from Section 3 that the optimal bandwidth for both predictors depends on $p$. Thus the amount of smoothing required to estimate different parts of $F(\cdot \mid x)$ may differ from what is optimal for estimating the whole conditional distribution function. This is particularly the case for the tails of $F(\cdot \mid x)$. Therefore, a unique bandwidth is chosen for the computation of each $p$-conditional quantile. To this end, the following practical approach is employed.


First a primary bandwidth, suitable for conditional mean estimation, is selected. Then it is adjusted according to the following rule-of-thumb:
$$h_n = h_{\mathrm{mean}} \Big[ \{p(1-p)\} \big/ \{\phi(\Phi^{-1}(p))\}^2 \Big]^{1/(m+4)},$$
where $h_{\mathrm{mean}}$ is the optimal bandwidth for the conditional mean, and $\phi$ and $\Phi$ are the standard normal density and distribution functions, respectively. The above approach is appropriate for the single-stage predictor $\tilde q_n(x)$. However, for the multi-stage predictor several values of the bandwidth need to be selected. For simplicity, we fix the bandwidth at the last prediction stage, say $h_{H,n}$, at the optimal value $h_n^*$ of the single-stage estimator. The bandwidths in the intermediate stages are scaled downward arbitrarily vis-à-vis $h_{H,n}$. This is in accordance with the theory of Section 3. Different options such as $h_{H,n}$, $h_{H,n}/5$, $h_{H,n}/10$, and $h_{H,n}/20$ were tried, and the last three seem to give more or less similar results. Hence only results for $h_{H,n}/5$ are reported. The standard Gaussian kernel is used throughout all computations.

Consider the simple, Markovian-type, nonlinear autoregressive model of order 1,
$$Z_t = 0.23\, Z_{t-1}(16 - Z_{t-1}) + 0.4\, \varepsilon_t, \qquad (4.1)$$
where $\{\varepsilon_t\}$ is a sequence of iid random variables, each with the standard normal distribution truncated to the interval $[-12, 12]$. The objective is to estimate 2- and 5-steps-ahead $p$-conditional quantiles using both $\hat q_n(x)$ and $\tilde q_n(x)$ and to compare their prediction accuracy. Predictions will be made at $p = 0.25$, $p = 0.50$, and $p = 0.75$. The conditional density of $Z_{t+H}$ given $Z_t = x$ will be examined at $x = 6$, $x = 8$, and $x = 10$. Clearly a proper evaluation of the accuracy of $\hat q_n(x)$ and $\tilde q_n(x)$ requires knowledge about the "true" conditional quantile $q(x)$. This information is obtained by generating 10,000 independent realizations of $(Z_{t+H} \mid Z_t = x)$ ($H = 2$ and $5$) by iterating the process (4.1) and computing the appropriate quantiles from the empirical conditional distribution function of these generated observations.

From (4.1), 150 samples of sample size $n = 150$ were generated. Each replication had a unique seed. To compare the accuracy of the predictors $\hat q_n(x)$ and $\tilde q_n(x)$ with $q(x)$, the following error measures are computed for each replication $j$ $(j = 1, \ldots, 150)$:
$$e^j_{\tilde q_n(x)} = \frac{\{\tilde q_n^{\,j}(x) - q(x)\}^2}{q(x)^2} \quad \text{and} \quad e^j_{\hat q_n(x)} = \frac{\{\hat q_n^{\,j}(x) - q(x)\}^2}{q(x)^2}.$$
Then percentile values are computed from the empirical distributions of the 150 replication samples, i.e. from $e^j_{\tilde q_n(x)}$ and $e^j_{\hat q_n(x)}$.
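The Monte Carlo benchmark described above is straightforward to reproduce. The sketch below simulates model (4.1), approximates the "true" conditional quantile by iterating the process forward from $Z_t = x$, and applies the bandwidth rule-of-thumb; the rejection sampler for the truncated errors, the burn-in for the simulated paths, and the use of scipy for the normal density and quantile functions are implementation choices of this example, not taken from the paper.

```python
import numpy as np
from scipy.stats import norm

def trunc_normal(size, rng, a=-12.0, b=12.0):
    """Standard normal draws truncated to [a, b] (simple rejection sampling)."""
    out = rng.standard_normal(size)
    while ((out < a) | (out > b)).any():
        bad = (out < a) | (out > b)
        out[bad] = rng.standard_normal(bad.sum())
    return out

def simulate_41(n, rng, z0=8.0, burn=100):
    """One path of length n from model (4.1); z0 and the burn-in are arbitrary choices."""
    z = np.empty(n + burn)
    z[0] = z0
    eps = trunc_normal(n + burn, rng)
    for t in range(1, n + burn):
        z[t] = 0.23 * z[t - 1] * (16.0 - z[t - 1]) + 0.4 * eps[t]
    return z[burn:]

def true_quantile(p, x, H, rng, reps=10_000):
    """Monte Carlo 'true' p-th quantile of Z_{t+H} given Z_t = x, by iterating (4.1)."""
    z = np.full(reps, float(x))
    for _ in range(H):
        z = 0.23 * z * (16.0 - z) + 0.4 * trunc_normal(reps, rng)
    return np.quantile(z, p)

def quantile_bandwidth(h_mean, p, m=1):
    """Rule-of-thumb adjustment of a conditional-mean bandwidth to quantile level p."""
    return h_mean * (p * (1.0 - p) / norm.pdf(norm.ppf(p)) ** 2) ** (1.0 / (m + 4))

rng = np.random.default_rng(12345)                  # one seed per replication in the paper
path = simulate_41(150, rng)                        # a sample of size n = 150
q_true = true_quantile(0.75, x=8.0, H=2, rng=rng)   # benchmark for one (p, x, H) combination
```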


Figure 2. a)–c) Percentile plots of the empirical distribution of the squared errors for model (4.1) for the single-stage predictor $\tilde q_n(x)$ (medium dashed line) and the multi(=two)-stage predictor $\hat q_n(x)$ (solid line); d)–f) Box-plots corresponding to the percentile plots a)–c), respectively; $n = 150$, 150 replications.


The graphs a)–c) in Figure 2 show that the percentiles of the squared errors from the 2-stage and 5-stage predictions (solid line) lie overall below the corresponding percentiles of the squared errors from the single-stage predictions (medium dashed line). This implies that the conditional quantile predictions made by $\hat q_n(x)$ are more accurate than those made by $\tilde q_n(x)$. Now consider the box-plots d)–f) in Figure 2. It is clear from these plots that the multi-stage predictor has a much smaller variability, while its bias is nearly the same as that of the single-stage estimator. This confirms the theoretical result in Section 3. Similar box-plots were also obtained for other combinations of $H$, $p$, and $x$.

5. APPLICATION

Here we apply the multi-stage and single-stage conditional quantile predictors to obtain 6-step-ahead out-of-sample prediction intervals for the monthly U.S. short-term interest rate, i.e. the yield on U.S. Treasury Bills with three months to maturity. The time series contains 348 monthly observations from January 1966 to December 1994. The data were obtained from the Internet at the website www.bog.fed.us/releases/h15/data.htm. The first difference of the original series (after taking logarithms), denoted by $W_t$, will be used in our analysis, with a total of 347 observations. Our motivation for using this series has grown out of recent work on predicting weekly U.S. T-bill rates; see De Gooijer and Zerom (5).

Using the notation of Section 2, $X_t = W_t$ and $Z_t = W_{t+6}$, where $t = 1, \ldots, N - 6$, and $N$ is the index of the prediction base. In this example, a nominal coverage probability of 0.80 is considered, i.e. $[q_n(x, 0.1),\, q_n(x, 0.9)]$ where $x = X_N$. Note that $q_n(x, 0.1)$ and $q_n(x, 0.9)$ are respectively the 6-step-ahead 10th- and 90th-conditional quantiles. As in Section 4, we choose $h_{6,n}$ such that $h_{6,n} = h_n^*$, where $h_n^*$ is the optimal value of the single-stage estimator. In the intermediate stages, the theory requires the bandwidths to be smaller than $h_{6,n}$. In order to have some idea of how much smaller they should be, we compute in-sample 6-step-ahead predictions at various levels of undersmoothing (i.e., $h_{6,n}/5$, $h_{6,n}/10$, $h_{6,n}/15, \ldots, h_{6,n}/70$). By in-sample, it is meant that $x$ is contained in $X_t$. Among the various bandwidths considered, the choice $h_{1,n} = h_{6,n}/45$ and $h_{2,n} = h_{3,n} = h_{4,n} = h_{5,n} = h_{6,n}/35$ seems to yield multi-stage quantile estimates (see Figure 3) which are roughly the same as those of the single-stage predictor while being less noisy.

Now using the above set of bandwidths, we compute 6-step-ahead out-of-sample predictions standing on the last 42 observations, i.e. $W_{300}, \ldots, W_{341}$. For example, at $W_{300}$ we predict the 10th or 90th quantile of $W_{306}$ conditional on $x = X_{300}$. The respective average lengths of the intervals for the single-stage and multi-stage are 0.148 and 0.152 which, respectively, are 24% and 25% of the range of the data.
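For readers who want to reproduce this kind of exercise, the only fiddly part is building the arrays $(X_t, Z_t)$ of (2.1) and the intermediate lags $Y_t^{(j)}$ of (2.5) from the univariate series. The Python sketch below does this for general $m$ and $H$ (here $m = 1$, $H = 6$); the toy series stands in for the differenced log T-bill yields, since the actual data are not reproduced here, and the function name is illustrative.

```python
import numpy as np

def lag_design(W, m=1, H=6):
    """Build X_t, Z_t of (2.1) and the intermediate lags Y_t^(H-1), ..., Y_t^(1)
    of (2.5) from a univariate series W_1, ..., W_N, for Markov order m and
    horizon H.  Arrays are indexed by t = 1, ..., n with n = N - H - m + 1."""
    W = np.asarray(W, dtype=float)
    N = len(W)
    n = N - H - m + 1
    X = np.stack([W[t:t + m] for t in range(n)])       # X_t = (W_t, ..., W_{t+m-1})
    Z = W[H + m - 1:]                                   # Z_t = W_{t+H+m-1}, length n
    # Y_t^(j) = X_{t+j}, ordered [Y^(H-1), ..., Y^(1)] for the multi-stage recursion
    Y_list = [np.stack([W[t + j:t + j + m] for t in range(n)]) for j in range(H - 1, 0, -1)]
    return X, Z, Y_list

# Toy stand-in for the 347 differenced log T-bill observations (real data not shown here):
W = np.random.default_rng(1).standard_normal(347)
X, Z, Y_list = lag_design(W, m=1, H=6)
print(X.shape, Z.shape, len(Y_list))   # (341, 1) (341,) 5
```

The resulting arrays are the inputs expected by the multi-stage sketch given after (2.9), so an 80% prediction interval at a given base is simply the pair of roots at $p = 0.1$ and $p = 0.9$.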


Figure 3. In-sample 10th- and 90th-conditional quantile estimates of the single-stage (medium dashed line) and the multi-stage (solid line) predictors.

Figure 4. Out-of-sample 10th- and 90th-conditional quantile estimates of the single-stage (medium dashed line) and the multi-stage (solid line) predictors.

Thus both estimators perform comparably well in the sense that the intervals are not too wide. Only about 19% of the actual observations lie outside the intervals.

Figure 5. a) 42 out-of-sample single-stage CDFs; b) 42 out-of-sample multi-stage CDFs.

The average values of the upper and lower predictive intervals for the single-stage and multi-stage are [0.0719 (0.0125); 0.0759 (0.0175)] and [0.0755 (0.0038); 0.0765 (0.0095)], respectively, where the numbers in parentheses are the standard deviations. This indicates that the single-stage based confidence intervals are more erratic than those of the multi-stage approach. We can also observe this from Figure 4, which displays the 80% confidence intervals.

In the foregoing analysis we have used quantiles to construct the confidence intervals. But in situations where the predictive densities are asymmetric or multi-modal, the quantile-based intervals tend to be wider. Therefore, De Gooijer and Gannoun (11) suggested the use of more efficient predictive intervals which are based directly on the conditional distribution function (CDF). Fortunately, the multi-stage approach introduced in this paper is still useful here: we just have to employ the multi-stage CDF $\hat F_n(\cdot \mid x)$ instead of $\tilde F_n(\cdot \mid x)$. Figure 5 presents the 42 out-of-sample single- and multi-stage CDFs which correspond to the quantile values in Figure 4. While the general pattern of the CDFs from both estimators is the same, the multi-stage based CDFs are noticeably smoother.


6. APPENDIX: ASSUMPTIONS AND PROOFS

The asymptotic results will be derived under a set of assumptions gathered below for ease of reference.

(A.1) The kernel $K$ is a bounded, even, and strictly positive Hölderian function satisfying $\lim_{u \to \infty} uK(u) = 0$ and $\int_{-\infty}^{\infty} u^2 K(u)\, du < \infty$.
(A.2) The marginal density $p_X(x)$ of $X$ is bounded away from 0. Its first and second derivatives exist, and are bounded and integrable.
(A.3) The conditional density function $p_{Z|X}(\cdot \mid x)$ is continuous.
(A.4) The joint density $p_{X,Y}(x, y)$ is Hölder continuous in both $x$ and $y$.
(A.5) The functions $F(z \mid x) = E(\mathbf{1}_{\{Z \leq z\}} \mid X = x)$ and $F(z \mid y) = E(\mathbf{1}_{\{Z \leq z\}} \mid Y = y)$ are twice differentiable (with respect to $x$ and $y$) and the second derivatives are Hölder continuous, i.e. $|F^{(0,2)}(z \mid x_1) - F^{(0,2)}(z \mid x_2)| \leq c_2 |x_1 - x_2|^{\gamma_2}$ and $|F^{(0,2)}(z \mid y_1) - F^{(0,2)}(z \mid y_2)| \leq c_1 |y_1 - y_2|^{\gamma_1}$. In addition, $F(z \mid y)$ is Hölder continuous, i.e. $|F(z \mid y_1) - F(z \mid y_2)| \leq c_3 |y_1 - y_2|^{\gamma_3}$, where $c_1$, $c_2$, and $c_3$ are positive constants.

Some comments on the above assumptions are in order. Assumption (A.1) is quite usual in kernel estimation. A symmetric density with compact support satisfies this assumption. Assumption (A.2) is needed to prove the convergence of the multi-stage kernel smoother of the conditional quantile. Assumption (A.5) is used to ensure that the variance of $\hat F_n(z \mid x)$ exists and is finite.

Before we give the proof of Theorem 2, it is helpful to state the following general result. Let $\varphi$ be a continuously differentiable real function with value $p$ at a point $\theta$. Suppose that $\varphi$ and $\theta$ are unknown and that there exists an estimate $\varphi_n$ of $\varphi$ based on $n$ observations. If, at a point $\theta_n$, $\varphi_n(\theta_n) = p$, then it is natural to estimate $\theta$ by $\theta_n$. If we consider a sequence $\{\varphi_n\}$ of differentiable estimates for which asymptotic results are known, it is possible to obtain for the sequence $\{\theta_n\}$ asymptotic results (convergence, rate of convergence, asymptotic distribution) by using a Taylor series expansion of $\varphi(\theta)$:
$$\varphi(\theta) = \varphi_n(\theta_n) = p = \varphi(\theta_n) + (\theta - \theta_n)\, \varphi^{(1)}(\theta^*), \qquad (6.1)$$
where $\theta^*$ is between $\theta$ and $\theta_n$ and the superscript (1) denotes the first derivative. Note that (6.1) contains quantile and conditional quantile estimation problems as special cases. For ease of notation we replace $X = x$ by $x$ in the following proofs.

Proof of Theorem 2: We make use of the property $F(q(x) \mid x) = p = \hat F_n(\hat q_n(x) \mid x)$. A Taylor expansion of $\hat F_n(q(x) \mid x)$ about $\hat q_n(x)$ and various approximations (see, e.g., Lemma D on p. 97 of Serfling (12) or Lemma 4 of Cai (13)) give
$$F(q(x) \mid x) = p = \hat F_n(\hat q_n(x) \mid x) = \hat F_n(q(x) \mid x) + (\hat q_n(x) - q(x))\, p_{Z|X}(q^* \mid x),$$
where $q^*$ is some random point between $q(x)$ and $\hat q_n(x)$. By the Lemma and a slight modification of Theorem 1 of Chen (14) (replacing $Z$ by $\mathbf{1}_{\{Z \leq z\}}$), if $h_{1,n} = o(h_{2,n})$ the asymptotic bias of $\hat F_n(q(x) \mid x)$ is given by
$$F(q(x) \mid x) - E(\hat F_n(q(x) \mid x)) = \frac{h_{2,n}^2}{2}\, d_1(q(x), x) + o(h_{2,n}^2) + O\Big(\frac{1}{nh_{2,n}}\Big), \qquad (6.2)$$
where $d_1^2(q(x), x) = D_1(q(x), x)$. Further, the asymptotic variance of $\hat F_n(q(x) \mid x)$ is given by
$$\mathrm{Var}(\hat F_n(q(x) \mid x)) = \frac{1}{nh_{2,n}}\, k_2\, \frac{\mathrm{Var}(F(q(x) \mid Y) \mid x)}{p_X(x)} + o\Big(\frac{1}{nh_{2,n}}\Big) = \frac{1}{nh_{2,n}}\, D_3(q(x), x) + o\Big(\frac{1}{nh_{2,n}}\Big). \qquad (6.3)$$
Note that
$$D_3(q(x), x) = k_2\, \frac{\mathrm{Var}(F(q(x) \mid Y) \mid x)}{p_X(x)} = k_2\, \frac{v_1(q(x), x)}{p_X(x)}.$$
Thus, from (6.2) and (6.3) it follows directly that the asymptotic MSE of $\hat F_n(q(x) \mid x)$ is given by
$$E\{\hat F_n(q(x) \mid x) - F(q(x) \mid x)\}^2 = \frac{h_{2,n}^4}{4}\, D_1(q(x), x) + \frac{1}{nh_{2,n}}\, D_3(q(x), x) + o\Big(h_{2,n}^4 + \frac{1}{nh_{2,n}}\Big).$$


Now, by Assumptions (A.1), (A.2) and (A.3) and by the uniqueness of $q(x)$, we get that $\hat q_n(x)$ converges to $q(x)$ in probability. Then, by continuity of $p_{Z|X}(\cdot \mid x)$ and because $q^*(x)$ is between $q(x)$ and $\hat q_n(x)$, we have that $p_{Z|X}(q^*(x) \mid x) = p_{Z|X}(q(x) \mid x) + o_p(1)$; for the proof see, e.g., Theorem 1 of Yakowitz (15) and Theorem 1 of Samanta and Thavaneswaran (16). Then we deal with the ratio of random variables in a standard way, and obtain
$$E\{\hat q_n(x) - q(x)\}^2 = \frac{1}{p_{Z|X}^2(q(x) \mid x)} \left( \frac{h_{2,n}^4}{4}\, D_1(q(x), x) + \frac{1}{nh_{2,n}}\, D_3(q(x), x) \right) + o\Big(h_{2,n}^4 + \frac{1}{nh_{2,n}}\Big). \qquad (6.4)$$
The value of $h_{2,n}$ minimizing (6.4) is given by
$$h_{2,n}^* = \left( \frac{D_3(q(x), x)}{D_1(q(x), x)} \right)^{1/5} n^{-1/5}.$$
The corresponding best possible MSE of $\hat q_n(x)$ is given by
$$\mathrm{MSE}^*\{\hat q_n(x)\} \simeq \frac{5\, n^{-4/5}}{4\, p_{Z|X}^2(q(x) \mid x)}\, D_3^{4/5}(q(x), x)\, D_1^{1/5}(q(x), x).$$

Proof of the Corollary: By Theorem 1, the minimum value of the asymptotic MSE of $\tilde q_n(x)$ is given by
$$\mathrm{MSE}^*\{\tilde q_n(x)\} \simeq \frac{5\, n^{-4/5}}{4\, p_{Z|X}^2(q(x) \mid x)}\, D_2^{4/5}(q(x), x)\, D_1^{1/5}(q(x), x).$$
It is easy to see that $\sigma^2(q(x), x) = v_1(q(x), x) + v_2(q(x), x)$. Therefore, the ratio of the minimum asymptotic MSEs of the estimators $\tilde q_n(x)$ and $\hat q_n(x)$ is given by
$$r(q(x), x) = \left( 1 + \frac{v_2(q(x), x)}{v_1(q(x), x)} \right)^{4/5} \geq 1.$$

REFERENCES

1. Härdle, W.; Lütkepohl, H.; Chen, R. A review of nonparametric time series analysis. International Statistical Review, 1997, 65, 49–72.
2. Eubank, R.L. Spline Smoothing and Nonparametric Regression, Marcel Dekker: New York, 1988.


3. Härdle, W. Applied Nonparametric Regression, Cambridge University Press: Cambridge, 1990.
4. Matzner-Løber, E.; Gannoun, A.; De Gooijer, J.G. Nonparametric forecasting: a comparison of three kernel-based methods. Communications in Statistics—Theory and Methods, 1998, 27, 1593–1617.
5. De Gooijer, J.G.; Zerom, D. Kernel-based multistep-ahead predictions of the U.S. short-term interest rate. Journal of Forecasting, 2000, 19, 335–353.
6. Collomb, G. Propriétés de convergence presque complète du prédicteur à noyau. Z. Wahrscheinlichkeitstheorie verw. Gebiete, 1984, 66, 441–460.
7. Hall, P.; Wolff, R.C.L.; Yao, Q. Methods for estimating a conditional distribution function. J. Amer. Statist. Assoc., 1999, 94, 154–163.
8. Berlinet, A.; Gannoun, A.; Matzner-Løber, E. Asymptotic normality of convergent estimates of conditional quantiles. Statistics, 2001, 35, 139–169.
9. Jones, M.C.; Hall, P. Mean squared error properties of kernel estimates of regression quantiles. Statistics & Probability Letters, 1990, 10, 283–289.
10. De Gooijer, J.G.; Gannoun, A.; Zerom, D. Mean squared error properties of the kernel-based multi-stage median predictor for time series. Statistics & Probability Letters, 2001 (forthcoming).
11. De Gooijer, J.G.; Gannoun, A. Nonparametric conditional predictive regions for time series. Computational Statistics & Data Analysis, 2000, 33, 259–275.
12. Serfling, R.J. Approximation Theorems of Mathematical Statistics, Wiley: New York, 1980.
13. Cai, Z. Regression quantiles for time series. Econometric Theory, 2001 (forthcoming).
14. Chen, R. A nonparametric multi-step prediction estimator in Markovian structures. Statistica Sinica, 1996, 6, 603–615.
15. Yakowitz, S.J. Nonparametric density estimation, prediction, and regression for Markov sequences. J. Amer. Statist. Assoc., 1985, 80, 215–221.
16. Samanta, M.; Thavaneswaran, A. Non-parametric estimation of the conditional mode. Communications in Statistics—Theory and Methods, 1990, 19, 4515–4524.

Received August 2000; Revised August 2001
