Simple Bootstrap Tests for Conditional Moment Restrictions

Juan Carlos Escanciano¹
Indiana University

First version: March 1, 2008. This version: January 13, 2009.

Abstract: This paper proposes consistent tests for conditional moment restrictions of stationary time series. The proposed tests are based on continuous functionals of a projected integrated conditional moment process which acknowledges that deviations in the direction of the score function cannot be differentiated from deviations within the parametric model. As a result, the new tests are expected to have better power properties than existing tests, and they allow for a simple multiplier-type bootstrap approximation, thereby extending the scope of the wild bootstrap and related methods to general conditional moment restrictions, including quantile regressions, for which wild-bootstrap methods were not available. The new tests have the following remarkable properties: (i) they do not need a tuning parameter, other than the number of bootstrap replications; (ii) there is no need to re-estimate the unknown parameters in each bootstrap replication; (iii) the new tests are valid under higher-order conditional moments of unknown form; and (iv) they allow for non-separable and non-smooth moment functions and any root-n consistent estimator. A Monte Carlo experiment shows that the new method presents more accurate size and higher power than subsampling procedures. In an empirical application we study the dynamics in mean and variance of the Hong Kong stock market index, and we evaluate models for the Value-at-Risk of the S&P 500. These applications highlight the merits of our approach and show that the new methods have higher power than popular backtesting methods.

Key words and phrases: Conditional moment restrictions; Omnibus tests; Asset pricing; Value-at-Risk; Quantile regression; Wild bootstrap; Non-separable models; Backtesting.

JEL Classification Numbers: C12; C14; C32.

¹ Department of Economics, Indiana University; email: [email protected]. Research funded by the Spanish Plan Nacional de I+D+I, reference number SEJ2007-62908. I thank seminar participants at Penn State University.


1. Introduction

Economic theories in dynamic contexts often suggest Conditional Moment Restrictions (CMR) in the underlying economic variables. Rational expectations and asset pricing equations are prominent examples of these behavioral assumptions. Alternatively, exogeneity or other statistical assumptions can also lead to such CMR. A general framework for analyzing CMR is given by

E[h(Y_t; θ_0) | X_t] = 0 almost surely (a.s.), for a unique θ_0 ∈ Θ ⊂ R^p,   (1)

for a measurable, possibly non-smooth, moment function h : R^{d_y} × Θ → R^{d_h} that is assumed to be known up to the finite-dimensional parameter θ_0 ∈ Θ, with Θ a compact subset of R^p, p ∈ N. The vector-valued stochastic process {Z_t ≡ (Y_t′, X_t′)′}_{t∈Z} is a strictly stationary and ergodic time series defined on the probability space (Ω, F, P). Henceforth, |A| and A′ denote the Euclidean norm |A| = (tr(A′A))^{1/2} and the transpose of a matrix A, respectively. The conditioning variable X_t takes values in R^{d_x}, d_x < ∞, and it can contain lagged values of Y_t and other exogenous variables. Throughout the paper we shall assume that equation (1) identifies uniquely the parameter θ_0.

The following examples illustrate the general applicability of our methods. Further examples abound in the literature; see Chen and Fan (1999), Li (1999), Delgado et al. (2006), Anatolyev (2007) and references therein, to mention just a few. In these examples 𝒳_t denotes the agent's information set available at period t − 1, and X_t is a finite-dimensional subset of 𝒳_t.

Example 1: (Asset Pricing Models, Hansen and Singleton, 1982) The consumption-based capital asset market model studied in Hansen and Singleton (1982) leads to the CMR

E[h(Y_t; θ_0) | X_t] = 0,   (2)

with θ_0 = (β_0, γ_0) and h(Y_t; θ_0) an s × 1 vector with i-th component

h_i(Y_t; θ_0) = β_0 (1 + r_{i,t}) (c_t / c_{t−1})^{−γ_0} − 1,  1 ≤ i ≤ s,

where c_t denotes spending on consumption in period t, r_{i,t} is the net return of asset i, β_0 ∈ (0, 1) is a discount factor, γ_0 is the coefficient of risk aversion, and Y_t = (r_{1,t}, ..., r_{s,t}, c_t/c_{t−1})′. Hansen and Singleton (1982) proposed estimating θ_0 by GMM. Different specifications of the agent's preferences lead to different CMR; see e.g. Abel (1990) and Constantinides (1990) for extensions of this basic framework.
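As an illustration of how such moment functions are evaluated in practice, the following is a minimal sketch (in Python; the array names returns and cons_growth and the candidate values beta and gamma are our own, not part of the paper) of the s × 1 moment vector h(Y_t; θ) in this example:

```python
import numpy as np

def euler_moments(returns, cons_growth, beta, gamma):
    """Hansen-Singleton Euler-equation moments.

    returns:     (n, s) array of net asset returns r_{i,t}
    cons_growth: (n,) array of consumption growth ratios c_t / c_{t-1}
    Returns an (n, s) array whose t-th row is h(Y_t; theta), with
    h_i = beta * (1 + r_{i,t}) * (c_t/c_{t-1})^(-gamma) - 1.
    """
    mrs = beta * cons_growth ** (-gamma)         # marginal rate of substitution
    return mrs[:, None] * (1.0 + returns) - 1.0  # broadcast over the s assets
```

Under the null, each row is conditionally mean zero given X_t at the true (β_0, γ_0).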


Example 2: (Joint specification of conditional mean and variance models) Correct specification of volatility models plays an important role in financial time series. Although it is not standard practice, a joint specification of the conditional mean and variance is appropriate in this context. This correct specification corresponds to the CMR with moment functions

h(Y_t; θ_0) = [ Y_{1t} − f(X_t; θ_0) ;  (Y_{1t} − f(X_t; θ_0))² − σ²(X_t; θ_0) ],   (3)

where Y_t = (Y_{1,t}, X_t′)′, Y_{1t} is the dependent random variable, f(X_t; θ_0) is the conditional mean and σ²(X_t; θ_0) is the conditional variance.

Example 3: (Portfolio conditional mean-variance efficiency) There is an extensive literature on mean-variance analysis in finance and econometrics; see Sentana (2008) for a survey. Let r_{p,t} be the return on portfolio p in excess of the riskless rate and r_t = (r_{1,t}, ..., r_{q,t})′ be a q × 1 vector of excess returns on the other assets. The portfolio p is conditionally mean-variance efficient if and only if the following CMR holds:

E[{E[r²_{p,t} | X_t] − E[r_{p,t} | X_t] r_{p,t}} r_t | X_t] = 0.   (4)

In a parametric approach one considers parametric models for E[r_{p,t} | X_t] and E[r²_{p,t} | X_t], say γ_1(X_t; θ_0) and γ_2(X_t; θ_0), respectively, and proceeds with a joint test using the q + 2 moments

h(Y_t; θ_0) = [ r_{p,t} − γ_1(X_t; θ_0) ;  r²_{p,t} − γ_2(X_t; θ_0) ;  {γ_2(X_t; θ_0) − γ_1(X_t; θ_0) r_{p,t}} r_t ].

Nonparametric versions of (4) have been tested in Wang (2002, 2003) and Chen and Fan (1999). When the dimension of X_t is high, the resulting tests based on smoothers will be affected by the lack of precision of the nonparametric estimators. Alternative parametric approaches have been considered using GMM techniques, see e.g. Cochrane (1996), generally focusing just on the restriction (4). Joint tests can be constructed using our methods, leading to consistent and simple tests of conditional mean-variance efficiency.

Example 4: (Evaluating Value-at-Risk models) Value-at-Risk (VaR) has become the most popular measure of market risk used by financial institutions to determine the amount of capital to hold. In financial terms, the VaR is the maximum loss on a trading portfolio over a period of time, given a confidence level. In statistical terms, the VaR is a quantile of the conditional distribution of the returns on the portfolio given the agent's information set. Denote the real-valued time series of portfolio returns by Y_{1,t}. Assuming that the conditional distribution

of Y_{1,t} given X_t is continuous, we define the α-th conditional VaR m_α(X_t; θ_0) of Y_{1,t} given X_t by the CMR

P(Y_{1,t} ≤ m_α(X_t; θ_0) | X_t) = α,  a.s.,

where θ_0 ∈ Θ ⊂ R^p is an unknown parameter and α ∈ (0, 1) is the VaR level, usually α = 0.01. Therefore, the correct specification of the VaR model implies the CMR

E[h(Y_t; θ_0) | X_t] = 0,   (5)

where h(Y_t; θ) = 1(Y_{1,t} ≤ m_α(X_t; θ)) − α, with 1(A) the indicator function of the event A, i.e. 1(A) = 1 if A occurs and zero otherwise, and Y_t = (Y_{1,t}, X_t′)′. The parameter θ_0 can be estimated by the Quantile Regression Estimator (QRE), see Koenker and Bassett (1978).

Example 5: (Evaluating Expected Shortfall models) Despite its universality, several authors have pointed out the deficiencies of VaR, including its lack of subadditivity; see Artzner et al. (1997, 1999). These authors proposed the Expected Shortfall (ES) as an alternative to VaR. We can specify a parametric conditional ES as

S_α(X_t; θ) = E[Y_{1,t} | Y_{1,t} ≤ m_α(X_t; θ), X_t].

The parametric model will be correctly specified if and only if

E[{Y_{1,t} 1(Y_{1,t} ≤ m_α(X_t; θ_0)) − α S_α(X_t; θ_0)} | X_t] = 0  a.s.   (6)
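To fix ideas, the moment functions in (5) and (6) are immediate to evaluate once fitted VaR and ES paths are available. Here is a minimal sketch; the arrays var_fit and es_fit are hypothetical placeholders for the fitted models:

```python
import numpy as np

def var_es_moments(returns, var_fit, es_fit, alpha):
    """Moment functions of Examples 4 and 5.

    returns: (n,) portfolio returns Y_{1,t}
    var_fit: (n,) fitted conditional VaR m_alpha(X_t; theta_n)
    es_fit:  (n,) fitted conditional ES  S_alpha(X_t; theta_n)
    Returns the VaR moment (5) and the ES moment (6) as (n,) arrays.
    """
    hits = (returns <= var_fit).astype(float)
    h_var = hits - alpha                      # 1(Y <= m_alpha) - alpha
    h_es = returns * hits - alpha * es_fit    # Y 1(Y <= m_alpha) - alpha S_alpha
    return h_var, h_es
```

Under correct specification both sequences are conditionally mean zero given X_t; the tests below check exactly this against all measurable alternatives.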

To the best of our knowledge, no consistent test has been proposed for testing (6).

Hypotheses such as (1) have traditionally been tested with the so-called Moment (M) and related specification tests (cf. Newey (1985) and Tauchen (1985)); see e.g. the J-test of Hansen (1982) for (2) and (4), Wooldridge's (1990) test for (3), or the backtesting techniques in Engle and Manganelli (2004) for (5). It is well known, however, that M-tests and J-tests are inconsistent for testing CMR. Bierens (1982) proposed the Integrated Conditional Moment (ICM) test as a solution to this problem in a regression context; see also Stute (1997) for a related approach. A tenet of ICM tests and other omnibus tests is that the asymptotic null distributions depend in a complicated way on the data generating process (DGP), the model and the unknown parameter θ_0, making the computation of asymptotic critical values difficult. This problem has hampered the application of these consistent tests in applied econometric work. Several solutions have been proposed to overcome the practical problem of computing the critical values in some specific cases, with special focus on regression models. Existing methods are, however, computationally expensive to implement and usually require the choice of tuning parameters (e.g. subsampling size, block size, etc.). Therefore, the researcher is often presented with the following dilemma: either choose an inconsistent test (e.g. an M-test) with a simple asymptotic approximation, or choose a consistent test whose asymptotic distribution is difficult to approximate and whose approximation depends substantially on some tuning parameter, without much guidance for its choice.

This paper proposes new tests for the general CMR described in (1) against nonparametric alternatives of unknown form, which are straightforward to implement. The new tests are motivated by power considerations. They are based on continuous functionals of a projected integrated conditional moment process, which acknowledges that deviations in the direction of the score function cannot be differentiated from deviations within the parametric model. Existing tests do not take this information into account, and hence are expected to have less power than our tests². As a by-product, the new tests allow for a simple multiplier-type bootstrap approximation, thereby extending the scope of the wild bootstrap to general CMR, including cases with non-smooth moments such as quantile regression, and overcoming the practical limitations of the existing procedures. Our construction leads to powerful tests for (1) where no tuning parameter is necessary, other than the number of bootstrap replications. The new tests allow for any √n-consistent estimator, and do not require the estimation of the asymptotic distribution of the estimator, nor re-estimating the parameters in each bootstrap replication. Therefore, the proposed tests are easy to implement in examples where estimators (or their asymptotic variances) are difficult to obtain, as in e.g. nonlinear or quantile models. Finally, our tests allow for non-separable and non-smooth moments, and hence have general applicability.

The rest of the paper is organized as follows. In Section 2 we introduce some notation and the test statistics. Section 3 formally establishes the asymptotic null distribution of the tests and compares the power of the new tests with classical tests. Section 4 introduces the multiplier-type bootstrap approximation, proves its consistency and establishes connections with existing methods. Section 5 provides some Monte Carlo experiments, and Section 6 considers an application to the Hong Kong stock index and the S&P 500, comparing the new tests with subsampling-based tests. Extensions to nonparametric (infinite-dimensional) scores are discussed in Section 7. Finally, in Section 8, we conclude and discuss future research. Mathematical proofs of our results are gathered in the Appendix.

² An exception is Wooldridge (1990). However, it should be noted that Wooldridge's motivation was different from ours: his motivation was to construct tests robust to departures from distributional assumptions that were not being tested.


2. Testing Problem and Test Statistics

Following the integrated approach for testing CMR (see e.g. Bierens (1982), Stute (1997), Escanciano (2007a)), we characterize the CMR by an infinite number of unconditional moment restrictions. More specifically, the law of iterated expectations yields

H(x; θ_0) := E[h(Y_t; θ_0) 1(X_t ≤ x)] = ∫_{(−∞, x]} m(τ; θ_0) dP_X(τ),   (7)

where x ∈ R^{d_x},

m(x; θ) := E[h(Y_t; θ) | X_t = x],  θ ∈ Θ,   (8)

and P_X is the stationary marginal probability measure of X_t. Therefore, from (7) and Billingsley (1995, Theorem 16.10iii), we have³

m(x; θ_0) ≡ 0  P_X-a.s.  ⟺  H(x; θ_0) ≡ 0  for all x ∈ R^{d_x}.

The function H(x; θ_0) is called the integrated CMR, after (7). We aim to test the null hypothesis

H_0 : E[h(Y_t; θ_0) 1(X_t ≤ x)] = 0, for all x ∈ R^{d_x} and some θ_0 ∈ Θ ⊂ R^p,

against the class of nonparametric alternatives

H_A : P_X(x : E[h(Y_t; θ) 1(X_t ≤ x)] ≠ 0) > 0, for all θ ∈ Θ ⊂ R^p.

In view of this characterization of the null hypothesis, it is natural to reject H_0 for large values of the marked empirical process

R_n(x; θ) = n^{−1/2} Σ_{t=1}^n h(Y_t; θ) 1(X_t ≤ x).   (9)

There is a long history of the use of processes like R_n(x; θ) for inference in econometrics and statistics. We do not aim to review this extensive literature here; we rather refer the reader to Bierens (1982), Stute (1997) and Koul and Stute (1999) for important contributions in the context of regression models, and to Escanciano (2007a) for a unified treatment. In a time series context, general CMR with smooth moment functions were studied in Chen and Fan (1999). For a thorough comparison with some existing methods see Section 4. Alternatively, consistent tests for CMR can also be based on smoothed versions of R_n; see e.g. Härdle and Mammen (1993). This literature is too extensive to be summarized here; see Hart (1997) for a review of the smoothing approach when d_x = 1, and Li (1999), Horowitz and Spokoiny (2001) and Guerre and Lavergne (2005) for some recent references. The use of smoothers leads to an ill-posed testing problem where some kind of regularization (i.e. a bandwidth) is necessary.

³ Our methods can be similarly applied to weighting functions other than the indicator family 1(X_t ≤ x); see Bierens and Ploberger (1997), Stinchcombe and White (1998) and Escanciano (2007a) for examples.

Existing tests for CMR do not acknowledge that θ_0 is a nuisance parameter in the construction of the test. The situation here is similar in spirit to the one addressed in Neyman (1959). Since θ_0 ∈ Θ is unknown, deviations in the direction of the score function cannot be differentiated from deviations within the parametric model, i.e. from local deviations of θ_0. A simple way to incorporate this information in the test statistic is to construct a test that does not waste power in the direction of the score. More precisely, instead of R_n(x; θ_n) we consider a feasible version of the projected marked empirical process

R_n^1(x; θ_n) = n^{−1/2} Σ_{t=1}^n {1(X_t ≤ x) I_h − G′(x; θ_n) Δ^{−1} g(X_t; θ_n)} h(Y_t; θ_n),

where θ_n is a √n-consistent estimator of θ_0, g(X_t; θ) := (∂/∂θ′) m(X_t; θ) is the p × d_h matrix of scores, I_h is the identity matrix of order d_h, G(x; θ) := E[g(X_t; θ) 1(X_t ≤ x)] and Δ := E[g(X_t; θ_0) g′(X_t; θ_0)]. We stress that only m(X_t; θ) is required to be differentiable in θ, not h(Y_t; θ). Unlike R_n, the process R_n^1 satisfies the following convenient property (under some regularity conditions, cf. Theorem 1):

sup_{x∈R^{d_x}} |R_n^1(x; θ_n) − R_n^1(x; θ_0)| = o_P(1).   (10)

Property (10) is the key to the simple bootstrap approximation developed later. The process R_n^1 is not the only one satisfying this property; any test statistic of the form n^{−1/2} Σ_{t=1}^n a(X_t) h(Y_t; θ_n), with a(X_t) orthogonal to the score g(X_t; θ_0), i.e. E[a(X_t) g(X_t; θ_0)] = 0, satisfies (10). A prominent example satisfying the orthogonality property is Khmaladze's (1981) transformation. Our choice of R_n^1 is motivated by power arguments and the simplicity of its construction; see Section 3 for further elaboration on these points. In particular, we shall prove that R_n^1, or a simple variation of R_n^1, has higher power than tests based on the classical marked process R_n or tests based on Khmaladze's (1981) transformation.


Our test statistics will be continuous functionals of the feasible projected process

R̂_n^1(x) = n^{−1/2} Σ_{t=1}^n {1(X_t ≤ x) I_h − G_n′(x; θ_n) Δ_n^{−1} g(X_t; θ_n)} h(Y_t; θ_n),   (11)

where G_n(x; θ) = n^{−1} Σ_{t=1}^n g(X_t; θ) 1(X_t ≤ x) and Δ_n = n^{−1} Σ_{t=1}^n g(X_t; θ_n) g′(X_t; θ_n). An implicit assumption throughout this section is that the score g(X_t; θ) is known up to the finite-dimensional parameter θ_0; see Section 7 for an extension of our results to the case of an infinite-dimensional unknown score g(X_t; θ).

A popular example of functional is the Cramér-von Mises-type (CvM) test statistic

CvM_n := ∫_{R^{d_x}} |R̂_n^1(x)|² dF_{n,X}(x),   (12)

where F_{n,X} is the empirical distribution function of {X_t}_{t=1}^n.
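For concreteness, the following is a minimal, self-contained sketch of (11)-(12) in Python, under simplifying assumptions: d_h = 1 (a single moment, e.g. a residual), the arrays h, g and X are taken as given, and the integral in (12) is evaluated with respect to F_{n,X}, i.e. at the sample points:

```python
import numpy as np

def projected_cvm(h, g, X):
    """Sketch of the projected process (11) and CvM statistic (12) for d_h = 1.

    h: (n,)   moment evaluations h(Y_t; theta_n), e.g. residuals
    g: (n, p) score evaluations, row t holds g(X_t; theta_n)'
    X: (n, d) conditioning variables
    """
    n = X.shape[0]
    # indicator marks: ind[t, s] = 1(X_t <= X_s), componentwise
    ind = np.all(X[:, None, :] <= X[None, :, :], axis=2).astype(float)
    Delta = g.T @ g / n                    # Delta_n = n^{-1} sum_t g_t g_t'
    G = g.T @ ind / n                      # column s holds G_n(X_s; theta_n)
    proj = g @ np.linalg.solve(Delta, G)   # proj[t, s] = g_t' Delta_n^{-1} G_n(X_s)
    R1 = (ind - proj).T @ h / np.sqrt(n)   # projected process at each x = X_s
    return np.mean(R1 ** 2)                # CvM_n: integral wrt empirical F_{n,X}
```

This evaluation at the sample points mirrors the closed-form expression (20) used in the simulations of Section 5.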

For the computation of (12) in regression and quantile regression models see Section 5. We choose CvM_n over other alternative functionals for the simplicity of its computation.

In spite of (10), we shall prove that, as with R_n, the limit distribution of functionals of R̂_n^1, such as CvM_n, will still depend on the DGP and the model, although, unlike with R_n, not on the estimator θ_n. The latter advantage suffices to allow for a simple bootstrap approximation in Section 4.

3. Asymptotic Theory

3.1 Asymptotic Null Distribution

This section introduces sufficient conditions to establish the asymptotic null distribution of R̂_n^1. Let F_t := σ(Y_{t−1}, X_t, ...) be the σ-field generated by the information set obtained up to time t. Let F(·) be the joint cumulative distribution function (cdf) of (Y_t, X_t), and let F_Y(·) and F_X(·) be the marginal cumulative distribution functions, respectively. By defining R̂_n^1(−∞) = R̂_n^1(+∞) = 0, the sample paths of R̂_n^1 belong to the space ℓ^∞(R̄^{d_x}), the space of all uniformly bounded real functions on R̄^{d_x} := [−∞, ∞]^{d_x}, equipped with the sup norm. Test statistics are continuous (in the sup norm) functionals of R̂_n^1, say φ(R̂_n^1), for some suitable continuous functional φ : ℓ^∞(R̄^{d_x}) → R^+ ≡ [0, ∞). Once a Functional Central Limit Theorem for R̂_n^1 is provided, the limiting distribution of φ(R̂_n^1) under H_0 is obtained by applying the Continuous Mapping Theorem.

In this article we consider convergence in distribution of empirical processes in the metric space ℓ^∞(R̄^{d_x}) with the sup norm in the sense of J. Hoffmann-Jørgensen (see, e.g., Dudley, 1999, p. 94). The symbol ⟹ denotes weak convergence in ℓ^∞(R̄^{d_x}) with the corresponding Borel σ-field. Measurability issues are resolved by standard arguments based on outer probabilities and are not further discussed in this paper; see van der Vaart and Wellner (1996). See also Andrews (1994) for earlier applications of empirical process theory in econometrics.

Our first result proves the asymptotic uniform equivalence between R̂_n^1 and R_n^1(·; θ_0). To this end, we need the following assumptions. Let Θ_0 be an arbitrarily small neighborhood around θ_0. For x ∈ R^{d_x}, we define the random function

G_t(x) := E[ E[sup_{θ∈Θ_0} |h(Y_t; θ)|² | X_t] 1(X_t ≤ x) | F_{t−1} ].

Recall m(x; θ) := E[h(Y_t; θ) | X_t = x] and Δ = E[g(X_t; θ_0) g′(X_t; θ_0)]. The following regularity condition is necessary for the subsequent asymptotic analysis.

Assumption A1:
A1(a): E[sup_{θ∈Θ_0} |h(Y_t; θ)|²] < ∞. The cdf F_X is absolutely continuous. Under H_0, E[h(Y_t; θ) | F_t] = E[h(Y_t; θ) | X_t] a.s. in an arbitrary neighborhood of θ_0.
A1(b): {Z_t}_{t∈Z} is a strictly stationary and ergodic process.
A1(c): The function m(x; θ) is once continuously differentiable in θ ∈ Θ_0 with derivative g(x; θ) satisfying E[sup_{θ∈Θ_0} |g(X_t; θ)|²] < ∞. Furthermore, E[g(X_t; θ) g′(X_t; θ)] is positive definite in Θ_0 and E[|g(X_t; θ_0) h(Y_t; θ_0)|²] < ∞.
A1(d): |G_t(x_1) − G_t(x_2)| ≤ C_t |x_1 − x_2|^{s_1} for each (x_1, x_2) ∈ R^{d_x} × R^{d_x} and some s_1 > 0. Moreover, for some s > 0, all sufficiently small δ > 0 and with ğ(x; θ) = I_h or ğ(x; θ) = g(x; θ),

E[ sup_{|θ_1−θ_2|≤δ; θ_1,θ_2∈Θ_0} |ğ(X_t; θ_1) h(Y_t; θ_1) − ğ(X_t; θ_2) h(Y_t; θ_2)|² | F_{t−1} ] ≤ C_t δ^s  a.s.,

for a generic stationary sequence C_t with E|C_t| < ∞.

Assumption A2: The parameter space Θ is compact in R^p. The parameter θ_0 belongs to the interior of Θ, and √n(θ_n − θ_0) = O_P(1).

Assumptions A1-A2 are considerably weaker than related conditions in the literature on CMR. For instance, Dominguez and Lobato (2004) require the existence of at least fourth moments and smoothness of h(Y_t; θ) in θ; we only require second moments and the conditional Lipschitz moment conditions in A1(d). The Markov-type assumption in A1(a) is needed to apply martingale theory and for the score g(X_t; θ) to depend just on X_t and not on the whole history F_t. This local Markov assumption can be relaxed at the cost of longer proofs. Note that we do not need mixing assumptions in A1. In A2 we require √n(θ_n − θ_0) = O_P(1). Without regularity conditions further to those of Theorem 1 below, the minimum-distance estimator (cf. Wolfowitz, 1957)

θ_{n,CGMM} := arg min_{θ∈Θ} ∫_{R^{d_x}} |R_n(x; θ)|² dF_{n,X}(x)

satisfies A2. Indeed, the estimator θ_{n,CGMM} is a generalization of that proposed in Dominguez and Lobato (2004), allowing for non-smooth moment functions. A direct consequence of our results is that, under A1-A2,

√n(θ_{n,CGMM} − θ_0) = Δ_{GG′}^{−1} Δ_{GR_∞} + o_P(1),

with Δ_{GG′} = ∫ G(x; θ_0) G′(x; θ_0) dF_X(x), Δ_{GR_∞} = ∫ G(x; θ_0) R_∞(x; θ_0) dF_X(x), and where R_∞(·; θ_0) is a Gaussian process with mean zero and covariance function

K(x, y) := E[h(Y_t; θ_0) h′(Y_t; θ_0) 1(X_t ≤ x ∧ y)],

with x ∧ y the coordinate-wise minimum of x and y. This estimator can be used in our Example 1 on asset pricing to avoid possible identification problems with GMM, which can lead to inconsistency of the GMM estimator and of the resulting inferences based on R̂_n^1 (cf. Dominguez and Lobato, 2004).

We are now in a position to state the first important result of the paper.

Theorem 1: Under Assumptions A1-A2,

sup_{x∈R^{d_x}} |R̂_n^1(x) − R_n^1(x; θ_0)| = o_P(1).

We remark that the uniform representation in Theorem 1 holds under both the null and the alternative hypotheses. A consequence of Theorem 1 is the weak convergence of R̂_n^1. To simplify the notation, define 1^⊥(X_t ≤ x) := {1(X_t ≤ x) I_h − G′(x; θ_0) Δ^{−1} g(X_t; θ_0)}.

Corollary 1: Under Assumptions A1-A2, if (1) holds, then

R̂_n^1 ⟹ R_∞^1,

where R_∞^1 is a Gaussian process with mean zero and covariance function

K^1(x, y) := E[1^⊥(X_t ≤ x) h(Y_t; θ_0) h′(Y_t; θ_0) (1^⊥(X_t ≤ y))′].

As a result of Corollary 1, the Continuous Mapping Theorem and the Glivenko-Cantelli Theorem, the asymptotic distribution of CvM_n is

CvM_n ⟹ CvM_∞ := ∫_{R^{d_x}} |R_∞^1(x)|² dF_X(x).

We proceed to study the power properties of tests based on R̂_n^1 in the next section.

3.2 Asymptotic Power Properties

This section discusses both global and local power properties of R̂_n^1. Consider the local (Pitman) alternatives

H_{A,n}(a) : E[h(Y_t; θ_0) | X_t] = a(X_t)/√n  a.s.,

where a(X_t) is a non-zero P_X-integrable function such that

E[g(X_t; θ_0) a(X_t)] = 0.   (13)

Note that condition (13) does not entail any loss of generality; see Escanciano (2008b). A consequence of Theorem 1 is that, under the local alternatives H_{A,n}(a),

R̂_n^1 ⟹ R_∞^1 + D_a,   (14)

where R_∞^1 is defined in Corollary 1 and D_a(x) := E[1(X_t ≤ x) a(X_t)]. Several conclusions emerge from the convergence in (14). First, our tests are able to detect any Pitman local alternative satisfying (13). Second, suppose one entertains, instead of R̂_n^1, a classical test based on the marked process R_n(·; θ_n). Under the standard assumption that θ_n satisfies an asymptotic Bahadur expansion under H_{A,n}(a) such as

√n(θ_n − θ_0) = Γ^{−1} ξ_a + Γ^{−1} n^{−1/2} Σ_{t=1}^n k(X_t; θ_0) h(Y_t; θ_0) + o_P(1),   (15)

where ξ_a = E[k(X_t; θ_0) a(X_t)], Γ = E[k(X_t; θ_0) g′(X_t; θ_0)] and k(X_t; θ_0) is a measurable (p × d_h) matrix-valued function, then it can similarly be proved that under the local alternatives

H_{A,n}(a) the following weak convergence holds:

R_n(·; θ_n) ⟹ R̃_∞ + D_a,   (16)

where R̃_∞ is a zero-mean Gaussian process with covariance function

K̃(x, y) := E[1̃(X_t ≤ x) h(Y_t; θ_0) h′(Y_t; θ_0) (1̃(X_t ≤ y))′],

where 1̃(X_t ≤ x) = 1(X_t ≤ x) I_h − G′(x; θ_0) Γ^{−1} k(X_t; θ_0). Notice that the same shift function D_a appears in (14) and (16). Assuming for the time being that the model is conditionally homoskedastic with unit conditional variance, i.e. σ²(X_t) := E[h(Y_t; θ_0) h′(Y_t; θ_0) | X_t] ≡ I_h, it follows from standard properties of orthogonal projections that K^1(x, x) ≤ K̃(x, x) for all x ∈ R^{d_x}. In other words, the process R_∞^1 is of "smaller" magnitude than R̃_∞. One can prove that, as a result, tests based on R̂_n^1 will have better power properties than tests based on R_n(·; θ_n). To prove this point formally, given a "check" function λ(·), define the test, say T_n^1, that rejects for large absolute values of

∫ λ(x) R̂_n^1(dx).

Similarly, denote by T_n^2 the test rejecting for large absolute values of

∫ λ(x) R_n(dx; θ_n).

Then, for any square-integrable function λ and under mild regularity conditions, including (13), it can be shown that the Pitman Asymptotic Relative Efficiency of T_n^1 and T_n^2 is

ARE := E[|Pλ(X_t)|²] / E[|Lλ(X_t)|²],

where we introduce the linear operators Pλ(X_t) := λ(X_t) − E[λ(X_t) g′(X_t; θ_0)] Δ^{−1} g(X_t; θ_0) and Lλ(X_t) := λ(X_t) − E[λ(X_t) g′(X_t; θ_0)] Γ^{−1} k(X_t; θ_0), respectively. Since P is an orthogonal projection and L also projects onto a space orthogonal to the score, it follows from the least-squares property that ARE ≤ 1, with ARE < 1 if k(X_t; θ_0) is not collinear to the score g(X_t; θ_0)⁴. Since the shift D_a is the same for both tests, a smaller asymptotic variance translates into higher local power. Therefore, we conclude that for a general class of estimators in homoskedastic CMR, our construction leads to tests with better power properties than classical tests based

⁴ The case k(X_t; θ_0) = g(X_t; θ_0) corresponds to the use of the efficient estimator. For nonlinear models our construction is expected to lead to better finite-sample performance even when ARE = 1, since the orthogonality to the score holds in finite samples.


on R_n. We remark that we do not need conditional homoskedasticity for Theorem 1 to hold, but only to guarantee that projected tests have higher power than non-projected tests. Under conditional heteroskedasticity, projected tests may or may not be more powerful than non-projected tests. Our argument, however, does not depend on the conditional variance, and it can readily be extended to general conditional heteroskedasticity of unknown form if we modify our projected marked process as

R̃_n^1(x) = n^{−1/2} Σ_{t=1}^n {1(X_t ≤ x) I_h − G̃_n′(x; θ_n) Δ̃_n^{−1} g(X_t; θ_n)} σ_n^{−2}(X_t) h(Y_t; θ_n),   (17)

where G̃_n(x; θ) = n^{−1} Σ_{t=1}^n g(X_t; θ) σ_n^{−2}(X_t) 1(X_t ≤ x), Δ̃_n = n^{−1} Σ_{t=1}^n g(X_t; θ_n) σ_n^{−2}(X_t) g′(X_t; θ_n), and σ_n²(X_t) is a nonparametric estimator of σ²(X_t), e.g. a Nadaraya-Watson kernel estimator. The asymptotic theory for R̃_n^1 can be established using the results in Section 7; in particular, from our results it can be shown that nonparametric estimation of σ²(X_t) has no effect on the (first-order) asymptotic theory of R̃_n^1, see Theorem 3. In this paper we emphasize R̂_n^1 over R̃_n^1 for the simplicity of computation of the former. Readers concerned with efficiency issues may prefer to use R̃_n^1 rather than R̂_n^1; a comparison of both procedures is beyond the scope of this paper.

Regarding the global power properties, our Theorem 1 yields that, under the alternative hypothesis,

n^{−1/2} R̂_n^1(·) ⟹ E[1^⊥(X_t ≤ ·) h(Y_t; θ_0)].

Therefore, our test is consistent against all alternatives not collinear to the score, that is, all m such that E[1^⊥(X_t ≤ x) m(X_t; θ_0)] ≠ 0 on a set with positive P_X-measure. Note that this is not an important limitation: all tests based on integrated CMR have trivial local power against those directions, see Escanciano (2008b). As a result, the global power of all tests in the direction of the score will also be low (cf. Strasser, 1990).

4. Bootstrap-Based Tests

In the previous section we have shown that the asymptotic null distribution of R̂_n^1 depends in a complicated way on the DGP, as well as on the specification under the null hypothesis. Therefore, critical values for the test statistics cannot be tabulated for general cases. There are some available solutions in the literature that are only applicable to, or have only been justified for, regression models⁵. A popular choice within regression models is the wild bootstrap used in Stute et al. (1998), Whang (2000) and Escanciano (2007a), among others. For instance, Escanciano (2007a) proposed a wild bootstrap for general ICM tests in dynamic nonlinear regression models. The wild bootstrap requires the estimation of the parameter θ_0 in each bootstrap replication and, more importantly, a separable moment function h(Y_t; θ), in the sense that the dependent variable, say Y_{1,t}, has to be recovered from the generalized error h(Y_t; θ) in an additive manner⁶. In particular, the classical wild-bootstrap procedure cannot be directly applied to quantile regressions.

There are other, more general, resampling methods that can be applied to CMR. For instance, Corradi and Swanson (2002) considered the block bootstrap, whereas Chen and Fan (1999) used the stationary bootstrap of Politis and Romano (1994) and an extension of the conditional p-value method of Hansen (1996). For quantile regression problems, the subsampling approach has been the most popular choice; see Chernozhukov (2002), Whang (2004) and Escanciano and Velasco (2007), among others. The subsampling method requires, however, the choice of a tuning parameter, namely the size of the subsample, whose choice can change dramatically the outcome of the resulting inferences. Two researchers with the same data set and model can reach different conclusions using different tuning parameters. This undesirable property is shared by other available methods such as the block bootstrap. Our approach is more related to, but different from, the bootstrap method in Delgado, Dominguez and Lavergne (2006), who proposed a resampling method based on the asymptotic linear expansion of the classical marked empirical process. Unlike their test, our tests need neither an estimate of the influence function of the estimator nor an estimator with a Bahadur linear expansion. Moreover, since our tests are based on orthogonal projections, they are expected to be more powerful than tests based on non-orthogonal projections; see Section 3.

In this section we propose a bootstrap method to solve the problem of approximating the asymptotic null distribution of R̂_n^1. Our definition of R̂_n^1 allows for a simple application of the multiplier-type bootstrap principle (see Chapter 2.9 in van der Vaart and Wellner (1996)), without estimating the parameters in each bootstrap replication. The multiplier approximation has been successfully applied in the literature on regression models; see, e.g., Hansen (1996). As we shall show later in this section, our approach can be seen as an extension of the wild bootstrap to general and possibly non-separable CMR. The new approach does not need any tuning parameter and seems to be superior to other competing methods in terms of the accuracy of the approximation; see the Monte Carlo section.

⁵ Among these are the conditional p-value method of Hansen (1996), the Khmaladze transformation (cf. Khmaladze, 1981) in Koul and Stute (1999), upper bounds for the critical values in Bierens and Ploberger (1997), and the approximation in Escanciano and Jacho-Chavez (2008).

⁶ The standard example is a regression model, where h(Y_t; θ) = Y_{1,t} − m(X_t; θ) and therefore Y_{1,t} = m(X_t; θ) + h(Y_t; θ).

^ n1 by that of ' R ^ n1 ; where We shall approximate the asymptotic distribution of ' R ^ n1 is a simple multiplier-bootstrap approximation of R ^ n1 given by R X ^ 1 (x) = p1 f1(Xt R n n t=1 n

x)Ih

G0n (x;

n) n

1

g(Xt ;

n )gVt h(Yt ; n );

and where fVt : 1 t ng is a sequence of independent random variables (r.v’s) with zero mean, unit variance, bounded support and also independent of the sequence f(Yt0 ; Xt0 )0 : 1 t ng: Examples of fVt g sequences are i.i.d. Bernoulli variates with P (Vt = 0:5(1

p

5)) = b

P (Vt = 0:5(1 +

p

5)) = 1

b;

(18)

p p where b = (1+ 5)=2 5; or P (Vt = 1) = 0:5 and P (Vt = 1) = 0:5; see Wu (1986). To justify theoretically this bootstrap approximation we do not need any further additional assumptions. Notice that other related wild-bootstrap procedures require additional assumptions, for instance, on the bootstrap version of the estimator. To prove the consistency of the proposed bootstrap we use the concept of convergence in distribution in probability one. A less restrictive concept is convergence in distribution in ^1 ; probability, see Giné and Zinn (1990). The unknown limiting null distribution of ' R n 1 ^ 1 : i.e. the distribution of ' (R1 ) ; is approximated by the bootstrap distribution of ' R n That is, the bootstrap distribution

^ n1 F (x) = Pr ' R

x fYt ; Xt gnt=1

estimates the asymptotic null asymptotic distribution function 1 F1 (x) = Pr ' R1

x :

^1 Thus, H0 will be rejected at the 100 % of signi…cance when ' R cn; ; where F cn; = n 1 : Also, we can use the bootstrap p values; pn say, rejecting H0 when pn < ; where ^ 1 ^ 1 fYt ; Xt gn pn = Pr ' R ' R n n t=1 : The bootstrap assisted test is valid if F is a consistent estimator of F1 at each continuity point of F1 . When consistency is a.s.; it is ^ 1 !d ' (R1 ) a:s: See Ginè and Zinn (1991) or van der Vaart and Wellner expressed as ' R n 1 (1996) for discussion. Theorem 2: Assume A1-A2. Then, for any continuous functional ' 1 ^ n1 !d ' R1 ' R a:s:

15

It is straightforward to prove that Theorem 2 implies that our bootstrap test is consistent against all alternatives in HA not collinear to the score, provided ' is such that '(f ) = 0 () f = 0 a.s-PX . Moreover, it can be proved that our bootstrap test preserves the ^ n1 ; including its asymptotic admissibility. Details asymptotic local power properties of ' R are omitted to save space. We compare now the new bootstrap procedure with existing bootstrap methods such as the bootstrap test proposed in Delgado, Dominguez and Lavergne (2006) and the wild bootp strap proposed in Wu (1986). Suppose the parameter 0 can be estimated by a n consistent ^ 1 ; it can be shown that under (1) and other regularity estimator n satisfying (15). As with R n conditions, the distribution of Rn (x; n ) can be approximated by that of n

1=2

n X t=1

f1(Xt

x)Ih

G0n (x;

n)

1 n

k(Xt ;

n )gVt h (Yt ; n ) ;

(19)

P where n = n 1 nt=1 k(Xt ; n )g 0 (Xt ; n ) : This is a di¤erent, valid, bootstrap procedure to the one proposed in this paper. For iid observations and smooth moment functions, this bootstrap has been used in Delgado, Dominguez and Lavergne (2006) to approximate the asymptotic distribution of Rn (x; ; n ). This bootstrap is also related to the conditional pvalue method of Hansen (1996). For generalized linear regression models, this method was used in Su and Wei (1991). We establish some interesting connections of the bootstrap in (19), the classical wildbootstrap and our methods. When k(Xt ; 0 ) = g (Xt ; 0 ) ; our bootstrap approximation is asymptotically equivalent to that in Delgado et al. (2006). Interestingly enough, for a linear regression model using OLS estimators, our bootstrap boils down to the bootstrap approximation in Delgado et al. (2006), which in turn coincides with the classical wild bootstrap. The proof of this statement is straightforward, and hence omitted. For nonlinear models, even when k(Xt ; 0 ) = g (Xt ; 0 ) ; the three bootstrap methods are di¤erent in …nite ^ 1 . As discussed above, our construction being based samples, since in general Rn ( ; n ) 6= R n on orthogonal projections it is expected to lead to tests with better power properties. 5. Monte Carlo Simulations In this section we study, via some Monte Carlo experiments, the …nite-sample performance of the proposed bootstrap, in comparison with the subsampling. We consider two sets of simulations. The …rst experiment corresponds to a mean regression model. The model under the null hypothesis is Y1;t =

0 Y1;t 1

+

16

0 Wt

+ "t ,

where 0 = ( 0 ; 0 ) and E[Y1;t 0 Y1;t 1 0 Wt j Xt ] = 0. Henceforth, fWt g and f t g will denote a sequence of mutually independent i.i.d. N(0,1) r.v’s. We examine the adequacy of this model under the following DGPs: DGP 1. An AR(1) model with Gaussian (N(0,1)) errors: Y1;t = 0:6Y1;t

1

+ Wt +

t.

DGP 2. An AR(1) model with Stochastic Volatility (SV): Y1;t = 0:6Y1;t 1 + Wt + "t , where "t = t exp ( 2t ), 2t = 0:936 2t 1 + 0:32Ut , and Ut are i.i.d. N(0,1) r.v’s, independent of 0 s. DGP 3. An AR(1) model with Student-t errors with 3 degrees of freedom (t(3)), Y1;t = 0:6Yt 1 + Y1;t 1 + "t , where "t i.i.d. t (3). DGP 4. An AR(1) model with GARCH(1,1) errors (GARCH), Y1;t = 0:6Y1;t "t = t t and 2t = 0:001 + 0:05"2t 1 + 0:90 2t 1 .

1 + W t + "t ,

where

DGP 5. Threshold Autoregressive Model (TAR), Y1;t =

(

:6Y1;t 1 + :5Y1;t 1 +

; Y1;t ; Y1;t

t t

1 1

< 1, 1.

DGP 6. Sign autoregressive model (SIGN), Y1;t =sign(Y1;t 1 )+0:43 t , where sign(x) = 1 (x > 0) 1 (x < 0). DGP 7. Tem Map model (TM), Y1;t = where

(

(1

Y1;t 1 = Y1;t 1 ) = (1

; 0 < Y1;t ) ; Y1;t

< , 1, 1

1

= :49999 and Y1;0 is generated from U [0; 1].

DGP 8. Exponential AR(1) model (EXPAR), Y1;t = :5Y1;t

1

+ 10Y1;t

1

exp

2 Y1;t

1

+

t.

Models 1–5 are used to examine the empirical size of the tests, while models 6–10 allow us to examine the empirical power. We study the performance of our test for testing the correct speci…cation of the regression model. Therefore, the corresponding mo0 ment function is h(Yt ; 0 ) = Y1;t 0 Y1;t 1 0 Wt , Yt = (Y1;t ; Y1;t 1 ; Wt ) ; and Xt = (Y1;t 1 ; Wt )0 : The parameter 0 is estimated by the ordinary least squares estimator n;1 = P P 0 (n 1 nt=1 Xt Xt ) 1 (n 1 nt=1 Xt Y1;t ) : Note that for this model Gn (x; ) = n

1

n X

x) and

Xt 1(Xt

t=1

n

=n

1

n X t=1

17

Xt Xt0 :

The computation of CvMn is very simple. After some algebra, we can write CvMn = n 2 h0 ( n )P P 0 h( n )

(20)

where h( n ) = (h(Y1 ; n ); :::; h(Yn ; n ))0 ; P = HW; H = In X(X 0 X) 1 X 0 ; In is the n n identity matrix, X is the n 2 matrix with rows Xt0 ; and W is the n n matrix with elements wts = 1(Xt Xs ): Similarly, if we denote by CvMn the bootstrap version, we can compute CvMn = n 2 h 0 ( n )P P 0 h ( n ); where h ( n ) = (V1 h(Y1 ; n ); :::; Vn h(Yn ; n ))0 , and fVt gnt=1 are generated from (18). These formulas show the simple computation of our bootstrap procedure, only the vector of residuals changes in each bootstrap replication, from h( n ) to h ( n ): We compare our approximation with the subsampling approximation. Subsampling has been shown to be a powerful resampling scheme that allows for asymptotically valid inference under very general conditions on the DGP, see the monograph by Politis, Romano and Wolf (1999). We describe the subsampling approximation in our present framework. Let GCvM (w) n be the cdf of the CvMn test statistic, i.e. GCvM (w) = P (CvMn n

w):

^ 1 ) = CvMb (R ^ 1 (Zi ; :::; Zi+b 1 )) be the test statistic computed with the subsamLet CvMb (R b;i b ple (Zi ; :::; Zi+b 1 ) of size b; with Zi = (Yi0 ; Xi0 )0 . We note that each subsample of size b (taken without replacement from the original data) is indeed a sample of size b from the true DGP. Hence, it is clear that one can approximate the sampling distribution GCvM (w) using the n 1 ^ ) computed over the n b + 1 di¤erent subsamples of distribution of the values of CvMb (R b;i CvM size b: That is, we approximate Gn (w) by the empirical distribution GCvM n;b (w) Let cn;1

;b

be the (1

=

n

nX b+1 1 1 ^ b;i 1(CvMb (R ) b + 1 i=1

w 2 [0; 1).

w),

)-th sample quantile of GCvM n;b (w); i.e., cn;1

;b

= inffw : GCvM n;b (w)

1

g:

Thus, the subsampling test rejects the null hypothesis at % if CvMn > cn;1 ;b . The validity of the subsampling approximation was proved by Politis, Romano and Wolf (1999) for strong mixing sequences provided b ! 1 as n ! 1 and b=n ! 0. For the simulations, the size of subsamples is chosen to be b = bkn2=5 c, with b c denoting the integer part, as suggested 18

by Sakov and Bickel (2000), with k = 3, 4, and 5. We have also considered subsampling ^1 ^ 1 ; as in Chernozhukov (2002), but simulations based on the centered statistics R n 1=2 R n b;i the di¤erence with respect to the uncentered case is not signi…cant, so these simulations are not reported. In each of 1000 replications, the CvM test statistic is computed for each DGP and sample sizes n = 50, 100, and 300. The number of bootstrap replications is B = 999: The rejection probabilities (RP) for the new proposed bootstrap approximation are presented as CvMn ; with n the sample size, whereas CvMn;k denotes the ones from the subsampling-based tests based on b = bkn2=5 c subsample size. Table 1 reports the RP (in percentage points) for models 1-4. Table 1 about here The new bootstrap exhibits good size accuracy, uniformly in all nominal levels, models and for all sample sizes. It is clearly superior to the subsampling approximation. The subsampling has a special di¢ culty maintaining an accurate empirical size at 1%; overrejecting in most cases. It presents a slight underrejection for the N(0,1) model at sample sizes n = 100 and n = 300; and a large distortion (again underrejection) for the SV and t(3) models for the same sample sizes. The performance of the subsampling is substantially sensitive to the choice of b: We report in Table 2 the RP for models 5 to 8. As expected, all the tests are able to detect these alternatives and the rejection probabilities increase as n increases. The new bootstrap test is superior to the subsampling approximation in three of the alternatives considered, namely, the TAR, TM and EXPAR models, and it is comparable, or even better, to the subsampling for the SIGN model. It is remarkable that the superiority of the new method holds even uniformly in the subsample size b for most of the alternatives considered. Table 2 about here The second Monte Carlo experiment deals with a dynamic conditional median model. We entertain the model: Y1;t = 0:6Y1;t

1

+ Wt + cWt2 + u2t ;

t = 1; : : : ; n;

(21)

where Wt = 0:5Wt 1 + "t ; and where both u2t and "t are sampled independently from N (0; 1) and Y1;0 = W0 = 0. Here, the null model corresponds to c = 0: Under H0 , a linear quantile model holds with Yt = (Y1;t 1 ; Xt0 )0 ; Xt = (Y1;t 1 ; Wt )0 ; and hence the moment function for the median regression is h(Yt ; 0 ) = 1(Y1;t 0:5; with 0 = ( 0 ; 0 )0 . 0 Y1;t 1 + 0 Wt ) 19

In this context

0

is estimated by the QRE, 7 !

n X

0:5 (Y1;t

KB;n

say, which is the minimizer of

m(Xt ; ))

t=1

with respect to 2 R2 ; where (") = (1(" 0) )" and m(Xt ; ) = 00 Xt : It has been shown in the literature (see e.g. Koenker and Bassett (1978)) that under some mild regularity conditions p

1 X (0) p Xt h (Yt ; n t=1 n

n(

KB;n

0)

=

1

fut

1

n)

+ oP (1);

where = E[Xt Xt0 ]; and fut (0) is the innovation’s density. Therefore, our assumption p n( KB;n 0 ) = OP (1) holds under mild moment conditions. Note also that g (Xt ; 0 ) = fut (0)Xt : Therefore, there is no need for estimating the innovation density in our test and the computation of our test in the conditional median case can be carried out in the same way as in (20), just replacing the OLS estimator by KB;n and the regression residuals by the median residuals h(Yt ; KB;n ). We consider two sample sizes n = 100 and n = 300. Again, we choose b = kn2=5 ; but now with k = 6; 7 and 8. The new values for k are chosen such that the empirical size of the subsampling tests are close to the nominal size7 . To save space, we only report in the simulations the results at 5% nominal level. The conclusions are similar for other nominal levels. We plot in Figure 1 the rejection probabilities for our bootstrap test CvMn and the subsampling versions CvMn;k for n = 50; at 5% for model (21) and several values of c. Figure 2 replicates this plot for n = 300: Figures 1 and 2 about here For c = 0, the size performance of the bootstrap and subsampling-based tests is good for both sample sizes considered. When c 6= 0, all tests show consistency against these …xed alternatives. The larger the c; the higher is the power for all tests. The new bootstrap presents a much higher power than subsampling-based tests for all values of b considered. The di¤erence in power is particularly high for n = 50 and large values of c: The power for the subsampling tests does not depend substantially on the choice of b for this simulation experiment. These simulations have shown the excellent performance of the new bootstrap-based test, in comparison with the subsampling approach. The results for the median regression are 7

Of course, this cannot be done for real data, where we do not know the true DGP.

20

particularly illuminating, since there is substantial improvement in power with our methods. Note that for the quantile median regression subsampling is the most popular choice in the literature of speci…cation tests. Therefore, we believe that the new test can provide a more e¢ cient inference method for regression quantiles. 6. Dynamics in Stock Market Returns 6.1 Mean and Variance Dynamics in the Hong Kong Stock Market. We now apply our tests to study the dynamics in mean and variance of the Hong Seng Index (HSI) in the Hong Kong stock market. Following Koul and Ling (2006) we split the sample in four periods of two years of daily observations from June 1, 1988 to May 31, 1996. Koul and Ling (2006) proposed a goodness of …t tests for the error distribution in a class of heteroskedastic time series models. They …tted an AR(1)-GARCH(1,1) model for these data sets and tested for Gaussianity. Their test failed to reject the Gaussian distribution for the third and fourth periods, arguing that the Tian An Men Square event may be the cause for the rejection in the …rst two periods, the e¤ect gradually disappearing for the third and fourth periods. It is important to stress that their test relies on the correct speci…cation of the conditional mean and variance functions. In that sense, our test can be considered to be a useful pre-test in their framework. We check the AR(1) mean and GARCH(1,1) variance speci…cations in Table 3. That is, we entertain the model Y t = a1 Y t 2 t

=

0

+

1

+ "t ; 2 1 "t 1

"t = +

t ut ;

(22)

2 1 t 1;

and apply our CvM test, comparing with subsampling tests, to check the validity of these speci…cations. We check for correct mean speci…cation using the moment function hm (Yt ; Yt 1 ; Yt a1 Yt 1 ; with 0 = a1 estimated by OLS. For the variance speci…cation, we consider hv (Yt ; Yt 1 ; 0 ) = [(Yt a1 Yt 1 )2 = 2t ] 1; with 0 = (a1 ; 0 ; 1 ; 1 )0 estimated by QuasiMaximum Likelihood Estimation (QMLE). The necessary scores to compute our test can be found in Koul and Ling (2006). In the application we have considered Xt = (Yt 1 ; :::; Yt d ), with d = 1; 3 and 5. To save space only the results with d = 3 are reported, the outcomes with other values of d being similar. The number of bootstrap replications is B = 999; and for the subsampling test we chose b = kn2=5 ; with k from 2 to 6. Only the results for k = 3 and 5 are reported to save space. For completeness we also report the results for the joint test, using the moment h = (hm ; hv )0 : Table 3 contains the bootstrap p-values for the mean and variance speci…cations. 21

0)

=

Table 3 about here Several conclusions can be drawn from Table 3. First, the mean is correctly speci…ed, as suggested by our test and the subsampling tests. Only the second period is rejected at 10% with our test, but not with the subsampling test. Second and more importantly, the variance seems to be misspeci…ed at 5% for all periods but for the third period, as can be seen from the bootstrap p-values of the new CvM test. The joint test con…rms the results of the marginal variance test. These rejections stand in contrast with the outcomes of the subsampling tests across several values of k, thereby highlighting the high power of our test relative to subsampling methods. These rejections question the suitability of the inferences in Koul and Ling (2006) for all but the third period of the HSI, since as previously discussed, their methods relied on the correct speci…cation of the GARCH model. 6.2 Evaluating Market Risk in the S&P 500. One of the implications of the creation of the Basel Committee on Banking Supervision was the implementation of Value-at-Risk (VaR) as the standard tool for measuring market risk for banking risk monitoring. As a result of this agreement, …nancial institutions have to report their VaR and compute their capital reserves according to the outcome of a statistical test for evaluating the VaR model denominated backtesting. The computation of VaR measures and their evaluation have become then of paramount importance in risk management. In this section we use our methods to provide an alternative to standard backtesting techniques. Our methods are expected to be more powerful than traditional backtesting methods since they account for a larger information set. This higher power is con…rmed in the application below. Moreover, in the presence of estimation risk, our methods are much simpler to implement. We illustrate these points with an application to model the VaR at di¤erent quantile levels of the well-known and extensively studied S&P500 daily stock index. We consider several periods of this stock index. The …rst period spans from 2 January 1990 until 31 December 1993, the second period from 3 January 1994 until 31 December 1997 and the third from 2 January 1998 until 28 August 2002. The number of observations in each period is 1013, 1011 and 1170, respectively. Within the VaR paradigm, parametric location-scale models have been the most popular in attempting to describe the dynamics of VaR measures. In particular, Berkowitz and O’Brien (2002) compared the VaR forecasts obtained from the location-scale family (ARMAGARCH model) with the internal structural models used by commercial banks. Their main conclusion was that GARCH models generally provide lower VaRs and are better at pre22

dicting changes in volatility, thereby allowing comparable risk coverage with less regulatory capital. In this application we …t a GARCH(1,1)-VaR model to the log di¤erences of the S&P500 (Yt ) as m (Xt ; ) = c + t F" 1 ( ); (23) where t is speci…ed by a GARCH(1,1) and where F" 1 ( ) is an unconditional quantile function. We assume that this speci…cation follows from a location-scale GARCH(1,1) model with a constant location. We aim to check the conditional quantile speci…cation, and hence the moment function is h(Yt ; Xt ; ) = 1(Yt m (Xt ; )) ; and 0 is estimated by QMLE. We entertain two speci…cations for F" 1 ( ); a Gaussian and Student-t distribution, respectively. In the second case we estimated the degrees of freedom parameter by maximum likelihood: In order to monitor and assess the accuracy and quality of the di¤erent VaR techniques the Basel Accord (1996a) developed a statistical testing device that was denominated backtesting. Backtesting methods are statistical tests for the hypothesis that fh(Yt ; Xt ; 0 )gnt=1 is a sequence of iid Bernoulli variables with zero mean. Among the existing backtesting methods the most popular ones are the so-called unconditional backtest and test of independence, which reject the VaR model for large values of 1 X h(Yt ; Xt ; Kn = p n t=1 n

and

1 X Cn = p h(Yt ; Xt ; n t=1

n)

n

n )h(Yt 1 ; Xt 1 ; n )

;

respectively. To approximate the critical values of Kn and Cn we use a subsampling approximation, and denote the corresponding test statistics by Kn;k and Cn;k . In Table 4 we report the bootstrap p-values for di¤erent choices of the VaR level for the Gaussian and Student-t distributions. The simulation parameters such as bootstrap replications, subsample sizes and other parameter values are as in the mean-variance speci…cations. Table 4 about here We have several conclusions from the quantile speci…cations. First, our test suggests that a GARCH model with either Gaussian or Student-t distribution does not …t the data well. Only for the third period the GARCH model seems to be appropriate, specially with tails modelled with the Student-t distribution when = 0:01: This application shows that our test 23

has more power than competing methods such as subsampling or the popular backtesting methods; see for instance the inability of these methods in rejecting the GARCH model when = 0:05: This application suggests that the conclusions in Berkowitz and O’Brien (2002) have to be taken with caution in view of the lack of power of the traditional backtesting techniques. Overall, our simulations and applications show the superiority of our methods in terms of size accuracy and power performance over competing methods, while being simpler to implement.

7. Extensions: The Case of Nonparametric Scores

This section discusses an important extension of our previous theory. There are situations where the score g(X_t; θ_0) is not known up to the parameter θ_0 and depends in a nonparametric way on the true DGP. We consider in this section the case where g(X_t; θ_0) can be estimated nonparametrically. A prominent example is quantile regression in a non-location-scale model. It can be shown that for the α-th conditional quantile error h(Y_t; θ_0) = 1(Y_{1,t} ≤ m_α(X_t; θ_0)) − α, the score process in this general case is given by

g(X_t; θ_0) = f_{Y|X}(m_α(X_t; θ_0)) ṁ_α(X_t; θ_0),   (24)

where f_{Y|X}(x) is the density of Y_{1,t} given X_t evaluated at x, m_α(X_t; θ_0) is the specified quantile model, and ṁ_α(X_t; θ_0) = ∂m_α(X_t; θ_0)/∂θ. Without further assumptions, the quantity f_{Y|X}(m_α(X_t; θ_0)) is unknown, but it can be estimated nonparametrically, e.g. by a kernel estimator. This section extends our previous theory to cover this and other examples. This extension is highly technical and requires sophisticated empirical process methods that are of independent interest.

We introduce some notation. For a given δ > 0 and a given family of measurable functions G, define the class of score functions

B_δ = {s(·; θ) : s(·; θ) ∈ G, θ ∈ Θ : |θ − θ_0| < δ},

and the empirical process

ν_n(s, θ) = n^{−1/2} Σ_{t=1}^n s(X_t; θ) h(Y_t; θ),

indexed by l = (s, θ) ∈ H_δ = B_δ × Θ_δ, where Θ_δ = {θ ∈ Θ : |θ − θ_0| < δ}. Let B_δ(X_t) be the envelope function of the class B_δ, that is, sup_{s∈B_δ} |s(X_t; θ)| ≤ B_δ(X_t). Then, we

define the conditional quadratic variation of the empirical process ν_n on a finite partition Π = {H_k, 1 ≤ k ≤ N} of H_δ, which is given by

CV_{δ,n}(Π) := max_{1≤k≤N} n^{−1} Σ_{t=1}^n E[ sup_{l_1,l_2∈H_k} |l_1(X_t; θ_1) h(Y_t; θ_1) − l_2(X_t; θ_2) h(Y_t; θ_2)|² | F_{t−1} ],   (25)

n)

=n

1

n X

g^ (Xt ;

n ) 1(Xt

x)

t=1

and ^ g^ = n

1

n X

g^ (Xt ;

^0 n) g

(Xt ;

n) :

t=1

Assumption A3: For each > 0 su¢ ciently small, A3(a): E jB (Xt )j2 < 1: Furthermore, E[jB (Xt )h(Yt ; 0 )j2 ] < 1: A3(b): (B ; k kB ) is a totally bounded metric space. A3(c): For every " > 0 there exists a …nite partition " = fHk ; 1 k N" being the number of elements of such partition, such that Z1 p

log(N" )d" < 1

N" g of H ; with

(26)

0

and sup "2(0;1)\Q

CV

;n ( "2

")

= OPn (1):

(27)

A3(d): The estimator g^ (Xt ; n ) satis…es Pr (^ g (Xt ; n ) 2 B ) ! 1 as n ! 1; and it is ^ g^(x; n ) consistent for g (Xt ; 0 ), in the sense that k^ g gkB = oP (1) for all > 0: Moreover, G is consistent for G(x; 0 ); uniformly in x, and ^ g^ is consistent for : The next theorem extends Theorem 1 to the nonparametric score case. It shows that estimation of the in…nite-dimensional score function has no e¤ect on the …rst order asymptotic theory of the test statistic.

25

Theorem 3: Under Assumptions A1-A3,

sup_{x∈R^{d_x}} |R̂_n^1(x) − R_n^1(x; θ_0)| = o_P(1).

n)

= f^Y =X=Xt (m (Xt ;

p where n is a n-consistent estimator of (cf. Rosenblatt, 1969) f^Y =X=x (y) =

Pn

0

_ n ))m

(Xt ;

n );

and f^Y =X is the popular Rosenblatt estimator

K(x Xt =hn )k(y Yt =hn ) P ; hn nt=1 K(x Xt =hn )

t=1

(28)

where $K$ and $k$ are kernel functions and $h_n$ is a sequence of bandwidths converging to zero at a suitable rate. Under appropriate smoothness and boundedness conditions on the joint density of $Z_t = (Y_t, X_t')'$ and on the kernels $K$ and $k$, it is not difficult to show that Assumption A3 holds for this example. The uniform consistency of $\hat f_{Y|X}$ can be obtained from the results in Hansen (2008). We omit the details for the sake of space. Specific assumptions for the quantile example can be obtained from the author upon request.
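As a purely illustrative complement (this sketch is not part of the paper), the following Python code implements the Rosenblatt estimator above and the score estimate in (28) for a hypothetical scalar linear quantile model $m_\tau(x, \theta) = \theta x$; the Gaussian kernels, the common bandwidth, and the linear model are assumptions made only for the example.

```python
import numpy as np

def rosenblatt_density(x, y, X, Y, h):
    """Rosenblatt estimator f_hat_{Y|X=x}(y) with Gaussian kernels K and k
    and a common bandwidth h, following the ratio formula above."""
    K = np.exp(-0.5 * ((x - X) / h) ** 2) / np.sqrt(2.0 * np.pi)  # K((x - X_t)/h)
    k = np.exp(-0.5 * ((y - Y) / h) ** 2) / np.sqrt(2.0 * np.pi)  # k((y - Y_t)/h)
    return float((K * k).sum() / (h * K.sum()))

def quantile_score_hat(X, Y, theta_n, h):
    """Estimated score g_hat(X_t, theta_n) from (28) for the illustrative
    linear quantile model m_tau(x, theta) = theta * x, whose gradient is x."""
    m = theta_n * X                     # fitted conditional quantiles m_tau(X_t, theta_n)
    f = np.array([rosenblatt_density(X[t], m[t], X, Y, h) for t in range(len(X))])
    return f * X                        # f_hat(m_tau(X_t, theta_n)) * m_dot(X_t, theta_n)
```

In practice the bandwidth $h$ would be chosen to satisfy the rate conditions alluded to above; the sketch leaves it as a free input.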

8. Conclusions and future extensions

In this paper we have proposed new bootstrap-based tests for general, possibly non-smooth, conditional moment restrictions. We have shown that an orthogonal projection in the classical marked empirical process allows for a simple multiplier-type bootstrap approximation, and it leads to a simple and powerful testing procedure. The procedure has the practical convenience of being free of tuning parameters such as the block size or the subsample size, it does not need to re-estimate parameters in each bootstrap replication, and it is more powerful than subsampling approximations, as shown in the simulations. Therefore, the new

method should be useful for practitioners interested in testing CMR, such as asset pricing models, regression models, or VaR models, among others. One of the most appealing properties of the new Cramér-von Mises test is its simplicity: it is a quadratic form in the generalized residuals. Our approach is geometric in nature and can therefore be extended to more general frameworks, e.g. semiparametric CMR containing infinite-dimensional nuisance parameters, provided the orthogonal projection onto the tangent space of nuisance parameters can be constructed. Since the tangent space of a semiparametric model is substantially larger than that of a parametric model, our methods are especially well suited for semiparametric models, as they incorporate a large amount of information in the projection. Extensions to particular semiparametric models are deferred to future research.

The main objective of this paper has been to propose a simple procedure for testing general CMR. Some extensions of the basic framework can improve the proposed test, at the cost of complicating the proofs and the implementation and obscuring the main message of the paper. Most notably, in Section 3 we discussed an extension of our basic framework to a test with better power properties than the classical marked process when there is conditional heteroskedasticity of unknown form; see (17). This extension can be carried out using the methods discussed in Section 7. We have preferred to keep the exposition and the ideas as simple as possible.
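To illustrate the claim that the statistic is a simple quadratic form of the generalized residuals, the following minimal sketch (not part of the paper) assembles the projected Cramér-von Mises statistic for the scalar case $d_x = p = 1$; the inputs `h_res` and `g` stand for the generalized residuals $h(Y_t, \theta_n)$ and a plug-in score $g(X_t, \theta_n)$, both assumed already computed.

```python
import numpy as np

def projected_cvm(X, h_res, g):
    """Projected Cramer-von Mises statistic: a quadratic form in the
    generalized residuals h_res = h(Y_t, theta_n), with scalar score
    g = g(X_t, theta_n) and scalar conditioning variable X (dx = p = 1)."""
    n = len(X)
    ind = (X[None, :] <= X[:, None]).astype(float)  # ind[j, t] = 1(X_t <= X_j)
    G = ind @ g / n                                 # G_n(X_j, theta_n)
    Delta = np.mean(g * g)                          # Delta_n
    w = ind - np.outer(G / Delta, g)                # projected indicators 1_perp(X_t <= X_j)
    R = w @ h_res / np.sqrt(n)                      # projected process at x = X_j
    return float(np.mean(R ** 2)), w                # CvM statistic and projection weights
```

The weights `w` are returned because the multiplier bootstrap of Section 4 reuses them unchanged across replications; see the sketch after the proof of Theorem 2 in the Appendix.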

Mathematical Appendix

To prove Theorem 1 we need a useful lemma and further definitions. Let $(\mathcal{G}, \|\cdot\|_{\mathcal{G}})$ be a subset of a metric space of vector-valued functions $g$. The covering number $N(\varepsilon, \mathcal{G}, \|\cdot\|_{\mathcal{G}})$ is the minimal number $N$ for which there exist $\varepsilon$-balls $\{f : \|f - g_j\|_{\mathcal{G}} \le \varepsilon\}$, with $\|g_j\|_{\mathcal{G}} < \infty$, $j = 1, \dots, N$, covering $\mathcal{G}$. The covering number with bracketing $N_{[\,]}(\varepsilon, \mathcal{G}, \|\cdot\|_{\mathcal{G}})$ is the minimal number $N$ for which there exist $\varepsilon$-brackets $\{[l_j, u_j] : \|l_j - u_j\|_{\mathcal{G}} \le \varepsilon,\ \|l_j\|_{\mathcal{G}} < \infty,\ \|u_j\|_{\mathcal{G}} < \infty,\ j = 1, \dots, N\}$ covering $\mathcal{G}$. For the definition of asymptotic tightness see van der Vaart and Wellner (1996). Define the process

$$ \alpha_n(x, c) = \frac{1}{\sqrt{n}} \sum_{t=1}^{n} \big\{ h(Y_t, \theta_0 + c n^{-1/2}) - E[h(Y_t, \theta_0 + c n^{-1/2}) \mid \mathcal{F}_t] \big\}\, 1(X_t \le x), \tag{29} $$

indexed by $(x, c) \in \overline{\mathbb{R}}^{d_x} \times C_K$, where $C_K = \{c \in \mathbb{R}^p : |c| \le K\}$ and $K > 0$ is an arbitrary but fixed constant.

Lemma B1: Under Assumptions A1-A2, the process $\alpha_n(x, c)$ is asymptotically tight with respect to $(x, c) \in \overline{\mathbb{R}}^{d_x} \times C_K$.

Proof of Lemma B1: Let us define the class of functions

$$ \mathcal{K} = \big\{ \{ h(Y_t, \theta_0 + c n^{-1/2}) - E[h(Y_t, \theta_0 + c n^{-1/2}) \mid \mathcal{F}_t] \}\, 1(X_t \le x) : (x, c) \in \overline{\mathbb{R}}^{d_x} \times C_K \big\}. $$

To simplify notation write

$$ w_{t-1}(u) := \{ h(Y_t, \theta_0 + c n^{-1/2}) - E[h(Y_t, \theta_0 + c n^{-1/2}) \mid \mathcal{F}_t] \}\, 1(X_t \le x), $$

where $u = (x, c)$. Define the pseudo-metric, with $s$ as in A1(d),

$$ d(u_1, u_2) := |F_X(x_1) - F_X(x_2)|^{1/2} + |c_1 - c_2|^{s/2}, $$

where hereinafter $u_1 = (x_1, c_1)$ and $u_2 = (x_2, c_2)$ belong to $\Pi_K := \overline{\mathbb{R}}^{d_x} \times C_K$. From the compactness of $C_K$ and the continuity of $F_X$ we have

$$ \int_0^1 \sqrt{\log N_{[\,]}(\varepsilon, \mathcal{K}, d)}\, d\varepsilon < \infty. $$

Let $\mathcal{B}_\delta = \{B_k, 1 \le k \le N \equiv N_{[\,]}(\delta, \mathcal{K}, d)\}$ be a partition of $\mathcal{K}$ in $\delta$-brackets $B_k = [l_k, u_k]$, i.e., $\{B_k\}_{k=1}^{N}$ covers $\mathcal{K}$, $l_k = (x_k', c_k')'$, $u_k = (y_k', c_k')'$, $x_k \le y_k$, and $d(x_k, y_k) \le \delta$. The quadratic variation of $\alpha_n(x, c)$ (cf. (25)) is bounded by

$$
\begin{aligned}
\max_{1 \le k \le N} n^{-1} \sum_{t=1}^{n} & E\Big[ \sup_{u_1, u_2 \in B_k} |w_{t-1}(u_1) - w_{t-1}(u_2)|^2 \,\Big|\, \mathcal{F}_t \Big] \\
&\le C \max_{1 \le k \le N} n^{-1} \sum_{t=1}^{n} E\Big[ \sup_{|\theta_1 - \theta_2| \le \delta^{2/s}} |h(Y_t, \theta_1) - h(Y_t, \theta_2)|^2 \,\Big|\, \mathcal{F}_t \Big] \\
&\quad + C \max_{1 \le k \le N} n^{-1} \sum_{t=1}^{n} E\Big[ \sup_{\theta \in \Theta_\delta} |h(Y_t, \theta)|^2 \,\Big|\, \mathcal{F}_t \Big] \{ 1(X_t \le y_k) - 1(X_t \le x_k) \} \\
&\le C \delta^2\, O_P(1),
\end{aligned}
$$

where the first inequality follows from the triangle inequality and the boundedness and monotonicity of the indicator function. The last inequality follows from A1(d) and the proof of Theorem 1 in Escanciano (2007b). Hence, the conditions of Theorem A1 in Delgado and Escanciano (2007) hold for the partition $\mathcal{B}_\delta$, and the asymptotic tightness of $\alpha_n$ is then proved.


Proof of Theorem 1: Without loss of generality we assume $d_h = 1$. We write

$$
\begin{aligned}
\hat R_n^1(x) &= \frac{1}{\sqrt{n}} \sum_{t=1}^{n} h(Y_t, \theta_n)\, 1(X_t \le x) - G_n'(x, \theta_n)\, \Delta_n^{-1} \frac{1}{\sqrt{n}} \sum_{t=1}^{n} g(X_t, \theta_n)\, h(Y_t, \theta_n) \\
&= R_n(x, \theta_n) - G_n'(x, \theta_n)\, \Delta_n^{-1} I_{g,n},
\end{aligned}
$$

where $I_{g,n}$ is implicitly defined. Notice that the uniform law of large numbers (ULLN) of Jennrich (1969, Theorem 2), coupled with the Ergodic Theorem (see e.g. Dehling and Philipp, 2002), implies that

$$ \sup_{x \in \overline{\mathbb{R}}^{d_x}} |G_n(x, \theta_n) - G_n(x, \theta_0)| = o_P(1) \quad \text{and} \quad |\Delta_n^{-1} - \Delta^{-1}| = o_P(1). \tag{30} $$

Note that A2 yields that for each $x \in \overline{\mathbb{R}}^{d_x}$ and each $c \in \mathbb{R}^p$,

$$ E|\alpha_n(x, c) - \alpha_n(x, 0)|^2 = o(1), $$

where $\alpha_n(x, c)$ is defined in (29). Thus, by Lemma B1, for any $K > 0$,

$$ \sup_{x \in \overline{\mathbb{R}}^{d_x},\, |c| \le K} |\alpha_n(x, c) - \alpha_n(x, 0)| = o_P(1). $$

The latter display and (30) imply the following uniform representation: for any $K > 0$,

$$ \sup_{|c| \le K,\, x \in \overline{\mathbb{R}}^{d_x}} \big| R_n(x, \theta_0 + c n^{-1/2}) - R_n(x, \theta_0) - G'(x, \theta_0)\, c \big| = o_P(1). $$

Taking $c = \sqrt{n}(\theta_n - \theta_0)$ and using A2 we conclude

$$ \sup_{x \in \overline{\mathbb{R}}^{d_x}} \big| R_n(x, \theta_n) - R_n(x, \theta_0) - G'(x, \theta_0)\, \sqrt{n}(\theta_n - \theta_0) \big| = o_P(1). \tag{31} $$

Similarly, by an application of Lemma B1 and A1(d),

$$ I_{g,n} - \frac{1}{\sqrt{n}} \sum_{t=1}^{n} g(X_t, \theta_0)\, h(Y_t, \theta_0) - \Delta\, \sqrt{n}(\theta_n - \theta_0) = o_P(1). \tag{32} $$

Therefore, (31) and (32) jointly imply that, uniformly in $x$,

$$
\begin{aligned}
\hat R_n^1(x) &= R_n(x, \theta_0) + G'(x, \theta_0)\, \sqrt{n}(\theta_n - \theta_0) \\
&\quad - G'(x, \theta_0)\, \Delta^{-1} \left( \frac{1}{\sqrt{n}} \sum_{t=1}^{n} g(X_t, \theta_0)\, h(Y_t, \theta_0) + \Delta\, \sqrt{n}(\theta_n - \theta_0) \right) + o_P(1) \\
&= R_n(x, \theta_0) - G'(x, \theta_0)\, \Delta^{-1} \frac{1}{\sqrt{n}} \sum_{t=1}^{n} g(X_t, \theta_0)\, h(Y_t, \theta_0) + o_P(1),
\end{aligned}
$$

which completes the proof of Theorem 1.

Proof of Corollary 1: Part (i) follows from the expansion in Theorem 1 and the joint weak convergence of the process

$$ \frac{1}{\sqrt{n}} \begin{pmatrix} \sum_{t=1}^{n} h(Y_t, \theta_0)\, 1(X_t \le x) \\ \sum_{t=1}^{n} g(X_t, \theta_0)\, h(Y_t, \theta_0) \end{pmatrix}, $$

which follows from our Lemma B1 and the Central Limit Theorem (CLT) for stationary and ergodic martingale difference sequences, cf. Billingsley (1961). Part (ii) follows again from Theorem 1 and the uniform version of the Ergodic Theorem.

Remark: we say that a bootstrap statistic $\Gamma_n^*$ converges in probability to $\Gamma_n$, a.s., if for all $\varepsilon > 0$, $\Pr(|\Gamma_n^* - \Gamma_n| > \varepsilon \mid \{Y_t, X_t\}_{t=1}^n) \to 0$ a.s., which is expressed as $\Gamma_n^* = \Gamma_n + o_{P^*}(1)$ a.s. Also, bootstrap expectations are denoted by $E^*(\Gamma_n^*) = E(\Gamma_n^* \mid \{Y_t, X_t\}_{t=1}^n)$.

Proof of Theorem 2: Following the arguments of the proof of Theorem 1, it can easily be proved that

$$ \sup_{x \in \overline{\mathbb{R}}^{d_x}} \Big| \hat R_n^{1*}(x) - \frac{1}{\sqrt{n}} \sum_{t=1}^{n} 1^{\perp}(X_t \le x)\, V_t\, h(Y_t, \theta_0) \Big| = o_{P^*}(1) \quad a.s., $$

where recall $1^{\perp}(X_t \le x) := \{ 1(X_t \le x) - G'(x, \theta_0)\, \Delta^{-1} g(X_t, \theta_0) \}$. Define

$$ R_n^{1*}(x) = \frac{1}{\sqrt{n}} \sum_{t=1}^{n} 1^{\perp}(X_t \le x)\, V_t\, h(Y_t, \theta_0). $$

At this point the proof follows standard arguments; see e.g. Delgado et al. (2006). We need to show that (i) the process $R_n^{1*}$ (conditionally on the sample) has the same asymptotic finite-dimensional distributions as the process $\hat R_n^{1*}$, and that (ii) $R_n^{1*}$ is asymptotically tight, both with probability one. Part (i) follows from the Lindeberg-Levy CLT, whereas part (ii) follows from our Lemma B1; see also Theorem 2.9.2 in van der Vaart and Wellner (1996).
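The following sketch (an illustration under the same assumptions as the earlier ones, not the paper's code) mimics the multiplier approximation behind Theorem 2: the projection weights `w` returned by `projected_cvm` above are computed once, and each bootstrap replication only redraws the multipliers $V_t$, so $\theta_n$ is never re-estimated. Rademacher multipliers are one admissible choice satisfying $E^* V_t = 0$ and $E^* V_t^2 = 1$.

```python
import numpy as np

def multiplier_pvalue(w, h_res, stat, B=999, seed=0):
    """Multiplier bootstrap p-value for the projected CvM statistic.
    w and the residuals h_res are held fixed across replications;
    only the multipliers V_t are redrawn."""
    rng = np.random.default_rng(seed)
    n = len(h_res)
    boot = np.empty(B)
    for b in range(B):
        V = rng.choice([-1.0, 1.0], size=n)     # Rademacher: E*V = 0, E*V^2 = 1
        R_star = w @ (V * h_res) / np.sqrt(n)   # R_n^{1*}(X_j) as in the display above
        boot[b] = np.mean(R_star ** 2)
    return float((1 + np.sum(boot >= stat)) / (B + 1))
```

Because neither $\theta_n$ nor the weights are recomputed inside the loop, the cost per replication is a single matrix-vector product.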

Proof of Theorem 3: The proof follows the same steps as that of Theorem 1. We only need to prove (32). By A3(c) and Theorem A1 in Delgado and Escanciano (2007) the empirical process

$$ \nu_n(g, \theta) = \frac{1}{\sqrt{n}} \sum_{t=1}^{n} g(X_t, \theta)\, \{ h(Y_t, \theta) - E[h(Y_t, \theta) \mid \mathcal{F}_t] \} $$

converges weakly to a Gaussian process with zero mean, indexed by $l = (g, \theta) \in H_\delta = B_\delta \times \Theta_\delta$, where $\Theta_\delta = \{\theta \in \Theta : |\theta - \theta_0| < \delta\}$. By Theorem 1.5.7 in van der Vaart and Wellner (1996) and by A3(b), the process $\nu_n(g, \theta)$ is equicontinuous for the semimetric $\rho((g_1, \theta_1), (g_2, \theta_2)) = \|g_1 - g_2\|_B + |\theta_1 - \theta_2|$. Define the sets $A_n = \{\hat g(X_t, \theta_n) \in B_\delta\}$ and $B_n = \{\|\hat g - g\|_B < \delta\}$. Hence, for any $\varepsilon > 0$ and $\eta > 0$, we can choose a sufficiently small $\delta > 0$ such that

$$ \Pr(|\nu_n(\hat g, \theta_n) - \nu_n(g, \theta_n)| > \varepsilon) \le \Pr(|\nu_n(\hat g, \theta_n) - \nu_n(g, \theta_n)| > \varepsilon,\ A_n \cap B_n) + \Pr(A_n^c) + \Pr(B_n^c). $$