Consistency of model-based bootstrap methods for degenerate U- and V-type statistics

Anne Leucht
Friedrich-Schiller-Universität Jena
Institut für Stochastik
Ernst-Abbe-Platz 2
D-07743 Jena
Germany
E-mail: [email protected]

Michael H. Neumann
Friedrich-Schiller-Universität Jena
Institut für Stochastik
Ernst-Abbe-Platz 2
D-07743 Jena
Germany
E-mail: [email protected]
Abstract. We provide general results on the consistency of certain bootstrap methods applied to degree-2 degenerate statistics of U- and V-type. While it follows from well-known results that the original statistic converges in distribution to a weighted sum of centered chi-squared random variables, a direct proof for the bootstrap counterpart can be difficult for several reasons. We employ simple coupling arguments to show that the bootstrap statistic converges to the same limit as the original one.
2000 Mathematics Subject Classification. Primary 62G09, 62F40; secondary 62G20, 62F05.
Keywords and Phrases. Bootstrap, consistency, U-statistics, V-statistics.
Short title. Bootstrap consistency for U- and V-statistics.
1. Introduction

The bootstrap is a well-established universal tool for approximating the distribution of any statistic of interest. Sometimes its asymptotic validity for a particular purpose can be inferred from a general result; quite often, however, its consistency has to be checked in a case-by-case manner. Available results of general type include those by Bickel and Freedman (1981) for Efron's bootstrap applied to linear and related statistics, Arcones and Giné (1992) for Efron's bootstrap in connection with U- and V-statistics, and Stute, González Manteiga and Presedo Quindimil (1993), who showed that the bootstrap version of the empirical process with estimated parameters converges to the same limit as the original empirical process. In this paper we derive quite general results on the asymptotic validity of both model-based and model-independent bootstrap methods applied to statistics of degenerate U- and V-type. Statistics of this type often emerge from goodness-of-fit tests; see for example de Wet (1987, Sect. 2). Under the usual assumptions, the limit distribution of such a U-statistic is that of a weighted sum of independent, centered χ² random variables with 1 degree of freedom. In a few cases this distribution is actually known; see for example Darling (1955, Sect. 7) and Moore and Spruill (1975). In the overwhelming majority of cases, however, the weights in the limit random variable depend on an unknown parameter in a complicated way. Then the bootstrap offers a convenient and perhaps unrivalled way of determining critical values for tests. We adapt a method of proof originally proposed by Bickel and Freedman (1981) for proving consistency of Efron's bootstrap for statistics such as the sample mean and related quantities, and modified by Dehling and Mikosch (1994) for Efron's bootstrap in connection with degenerate U-statistics of i.i.d. real-valued random variables.
Bickel and Freedman employed the fact that Mallows' distance between the distribution of the sample mean and that of its bootstrap counterpart can be bounded from above by Mallows' distance between the sample distribution and the bootstrap distribution. Therefore, it was not necessary to re-derive the asymptotic distribution of the statistic of interest on the bootstrap side; rather, it sufficed to check convergence of the respective distributions at the level of individual random variables. In the case of U- or V-statistics, one cannot simply copy this scheme of proof since the summands in the statistic of interest are not independent in general. Dehling and Mikosch (1994) have shown, however, that a simple coupling of the underlying random variables can be used to show that a degenerate U-statistic and its bootstrap counterpart converge to the same limit. We extend this idea to more general bootstrap schemes and to the case of degenerate U- and V-statistics with kernels which may depend on some parameter that has to be estimated. We note that an attempt to achieve general results of this type has already been made by Jiménez-Gamero, Muñoz-García and Pino-Mejías (2003). Apart from the fact that we are not convinced that all technical details in their paper are correct, we think that the assumptions for our general results are easier to verify in many instances. For the particular case of goodness-of-fit tests based on an L²-distance, Fan (1998) derived the limit distribution of such test statistics via U-statistics arguments; on the bootstrap side, however, he dismissed this possibility and employed empirical process arguments. While Jiménez-Gamero et al. (2003) and Fan (1998) imposed some regularity conditions on the densities of the parametric family of random variables involved, we try to avoid such conditions since they can hardly be checked in cases where these densities do not have a closed form; see for example our application in Subsection 3.1 below.
We also note that in cases where the U- or V-statistic emerges from a Cramér-von Mises test one can alternatively use the
stochastic process approach in conjunction with the continuous mapping theorem to show consistency of the bootstrap; see for example Stute et al. (1993). Sometimes, however, this route turns out to be rather cumbersome, and our approach then offers a simple and easily applicable alternative.

Our paper is organized as follows. In Section 2 we derive general results for the validity of model-based bootstrap schemes applied to U- and V-statistics. These results are used in Section 3 for devising a goodness-of-fit test based on the empirical characteristic function and its model-based estimate. In this particular case, there do not exist closed-form expressions for the densities of the observations, and our approach seems to be actually easier than competing ones. We also revisit a two-sample test originally proposed by Anderson, Hall and Titterington (1994), for which a proof that the bootstrap quantity converges to the same limit as the original test statistic was lacking. A slight modification of the approach from Section 2 gives a simple proof of consistency of the bootstrap in this case. All proofs are deferred to a final Section 4.

2. Consistency of model-based bootstrap methods for U- and V-statistics

Throughout this section we make the following assumption:

(A1) (i) (X_n)_{n∈N} is a sequence of independent, identically distributed R^d-valued random variables, defined on a probability space (Ω, A, P), with common distribution function F_θ, where θ ∈ Θ ⊆ R^p.
(ii) The kernel h(·, ·; θ) is symmetric in its first two arguments and degenerate under F_θ, that is, \int h(x, y; \theta) \, dF_\theta(x) = 0 for all y ∈ R^d.
(iii) 0 < \int \int h^2(x, y; \theta) \, dF_\theta(x) \, dF_\theta(y) < ∞.

It is well known (see, for example, Serfling (1980, p. 194) or Lee (1990, Theorem 3.2.2.1, p. 90)) that under (A1) the following result holds true:

U_n = \frac{1}{n} \sum_{j=1}^{n} \sum_{k \neq j} h(X_j, X_k; \theta) \stackrel{d}{\longrightarrow} Z := \sum_{\nu=1}^{\infty} \lambda_\nu (Z_\nu^2 - 1),   (2.1)

where Z_1, Z_2, … are independent standard normal random variables and the λ_ν are the eigenvalues of the integral equation

\int h(x, y; \theta) g(y) \, dF_\theta(y) = \lambda g(x).
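As a toy illustration of (2.1), not taken from the paper: for the kernel h(x, y) = xy and i.i.d. standard normal X_j (our own choice), the kernel is degenerate since E X_j = 0, the integral equation has the single nonzero eigenvalue λ_1 = 1 with eigenfunction g(x) = x, and the limit is Z_1² − 1. A minimal Monte Carlo check:

```python
import numpy as np

rng = np.random.default_rng(1)

def u_stat(x):
    """U_n = (1/n) * sum_{j != k} x_j * x_k for the kernel h(x, y) = x*y,
    computed in O(n) via the identity (sum x)^2 - sum x^2."""
    return (x.sum() ** 2 - (x ** 2).sum()) / len(x)

# Monte Carlo approximation of the distribution of U_n; by (2.1) it
# should be close to that of Z_1^2 - 1 with Z_1 ~ N(0, 1), i.e. have
# mean 0 and variance 2.
reps = np.array([u_stat(rng.standard_normal(500)) for _ in range(4000)])
print(reps.mean(), reps.var())
```

The O(n) identity above avoids the double sum over j ≠ k.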
Furthermore, since \sum_{\nu=1}^{\infty} \lambda_\nu^2 = E_\theta h^2(X_1, X_2; \theta) < ∞ (see Serfling (1980, p. 197)), the infinite sum on the right-hand side of (2.1) actually converges in L². We intend to derive simple criteria for the consistency of model-based bootstrap versions of U_n. When doing so, we have to take into account that the distribution of the bootstrap random variables X_1^*, …, X_n^* is random, typically converging to that of the original random variables X_1, …, X_n. To give a clear description of our basic idea, we consider first the simpler situation where the distribution of U_n = U_n(X_1, …, X_n; θ) has to be compared with that of

U_{nn} = \frac{1}{n} \sum_{j=1}^{n} \sum_{k \neq j} h(X_{nj}, X_{nk}; \theta_n).
These random variables should imitate what typically happens with the bootstrap in probability; that is, we consider the case that X_{n1}, …, X_{nn} are independent with a distribution converging to that of X_1. We make the following assumption:

(A2) (i) (X_{nj})_{j=1,…,n}, n ∈ N, is a triangular scheme of random variables defined on respective probability spaces (Ω_n, A_n, P_n), where X_{n1}, …, X_{nn} are independent with common distribution function F_n. Furthermore, it holds that F_n ⇒ F and θ_n → θ as n → ∞.
(ii) h: R^d × R^d × R^p → R is a continuous function and \int h(x, y; \theta_n) \, dF_n(x) = 0 for all y ∈ R^d, that is, the kernel h(·, ·; θ_n) is degenerate under F_n.
(iii) The sequence of random variables (h^2(X_{n1}, X_{n2}; \theta_n))_{n∈N} is uniformly integrable.

Lemma 2.1. Suppose that (A1) and (A2) are fulfilled. Then, as n → ∞,

U_{nn} \stackrel{d}{\longrightarrow} Z,

where the random variable Z is defined in equation (2.1) above. Moreover,

\sup_{-\infty < x < \infty} \bigl| P(U_{nn} \le x) - P(U_n \le x) \bigr| \mathop{\longrightarrow}_{n \to \infty} 0.

In the genuine bootstrap setting, the sample X_1^*, …, X_n^* is generated conditionally on X_1, …, X_n, and (A2) has to be replaced by a bootstrap analogue (A2^*) in which the above conditions hold conditionally on the original sample, in probability. In particular, the uniform-integrability condition then takes the form: for all ε > 0 there exists some M < ∞ such that

P\Bigl( E\bigl[ h^2(X_1^*, X_2^*; \hat\theta_n) \, 1\bigl(h^2(X_1^*, X_2^*; \hat\theta_n) > M\bigr) \mid X_1, \ldots, X_n \bigr] > \varepsilon \Bigr) \mathop{\longrightarrow}_{n \to \infty} 0.
Theorem 2.1. Suppose that (A1) and (A2^*) are fulfilled. Then, as n → ∞,

U_n^* \stackrel{d}{\longrightarrow} Z \quad \text{in probability,}
where Z is defined in (2.1) above. Moreover,

\sup_{-\infty < x < \infty} \bigl| P(U_n^* \le x \mid X_1, \ldots, X_n) - P(U_n \le x) \bigr| \stackrel{P}{\longrightarrow} 0.

3. Applications

3.1. A goodness-of-fit test based on the empirical characteristic function. We consider independent increments X_1, …, X_n of a normal inverse Gaussian (NIG) process with parameter θ = (α, β, µ, δ)′ from the parameter space

Θ = { (α, β, µ, δ)′ : 0 ≤ |β| < α, δ > 0, µ ∈ R }.
The test problem can be formulated in terms of the characteristic function c of the increments X_j as

H_0: c = c(·; θ) for some θ ∈ Θ   against   H_1: c ≠ c(·; θ) for all θ ∈ Θ.
We consider the following test statistic:

T_n = n \int_{-\infty}^{\infty} \bigl| \hat c_n(t) - c(t; \hat\theta_n) \bigr|^2 w(t) \, dt,

where \hat c_n(t) = n^{-1} \sum_{j=1}^{n} \exp(itX_j) is the empirical characteristic function of the increments, θ̂_n is some estimator of θ, and w: R → [0, ∞) is some weight function. For the latter we assume that

(A5) The function w is symmetric around 0, vanishes only on a set of Lebesgue measure zero and satisfies \int (1 + |t|)^4 w(t) \, dt < ∞.

For definiteness, we restrict our attention to a method-of-moments estimator θ̂_n of θ. Under the null hypothesis, the distribution of the increments X_j has the following characteristics (see, for example, Schoutens (2005)):

• mean: µ + δβ/√(α² − β²),
• variance: δα²/(α² − β²)^{3/2},
• skewness: 3β/(αδ^{1/2}(α² − β²)^{1/4}),
• kurtosis: 3(1 + (α² + 4β²)/(δα²√(α² − β²))).
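For a concrete implementation of T_n, the characteristic function of the NIG law, c(t; θ) = exp(iµt + δ(√(α² − β²) − √(α² − (β + it)²))), can be combined with the empirical characteristic function and a quadrature rule. The sketch below is our own illustration, not the paper's implementation: the Gaussian weight w(t) = exp(−t²) (which satisfies (A5)), the integration grid and the function names are our assumptions.

```python
import numpy as np

def nig_cf(t, alpha, beta, mu, delta):
    """NIG characteristic function
    c(t; theta) = exp(i*mu*t + delta*(gamma - sqrt(alpha^2 - (beta + i*t)^2)))
    with gamma = sqrt(alpha^2 - beta^2)."""
    gamma = np.sqrt(alpha ** 2 - beta ** 2)
    return np.exp(1j * mu * t + delta * (gamma - np.sqrt(alpha ** 2 - (beta + 1j * t) ** 2)))

def t_stat(x, theta, tgrid=None):
    """Approximate T_n = n * int |c_hat_n(t) - c(t; theta)|^2 w(t) dt
    with weight w(t) = exp(-t^2), by a Riemann sum on tgrid."""
    if tgrid is None:
        tgrid = np.linspace(-10.0, 10.0, 2001)
    alpha, beta, mu, delta = theta
    c_hat = np.exp(1j * np.outer(tgrid, np.asarray(x))).mean(axis=1)  # empirical cf
    w = np.exp(-tgrid ** 2)
    integrand = np.abs(c_hat - nig_cf(tgrid, alpha, beta, mu, delta)) ** 2 * w
    return len(x) * integrand.sum() * (tgrid[1] - tgrid[0])

# example: value of T_n for a (non-NIG) standard normal sample
rng = np.random.default_rng(0)
print(t_stat(rng.standard_normal(200), (2.0, 0.5, 0.0, 1.0)))
```

Note that c'(0) = i · (mean), so the moment formulas above can be recovered numerically from nig_cf.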
Accordingly, we obtain θ̂_n = (α̂_n, β̂_n, µ̂_n, δ̂_n)′ by equating the first four theoretical moments with their empirical counterparts. It can be shown that θ is a twice differentiable function of the first four theoretical moments and, vice versa, that these moments are continuous functions of θ. Similar results are obtained for θ̂_n, applying the empirical moments instead. Thus, by a Taylor expansion, there exists some function g: R^4 × R → R^4 such that

\hat\theta_n - \theta = \frac{1}{n} \sum_{j=1}^{n} g(\theta; X_j) + o_P(n^{-1/2})   (3.2)

and E_θ g(θ; X_j) = 0, E_θ ‖g(θ; X_j)‖²_{l₂} < ∞. This implies in particular that

\hat\theta_n - \theta = O_P(n^{-1/2}).   (3.3)
Since c is twice differentiable with respect to θ, a Taylor series expansion leads to

T_n = n \int_{-\infty}^{\infty} \Bigl| \hat c_n(t) - c(t; \theta) - Dc(t; \theta)(\hat\theta_n - \theta) - \tfrac{1}{2} (\hat\theta_n - \theta)' D^2 c(t; \bar\theta)(\hat\theta_n - \theta) \Bigr|^2 w(t) \, dt,

for some θ̄(t) between θ̂_n and θ, where Dc and D²c denote the gradient (row) vector and the Hessian matrix of c with respect to θ, respectively. Since assumption (A5) ensures the w-integrability of ‖D²c(t; θ̄(t))‖²_{l₂}, we obtain by (3.3) that T_n = I_n + o_P(1), where

I_n = n \int_{-\infty}^{\infty} \bigl| \hat c_n(t) - c(t; \theta) - Dc(t; \theta)(\hat\theta_n - \theta) \bigr|^2 w(t) \, dt.

Furthermore, (3.2) allows us to approximate further: T_n = I_n' + o_P(1), where

I_n' = \int_{-\infty}^{\infty} \Bigl| n^{-1/2} \sum_{j=1}^{n} \{ \exp(itX_j) - c(t; \theta) - Dc(t; \theta) g(\theta; X_j) \} \Bigr|^2 w(t) \, dt.

The statistic I_n' is a degenerate V-statistic with a symmetric kernel
h(x, y; \theta) = \int_{-\infty}^{\infty} \{ \exp(itx) - c(t; \theta) - Dc(t; \theta) g(\theta; x) \} \times \{ \exp(-ity) - c(-t; \theta) - Dc(-t; \theta) g(\theta; y) \} \, w(t) \, dt.

Moreover, we have that 0 < E_θ h²(X_1, X_2; θ) < ∞ and E_θ |h(X_1, X_1; θ)| < ∞. Hence, assumptions (A1) and (A3) from Section 2 are fulfilled and we obtain from (2.2) the following proposition.

Proposition 3.1. Suppose that X_1, …, X_n are the increments of a NIG process with parameter θ, that is, they are independent with common characteristic function c(·; θ). Furthermore, suppose the weight function w is chosen such that (A5) is fulfilled. Then
T_n \stackrel{d}{\longrightarrow} \sum_{\nu=1}^{\infty} \lambda_\nu (Z_\nu^2 - 1) + E_\theta h(X_1, X_1; \theta),

where Z_1, Z_2, … are independent standard normal random variables and the λ_ν are the eigenvalues of the equation

E_\theta [ h(x, X_1; \theta) g(X_1) ] = \lambda g(x).
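The eigenvalues λ_ν of this equation are rarely available in closed form; for any given kernel they can, however, be approximated by the spectrum of the empirical kernel matrix H with entries H_jk = h(X_j, X_k)/n, a Nyström-type discretization. This is our own illustrative device, not a step of the paper, shown here with the toy kernel h(x, y) = xy and standard normal data, for which λ_1 = 1 is the only nonzero eigenvalue:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.standard_normal(2000)

# Empirical kernel matrix H_jk = h(X_j, X_k) / n for h(x, y) = x*y;
# its spectrum approximates the eigenvalues of the integral operator
# g -> E[h(., X_1) g(X_1)].
H = np.outer(x, x) / len(x)
eigvals = np.sort(np.linalg.eigvalsh(H))[::-1]
print(eigvals[:3])
```

For this rank-one kernel matrix the top eigenvalue is (Σ_j X_j²)/n, which converges to Var(X_1) = 1; all other eigenvalues vanish up to floating-point noise.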
To implement a test which asymptotically has a prescribed size α, we still have to determine an appropriate critical value. This can hardly be done on the basis of the asymptotic result in Proposition 3.1 alone, since the limit distribution depends on the eigenvalues λ_ν which in turn depend on the true parameter θ in a complicated way. Therefore, we propose the following bootstrap procedure:

(1) Given X_1, …, X_n, estimate θ by θ̂_n and generate a sample X_1^*, …, X_n^* with common density f_{NIG}(·; θ̂_n). According to Barndorff-Nielsen (1997, p. 2), a random variable X with density f_{NIG}(·; α, β, µ, δ) can be generated by first drawing a random variable Z with an inverse Gaussian distribution with parameters δ and √(α² − β²), and then, conditioned on Z = z, drawing X ∼ N(µ + βz, z). See also Schoutens (2005, p. 111 f.) for simulating an inverse Gaussian distributed random variable. The bootstrap sample, conditioned on X_1, …, X_n, satisfies H_0 with θ̂_n instead of θ.

(2) Define a bootstrap counterpart θ̂_n^* to θ̂_n which is based on the bootstrap sample by the method of moments, and set \hat c_n^*(t) = n^{-1} \sum_{j=1}^{n} \exp(itX_j^*). Then

T_n^* = n \int_{-\infty}^{\infty} \bigl| \hat c_n^*(t) - c(t; \hat\theta_n^*) \bigr|^2 w(t) \, dt

is the bootstrap version of our test statistic T_n.

(3) Define the critical value t_α^* as the (1 − α)-quantile of the (conditional) distribution of T_n^*. (In practice, steps (1) and (2) will be repeated B times, for some large B. The critical value will then be approximated by the (1 − α)-quantile of the empirical distribution associated with T_{n1}^*, …, T_{nB}^*.)

To justify this approach, we briefly argue that the conditional distribution of T_n^* given X_1, …, X_n converges (this time even P_θ-almost surely) to the same limit as that of T_n. First, it follows from the strong law of large numbers that

\hat\theta_n \stackrel{P_\theta\text{-a.s.}}{\longrightarrow} \theta.   (3.4)
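Step (1) of the procedure can be sketched in code. This is our own minimal implementation: the inverse Gaussian variates are generated with the Michael-Schucany-Haas transformation (one of the standard simulation methods; the paper points to Schoutens (2005) instead), where IG(m, λ) denotes the inverse Gaussian law with mean m and shape λ, so that Z ∼ IG(δ/γ, δ²) with γ = √(α² − β²):

```python
import numpy as np

def rinvgauss(n, m, lam, rng):
    """IG(mean=m, shape=lam) variates via the Michael-Schucany-Haas
    transformation method."""
    y = rng.standard_normal(n) ** 2
    x = m + (m ** 2 * y - m * np.sqrt(4.0 * m * lam * y + m ** 2 * y ** 2)) / (2.0 * lam)
    u = rng.uniform(size=n)
    return np.where(u <= m / (m + x), x, m ** 2 / x)

def rnig(n, alpha, beta, mu, delta, rng):
    """NIG variates via the mixture representation of step (1):
    Z ~ IG(delta/gamma, delta^2) with gamma = sqrt(alpha^2 - beta^2),
    then X | Z = z ~ N(mu + beta*z, z)."""
    gamma = np.sqrt(alpha ** 2 - beta ** 2)
    z = rinvgauss(n, delta / gamma, delta ** 2, rng)
    return mu + beta * z + np.sqrt(z) * rng.standard_normal(n)

rng = np.random.default_rng(3)
x = rnig(200_000, alpha=2.0, beta=0.5, mu=0.0, delta=1.0, rng=rng)
# sample mean and variance should approach
# mu + delta*beta/gamma  and  delta*alpha^2/gamma^3, respectively
```

Drawing n such variates with θ replaced by θ̂_n yields the bootstrap sample X_1^*, …, X_n^*.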
Analogously to (3.2), we obtain that

\hat\theta_n^* - \hat\theta_n = \frac{1}{n} \sum_{j=1}^{n} g(\hat\theta_n; X_j^*) + R_n^*,   (3.5)

where

P\bigl( |R_n^*| > \varepsilon n^{-1/2} \mid X_1, \ldots, X_n \bigr) \stackrel{P_\theta\text{-a.s.}}{\longrightarrow} 0 \qquad \forall \varepsilon > 0.
Equations (3.4) and (3.5) yield that

P\bigl( |T_n^* - I_n'^*| > \varepsilon \mid X_1, \ldots, X_n \bigr) \stackrel{P_\theta\text{-a.s.}}{\longrightarrow} 0 \qquad \forall \varepsilon > 0,

where I_n'^* = n^{-1} \sum_{j,k=1}^{n} h(X_j^*, X_k^*; \hat\theta_n). It follows from the construction that

E\bigl( h(X_1^*, y; \hat\theta_n) \mid X_1, \ldots, X_n \bigr) = 0 \qquad \forall y \in R.
And finally, we obtain from (3.4) that the random variables (h²(X_1^*, X_2^*; θ̂_n))_{n∈N} and (h(X_1^*, X_1^*; θ̂_n))_{n∈N} are P_θ-almost surely uniformly integrable. To summarize, we have verified that the assumptions (A1), (A2^*), (A3), and (A4^*) are fulfilled, even P_θ-almost surely rather than in probability. Hence, we obtain from Theorem 2.2 the following assertion.
Proposition 3.2. Suppose that the assumptions of Proposition 3.1 are fulfilled. Then

\sup_{-\infty < x < \infty} \bigl| P(T_n^* \le x \mid X_1, \ldots, X_n) - P(T_n \le x) \bigr| \stackrel{P_\theta\text{-a.s.}}{\longrightarrow} 0.

3.2. A two-sample test. In this setting λ_1 > 0, which means in particular that the limit distribution is continuous. Although the limit distribution of the test statistic is now identified, it seems to be rather difficult to determine an appropriate critical value from (3.6) and (3.7). The main obstacle is that the eigenvalues λ_ν depend on the unknown distribution function F in a complicated way. Anderson et al. (1994) proposed to obtain the critical value from a simple bootstrap procedure where bootstrap variates X_1^{(1)*}, …, X_{n_1}^{(1)*}, X_1^{(2)*}, …, X_{n_2}^{(2)*} are independently drawn with replacement from the pooled sample X = {X_1^{(1)}, …, X_{n_1}^{(1)}, X_1^{(2)}, …, X_{n_2}^{(2)}}. We add here a simple proof of the consistency of this method, which was lacking in Anderson et al. (1994). As in Section 2, we consider first the simple situation where the distribution of T_n = T_n(X_1, …, X_n; F) has to be compared with that of

T_{nn} = n \int_{R^d} \bigl( \hat f_{n1}(x) - \hat f_{n2}(x) \bigr)^2 \, dx,

where \hat f_{ni}(x) = n_i^{-1} \sum_{j=1}^{n_i} K(x - X_{nj}^{(i)}), i = 1, 2, and (X_{nj}^{(i)})_{j=1,\ldots,n_i}, n ∈ N, are triangular schemes of row-wise independent random variables. These random variables will again imitate what happens with the bootstrap, this time even P-almost surely. We will assume that
(A7) (X_{nj}^{(1)})_{j=1,…,n_1} and (X_{nj}^{(2)})_{j=1,…,n_2} are triangular schemes of random variables defined on respective probability spaces (Ω_n, A_n, P_n), where X_{n1}^{(1)}, …, X_{nn_1}^{(1)}, X_{n1}^{(2)}, …, X_{nn_2}^{(2)} are independent with a common distribution function F_n, and F_n ⇒ F as n → ∞.

Lemma 3.1. Suppose that (A6) and (A7) are fulfilled. Then, as n → ∞,

T_{nn} - E_F T_n \stackrel{d}{\longrightarrow} \sum_{\nu=1}^{\infty} \lambda_\nu \left[ \left( \frac{1}{\sqrt{\rho}} Z_{1\nu} + \frac{1}{\sqrt{1-\rho}} Z_{2\nu} \right)^2 - \left( \frac{1}{\rho} + \frac{1}{1-\rho} \right) \right],

where the constants λ_ν and the random variables Z_{iν} are defined right after equation (3.7) above. Moreover,

\sup_{-\infty < x < \infty} \bigl| P(T_{nn} - E_F T_n \le x) - P(T_n - E_F T_n \le x) \bigr| \mathop{\longrightarrow}_{n \to \infty} 0.