METRON - International Journal of Statistics 2004, vol. LXII, n. 2, pp. 201-222

PIER LUIGI CONTI

Approximated inference for the quantile function via Dirichlet processes

Summary - In this paper, a nonparametric Bayesian analysis of quantiles is performed. The prior law of the population distribution function is assumed to be a Dirichlet process. After deriving the posterior law of quantiles, conditions for the existence of its moments are studied. Then, the posterior law of the quantile process is shown to converge weakly to a Gaussian process as the sample size increases. This result is proved by resorting to an appropriate almost sure representation. Related results for bootstrap approximations of the (posterior) quantile process are also given, and applications to confidence bands for the quantile function are provided. Finally, extensions to mixtures of Dirichlet processes are briefly discussed.

Key Words - Quantile function; Dirichlet process; Bernstein-von Mises theorem.

Received February 2004 and revised May 2004.

1. Introduction

The problem of estimating quantiles is of considerable importance in nonparametric statistics. This paper deals with the problem of estimating the quantiles of a population from a Bayesian point of view. A few results on the estimation of quantiles in a Bayesian nonparametric framework are in Ferguson (1973). As in Ferguson's paper, we adopt here a nonparametric approach based on a Dirichlet prior law. The results obtained will then be generalized to mixtures of Dirichlet processes.

More precisely, let (X_n; n ≥ 1) be a sequence of real-valued random variables (r.v.s), independent and identically distributed (i.i.d.) conditionally on F, where F(x) = P(X_i ≤ x | F), with x lying on the real line \mathbb{R}. As a prior law for F, we take a Dirichlet process with shape measure α(·). For every −∞ = x_0 < x_1 < ... < x_k < x_{k+1} = +∞, and for every k ≥ 1, the joint distribution of (F(x_1) − F(x_0), F(x_2) − F(x_1), ..., F(x_{k+1}) − F(x_k))


is a Dirichlet distribution with density function

$$
f(y_1,\ldots,y_{k+1}) = \frac{\Gamma(a)}{\prod_{j=1}^{k+1} \Gamma\big(a(\pi(x_j)-\pi(x_{j-1}))\big)} \, \prod_{j=1}^{k+1} y_j^{\,a(\pi(x_j)-\pi(x_{j-1}))-1},
\qquad y_j \ge 0, \quad \sum_{j=1}^{k+1} y_j = 1,
$$

where a = α(\mathbb{R}), and π(x) = a^{-1} α((−∞, x]) for every real x. Conditionally on the observed sample data \mathbf{X}_n = (X_1, ..., X_n), the posterior law of F is still a Dirichlet process, with shape measure

$$
\alpha_n(B) = \frac{1}{n+a}\,\alpha(B) + \frac{n}{n+a}\,\frac{1}{n}\sum_{i=1}^{n} \delta_{X_i}(B), \qquad B \in \mathcal{B}(\mathbb{R}),
$$

where \mathcal{B}(\mathbb{R}) is the Borel class on \mathbb{R}, and δ_x(·) is the Dirac measure concentrated on x.
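Both the prior finite-dimensional law above and its posterior update are ordinary Dirichlet distributions, so they can be sampled directly. A minimal sketch in Python (the variable names and the choice of a standard normal base measure π are illustrative assumptions, not part of the paper):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# Illustrative setup: shape measure alpha = a * N(0,1), i.e. pi = norm.cdf.
a, n = 2.0, 50
data = rng.normal(size=n)

# Partition cells (-inf, x_1], (x_1, x_2], ..., (x_k, +inf).
xs = np.array([-1.0, 0.0, 1.0])
edges = np.concatenate(([-np.inf], xs, [np.inf]))

prior_mass = a * np.diff(norm.cdf(edges))    # a*(pi(x_j) - pi(x_{j-1}))
counts = np.histogram(data, bins=edges)[0]   # number of observations per cell

prior_draw = rng.dirichlet(prior_mass)            # increments of F under the prior
post_draw = rng.dirichlet(prior_mass + counts)    # increments of F given X_n
```

The posterior parameters (prior cell mass plus cell count) are exactly the α_n-masses of the cells, rescaled by n + a.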

The posterior mean of F is equal to

$$
\bar F_n(x) = E[F(x) \mid \mathbf{X}_n] = \frac{a}{n+a}\,\pi(x) + \frac{n}{n+a}\,F_n(x), \qquad (1)
$$

where

$$
F_n(x) = \frac{1}{n} \sum_{i=1}^{n} I_{(-\infty,x]}(X_i)
$$

is the empirical distribution function (e.d.f.), and I_A(x) is the indicator of the set A.

For every 0 < y < 1, define the quantile of order y (yth quantile, for short) of F as

$$
Q(y) = F^{-1}(y) = \inf\{x : F(x) \ge y\}. \qquad (2)
$$

As is well known (and easy to verify), the relationship

$$
F(x) \ge y \iff F^{-1}(y) \le x \qquad (3)
$$

holds.

For a fixed 0 < y < 1, the exact posterior law of the yth quantile, F^{-1}(y), can be easily evaluated. In fact, from (3), it is immediate to show that

$$
P(F^{-1}(y) \le x \mid \mathbf{X}_n) = P(F(x) \ge y \mid \mathbf{X}_n)
= \frac{1}{B\big((n+a)\bar F_n(x),\, (n+a)(1-\bar F_n(x))\big)} \int_y^1 t^{(n+a)\bar F_n(x)-1} (1-t)^{(n+a)(1-\bar F_n(x))-1}\, dt,
$$

where

$$
B(\alpha, \beta) = \int_0^1 t^{\alpha-1} (1-t)^{\beta-1}\, dt
$$

is the usual Beta function.
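In other words, F(x) | \mathbf{X}_n is marginally Beta((n+a)F̄_n(x), (n+a)(1−F̄_n(x))), so the displayed probability is an upper-tail Beta integral available in any scientific library. A minimal sketch (the function name and the default π = norm.cdf are illustrative assumptions):

```python
import numpy as np
from scipy.stats import beta, norm

def post_F_geq(x, y, data, a, pi=norm.cdf):
    """P(F(x) >= y | X_n) = P(F^{-1}(y) <= x | X_n):
    upper tail of the Beta((n+a)Fbar_n(x), (n+a)(1-Fbar_n(x))) marginal.
    Assumes x is interior, so that 0 < Fbar_n(x) < 1."""
    n = len(data)
    Fbar = (a * pi(x) + np.sum(data <= x)) / (n + a)   # posterior mean (1)
    return beta.sf(y, (n + a) * Fbar, (n + a) * (1.0 - Fbar))
```

Evaluating post_F_geq over a grid of x values traces out the exact posterior c.d.f. of the yth quantile.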

From now on, we will denote by X_{n:1} ≤ ... ≤ X_{n:n} the ordered sample observations. Consider three independent stochastic processes defined as follows:

a. a Gamma process (μ(x); x ∈ \mathbb{R}) with shape measure α(·) (i.e. an independent-increments stochastic process where μ(x) possesses a Gamma distribution Ga(α((−∞, x]), 1));
b. a sequence (Y_n; n ≥ 1) of i.i.d. r.v.s with exponential Ex(1) distribution;
c. a sequence (U_n; n ≥ 1) of i.i.d. r.v.s with uniform U(0, 1) distribution;

and let S_n be equal to Y_1 + ... + Y_n, n ≥ 1, and D_n(·) be a distribution function that gives masses U_{n-1:j} − U_{n-1:j-1} to the points X_{n:j}, j = 1, ..., n, respectively (with U_{n-1:0} = 0, U_{n-1:n} = 1). Define the random function

$$
F_n^*(x) = \frac{\mu(x)}{\mu(\infty) + S_n} + \frac{S_n}{\mu(\infty) + S_n}\, D_n(x). \qquad (4)
$$

The following well-known representation formula for the posterior Dirichlet process (see, for instance, Lo (1987)):

$$
\{F(x);\ -\infty < x < +\infty\} \stackrel{d}{=} \{F_n^*(x);\ -\infty < x < +\infty\} \qquad (5)
$$

holds true, where the symbol $\stackrel{d}{=}$ means "equality in distribution".
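Representation (4)-(5) also gives a direct recipe for simulating posterior trajectories of F on a grid: independent Gamma increments for μ(·), an independent S_n ~ Ga(n, 1), and uniform-spacing masses at the order statistics. A grid-based sketch under the same illustrative assumptions as above:

```python
import numpy as np
from scipy.stats import norm

def sample_posterior_F(grid, data, a, pi=norm.cdf, rng=np.random.default_rng()):
    """One posterior draw of (F(t); t in grid) via representation (4)-(5)."""
    n = len(data)
    # Gamma process mu(.) with shape measure a*pi: independent Gamma increments
    # at the grid points; the last increment carries the mass beyond the grid.
    p = np.concatenate(([0.0], pi(grid), [1.0]))
    incr = rng.gamma(shape=np.maximum(a * np.diff(p), 1e-300))  # guard shape > 0
    mu = np.cumsum(incr)
    mu_grid, mu_inf = mu[:-1], mu[-1]
    S_n = rng.gamma(shape=n)                # S_n = Y_1 + ... + Y_n, Y_i ~ Ex(1)
    # D_n: uniform-spacing masses at the order statistics X_{n:1} <= ... <= X_{n:n}
    u = np.sort(rng.uniform(size=n - 1))
    w = np.diff(np.concatenate(([0.0], u, [1.0])))  # n spacings
    xs = np.sort(data)
    Dn = np.array([w[xs <= t].sum() for t in grid])
    return (mu_grid + S_n * Dn) / (mu_inf + S_n)    # formula (4)
```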

2. Posterior mean of quantiles

The main goal of this section is to study the posterior mean of quantiles, E[Q(y) | \mathbf{X}_n]. Our first result concerns the existence of the moments of the posterior law of quantiles.

Proposition 1. Let r be a positive integer. If

$$
\int_{-\infty}^{+\infty} |x|^r \, d\pi(x) < \infty, \qquad (6)
$$

then E[|Q(y)|^r \mid \mathbf{X}_n] < \infty for every 0 < y < 1.

Proof. See Appendix.


In particular, from (6) it follows that the posterior mean of Q(y) exists provided that $\int |x|\,d\pi(x) < \infty$, and that the posterior variance V[Q(y) | \mathbf{X}_n] is finite if $\int x^2\,d\pi(x) < \infty$.

The exact computation of the posterior mean E[Q(y) | \mathbf{X}_n] is difficult in many cases. For this reason, it is of interest to study some approximations. In the sequel, we consider two different approximations of E[Q(y) | \mathbf{X}_n]. The starting point consists in using the formula (obtained by a little re-elaboration of (21.9) in Billingsley (1995))

$$
\begin{aligned}
E[Q(y) \mid \mathbf{X}_n] &= -\int_{-\infty}^{0} P(Q(y) \le x \mid \mathbf{X}_n)\,dx + \int_{0}^{\infty} P(Q(y) > x \mid \mathbf{X}_n)\,dx \\
&= -\int_{-\infty}^{0} P(F(x) \ge y \mid \mathbf{X}_n)\,dx + \int_{0}^{\infty} P(F(x) < y \mid \mathbf{X}_n)\,dx. \qquad (7)
\end{aligned}
$$

Let x_{n:h} be the unique (ordered) sample observation such that x_{n:h} ≥ 0 and x_{n:h-1} < 0. We then have

$$
\int_{0}^{\infty} P(F(x) < y \mid \mathbf{X}_n)\,dx = \int_{0}^{x_{n:h}} P(F(x) < y \mid \mathbf{X}_n)\,dx
+ \sum_{j=h}^{n-1} \int_{x_{n:j}}^{x_{n:j+1}} P(F(x) < y \mid \mathbf{X}_n)\,dx
+ \int_{x_{n:n}}^{\infty} P(F(x) < y \mid \mathbf{X}_n)\,dx \qquad (8)
$$

and similarly

$$
\int_{-\infty}^{0} P(F(x) \ge y \mid \mathbf{X}_n)\,dx = \int_{-\infty}^{x_{n:1}} P(F(x) \ge y \mid \mathbf{X}_n)\,dx
+ \sum_{j=1}^{h-2} \int_{x_{n:j}}^{x_{n:j+1}} P(F(x) \ge y \mid \mathbf{X}_n)\,dx
+ \int_{x_{n:h-1}}^{0} P(F(x) \ge y \mid \mathbf{X}_n)\,dx. \qquad (9)
$$

Using the inequalities F(x_{n:j}) ≤ F(x) ≤ F(x_{n:j+1}) for x_{n:j} ≤ x ≤ x_{n:j+1}, j = 1, ..., n−1, we have the following approximations for the integrals that appear in (8), (9):

$$
\int_{x_{n:j}}^{x_{n:j+1}} P(F(x) < y \mid \mathbf{X}_n)\,dx \approx (x_{n:j+1} - x_{n:j})\, \frac{P(F(x_{n:j+1}) < y \mid \mathbf{X}_n) + P(F(x_{n:j}) < y \mid \mathbf{X}_n)}{2} \qquad (10)
$$

$$
\int_{0}^{x_{n:h}} P(F(x) < y \mid \mathbf{X}_n)\,dx \approx x_{n:h}\, \frac{P(F(x_{n:h}) < y \mid \mathbf{X}_n) + P(F(0) < y \mid \mathbf{X}_n)}{2} \qquad (11)
$$

$$
\int_{x_{n:j}}^{x_{n:j+1}} P(F(x) \ge y \mid \mathbf{X}_n)\,dx \approx (x_{n:j+1} - x_{n:j})\, \frac{P(F(x_{n:j+1}) \ge y \mid \mathbf{X}_n) + P(F(x_{n:j}) \ge y \mid \mathbf{X}_n)}{2} \qquad (12)
$$

$$
\int_{x_{n:h-1}}^{0} P(F(x) \ge y \mid \mathbf{X}_n)\,dx \approx -x_{n:h-1}\, \frac{P(F(x_{n:h-1}) \ge y \mid \mathbf{X}_n) + P(F(0) \ge y \mid \mathbf{X}_n)}{2} \qquad (13)
$$

$$
\int_{x_{n:n}}^{\infty} P(F(x) < y \mid \mathbf{X}_n)\,dx \approx 0, \qquad \int_{-\infty}^{x_{n:1}} P(F(x) \ge y \mid \mathbf{X}_n)\,dx \approx 0. \qquad (14)
$$

Taking

$$
\begin{aligned}
w_{n,n} &= \frac{P(F(x_{n:n}) < y \mid \mathbf{X}_n) + P(F(x_{n:n-1}) < y \mid \mathbf{X}_n)}{2}, \\
w_{n,j} &= \frac{P(F(x_{n:j-1}) < y \mid \mathbf{X}_n) - P(F(x_{n:j+1}) < y \mid \mathbf{X}_n)}{2}, \qquad j = 2, \ldots, n-1, \; j \ne h-1,\, h, \\
w_{n,h} &= \frac{P(F(0) < y \mid \mathbf{X}_n) - P(F(x_{n:h+1}) < y \mid \mathbf{X}_n)}{2}, \\
w_{n,h-1} &= \frac{P(F(x_{n:h-2}) < y \mid \mathbf{X}_n) - P(F(0) < y \mid \mathbf{X}_n)}{2}, \\
w_{n,1} &= \frac{2 - P(F(x_{n:2}) < y \mid \mathbf{X}_n) - P(F(x_{n:1}) < y \mid \mathbf{X}_n)}{2},
\end{aligned}
$$

on the basis of (7)-(14) we obtain the following approximation for the posterior expectation of Q(y) via a linear combination of order statistics:

$$
E[Q(y) \mid \mathbf{X}_n] \approx \sum_{j=1}^{n} w_{n,j}\, x_{n:j}. \qquad (15)
$$
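The weights above are exactly a trapezoidal rule for (7) over the nodes x_{n:1}, ..., x_{n:h-1}, 0, x_{n:h}, ..., x_{n:n}, with the two tail integrals dropped as in (14), so it is easiest to implement (15) in that equivalent form. A sketch, again reusing post_F_geq from Section 1 (function names are illustrative):

```python
import numpy as np
from scipy.stats import norm

def trap(f, x):
    """Plain trapezoidal rule on nodes x (returns 0 for fewer than two nodes)."""
    return 0.5 * np.sum(np.diff(x) * (f[1:] + f[:-1]))

def post_mean_quantile_L(y, data, a, pi=norm.cdf):
    """Approximation (15) for E[Q(y) | X_n], in trapezoidal form."""
    xs = np.sort(data)
    neg = np.append(xs[xs < 0], 0.0)        # x_{n:1}, ..., x_{n:h-1}, 0
    pos = np.insert(xs[xs >= 0], 0, 0.0)    # 0, x_{n:h}, ..., x_{n:n}
    geq = np.array([post_F_geq(t, y, data, a, pi) for t in neg])
    lt = np.array([1.0 - post_F_geq(t, y, data, a, pi) for t in pos])
    # Tail integrals beyond x_{n:1} and x_{n:n} are O(1/n) and dropped, as in (14).
    return -trap(geq, neg) + trap(lt, pos)
```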

To have a rough idea of the quality of approximation (15), observe first that, as it appears from the proof of Proposition 1,

$$
\int_{-\infty}^{x_{n:1}} P(F(x) \ge y \mid \mathbf{X}_n)\,dx = O(n^{-1}), \qquad \int_{x_{n:n}}^{\infty} P(F(x) < y \mid \mathbf{X}_n)\,dx = O(n^{-1}).
$$


Furthermore, the differences X_{n:j+1} − X_{n:j} are typically O(n^{-1}) (as a consequence of Lemma 1), while the differences P(F(x) ≥ y | \mathbf{X}_n) − (P(F(x_{n:j+1}) ≥ y | \mathbf{X}_n) + P(F(x_{n:j}) ≥ y | \mathbf{X}_n))/2 are o(1) on every interval (x_{n:j}, x_{n:j+1}). Hence, we may write

$$
E[F^{-1}(y) \mid \mathbf{X}_n] = \sum_{j=1}^{n} w_{n,j}\, x_{n:j} + O(n^{-1}). \qquad (16)
$$

A second approximation of the posterior expectation of Q(y) = F^{-1}(y) is obtained by taking

$$
E[Q(y) \mid \mathbf{X}_n] \approx \bar Q_n(y) = \bar F_n^{-1}(y) = \inf\{x : \bar F_n(x) \ge y\}. \qquad (17)
$$

From the results in the subsequent sections (see, in particular, Proposition 3), it could be argued that E[Q(y) | \mathbf{X}_n] = \bar Q_n(y) + o(n^{-1/2}). This suggests that, at least when the sample size is moderate, approximation (15) should work better than (17).
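Approximation (17) only requires inverting the posterior mean d.f. (1). Since F̄_n jumps at the observations, a generic root finder is not quite appropriate; a monotone bisection returning inf{x : F̄_n(x) ≥ y} up to a tolerance is a safer sketch (the bracketing interval is an assumption of this illustration):

```python
import numpy as np
from scipy.stats import norm

def Fbar(x, data, a, pi=norm.cdf):
    """Posterior mean d.f. (1)."""
    n = len(data)
    return (a * pi(x) + np.sum(data <= x)) / (n + a)

def Q_bar(y, data, a, pi=norm.cdf, lo=-1e6, hi=1e6, tol=1e-9):
    """Approximation (17): Q_bar_n(y) = inf{x : Fbar_n(x) >= y} by bisection.
    Works even though Fbar_n has jumps, because Fbar_n is nondecreasing."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if Fbar(mid, data, a, pi) >= y:
            hi = mid
        else:
            lo = mid
    return hi
```

Comparing post_mean_quantile_L(y, ...) with Q_bar(y, ...) on simulated data gives a concrete feel for the gap between the two approximations.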

3. Large-sample behaviour of the posterior quantile process

The goal of this section is to study the limiting behaviour of the posterior probability law of quantiles. In particular, we will prove a Bernstein-von Mises type theorem for the posterior law of quantiles, when the prior is a Dirichlet process. From a theoretical point of view, a study of the behaviour of the posterior distribution is of interest because in Bayesian nonparametrics there are many cases where the asymptotic normality of the posterior (i.e. the Bernstein-von Mises theorem) does not hold. See, for instance, the papers by Cox (1993) and Freedman (1999) for negative results, and by Kim and Lee (2001) for positive results. From a practical point of view, on the other hand, the Bernstein-von Mises theorem provides a rational basis for approximating the posterior law of quantities of interest, and for constructing Bayesian confidence regions. As a by-product of our results, we will also prove the consistency of the posterior distribution of quantiles. The importance of consistency as a "validation" of Bayesian nonparametric procedures is stressed, for instance, in Wasserman (1998) and Ghosal et al. (1999).

Instead of a single quantile, we consider in the sequel the whole quantile process (Q(y); 0 < y < 1). Our main result (Proposition 3) is that its posterior law, when properly normed, converges weakly to a Brownian bridge, i.e. to a Gaussian stochastic process (B(y); 0 < y < 1) with mean function E[B(y)] = 0 and covariance kernel E[B(y)B(t)] = min(t, y) − ty. The first assumption we make is the existence of a "true population" generating the observed data.


A1. There exists a "true" (population) d.f. F_0 such that (X_n; n ≥ 1) are i.i.d. with common d.f. F_0.

This essentially means that the sequence of observables (X_n; n ≥ 1) lives on the probability space (\mathbb{R}^\infty, \mathcal{B}(\mathbb{R})^\infty, P_0^\infty), with

$$
P_0(B) = \int_B dF_0(x), \qquad B \in \mathcal{B}(\mathbb{R}),
$$

and P_0^\infty is the product measure generated by P_0.

A2. If −∞ ≤ l = inf{x : F_0(x) > 0} and u = sup{x : F_0(x) < 1} ≤ +∞, then the support of π contains (l, u).

Let D[−∞, +∞] be the set of real functions that are right-continuous with left-hand limits on [−∞, +∞]. As is well known (Lo (1983)), under conditions A1, A2, as n goes to infinity, the posterior law of $(\sqrt{n}(F(x) - \bar F_n(x));\ -\infty < x < +\infty)$ converges weakly, in D[−∞, +∞] equipped with the Skorokhod topology, to a centered Gaussian process with covariance kernel min(F_0(x), F_0(y)) − F_0(x)F_0(y). Of course, such a limiting process can be represented as B(F_0(x)), where B(·) is a Brownian bridge. Furthermore, the convergence takes place with P_0^\infty-probability one, i.e. for "almost all" sequences of sample data.

To develop large-sample approximations of the process (Q(y); 0 < y < 1), we make the following further assumptions (where Q_0(y) = F_0^{-1}(y) denotes the "true" population yth quantile).

A3. The d.f. F_0 is twice differentiable on (l, u), with F_0'(x) = f_0(x) > 0 for every x ∈ (l, u).

A4. There is a constant c > 0 such that

$$
\sup_{x \in (l,u)} F_0(x)(1 - F_0(x)) \frac{|f_0'(x)|}{f_0(x)^2} = \sup_{0 < y < 1} y(1-y) \frac{|f_0'(Q_0(y))|}{f_0(Q_0(y))^2} \le c.
$$
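Conditions A1-A4 hold, for instance, when F_0 is standard normal. As a purely illustrative Monte Carlo check of the Gaussian limit (assuming the normalization √n f_0(Q_0(y))(Q(y) − Q̄_n(y)), as in the classical quantile central limit theorem; Proposition 3 should be consulted for the exact statement), one can compare posterior quantile draws, obtained from the sample_posterior_F sketch above, with the Brownian-bridge variance y(1 − y):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(42)
n, a, y = 400, 2.0, 0.5
data = rng.normal(size=n)                      # F_0 = N(0,1) satisfies A1-A4

grid = np.linspace(-4.0, 4.0, 1601)            # wide enough that F reaches y
draws = np.array([sample_posterior_F(grid, data, a, norm.cdf, rng)
                  for _ in range(2000)])       # posterior trajectories of F
Q = grid[np.argmax(draws >= y, axis=1)]        # Q(y) = inf{t : F(t) >= y} per draw

f0 = norm.pdf(norm.ppf(y))                     # f_0(Q_0(y))
z = np.sqrt(n) * f0 * (Q - Q.mean())           # normed posterior quantile draws
print(z.std(), np.sqrt(y * (1 - y)))           # should be close for large n
```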
