ESAIM: Probability and Statistics

May 2001, Vol. 5, 1–31

URL: http://www.emath.fr/ps/

EXACT ADAPTIVE POINTWISE ESTIMATION ON SOBOLEV CLASSES OF DENSITIES

Cristina Butucea¹

Abstract. The subject of this paper is to estimate adaptively the common probability density of n independent, identically distributed random variables. The estimation is done at a fixed point x0 ∈ ℝ, over the density functions that belong to the Sobolev class Wn(β, L). We consider the adaptive problem setup, where the regularity parameter β is unknown and varies in a given set Bn. A sharp adaptive estimator is obtained, and the explicit asymptotic constant associated with its rate of convergence is found.

Mathematics Subject Classification. 62N01, 62N02, 62G20.

Received November 15, 2000. Revised April 9, 2001.

1. Introduction

Consider n independent, identically distributed random variables X1, . . . , Xn, having common unknown probability density f : ℝ → [0, ∞). We assume that f belongs to a Sobolev class of densities. For any L > 0 and β a positive integer, we define the Sobolev class of densities W(β, L) as the set of functions

\[
W(\beta, L) = \left\{ f : \mathbb{R} \to [0,\infty) : \int_{\mathbb{R}} f = 1, \ \int_{\mathbb{R}} \left( f^{(\beta)}(x) \right)^2 dx \le L^2 \right\},
\]

where f^(β) will denote from now on the generalized derivative of order β of f. For an absolutely integrable function f : ℝ → ℝ we may define its Fourier transform F(f)(x) = ∫_ℝ f(y)e^{−ixy} dy, for any x in ℝ. We adopt now a more general definition of the Sobolev class, allowing non-integer values of β > 1/2:

\[
W(\beta, L) = \left\{ f : \mathbb{R} \to [0,\infty) : \int_{\mathbb{R}} f = 1, \ \int_{\mathbb{R}} |\mathcal{F}(f)(x)|^2 \, |x|^{2\beta} \, dx \le 2\pi L^2 \right\}.
\]

Let fn be an estimator of f based on the sample X1, . . . , Xn and x0 a fixed point. The performance of the estimator fn at the point x0 is measured by the maximal risk

\[
R^*_{n,\beta}(f_n, \varphi_{n,\beta}) = \sup_{f \in W(\beta,L)} E_f\left[ \varphi_{n,\beta}^{-q} \, |f_n(x_0) - f(x_0)|^q \right], \tag{1.1}
\]

Keywords and phrases: Density estimation, exact asymptotics, pointwise risk, sharp adaptive estimator.

¹ Université Paris 10, Modal'X, bâtiment G, 200 avenue de la République, 92001 Nanterre, France; e-mail: [email protected] and Université Paris 6, Laboratoire Probabilités et Modèles Aléatoires, 6 rue Clisson, 75013 Paris, France; e-mail: [email protected]
© EDP Sciences, SMAI 2001


where Ef(·) is the expectation with respect to the distribution Pf of X1, . . . , Xn when the underlying probability density is f, φn,β is a given sequence of positive numbers and q > 0. The sequence φn,β such that the maximal risk in (1.1) remains asymptotically positive for all estimation procedures fn, and asymptotically finite for some explicit estimator, is called the optimal rate of convergence for the class W(β, L). Using the argument of optimal recovery as in Donoho and Low [10], it is easy to find that the optimal pointwise rate of convergence over the Sobolev class W(β, L) is

\[
\varphi_{n,\beta} = \left( \frac{1}{n} \right)^{\frac{\beta - 1/2}{2\beta}}.
\]

This rate and the estimator attaining it depend on the regularity β of the unknown density f. Thus, it is difficult to implement such an estimator in practice. Our goal is to suggest an adaptive estimator fn(x0) of f(x0), x0 ∈ ℝ, i.e. an estimator independent of the regularity β of f, which is optimal in an exact asymptotic sense.

To define the notion of adaptive optimality we follow the minimax framework applied to the problem of adaptivity by Lepskii [23]. He considered the Gaussian white noise model, rather than density estimation. In this context, he introduced the notions of adaptive rate of convergence and rate adaptive estimator. Adaptive rates of convergence on different functional classes for the Gaussian white noise model were obtained by Donoho et al. [8] (who give a detailed overview of the results in adaptive estimation), by Lepski et al. [27], Goldenshluger and Nemirovski [13], and Juditsky [19]. Most of these results relate to Besov classes of functions. The latter three papers use the Lepski type of adaptation.

For the same framework of the Gaussian white noise model, exact adaptive results are available in several cases. Exact adaptivity means that not only the rate but also the best asymptotic constant associated with it is attained by the proposed adaptive estimator. The first result of this kind, for estimation in the L2 norm on Sobolev periodic classes, belongs to Efromovich and Pinsker [12]. For further developments see Golubev [14, 15], Golubev and Nussbaum [17]. Then Lepskii [23] and Lepski and Spokoiny [28] obtained exact adaptive results in L∞ and at a fixed point, respectively, on the Hölder classes with 0 < β ≤ 2 (see also Lepskii [24] and [25]). Tsybakov [30] proved exact adaptive results for the Gaussian white noise model both in L∞ and at a fixed point, on the Sobolev classes.
Lepski and Levit [26] gave exact adaptive results in pointwise estimation over a large scale of infinitely differentiable functions.

Similar results on the adaptive rates of convergence exist in density estimation. We refer here to Donoho et al. [9], Kerkyacharian et al. [21] and Juditsky [19] for the general setting of Besov classes and the Lp norm, with p < ∞. They applied wavelet shrinkage in order to construct the adaptive estimator. Barron et al. [1] and Birgé and Massart [3] propose different adaptive density estimators constructed by penalization methods and prove their adaptivity in the L2 norm. Devroye and Lugosi [7] obtained similar results for the L1 norm using an original adaptation procedure. The results on exact adaptive estimation of the density of i.i.d. random variables, under the quadratic L2 risk, are due to Efromovich [11] and Golubev [16]. Efromovich [11] considered estimation over Sobolev and more general ellipsoids of periodic densities on [0, 1]; Golubev [16] estimated a density in Sobolev classes over the real line.

In this paper, we consider the problem of exact adaptive density estimation at a fixed point on the Sobolev classes. Note that, similarly to the results of Lepskii [23] and Brown and Low [4] concerning pointwise estimation in the Gaussian white noise model, adaptive estimation with the optimal rate φn,β = (1/n)^{(β−1/2)/(2β)} is not possible, and we have to normalize the risk by the adaptive rate of convergence, which is logarithmically worse than the optimal rate. In fact, the adaptive rate of convergence on a slightly modified Sobolev class Wn(β, L), described below, is of the order (log n/n)^{(β−1/2)/(2β)} (see Butucea [5]). The main result of this paper is to find the constant c(β, L, q, f(x0)) multiplying this rate of convergence which allows one to attain the exact asymptotics of the minimax adaptive risk. We also construct explicitly the adaptive estimator that attains these exact asymptotics. This extends a pointwise adaptive result of Tsybakov [30] to the problem of density estimation.
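The logarithmic loss between the two normalizations can be inspected numerically. The following sketch (the sample sizes and regularities are illustrative values, not quantities fixed by the paper) evaluates both rates:

```python
import math

def optimal_rate(n, beta):
    # phi_{n,beta} = (1/n)^((beta - 1/2) / (2 beta)), the nonadaptive pointwise rate
    return (1.0 / n) ** ((beta - 0.5) / (2.0 * beta))

def adaptive_rate(n, beta):
    # (log n / n)^((beta - 1/2) / (2 beta)): worse by a power of log n
    return (math.log(n) / n) ** ((beta - 0.5) / (2.0 * beta))

if __name__ == "__main__":
    n = 10_000
    for beta in (1.0, 2.0, 4.0):
        print(beta, optimal_rate(n, beta), adaptive_rate(n, beta))
```

Both rates tend to n^{−1/2} as β grows; for each fixed β the adaptive rate is slower by exactly the factor (log n)^{(β−1/2)/(2β)}.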


2. Results

Our results concern the exact adaptive estimation, at a fixed point x0 ∈ ℝ, over the Sobolev class of densities, with regularity β ∈ Bn, for some set Bn. We introduce the following class of Sobolev densities:

\[
W_n(\beta, L) = \{ f \in W(\beta, L) : f(x_0) \ge \rho_n \},
\]

where ρn is a sequence of positive real numbers that satisfies

\[
\lim_{n\to\infty} \rho_n = 0 \quad \text{and} \quad \liminf_{n\to\infty} \left( \rho_n \log n \right) > 0. \tag{2.1}
\]

This truncation guards against the possibility of a density f varying with n in such a way that f(x0) → 0 too fast as n → ∞. An alternative is to assume that our density belongs to a local Sobolev class, where smoothness is quantified on a neighborhood of the estimation point. In that context only the adaptive rate of convergence is maintained by the estimation approach described here, and not the exact constant normalization. Consider now the following maximal risk over Wn(β, L) at fixed x0, for q > 0:

\[
R_{n,\beta}(f_n, \psi_{n,\beta}) = \sup_{f \in W_n(\beta,L)} E_f\left[ \psi_{n,\beta}^{-q} \, |f_n(x_0) - f(x_0)|^q \right]. \tag{2.2}
\]

We assume that the set Bn of regularities is discrete, Bn = {β1, . . . , βNn}, where 1/2 < β1 < · · · < βNn < +∞. Moreover, we suppose that β1 > 1/2 is fixed, while lim_{n→∞} βNn = +∞ and {Nn}n≥1 is a nondecreasing sequence of positive integers. We define Δn = max_{i=1,...,Nn−1} |βi+1 − βi| and assume that it satisfies

\[
\limsup_{n\to\infty} \Delta_n < +\infty \tag{2.3}
\]

together with

\[
\lim_{n\to\infty} \frac{\Delta_n \log n}{\beta_{N_n}^2 \log\log n} = \infty. \tag{2.4}
\]

Considering a discrete set Bn of regularities is not only a technical matter. Similarly to Lepski's result, we may as well consider a bounded set of regularities [a, b] ⊆ (1/2, ∞). This changes nothing in the adaptive rate of convergence, but a factor 1/(2β) − 1/(2b) should appear in the constant (where β is the true underlying regularity). We refer to Klemelä and Tsybakov [22] for this type of sharp normalization. A continuous set of values [a, b] would reduce to the same techniques, by considering a grid of values on this interval which grows finer with n. From a practical point of view, the estimation method described later works on discretized sets of regularities. Our approach considers a larger union of classes, where βNn goes to infinity, without loss in the rate. In particular, conditions (2.3) and (2.4) entail

\[
\frac{\log n}{\beta_{N_n}} \to +\infty, \tag{2.5}
\]
\[
\left( \frac{\beta_{N_n}}{\log n} \right)^{1/2} \log \beta_{N_n} \to 0, \tag{2.6}
\]
\[
\left( \frac{\beta_{N_n}}{\log n} \right)^{1/2} \log \frac{1}{\Delta_n} \to 0. \tag{2.7}
\]

This is proved at the beginning of Section 3.


The following definition is a modification for the density problem of the adaptive optimality introduced by Lepskii [23] (see also Tsybakov [30]).

Definition 2.1. The sequence ψn,β is an adaptive rate of convergence over the set Bn if:

1. there exists an estimator fn*, independent of β over Bn, called a rate adaptive estimator, such that

\[
\limsup_{n\to\infty} \sup_{\beta \in B_n} R_{n,\beta}(f_n^*, \psi_{n,\beta}) < \infty; \tag{2.8}
\]

2. if there exist another sequence of positive real numbers ρn,β and an estimator fn** such that

\[
\limsup_{n\to\infty} \sup_{\beta \in B_n} R_{n,\beta}(f_n^{**}, \rho_{n,\beta}) < \infty
\]

and, at some β′ in Bn,

\[
\frac{\rho_{n,\beta'}}{\psi_{n,\beta'}} \to 0,
\]

then there is another β″ in Bn such that

\[
\frac{\rho_{n,\beta''}}{\psi_{n,\beta''}} \to +\infty.
\]

Note that condition (2.8) introduces a wide class of rates. We choose between those rates by a criterion of uniformity over the set Bn, expressed in the second part of Definition 2.1: if some other rate satisfies a condition similar to (2.8), and if this rate is faster at some point β′, then the loss at some other point β″ has to be infinitely greater for large sample sizes n. Let us denote B− = Bn \ {βNn} and

\[
\psi_{n,\beta} =
\begin{cases}
c \left( \dfrac{\log n}{n} \right)^{\frac{\beta-1/2}{2\beta}}, & \text{for } \beta \in B_-, \\[2mm]
\left( \dfrac{1}{n} \right)^{\frac{\beta-1/2}{2\beta}}, & \text{for } \beta = \beta_{N_n},
\end{cases} \tag{2.9}
\]

where the constant c is a function of β, L, q and f(x0), c = c(β, L, q, f(x0)) > 0, and satisfies

\[
0 < \liminf_{\beta\to\infty} c(\beta, L, q, f(x_0)) \sqrt{\beta} \le \limsup_{\beta\to\infty} c(\beta, L, q, f(x_0)) \sqrt{\beta} < \infty.
\]

2.1. Exact adaptive estimation procedure

The estimation procedure contains three steps. First, we consider a preliminary estimator f̂n(x0) of f(x0), given by the kernel estimator

\[
\hat f_n(x_0) = \frac{1}{n h_n} \sum_{i=1}^n K\left( \frac{X_i - x_0}{h_n} \right), \tag{2.10}
\]

where K is a bounded, positive kernel such that ∫ |u| |K(u)| du < ∞, and the bandwidth is hn > 0. Let us mention that, the regularity β of f being greater than 1/2, the density is regular enough to be evaluated pointwise. The bandwidth satisfies lim_{n→∞} hn = 0 and lim_{n→∞} n hn = ∞, which guarantees that f̂n(x0) is a consistent estimator. This preliminary estimator does not need to perform particularly well, but for technical reasons in Lemma 3.2 it must not be too slow. Let us consider a preliminary bandwidth such that

\[
h_n = O\left( n^{-\alpha_0} \right), \tag{2.11}
\]

for some fixed 0 < α0 < 1/2. We truncate this estimator at ρn, which tends to 0 when n → ∞ in such a way that (2.1) holds, and get ρ̂n(x0) = max{ f̂n(x0), ρn }.
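This first step can be sketched in code. The Gaussian kernel and the choices α0 = 1/3 and ρn = 1/log n below are merely illustrative values compatible with (2.1) and (2.11), not prescriptions of the paper:

```python
import math

def gauss_kernel(u):
    # bounded positive kernel with finite integral of |u| K(u)
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def preliminary_estimate(sample, x0, alpha0=1.0 / 3.0):
    """Kernel estimate of f(x0) with bandwidth h_n = n^(-alpha0), 0 < alpha0 < 1/2."""
    n = len(sample)
    h = n ** (-alpha0)
    return sum(gauss_kernel((xi - x0) / h) for xi in sample) / (n * h)

def truncated_estimate(sample, x0):
    # rho_hat_n(x0) = max{ f_hat_n(x0), rho_n }, here with rho_n = 1/log n
    rho_n = 1.0 / math.log(len(sample))
    return max(preliminary_estimate(sample, x0), rho_n)
```

The truncation only matters when the raw estimate at x0 is very small: a sample far from x0 returns ρn rather than a value near zero, which is what makes the plug-in bandwidths of the second step well defined.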


The second step consists in defining a family of kernel estimators, whose bandwidths contain the preliminary estimator, as follows:

\[
\hat h_{n,\beta} = k_\beta \left( \frac{\hat\rho_n(x_0) \log n}{n} \right)^{\frac{1}{2\beta}}, \quad \text{where } \beta \in B_- \text{ and } k_\beta = \left( \frac{q}{2\beta (2\beta - 1) L^2} \right)^{\frac{1}{2\beta}},
\]
\[
\hat h_{n,\beta_{N_n}} = \left( \frac{1}{n} \right)^{\frac{1}{2\beta_{N_n}}}, \quad \text{if } \beta = \beta_{N_n}.
\]

For β > 1/2, define a kernel Kβ by

\[
K_\beta(x) = \frac{1}{2\pi} \int_{\mathbb{R}} \frac{\exp(ixu)}{1 + |u|^{2\beta}} \, du = \frac{1}{\pi} \int_0^\infty \frac{\cos(xu)}{1 + u^{2\beta}} \, du. \tag{2.12}
\]

Define the kernel estimator depending on β:

\[
\hat f_{n,\beta}(x_0) = \frac{1}{n \hat h_{n,\beta}} \sum_{i=1}^n K_\beta\left( \frac{X_i - x_0}{\hat h_{n,\beta}} \right).
\]
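The integral (2.12) has a simple closed form only for special values (for β = 1, K₁(x) = e^{−|x|}/2); in general Kβ can be evaluated by quadrature. A sketch, where the cutoff and step count are ad hoc numerical choices:

```python
import math

def K_beta(x, beta, cutoff=200.0, steps=200_000):
    # K_beta(x) = (1/pi) * int_0^inf cos(x u) / (1 + u^(2 beta)) du,
    # truncated at `cutoff` and computed with the midpoint rule
    du = cutoff / steps
    total = 0.0
    for i in range(steps):
        u = (i + 0.5) * du
        total += math.cos(x * u) / (1.0 + u ** (2.0 * beta))
    return total * du / math.pi

def kernel_estimate(sample, x0, beta, h):
    # f_hat_{n,beta}(x0) = (1/(n h)) * sum_i K_beta((X_i - x0)/h)
    n = len(sample)
    return sum(K_beta((xi - x0) / h, beta) for xi in sample) / (n * h)
```

In practice one would tabulate Kβ once per β on a grid rather than re-integrating for every data point.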

At the third step, we use a version of Lepski's method to approximate β by an estimator β̂ and to substitute it into the expression of the kernel estimator f̂n,β(x0). For any β > 1/2, we define

\[
b_\beta^2 = \frac{1}{2\pi} \int_{\mathbb{R}} \left( \frac{|u|^{\beta}}{1 + |u|^{2\beta}} \right)^2 du, \qquad
\nu_\beta^2 = \frac{1}{2\pi} \int_{\mathbb{R}} \left( \frac{1}{1 + |u|^{2\beta}} \right)^2 du.
\]
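These two constants can be computed numerically; the grid and cutoff below are ad hoc, and exact values (e.g. ν₁² = b₁² = 1/4) are recovered up to quadrature error:

```python
import math

def nu_sq(beta, cutoff=50.0, steps=100_000):
    # nu_beta^2 = (1/(2 pi)) * int_R du / (1 + |u|^(2 beta))^2; by symmetry,
    # twice the integral over [0, cutoff], midpoint rule
    du = cutoff / steps
    s = sum(1.0 / (1.0 + ((i + 0.5) * du) ** (2.0 * beta)) ** 2 for i in range(steps))
    return s * du / math.pi

def b_sq(beta, cutoff=50.0, steps=100_000):
    # b_beta^2 = (1/(2 pi)) * int_R (|u|^beta / (1 + |u|^(2 beta)))^2 du
    du = cutoff / steps
    s = 0.0
    for i in range(steps):
        u = (i + 0.5) * du
        s += (u ** beta / (1.0 + u ** (2.0 * beta))) ** 2
    return s * du / math.pi
```

A numeric check of the relation ν_β² = (2β − 1) b_β², stated in the text, at β = 2 confirms the reconstruction of both integrals.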

By simple calculation, we can prove that νβ² = (2β − 1) bβ². We consider the sequence

\[
\hat\eta_{n,\beta} = \nu_\beta \sqrt{\frac{q}{2\beta k_\beta}} \left( \frac{\hat\rho_n(x_0) \log n}{n} \right)^{\frac{\beta - 1/2}{2\beta}}
\]

and the estimator of β defined as

\[
\hat\beta = \max\left\{ \beta \in B_n : \left| \hat f_{n,\gamma}(x_0) - \hat f_{n,\beta}(x_0) \right| \le \hat\eta_{n,\gamma}, \ \forall \gamma \in B_n, \ \gamma \le \beta \right\}.
\]

Finally, we replace β by β̂ in the kernel estimator f̂n,β(x0) in order to get the estimator

\[
f_n^*(x_0) = \hat f_{n,\hat\beta}(x_0). \tag{2.13}
\]
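The selection rule can be written generically as follows; this is a schematic sketch, with the dictionaries of estimates and thresholds assumed to be produced by the second step:

```python
def lepski_select(estimates, thresholds, betas):
    """beta_hat = max{ beta in B_n : |f_hat_gamma - f_hat_beta| <= eta_hat_gamma
    for all gamma in B_n with gamma <= beta }.

    estimates[b]  -- f_hat_{n,b}(x0) for each b in betas
    thresholds[b] -- eta_hat_{n,b}
    betas         -- the grid B_n, sorted increasingly
    """
    selected = betas[0]  # beta_1 always satisfies the condition trivially
    for beta in betas:
        if all(abs(estimates[gamma] - estimates[beta]) <= thresholds[gamma]
               for gamma in betas if gamma <= beta):
            selected = beta
    return selected
```

For instance, with estimates {1: 0.50, 2: 0.52, 3: 0.90} and all thresholds equal to 0.05, the rule keeps β̂ = 2: the jump of the estimator at β = 3 exceeds the threshold at γ = 1, signalling that the bias at that large bandwidth exponent dominates.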

This estimator will be shown to be exact adaptive, i.e., to attain the adaptive rate of convergence in our setup up to a constant that is explicitly given. The key point in this construction is Lepski's algorithm in the third step for evaluating the smoothness β of the underlying density at each point x0. The same method was employed by Tsybakov [30] in the Gaussian white noise model, but for orthogonal series estimators of the signal function. We introduce here kernel estimators, which are more suitable for density estimation and are widely used in practice. Moreover, we provide the optimal kernel (in the sense of exact adaptivity) and the bandwidth expression of the adaptive estimator. For a simulation study of the behavior of this estimator we refer to Butucea [6]. We have implemented the described adaptive estimator and tested it over ten different densities. We have considered a set B = {1, . . . , 6} and L = 10. As a preliminary estimator we used a Gaussian kernel estimator, with a rather large bandwidth.


We came to the conclusion that the adaptive estimator is particularly robust with respect to this preliminary estimator. Also, the choice of L is not of crucial importance: choosing L adaptively reduces to taking a finer grid on β. The local sharp adaptive estimator behaves uniformly well for many different densities in the described class. This study also clarified the behaviour of the procedure for more general analytic functions, such as the classical Gaussian density. This function belongs to each Sobolev class W(β, L) (where L gets larger with β) and β̂ chooses βNn (for large enough L). The adaptive estimator then achieves the rate n^{−(βNn−1/2)/(2βNn)}, which is close to n^{−1/2} for large βNn. Finally, a major difference with respect to the nonparametric regression or Gaussian white noise models, as considered in Lepski and Spokoiny [28] and Tsybakov [30], is the fact that the density model is heteroscedastic. More precisely, the variance of the kernel estimator is proportional to f(x0), the value of the unknown density at the estimation point. In consequence, this value appears in our exact normalization and optimal estimators; hence the use of a preliminary estimator of f(x0), which has to be free of the unknown values f(x0) and β.

2.2. Statement of results

Let c = c(β, L, q, f(x0)) be the positive constant defined by

\[
c(\beta, L, q, f(x_0)) = b_\beta \, L^{\frac{1}{2\beta}} \, (2\beta)^{\frac{\beta + 1/2}{2\beta}} \left( \frac{q f(x_0)}{2\beta - 1} \right)^{\frac{\beta - 1/2}{2\beta}}. \tag{2.14}
\]
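For concreteness, (2.14) can be evaluated numerically, with b_β computed by quadrature; all numerical inputs below are illustrative:

```python
import math

def b_beta(beta, cutoff=50.0, steps=100_000):
    # b_beta = sqrt( (1/(2 pi)) * int_R (|u|^beta / (1 + |u|^(2 beta)))^2 du ),
    # truncated midpoint-rule quadrature, using the symmetry of the integrand
    du = cutoff / steps
    s = 0.0
    for i in range(steps):
        u = (i + 0.5) * du
        s += (u ** beta / (1.0 + u ** (2.0 * beta))) ** 2
    return math.sqrt(s * du / math.pi)

def sharp_constant(beta, L, q, f_x0):
    # c(beta, L, q, f(x0)) of (2.14)
    return (b_beta(beta)
            * L ** (1.0 / (2.0 * beta))
            * (2.0 * beta) ** ((beta + 0.5) / (2.0 * beta))
            * (q * f_x0 / (2.0 * beta - 1.0)) ** ((beta - 0.5) / (2.0 * beta)))
```

The dependence on f(x0) with the positive exponent (β − 1/2)/(2β) makes explicit the heteroscedasticity of the density model discussed above: a larger density value at x0 yields a larger sharp constant.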

Theorem 2.2. The estimator fn*(x0) defined by (2.13) is a rate adaptive estimator, and the adaptive rate ψn,β associated with the constant c(β, L, q, f(x0)) in (2.14) is such that

\[
\limsup_{n\to\infty} \sup_{\beta \in B_-} R_{n,\beta}(f_n^*, \psi_{n,\beta}) \le 1 \tag{2.15}
\]

and

\[
\liminf_{n\to\infty} \inf_{f_n} \sup_{\beta \in B_-} R_{n,\beta}(f_n, \psi_{n,\beta}) \ge 1. \tag{2.16}
\]

Moreover,

\[
\limsup_{n\to\infty} R_{n,\beta_{N_n}}(f_n^*, \psi_{n,\beta_{N_n}}) < \infty. \tag{2.17}
\]

This exact constant is similar to the one obtained for the Gaussian white noise model in Tsybakov [30]. Nevertheless, as stressed in the introduction, very few results of this type are known in the density estimation model. Pointwise estimation allows more flexibility than global adaptation; in particular, the bandwidth and the kernel can be adjusted locally. Moreover, the adaptive setup in pointwise estimation is more interesting because the rates are significantly different (slower by a log n factor) than the nonadaptive rates, except at the last point βNn of the set Bn. This usually does not happen in global Lp estimation. Our theorem states that ψn,β and fn* solve the exact adaptive problem on B−:

\[
\liminf_{n\to\infty} \inf_{f_n} \sup_{\beta \in B_-} R_{n,\beta}(f_n, \psi_{n,\beta}) = \limsup_{n\to\infty} \sup_{\beta \in B_-} R_{n,\beta}(f_n^*, \psi_{n,\beta}) = 1,
\]

split into the two inequalities (2.15), also called the upper bound, and (2.16), known as the lower bound. The same procedure attains the optimal rate of convergence at the last point of the set, βNn. The proof of the upper bounds (2.15) and (2.17) is given in Section 5. The different tools and techniques that are used are described in Sections 3 and 4. In Section 3, we study the preliminary estimator and give


exact bounds for the bias and the variance of the optimal kernel estimator fn,β. Section 4 contains the main tools of our proof, stated as theorems. On the one hand, in order to bound uniform risks of the adaptive estimator, both pointwise and uniform exponential inequalities are needed (Bernstein's and van de Geer's inequalities, respectively). An original technique is used in Theorem 4.6, where we split the integration domain of the risk and treat each case differently; a global entropy argument or chaining would not provide the right constants. On the other hand, it is necessary to study the estimator of β and to quantify exactly the probability that β̂ is strictly less than the true β. We see that this event happens with exponentially small probability. Section 6 gives a constructive proof of the lower bounds: namely, the subexperiments that are difficult enough to lead to the exact lower bounds are given.

3. Auxiliary results

From now on, o(1) denotes any sequence that tends to 0 when n → ∞, O(1) and di, i = 0, 1, 2, . . . are positive constants, possibly depending on the fixed, positive β1, q and L, while ci, i = 1, 2, . . . are absolute positive constants.

Proof of (2.5) to (2.7). We apply (2.3) and (2.4) and pass to the limits:

\[
\frac{\beta_{N_n}}{\log n} = \frac{\beta_{N_n}^2 \log\log n}{\Delta_n \log n} \cdot \frac{\Delta_n}{\beta_{N_n} \log\log n} \le o(1) \limsup_{n\to\infty} \Delta_n = o(1).
\]

Next, for n large enough, βNn log² βNn ≤ β²Nn log βNn ≤ β²Nn log log n, and then

\[
\left( \frac{\beta_{N_n}}{\log n} \right)^{1/2} \log \beta_{N_n} \le \left( \frac{\beta_{N_n}^2 \log\log n}{\log n} \right)^{1/2} = \left( \frac{\beta_{N_n}^2 \log\log n}{\Delta_n \log n} \, \Delta_n \right)^{1/2} \to 0.
\]

Finally, we remark that for n large enough, 1/Δn ≤ log n, and then log(1/Δn) ≤ log log n. We also have log(1/Δn) ≤ 1/Δn. Then

\[
\left( \frac{\beta_{N_n}}{\log n} \right)^{1/2} \log \frac{1}{\Delta_n}
\le \left( \frac{\beta_{N_n} \log\log n}{\Delta_n \log n} \right)^{1/2}
= \left( \frac{1}{\beta_{N_n}} \cdot \frac{\beta_{N_n}^2 \log\log n}{\Delta_n \log n} \right)^{1/2} \to 0. \qquad \square
\]

Let us remark that the kernel Kβ in (2.12) has the Fourier transform

\[
\mathcal{F}(K_\beta)(u) = \frac{1}{1 + |u|^{2\beta}},
\]

and, by the Plancherel formula,

\[
\| K_\beta \|_2^2 = \frac{1}{2\pi} \| \mathcal{F}(K_\beta) \|_2^2 = \nu_\beta^2.
\]

Lemma 3.1. There exist positive constants Kmax, kmax, kmin, νmax, νmin and vmax, depending only on the fixed β1, q and L, such that:

1. ‖Kβ‖∞ ≤ Kmax, kmin ≤ kβ ≤ kmax and νmin ≤ νβ ≤ νmax, for any β in Bn;

2. max_{β∈Bn} sup_{f∈Wn(β,L)} ‖f‖∞ ≤ vmax.


Proof. 1. We shall prove only the first statement, the rest being an easy consequence of the definitions of kβ and νβ. We have, for 2β1 > 1,

\[
\| K_\beta \|_\infty \le \frac{1}{2\pi} \int_{\mathbb{R}} \frac{du}{1 + |u|^{2\beta}} \le \frac{1}{2\pi} \left( 2 + \int_{|u|>1} \frac{du}{1 + |u|^{2\beta_1}} \right) = K_{\max}(\beta_1) < +\infty.
\]

2. For β > 1/2, f in Wn(β, L) is a continuous function and

\[
|f(x)| = \frac{1}{2\pi} \left| \int_{\mathbb{R}} \mathcal{F}(f)(y) e^{ixy} \, dy \right| \le \frac{\nu_\beta}{\sqrt{2\pi}} \left( \int_{\mathbb{R}} |\mathcal{F}(f)(y)|^2 \left( 1 + |y|^{\beta} \right)^2 dy \right)^{1/2}.
\]

We finish the proof by writing, for F(f) continuous, that

\[
\int_{\mathbb{R}} |\mathcal{F}(f)(y)|^2 \left( 1 + |y|^{\beta} \right)^2 dy \le 4 \left( \int_{|y|\le 1} |\mathcal{F}(f)(y)|^2 \, dy + \int_{|y|>1} |\mathcal{F}(f)(y)|^2 |y|^{2\beta} \, dy \right),
\]

which is finite. □

From now on, we suppose that β is fixed, β ∈ B, f belongs to the class Wn(β, L), and γ is also in B with 1/2 < γ ≤ β. Let us define the nonrandom sequences

\[
h_{n,\gamma} = k_\gamma \left( \frac{f(x_0) \log n}{n} \right)^{\frac{1}{2\gamma}} \quad \text{and} \quad \eta_{n,\gamma} = \nu_\gamma \sqrt{\frac{q}{2\gamma k_\gamma}} \left( \frac{f(x_0) \log n}{n} \right)^{\frac{\gamma - 1/2}{2\gamma}}.
\]

Let δn = 1/log n and let 1/2 < β0 ≤ 1. Then (2.1) and (2.11) entail

\[
n \left( h_n \delta_n \rho_n \right)^2 \to \infty \quad \text{and} \quad \frac{h_n^{\beta_0 - 1/2} \log n}{\delta_n \rho_n} \to 0. \tag{3.1}
\]

We put, for any γ in B,

\[
\tilde\gamma =
\begin{cases}
\beta, & \text{when } \gamma = \beta \text{ or } \gamma < \beta < 2\gamma, \\[1mm]
\gamma + \dfrac{\beta_1 + 1/2}{2}, & \text{when } 2\gamma \le \beta,
\end{cases} \tag{3.2}
\]

which satisfies γ̃ < 2γ and γ < γ̃ ≤ β. We define the random event

\[
A_{n,\gamma} = \left\{ \left| \left( \frac{\hat h_{n,\gamma}}{h_{n,\gamma}} \right)^{\tilde\gamma - 1/2} - 1 \right| \le \delta_n \right\}
\]

and the corresponding nonrandom set

\[
H_{n,\gamma} = \left\{ h : \left| \left( \frac{h}{h_{n,\gamma}} \right)^{\tilde\gamma - 1/2} - 1 \right| \le \delta_n \right\}.
\]

The following result concerns the preliminary kernel estimator f̂n(x0) in (2.10), having the bandwidth hn satisfying (2.11) and a bounded kernel K such that ∫ |u| |K(u)| du < ∞.


Lemma 3.2. If Ān,γ denotes the complement of An,γ, then, for sufficiently small fixed α > 0,

\[
P_f\left( \bar A_{n,\gamma} \right) \le 2 \exp\left\{ - \frac{n \left( (1-\alpha) h_n \delta_n \rho_n \right)^2}{2 \| K \|_\infty^2} \right\} = o(1).
\]

Proof. We use the facts that |x^α − 1| ≤ |x − 1| for any 0 < α < 1, x > 0, and |max{x, ρn} − max{y, ρn}| ≤ |x − y| for any fixed n and x, y > 0. For α = (γ̃ − 1/2)/(2γ) ≤ 1 − 1/(4β1) and f(x0) ≥ ρn, we have

\[
P_f\left( \bar A_{n,\gamma} \right) = P_f\left( \left| \left( \frac{\hat\rho_n(x_0)}{f(x_0)} \right)^{\frac{\tilde\gamma - 1/2}{2\gamma}} - 1 \right| \ge \delta_n \right)
\le P_f\left( \left| \hat f_n(x_0) - f(x_0) \right| \ge \delta_n \rho_n \right)
\le P_f\left( \left| \hat f_n(x_0) - E_f \hat f_n(x_0) \right| + \left| E_f \hat f_n(x_0) - f(x_0) \right| \ge \delta_n \rho_n \right).
\]

By the embedding Theorem 15.1 in Besov et al. [2] we deduce that f, which belongs to Wn(β, L), belongs also to Wn(β0, L) with β0 = min{β1, 1}, which is included in the Hölder class H(β0 − 1/2, L0), L0 > 0. This means f verifies |f(x) − f(y)| ≤ L0 |x − y|^{β0−1/2}. Let us consider a kernel K such that ∫ |u| |K(u)| du < ∞. Then we have

\[
\left| E_f \hat f_n(x_0) - f(x_0) \right| \le \int |K(x)| \, |f(x_0 + h_n x) - f(x_0)| \, dx
\le L_0 h_n^{\beta_0 - 1/2} \left( \int_{|x|\le 1} |K(x)| \, |x|^{\beta_0 - 1/2} \, dx + \int_{|x|>1} |K(x)| \, |x| \, dx \right)
\le c(L_0, \beta_0) \, h_n^{\beta_0 - 1/2},
\]

with a constant c(L0, β0) > 0. The last term of the inequality above is o(δn ρn) by (3.1). We obtain, for fixed, small α > 0,

\[
P_f\left( \bar A_{n,\gamma} \right) \le P_f\left( \left| \hat f_n(x_0) - E_f \hat f_n(x_0) \right| \ge (1-\alpha) \delta_n \rho_n \right).
\]

We apply Hoeffding's inequality to the i.i.d. variables K((Xi − x0)/hn), bounded by ‖K‖∞ < ∞:

\[
P_f\left( \left| \frac{1}{n} \sum_{i=1}^n \left[ K\left( \frac{X_i - x_0}{h_n} \right) - E_f K\left( \frac{X_i - x_0}{h_n} \right) \right] \right| \ge z \right) \le 2 \exp\left\{ - \frac{n z^2}{2 \| K \|_\infty^2} \right\}
\]

for any z ≥ 0, f ∈ Wn(β, L). It suffices to take z = (1 − α) hn δn ρn in order to get the stated result. Moreover, n(hn δn ρn)² → ∞ with n, by (3.1), so the right-hand side in the lemma is o(1). □

Denote by fn,γ(x0, h) = (1/(nh)) Σ_{i=1}^{n} Kγ((Xi − x0)/h) the kernel estimator of f(x0) having kernel Kγ and bandwidth h in the set Hn,γ. In particular, we denote fn,γ(x0) = fn,γ(x0, hn,γ). In what follows we study this kernel estimator via the classical decomposition

\[
\left| f_{n,\gamma}(x_0, h) - f(x_0) \right| \le B_{n,\gamma}(h) + Z_{n,\gamma}(h),
\]

where Bn,γ(h) = |Ef fn,γ(x0, h) − f(x0)| is the bias of the estimator fn,γ(x0, h) of f(x0), for a bandwidth h ∈ Hn,γ, and Zn,γ(h) = |fn,γ(x0, h) − Ef fn,γ(x0, h)| is its stochastic term.
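Hoeffding's bound used in this proof can be sanity-checked by simulation. In the sketch below the kernel, the sampling law, and the deviation level are all illustrative choices, not the paper's:

```python
import math
import random

def hoeffding_bound(n, z, k_sup):
    # P( |mean - expectation| >= z ) <= 2 exp(-n z^2 / (2 k_sup^2))
    # for i.i.d. terms bounded by k_sup in absolute value
    return 2.0 * math.exp(-n * z * z / (2.0 * k_sup * k_sup))

def empirical_tail(n, z, trials=2000, seed=0):
    # Monte Carlo frequency of the deviation event for K((X_i - x0)/h) with the
    # boxcar kernel K = 1_[-1/2, 1/2] (so k_sup = 1), X_i uniform on [0, 1],
    # x0 = 0.5 and h = 0.2; then E K((X - x0)/h) = h = 0.2 exactly
    rng = random.Random(seed)
    h, x0, mean = 0.2, 0.5, 0.2
    hits = 0
    for _ in range(trials):
        s = sum(1.0 for _ in range(n) if abs((rng.random() - x0) / h) <= 0.5) / n
        if abs(s - mean) >= z:
            hits += 1
    return hits / trials
```

As expected from a bound that holds for every bounded distribution, the empirical tail sits well below the Hoeffding envelope here.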


Lemma 3.3. For any h in Hn,γ, f ∈ Wn(β, L) and γ, β in Bn such that 1/2 < γ ≤ β, let L̃ > 0 be some constant,

\[
b_{n,\gamma}(h) =
\begin{cases}
L \, b_\beta \, h^{\beta - 1/2}, & \gamma = \beta, \\
\tilde L \, \nu_{\max} \, h^{\tilde\gamma - 1/2}, & \gamma < \beta,
\end{cases}
\]

where γ̃ is defined by (3.2), and s²n,γ(h) = f(x0) νγ² / (nh). (In particular, we denote bn,γ = bn,γ(hn,γ) and sn,γ = sn,γ(hn,γ).) We have:

1. Bn,γ(h) ≤ bn,γ(h);

2. Ef Z²n,γ(h) ≤ s²n,γ(h)(1 + o(1)), where o(1) → 0 uniformly in γ and β.

Proof. 1) We have

\[
\mathcal{F}\left( \frac{1}{h} K_\gamma\left( \frac{\cdot - x_0}{h} \right) \right)(x) = e^{ixx_0} \, \mathcal{F}(K_\gamma)(hx).
\]

Then, by the Cauchy–Schwarz inequality,

\[
B_{n,\gamma}(h) = \frac{1}{2\pi} \left| \int_{\mathbb{R}} \mathcal{F}(f)(x) e^{ixx_0} \left[ \mathcal{F}(K_\gamma)(hx) - 1 \right] dx \right|
\le \frac{1}{2\pi} \int_{\mathbb{R}} |\mathcal{F}(f)(x)| \, \frac{|hx|^{2\gamma}}{1 + |hx|^{2\gamma}} \, dx
\]
\[
\le \frac{1}{2\pi} \left( \int_{\mathbb{R}} |\mathcal{F}(f)(x)|^2 \, |x|^{2\tilde\gamma} \, dx \right)^{1/2}
\left( h^{2\tilde\gamma - 1} \int_{\mathbb{R}} \frac{|u|^{2(2\gamma - \tilde\gamma)}}{\left( 1 + |u|^{2\gamma} \right)^2} \, du \right)^{1/2}.
\]

a) When γ = β we have γ̃ = β and

\[
B_{n,\beta}(h) \le \frac{L \, h^{\beta - 1/2}}{\sqrt{2\pi}} \left( \int_{\mathbb{R}} \left( \frac{|u|^{\beta}}{1 + |u|^{2\beta}} \right)^2 du \right)^{1/2} = b_{n,\beta}(h).
\]

b) When γ < β, we take γ̃ such that γ < γ̃ < 2γ and γ̃ ≤ β. Then f ∈ Wn(β, L) ⊆ Wn(γ̃, L̃), for some fixed constant L̃ > 0, by the embedding theorem for Sobolev spaces. We get

\[
B_{n,\gamma}(h) \le \frac{\tilde L \, h^{\tilde\gamma - 1/2}}{\sqrt{2\pi}} \left( \int_{\mathbb{R}} \frac{|u|^{2(2\gamma - \tilde\gamma)}}{\left( 1 + |u|^{2\gamma} \right)^2} \, du \right)^{1/2} \le b_{n,\gamma}(h).
\]

2) We have

\[
E_f Z_{n,\gamma}^2(h) \le \frac{1}{nh} \cdot \frac{1}{h} \int K_\gamma^2\left( \frac{x - x_0}{h} \right) f(x) \, dx,
\]

and we conclude by Bochner's lemma, applied to the continuous f and the kernel Kγ, that

\[
\frac{1}{h} \int K_\gamma^2\left( \frac{x - x_0}{h} \right) f(x) \, dx \le f(x_0) \, \| K_\gamma \|_2^2 \left( 1 + o(1) \right). \qquad \square
\]

We now give rough nonrandom upper bounds concerning the kernel estimator f̂n,γ(x0), whose bandwidth ĥn,γ depends on the random data X1, . . . , Xn.

Lemma 3.4. For γ < β, there exist constants d0 and d1 such that, a.s.,

\[
\sup_{\beta \in B} \sup_{f \in W_n(\beta,L)} \psi_{n,\beta}^{-q} \left( \hat\eta_{n,\beta} + \left| \hat f_{n,\beta}(x_0) - f(x_0) \right| \right)^q \le 2 d_0 n^{d_1}
\]

and

\[
\sup_{\beta \in B} \sup_{f \in W_n(\beta,L)} \psi_{n,\beta}^{-q} \left| \hat f_{n,\gamma}(x_0) - f(x_0) \right|^q \le d_0 n^{d_1}.
\]


Proof. Let us first bound the preliminary estimator: f̂n(x0) ≤ ‖K‖∞/hn. Then, a.s.,

\[
\hat f_{n,\gamma}(x_0) \le \frac{\| K_\gamma \|_\infty}{\hat h_{n,\gamma}} \le \frac{K_{\max}}{k_{\min}} \left( \frac{n}{\hat\rho_n(x_0) \log n} \right)^{\frac{1}{2\gamma}} \le O(1) \, n^{\frac{1}{2\beta_1}}.
\]

Next, for β in B−,

\[
\psi_{n,\beta}^{-1} \le \frac{\sqrt{2\beta - 1}}{2\beta L \nu_\beta k_\beta} \left( \frac{n}{f(x_0) \log n} \right)^{\frac{1}{2} - \frac{1}{4\beta}} \le O(1) \sqrt{\beta_{N_n}} \, n^{\frac{1}{2} - \frac{1}{4\beta_1}},
\]

and the same holds for β = βNn: ψ^{−1}_{n,βNn} ≤ O(1) n^{1/2 − 1/(4β1)}. We may conclude that a.s., for any 1/2 < γ ≤ β,

\[
\psi_{n,\beta}^{-1} \left| \hat f_{n,\gamma}(x_0) - f(x_0) \right| \le O(1) \sqrt{\beta_{N_n}} \left( n^{\frac{1}{2} - \frac{1}{4\beta_1} + \frac{1}{2\beta_1}} + n^{\frac{1}{2} - \frac{1}{4\beta_1}} \right) \le d_0 n^{d_1},
\]

since βNn = o(log n). Similarly,

\[
\frac{\hat\eta_{n,\beta}}{\psi_{n,\beta}} = \frac{\eta_{n,\beta}}{\psi_{n,\beta}} \left( \frac{\hat\rho_n(x_0)}{f(x_0)} \right)^{\frac{\beta - 1/2}{2\beta}} \le \left( 1 - \frac{1}{2\beta} \right) \left( \frac{\hat\rho_n(x_0)}{\rho_n} \right)^{\frac{1}{2} - \frac{1}{4\beta}} \le d_0 n^{d_1}. \qquad \square
\]

Let us choose a convenient sequence which will appear in the proof of the upper bounds:

\[
\tau_{n,\gamma} = s_{n,\gamma} \left[ \left( \frac{q}{2} \left( \frac{1}{\gamma} - \frac{1}{\beta} \right) \log n \right)^{1/2} + \left( \frac{\log n}{\beta_{N_n}} \right)^{1/4} \right], \quad \text{for any } \gamma \le \beta.
\]

Remark that τn,β/sn,β = (log n/βNn)^{1/4} → ∞ when n → ∞.

Lemma 3.5. 1. The following inequalities hold for any β ∈ Bn and f ∈ Wn(β, L):

\[
\frac{s_{n,\beta}}{\psi_{n,\beta}} \le \left( \frac{2 \beta_{N_n}}{q \log n} \right)^{1/2}, \qquad \frac{\tau_{n,\beta}}{\psi_{n,\beta}} \le 2 \left( \frac{2 \beta_{N_n}}{q \log n} \right)^{1/4},
\]
\[
\left( \frac{\log n}{n} \right)^{\frac{\Delta_n}{2 \beta_{N_n}^2}} \le \exp\left\{ - \frac{\Delta_n \log n}{\beta_{N_n}^2 \log\log n} \right\} \to 0.
\]

2. If γ < β are in Bn, we have

\[
\sup_{f \in W_n(\beta,L)} \frac{b_{n,\gamma}^q + \tau_{n,\gamma}^q}{\psi_{n,\beta}^q} \le O(1) \left( \log n \right)^{q/2} n^{q\left( \frac{1}{4\gamma} - \frac{1}{4\beta} \right)}.
\]

Proof. 1. It suffices to note that η²n,β = (q/(2β)) s²n,β log n and to use (2.4) for the limit.

2. First, if β is in B−, we have

\[
\psi_{n,\beta}^{-1} \le O(1) \sqrt{\beta_{N_n}} \left( \frac{n}{f(x_0) \log n} \right)^{\frac{1}{2} - \frac{1}{4\beta}} \le O(1) \sqrt{\log n} \left( \frac{n}{f(x_0) \log n} \right)^{\frac{1}{2} - \frac{1}{4\beta}}.
\]


We see as well that

\[
\tau_{n,\gamma} \le s_{n,\gamma} \sqrt{\frac{q \log n}{2\gamma}} \left( 1 + o(1) \right) \le O(1) \left( \frac{f(x_0) \log n}{n} \right)^{\frac{1}{2} - \frac{1}{4\gamma}}
\]

and also

\[
b_{n,\gamma} \le \bar c \, h_{n,\gamma}^{\tilde\gamma - \frac{1}{2}} \le O(1) \left( \frac{f(x_0) \log n}{n} \right)^{\frac{1}{2} - \frac{1}{4\gamma}},
\]

as γ ≤ γ̃ < 2γ and kmin ≤ kγ ≤ kmax. Moreover, f(x0) log n ≥ ρn log n > 0, so

\[
\sup_{f \in W_n(\beta,L)} \frac{b_{n,\gamma}^q + \tau_{n,\gamma}^q}{\psi_{n,\beta}^q} \le O(1) \left( \log n \right)^{q/2} n^{q\left( \frac{1}{4\gamma} - \frac{1}{4\beta} \right)}.
\]

If β = βNn, the result is immediate. □

4. Exponential inequalities and applications

The key lemmas in the proof of the upper bounds are given here, and the results, in the form in which we apply them in the next section, are stated as theorems.

4.1. Exponential bounds

The next proposition recalls two inequalities on empirical processes. Denote by Pn the empirical distribution associated with the i.i.d. observations X1, . . . , Xn of common law P, Pn = (1/n) Σ_{i=1}^{n} δ_{Xi}.

Proposition 4.1. Let us consider a class of functions K = {fh(·) : h ∈ H} satisfying

\[
\sup_{h \in H} \| f_h \|_\infty \le K \quad \text{and} \quad \sup_{h \in H} \| f_h \|_{L_2(P)} \le M
\]

for some positive K and M. Then, for all u > 0 and fixed h ∈ H,

\[
P_f\left( \left| \int f_h \, d(P_n - P) \right| \ge u \right) \le 2 \exp\left\{ - \frac{n u^2}{2 \left( M^2 + 2uK \right)} \right\}
\]

(Bernstein's inequality; see Pollard [29]). Moreover, if

\[
u \le \min\left\{ 8M, \frac{2 C_1 M^2}{K} \right\} \quad \text{and} \quad u \ge \frac{C_0}{\sqrt{n}} \int_0^M \max\left\{ H_B^{1/2}(\mathcal{K}, x), 1 \right\} dx,
\]

where HB denotes the entropy with bracketing with respect to the L2-norm of a class of functions, then there exists some a < 1/2 such that

\[
P_f\left( \sup_{h \in H} \left| \int f_h \, d(P_n - P) \right| \ge u \right) \le 8 \exp\left\{ - a \, \frac{n u^2}{M^2} \right\}
\]

(van de Geer [31]).


One can prove without difficulty that there exists a positive constant d2 = 1/(β1 − 1/2) such that

\[
H_{n,\gamma} \subseteq \left\{ h : \left| \frac{h}{h_{n,\gamma}} - 1 \right| \le d_2 \delta_n \right\} = \overline H_{n,\gamma}. \tag{4.1}
\]

We shall apply the previous results, successively, to two classes of measurable functions, for observations having distribution Pf, corresponding to the probability density f ∈ Wn(β, L), and empirical distribution Pn. Let us take, on one hand,

\[
\mathcal{K} = \left\{ K_{\gamma,h}(\cdot) = \frac{1}{h} K_\gamma\left( \frac{\cdot - x_0}{h} \right) : h \in \overline H_{n,\gamma} \right\}.
\]

Then

\[
\sup_{h \in \overline H_{n,\gamma}} \| K_{\gamma,h} \|_\infty \le \frac{\| K_\gamma \|_\infty}{h_{n,\gamma}} \le \frac{K_{\max}}{h_{n,\gamma}}
\]

and, by Lemma 3.3,

\[
\sup_{h \in \overline H_{n,\gamma}} \| K_{\gamma,h} \|_{L_2(P_f)} \le \frac{\nu_\gamma \sqrt{f(x_0)}}{\sqrt{h_{n,\gamma}}} = s_{n,\gamma} \sqrt{n}.
\]

For different h1, h2 in H̄n,γ we have

\[
\| K_{\gamma,h_1} - K_{\gamma,h_2} \|_{L_2(P_f)}
\le \frac{O(1)}{\sqrt{2\pi}} \left\| \mathcal{F}(K_\gamma)(h_1 \cdot) - \mathcal{F}(K_\gamma)(h_2 \cdot) \right\|_2
\le \frac{O(1)}{\sqrt{2\pi}} \left( \int \frac{\left( h_1^{2\gamma} - h_2^{2\gamma} \right)^2 |y|^{4\gamma} \, dy}{\left( 1 + |h_1 y|^{2\gamma} \right)^2 \left( 1 + |h_2 y|^{2\gamma} \right)^2} \right)^{1/2}
\le \frac{O(1)}{\sqrt{h_{n,\gamma}}} \left| 1 - \left( \frac{h_1}{h_2} \right)^{2\gamma} \right|.
\]

Then we may write that the ε-entropy with bracketing of the class K with respect to the L2(Pf) norm satisfies HB(K, ε) ≤ O(log(1/ε) + log n). Thus

\[
\frac{C_0}{\sqrt{n}} \int_0^M \max\left\{ H_B^{1/2}(\mathcal{K}, x), 1 \right\} dx \le M \sqrt{\frac{\log n}{n}}. \tag{4.2}
\]

On the other hand, consider the following class of measurable functions:

\[
\mathcal{K}(h_{n,\gamma}) = \left\{ K_{\gamma,h}(\cdot) - K_{\gamma,h_{n,\gamma}}(\cdot) : h \in \overline H_{n,\gamma} \right\},
\]


for which

\[
\sup_{h \in \overline H_{n,\gamma}} \left\| K_{\gamma,h} - K_{\gamma,h_{n,\gamma}} \right\|_\infty
\le \sup_{h \in \overline H_{n,\gamma}} \sup_x \frac{1}{\sqrt{2\pi}} \left| \int \mathcal{F}\left( K_{\gamma,h} - K_{\gamma,h_{n,\gamma}} \right)(y) \, e^{ixy} \, dy \right|
\]
\[
\le \sup_{h \in \overline H_{n,\gamma}} \frac{1}{\sqrt{2\pi}} \int \frac{\left| h^{2\gamma} - h_{n,\gamma}^{2\gamma} \right| |y|^{2\gamma} \, dy}{\left( 1 + |hy|^{2\gamma} \right) \left( 1 + |h_{n,\gamma} y|^{2\gamma} \right)}
\le \frac{O(1)}{h_{n,\gamma}} \left| 1 - \left( \frac{h}{h_{n,\gamma}} \right)^{2\gamma} \right|
\le O(1) \, \frac{\beta_{N_n} \delta_n}{h_{n,\gamma}}
\]

and

\[
\sup_{h \in \overline H_{n,\gamma}} \left\| K_{\gamma,h} - K_{\gamma,h_{n,\gamma}} \right\|_{L_2(P_f)}
\le \sup_{h \in \overline H_{n,\gamma}} \frac{O(1)}{\sqrt{h_{n,\gamma}}} \left| 1 - \left( \frac{h}{h_{n,\gamma}} \right)^{2\gamma} \right|
\le O(1) \, \frac{\beta_{N_n} \delta_n}{\sqrt{h_{n,\gamma}}}. \tag{4.3}
\]

It is easy to see that (4.2) still holds for this class.

Lemma 4.2. For the kernel estimator fn,γ(x0, h), for any fixed x0 and for u ≤ c1 sn,γ √(log n) (c1 > 0 an absolute constant), there exists δn0 > 0 small enough, independent of γ, such that

\[
P_f\left[ \left| Z_{n,\gamma}(h_{n,\gamma}) \right| \ge u \right] \le 2 \exp\left\{ - \frac{u^2}{2 s_{n,\gamma}^2} \left( 1 - \delta_{n0} \right) \right\}.
\]

Proof. Write Zn,γ(h) = |∫ Kγ,h d(Pn − Pf)| and apply Bernstein's inequality of Proposition 4.1, with K = Kmax/hn,γ and M = sn,γ √n, to get

\[
P_f\left[ \left| Z_{n,\gamma}(h_{n,\gamma}) \right| \ge u \right] \le 2 \exp\left\{ - \frac{u^2}{2 s_{n,\gamma}^2 \left( 1 + 2 u K_{\max} / \left( \nu_\gamma^2 f(x_0) \right) \right)} \right\}.
\]

We remark that for u ≤ c1 sn,γ √(log n) (c1 > 0) we have

\[
\frac{2 u K_{\max}}{\nu_\gamma^2 f(x_0)} \le \frac{2 c_1 K_{\max} \sqrt{\log n}}{\nu_\gamma \sqrt{f(x_0)} \sqrt{n h_{n,\gamma}}} \le O(1) \, \frac{1}{\rho_n} \left( \frac{\rho_n \log n}{n} \right)^{\frac{1}{2} - \frac{1}{4\gamma}} = o(1),
\]

and thus we can find δn0 > 0 small enough such that

\[
\left( 1 + \frac{2 u K_{\max}}{\nu_\gamma^2 f(x_0)} \right)^{-1} \ge 1 - \delta_{n0},
\]

which we can substitute in the exponential inequality above in order to get the stated lemma. □

Lemma 4.3. For u such that τn,γ ≤ u ≤ c1 sn,γ √(log n), there exist a constant d3 > 0 and δn1 > 0 sufficiently small, independent of γ, such that

\[
P_f\left[ \sup_{h \in \overline H_{n,\gamma}} \left| Z_{n,\gamma}(h) \right| \ge u \right] \le d_3 \exp\left\{ - \frac{u^2 \left( 1 - \delta_{n1} \right)^2}{2 s_{n,\gamma}^2} \left( 1 - \delta_{n0} \right) \right\},
\]

for n large enough, where δn0 is the sequence in Lemma 4.2.

Proof. Let us consider the sequence δn1 = βNn δn √(βNn log n) → 0. We have

\[
P_f\left[ \sup_{h \in \overline H_{n,\gamma}} \left| Z_{n,\gamma}(h) \right| \ge u \right]
\le P_f\left[ \left| Z_{n,\gamma}(h_{n,\gamma}) \right| \ge u \left( 1 - \delta_{n1} \right) \right]
+ P_f\left[ \sup_{h \in \overline H_{n,\gamma}} \left| Z_{n,\gamma}(h) - Z_{n,\gamma}(h_{n,\gamma}) \right| \ge u \delta_{n1} \right].
\]

We apply Lemma 4.2 to the first term on the right-hand side. Since u(1 − δn1) ≤ u ≤ c1 sn,γ √(log n), there exists a sufficiently small δn0 > 0 such that

\[
P_f\left[ \left| Z_{n,\gamma}(h_{n,\gamma}) \right| \ge u \left( 1 - \delta_{n1} \right) \right] \le 2 \exp\left\{ - \frac{u^2 \left( 1 - \delta_{n1} \right)^2}{2 s_{n,\gamma}^2} \left( 1 - \delta_{n0} \right) \right\}.
\]

For the second term we first note that

\[
Z_{n,\gamma}(h) - Z_{n,\gamma}(h_{n,\gamma}) = \int \left( K_{\gamma,h} - K_{\gamma,h_{n,\gamma}} \right) d(P_n - P_f).
\]

Then we apply van de Geer's inequality from Proposition 4.1, with K = O(1) βNn δn / hn,γ and M = O(1) βNn δn / √(hn,γ):

\[
P_f\left[ \sup_{h \in \overline H_{n,\gamma}} \left| Z_{n,\gamma}(h) - Z_{n,\gamma}(h_{n,\gamma}) \right| \ge u \delta_{n1} \right]
\le 8 \exp\left\{ - a \, n h_{n,\gamma} \, \frac{\left( u \delta_{n1} \right)^2}{\left( \beta_{N_n} \delta_n \right)^2} \right\}.
\]

Indeed, min{8M, 2C1M²/K} = O(1) βNn δn for all u in [τn,γ, c1 sn,γ √(log n)],

\[
\frac{C_0}{\sqrt{n}} \int_0^M \max\left\{ H_B^{1/2}(\mathcal{K}, x), 1 \right\} dx \le O(1) \, \frac{\beta_{N_n} \delta_n}{\sqrt{n h_{n,\gamma}}} \sqrt{\log n},
\]

and

\[
\left[ \tau_{n,\gamma} \delta_{n1}, \ c_1 \delta_{n1} s_{n,\gamma} \sqrt{\log n} \right] \subseteq \left[ \frac{\beta_{N_n} \delta_n}{\sqrt{n h_{n,\gamma}}} \sqrt{\log n}, \ \beta_{N_n} \delta_n \right]
\]

for n large enough. By summing the two terms again, we get

\[
P_f\left[ \sup_{h \in \overline H_{n,\gamma}} \left| Z_{n,\gamma}(h) \right| \ge u \right]
\le 2 \exp\left\{ - \frac{u^2 \left( 1 - \delta_{n1} \right)^2}{2 s_{n,\gamma}^2} \left( 1 - \delta_{n0} \right) \right\} + 8 \exp\left\{ - a \, n h_{n,\gamma} \, \frac{\left( u \delta_{n1} \right)^2}{\left( \beta_{N_n} \delta_n \right)^2} \right\}
\le 8 \left( 1 + o(1) \right) \exp\left\{ - \frac{u^2 \left( 1 - \delta_{n1} \right)^2}{2 s_{n,\gamma}^2} \left( 1 - \delta_{n0} \right) \right\}. \qquad \square
\]

Theorem 4.4. We have
$$
\sup_{\beta\in B_n}\ \sup_{f\in W_n(\beta,L)} P_f\left[\sup_{h\in H_{n,\beta}}|Z_{n,\beta}(h)|\ge\tau_{n,\beta}\right]\underset{n\to\infty}{\longrightarrow}0.
$$

C. BUTUCEA

Proof. Let us apply Lemma 4.3, for $\gamma=\beta$, $u=\tau_{n,\beta}$ and $\delta_n^{(2)}\underset{n\to\infty}{\to}0$ such that $\left(1-\delta_n^{(1)}\right)^2\left(1-\delta_n^{(0)}\right)=1-\delta_n^{(2)}$:
$$
P_f\left[\sup_{h\in H_{n,\beta}}|Z_{n,\beta}(h)|\ge\tau_{n,\beta}\right] \le d_3\exp\left\{-\frac{\tau_{n,\beta}^2}{2s_{n,\beta}^2}\left(1-\delta_n^{(2)}\right)\right\} \le d_3\exp\left\{-\frac12\left(\frac{\log n}{\beta_{N_n}}\right)^{\frac12}\left(1-\delta_n^{(2)}\right)\right\}
$$
and this sequence tends to 0, by Lemma 3.5, uniformly in $f\in W_n(\beta,L)$ and $\beta\in B_n$. $\square$

Lemma 4.5. Let $c_2>0$ be a constant such that $c_2\ge c_1s_{n,\gamma}\sqrt{\log n}$. For $n$ large enough and $c_1s_{n,\gamma}\sqrt{\log n}\le u\le c_2$, we have
$$
P_f\left[\sup_{h\in H_{n,\gamma}}|Z_{n,\gamma}(h)|\ge u\right] \le 8\exp\left\{-a\frac{u^2}{s_{n,\gamma}^2}\right\}.
$$
Proof. Let us note that $Z_{n,\gamma}(h)=\int K_{\gamma,h}\,d(P_n-P_f)$ and apply Van de Geer's inequality of Proposition 4.1, for $K=K_{\max}/h_{n,\gamma}$ and $M=s_{n,\gamma}\sqrt n$. Indeed,
$$
\min\left\{8M,\ \frac{2C_1M^2}{K}\right\} = \frac{2C_1M^2}{K} \le c_2
$$
for $c_2$ large enough, and
$$
\frac{C_0}{\sqrt n}\int_0^M\max\left\{H_B^{1/2}(K,x),\,1\right\}dx \le C_0\,\frac{M\sqrt{\log n}}{\sqrt n} \le c_1s_{n,\gamma}\sqrt{\log n}.
$$
The hypotheses are thus verified, and we obtain
$$
P_f\left[\sup_{h\in H_{n,\gamma}}|Z_{n,\gamma}(h)|\ge u\right] \le 8\exp\left\{-a\frac{nu^2}{M^2}\right\} \le 8\exp\left\{-a\frac{u^2}{s_{n,\gamma}^2}\right\}.\ \square
$$

Theorem 4.6. We have
$$
\sup_{\beta\in B_n}\ \sup_{f\in W_n(\beta,L)}\ \sup_{\gamma\in B_-}\psi_{n,\beta}^{-q}\,E_f\left[\Delta_n\left(\sup_{h\in H_{n,\gamma}}|Z_{n,\gamma}(h)|\right)^q I\left(\sup_{h\in H_{n,\gamma}}|Z_{n,\gamma}(h)|\ge\tau_{n,\gamma}\right)\right]\underset{n\to\infty}{\longrightarrow}0.
$$

Let us write that
$$
\sup_{h\in H_{n,\gamma_1}}|f_{n,\gamma_1}(h)-f| + \sup_{h\in H_{n,\gamma_0}}|f_{n,\gamma_0}(h)-f| \le \sup_{h\in H_{n,\gamma_1}}\left(B_{n,\gamma_1}(h)+|Z_{n,\gamma_1}(h)|\right) + \sup_{h\in H_{n,\gamma_0}}\left(B_{n,\gamma_0}(h)+|Z_{n,\gamma_0}(h)|\right)
$$
$$
\le \left(b_{n,\gamma_1}+b_{n,\gamma_0}\right)(1+\delta_n) + \sup_{h\in H_{n,\gamma_1}}|Z_{n,\gamma_1}(h)| + \sup_{h\in H_{n,\gamma_0}}|Z_{n,\gamma_0}(h)|.
$$
By Lemma 4.7, $(b_{n,\gamma_1}+b_{n,\gamma_0})(1+\delta_n)\le d_5\delta_n^{(3)}\eta_{n,\gamma_0}(1-\delta_n)$, for some $d_5>0$ and for $n$ large enough. Then, we replace
$$
p_1 \le P_f\left[\sup_{h\in H_{n,\gamma_1}}|Z_{n,\gamma_1}(h)| + \sup_{h\in H_{n,\gamma_0}}|Z_{n,\gamma_0}(h)| \ge \eta_{n,\gamma_0}(1-\delta_n)\left(1-d_5\delta_n^{(3)}\right)\right]
$$
$$
\le P_f\left[\sup_{h\in H_{n,\gamma_0}}|Z_{n,\gamma_0}(h)| \ge \eta_{n,\gamma_0}(1-\delta_n)\left(1-2d_5\delta_n^{(3)}\right)\right] + P_f\left[\sup_{h\in H_{n,\gamma_1}}|Z_{n,\gamma_1}(h)| \ge d_5\delta_n^{(3)}\eta_{n,\gamma_0}(1-\delta_n)\right].
$$
For the first term on the right-hand side we apply Lemma 4.3, for
$$
\tau_{n,\gamma_0} \le \eta_{n,\gamma_0}(1-\delta_n)\left(1-2d_5\delta_n^{(3)}\right) \le c_1 s_{n,\gamma_0}\sqrt{\log n}
$$
(which holds for $c_1$ large enough) and by the choice of $\delta_n^{(3)}$ we get
$$
P_f\left[\sup_{h\in H_{n,\gamma_0}}|Z_{n,\gamma_0}(h)| \ge \eta_{n,\gamma_0}(1-\delta_n)\left(1-2d_5\delta_n^{(3)}\right)\right] \le d_3\exp\left\{-\frac{q\log n}{4\gamma_0}\left(1-O(1)\,\delta_n^{(3)}\right)\right\} \le d_3\,n^{-\frac{q}{4\gamma}}.
$$
For the second term, $\tau_{n,\gamma_1} \le \sqrt{\frac{q}{2\gamma_1}}\,s_{n,\gamma_1}\sqrt{\log n} \le d_5\delta_n^{(3)}\eta_{n,\gamma_0}(1-\delta_n) \le c_2$ and we apply Lemma 4.5 for $\gamma_0\le\gamma$ as follows:
$$
P_f\left[\sup_{h\in H_{n,\gamma_1}}|Z_{n,\gamma_1}(h)| \ge d_5\delta_n^{(3)}\eta_{n,\gamma_0}(1-\delta_n)\right] \le 8\exp\left\{-a\,\frac{\log n}{\gamma}\left(\frac{\delta_n^{(3)}s_{n,\gamma_0}}{s_{n,\gamma_1}}\right)^2(1-\delta_n)^2\right\}
$$
and by the third statement of Lemma 4.7 this exponential is infinitely small with respect to $n^{-\frac{q}{4\gamma}}$. That allows us to conclude that
$$
p_1 \le O(1)\,n^{-\frac{q}{4\gamma}}.
$$
We finish the proof of the lemma by using Lemma 3.2:
$$
p_2 \le P_f\left[\bar A_{n,\gamma_0}\right] + P_f\left[\bar A_{n,\gamma_1}\right] \le 4\exp\left\{-\frac{n\left((1-\alpha)h_n\delta_n\rho_n\right)^2}{2\|K_0\|_\infty^2}\right\},
$$
which is $o(1)\,n^{-\frac{q}{4\gamma}}$ by (3.1). $\square$
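The splitting of the estimation error into a deterministic bias part $B_{n,\gamma}(h)$ and a stochastic part $Z_{n,\gamma}(h)$, used repeatedly in the proof above, can be made explicit in a toy setting where both parts are available in closed form. The Gaussian density, Gaussian kernel and the numerical constants below are our own stand-in choices, not the paper's.

```python
import numpy as np

# With f the standard normal density and a Gaussian kernel,
# E f_hat(x0) = (K_h * f)(x0) is the N(0, 1 + h^2) density evaluated at x0,
# so the bias B(h) = |E f_hat(x0) - f(x0)| is exact, and the stochastic term
# has first-order standard deviation sqrt(f(x0) * ||K||_2^2 / (n h)).
def bias(h, x0=0.0):
    f_x0 = np.exp(-x0**2 / 2) / np.sqrt(2 * np.pi)
    smoothed = np.exp(-x0**2 / (2 * (1 + h**2))) / np.sqrt(2 * np.pi * (1 + h**2))
    return abs(f_x0 - smoothed)

def stoch_sd(h, n, x0=0.0):
    f_x0 = np.exp(-x0**2 / 2) / np.sqrt(2 * np.pi)
    k_norm_sq = 1 / (2 * np.sqrt(np.pi))     # ||K||_2^2 for the Gaussian kernel
    return np.sqrt(f_x0 * k_norm_sq / (n * h))

n = 10_000
for h in (0.05, 0.1, 0.2, 0.4):
    print(h, bias(h), stoch_sd(h, n))
# The bias grows with h while the stochastic part shrinks; the adaptive
# procedure searches the bandwidth grid for the value balancing the two.
```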

Theorem 4.9. We have the following convergence:
$$
\sup_{\beta\in B_n}\operatorname{card}B_-\ \sup_{f\in W_n(\beta,L)}\ \sup_{\gamma\in B_-}\frac{(1+\delta_n)\left(b_{n,\gamma}^q+\tau_{n,\gamma}^q\right)}{\psi_{n,\beta}^q}\,P_f\left[\hat\beta=\gamma\right]\underset{n\to\infty}{\longrightarrow}0.
$$

and an integrable function $\tilde K_\beta$ supported on $(-D,D)$ such that $\left\|\tilde K_\beta^{(\beta)}\right\|_2\le1-\delta/2$, $\left\|\tilde K_\beta\right\|_2\le1-\delta/2$ and
$$
(1-\delta)K_\beta^*(0) \le \tilde K_\beta(0) \le K_\beta^*(0).
$$
By this lemma, we get an integrable kernel $\tilde K_{\beta_1}\in W(\beta_1,1)$, satisfying the same conditions. Let
$$
g_{n,\beta_1}(x) = L\,\tilde h_{n,\beta_1}^{\beta_1-\frac12}\,\tilde K_{\beta_1}\left(\frac{x-x_0}{\tilde h_{n,\beta_1}}\right),
$$
where
$$
\tilde h_{n,\beta_1} = \tilde k_{\beta_1}\left(\frac{f_{n,0}(x_0)\log n}{n}\right)^{\frac{1}{2\beta_1}} \quad\text{and}\quad \tilde k_{\beta_1} = \left(\frac{q}{L^2\cdot2\beta_1}\right)^{\frac{1}{2\beta_1}}.
$$
Denote $\varepsilon_{n,\beta_1} = \int_{\mathbb R} g_{n,\beta_1}(x)\,dx$, which is finite by the previous lemma. Consider
$$
f_{n,1}(x) = f_{n,0}(x)\left(1-\varepsilon_{n,\beta_1}\right) + g_{n,\beta_1}(x).
$$
Lemma 6.3. Let $f_{n,0}$, $g_{n,\beta_1}$ and $\varepsilon_{n,\beta_1}$ be defined as above. Then $f_{n,0}\in W_n(\beta_{N-1},L)$ and $f_{n,1}\in W_n(\beta_1,L)$, $\varepsilon_{n,\beta_1}=O\left(\tilde h_{n,\beta_1}^{\beta_1+\frac12}\right)$ and $(1-\delta)\psi_{n,\beta_1}\le g_{n,\beta_1}(x_0)\le\psi_{n,\beta_1}$.

Proof. We can easily prove that $f_{n,0}$ is also a positive density function, infinitely continuously differentiable on $\mathbb R$. Moreover,
$$
\left\|f_{n,0}^{(\beta_1)}\right\|_2 = \frac\delta2\left\|f^{(\beta_1)}\right\|_2 \le \frac{L\delta}{2}.
$$
Consequently, $f_{n,0}\in W_n\left(\beta_1,\frac{L\delta}{2}\right)$ and obviously $f_{n,0}\in W_n(\beta_{N-1},L)$, too. We can see that
$$
\varepsilon_{n,\beta_1} = L\,\tilde h_{n,\beta_1}^{\beta_1+\frac12}\int_{\mathbb R}\tilde K_{\beta_1}(x)\,dx = O\left(\tilde h_{n,\beta_1}^{\beta_1+\frac12}\right).
$$
Remark that $\psi_{n,\beta_1}=L\,\tilde h_{n,\beta_1}^{\beta_1-\frac12}K_{\beta_1}^*(0)$ and that $g_{n,\beta_1}(x_0)=L\,\tilde h_{n,\beta_1}^{\beta_1-\frac12}\tilde K_{\beta_1}(0)$, so that, by Lemma 6.2,
$$
(1-\delta)\psi_{n,\beta_1} \le g_{n,\beta_1}(x_0) \le \psi_{n,\beta_1}.
$$
Remark also that $\left\|g_{n,\beta_1}^{(\beta_1)}\right\|_2 = L\left\|\tilde K_{\beta_1}^{(\beta_1)}\right\|_2 \le L\left(1-\frac\delta2\right)$. It is easy to prove that $f_{n,1}$ is a density function, positive for $n$ large enough, and that
$$
f_{n,1}(x_0) \ge \rho_n\left(1-\varepsilon_{n,\beta_1}\right) + g_{n,\beta_1}(x_0) \ge \rho_n,
$$
since $g_{n,\beta_1}(x_0)\ge\rho_n\varepsilon_{n,\beta_1}$ for $n$ large enough. Moreover,
$$
\left\|f_{n,1}^{(\beta_1)}\right\|_2 \le \left(1-\varepsilon_{n,\beta_1}\right)\left\|f_{n,0}^{(\beta_1)}\right\|_2 + \left\|g_{n,\beta_1}^{(\beta_1)}\right\|_2 \le \frac{L\delta}{2} + L\left(1-\frac\delta2\right) = L.
$$
Then $f_{n,1}\in W_n(\beta_1,L)$. $\square$

6.2. Proof of the lower bound

For chosen $f_{n,0}\in W_n(\beta_{N-1},L)$ and $f_{n,1}\in W_n(\beta_1,L)$ as in the previous section, we have
$$
\inf_{\hat f_n}\sup_{\beta\in B_n}R_{n,\beta}\left(\hat f_n,\psi_{n,\beta}\right) \ge \inf_{\hat f_n}\max_{\beta\in\{\beta_1,\beta_{N-1}\}}R_{n,\beta}\left(\hat f_n,\psi_{n,\beta}\right) \ge \inf_{\hat f_n}\max\left\{E_{f_{n,0}}\left[\psi_{n,\beta_{N-1}}^{-q}\left|\hat f_n(x_0)-f_{n,0}(x_0)\right|^q\right],\ E_{f_{n,1}}\left[\psi_{n,\beta_1}^{-q}\left|\hat f_n(x_0)-f_{n,1}(x_0)\right|^q\right]\right\}.
$$
Let us denote
$$
T_n = \psi_{n,\beta_1}^{-1}\left|\hat f_n(x_0)-f_{n,0}(x_0)\right| \quad\text{and}\quad q_n = \frac{\psi_{n,\beta_1}}{\psi_{n,\beta_{N-1}}}.
$$
Remark that $q_n\ge a\left(\frac{n}{\log n}\right)^{\frac{\beta_{N-1}-\beta_1}{4\beta_1\beta_{N-1}}}\to+\infty$ when $n\to\infty$. Then we may write
$$
\inf_{\hat f_n}\sup_{\beta\in B_n}R_{n,\beta}\left(\hat f_n,\psi_{n,\beta}\right) \ge \inf_{T_n}\max\left\{E_0\left|q_nT_n\right|^q,\ E_1\left|T_n-\frac{f_{n,1}(x_0)-f_{n,0}(x_0)}{\psi_{n,\beta_1}}\right|^q\right\} \ge \inf_{T_n}\max\left\{E_0\left|q_nT_n\right|^q,\ E_1\left|T_n-\theta_1\right|^q\right\}, \qquad(6.1)
$$
with the notation $E_0=E_{f_{n,0}}$, $E_1=E_{f_{n,1}}$ (with the associated probability laws $P_0$ and $P_1$) and $\theta_1=\left(g_{n,\beta_1}(x_0)-\varepsilon_{n,\beta_1}f_{n,0}(x_0)\right)/\psi_{n,\beta_1}$. Let us denote by $R_n(T_n,\theta_1)$ the right-hand side term in (6.1), where $\inf_{T_n}$ denotes the infimum over all random functions $T_n$. Following the proof of Theorem 6 in Tsybakov [30], we state here a similar result.

Lemma 6.4. Let the numbers $q>0$, $\tau>0$, $0<\delta<\frac12$ be fixed, let $\theta_1$ be a real number such that $|\theta_1|\ge1-2\delta$ and let $P_0$, $P_1$ be such that $P_1\left[\frac{dP_0}{dP_1}\ge\tau\right]\ge1-\delta$. Then
$$
R_n(T_n,\theta_1) \ge \frac{(1-\delta)\,\tau q_n^q(2\delta)^q(1-2\delta)^{2q}}{(1-2\delta)^q+\tau q_n^q(2\delta)^q}. \qquad(6.2)
$$

Proof. Remark that
$$
|\theta_1| = \left|\frac{g_{n,\beta_1}(x_0)}{\psi_{n,\beta_1}}-\frac{\varepsilon_{n,\beta_1}f_{n,0}(x_0)}{\psi_{n,\beta_1}}\right| \ge \frac{g_{n,\beta_1}(x_0)}{\psi_{n,\beta_1}}-\frac{\varepsilon_{n,\beta_1}f_{n,0}(x_0)}{\psi_{n,\beta_1}}.
$$
By Lemma 6.3, $1-\delta<\frac{g_{n,\beta_1}(x_0)}{\psi_{n,\beta_1}}<1$. By the definition of $\varepsilon_{n,\beta_1}$, $\frac{\varepsilon_{n,\beta_1}f_{n,0}(x_0)}{\psi_{n,\beta_1}}\le O\left(\tilde h_{n,\beta_1}\right)=o(1)$, which allows us to consider $\frac{\varepsilon_{n,\beta_1}f_{n,0}(x_0)}{\psi_{n,\beta_1}}<\delta$ for sufficiently large $n$. Then $|\theta_1|\ge1-2\delta$.

Let $B=\{|T_n|\ge2\delta(1-2\delta)\}$. For $\delta<\frac12$, the event $\left\{|T_n-\theta_1|\ge(1-2\delta)^2\right\}$ contains $\bar B$ and we deduce that $P_1\left[|T_n-\theta_1|\ge(1-2\delta)^2\right]\ge P_1\left[\bar B\right]$. Then:
$$
R_n(T_n,\theta_1) \ge \inf_{T_n}\max\left\{q_n^q(2\delta)^q(1-2\delta)^q\,P_0[B],\ (1-2\delta)^{2q}\,P_1\left[\bar B\right]\right\}. \qquad(6.3)
$$
Moreover, let $A=\left\{\frac{dP_0}{dP_1}\ge\tau\right\}$ and suppose for the moment that
$$
P_1[A]\ge1-\delta \qquad(6.4)
$$
for arbitrarily small $\delta\in\left(0,\frac12\right)$ and $\tau>0$. Then
$$
P_0[B] = E_1\left[\frac{dP_0}{dP_1}I(B)\right] \ge \tau P_1[A\cap B] \ge \tau\left(P_1[B]-\delta\right)
$$
and we use this to bound from below the right-hand expression in (6.3) to get
$$
R_n(T_n,\theta_1) \ge \inf_{T_n}\max\left\{\left(q_n2\delta(1-2\delta)\right)^q\tau\left(P_1[B]-\delta\right),\ (1-2\delta)^{2q}P_1\left[\bar B\right]\right\} \ge \inf_{0\le t\le1}\max\left\{\left(q_n2\delta(1-2\delta)\right)^q\tau(t-\delta),\ (1-2\delta)^{2q}(1-t)\right\} \ge \frac{(1-\delta)\,\tau q_n^q(2\delta)^q(1-2\delta)^{2q}}{(1-2\delta)^q+\tau q_n^q(2\delta)^q},
$$
the last bound being attained at the value of $t$ where the two affine functions of $t$ in the maximum are equal.

Let us prove that (6.4) holds for $n$ large enough, $\tau>0$. We write
$$
P_1[A] = P_1\left[\prod_{i=1}^n\frac{f_{n,0}(X_i)}{f_{n,1}(X_i)}\ge\tau\right] = P_1\left[\frac{1}{\sqrt{\log n}}\sum_{i=1}^n\log\frac{f_{n,0}(X_i)}{f_{n,1}(X_i)}\ge\frac{\log\tau}{\sqrt{\log n}}\right].
$$
Consider the random variables $Z_{n,i}=\frac{1}{\sqrt{\log n}}\log\frac{f_{n,0}}{f_{n,1}}(X_i)$, for $i=1,\dots,n$, $n\ge1$, which are independent and identically distributed within each series. Let $V_1[Z_{n,i}]$ be the variance with respect to the distribution of $X_1,\dots,X_n$ when the underlying probability density is $f_{n,1}$, and let $U_{n,i}=\left(Z_{n,i}-E_1[Z_{n,i}]\right)/\sigma_n$.

Lemma 6.5. We have, for arbitrarily small $\delta\in\left(0,\frac12\right)$:
1. $\displaystyle\sum_{i=1}^nE_1[Z_{n,i}] \ge -\sqrt{\log n}\,\frac{q}{4\beta_1}\left(1-\frac{\delta^2}{4}\right)^2$;
2. $\displaystyle\sigma_n^2=\sum_{i=1}^nV_1[Z_{n,i}] \le \frac{q}{2\beta_1}\left(1-\frac{\delta^2}{4}\right)^2$;
3. $E_1\left[|U_{n,i}|^3\right]<+\infty$ and $\displaystyle\lim_{n\to\infty}\sum_{i=1}^nE_1\left[|U_{n,i}|^3\right]=0$.

Proof. 1. Let us recall that there exists a fixed $a_0(\delta)\in(0,1)$ such that
$$
-x-\frac{x^2}{2}\left(1+\frac\delta2\right) \le \log(1-x) \le -x \qquad(6.5)
$$
for any $0<x<a_0(\delta)$. Then
$$
\sum_{i=1}^nE_1[Z_{n,i}] = \frac{n}{\sqrt{\log n}}\left[-\log\left(1-\varepsilon_{n,\beta_1}\right)+\int\log\left(1-\frac{g_{n,\beta_1}}{f_{n,1}}(x)\right)f_{n,1}(x)\,dx\right].
$$
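Inequality (6.5) is elementary and easy to confirm numerically; in the sketch below the values $\delta=0.5$ and $a_0=0.2$ are our own illustrative choices (the argument only needs the existence of some $a_0(\delta)$).

```python
import numpy as np

# Check -x - (x^2/2)(1 + delta/2) <= log(1 - x) <= -x on (0, a0), cf. (6.5).
delta, a0 = 0.5, 0.2
x = np.linspace(1e-6, a0, 10_000)
lower = -x - (x**2 / 2) * (1 + delta / 2)
upper = -x
log_term = np.log(1 - x)
print(bool(np.all(lower <= log_term)), bool(np.all(log_term <= upper)))  # True True
```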

As $\varepsilon_{n,\beta_1}\to0$ and $\sup_x g_{n,\beta_1}/f_{n,1}(x)\to0$ when $n\to\infty$, we may suppose without loss of generality that they are smaller than $a_0(\delta)$ and apply the inequality (6.5). Then, for $n$ large enough,
$$
\sum_{i=1}^nE_1[Z_{n,i}] \ge -\frac{n}{\sqrt{\log n}}\cdot\frac12\left(1+\frac\delta2\right)\int\frac{g_{n,\beta_1}^2}{f_{n,1}}(x)\,dx \ge -\frac12\left(1+\frac\delta2\right)\frac{qf_{n,0}(x_0)\sqrt{\log n}}{2\beta_1}\int\frac{1}{\tilde h_{n,\beta_1}}\tilde K_{\beta_1}^2\left(\frac{x-x_0}{\tilde h_{n,\beta_1}}\right)\frac{dx}{f_{n,1}(x)}
$$
$$
\ge -\sqrt{\log n}\left(1+\frac\delta2\right)^2\frac{q}{4\beta_1}\left\|\tilde K_{\beta_1}\right\|_2^2\left(1+o(1)\right) \ge -\sqrt{\log n}\left(1-\frac{\delta^2}{4}\right)^2\frac{q}{4\beta_1}\left(1+o(1)\right),
$$
where we used Lemma 6.2, Bochner's lemma and the fact that $f_{n,0}/f_{n,1}(x_0)=1+o(1)$ and becomes less than $1+\delta/2$ when $n$ is large enough.

2. For the variance, we have similarly
$$
\sigma_n^2 \le \frac{n}{\log n}E_1\left[\log^2\left(1-\frac{g_{n,\beta_1}}{f_{n,1}}\right)(X_1)\right] \le \frac{n}{\log n}\left[\int\frac{g_{n,\beta_1}^2}{f_{n,1}}(x)\,dx+\frac12\left(1+\frac\delta2\right)\int\frac{g_{n,\beta_1}^4}{f_{n,1}^3}(x)\,dx\right] \le \left(1+\frac\delta2\right)^2\frac{q}{2\beta_1}\left\|\tilde K_{\beta_1}\right\|_2^2 \le \left(1-\frac{\delta^2}{4}\right)^2\frac{q}{2\beta_1}
$$
for $n$ large enough.

3. By similar considerations, we get
$$
E_1\left[\log^2\left(1-\frac{g_{n,\beta_1}}{f_{n,1}}\right)(X_1)\right] \ge \int\frac{g_{n,\beta_1}^2}{f_{n,1}}(x)\,dx \ge \left(1-\frac\delta2\right)\frac{q\log n}{2\beta_1n}\left\|\tilde K_{\beta_1}\right\|_2^2
$$
and
$$
\left(E_1\left[\log\left(1-\frac{g_{n,\beta_1}}{f_{n,1}}\right)(X_1)\right]\right)^2 \le \left(-\varepsilon_{n,\beta_1}-\frac12\left(1+\frac\delta2\right)\int\frac{g_{n,\beta_1}^2}{f_{n,1}}(x)\,dx\right)^2 = o(1)\,\frac{\log n}{n}.
$$
From those two last inequalities, we conclude that $\sigma_n^2\ge\left(1-\frac\delta2\right)\frac{q}{2\beta_1}\left\|\tilde K_{\beta_1}\right\|_2^2$ for $n$ large enough. Then there exists $\liminf_{n\to\infty}\sum_{i=1}^nV_1[Z_{n,i}]=\sigma^2>0$, for $\delta\in(0,1/2)$. Then
$$
E_1\left[|U_{n,i}|^3\right] \le \frac{O(1)}{(\log n)^{3/2}}\,E_1\left[\left|\log\left(1-\frac{g_{n,\beta_1}}{f_{n,1}}\right)(X_i)\right|^3\right].
$$


Let us note that
$$
E_1\left[\left|\log\left(1-\frac{g_{n,\beta_1}}{f_{n,1}}\right)(X_i)\right|^3\right] \le O(1)\int\frac{|g_{n,\beta_1}|^3}{f_{n,1}^2}(x)\,dx \le O(1)\,\tilde h_{n,\beta_1}^{3\beta_1-1/2}\,\frac{1}{\tilde h_{n,\beta_1}}\int\left|\tilde K_{\beta_1}\left(\frac{x-x_0}{\tilde h_{n,\beta_1}}\right)\right|^3\frac{dx}{f_{n,1}^2(x)} \le O(1)\left(\frac{\log n}{n}\right)^{\frac{3\beta_1-1/2}{2\beta_1}} = o(1).
$$
Thus
$$
\sum_{i=1}^nE_1\left[|U_{n,i}|^3\right] \le O(1)\left(\frac{1}{\log n}\right)^{\frac{1}{4\beta_1}}\left(\frac1n\right)^{\frac{2\beta_1-1}{4\beta_1}} = o(1).
$$
So, the variables $(U_{n,i})_{1\le i\le n,\,n\ge1}$ are centered, $\sum_{i=1}^nV_1[U_{n,i}]=1$ and they satisfy property 3 of Lemma 6.5. Then we may apply Lyapounov's central limit theorem and get the following convergence in law to the standard Gaussian distribution:
$$
U_n = \sum_{i=1}^nU_{n,i}\ \xrightarrow{\ \mathcal D\ }\ N(0,1), \quad\text{as } n\to\infty.
$$

We take $m_n = \frac{\log\tau-nE_1Z_{n,1}}{\sqrt{n\,V_1Z_{n,1}}}$ and see that
$$
P_1[A] = P_1\left[U_n\ge\frac{\log\tau-nE_1Z_{n,1}}{\sqrt{n\,V_1Z_{n,1}}}\right] = P_1\left[U_n\ge m_n\right].
$$
Let us choose $\tau=n^{-\xi}$, $\xi=\frac{q}{4\beta_1}\left(1-\frac{\delta^2}{4}\right)$, and use Lemma 6.5:
$$
m_n \le -\sqrt{\log n}\cdot\frac12\sqrt{\frac{q}{2\beta_1}}\,\frac{\delta^2}{4}.
$$
Then it is easy to see that $m_n\underset{n\to\infty}{\longrightarrow}-\infty$ for arbitrarily small $\delta\in(0,1/2)$, which implies that $P_1[A]\to1$. This proves that (6.4) holds for $n$ large enough and $\delta\in(0,1/2)$ arbitrarily small.

By (6.1) and (6.2), we conclude that
$$
\inf_{\hat f_n}\sup_{\beta\in B_n}R_{n,\beta}\left(\hat f_n,\psi_{n,\beta}\right) \ge \frac{(1-\delta)\,\tau q_n^q(2\delta)^q(1-2\delta)^{2q}}{(1-2\delta)^q+\tau q_n^q(2\delta)^q}. \qquad(6.6)
$$
As, furthermore,
$$
\liminf_{n\to\infty}\tau q_n^q \ge \liminf_{n\to\infty}n^{-\frac{q}{4\beta_1}\left(1-\frac{\delta^2}{4}\right)}\left(\frac{\psi_{n,\beta_1}}{\psi_{n,\beta_{N-1}}}\right)^q \ge \liminf_{n\to\infty}\exp\left\{\frac{q\log n}{4}\left(\frac{1}{\beta_1}\cdot\frac{\delta^2}{4}-\frac{1}{\beta_{N-1}}-\frac{\log\log n}{\log n}\left(\frac{1}{\beta_1}-\frac{1}{\beta_{N-1}}\right)\right)\right\},
$$
which is $\infty$, for arbitrarily small $\delta\in(0,1/2)$. Then by (6.6) we get the stated result (2.16). $\square$

The author is very grateful to Alexandre Tsybakov for his generous help.

References

[1] A. Barron, L. Birgé and P. Massart, Risk bounds for model selection via penalization. Probab. Theory Related Fields 113 (1999) 301-413.
[2] O.V. Besov, V.L. Il'in and S.M. Nikol'skii, Integral representations of functions and imbedding theorems. J. Wiley, New York (1978).
[3] L. Birgé and P. Massart, From model selection to adaptive estimation, in Festschrift für Lucien Le Cam. Springer (1997) 55-87.
[4] L.D. Brown and M.G. Low, A constrained risk inequality with application to nonparametric functional estimation. Ann. Statist. 24 (1996) 2524-2535.
[5] C. Butucea, The adaptive rates of convergence in a problem of pointwise density estimation. Statist. Probab. Lett. 47 (2000) 85-90.
[6] C. Butucea, Numerical results concerning a sharp adaptive density estimator. Comput. Statist. 1 (2001).
[7] L. Devroye and G. Lugosi, A universally acceptable smoothing factor for kernel density estimates. Ann. Statist. 24 (1996) 2499-2512.
[8] D.L. Donoho, I. Johnstone, G. Kerkyacharian and D. Picard, Wavelet shrinkage: Asymptopia? J. R. Stat. Soc. Ser. B Stat. Methodol. 57 (1995) 301-369.
[9] D.L. Donoho, I. Johnstone, G. Kerkyacharian and D. Picard, Density estimation by wavelet thresholding. Ann. Statist. 24 (1996) 508-539.
[10] D.L. Donoho and M.G. Low, Renormalization exponents and optimal pointwise rates of convergence. Ann. Statist. 20 (1992) 944-970.
[11] S.Yu. Efromovich, Nonparametric estimation of a density with unknown smoothness. Theory Probab. Appl. 30 (1985) 557-568.
[12] S.Yu. Efromovich and M.S. Pinsker, An adaptive algorithm of nonparametric filtering. Automat. Remote Control 11 (1984) 1434-1440.
[13] A. Goldenshluger and A. Nemirovski, On spatially adaptive estimation of nonparametric regression. Math. Methods Statist. 6 (1997) 135-170.
[14] G.K. Golubev, Adaptive asymptotically minimax estimates of smooth signals. Problems Inform. Transmission 23 (1987) 57-67.
[15] G.K. Golubev, Quasilinear estimates for signals in L2. Problems Inform. Transmission 26 (1990) 15-20.
[16] G.K. Golubev, Nonparametric estimation of smooth probability densities in L2. Problems Inform. Transmission 28 (1992) 44-54.
[17] G.K. Golubev and M. Nussbaum, Adaptive spline estimates in a nonparametric regression model. Theory Probab. Appl. 37 (1992) 521-529.
[18] I.A. Ibragimov and R.Z. Hasminskii, Statistical estimation: Asymptotic theory. Springer-Verlag, New York (1981).
[19] A. Juditsky, Wavelet estimators: Adapting to unknown smoothness. Math. Methods Statist. 6 (1997) 1-25.
[20] G. Kerkyacharian and D. Picard, Density estimation by kernel and wavelet method, optimality in Besov space. Statist. Probab. Lett. 18 (1993) 327-336.
[21] G. Kerkyacharian, D. Picard and K. Tribouley, Lp adaptive density estimation. Bernoulli 2 (1996) 229-247.
[22] J. Klemelä and A.B. Tsybakov, Sharp adaptive estimation of linear functionals, Prépublication 540. LPMA Paris 6 (1999).
[23] O.V. Lepskii, On a problem of adaptive estimation in Gaussian white noise. Theory Probab. Appl. 35 (1990) 454-466.
[24] O.V. Lepskii, Asymptotically minimax adaptive estimation I: Upper bounds. Optimally adaptive estimates. Theory Probab. Appl. 36 (1991) 682-697.
[25] O.V. Lepskii, On problems of adaptive estimation in white Gaussian noise. Advances in Soviet Math. Amer. Math. Soc. 12 (1992) 87-106.
[26] O.V. Lepski and B.Y. Levit, Adaptive minimax estimation of infinitely differentiable functions. Math. Methods Statist. 7 (1998) 123-156.
[27] O.V. Lepski, E. Mammen and V.G. Spokoiny, Optimal spatial adaptation to inhomogeneous smoothness: An approach based on kernel estimates with variable bandwidth selectors. Ann. Statist. 25 (1997) 929-947.
[28] O.V. Lepski and V.G. Spokoiny, Optimal pointwise adaptive methods in nonparametric estimation. Ann. Statist. 25 (1997) 2512-2546.
[29] D. Pollard, Convergence of Stochastic Processes. Springer-Verlag, New York (1984).
[30] A.B. Tsybakov, Pointwise and sup-norm sharp adaptive estimation of functions on the Sobolev classes. Ann. Statist. 26 (1998) 2420-2469.
[31] S. Van de Geer, A maximal inequality for empirical processes, Technical Report TW 9505. University of Leiden, Leiden (1995).
