Electronic Journal of Statistics Vol. 2 (2008) 993–1020 ISSN: 1935-7524 DOI: 10.1214/07-EJS127
Adaptive estimation of linear functionals by model selection

Béatrice Laurent
Institut de Mathématiques (UMR 5219), INSA de Toulouse, Université de Toulouse, France
e-mail: [email protected]

Carenne Ludeña
IVIC, Venezuela
e-mail: [email protected]

Clémentine Prieur
Institut de Mathématiques (UMR 5219), INSA de Toulouse, Université de Toulouse, France
e-mail: [email protected]

Abstract: We propose an estimation procedure for linear functionals based on Gaussian model selection techniques. We show that the procedure is adaptive, and we give a non asymptotic oracle inequality for the risk of the selected estimator with respect to the $L_p$ loss. An application to the problem of estimating a signal or its $r$th derivative at a given point is developed, and minimax rates are proved to hold uniformly over Besov balls. We also apply our non asymptotic oracle inequality to the estimation of the mean of the signal on an interval with length depending on the noise level. Simulations are included to illustrate the performance of the procedure for the estimation of a function at a given point. Our method provides a pointwise adaptive estimator.

AMS 2000 subject classifications: 62G05, 62G08.
Keywords and phrases: Nonparametric regression, white noise model, adaptive estimation, linear functionals, model selection, pointwise adaptive estimation, oracle inequalities.

Received October 2007.
1. Introduction

We consider the following model:
$$Y(t) = \langle s, t\rangle + \frac{\sigma}{\sqrt{n}}\, L(t), \qquad \text{for all } t \in H, \tag{1.1}$$
where $H$ is a separable Hilbert space endowed with the scalar product $\langle\cdot,\cdot\rangle$ and $L$ is some centered Gaussian isonormal process, which means that $L$ maps $H$ isometrically onto some Gaussian subspace of $L_2(\Omega)$, where $(\Omega, \mathcal{G}, P)$ is some canonical probability space. This framework includes the finite dimensional Gaussian regression model, the Gaussian sequence model and the multivariate white noise model.
Let $T$ be a linear functional defined over $\mathcal{S} \subset H$. In this paper we consider the problem of estimating $T(s)$, based on the observation of $(Y(t), t \in H)$. Our main goal is to develop procedures which adapt to the smoothness of the underlying function $s$, in the framework of model selection as proposed by Barron et al. [3].

Minimax theory for estimating linear functionals is well developed in the Gaussian setting. Ibragimov and Hasminskii [16] obtained the best minimax linear estimator over classes of smooth functions. For any convex parameter space $\mathcal{F}$, the minimax mean squared error is of the order of the modulus of continuity of the functional over $\mathcal{F}$ (see Donoho and Liu [13], Donoho [12] and Cai and Low [8, 9], the latter being a generalization to certain non convex parameter spaces). These authors have also constructed procedures whose maximum risk is close, up to a small factor, to the minimax rates. However, these rates cannot be attained when dealing with adaptive estimation over several classes of parameters. For the problem of estimating a function at a given point, Lepski [19] showed that it is necessary to include a logarithmic factor in the mean squared error when dealing simultaneously with two Lipschitz classes. For general parameter spaces, Cai and Low [11, 10] show that it is necessary to introduce a between-class modulus of continuity to quantify precisely the degree of adaptability for the estimation of a linear functional with respect to the mean squared error. They also proposed an adaptive estimator based on multiple tests over an ordered sequence of parameter spaces. Their methodology thus resembles "Lepski's method" (see for example [19, 20, 21]) in the sense that the estimation procedure chooses the best possible estimator over a finite selection of parameter spaces. This point of view is also developed in Klemelä and Tsybakov [17], where the authors construct an asymptotically sharp adaptive estimator of $T(s)$ based on kernel methods. They assume that the signal $s$ belongs to a class of regular functions, the index of regularity being bounded from above and below by known constants. Lepski and Spokoiny [23] and Lepski, Mammen and Spokoiny [22] propose methods based on kernel estimates with a variable bandwidth selector for pointwise adaptive estimation in the Gaussian white noise model.

Model selection methods for adaptive estimation were initiated in a series of papers by Birgé and Massart (see for example Birgé and Massart [5, 6], Barron, Birgé and Massart [3]). These methods have been used, in the framework of regression with fixed or random design, to estimate the regression function by Baraud [1, 2]. In this article, following Birgé [4], we take a model selection point of view on adaptive estimation via Lepski's method. In order to construct the adaptive estimator of the linear functional, we choose among an ordered family of finite dimensional linear subspaces of $H$. Over each subspace we consider an estimator based on projection methods, and the problem is thus to establish a best possible procedure for determining the subspace. The main issue here is that, unlike the case of penalized least squares, the bias of the estimator is not a monotonically decreasing sequence over the family of nested subspaces. Hence it is necessary to modify the procedure developed by Birgé [4] in order to obtain an appropriate estimator of the bias. The main advantage of our formulation is that it allows us to obtain non asymptotic oracle inequalities
for general linear functionals. In the framework of the Gaussian sequence model $Y_i = \theta_i + \sigma\xi_i$, $i \in \mathbb{N}$, where the $\xi_i$'s are i.i.d. standard Gaussian variables, Golubev and Levit [14] obtain an oracle inequality for the estimation of a general linear functional of $\theta = (\theta_i)_{i\in\mathbb{N}}$. They assume that the $\theta_i$'s are independent centered Gaussian variables; this is not the framework that we consider in the present paper.

We propose a general method to estimate a linear functional of $s$ in the framework of Model (1.1). We apply this general procedure to estimate the value of the $r$th derivative of a function at a point. For this problem, we provide minimax rates uniformly over Besov balls that correspond to the rates established by Lepski [19]. We also give an application of our procedure to pointwise adaptive estimation in a multidimensional framework. Moreover, since we have obtained a non asymptotic oracle inequality, we are able to apply our result to the estimation of linear functionals that depend on the noise level (or on the number of observations). In the white noise model, we consider the estimation of the mean of the signal on an interval with length depending on the noise level. The interesting fact in this case is that we obtain two kinds of rates of convergence, according to the relationship between the length of the interval, the noise level, and the regularity of the signal. When the length of the interval is too small, this problem is as hard as estimating the signal at some fixed point, and when the length of the interval is large, the functional can be estimated at the parametric rate $1/\sqrt{n}$. All intermediate rates are obtained as the length of the interval grows.

We present simulation results for the estimation of a function at a point, and we compare our method to a global (not pointwise) model selection estimator and to an estimator based on wavelet shrinkage. Our method provides a locally adaptive estimator of a regression function $s$ on $[0,1]$. It has good properties when estimating functions that are highly oscillating over some regions and nearly flat over others.

The article is organized as follows. In Section 2 we present the framework, the estimation procedure and our main result. In Section 3 we develop three examples: estimating the value of the $r$th derivative of a function at a point using a multiresolution analysis, estimating the mean of the signal on an interval with length depending on the noise level, and estimating the value of a multidimensional function at a point. In Section 4 we present the simulation study. Proofs of our main results are given in Section 5.

2. Main results

2.1. The framework

Given some separable Hilbert space $H$, endowed with the scalar product $\langle\cdot,\cdot\rangle$, one observes $(Y(t), t\in H)$ as defined by Model (1.1). Since $L$ is a centered Gaussian isonormal process defined on $H$, we have, for all $t, u \in H$,
$$\mathrm{Cov}(L(t), L(u)) = \langle t, u\rangle.$$
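To make the covariance identity concrete, here is a minimal numerical sketch (ours, not from the paper): restricted to the span of a finite orthonormal family, $L$ can be simulated as $L(t) = \sum_\lambda \langle t, \phi_\lambda\rangle\,\xi_\lambda$ with i.i.d. standard Gaussian $\xi_\lambda$, and the empirical covariance of $L(t)$ and $L(u)$ then approximates $\langle t, u\rangle$. The truncation level `D` is an arbitrary assumption.

```python
import numpy as np

# Simulating the isonormal process on the span of D orthonormal directions:
# L(t) = sum_l <t, phi_l> xi_l with xi_l i.i.d. N(0, 1).  On this span,
# Cov(L(t), L(u)) = <t, u> by construction.  D and the test vectors are
# arbitrary choices for illustration.
rng = np.random.default_rng(1)
D, reps = 8, 200_000
t, u = rng.standard_normal(D), rng.standard_normal(D)  # coefficients <t, phi_l>, <u, phi_l>
xi = rng.standard_normal((reps, D))
Lt, Lu = xi @ t, xi @ u
print(np.mean(Lt * Lu), t @ u)   # empirical covariance vs. the scalar product
```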
Let us first consider three particular cases of Model (1.1).

The finite dimensional Gaussian regression model. One observes $Y_i = s_i + \varepsilon_i$, $i = 1, \ldots, n$, where $(\varepsilon_1, \ldots, \varepsilon_n)$ are independent standard normal variables. We consider $H = \mathbb{R}^n$ endowed with the scalar product $\langle x, y\rangle = \frac{1}{n}\sum_{i=1}^n x_i y_i$ and set $s = (s_1, \ldots, s_n)$. Model (1.1) is obtained by setting, for all $t = (t_1, \ldots, t_n) \in \mathbb{R}^n$,
$$Y(t) = \frac{1}{n}\sum_{i=1}^n t_i Y_i \quad \text{and} \quad L(t) = \frac{1}{\sqrt{n}}\sum_{i=1}^n t_i \varepsilon_i.$$

The Gaussian sequence model. In the Gaussian sequence model, one observes
$$Y_\lambda = \beta_\lambda + \frac{1}{\sqrt{n}}\,\varepsilon_\lambda, \qquad \lambda \in \mathbb{N}^*, \tag{2.2}$$
where $(\varepsilon_\lambda)_{\lambda\in\mathbb{N}^*}$ is a sequence of independent standard normal variables. Setting $H = \ell_2(\mathbb{N}^*)$ endowed with the usual scalar product $\langle \beta, \gamma\rangle = \sum_{\lambda\in\mathbb{N}^*}\beta_\lambda\gamma_\lambda$ and $s = (\beta_\lambda)_{\lambda\in\mathbb{N}^*}$, we define, for any $t = (\alpha_\lambda)_{\lambda\in\mathbb{N}^*} \in H$, $Y(t) = \sum_{\lambda\in\mathbb{N}^*}\alpha_\lambda Y_\lambda$ and $L(t) = \sum_{\lambda\in\mathbb{N}^*}\alpha_\lambda\varepsilon_\lambda$, and we see that (2.2) is a particular case of Model (1.1).

The multivariate white noise model. One observes
$$Z(x) = \int_{[0,1]^d} 1\!\!1_{[0,x_1]\times\cdots\times[0,x_d]}(u)\, s(u)\, du + \frac{1}{\sqrt{n}}\, W(x)$$
for all $x = (x_1, \ldots, x_d) \in [0,1]^d$, where $W$ is the standard Wiener process on $[0,1]^d$. We consider $H = L_2([0,1]^d)$ endowed with its usual scalar product. We set $Y(t) = \int_{[0,1]^d} t(u)\, dZ(u)$ and $L(t) = \int_{[0,1]^d} t(u)\, dW(u)$.

Our purpose is to propose new adaptive estimators of $T(s)$, where $T$ is a linear functional, from observation (1.1).

2.2. The estimation procedure

We consider a finite or countable collection $(S_m, m\in\mathcal{M})$ of linear subspaces of $H$. For all $m\in\mathcal{M}$, we can define the estimator $\hat{s}_m$ of $s$, which is the projection estimator of $s$ onto $S_m$. Given some orthonormal basis $(\phi_\lambda, \lambda\in\Lambda_m)$ of $S_m$, it is natural to consider the projection estimator
$$\hat{s}_m = \sum_{\lambda\in\Lambda_m} Y(\phi_\lambda)\phi_\lambda.$$
It is easy to verify that
$$\hat{s}_m = \mathrm{argmin}_{v\in S_m}\left(\|v\|^2 - 2Y(v)\right),$$
which shows that $\hat{s}_m$ does not depend on the particular choice of the basis $(\phi_\lambda, \lambda\in\Lambda_m)$. It is natural to estimate $T(s)$ by $T(\hat{s}_m)$.

Let $s_m$ denote the orthogonal projection of $s$ onto $S_m$. Since $T$ is a linear functional, $E(T(\hat{s}_m)) = T(s_m)$. Hence, the quadratic risk of the estimator $T(\hat{s}_m)$ can be decomposed into a bias term and a variance term:
$$E\left[(T(\hat{s}_m) - T(s))^2\right] = (T(s_m) - T(s))^2 + E\left[(T(\hat{s}_m) - T(s_m))^2\right].$$
The variance term can be easily computed, using the properties of the isonormal process $L$:
$$E\left[(T(\hat{s}_m) - T(s_m))^2\right] = \frac{\sigma^2}{n}\sum_{\lambda\in\Lambda_m} T^2(\phi_\lambda) := \sigma_m^2.$$
Our aim is to find an estimator among the collection $(T(\hat{s}_m), m\in\mathcal{M})$ that minimizes the quadratic risk
$$(T(s_m) - T(s))^2 + \sigma_m^2.$$
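As a concrete instance of the variance formula, consider the evaluation functional $T(f) = f(x_0)$ and the Haar space $S_m$ of piecewise constant functions at resolution $m$ on $[0,1]$. The following throwaway helper (our illustration, with hypothetical names, not the paper's code) evaluates $\sigma_m$ in closed form: only the cell containing $x_0$ contributes, giving $\sigma_m = \sigma 2^{m/2}/\sqrt{n}$.

```python
import numpy as np

# sigma_m^2 = (sigma^2/n) * sum_lambda T(phi_lambda)^2 for T(f) = f(x0)
# and the Haar basis phi_{m,k} = 2^{m/2} 1[k 2^-m, (k+1) 2^-m) on [0,1].
def sigma_m(m, x0, sigma, n):
    k = np.arange(2 ** m)
    phi_at_x0 = 2 ** (m / 2.0) * ((k / 2**m <= x0) & (x0 < (k + 1) / 2**m))
    return np.sqrt(sigma**2 / n * np.sum(phi_at_x0 ** 2))  # = sigma * 2^{m/2} / sqrt(n)
```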
Model selection by penalized criterion was introduced by Barron et al. [3] and used in the framework defined by Model (1.1) for the estimation of the whole object $s$ by Birgé and Massart [5], and by Laurent and Massart [18] for the estimation of quadratic functionals of $s$. Usually, the bias term, or the sum of this bias term and a term that does not depend on $m\in\mathcal{M}$, is estimated, and the methods proposed in previous papers consist in minimizing over $m\in\mathcal{M}$ this estimate of the bias term, plus some penalty term $\mathrm{pen}(m)$, which has to be suitably chosen. For example, when one estimates $s$, the bias term appearing in the quadratic risk $E(\|s-\hat{s}_m\|^2)$ equals $\|s-s_m\|^2$. Using Pythagoras' equality, this bias term equals $\|s\|^2 - \|s_m\|^2$. Hence, minimizing $\|s-s_m\|^2 + \mathrm{pen}(m)$ is equivalent to minimizing $-\|s_m\|^2 + \mathrm{pen}(m)$, and one can easily find an unbiased estimator of $-\|s_m\|^2$.

In our case, the bias term equals $(T(s_m) - T(s))^2$, and this expression can be neither simplified as previously nor estimated. Therefore, in order to use model selection by penalized criterion, we introduce a new criterion. We assume that $\mathcal{M}$ is a subset of $\mathbb{N}$. This implies in particular that $\mathcal{M}$ is ordered. The criterion introduced in Definition 1 aims at finding $m\in\mathcal{M}$ which minimizes
$$\sup_{j\geq m,\, j\in\mathcal{M}} |T(s_m) - T(s_j)| + \sigma_m.$$
Definition 1 Let $(S_m, m\in\mathcal{M})$ be a finite or countable collection of linear subspaces of $H$. For all $m\in\mathcal{M}$, let $(\phi_\lambda, \lambda\in\Lambda_m)$ be an orthonormal basis of $S_m$ and let
$$\hat{s}_m = \sum_{\lambda\in\Lambda_m} Y(\phi_\lambda)\phi_\lambda.$$
We define, for all $m\in\mathcal{M}$,
$$\mathrm{pen}(m) = \sqrt{2x_m}\,\sigma_m,$$
where $(x_m, m\in\mathcal{M})$ is a sequence of nonnegative real numbers. We set, for all $j, m\in\mathcal{M}$,
$$\sigma_{j,m}^2 = \frac{\sigma^2}{n}\, E\left[\left(\sum_{\lambda\in\Lambda_m} T(\phi_\lambda)L(\phi_\lambda) - \sum_{\lambda\in\Lambda_j} T(\phi_\lambda)L(\phi_\lambda)\right)^2\right]$$
and
$$H(j,m) = \sqrt{2x_{j,m}}\,\sigma_{j,m},$$
where $(x_{j,m}, (j,m)\in\mathcal{M}^2)$ is a sequence of nonnegative real numbers. We define, for all $m\in\mathcal{M}$,
$$\widehat{\mathrm{Crit}}(m) = \sup_{j\geq m,\, j\in\mathcal{M}}\left[|T(\hat{s}_m) - T(\hat{s}_j)| - H(j,m)\right] + \mathrm{pen}(m), \tag{2.3}$$
and
$$\hat{m} = \inf\left\{m\in\mathcal{M},\ \widehat{\mathrm{Crit}}(m) \leq \inf_{j\in\mathcal{M}}\widehat{\mathrm{Crit}}(j) + \frac{1}{n}\right\}.$$
We estimate the linear functional $T(s)$ by $T(\hat{s}_{\hat{m}})$.
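Since the procedure is, as noted below, easily implementable, here is a minimal sketch of the selection rule in Python, assuming the quantities of Definition 1 have been precomputed; all function and variable names are our own illustration, not code from the paper.

```python
import numpy as np

# A minimal sketch of the selection rule of Definition 1, assuming the
# projection estimates T(s^_m), the standard deviations sigma_m and
# sigma_{j,m}, and the weights x_m, x_{j,m} are given for the ordered
# models m = 0, ..., M-1.
def select_model(T_hat, sigma, sigma_jm, x, x_jm, n):
    """T_hat[m] : T(s^_m);  sigma[m] : sigma_m;  sigma_jm[j, m] : sigma_{j,m}
       x[m] : x_m;  x_jm[j, m] : x_{j,m}.
       Returns the selected index m^ and the estimate T(s^_{m^})."""
    M = len(T_hat)
    pen = np.sqrt(2.0 * x) * sigma            # pen(m) = sqrt(2 x_m) sigma_m
    H = np.sqrt(2.0 * x_jm) * sigma_jm        # H(j, m) = sqrt(2 x_{j,m}) sigma_{j,m}
    crit = np.empty(M)
    for m in range(M):
        js = np.arange(m, M)                  # supremum over j >= m only
        crit[m] = np.max(np.abs(T_hat[js] - T_hat[m]) - H[js, m]) + pen[m]
    # m^ = smallest m whose criterion is within 1/n of the minimum
    m_hat = int(np.flatnonzero(crit <= crit.min() + 1.0 / n)[0])
    return m_hat, T_hat[m_hat]
```

Note that, as in the paper, only the ordering of the models matters here: the supremum in the estimated criterion runs over the finer models $j \geq m$.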
In the following theorem, we give an upper bound for the risk of the estimator $T(\hat{s}_{\hat{m}})$ with respect to the $L_p$ loss.

Theorem 1 Let $H$ be some separable Hilbert space endowed with the scalar product $\langle\cdot,\cdot\rangle$. One observes the Gaussian process $\{Y(t), t\in H\}$, where $Y(t)$ is given by (1.1). Let $T$ be some linear functional defined on $\mathcal{S}\subset H$. Let $\mathcal{M}\subset\mathbb{N}$ and let $(S_m, m\in\mathcal{M})$ be some finite or countable collection of linear subspaces of $H$. Let $T(\hat{s}_{\hat{m}})$ be defined as in Definition 1. Let, for all $m\in\mathcal{M}$,
$$\mathrm{Crit}(m) = \sup_{j\geq m,\, j\in\mathcal{M}} |T(s_j) - T(s_m)| + \mathrm{pen}(m).$$
Let $m^*$ be defined by
$$m^* = \inf\left\{m\in\mathcal{M}\ /\ \mathrm{Crit}(m) \leq \inf_{l\in\mathcal{M}}\mathrm{Crit}(l) + \frac{1}{n}\right\}.$$
Then, for all $p\geq 1$, there exists some positive constant $C(p)$ depending on $p$ only such that
$$E\left(|T(\hat{s}_{\hat{m}}) - T(s)|^p\right) \leq C(p)\left((\mathrm{Crit}(m^*))^p + |T(s_{m^*}) - T(s)|^p + \sigma_{m^*}^p\right) + C(p)\left(\sup_{j\leq m^*}(H(m^*,j))^p + \sum_{m\in\mathcal{M}} e^{-x_m}\sigma_m^p + \sum_{j\geq m^*} e^{-x_{j,m^*}}\sigma_{j,m^*}^p + \frac{1}{n^p}\right). \tag{2.4}$$
Comments:
• In the definition of $\widehat{\mathrm{Crit}}(m)$ given in (2.3), we compare $T(\hat{s}_m)$ with the estimators $T(\hat{s}_j)$ for $j > m$. This is a common point of our procedure with the initial method of Lepski [19, 20, 21, 23]. An important difference between our procedure and that of Lepski, however, is that we dissociate in (2.3) the terms $H(j,m)$ and $\mathrm{pen}(m)$. This allows us to obtain non asymptotic results based on Gaussian concentration inequalities.
Let us explain the main ideas underlying the definition of our estimator and how we obtain non asymptotic oracle inequalities for general linear functionals. As mentioned above, our goal is to minimize the criterion
$$\mathrm{Crit}(m) = \sup_{j\geq m,\, j\in\mathcal{M}} |T(s_j) - T(s_m)| + \mathrm{pen}(m).$$
The first term in this expression is a bias term, and the second one is closely related to the standard deviation of $T(\hat{s}_m)$. The unknown criterion $\mathrm{Crit}(m)$ is estimated by $\widehat{\mathrm{Crit}}(m)$ defined by (2.3). This criterion involves the term $H(j,m)$, which is the standard deviation of $T(\hat{s}_m) - T(\hat{s}_j)$ multiplied by $\sqrt{2x_{j,m}}$. For a suitable choice of $x_{j,m}$, we prove that, with high probability,
$$\sup_{j\geq m,\, j\in\mathcal{M}}\left[|T(\hat{s}_m) - T(\hat{s}_j)| - H(j,m)\right] \leq \sup_{j\geq m,\, j\in\mathcal{M}} |T(s_j) - T(s_m)|.$$
This implies that, with high probability,
$$\forall m\in\mathcal{M}, \quad \widehat{\mathrm{Crit}}(m) \leq \mathrm{Crit}(m).$$
Since $\hat{m}$ is a minimizer of $\widehat{\mathrm{Crit}}(m)$, we obtain that, with high probability,
$$\widehat{\mathrm{Crit}}(\hat{m}) \leq \inf_{m\in\mathcal{M}} \mathrm{Crit}(m),$$
which leads to an oracle inequality. We show that, up to remainder terms of smaller order, the risk of the estimator $T(\hat{s}_{\hat{m}})$ with respect to the $L_p$ loss behaves as well as the risk of the "best" estimator of the collection. Our procedure is also easily implementable, as we will see in the simulation study.
• In order to prove Theorem 1, we do not have to assume that the family $(S_m, m\in\mathcal{M})$ is nested. We just need to order this family. If for example $H = L_2([0,1])$, we can mix spaces generated by several kinds of orthonormal bases: for example, a wavelet basis, the Fourier basis, a spline basis. Let us explain how to extend our procedure to the case where we consider $L$ different bases. We set $\mathcal{M} = \{(l,m),\ l\in\mathcal{L},\ m\in\mathcal{M}_l\}$, where $\mathcal{L} = \{1, 2, \ldots, L\}$ and $\mathcal{M}_l\subset\mathbb{N}$. For all $m\in\mathcal{M}_l$ and $l\in\{1, 2, \ldots, L\}$, we set
$$\widehat{\mathrm{Crit}}_l(m) = \sup_{j\geq m,\, j\in\mathcal{M}_l}\left[|T(\hat{s}_m) - T(\hat{s}_j)| - H(j,m)\right] + \mathrm{pen}(m).$$
We define
$$(\hat{l}, \hat{m}) = \inf\left\{(l,m)\in\mathcal{M},\ \widehat{\mathrm{Crit}}_l(m) \leq \inf_{(k,j)\in\mathcal{M}}\widehat{\mathrm{Crit}}_k(j) + \frac{1}{n}\right\},$$
where $\mathcal{M}$ is ordered by the lexicographical order.
• It would be simpler to define $\hat{m}$ as a minimizer of $\widehat{\mathrm{Crit}}(m)$, but such a minimizer may not exist, or may not be unique. This explains why we add the term $1/n$ in the definition of $\hat{m}$ given in Definition 1.

We shall derive in the next section applications of Theorem 1 to adaptive results in the minimax sense for pointwise estimation and for the estimation of the mean of the signal on some interval.

3. Minimax results

3.1. Pointwise adaptive estimation

Assume we observe $(Y(u), u\in[0,1])$ which obeys the Gaussian white noise model:
$$Y(u) = \int_0^u s(x)\,dx + \frac{\sigma}{\sqrt{n}}\,W(u), \qquad u\in[0,1], \tag{3.5}$$
where $s\in H = L_2([0,1])$ and $W$ is a standard Brownian motion. Let $r\geq 0$; we assume that $s^{(r)}$ exists and that $s^{(r)}\in\mathcal{C}([0,1])$, the set of continuous functions on $[0,1]$. We consider the problem of estimating $T(s) = s^{(r)}(x_0)$ for some fixed $x_0\in[0,1]$.

We introduce the following notation: let $\{S_j, j\geq 0\}$ be a multiresolution analysis with father wavelet $\varphi$ and mother wavelet $\psi$ (see for example [15]). Define
$$\varphi_{j,k}(x) = 2^{j/2}\varphi(2^j x - k), \qquad \psi_{j,k}(x) = 2^{j/2}\psi(2^j x - k), \qquad x\in[0,1],\ j\geq 0 \text{ and } k\in\mathbb{Z}.$$
For all $m\geq 0$, $S_m$ denotes the linear space spanned by the functions $(\varphi_{m,k}, k\in\mathbb{Z})$. We recall that $s_m$ denotes the orthogonal projection of $s$ onto $S_m$:
$$s_m = \sum_{k\in\mathbb{Z}}\langle s, \varphi_{m,k}\rangle\varphi_{m,k},$$
and that $\hat{s}_m$ denotes the estimator of $s$ based on the model $S_m$:
$$\hat{s}_m = \sum_{k\in\mathbb{Z}}\left(\int_0^1\varphi_{m,k}(u)\,dY(u)\right)\varphi_{m,k}.$$
We will consider compactly supported wavelets $\varphi$ for which the above sum is finite.
For all $\alpha > 0$ and $1\leq q\leq+\infty$, the notation $B_{\alpha,\infty,q}([0,1])$ is used for the classical Besov space endowed with the norm $\|\cdot\|_{\alpha,\infty,q}$ (see for example [15, Definition 9.2]). We denote by $B_{\alpha,\infty,q}(L)$ the set of functions $s$ in $B_{\alpha,\infty,q}([0,1])$ such that $\|s\|_{\alpha,\infty,q}\leq L$.
Assume that the following conditions are satisfied by $\varphi$ and $\psi$:
(i) $\exists M\geq 0$ such that $\mathrm{supp}(\varphi)$ and $\mathrm{supp}(\psi)$ are included in $[-M, M]$.
(ii) $\exists K\geq 0$ such that $\|\varphi\|_\infty\vee\|\psi\|_\infty\leq K$.
(iii) $\exists N\geq 0$ such that $\int x^n\psi(x)\,dx = 0$ for $n = 0, \ldots, N$.
(iv) We assume that $\varphi^{(r)}$ exists and is bounded on $\mathrm{supp}(\varphi)$ by $K_r$.

Corollary 1 Let $d_n$ denote the integer part of $\ln(n)/\ln(2)$ and let $\mathcal{M} = \{1, \ldots, d_n\}$. For all $m\in\mathcal{M}$, let $S_m$ be the linear space spanned by the functions $(\varphi_{m,k}, k\in\mathbb{Z})$. For $p\geq 1$, we define
$$\forall m\in\mathcal{M},\ x_m = \frac{p}{2}\ln\left(2^{m(1+2r)}\right); \qquad \forall (j,m)\in\mathcal{M}^2 \text{ with } j>m,\ x_{j,m} = \frac{p}{2}\ln\left(2^{j(1+2r)} - 2^{m(1+2r)}\right),\ x_{m,m} = 0.$$
Let $\hat{m}$ be defined as in Definition 1. Let $r < \alpha$. There exist some constants $C$ depending on $\alpha$, $p$, $q$ and $r$, and $C(\sigma)$ depending on $\sigma$, such that, for any integer $n$ satisfying $nL^2/(\sigma^2\ln(n))\geq 2^{1+2\alpha}$ and $n^{2\alpha}\sigma^2\ln(n)\geq L^2$,
$$\sup_{s\in B_{\alpha,\infty,q}(L)} E\left(\left|\hat{s}^{(r)}_{\hat{m}}(x_0) - s^{(r)}(x_0)\right|^p\right) \leq C L^{\frac{p(1+2r)}{1+2\alpha}}\left(\frac{\sigma^2\ln n}{n}\right)^{\frac{p(\alpha-r)}{2\alpha+1}} + C(\sigma)\frac{\ln n}{n^{p/2}}.$$
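For concreteness, the weights prescribed by Corollary 1 can be tabulated as follows. This is an illustrative sketch with hypothetical names; the corollary only specifies $x_{j,m}$ for $j > m$ (together with $x_{m,m} = 0$), so other entries are left at zero here.

```python
import numpy as np

# Weights of Corollary 1 for estimating s^(r)(x0): produces the x_m and
# x_{j,m} fed to the selection rule sketched after Definition 1.
def corollary1_weights(n, p, r):
    d_n = int(np.log(n) / np.log(2))                 # d_n = [ln(n)/ln(2)]
    M = np.arange(1, d_n + 1)
    x = (p / 2.0) * (M * (1 + 2 * r)) * np.log(2.0)  # x_m = (p/2) ln 2^{m(1+2r)}
    x_jm = np.zeros((d_n, d_n))
    for mi, m in enumerate(M):
        for ji, j in enumerate(M):
            if j > m:                                # x_{j,m} = (p/2) ln(2^{j(1+2r)} - 2^{m(1+2r)})
                x_jm[ji, mi] = (p / 2.0) * np.log(
                    2.0 ** (j * (1 + 2 * r)) - 2.0 ** (m * (1 + 2 * r)))
    return x, x_jm
```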
Comments on the optimality of the result stated in Corollary 1 are given in Subsection 3.3, where the multidimensional case is considered.

3.2. Estimation of the mean of the signal on an interval

As above, we observe $(Y(u), u\in[0,1])$ defined by (3.5). We use the same notation as in Section 3.1. We now consider the problem of estimating the linear functional
$$T(s) = \frac{1}{H_n}\int_{I_{H_n}} s(x)\,dx,$$
where $I_{H_n}$ is an interval included in $[0,1]$ with length $H_n$ that may depend on $n$.

Corollary 2 Let $(Y(u), u\in[0,1])$ be defined by (3.5). Let $I_{H_n}$ be some interval included in $[0,1]$ with length $H_n$. Let $T(s) = \int_{I_{H_n}} s(x)\,dx/H_n$. Let $m_n = \sup\{m\in\mathbb{N},\ 2^m\leq 1/H_n\}$. Let $\mathcal{M} = \{0, 1, \ldots, m_n+1\}$. For all $m\in\mathcal{M}\setminus\{m_n+1\}$, let $S_m$ be the linear space spanned by the functions $(\varphi_{m,k}, k\in\mathbb{Z})$, and let $S_{m_n+1}$ be the linear space spanned by the indicator function of the interval $I_{H_n}$.
For $p\geq 1$, we define
$$x_m = \frac{pm}{2}\ \ \forall m\in\mathcal{M},\ m\leq m_n, \qquad x_{m_n+1} = \frac{p}{2}\ln(1/H_n),$$
$$x_{j,m} = \frac{p(j\vee m)}{2}\ \ \forall (j,m)\in\mathcal{M}^2,\ j\neq m, \qquad x_{m,m} = 0.$$
Let $\hat{m}$ be defined as in Definition 1. Let $n$ be any integer satisfying $nL^2/(\sigma^2\ln(n))\geq 1$. Then, for all $\alpha>0$, $q\geq 1$, $p\geq 1$, there exist some constants $C$ depending on $\alpha$, $p$ and $q$, and $C(\sigma)$ depending on $\sigma$, such that the following inequalities hold.
If $H_n \leq \left(\sigma^2\ln(n)/(nL^2)\right)^{1/(1+2\alpha)}$,
$$\sup_{s\in B_{\alpha,\infty,q}(L)} E\left(|T(\hat{s}_{\hat{m}}) - T(s)|^p\right) \leq C L^{\frac{p}{1+2\alpha}}\left(\frac{\sigma^2\ln n}{n}\right)^{\frac{p\alpha}{2\alpha+1}} + \frac{C(\sigma)}{n^{p/2}}.$$
If $H_n \geq \left(\sigma^2\ln(n)/(nL^2)\right)^{1/(1+2\alpha)}$,
$$\sup_{s\in B_{\alpha,\infty,q}(L)} E\left(|T(\hat{s}_{\hat{m}}) - T(s)|^p\right) \leq C\frac{\sigma^p}{(nH_n)^{p/2}}\left(\ln(1/H_n)\right)^{p/2} + \frac{C(\sigma)}{n^{p/2}}.$$
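The two regimes can be read off numerically; the sketch below (our illustration, with the unspecified constants $C$ and $C(\sigma)$ of the corollary omitted) returns the dominating term of the bound for given $n$, $H_n$, $\sigma$, $L$ and $\alpha$.

```python
import numpy as np

# Which regime of Corollary 2 applies, and the corresponding rate,
# up to constants.  Purely illustrative.
def interval_mean_rate(n, H_n, sigma, L, alpha, p=2):
    threshold = (sigma**2 * np.log(n) / (n * L**2)) ** (1.0 / (1 + 2 * alpha))
    if H_n <= threshold:
        # pointwise-type rate: L^{p/(1+2a)} (sigma^2 ln n / n)^{pa/(2a+1)}
        return L ** (p / (1 + 2 * alpha)) * (
            sigma**2 * np.log(n) / n) ** (p * alpha / (2 * alpha + 1))
    # near-parametric rate: sigma^p (ln(1/H_n))^{p/2} / (n H_n)^{p/2}
    return sigma**p * np.log(1.0 / H_n) ** (p / 2) / (n * H_n) ** (p / 2)
```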
Comments:
• The rates of convergence that we obtain depend on the relation between $H_n$, $n$, $\sigma$ and the regularity of the signal (via the parameters $\alpha$ and $L$). If $H_n \geq \left(\sigma^2\ln(n)/(nL^2)\right)^{1/(1+2\alpha)}$, the best estimator is the "naive" estimator $\int_{I_{H_n}} dY(u)/H_n$, which is unbiased and achieves the rate $1/\sqrt{nH_n}$. Note that in our result, we lose a logarithmic term $(\ln(1/H_n))$ for the adaptation to the unknown regularity of the signal. When $H_n$ is independent of $n$, we recover the parametric rate $1/\sqrt{n}$ for the estimation of $T(s)$. If $H_n \leq \left(\sigma^2\ln(n)/(nL^2)\right)^{1/(1+2\alpha)}$, we obtain the same rates for the estimation of $T(s)$ as for the estimation of the signal $s$ at one point. In this case, the "naive" estimator has too large a variance, and one takes advantage of considering estimators which are biased, but have smaller variance.
• Our procedure is adaptive with respect to the unknown link between $H_n$ and the regularity of the signal, and allows us to obtain the optimal rates (up to logarithmic terms due to adaptation) in both cases, as explained below.
• It was possible to establish the upper bounds given in Corollary 2 because the result stated in Theorem 1 is non asymptotic: we have indeed considered here a linear functional that depends on $n$.

Lower bounds: Using the results of Donoho and Liu [13], one can show that, up to logarithmic terms, the upper bounds given in Corollary 2 are optimal over Hölderian balls. The lower bounds are given in the following lemma.
Lemma 1 Let $(Y(u), u\in[0,1])$ be defined by (3.5). Let $0\leq H_n\leq 1/2$ and $I_{H_n} = [0, H_n]$; we consider the linear functional $T(s) = \int_{I_{H_n}} s(x)\,dx/H_n$. Set, for all $\alpha\in]0,1]$ and $L>0$,
$$\mathcal{H}_\alpha(L) = \{f : [0,1]\to\mathbb{R},\ \forall x,y\in[0,1],\ |f(x)-f(y)|\leq L|x-y|^\alpha\}.$$
If $H_n \geq \left((1+2\alpha)/(2nL^2)\right)^{1/(1+2\alpha)}$,
$$\inf_{T_n}\sup_{s\in\mathcal{H}_\alpha(L)} E\left(T_n - T(s)\right)^2 \geq \frac{C(\alpha,\sigma)}{nH_n}, \tag{3.6}$$
where $C(\alpha,\sigma)$ is a constant depending on $\alpha$ and $\sigma$, and where the infimum is taken over all possible estimators.
If $H_n \leq \left((1+2\alpha)/(nL^2)\right)^{1/(1+2\alpha)}/2$ and $nL^2\geq 1+2\alpha$,
$$\inf_{T_n}\sup_{s\in\mathcal{H}_\alpha(L)} E\left(T_n - T(s)\right)^2 \geq C(\alpha,\sigma)L^{2/(1+2\alpha)}n^{-2\alpha/(1+2\alpha)}. \tag{3.7}$$
3.3. Multidimensional pointwise adaptive estimation

One observes the Gaussian white noise model
$$Z(x) = \int_{[0,1]^d} 1\!\!1_{[0,x_1]\times\cdots\times[0,x_d]}(u)\,s(u)\,du + \frac{1}{\sqrt{n}}\,W(x)$$
for all $x = (x_1,\ldots,x_d)\in[0,1]^d$, where $W$ is the standard Wiener process on $[0,1]^d$. We consider $H = L_2([0,1]^d)$ endowed with its usual scalar product. We assume that $s\in\mathcal{C}([0,1]^d)$, the set of continuous functions on $[0,1]^d$. Let $x_0\in[0,1]^d$; we estimate $T(s) = s(x_0)$.

Our aim is to obtain adaptive results in the minimax sense over isotropic Hölder spaces defined as follows. For all $\alpha\in]0,1]$ and $L>0$, let
$$\mathcal{H}_\alpha(L) = \left\{f : [0,1]^d\to\mathbb{R},\ \forall x,y\in[0,1]^d,\ |f(x)-f(y)|\leq L\|x-y\|_\infty^\alpha\right\},$$
where $\|x-y\|_\infty = \sup_{1\leq i\leq d}|x_i-y_i|$. In order to estimate $T(s)$, we use the Haar basis of $L_2([0,1]^d)$. For all $m\in\mathcal{M}$, let $S_m$ be the space of functions which are piecewise constant on the sets $[\frac{k_1}{2^m}, \frac{k_1+1}{2^m}[\times\cdots\times[\frac{k_d}{2^m}, \frac{k_d+1}{2^m}[$ for all $(k_1,\ldots,k_d)\in\{0,1,\ldots,2^m-1\}^d$.

Corollary 3 Let $d_n$ denote the integer part of $\ln(n)/(d\ln(2))$ and let $\mathcal{M} = \{1,\ldots,d_n\}$. For $p\geq 1$, we define
$$\forall m\in\mathcal{M},\ x_m = \frac{pd}{2}\ln(2^m), \qquad \forall (j,m)\in\mathcal{M}^2,\ x_{j,m} = \frac{pd}{2}\ln(2^j).$$
Let $\hat{m}$ be defined as in Definition 1. For all $\alpha\in]0,1]$ and $L>0$, the following inequality holds if $n/(\sigma^2\ln(n))\geq 2^{d+2\alpha}L^{-2}$ and $L^2\leq n^{\frac{2\alpha}{d}}\sigma^2\ln(n)$:
$$\sup_{s\in\mathcal{H}_\alpha(L)} E\left(|\hat{s}_{\hat{m}}(x_0) - s(x_0)|^p\right) \leq C L^{\frac{pd}{d+2\alpha}}\left(\frac{\sigma^2\ln n}{n}\right)^{\frac{p\alpha}{2\alpha+d}} + C(\sigma)\frac{\ln n}{n^{p/2}},$$
where $C$ is a constant depending on $p$, $\alpha$ and $d$, and $C(\sigma)$ is a constant depending on $\sigma$.
Comments. It follows from the results given in Lepski [19] and Brown and Low [7] that the rates obtained in Corollary 1 and in Corollary 3 are optimal. These authors showed that the logarithmic loss which appears in the rate of convergence, compared with the minimax rates, is unavoidable for adaptive estimators. The dependence of our upper bound for the risk with respect to the radius $L$ of the Besov or Hölderian balls is the sharp one obtained by Klemelä and Tsybakov [17].

4. Simulation study

Throughout this section, we consider the finite dimensional Gaussian regression model. The regression functions that we consider are defined on $[0,1]$ by:
$$s_1(x) = (x^4-x)\sin(6x), \qquad s_2(x) = \exp(-30|x-0.75|) + \exp(-30|x-0.25|), \qquad s_3(x) = x\cos(2\pi x)\,1\!\!1_{0\ldots}$$

5. Proofs

5.1. Proof of Theorem 1

The proof of Theorem 1 relies on the following lemma.

Lemma 2 For all $m\in\mathcal{M}$ and all $x>0$,
$$P\left(\widehat{\mathrm{Crit}}(m) > \mathrm{Crit}(m) + \sqrt{2x}\right) \leq \sum_{j\geq m,\, j\in\mathcal{M}} e^{-x_{j,m}}\,e^{-x/\sigma_{j,m}^2}.$$
5.1.1. Proof of Lemma 2

We recall that, for all $m\in\mathcal{M}$,
$$\hat{s}_m = \sum_{\lambda\in\Lambda_m} Y(\phi_\lambda)\phi_\lambda.$$
Since $T$ is a linear functional,
$$T(\hat{s}_m) = \sum_{\lambda\in\Lambda_m} Y(\phi_\lambda)T(\phi_\lambda).$$
Moreover, using the properties of the isonormal process $L$,
$$Y(\phi_\lambda) \sim \mathcal{N}\left(\langle s, \phi_\lambda\rangle, \frac{\sigma^2}{n}\right), \qquad T(\hat{s}_j) - T(\hat{s}_m) \sim \mathcal{N}\left(T(s_j) - T(s_m), \sigma_{j,m}^2\right).$$
Let $X\sim\mathcal{N}(\mu, v^2)$. For all $x>0$,
$$P\left(|X-\mu| > \sqrt{2x}\right) \leq e^{-x/v^2},$$
which implies that
$$P\left(|X| > |\mu| + \sqrt{2x}\right) \leq e^{-x/v^2}. \tag{5.9}$$
Since for all $a, b > 0$, $\sqrt{a+b}\leq\sqrt{a}+\sqrt{b}$, we obtain that for all $(j,m)\in\mathcal{M}^2$ such that $j\geq m$, for all $x>0$,
$$P\left(|T(\hat{s}_j) - T(\hat{s}_m)| \geq |T(s_j) - T(s_m)| + \sqrt{2x_{j,m}}\,\sigma_{j,m} + \sqrt{2x}\right) \leq P\left(|T(\hat{s}_j) - T(\hat{s}_m)| \geq |T(s_j) - T(s_m)| + \sigma_{j,m}\sqrt{2x_{j,m} + 2x/\sigma_{j,m}^2}\right) \leq e^{-x_{j,m}}\,e^{-x/\sigma_{j,m}^2}.$$
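As an aside, the elementary Gaussian tail bound (5.9) can be checked by simulation; a throwaway Monte Carlo illustration of ours, with arbitrary parameters:

```python
import numpy as np

# Monte Carlo check of (5.9): P(|X| > |mu| + sqrt(2x)) <= exp(-x / v^2)
# for X ~ N(mu, v^2).  mu, v, x are arbitrary illustrative values.
rng = np.random.default_rng(0)
mu, v, x = 0.3, 1.5, 2.0
X = rng.normal(mu, v, size=1_000_000)
emp = np.mean(np.abs(X) > abs(mu) + np.sqrt(2 * x))
print(emp, np.exp(-x / v**2))   # empirical frequency vs. the bound
```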
Finally,
$$P\left(\widehat{\mathrm{Crit}}(m) > \mathrm{Crit}(m) + \sqrt{2x}\right) \leq P\left(\exists j\geq m,\ j\in\mathcal{M}:\ |T(\hat{s}_j) - T(\hat{s}_m)| - H(j,m) \geq |T(s_j) - T(s_m)| + \sqrt{2x}\right)$$
$$\leq \sum_{j\geq m,\, j\in\mathcal{M}} P\left(|T(\hat{s}_j) - T(\hat{s}_m)| - H(j,m) \geq |T(s_j) - T(s_m)| + \sqrt{2x}\right).$$
This concludes the proof of Lemma 2.

• We first consider the case where $\hat{m}\leq m^*$. Since $\mathrm{pen}(m)\geq 0$ for all $m\in\mathcal{M}$, and by definition of $\widehat{\mathrm{Crit}}(m)$, we have
$$\widehat{\mathrm{Crit}}(\hat{m}) \geq |T(\hat{s}_{\hat{m}}) - T(\hat{s}_{m^*})| - H(m^*,\hat{m}) \geq |T(\hat{s}_{\hat{m}}) - T(s)| - |T(\hat{s}_{m^*}) - T(s)| - H(m^*,\hat{m}).$$
Since
$$\widehat{\mathrm{Crit}}(\hat{m}) \leq \widehat{\mathrm{Crit}}(m^*) + \frac{1}{n},$$
we obtain
$$|T(\hat{s}_{\hat{m}}) - T(s)| \leq \widehat{\mathrm{Crit}}(m^*) + H(m^*,\hat{m}) + |T(\hat{s}_{m^*}) - T(s)| + \frac{1}{n}.$$
On the event $\{\hat{m}\leq m^*\}$,
$$H(m^*,\hat{m}) \leq \sup_{j\leq m^*} H(m^*,j).$$
Hence, using Lemma 2, we obtain that for all $x>0$, the probability of the event
$$\left\{|T(\hat{s}_{\hat{m}}) - T(s)| > \mathrm{Crit}(m^*) + \sqrt{2x} + \sup_{j\leq m^*}H(m^*,j) + |T(\hat{s}_{m^*}) - T(s)| + \frac{1}{n}\right\} \cap \{\hat{m}\leq m^*\}$$
is bounded by
$$\sum_{j\geq m^*} e^{-x_{j,m^*}}\,e^{-x/\sigma_{j,m^*}^2}.$$
• We now consider the case where $\hat{m} > m^*$. We recall that
$$\sigma_m^2 = \mathrm{var}(T(\hat{s}_m)) = \frac{\sigma^2}{n}\sum_{\lambda\in\Lambda_m} T^2(\phi_\lambda).$$
Using inequality (5.9), we obtain that for all $m\in\mathcal{M}$,
$$P\left(|T(\hat{s}_m) - T(s)| \geq |T(s_m) - T(s)| + \sqrt{2x} + \sigma_m\sqrt{2x_m}\right) \leq e^{-x_m}\,e^{-x/\sigma_m^2}. \tag{5.10}$$
This implies that
$$P\left(|T(\hat{s}_{\hat{m}}) - T(s)| \geq |T(s_{\hat{m}}) - T(s)| + \sqrt{2x} + \mathrm{pen}(\hat{m})\right) \leq \sum_{m\in\mathcal{M}} e^{-x_m}\,e^{-x/\sigma_m^2}.$$
We notice that for all $m\in\mathcal{M}$, $\widehat{\mathrm{Crit}}(m)\geq\mathrm{pen}(m)$, since
$$\sup_{j\geq m}\left[|T(\hat{s}_m) - T(\hat{s}_j)| - H(j,m)\right] \geq |T(\hat{s}_m) - T(\hat{s}_m)| - H(m,m),$$
and the right hand side of this inequality is equal to $0$. Using the inequalities
$$\mathrm{pen}(\hat{m}) \leq \widehat{\mathrm{Crit}}(\hat{m}) \leq \widehat{\mathrm{Crit}}(m^*) + \frac{1}{n},$$
we obtain
$$P\left(|T(\hat{s}_{\hat{m}}) - T(s)| \geq |T(s_{\hat{m}}) - T(s)| + \sqrt{2x} + \widehat{\mathrm{Crit}}(m^*) + \frac{1}{n}\right) \leq \sum_{m\in\mathcal{M}} e^{-x_m}\,e^{-x/\sigma_m^2}.$$
We use the following inequality, which holds if $\hat{m} > m^*$:
$$|T(s_{\hat{m}}) - T(s)| \leq \sup_{j\geq m^*}|T(s_j) - T(s)|,$$
and we apply Lemma 2 with $m = m^*$ to control $\widehat{\mathrm{Crit}}(m^*)$ by $\mathrm{Crit}(m^*)$. Hence the probability of the event
$$\left\{|T(\hat{s}_{\hat{m}}) - T(s)| \geq \sup_{j\geq m^*}|T(s_j) - T(s)| + 2\sqrt{2x} + \mathrm{Crit}(m^*) + \frac{1}{n}\right\} \cap \{\hat{m} > m^*\}$$
is bounded by
$$\sum_{m\in\mathcal{M}} e^{-x_m}\,e^{-x/\sigma_m^2} + \sum_{j\geq m^*,\, j\in\mathcal{M}} e^{-x_{j,m^*}}\,e^{-x/\sigma_{j,m^*}^2}. \tag{5.11}$$
Define
$$C_{m^*} = \mathrm{Crit}(m^*) + \sup_{j\leq m^*}H(m^*,j) + \sup_{j\geq m^*}|T(s_j) - T(s)| + \frac{1}{n},$$
and
$$X = |T(\hat{s}_{\hat{m}}) - T(s)|, \qquad Y = |T(\hat{s}_{m^*}) - T(s)|.$$
It follows from (5.10) and (5.11) that for all $x>0$,
$$P\left(X - Y > C_{m^*} + 2\sqrt{2x}\right) \leq \sum_{m\in\mathcal{M}} e^{-x_m}\,e^{-x/\sigma_m^2} + 2\sum_{j\geq m^*} e^{-x_{j,m^*}}\,e^{-x/\sigma_{j,m^*}^2}. \tag{5.12}$$
We have
$$E(X^p) = E\left(X^p 1\!\!1_{X\geq Y+C_{m^*}}\right) + E\left(X^p 1\!\!1_{X<Y+C_{m^*}}\right) \leq C(p)\left(E\left[(X-Y-C_{m^*})_+^p\right] + E(Y^p) + C_{m^*}^p\right),$$
with
$$E\left[(X-Y-C_{m^*})_+^p\right] = \int_0^\infty P\left(X-Y-C_{m^*} > 2\sqrt{2x}\right)\,p\,2^{p-1}(\sqrt{2})^p(\sqrt{x})^{p-2}\,dx.$$
Hence, using (5.12), we get
$$E\left[(X-Y-C_{m^*})_+^p\right] \leq C(p)\left(\sum_{m\in\mathcal{M}} e^{-x_m}\sigma_m^p + \sum_{j\geq m^*} e^{-x_{j,m^*}}\sigma_{j,m^*}^p\right)\int_0^\infty e^{-x}(\sqrt{x})^{p-2}\,dx.$$
We conclude that for all $p\geq 1$, there exists some constant $C(p)>0$ such that
$$E\left[|T(\hat{s}_{\hat{m}}) - T(s)|^p\right] \leq C(p)\left\{\mathrm{Crit}^p(m^*) + E\left(|T(\hat{s}_{m^*}) - T(s)|^p\right)\right\} + C(p)\left(\sup_{j\leq m^*}H^p(m^*,j) + \sup_{j\geq m^*}|T(s_j) - T(s)|^p\right) + C(p)\left(\sum_{m\in\mathcal{M}} e^{-x_m}\sigma_m^p + \sum_{j\geq m^*} e^{-x_{j,m^*}}\sigma_{j,m^*}^p + \frac{1}{n^p}\right).$$
Moreover, possibly enlarging $C(p)$,
$$E\left[|T(\hat{s}_{m^*}) - T(s)|^p\right] \leq C(p)\left[|T(s_{m^*}) - T(s)|^p + \sigma_{m^*}^p\right].$$
Since
$$|T(s_j) - T(s)| \leq |T(s_j) - T(s_{m^*})| + |T(s_{m^*}) - T(s)|,$$
we obtain
$$\sup_{j\geq m^*}|T(s_j) - T(s)|^p \leq C(p)\left(|T(s_{m^*}) - T(s)|^p + \sup_{j\geq m^*}|T(s_j) - T(s_{m^*})|^p\right) \leq C(p)\left(|T(s_{m^*}) - T(s)|^p + \mathrm{Crit}^p(m^*)\right).$$
This concludes the proof of Theorem 1.
5.2. Proof of Corollary 1

The proof follows from bounding the terms on the right hand side of (2.4). In the following, $C$ denotes a positive constant which may vary from line to line; we mention the dependence of these constants on the parameters involved in the problem.

Let $W_j$ be the orthogonal complement of $S_j$ in $S_{j+1}$: $S_{j+1} = W_j \oplus S_j$. Let $D_j(s)$ denote the projection of $s$ onto $W_j$. It follows from [24, Theorem 3, p. 31] that there exists a constant $C(r)$ such that
$$\|(D_j(s))^{(r)}\|_\infty \leq C(r)2^{jr}\|D_j(s)\|_\infty.$$
We recall that
$$D_j(s) = \sum_{k\in\mathbb{Z}}\beta_{j,k}\psi_{j,k},$$
where $\psi_{j,k}(x) = 2^{j/2}\psi(2^j x - k)$ and $\beta_{j,k} = \langle s, \psi_{j,k}\rangle$. Hence, since we have assumed that $\psi$ has a compact support,
$$\|D_j(s)\|_\infty \leq C(r)2^{j/2}\sup_{k\in\mathbb{Z}}|\beta_{j,k}|,$$
and
$$\|(D_j(s))^{(r)}\|_\infty \leq C2^{j/2+jr}\sup_{k\in\mathbb{Z}}|\beta_{j,k}|.$$
Since $s\in B_{\alpha,\infty,q}([0,1])$ with $\|s\|_{\alpha,\infty,q}\leq L$,
$$\sum_{j\geq 0} 2^{qj(\alpha+1/2)}\sup_{k\in\mathbb{Z}}|\beta_{j,k}|^q \leq L^q. \tag{5.13}$$
We define $B(m) = \sup_{m\leq j\leq d_n}|s_j^{(r)}(x_0) - s_m^{(r)}(x_0)|$. Then
$$B(m) \leq \sup_{j\geq m}\|s_j^{(r)} - s_m^{(r)}\|_\infty \leq \sup_{j\geq m}\sum_{l=m+1}^{j}\|(D_l(s))^{(r)}\|_\infty \leq C(r)\sum_{j>m} 2^{j(\alpha+1/2)}\sup_{k\in\mathbb{Z}}|\beta_{j,k}|\,2^{-j(\alpha-r)}.$$
Using Hölder's inequality,
$$B(m) \leq C(r)\left(\sum_{j>m} 2^{qj(\alpha+1/2)}\sup_{k\in\mathbb{Z}}|\beta_{j,k}|^q\right)^{1/q}\left(\sum_{j>m} 2^{-jq'(\alpha-r)}\right)^{1/q'},$$
where $1/q + 1/q' = 1$. It follows from (5.13) that
$$B(m) \leq C(\alpha,r,q)L2^{-m(\alpha-r)}.$$
Note that
$$\sigma_m^2 = \frac{\sigma^2}{n}\sum_{k\in\mathbb{Z}}\left(\varphi_{m,k}^{(r)}(x_0)\right)^2, \qquad \forall j>m,\ \sigma_{j,m}^2 = \mathrm{Var}(T(\hat{s}_j) - T(\hat{s}_m)) = \frac{\sigma^2}{n}\sum_{l=m}^{j-1}\sum_{k\in\mathbb{Z}}\left(\psi_{l,k}^{(r)}(x_0)\right)^2.$$
It follows from conditions (i) and (iv) that
$$\sigma_m^2 \leq C(r)\frac{\sigma^2}{n}2^{m(1+2r)}, \qquad \sigma_{j,m}^2 \leq C(r)\frac{\sigma^2}{n}\left(2^{j(1+2r)} - 2^{m(1+2r)}\right).$$
Thus, for all $m$ in $\mathcal{M}$,
$$\mathrm{Crit}^p(m) \leq C(p,\alpha,r,q)\left(L^p 2^{-pm(\alpha-r)} + \sigma^p\left(\frac{2^{m(1+2r)}m}{n}\right)^{p/2}\right).$$
Let
$$m_0(n) = \left[\frac{1}{\ln(2)}\ln\left(\left(\frac{nL^2}{\sigma^2\ln(n)}\right)^{\frac{1}{1+2\alpha}}\right)\right], \tag{5.14}$$
where $[x]$ denotes the integer part of $x$. One can easily show that if $n/(\sigma^2\ln(n))\geq 2^{1+2\alpha}L^{-2}$ and $n^{2\alpha}\sigma^2\ln(n)\geq L^2$, then $m_0(n)\in\mathcal{M}$. Hence
$$\mathrm{Crit}(m^*) \leq \mathrm{Crit}(m_0(n)) + \frac{1}{n},$$
which implies that
$$\mathrm{Crit}^p(m^*) \leq C(p,\alpha,r,q)\left(L^{\frac{p(1+2r)}{1+2\alpha}}\left(\frac{\sigma^2\ln n}{n}\right)^{\frac{p(\alpha-r)}{1+2\alpha}} + \frac{1}{n^p}\right).$$
We next bound the other terms on the right hand side of Theorem 1.
• For all $1\leq m\leq d_n$,
$$|s^{(r)}(x_0) - s_m^{(r)}(x_0)|^p \leq \left(|s^{(r)}(x_0) - s_{d_n}^{(r)}(x_0)| + |s_{d_n}^{(r)}(x_0) - s_m^{(r)}(x_0)|\right)^p \leq C(\alpha,r,q)\left(L^p 2^{-pd_n(\alpha-r)} + \sup_{j\geq m,\, j\in\mathcal{M}}|s_j^{(r)}(x_0) - s_m^{(r)}(x_0)|^p\right).$$
This implies that
$$|s_{m^*}^{(r)}(x_0) - s^{(r)}(x_0)|^p \leq C(\alpha,r,q)\left(L^p 2^{-pd_n(\alpha-r)} + \mathrm{Crit}^p(m^*)\right).$$
• For all $1\leq m\leq d_n$, $\sigma_m \leq \mathrm{pen}(m) \leq \mathrm{Crit}(m)$.
• For all $1\leq m\leq d_n$,
$$\sup_{j\leq m} H(m,j) = \sup_{j\leq m}\sqrt{2x_{m,j}}\,\sigma_{m,j} \leq \sqrt{2x_m}\,\sigma_m \leq \mathrm{Crit}(m).$$
This implies that
$$\mathrm{Crit}^p(m^*) + |s_{m^*}^{(r)}(x_0) - s^{(r)}(x_0)|^p + \sigma_{m^*}^p + \sup_{j\leq m^*}H^p(j,m^*) \leq C(p,\alpha,r,q)\left(L^{\frac{p(1+2r)}{1+2\alpha}}\left(\frac{\sigma^2\ln n}{n}\right)^{\frac{p(\alpha-r)}{1+2\alpha}} + \frac{1}{n^p} + L^p 2^{-pd_n(\alpha-r)}\right).$$
Since $2^{-d_n} < 2/n$ and $n^{2\alpha}\sigma^2\ln n \geq L^2$,
$$L^{\frac{p(1+2r)}{1+2\alpha}}\left(\frac{\sigma^2\ln n}{n}\right)^{\frac{p(\alpha-r)}{1+2\alpha}} \geq L^p 2^{-pd_n(\alpha-r)}2^{p(\alpha-r)}.$$
p + e−xm σm
m∈M
X
j≥m∗
p e−xj,m∗ σj,m ∗ ≤ C(r, p)
σp ln(n) dn ≤ C(r, p) σ p p/2 , p/2 n n
which yields the desired bound.

5.3. Proof of Corollary 2

As in Corollary 1, the proof follows by bounding the terms on the right hand side of (2.4). Let us first control $\sigma_m^2$ for all $m\in\mathcal{M}$. For $m\leq m_n$,
$$\sigma_m^2 = \frac{\sigma^2}{n}\sum_{k\in\mathbb{Z}} T^2(\varphi_{m,k}) = \frac{\sigma^2}{nH_n^2}\sum_{k\in\mathbb{Z}} 2^m\left(\int_{I_{H_n}}\varphi(2^m x - k)\,dx\right)^2.$$
Since $\varphi$ has a compact support,
$$\sum_{k\in\mathbb{Z}}\left|\int_{I_{H_n}}\varphi(2^m x - k)\,dx\right| \leq C\|\varphi\|_\infty H_n,$$
and
$$\sup_{k\in\mathbb{Z}}\left|\int_{I_{H_n}}\varphi(2^m x - k)\,dx\right| \leq C\|\varphi\|_\infty(2^{-m}\wedge H_n).$$
Hence, for all $m\leq m_n$,
$$\sigma_m^2 \leq C\frac{\sigma^2}{n}2^m.$$
Moreover,
$$\sigma_{m_n+1}^2 = \frac{\sigma^2}{n}T^2\left(1\!\!1_{I_{H_n}}/\sqrt{H_n}\right) \leq \sigma^2/(nH_n) \leq \sigma^2 2^{m_n+1}/n.$$
It is easy to see that $\sigma_{j,m}^2 \leq 2(\sigma_m^2 + \sigma_j^2)$ and that $\sigma_{m,m} = 0$. Hence
$$\forall j\neq m\in\mathcal{M},\quad \sigma_{j,m}^2 \leq C\frac{\sigma^2}{n}2^{m\vee j}.$$
This implies that
$$\sum_{m\in\mathcal{M}} e^{-x_m}\sigma_m^p \leq C(p)\frac{\sigma^p}{n^{p/2}}, \qquad \forall m\in\mathcal{M},\ \sum_{j\geq m} e^{-x_{j,m}}\sigma_{j,m}^p \leq C(p)\frac{\sigma^p}{n^{p/2}}.$$
Since $T(s_{m_n+1}) = T(s)$, and since for all $f, g\in L_2([0,1])$, $|T(f) - T(g)| \leq \|f-g\|_\infty$, we obtain, by the same computations as in the proof of Corollary 1, that for all $m\leq m_n$,
$$\sup_{j>m,\, j\in\mathcal{M}} |T(s_m) - T(s_j)| \leq C(\alpha,q)L2^{-m\alpha}.$$
5.4. Proof of Lemma 1

In order to prove (3.6), we consider the functions $f_0 = 0$ and a perturbation $f_1\in\mathcal{H}_\alpha(L)$ built from
$$\rho_n = \left(\frac{1+2\alpha}{2nH_n^{1+2\alpha}}\right)^{1/2}.$$
The condition $H_n \geq \left((1+2\alpha)/(2nL^2)\right)^{1/(1+2\alpha)}$ ensures that $\rho_n \leq L$, hence $f_1\in\mathcal{H}_\alpha(L)$. One can easily verify that $\|f_1 - f_0\|_2 = 1/\sqrt{n}$ and that $T(f_1) = C(\alpha)/\sqrt{nH_n}$, which yields (3.6).

In order to prove (3.7), we consider the functions $f_0 = 0$ and $f_1$ defined on $[0,1]$ by
$$f_1(x) = L(\gamma_n - x)^\alpha \ \text{ for } x\in[0,\gamma_n], \qquad f_1(x) = 0 \ \text{ for } x>\gamma_n,$$
where $\gamma_n = \left((1+2\alpha)/(nL^2)\right)^{1/(1+2\alpha)}$. Since we have assumed that $nL^2\geq 1+2\alpha$, we have $\gamma_n\leq 1$. Moreover, $\|f_1 - f_0\|_2 = 1/\sqrt{n}$ and
$$T(f_1) = \frac{L\gamma_n^{1+\alpha}}{H_n(1+\alpha)}\left[1 - \left(1 - \frac{H_n}{\gamma_n}\right)^{1+\alpha}\right] = L\gamma_n^\alpha C_n^\alpha, \qquad \text{where } C_n\in\left[1 - \frac{H_n}{\gamma_n},\, 1\right].$$
The last equality is obtained by the Taylor-Lagrange formula. For $H_n \leq \frac{1}{2}\left((1+2\alpha)/(nL^2)\right)^{1/(1+2\alpha)}$, $H_n/\gamma_n \leq 1/2$, which leads to $T(f_1) \geq C(\alpha)L\gamma_n^\alpha$; hence (3.7) holds.

5.5. Proof of Corollary 3

For all $m\in\mathcal{M}$, we set $\Lambda_m = \{0, 1, \ldots, 2^m-1\}^d$, and for all $\lambda = (k_1,\ldots,k_d)\in\Lambda_m$, let
$$I_\lambda = \left[\frac{k_1}{2^m}, \frac{k_1+1}{2^m}\right[\times\cdots\times\left[\frac{k_d}{2^m}, \frac{k_d+1}{2^m}\right[ \quad\text{and}\quad \phi_\lambda = 2^{md/2}1\!\!1_{I_\lambda}.$$
Then $(\phi_\lambda, \lambda\in\Lambda_m)$ is an orthonormal basis of $S_m$. It is easy to verify that for all $m\in\mathcal{M}$,
$$\sigma_m^2 = \frac{\sigma^2 2^{md}}{n},$$
and that for all $(j,m)\in\mathcal{M}^2$,
$$\sigma_{j,m}^2 \leq 2\frac{\sigma^2}{n}\left(2^{md} + 2^{jd}\right).$$
It follows from the definition of $x_m$ and $x_{j,m}$ that
$$\sum_{m\in\mathcal{M}} e^{-x_m}\sigma_m^p \leq C\frac{\sigma^p d_n}{n^{p/2}}, \qquad \sum_{j\geq m^*} e^{-x_{j,m^*}}\sigma_{j,m^*}^p \leq C\frac{\sigma^p d_n}{n^{p/2}}.$$
Moreover,
$$\sigma_{m^*} + \sup_{j\leq m^*}\sqrt{2x_{m^*,j}}\,\sigma_{m^*,j} \leq C\,\mathrm{pen}(m^*),$$
and
$$|s_{m^*}(x_0) - s(x_0)|^p \leq C(p)\sup_{j\geq m^*,\, j\in\mathcal{M}}|s_j(x_0) - s_{m^*}(x_0)|^p + C(p)|s_{d_n}(x_0) - s(x_0)|^p.$$
Hence, it follows from Theorem 1 that, possibly enlarging $C(p)$,
$$E\left(|\hat{s}_{\hat{m}}(x_0) - s(x_0)|^p\right) \leq C(p)\left(\mathrm{Crit}^p(m^*) + \frac{\sigma^p d_n}{n^{p/2}} + |s_{d_n}(x_0) - s(x_0)|^p\right).$$
For all $m\in\mathcal{M}$,
$$s_m = \sum_{\lambda\in\Lambda_m}\left(2^{md}\int_{I_\lambda} s(x)\,dx\right)1\!\!1_{I_\lambda}.$$
This implies that
$$s_m(x_0) - s(x_0) = 2^{md}\int_{I_\lambda(x_0)}\left(s(x) - s(x_0)\right)dx,$$
where $I_\lambda(x_0)$ is the set $I_\lambda$ that contains $x_0$. Hence, if $s\in\mathcal{H}_\alpha(L)$, $|s_m(x_0) - s(x_0)| \leq L2^{-m\alpha}$. Let
$$m_1(n) = \left[\frac{1}{\ln(2)}\ln\left(\left(\frac{nL^2}{\ln(n)}\right)^{\frac{1}{d+2\alpha}}\right)\right].$$
Since $m_1(n)\in\mathcal{M}$ as soon as $n/\ln(n)\geq 2^{d+2\alpha}L^{-2}$ and $L^2\leq n^{2\alpha/d}\ln(n)$,
$$\mathrm{Crit}^p(m^*) \leq \mathrm{Crit}^p(m_1(n)) \leq C(p,\alpha,d,\sigma)L^{\frac{pd}{d+2\alpha}}\left(\frac{\ln(n)}{n}\right)^{\frac{p\alpha}{d+2\alpha}}.$$
E (|ˆ sm ˆ (x0 ) − s(x0 )| ) ≤ C(p, α, d, σ)L
pd d+2α
ln(n) n
pα d+2α
.
This concludes the proof of Corollary 3. Acknowledgment We would like to thank the Associate Editor, the referees and A. Tsybakov for their helpful comments. References [1] Baraud, Y. (2000). Model selection for regression on a fixed design, Probab. Theory Related Fields, 117, 467–493. MR1777129 [2] Baraud, Y. (2002). Model selection for regression on a random design, ESAIM Probab. Statist., 6, 127–146 MR1918295 [3] Barron, A.R., Birg´ e, L., Massart, P. (1999). Risk bounds for model selection via penalization, Probab. Theory Related Fields, 113, 301–415. MR1679028 [4] Birg´ e, L. (2001). An alternative point of view on Lepski’s method. State of the art in probability and statistics (Leiden, 1999), 113–133, IMS Lecture Notes Monogr. Ser., 36, Inst. Math. Statist., Beachwood, OH. MR1836557 [5] Birg´ e, L., Massart, P. (2001). Gaussian model selection. J. Eur. Math. Soc. (JEMS) 3, no. 3, 203–268. MR1848946 [6] Birg´ e, L., Massart, P. (1997). From model selection to adaptive estimation. Festschrift for Lucien Le Cam, 55–87, Springer, New York. MR1462939 [7] Brown, L., Low, M. (1996). A constrained risk inequality with applications to nonparametric functional estimation. Ann. Statist. 24, no. 6, 2524–2535. MR1425965 [8] Cai, T., Low, M. (2003). A note on nonparametric estimation of linear functionals. Ann. Statist. 31, no. 4, 1140–1153. MR2001645 [9] Cai, T., Low, M. (2004). Minimax estimation of linear functionals over nonconvex parameter spaces. Ann. Statist. 32, no. 2, 552–576. MR2060169 [10] Cai, T., Low, M. (2005a). Adaptive estimation of linear functionals under different performance measures. Bernoulli 11, no. 2, 341–358. MR2132730 [11] Cai, T., Low, M.(2005b). On adaptive estimation of linear functionals. Ann. Statist. 33, no. 5, 2311–2343. MR2211088 [12] Donoho, D. (1994). Statistical estimation and optimal recovery. Ann. Statist. 22 (1994), no. 1, 238–270. MR1272082
B. Laurent, C. Lude˜ na, C. Prieur /Adaptive estimation of linear functionals
1020
[13] Donoho, D., Liu, R.(1991). Geometrizing rates of convergence. II, III. Ann. Statist. 19, no. 2, 633–667, 668–701. MR1105839 [14] Golubev, Y., Levit, B. (2004). An Oracle Approach to Adaptive Estimation of Linear Functionals in a Gaussian Model, Mathematical Methods of Statistics, 13, no. 4, 392–408. MR2126747 ¨rdle, W., Kerkyacharian, G., Picard, D.,Tsybakov, A. (1998). [15] Ha Wavelets, Approximation, and Statistical Applications, Lecture Notes in Statistics, 129, Springer. MR1618204 [16] Ibragimov, I.A., Khas’minskii, R.Z. (1984). On Nonparametric Estimation of the Value of a Linear Functional in White Gaussian Noise, Teor. Veroyatn. Primen. 29, no. 1, 18–32. MR0739497 ¨, J., Tsybakov, A. (2001). Sharp adaptive estimation of linear [17] Klemela functionals, Ann. Statist. 29, no. 6, 1567–1600. MR1891739 [18] Laurent, B., Massart, P. (2000). Adaptive estimation of a quadratic functional by model selection, Ann. Statist. 28, No 5, 1302–1338. MR1805785 [19] Lepski, O. (1990). On a problem of adaptive estimation in gaussian white noise. Theory Probab. Appl. 35 454–466. MR1091202 [20] Lepski, O. (1991). Asymptotically minimax adaptive estimation. I. Upper bounds. Optimally adaptive estimates. Theory Probab. Appl. 36, no. 4, 682–697. MR1147167 [21] Lepski, O. (1992). Asymptotically minimax adaptive estimation. II. Schemes without optimal adaptation. Theory Probab. Appl. 37, no. 3, 433– 448. MR1214353 [22] Lepski, O., Mammen, E., Spokoiny, V. (1997). Optimal spatial adaptation to inhomogeneous smoothness: an approach based on kernel estimates with variable bandwidth selectors. Ann. Statist. 25 , no. 3, 929–947. MR1447734 [23] Lepski, O., Spokoiny, V. (1997). Optimal pointwise adaptive methods in nonparametric estimation. Ann. Statist. 25, no. 6, 2512–2546. MR1604408 [24] Meyer, Y. (1990). Ondelettes, 1, Hermann Paris. MR1085487