THEORY PROBAB. APPL. Vol. 48, No. 3, pp. 426–446
© 2004 Society for Industrial and Applied Mathematics
BLOCK THRESHOLDING AND SHARP ADAPTIVE ESTIMATION IN SEVERELY ILL-POSED INVERSE PROBLEMS∗

L. CAVALIER†, Y. GOLUBEV†, O. LEPSKI†, AND A. TSYBAKOV‡

Abstract. We consider the problem of solving linear operator equations from noisy data under the assumptions that the singular values of the operator decrease exponentially fast and that the underlying solution is also exponentially smooth in the Fourier domain. We suggest an estimator of the solution based on a running version of block thresholding in the space of Fourier coefficients. This estimator is shown to be sharp adaptive to the unknown smoothness of the solution.

Key words. linear operator equation, white Gaussian noise, adaptive estimation, running block thresholding

DOI. 10.1137/S0040585X97980555
1. Introduction. The problem of solving linear operator equations from noisy observations has been extensively studied in the literature. Among the first to develop a statistical approach to this problem were Sudakov and Khalfin [17] and Bakushinskii [1]. For a survey of recent results we refer to Mathé and Pereverzev [14], Goldenshluger and Pereverzev [7], and Cavalier and Tsybakov [3].
A usual statistical framework in this context is as follows. Let K : H → H be a known linear operator on a Hilbert space H with inner product (·, ·) and norm ‖·‖. The problem is to estimate an unknown function f ∈ H from indirect observations

(1)    Y(g) = (Kf, g) + εξ(g),    g ∈ H,
where 0 < ε < 1 and ξ(g) is a zero-mean Gaussian random process indexed by H on a probability space (Ω, A, P) such that E{ξ(g)ξ(v)} = (g, v) for any g, v ∈ H, where E is the expectation with respect to (w.r.t.) P. Relation (1) defines a Gaussian white noise model.
Instead of dealing with all the observations {Y(g), g ∈ H}, it is usually sufficient to consider a sequence of values {Y(g_k)}_{k=1}^∞ for some orthonormal basis {g_k}_{k=1}^∞. The corresponding random errors ξ(g_k) = ξ_k are independent identically distributed (i.i.d.) standard Gaussian random variables. We assume that the basis {g_k} is such that (Kf, g_k) = b_k θ_k, where b_k ≠ 0 are real numbers and θ_k = (f, ϕ_k) are the Fourier coefficients of f w.r.t. some orthonormal basis {ϕ_k} (not necessarily ϕ_k = g_k). A typical example of its occurrence is when the operator K admits a singular value decomposition

(2)    Kϕ_k = b_k g_k,    K*g_k = b_k ϕ_k,
where K* is the adjoint of K, b_k are singular values, {g_k} is an orthonormal basis in Range(K), and {ϕ_k} is the corresponding orthonormal basis in H.

∗Received by the editors July 23, 2002. http://www.siam.org/journals/tvp/48-3/98055.html
†CMI, Université Aix-Marseille 1, 39 rue F. Joliot-Curie, F-13453 Marseille Cedex, France.
‡Laboratoire de Probabilités et Modèles Aléatoires, Université Paris 6, 4 pl. Jussieu, BP 188, F-75252 Paris Cedex 05, France.
Under these assumptions, one gets a discrete sequence of observations derived from (1):

(3)    y_k = b_k θ_k + εξ_k,    k = 1, 2, . . . ,
where y_k = Y(g_k) and ξ_k are i.i.d. standard Gaussian random variables. The problem of estimating f reduces to estimation of the sequence {θ_k}_{k=1}^∞ from observations (3). The model (3) also describes other problems such as the estimation of a signal from direct observations with correlated data (see [11]).
Let θ̂ = (θ̂_1, θ̂_2, . . .) be an estimator of θ = (θ_1, θ_2, . . .) based on the data (3). Then f is estimated by f̂ = Σ_k θ̂_k ϕ_k. The mean integrated squared error of the estimator f̂ is

(4)    E_f ‖f̂ − f‖² = E_θ Σ_{k=1}^∞ (θ̂_k − θ_k)² =: R_ε(θ̂, θ),
where E_θ denotes the expectation w.r.t. the distribution of the data in the model (3). In this paper we consider the problem of estimation of θ in the model (3) using the mean-squared risk (4).
One can characterize linear inverse problems by the difficulty of the operator, i.e., with our notation, by the behavior of the b_k's. If b_k → 0, as k → ∞, the problem is ill-posed. An inverse problem will be called softly ill-posed if the sequence b_k tends to 0 at a polynomial rate in k, and it will be called severely ill-posed if

    lim_{k→∞} (1/k) log(1/b_k) = c
for some 0 < c < ∞. Thus, the problem is severely ill-posed if, in the main term, b_k tends to 0 exponentially in k.
An important element of the model is the prior information about θ. Successful estimation of a sequence θ is possible only if its elements θ_k tend to zero sufficiently fast, as k → ∞, which means that f is sufficiently smooth. A standard assumption on the smoothness of f is to suppose that θ belongs to an ellipsoid

    Θ = {θ : Σ_{k=1}^∞ a_k² θ_k² ≤ L},
where a = {a_k} is a positive sequence that tends to infinity, and L > 0. Special cases of Θ are the Sobolev balls and the classes of analytic functions, corresponding to a_k's increasing as a polynomial in k and as an exponential in k, respectively.
Thus, there appears a natural classification of different cases in the study of linear inverse problems. Regarding the difficulty of the operator described in terms of the b_k's and the smoothness assumptions described in terms of the a_k's, one obtains the following three typical cases.
1°. Softly ill-posed problems: the b_k's are polynomial and the a_k's are general (usually polynomial or exponential). These problems have been studied by many authors, and they are essentially similar to estimation of derivatives of smooth functions. Sharp adaptive estimators for a general framework are given by Cavalier and Tsybakov [3] and by Cavalier et al. [2].
2°. Severely ill-posed problems with log-rates: the b_k's are exponential and the a_k's are polynomial. This case is highly degenerate in the sense that the variance of the optimal
estimators is asymptotically negligible as compared to their bias. The optimal rates of convergence are very slow (logarithmic) and sharp adaptation can be attained on a simple projection estimator [4], [8].
3°. 2 exp-severely ill-posed problems: the b_k's are exponential and the a_k's are exponential too (the abbreviation "2 exp" stands for "two exponentials"). These problems will be studied here. They are characterized by some unusual phenomena. Golubev and Khasminskii [9] proved that 2 exp problems admit fast optimal rates converging to 0 as a power law, despite the "severe" form of the operator. They also showed that sharp minimax estimators for these problems are nonlinear, unlike all other known cases where sharp minimaxity has been explored. Also the adaptation issue turns out to be nonstandard here. As shown by Tsybakov [19], there is a logarithmic deterioration in the best rate of convergence for adaptive estimation under the L₂-risk. In other words, here one has to pay a price for L₂-adaptation, while this is not the case for the inverse problems described in 1° and 2°: there the L₂-adaptation is possible without any loss, and even the exact constants are preserved.
Since the ellipsoid Θ with exponential a_k's corresponds to analytic functions, the 2 exp framework can be viewed as an analogue of convolution through a supersmooth filter (described by exponential b_k's), with an analytic function f to reconstruct. There is an important reason why the 2 exp setup is of interest. In the study of inverse problems, a standard assumption is to connect the smoothness of the underlying function to the smoothness of the operator. Roughly, if a function is observed through a very smooth filter, then the function itself has to be very smooth. A formalization of this idea can be found, for example, in the well-known Hilbert scale approach to inverse problems (see [7], [13], [14], [15]). Nonadaptive minimax estimation for some inverse problems different from (3) but characterized by a similar "two exponentials" behavior has been analyzed by Ermakov [6], Pensky and Vidakovic [16], and Efromovich and Koltchinskii [5].
In this paper we study estimation for the 2 exp framework when the ellipsoid Θ is not known, but we know only that the a_k's are exponential. We propose an adaptive estimator that attains optimal rates (up to an inevitable logarithmic factor deterioration) simultaneously on all the ellipsoids with exponential a_k's. Moreover, we show that the estimator is sharp adaptive, i.e., it cannot be improved to within a constant. This generalizes the result of Tsybakov [19] about the optimal rate of adaptation for 2 exp problems. The construction of our adaptive estimator is based on block thresholding (cf. [10] or [3] for the inverse problems setting). The difference between our construction and those papers is that, in order to get sharp optimality in our case, we need a "running" block estimator rather than an estimator with fixed blocks.
Let us give some examples of severely ill-posed inverse problems related to partial differential equations.
Example 1. Consider the Dirichlet problem for the Laplace equation on a circle of radius 1:

(5)    Δu = 0,    0 ≤ r ≤ 1;    u(1, ϕ) = f(ϕ),    ϕ ∈ [0, 2π],
where Δ is the Laplace operator, u(r, ϕ) is a function in polar coordinates r ≥ 0, ϕ ∈ [0, 2π], and f is a 2π-periodic function in L₂[0, 2π]. It is well known that the solution of (5) is

(6)    u_f(r, ϕ) = θ_0/√(2π) + (1/√π) Σ_{k=1}^∞ r^k [θ_k cos(kϕ) + θ_{−k} sin(kϕ)],
where the θ_k are the Fourier coefficients of f. Assume that f is not known but that one can observe the solution u_f(r, ϕ) on the circle of radius r_0 < 1 in a white Gaussian noise

(7)    dY(ϕ) = u_f(r_0, ϕ) dϕ + ε dW(ϕ),    ϕ ∈ [0, 2π],
where W is a standard Wiener process on [0, 2π] and 0 < ε < 1. The problem is to estimate the boundary condition f based on the observation of a trajectory {Y(ϕ), ϕ ∈ [0, 2π]}. Substituting (6) in (7), multiplying (7) by the trigonometric basis functions, and integrating over [0, 2π], we get the infinite sequence of observations y_k = b_k θ_k + εξ_k, k ∈ Z, where b_k = r_0^{|k|} and the ξ_k are i.i.d. N(0, 1) random variables. By renumbering the indices from k ∈ Z to k ∈ N we get a particular case of the model (3). This problem is severely ill-posed since b_k → 0 exponentially fast as k → ∞.
Example 2. Consider the following Cauchy problem for the Laplace equation:

(8)    Δu = 0,    u(x, 0) = 0,    ∂u(x, y)/∂y |_{y=0} = g(x),

where u(x, y) is defined for x ∈ R, y ≥ 0, and the initial condition g is a 1-periodic function on R. Suppose that we do not know g but have at our disposal the noisy observations {Y(x), x ∈ [0, 1]}, where Y is the random process defined by

(9)    dY(x) = g(x) dx + ε dW(x),    x ∈ [0, 1].

Here W is the standard Wiener process on [0, 1]. The problem is to estimate the solution f(x) := u_g(x, y_0) of (8) at a given y_0 > 0, based on these observations. Since g is 1-periodic, f is also 1-periodic. Denoting by θ_k the Fourier coefficients of f, one can find that, given (9), the following sequence of observations is available: y_k = b_k θ_k + εξ_k, where the ξ_k are i.i.d. N(0, 1) random variables and b_k ∼ k exp(−βy_0 k) as k → ∞ for some β > 0 (see [8] for more details).
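For readers who want to experiment, a minimal simulation sketch of Example 1 in sequence space follows. It is ours, not the authors'; the coefficients θ_k, the radius r_0, and the truncation level are hypothetical choices. It illustrates how drastically naive inversion amplifies the noise.

```python
import numpy as np

rng = np.random.default_rng(0)
r0, eps, n = 0.5, 0.01, 30                      # observation radius, noise level, truncation
k = np.arange(1, n + 1)
theta = 1.0 / k**2                              # hypothetical Fourier coefficients of f
b = r0**k                                       # singular values b_k = r0^{|k|} decay exponentially
y = b * theta + eps * rng.standard_normal(n)    # sequence model (3)

naive = y / b                                   # naive inversion: noise blown up by r0^{-k}
print(np.abs(naive - theta).max())              # huge error at large k: severe ill-posedness
```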
2. Setting of the problem. From now on we assume that the observations have the form (3), where the ξ_k are i.i.d. N(0, 1) random variables and the values b_k are defined by

(10)    b_k^{−2} = r_k exp(ρk)

with ρ > 0 and a positive sequence r_k varying slower than an exponential as k → ∞. Such a definition of b_k covers the examples considered above, whereas considering the squared values of the b_k's reflects the fact that the results will be insensitive to the signs. We assume that r_k is subexponential in the sense of the following definition.
Definition 1. A sequence {r_k}_{k=1}^∞ is called subexponential if r_k > 0 for all k and there exist constants C_* < ∞ and µ ∈ (0, 1] such that

(11)    |r_{k+1}/r_k − 1| ≤ C_*/k^µ,    k = 1, 2, . . . .

The class of subexponential sequences is rather large and includes polynomial, logarithmic, and other sequences. It is easy to see that a subexponential sequence r_k satisfies
(12)    a exp(−c k^{1−µ}) ≤ r_k ≤ a′ exp(c′ k^{1−µ})    if 0 < µ < 1,
        a k^{−c} ≤ r_k ≤ a′ k^{c′}    if µ = 1,

with some positive finite constants a, a′, c, and c′.
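As a quick numerical illustration of Definition 1 (our addition, with a hypothetical sequence), one can check condition (11) for r_k = log(k + 2), which is subexponential with µ = 1:

```python
import numpy as np

# Check (11) for the hypothetical sequence r_k = log(k + 2) with mu = 1:
# |r_{k+1}/r_k - 1| <= C_*/k should hold for a finite constant C_*.
k = np.arange(1, 10**5, dtype=float)
r = np.log(k + 2)
ratio_dev = np.abs(r[1:] / r[:-1] - 1)
print(np.max(ratio_dev * k[:-1]))   # stays bounded (about 0.26), so a finite C_* works
```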
We will assume that θ belongs to an ellipsoid

(13)    Θ(α, L) = {θ : Σ_{k=1}^∞ q_k exp(αk) θ_k² ≤ L},

where q_k is a subexponential sequence and α > 0, L > 0 are finite constants.
In order to shed some light on the estimation of θ in this setup, consider a simple projection estimator θ̂ = (θ̂_1, θ̂_2, . . .) with bandwidth W ∈ N, i.e.,

    θ̂_k = b_k^{−1} y_k for k ≤ W,    θ̂_k = 0 for k > W.

The maximal risk of this estimator over Θ(α, L) is bounded from above as follows:

(14)    sup_{θ∈Θ(α,L)} R_ε(θ̂, θ) = sup_{θ∈Θ(α,L)} Σ_{k>W} θ_k² + ε² Σ_{k=1}^W b_k^{−2}
            ≤ L Σ_{k>W} exp(−αk) q_k^{−1} + ε² Σ_{k=1}^W exp(ρk) r_k.

The minimum of the right-hand side of (14) with respect to W is attained for some W depending on ε such that W → ∞ as ε → 0 (in fact, otherwise the right-hand side of (14) does not tend to 0 as ε → 0). Using Lemma 2 (see below) to compute the last two sums in (14), we find that, as W → ∞, the right-hand side of (14) is approximated by

    J(W) = L exp(−αW) q_W^{−1}/(1 − e^{−α}) + ε² exp(ρW) r_W/(1 − e^{−ρ}).
The minimizer of J(W) gives an approximately optimal bandwidth. Since for subexponential sequences r_k, q_k and large enough W we have r_{W−1} ≈ r_W ≈ r_{W+1} and q_{W−1} ≈ q_W ≈ q_{W+1}, the necessary conditions for a local minimum at W, namely J(W) < J(W + 1) and J(W) < J(W − 1), can be written in the form

    L exp(−γW) < ε² e^ρ r_W q_W,    L exp(−γ(W − 1)) > ε² e^ρ r_{W−1} q_{W−1},

where γ = ρ + α. It can be shown that all the local minimizers of J(W) provide essentially the same value of J, so that one can take, for example, the smallest local minimizer

(15)    W(α, L) = min{k ∈ N : L exp(−γk) < ε² e^ρ r_k q_k}.

In other words, the minimum of the right-hand side of (14) is approximately attained at W(α, L). This yields the following upper bound for the minimax risk:

    inf_θ̂ sup_{θ∈Θ(α,L)} R_ε(θ̂, θ) ≤ C s_ε(α, L),

with a constant C < ∞ and the rate of convergence

(16)    s_ε(α, L) = ε² r_{W(α,L)} exp(ρW(α, L)).
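The bandwidth rule (15) is easy to compute. The following sketch is ours (the default r_k = q_k ≡ 1 is a simplifying assumption); it returns the smallest local minimizer W(α, L) and applies the corresponding projection estimator:

```python
import numpy as np

def oracle_bandwidth(eps, alpha, L, rho, r=lambda k: 1.0, q=lambda k: 1.0):
    """Smallest local minimizer W(alpha, L) of (15): the first k with
    L * exp(-gamma * k) < eps^2 * e^rho * r_k * q_k, where gamma = rho + alpha."""
    gamma = rho + alpha
    k = 1
    while L * np.exp(-gamma * k) >= eps**2 * np.exp(rho) * r(k) * q(k):
        k += 1
    return k

def projection_estimator(y, b, W):
    """theta_hat_k = y_k / b_k for k <= W and 0 otherwise."""
    theta_hat = np.zeros_like(y)
    theta_hat[:W] = y[:W] / b[:W]
    return theta_hat

# Example: eps = 1e-3 with alpha = rho = L = 1 gives W of order (1/gamma) log(L/eps^2).
print(oracle_bandwidth(1e-3, 1.0, 1.0, 1.0))
```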
Using the argument of [9] (see also Lemma 6 below) it is not difficult to show that the rate of convergence s_ε(α, L) cannot be improved in a minimax sense.
Unfortunately, the minimax approach has a disadvantage: the minimax optimal (or nearly optimal) bandwidth depends on the parameters α and L of the functional class Θ(α, L). Tsybakov [19] shows that one can overcome this difficulty by constructing an adaptive estimator of θ that is independent of α and L and attains a rate which is only logarithmically worse than s_ε(α, L) on any class Θ(α, L). In the next section we propose another adaptive estimator which attains the same rate as that of [19] but has an optimal asymptotic constant.

3. Adaptive estimator and its optimality. Define the following estimator θ* = (θ*_1, θ*_2, . . .) based on block thresholding with running blocks:

(17)    θ*_k = b_k^{−1} y_k I{‖y‖²_k ≥ 2ε²ρ′k},    k = 1, 2, . . . ,

where ρ′ > ρ and

(18)    ‖y‖²_k = Σ_{s∈N: |k−s|≤N} y_s²

with an integer N ≥ 1. Here and later I{·} denotes the indicator function. It will be clear from the proofs that θ*_k = 0, with a probability close to 1, whenever k does not belong to a "small" neighborhood of the integer W* defined by
(19)    W* = W*(α, L) = min{k ∈ N : L exp(−γk) < 2ε²ρ′k r_k q_k}.

We will call W* the adaptive bandwidth. Note that W*(α, L) is smaller than the optimal bandwidth W(α, L) given in (15) for all ε small enough. For instance, if r_k = q_k ≡ 1, we have, as ε → 0,

(20)    W(α, L) = (1/γ) log(L/ε²) + O(1),

whereas

(21)    W*(α, L) = (1/γ) log(L/ε²) − (1/γ) log log(L/ε²) + O(1).
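A direct reading of (17)–(18) in code, with 0-based arrays standing in for the 1-based indices of the paper (an illustrative sketch, not the authors' implementation):

```python
import numpy as np

def running_block_threshold(y, b, eps, rho_prime, N):
    """Estimator (17)-(18): theta*_k = y_k / b_k if the running block energy
    sum_{s: |k-s| <= N} y_s^2 is at least 2 * eps^2 * rho' * k, and 0 otherwise."""
    n = len(y)
    theta_star = np.zeros(n)
    for k in range(1, n + 1):                   # k is the 1-based index of (17)
        lo, hi = max(1, k - N), min(n, k + N)   # block {s in N : |k - s| <= N}
        if np.sum(y[lo - 1:hi] ** 2) >= 2 * eps**2 * rho_prime * k:
            theta_star[k - 1] = y[k - 1] / b[k - 1]
    return theta_star
```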
For general r_k, q_k a closed form expression for W* is not available, but using (12) one can see that W* ≍ log(1/ε) as ε → 0. Nevertheless, possible terms of the order o(log(1/ε)) in the expression for W* are not negligible since they can affect the rate of convergence (cf. (21)). Note also that the value W* need not be known for the construction of our estimator.
In this section we establish the exact asymptotics of the minimax adaptive risk. It turns out that this asymptotics is expressed in terms of the value A* of the following maximization problem:

(22)    A* = A*(α, L) = max_{θ∈Θ∞(α,L)} Σ_{k=−∞}^∞ exp(ρk) θ_k²,

where

(23)    Θ∞(α, L) = {θ ∈ ℓ²(Z) : Σ_{k=−∞}^∞ θ_k² ≤ 1, Σ_{k=−∞}^∞ exp(γk) θ_k² ≤ E*(α, L)}
and

(24)    E*(α, L) = L exp[−γW*(α, L)] / (2ε²ρ W*(α, L) r_{W*(α,L)} q_{W*(α,L)}).

Note that (22)–(23) is a problem of linear programming w.r.t. the θ_k²'s and has a solution belonging to the boundary of Θ∞(α, L). The values A*(α, L) and E*(α, L) depend on ε, but the dependence is not strong: they oscillate between two fixed constants as ε varies. In fact, the definition of W* implies that for any L and α there exist finite positive constants e*_1, e*_2 such that e*_1 ≤ E*(α, L) ≤ e*_2 for all ε. This implies the existence of finite positive constants a*_1, a*_2 (depending on L and α) such that a*_1 ≤ A*(α, L) ≤ a*_2 for all ε. In particular, since γ > ρ, one can take a*_2 = e*_2. Define

(25)    ψ_ε(α, L) = 2A*(α, L) ε² ρ W*(α, L) exp[ρW*(α, L)] r_{W*(α,L)}.
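Because (22)–(23) is a linear program in the variables x_k = θ_k², the constant A* can be approximated numerically, e.g., with scipy. The truncation to |k| ≤ M below is our simplifying assumption, justified for large M by the constraints in (23):

```python
import numpy as np
from scipy.optimize import linprog

def a_star(rho, gamma, e_star, M=50):
    """Approximate A* of (22)-(23): maximize sum exp(rho*k) x_k over x_k >= 0
    subject to sum x_k <= 1 and sum exp(gamma*k) x_k <= E*, truncated to |k| <= M."""
    k = np.arange(-M, M + 1)
    c = -np.exp(rho * k)                              # linprog minimizes, so negate
    A_ub = np.vstack([np.ones(2 * M + 1),             # sum x_k <= 1
                      np.exp(gamma * k)])             # sum exp(gamma*k) x_k <= E*
    res = linprog(c, A_ub=A_ub, b_ub=[1.0, e_star], bounds=(0, None))
    return -res.fun

print(a_star(rho=1.0, gamma=2.0, e_star=1.0))
```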
The next theorem gives a bound for the maximal risk of the estimator θ* over Θ(α, L).
Theorem 1. Assume that b_k satisfies (10) and that r_k and q_k are subexponential. Let θ* be the estimator defined by (17)–(18) with ρ′ > ρ and N ∈ N. Then for any α > 0 and L > 0 we have

(26)    lim sup_{ε→0} sup_{θ∈Θ(α,L)} R_ε(θ*, θ)/ψ_ε(α, L) ≤ ρ′/ρ + C exp(−(N/2) min(α, ρ)),

where the constant C < ∞ does not depend on N and ρ′.
Note that if the size 2N + 1 of the block is large and the parameter ρ′ is close to ρ, then the right-hand side of (26) approaches 1. Alternatively, one can take N = N_ε → ∞ and ρ′ = ρ′_ε → ρ as ε → 0, satisfying appropriate restrictions, which leads to the next result. For x ≥ 0 write ⌈x⌉ = min{n ∈ N : n > x}.
Theorem 2. Assume that b_k satisfies (10) and that r_k and q_k are subexponential. Let θ* be the estimator defined by (17)–(18) with N = ⌈log[log(1/ε) ∨ 1]⌉ and ρ′ = ρ + N^{−1}. Then for any α > 0 and L > 0 we have

(27)    lim sup_{ε→0} sup_{θ∈Θ(α,L)} R_ε(θ*, θ)/ψ_ε(α, L) ≤ 1.
Remark 1. The estimator θ* is defined as an infinite sequence. It can be proved that, under our assumptions, the number of nonzero components of this sequence is finite almost surely. However, formally the estimator (17) is not feasible since one has to check the inequality ‖y‖²_k ≥ 2ε²ρ′k for all k = 1, 2, . . . . This problem does not arise in practice, where one always has a finite number N_max of coefficients to be dealt with (coming from intrinsic limitations of the data), and the estimator θ* is modified by setting θ*_k = 0 for all k > N_max. Inspection of the proofs shows that to keep Theorems 1 and 2 valid for such a modified estimator it suffices to have N_max ≥ log²(1/ε).
Remark 2. The choice of N and ρ′ suggested in Theorem 2 is not the only possible one: there exists a variety of similar values (N, ρ′) that allow us to attain the result of the theorem. These values are described by technical conditions that we do not include in the theorem but that can easily be deduced from the proof.
The following theorem explains in what sense the estimator θ* is optimal.
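Combining the pieces, here is a self-contained Monte Carlo sketch (ours) of the tuning of Theorem 2, N = ⌈log log(1/ε)⌉ and ρ′ = ρ + N^{−1}, with the truncation N_max = ⌈log²(1/ε)⌉ of Remark 1. The instance uses r_k = q_k ≡ 1 and a hypothetical smooth θ inside Θ(α, L), so its risk sits well below the worst-case level ψ_ε(α, L):

```python
import numpy as np

rng = np.random.default_rng(1)
rho, alpha, L, eps = 1.0, 1.0, 1.0, 1e-3
gamma = rho + alpha
N = int(np.ceil(np.log(max(np.log(1 / eps), 1.0))))   # N = ceil(log log(1/eps))
rho_p = rho + 1.0 / N                                  # rho' = rho + 1/N
n_max = int(np.ceil(np.log(1 / eps) ** 2))             # truncation, cf. Remark 1

k = np.arange(1, n_max + 1)
b = np.exp(-rho * k / 2)                               # b_k^{-2} = exp(rho*k), r_k = 1
theta = np.sqrt(L * (1 - np.exp(-alpha / 2))) * np.exp(-0.75 * alpha * k)  # in Theta(alpha, L)

risks = []
for _ in range(200):
    y = b * theta + eps * rng.standard_normal(n_max)
    est = np.zeros(n_max)
    for j in range(1, n_max + 1):                      # running blocks as in (17)-(18)
        lo, hi = max(1, j - N), min(n_max, j + N)
        if np.sum(y[lo - 1:hi] ** 2) >= 2 * eps**2 * rho_p * j:
            est[j - 1] = y[j - 1] / b[j - 1]
    risks.append(np.sum((est - theta) ** 2))
print(np.mean(risks))                                  # Monte Carlo estimate of the risk
```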
Theorem 3. Assume that b_k satisfies (10) and that r_k and q_k are subexponential. Let an estimator θ̂ be such that, for some α₀ > 0, L₀ > 0,

(28)    lim sup_{ε→0} sup_{θ∈Θ(α₀,L₀)} R_ε(θ̂, θ)/ψ_ε(α₀, L₀) < 1.

Then there exists ᾱ > α₀ such that for all α > ᾱ and all L > 0

(29)    lim inf_{ε→0} [sup_{θ∈Θ(α₀,L₀)} R_ε(θ̂, θ)/ψ_ε(α₀, L₀)] · [sup_{θ∈Θ(α,L)} R_ε(θ̂, θ)/ψ_ε(α, L)] = ∞.
Theorems 2 and 3 imply in particular that ψ_ε(α, L) is an adaptive rate of convergence for our problem (cf. Definition 3 in [18]). But they give more than just the rate. These theorems show that if an estimator θ̂ gains over θ* at one point (α₀, L₀) at least in terms of a constant factor (cf. (28)), then there exists another point (α, L) where θ̂ loses much more than it gains at (α₀, L₀) (cf. (29)). One can interpret this property as sharp adaptive optimality of the normalizing factor ψ_ε(·, ·) and, consequently, of the estimator θ* whose convergence is characterized by this normalizing factor.
Note also that for every fixed α₀ > 0, L₀ > 0 there exists an estimator θ̂ satisfying (28) and even more:

    lim sup_{ε→0} sup_{θ∈Θ(α₀,L₀)} R_ε(θ̂, θ)/ψ_ε(α₀, L₀) = 0.

For example, one can take θ̂ as a projection estimator with the optimal bandwidth corresponding to (α₀, L₀) (see section 2). Thus, θ̂ gains over θ* "at a point" (α₀, L₀). But at any other point (α, L) with α > ᾱ this estimator has catastrophic behavior (cf. (29)).
For a more illustrative interpretation, one can modify Theorem 3 by expressing the result in terms of the ratio of maximal risks. For any two estimators θ̂₁ and θ̂₂ define

    G_{α,L}(θ̂₁/θ̂₂) = sup_{θ∈Θ(α,L)} R_ε(θ̂₂, θ) / sup_{θ∈Θ(α,L)} R_ε(θ̂₁, θ).

This value is interpreted as the gain of θ̂₁ over θ̂₂ at (α, L): the larger G_{α,L}(θ̂₁/θ̂₂) is, the better θ̂₁ is as compared to θ̂₂. It is easy to see that Theorem 3 and (27) imply the following corollary.
Corollary 1. Under the assumptions of Theorem 3, let an estimator θ̂ be such that, for some α₀ > 0, L₀ > 0,

(30)    lim inf_{ε→0} G_{α₀,L₀}(θ̂/θ*) > 1,

where θ* is the estimator defined by (17)–(18) and satisfying the assumptions of Theorem 2. Then there exists ᾱ > α₀ such that for all α > ᾱ, all L > 0, and all ε small enough,

    G_{α,L}(θ*/θ̂) ≥ c_ε G_{α₀,L₀}(θ̂/θ*)

with c_ε → ∞, as ε → 0.
Remark 3. Consider, for instance, the case where r_k and q_k do not depend on k. Then it follows from (15) and (16) that the nonadaptive rate of convergence
s_ε(α, L) is of the order ε^{2α/γ}, while (21) and (25) imply that the adaptive rate satisfies ψ_ε(α, L) ≍ ε^{2α/γ} (log(1/ε))^{α/γ}. Thus, one has to pay an extra log-factor for adaptation. This effect is similar to the one established by Lepskii [12] for adaptation using the loss at a fixed point, and it is due to a nondegenerate asymptotic behavior of the normalized loss of the estimators as ε → 0. It is interesting to note that our problem provides an example where such an effect occurs for adaptation using the L₂-loss.

4. Proofs. In this section we will denote by C finite positive constants that may be different in different occasions.

4.1. Proof of Theorems 1 and 2.
Lemma 1. Let w_k be a subexponential sequence. Then for any integers T, t such that t ≥ T and any integer M < min(T, log T), we have

(31)    sup_{k∈Z: |k|≤M} |w_{t+k}/w_t − 1| ≤ η_T,

where η_T depends only on T and η_T → 0 as T → ∞.
The proof of this lemma is straightforward.
Lemma 2. Let w_k be a subexponential sequence. Then for any τ > 0, as T → ∞,

(32)    Σ_{k=1}^T exp(τk) w_k = (1 + o(1)) exp(τT) w_T / (1 − e^{−τ}),

(33)    Σ_{k=T}^∞ exp(−τk) w_k = (1 + o(1)) exp(−τT) w_T / (1 − e^{−τ}).
Proof. Write

    Σ_{k=1}^T exp(τk) w_k = exp(τT) w_T Σ_{k=0}^{T−1} exp(−τk) w_{T−k}/w_T.

Set M = ⌈log T⌉ − 1. If T is large, we have M < T − 1, and hence we can write

(34)    Σ_{k=0}^{T−1} exp(−τk) w_{T−k}/w_T = Σ_{k=0}^M exp(−τk) w_{T−k}/w_T + Σ_{k=M+1}^{T−1} exp(−τk) w_{T−k}/w_T.

Using Lemma 1, we get

    (1 − η_T) Σ_{k=0}^M exp(−τk) ≤ Σ_{k=0}^M exp(−τk) w_{T−k}/w_T ≤ (1 + η_T) Σ_{k=0}^M exp(−τk),

where η_T = o(1) as T → ∞. Thus

    lim_{T→∞} Σ_{k=0}^M exp(−τk) w_{T−k}/w_T = 1/(1 − e^{−τ}).
The last sum in (34) satisfies

    Σ_{k=M+1}^{T−1} exp(−τk) w_{T−k}/w_T ≤ Σ_{k=M+1}^∞ exp(−τk) (1 + C/M^µ)^k ≤ Σ_{k=M+1}^∞ exp(−(τ − CM^{−µ}) k).

This term tends to 0 as M → ∞. Thus we obtain (32). Equation (33) is proved similarly.
Lemma 3. Let ξ_i be i.i.d. N(0, 1) random variables. Then, for any k ∈ N, N ∈ N, and x > 0,

(35)    E{ξ_k² I{‖ξ‖²_k ≥ x}} ≤ (xe²/(3 + 2N))^{N+3/2} exp(−x/2),

where ‖ξ‖²_k = Σ_{s∈N: |k−s|≤N} ξ_s² (cf. (18)).
Proof. For any 0 < λ < 1/2,

(36)    E{ξ_k² I{‖ξ‖²_k ≥ x}} ≤ exp(−λx) E{ξ_k² exp(λ‖ξ‖²_k)} = exp(−λx) E{ξ_k² exp(λξ_k²)} Π_{i≠k, |i−k|≤N} E exp(λξ_i²)
        = exp(−λx) (1 − 2λ)^{−3/2} (1 − 2λ)^{−N} = exp(−λx − ((3 + 2N)/2) log(1 − 2λ)).

The minimum with respect to λ of the right-hand side of (36) is attained at λ = (1/2)(1 − (3 + 2N)/x). Substituting this λ into (36) we get (35).
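Before using Lemma 3, one can sanity-check the (rather loose) bound (35) by simulation; the parameter values below are arbitrary:

```python
import numpy as np

# Monte Carlo check of (35): E[xi_k^2 I{||xi||_k^2 >= x}] against the bound.
rng = np.random.default_rng(0)
N, x, n_sim = 3, 15.0, 10**6
xi = rng.standard_normal((n_sim, 2 * N + 1))        # one block of 2N + 1 variables
block_energy = np.sum(xi**2, axis=1)                # ||xi||_k^2 over the block
lhs = np.mean(xi[:, N] ** 2 * (block_energy >= x))  # xi_k is the block center
rhs = (x * np.e**2 / (3 + 2 * N)) ** (N + 1.5) * np.exp(-x / 2)
print(lhs, rhs)                                     # lhs stays below the bound
```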
Proof of Theorem 1. Let M be a sufficiently large integer satisfying N ≤ M < min(W*/2, log W*/2). In this proof we denote by C the constants that do not depend on M, N, ε, and θ. We decompose the risk of the estimator θ* into three parts:

(37)    sup_{θ∈Θ(α,L)} R_ε(θ*, θ) ≤ S₁ + S₂ + S₃,

where

    S₁ = sup_{θ∈Θ(α,L)} Σ_{k=1}^{W*−M} E_θ(θ*_k − θ_k)²,    S₂ = sup_{θ∈Θ(α,L)} Σ_{k=W*−M}^{W*+M} E_θ(θ*_k − θ_k)²,
    S₃ = sup_{θ∈Θ(α,L)} Σ_{k=W*+M}^∞ E_θ(θ*_k − θ_k)².
Consider first the term S₁. Using (32) for the subexponential sequences w_k = r_k and w_k = k r_k, and Lemma 1 for w_k = r_k, we have

    S₁ = sup_{θ∈Θ(α,L)} Σ_{k=1}^{W*−M} exp(ρk) r_k E_θ(y_k I{‖y‖²_k < 2ε²ρ′k} − εξ_k)²
       ≤ 2ε² Σ_{k=1}^{W*−M} exp(ρk) r_k + 2 sup_{θ∈Θ(α,L)} Σ_{k=1}^{W*−M} exp(ρk) r_k E_θ{y_k² I{‖y‖²_k < 2ε²ρ′k}}
       ≤ 2ε² Σ_{k=1}^{W*−M} exp(ρk) r_k + 4ε² Σ_{k=1}^{W*−M} exp(ρk) r_k ρ′k
(38)   ≤ Cρ′ε² exp(ρW*) r_{W*} W* exp(−ρM) ≤ C ψ_ε(α, L) exp(−ρM).
Clearly, Θ(α, L) ⊂ ΘM . Now we change the variables from θk to νk by setting, for k 1 − W ∗, θk+W ∗ = θk+W ∗ νk = ∗ ε 2ρW rk+W ∗ exp(ρ(k + W ∗ ))
A∗ r W ∗ ψε (α, L) exp(ρk) rk+W ∗
1/2 ,
and let νk∗ be derived from θk∗ by the same transformation. If θ ∈ ΘM , the sequence ν = {νk } belongs to the set ΞM =
M +N
ν ∈ Ξ:
exp(γk) rk+W ∗ qk+W ∗ νk2
k=−M −N
L exp(−γW ∗ ) 2ε2 ρW ∗
,
where Ξ is the set of all sequences of the form ν = (ν1−W ∗ , . . . , ν0 , ν1 , . . . ). Now, in view of Lemma 1, applied to the subexponential sequences wk = rk and wk = rk qk , there exists η = ηε depending only on W ∗ , such that η → 0, as ε → 0, and max rk+W ∗ (1 + η) rW ∗ ,
(39) (40)
|k|M
min
|k|M +N
rk+W ∗ qk+W ∗ (1 − η) rW ∗ qW ∗ .
Fix 0 < δ < 1, and assume that ε is small enough to have simultaneously η < δ and (1 − η)−1 < 1 + δ. Then relation (40) implies that min|k|M +N rk+W ∗ qk+W ∗ rW ∗ qW ∗ /(1 + δ), and therefore ΞM ⊂ ΞδM =
ν ∈ Ξ:
M +N
exp(γk) νk2 E ∗ (1 + δ) .
k=−M −N
Furthermore, (39) guarantees that max|k|M rk+W ∗ (1 + δ) rW ∗ . These remarks
imply that, for ε small enough,

    S₂ ≤ sup_{θ∈Θ_M} Σ_{k=W*−M}^{W*+M} E_θ(θ*_k − θ_k)²
       ≤ (ψ_ε(α, L)/A*) sup_{ν∈Ξ^δ_M} Σ_{k=−M}^M exp(ρk) (r_{k+W*}/r_{W*}) E_θ(ν*_k − ν_k)²
(41)   ≤ ((1 + δ) ψ_ε(α, L)/A*) sup_{ν∈Ξ^δ_M} Σ_{k=−M}^M exp(ρk) E_θ(ν*_k − ν_k)².

Using the inequality (x + y)² ≤ (1 + δ) x² + (1 + δ^{−1}) y² for any x, y ∈ R, we get

    E_θ(ν*_k − ν_k)² = E[(ν_k + ξ_{k+W*}/√(2ρW*)) I{Σ_{l=k−N}^{k+N} (ν_l + ξ_{l+W*}/√(2ρW*))² ≥ (ρ′/ρ)(1 + k/W*)} − ν_k]²
(42)   ≤ (1 + δ) ν_k² P{Σ_{l=k−N}^{k+N} (ν_l + ξ_{l+W*}/√(2ρW*))² < (ρ′/ρ)(1 + k/W*)} + (1 + δ^{−1})(2ρW*)^{−1}.
Next, using the inequality

    (x + y)² ≥ (1 − δ) x² + (1 − δ^{−1}) y²    for any x, y ∈ R,

one obtains

    Σ_{l=k−N}^{k+N} (ν_l + ξ_{l+W*}/√(2ρW*))² ≥ (1 − δ) Σ_{l=k−N}^{k+N} ν_l² − ((1 − δ)/δ)(2ρW*)^{−1} Σ_{l=k−N}^{k+N} ξ²_{l+W*}

and, therefore,

    P{Σ_{l=k−N}^{k+N} (ν_l + ξ_{l+W*}/√(2ρW*))² < (ρ′/ρ)(1 + k/W*)}
       ≤ P{(1 − δ) Σ_{l=k−N}^{k+N} ν_l² − ((1 − δ)/(2δρW*)) Σ_{l=k−N}^{k+N} ξ²_{l+W*} < (ρ′/ρ)(1 + k/W*)}
(43)   ≤ I{Σ_{l=k−N}^{k+N} ν_l² ≤ V} + P{(4δρW*)^{−1} Σ_{l=k−N}^{k+N} ξ²_{l+W*} > δ},

where

    V = 2δ + (1/(1 − δ))(1 + M/W*)(ρ′/ρ).

By Markov's inequality

(44)    P{(4δρW*)^{−1} Σ_{l=k−N}^{k+N} ξ²_{l+W*} > δ} ≤ (2N + 1)/(4δ²ρW*).
Define

    A(N, M) = sup_{ν∈Ξ^δ_M} Σ_{k=−M}^M exp(ρk) ν_k² I{Σ_{l=k−N}^{k+N} ν_l² ≤ V}.
Combining (42)–(44) one obtains

    sup_{ν∈Ξ^δ_M} Σ_{k=−M}^M exp(ρk) E_θ(ν*_k − ν_k)²
       ≤ (1 + δ)[A(N, M) + ((2N + 1)/(4δ²ρW*)) sup_{ν∈Ξ^δ_M} Σ_{k=−M}^M exp(ρk) ν_k²] + C exp(ρM)/W*
(45)   ≤ (1 + δ) A(N, M) + (C/W*)(N + exp(ρM)),

where for the last inequality we used that, since γ > ρ,

    sup_{ν∈Ξ^δ_M} Σ_{k=−M}^M exp(ρk) ν_k² ≤ sup_{ν∈Ξ^δ_M} Σ_{k=−M}^M exp(γk) ν_k² ≤ E*(1 + δ).

From (41) and (45) we get that for any 0 < δ < 1 and for all ε small enough,

(46)    S₂ ≤ (ψ_ε(α, L)/A*) (1 + δ)² [A(N, M) + C(N + exp(ρM))/W*].

Our next goal is to show that A(N, M) is close to A*. We will proceed in steps. The first step is to remark that

    A(N, M) ≤ sup_{ν∈Ξ^δ_M} Σ_{k=−M}^M exp(ρk) ν_k² I{Σ_{l=k−N}^{k+N} ν_l² ≤ V}
(47)   ≤ V(1 + δ) sup_{ν∈Ξ*} Σ_{k=−M}^M exp(ρk) ν_k² I{Σ_{l=k−N}^{k+N} ν_l² ≤ 1},

where

    Ξ* = {ν ∈ Ξ : Σ_{k=−M−N}^{M+N} exp(γk) ν_k² ≤ E*}.

In fact, to get (47), introduce the sequence ν′ ∈ Ξ such that ν′_k = ν_k/[V(1 + δ)]^{1/2}, observe that V > 1, and use the embedding

    {ν′ ∈ Ξ : V Σ_{k=−M−N}^{M+N} exp(γk) (ν′_k)² ≤ E*} ⊆ {ν′ ∈ Ξ : Σ_{k=−M−N}^{M+N} exp(γk) (ν′_k)² ≤ E*}.

Our next step is to find an upper bound for the expression in square brackets in (47). Assuming without loss of generality that N is even, we can write

(48)    sup_{ν∈Ξ*} Σ_{k=−M−N}^{M+N} exp(ρk) ν_k² I{Σ_{l=k−N}^{k+N} ν_l² ≤ 1} ≤ A₁ + A₂ + A₃
with

    A₁ = sup_{ν∈Ξ*} Σ_{k=−M−N}^{−N/2−1} exp(ρk) ν_k² I{Σ_{l=k−N}^{k+N} ν_l² ≤ 1} ≤ Σ_{k=−∞}^{−N/2−1} e^{ρk} ≤ C exp(−ρN/2),

    A₂ = sup_{ν∈Ξ*} Σ_{k=N/2+1}^{M+N} exp(ρk) ν_k² I{Σ_{l=k−N}^{k+N} ν_l² ≤ 1} ≤ sup_{ν∈Ξ*} Σ_{k=N/2+1}^{M+N} e^{ρk} ν_k² ≤ C exp(−αN/2),

and

    A₃ = sup_{ν∈Ξ*} Σ_{k=−N/2}^{N/2} exp(ρk) ν_k² I{Σ_{l=k−N}^{k+N} ν_l² ≤ 1}
       ≤ sup_{ν∈Ξ*} Σ_{k=−N/2}^{N/2} exp(ρk) ν_k² I{Σ_{l=−N/2}^{N/2} ν_l² ≤ 1} ≤ sup_{ν∈Ξ′} Σ_{k=−N/2}^{N/2} exp(ρk) ν_k²,

where

    Ξ′ = {ν ∈ Ξ : Σ_{k=−M−N}^{M+N} exp(γk) ν_k² ≤ E*, Σ_{k=−N/2}^{N/2} ν_k² ≤ 1}.
Substitution of the inequalities for A₁, A₂, and A₃ into (48) yields

(49)    sup_{ν∈Ξ*} Σ_{k=−M−N}^{M+N} exp(ρk) ν_k² I{Σ_{l=k−N}^{k+N} ν_l² ≤ 1} ≤ sup_{ν∈Ξ′} Σ_{k=−N/2}^{N/2} exp(ρk) ν_k² + C exp(−(N/2) min(α, ρ)).

Next, introducing the set

    Θ^{N/2}_∞(α, L) = {ν ∈ Θ∞(α, L) : ν_k = 0 for |k| > N/2},

we note that

(50)    sup_{ν∈Ξ′} Σ_{k=−N/2}^{N/2} exp(ρk) ν_k² = sup_{ν∈Θ^{N/2}_∞(α,L)} Σ_{k=−N/2}^{N/2} exp(ρk) ν_k² ≤ sup_{ν∈Θ∞(α,L)} Σ_k exp(ρk) ν_k² = A*.

In fact, the equality in (50) follows from the fact that taking ν_k = 0 for |k| > N/2 does not increase the sum Σ_{k=−N/2}^{N/2} exp(ρk) ν_k², and thus the supremum of this sum over Ξ′ is attained on the sequences ν with ν_k = 0 for |k| > N/2.
From (47), (49)–(50) we get

    A(N, M) ≤ V [A* + C exp(−(N/2) min(α, ρ))].

This together with (46) and the fact that A* is bounded from below uniformly in ε entails

(51)    S₂ ≤ ψ_ε(α, L) [V(1 + δ)² + C exp(−(N/2) min(α, ρ)) + (C/W*)(N + exp(ρM))].

Finally, we bound from above the term S₃. Using (3) and (10) we find

(52)    S₃ ≤ 2 sup_{θ∈Θ(α,L)} Σ_{k=W*+M}^∞ θ_k² + 2ε² sup_{θ∈Θ(α,L)} Σ_{k=W*+M}^∞ exp(ρk) r_k E{ξ_k² I{‖y‖²_k ≥ 2ε²ρ′k}}.
The first term in the right-hand side satisfies

    sup_{θ∈Θ(α,L)} Σ_{k=W*+M}^∞ θ_k² = sup_{θ∈Θ(α,L)} Σ_{k=W*+M}^∞ exp(−αk) q_k^{−1} · q_k exp(αk) θ_k²
       ≤ sup_{k≥W*+M} exp(−αk) q_k^{−1} sup_{θ∈Θ(α,L)} Σ_{k=1}^∞ q_k exp(αk) θ_k²
(53)   ≤ C exp(−α(W* + M)) q^{−1}_{W*+M} ≤ C exp(−αM) ψ_ε(α, L),
where to get the last two inequalities we applied Lemma 2 and then Lemma 1 for the subexponential sequence w_k = q_k^{−1}.
Consider the sequence θ̄ = (θ̄_1, θ̄_2, . . .) with θ̄_k = b_k θ_k. For k ≥ W* + M,

    sup_{θ∈Θ(α,L)} ‖θ̄‖²_k = sup_{θ∈Θ(α,L)} Σ_{l=k−N}^{k+N} θ_l² r_l^{−1} exp(−ρl)
       = sup_{θ∈Θ(α,L)} Σ_{l=k−N}^{k+N} r_l^{−1} q_l^{−1} exp(−γl) · q_l exp(αl) θ_l²
       ≤ L Σ_{l=k−N}^{k+N} r_l^{−1} q_l^{−1} exp(−γl) ≤ C r^{−1}_{k−N} q^{−1}_{k−N} exp[−γ(k − N)],

where we used Lemma 2 for the subexponential sequence w_l = r_l^{−1} q_l^{−1}. Applying to the last expression of the previous display successively Lemma 1 and then the fact that L exp(−γW*) < 2ε²ρ′W* r_{W*} q_{W*}, we get

    sup_{θ∈Θ(α,L)} ‖θ̄‖²_k ≤ C L r^{−1}_{W*} q^{−1}_{W*} exp[−γ(k − N)]
(54)   ≤ Cε² W* exp[−γ(M − N)] ≤ Cε² k exp[−γ(M − N)].
Now we bound the last term in (52). By Lemma 3, for any k ≥ W* + M we have

    E{ξ_k² I{‖θ̄ + εξ‖²_k ≥ 2ε²ρ′k}} ≤ E{ξ_k² I{‖ξ‖²_k ≥ (√(2ρ′k) − ‖θ̄‖_k/ε)²}}
(55)   ≤ (2ρ′ke²/(3 + 2N))^{N+3/2} exp(−ρ′k + √(2ρ′k) ‖θ̄‖_k/ε).

In the rest of the proof we set

(56)    M = 2⌈log[log(ε^{−1}) ∨ 1]⌉.

For ε small enough this choice of M satisfies the assumptions on M imposed above, since W* ≍ log(1/ε). Since M → ∞ as ε → 0, for any small constant c > 0 there exists ε₀ such that for ε < ε₀ we have exp[−γ(M − N)] ≤ c(ρ′ − ρ)². Then for ε < ε₀, in view of (54), we have −(ρ′ − ρ)k + √(2ρ′k) ‖θ̄‖_k/ε ≤ −(ρ′ − ρ)k/2 and, therefore,

    sup_{θ∈Θ(α,L)} Σ_{k=W*+M}^∞ exp(ρk) r_k E{ξ_k² I{‖θ̄ + εξ‖²_k ≥ 2ε²ρ′k}}
       ≤ sup_{θ∈Θ(α,L)} Σ_{k=W*+M}^∞ (2ρ′ke²/(3 + 2N))^{N+3/2} r_k exp(−(ρ′ − ρ)k + √(2ρ′k) ‖θ̄‖_k/ε)
       ≤ Σ_{k=W*+M}^∞ (2ρ′ke²/(3 + 2N))^{N+3/2} r_k exp(−(ρ′ − ρ)k/2)
       ≤ C r_{W*} (2ρ′(W* + M)e²/(3 + 2N))^{N+3/2} (ρ′ − ρ)^{−1} exp(−(ρ′ − ρ) W*/2),

where the last inequality follows from (33) of Lemma 2 with the subexponential sequence w_k = k^{N+3/2} r_k and from Lemma 1 with w_k = r_k. This inequality and relations (52), (53) yield

    S₃ ≤ C [exp(−αM) ψ_ε(α, L) + (ρ′ − ρ)^{−1} ε² r_{W*}] ≤ C ψ_ε(α, L) [exp(−αM) + (ρ′ − ρ)^{−1}/W*].

Combining this result with (37), (38), and (51) we find

    sup_{θ∈Θ(α,L)} R_ε(θ*, θ)/ψ_ε(α, L) ≤ V(1 + δ)²
       + C [exp(−αM) + exp(−ρM) + exp(−(N/2) min(α, ρ)) + (N + exp(ρM) + (ρ′ − ρ)^{−1})/W*].
It remains to take limits as ε → 0, and then as δ → 0, using the definition of M in (56) and the definition of V . This completes the proof of (26).
Proof of Theorem 2. We follow along the lines of the proof of Theorem 1 with M = 2N (cf. (56)). The argument preceding (56) is true for any fixed N and ρ′ > ρ, and it remains intact. Inspection of the proof of Theorem 1 after (56) shows that the choice of N and ρ′ defined in Theorem 2 is sufficient to get (27).

4.2. Proof of Theorem 3. For an integer M, consider the set

    Θ^M_∞(α, L) = {θ ∈ Θ∞(α, L) : θ_k = 0 for |k| > M}

and define

    A_M(α, L) = sup_{θ∈Θ^M_∞(α,L)} Σ_{k=−M}^M exp(ρk) θ_k².
Lemma 4. For any α > 0, L > 0,

(57)    lim_{M→∞} A_M(α, L) = A*(α, L).

Proof. It is sufficient to note that

    A_M(α, L) = sup_{θ∈Θ∞(α,L)} Σ_{|k|≤M} exp(ρk) θ_k²

and that, uniformly in θ ∈ Θ∞(α, L),

    Σ_{k=−∞}^∞ exp(ρk) θ_k² − Σ_{|k|≤M} exp(ρk) θ_k² = Σ_{k<−M} exp(ρk) θ_k² + Σ_{k>M} exp(ρk) θ_k²
       ≤ exp(−ρM) Σ_k θ_k² + exp(−αM) Σ_k exp(γk) θ_k² ≤ exp(−ρM) + exp(−αM) E*(α, L) → 0,    M → ∞.
Lemma 5. Under the assumptions of Theorem 3, for any 0 < α₀ < α′ ≤ α and any L₀, L, L′ > 0, we have

(58)    lim inf_{ε→0} inf_θ̂ max{sup_{θ∈Θ(α₀,L₀)} R_ε(θ̂, θ)/ψ_ε(α₀, L₀), sup_{θ∈Θ(α,L)} R_ε(θ̂, θ)/ψ_ε(α′, L′)} ≥ 1 − γ₀/γ′,

where inf_θ̂ denotes the infimum over all estimators and γ₀ = α₀ + ρ, γ′ = α′ + ρ.
Proof. For brevity we will write W₀ = W*(α₀, L₀), W′ = W*(α′, L′). Let M be a given integer satisfying 1 ≤ M < W₀, and let δ ∈ (0, 1). For L_δ = L₀(1 − δ) set

    Θ^M_0 = {θ : q_{W₀} Σ_{|k−W₀|≤M} exp(α₀k) θ_k² ≤ L_δ, θ_k = 0, |k − W₀| > M}.

Since W₀ → ∞, as ε → 0, Lemma 1 implies that q_{W₀}/q_{W₀+k} = 1 + o(1) uniformly in |k| ≤ M. Hence Θ^M_0 ⊂ Θ(α₀, L₀) for sufficiently small ε, and, therefore,

(59)    R_ε := inf_θ̂ max{sup_{θ∈Θ(α₀,L₀)} R_ε(θ̂, θ)/ψ_ε(α₀, L₀), sup_{θ∈Θ(α,L)} R_ε(θ̂, θ)/ψ_ε(α′, L′)}
           ≥ inf_θ̂ max{sup_{θ∈Θ^M_0} R_ε(θ̂, θ)/ψ_ε(α₀, L₀), R_ε(θ̂, 0)/ψ_ε(α′, L′)},
where 0 denotes the sequence θ with all the elements equal to 0. To handle the last expression, we use again a renormalization. Change the variables from θ_k to ν_k by setting, for |k| ≤ M,

    ν_k = θ_{k+W₀} / (ε [2ρW₀ r_{W₀} exp(ρ(k + W₀))]^{1/2}),

and let ν̂_k be obtained from θ̂_k by the same transformation, thus defining a sequence ν̂ ∈ Ξ. Clearly, θ ∈ Θ^M_0 if and only if ν = {ν_k} belongs to the set

    Ξ^M_0 = {ν ∈ Ξ : Σ_{|k|≤M} exp(γ₀k) ν_k² ≤ E_δ, ν_k = 0, |k| > M},

where E_δ = E*(α₀, L₀)(1 − δ). We will denote by P_ν the Gaussian measure generated by the observations Y_k = ν_k + ξ_{k+W₀}/√(2ρW₀), |k| ≤ M, and by E_ν the expectation with respect to this measure. Using this notation and noticing that, in view of Lemma 1, r_{W₀}/r_{W₀+k} = 1 + o(1) uniformly in |k| ≤ M, we obtain

    R_ε(θ̂, θ)/ψ_ε(α₀, L₀) = ((1 + o(1))/A*(α₀, L₀)) [Σ_{|k|≤M} exp(ρk) E_ν(ν̂_k − ν_k)² + Σ_{|k|>M} exp(ρk) E_ν ν̂_k²]
       ≥ ((1 − δ)/A*(α₀, L₀)) Σ_{|k|≤M} exp(ρk) E_ν(ν̂_k − ν_k)²,

which implies, together with (59), that

(60)    R_ε ≥ ((1 − δ)/A*(α₀, L₀)) sup_{ν∈Ξ^M_0} inf_ν̂ [(1 − δ) Σ_{|k|≤M} exp(ρk) E_ν(ν̂_k − ν_k)² + δλ_ε Σ_{|k|≤M} exp(ρk) E_0 ν̂_k²],

where

(61)    λ_ε = 2ε²ρW₀ r_{W₀} exp(ρW₀)/ψ_ε(α′, L′) = W₀ r_{W₀} exp(ρW₀) / (A*(α′, L′) W′ r_{W′} exp(ρW′)).

Since the ν_k, |k| ≤ M, are assumed to be known, the Bayes estimator, which minimizes the expression in square brackets in (60), can be found easily:

    ν̂_k = ν_k [1 + (δλ_ε/(1 − δ)) exp(ρW₀ Σ_{|k|≤M} (Y_k − ν_k)² − ρW₀ Σ_{|k|≤M} Y_k²)]^{−1}
         = ν_k [1 + (δλ_ε/(1 − δ)) exp(−√(2ρW₀) Σ_{|k|≤M} ξ_{k+W₀} ν_k − ρW₀ Σ_{|k|≤M} ν_k²)]^{−1}.
Next, by substituting this estimator in (60) one obtains

    R_ε ≥ ((1 − δ)/A*(α₀, L₀)) sup_{ν∈Ξ^M_0} (1 − δ) Σ_{|k|≤M} exp(ρk) E_ν(ν̂_k − ν_k)²
(62)   ≥ ((1 − δ)²/A*(α₀, L₀)) sup_{ν∈Ξ^M_0} Σ_{|k|≤M} exp(ρk) ν_k² E[1 − (1 + (δλ_ε/(1 − δ)) exp(√(2ρW₀) ξ‖ν‖ − ρW₀‖ν‖²))^{−1}]²,

where ξ ∼ N(0, 1). In order to continue this inequality, note that (see (12) and (61))

    lim_{ε→0} log(λ_ε)/(ρW₀) = lim_{ε→0} (W₀ − W′)/W₀ = 1 − γ₀/γ′.

Then for all sufficiently small ε ≤ ε_δ we have

    E[1 − (1 + (δλ_ε/(1 − δ)) exp(√(2ρW₀) ξ‖ν‖ − ρW₀‖ν‖²))^{−1}]²
       ≥ (1 − δ)² P{(δ²λ_ε/(1 − δ)²) exp(√(2ρW₀) ξ‖ν‖ − ρW₀‖ν‖²) > 1}
       = (1 − δ)² P{ξ ≥ (√(2ρW₀) ‖ν‖)^{−1} (ρW₀‖ν‖² − log(δ²λ_ε/(1 − δ)²))}
       ≥ (1 − δ)³ I{ρW₀‖ν‖² − log(δ²λ_ε/(1 − δ)²) ≤ −2√(−ρW₀ log δ) ‖ν‖}
       = (1 − δ)³ I{‖ν‖² ≤ (ρW₀)^{−1} log(δ²λ_ε/(1 − δ)²) − 2‖ν‖√(−log δ/(ρW₀))}
       ≥ (1 − δ)³ I{‖ν‖² ≤ 1 − γ₀/γ′ − δ}.

Combining this inequality with (62) one obtains

(63)    R_ε ≥ ((1 − δ)⁵/A*(α₀, L₀)) sup_{ν∈Ξ^M_1} Σ_{|k|≤M} exp(ρk) ν_k²,

where Ξ^M_1 = {ν ∈ Ξ^M_0 : ‖ν‖² ≤ 1 − γ₀/γ′ − δ}. Using the change of variables ν_k = [1 − γ₀/γ′ − δ]^{1/2} ν′_k we easily arrive at

(64)    sup_{ν∈Ξ^M_1} Σ_{|k|≤M} exp(ρk) ν_k² ≥ (1 − γ₀/γ′ − δ) sup_{ν′∈Θ^M_∞(α₀,L₀)} Σ_{|k|≤M} exp(ρk) (ν′_k)²,

provided that (1 − γ₀/γ′ − δ)^{−1} (1 − δ) ≥ 1. To finish the proof of the lemma it remains to substitute (64) in (63) and to take the limits of the resulting inequality first as ε → 0, then as M → ∞ (using Lemma 4), and finally as δ → 0.
Lemma 6. Let s_ε(α, L) be defined by (15)–(16) and let r_k and q_k be subexponential. Then

    lim inf_{ε→0} inf_θ̂ sup_{θ∈Θ(α,L)} R_ε(θ̂, θ)/s_ε(α, L) > 0,

where inf_θ̂ denotes the infimum over all estimators.
Proof. Consider the following parametric family in Θ(α, L). Assume that all components θ_k are zero except θ_{W(α,L)} = ε [r_{W(α,L)} exp(ρW(α, L))]^{1/2} ν, where W(α, L) is defined by (15) and ν is an unknown parameter such that

    ν² ≤ L exp[−γW(α, L)] / (ε² r_{W(α,L)} q_{W(α,L)}) =: c_ε(α, L).

Using (15) and the facts that W(α, L) → ∞, as ε → 0, and that the sequences r_k and q_k are subexponential, we get that, for ε small enough, c_ε(α, L) ≥ c₀ with some constant c₀ > 0. Therefore we have

    inf_θ̂ sup_{θ∈Θ(α,L)} R_ε(θ̂, θ) ≥ ε² r_{W(α,L)} exp[ρW(α, L)] inf_ν̂ sup_{|ν|≤√c₀} E_ν(ν̂(Z) − ν)²
       ≥ Cε² r_{W(α,L)} exp[ρW(α, L)] = C s_ε(α, L),

where E_ν is the expectation with respect to the distribution of the single Gaussian observation

    Z = b^{−1}_{W(α,L)} y_{W(α,L)} / (ε [r_{W(α,L)} exp(ρW(α, L))]^{1/2}) = ν + ξ_{W(α,L)}

and inf_ν̂ denotes the infimum over all Borel functions ν̂.
Proof of Theorem 3. The assumption of the theorem guarantees that there exists δ ∈ (0, 1) such that sup_{θ∈Θ(α₀,L₀)} [R_ε(θ̂, θ)/ψ_ε(α₀, L₀)] ≤ 1 − δ for all ε small enough. Substituting this into (58) and choosing in (58) the value α′ = 4γ₀/δ − ρ > α₀, we get, for α > α′ =: ᾱ and for sufficiently small ε,

    max{1 − δ, sup_{θ∈Θ(α,L)} R_ε(θ̂, θ)/ψ_ε(α′, L′)} ≥ 1 − δ/4 + o(1) ≥ 1 − δ/2.

Thus, for ε small enough,

(65)    sup_{θ∈Θ(α,L)} R_ε(θ̂, θ)/ψ_ε(α, L) ≥ (1 − δ/2) ψ_ε(α′, L′)/ψ_ε(α, L) = (1 − δ/2) [A*(α′, L′) W′ r_{W′} exp(ρW′)] / [A*(α, L) W r_W exp(ρW)],

where W and W′ are defined by

(66)    W = W*(α, L) = ((1 + o(1))/(ρ + α)) log(L/ε²),    W′ = W*(α′, L′) = ((1 + o(1))/(ρ + α′)) log(L′/ε²),    ε → 0.

On the other hand, it follows from Lemma 6 that the minimax risk for Θ(α₀, L₀) satisfies

(67)    inf_θ̂ sup_{θ∈Θ(α₀,L₀)} R_ε(θ̂, θ)/ψ_ε(α₀, L₀) ≥ C r_{W(α₀,L₀)} exp[ρW(α₀, L₀)] / (W*(α₀, L₀) r_{W*(α₀,L₀)} exp[ρW*(α₀, L₀)]) ≥ exp[o(1) W′].
In the last line we used that W(α₀, L₀) = (1 + o(1)) W*(α₀, L₀) and W*(α₀, L₀) = O(W′), as ε → 0. Finally, using (65)–(67), we get

    [sup_{θ∈Θ(α₀,L₀)} R_ε(θ̂, θ)/ψ_ε(α₀, L₀)] · [sup_{θ∈Θ(α,L)} R_ε(θ̂, θ)/ψ_ε(α, L)]
       ≥ exp[o(1) W′] (1 − δ/2) [A*(α′, L′) W′ r_{W′} exp(ρW′)] / [A*(α, L) W r_W exp(ρW)] → ∞,

as ε → 0, since (66) implies W′ − W ≥ cW′ for some c > 0 when α > α′. This proves (29).

REFERENCES

[1] A. B. Bakushinskii, On the construction of regularizing algorithms under random noise, Soviet Math. Dokl., 189 (1969), pp. 231–233.
[2] L. Cavalier, G. K. Golubev, D. Picard, and A. B. Tsybakov, Oracle inequalities for inverse problems, Ann. Statist., 30 (2002), pp. 843–874.
[3] L. Cavalier and A. B. Tsybakov, Sharp adaptation for inverse problems with random noise, Probab. Theory Related Fields, 123 (2002), pp. 323–354.
[4] S. Efromovich, Robust and efficient recovery of a signal passed through a filter and then contaminated by non-Gaussian noise, IEEE Trans. Inform. Theory, 43 (1997), pp. 1184–1191.
[5] S. Efromovich and V. Koltchinskii, On inverse problems with unknown operators, IEEE Trans. Inform. Theory, 47 (2001), pp. 2876–2893.
[6] M. S. Ermakov, Minimax estimation of the solution of an ill-posed convolution type problem, Probl. Inf. Transm., 25 (1989), pp. 28–39.
[7] A. Goldenshluger and S. V. Pereverzev, Adaptive estimation of linear functionals in Hilbert scales from indirect white noise observations, Probab. Theory Related Fields, 118 (2000), pp. 169–186.
[8] G. K. Golubev and R. Z. Khasminskii, Statistical approach to some inverse boundary problems for partial differential equations, Probl. Inf. Transm., 35 (1999), pp. 51–66.
[9] G. K. Golubev and R. Z. Khasminskii, A statistical approach to the Cauchy problem for the Laplace equation, in State of the Art in Probability and Statistics. Festschrift for Willem R. van Zwet, IMS Lecture Notes Monogr. Ser. 36, M. de Gunst, C. Klaassen, and A. van der Vaart, eds., Institute of Mathematical Statistics, Beachwood, OH, 2000, pp. 419–433.
[10] P. Hall, G. Kerkyacharian, and D. Picard, Block threshold rules for curve estimation using kernel and wavelet methods, Ann. Statist., 26 (1998), pp. 922–942.
[11] I. M. Johnstone, Wavelet shrinkage for correlated data and inverse problems: Adaptivity results, Statist. Sinica, 9 (1999), pp. 51–83.
[12] O. V. Lepskii, On a problem of adaptive estimation in Gaussian white noise, Theory Probab. Appl., 35 (1990), pp. 454–466.
[13] B. A. Mair and F. H. Ruymgaart, Statistical inverse estimation in Hilbert scales, SIAM J. Appl. Math., 56 (1996), pp. 1424–1444.
[14] P. Mathé and S. V. Pereverzev, Optimal Discretization and Degrees of Ill-Posedness for Inverse Estimation in Hilbert Scales in the Presence of Random Noise, Preprint 469, Weierstrass Institute for Applied Analysis and Stochastics, Berlin, 1999. Available online at http://www.wias-berlin.de/publications/preprints/469/
[15] F. Natterer, Error bounds for Tikhonov regularization in Hilbert scales, Appl. Anal., 18 (1984), pp. 29–37.
[16] M. Pensky and B. Vidakovic, Adaptive wavelet estimator for nonparametric density deconvolution, Ann. Statist., 27 (1999), pp. 2033–2053.
[17] V. N. Sudakov and L. A. Khalfin, Statistical approach to ill-posed problems in mathematical physics, Soviet Math. Dokl., 157 (1964), pp. 1094–1096.
[18] A. B. Tsybakov, Pointwise and sup-norm sharp adaptive estimation of functions on the Sobolev classes, Ann. Statist., 26 (1998), pp. 2420–2469.
[19] A. B. Tsybakov, On the best rate of adaptive estimation in some inverse problems, C. R. Acad. Sci. Paris Sér. I Math., 330 (2000), pp. 835–840.