Wavelet deconvolution with noisy eigen-values∗

Laurent Cavalier† and Marc Raimondo‡
Université Aix-Marseille 1 and University of Sydney

Revised April 2006. To appear in IEEE Trans. Signal Process.
Abstract

Over the last decade there has been a lot of interest in wavelet-vaguelette methods for the recovery of noisy signals or images subject to motion blur. Non-linear wavelet estimators are known to have good adaptive properties and to outperform linear approximations over a wide range of signals and images, see e.g. the recent WaveD method of Johnstone, Kerkyacharian, Picard and Raimondo (2004) [JKPR] or the ForWaRD method of Neelamani, Choi and Baraniuk (2004), and also Fan and Koo (2002) in the density setting. In the de-blurring setting, wavelet-vaguelette methods rely on complete knowledge of the convolution operator's eigen-values. This is an unlikely situation in practice, however. A more realistic scenario, such as would arise when passing the Fourier basis as an input signal through a Linear-Time-Invariant system, is to imagine that one also observes a set of noisy eigen-values. In this paper we define a version of the WaveD estimator which is near-optimal when used with noisy eigen-values. A key feature of our method is a data-driven choice of the fine resolution level in WaveD estimation. The asymptotic theory is illustrated with a wide range of finite sample examples.
1 Introduction
We observe the stochastic process

    Y(t) = g ∗ f(t) + εξ(t),   t ∈ T = [0, 1],   (1)
where g is a convolution kernel, f is an unknown function, ξ is a white noise and 0 < ε < 1 is the noise level. Both f and g are supposed to be periodic on T and g ∗ f(t) denotes circular convolution. The model (1) illustrates the action of a Linear-Time-Invariant (LTI) system H on an input signal f when the data are corrupted with additional noise. This is an important model for the problem of recovery of noisy signals (or images) in motion blur, see Bertero and Boccacci (1998); Neelamani et al. (2004).

∗ Key Words and Phrases: Adaptive estimation, deconvolution, wavelet, vaguelette, SVD, WaveD. Short Title: Wavelet deconvolution with noisy eigen-values. AMS 2000 Subject Classification: primary 62G05; secondary 62G08.
† Université Aix-Marseille 1, CMI, 39 rue Joliot-Curie, 13453 Marseille cedex 13, France. [email protected]
‡ School of Mathematics and Statistics, University of Sydney, NSW 2006, Australia. [email protected]

Using the Singular Value Decomposition (SVD) of the convolution operator, it is customary to write the model (1) in the Fourier domain:

    y_ℓ = g_ℓ f_ℓ + ε ξ_ℓ,   ℓ ∈ Z,   (2)

where e_ℓ(t) = e^{2πiℓt} and f_ℓ = ⟨f, e_ℓ⟩, g_ℓ = ⟨g, e_ℓ⟩ with ⟨f, g⟩ = ∫_T f ḡ, and the ξ_ℓ = ⟨ξ, e_ℓ⟩ are i.i.d. standard (complex-valued) normal random variables. Our aim in this paper is to recover the input function f from the noisy observations (1). This is the so-called deconvolution problem. Most of the existing deconvolution methods (see Section 1.2) require full knowledge of the convolution kernel g (also known as the impulse response function). In practice, however, the kernel g is generally not known in advance and one has to test the LTI system to estimate g before estimating f. A realistic scenario is to imagine that it is possible to choose input signals f to be sent through the LTI system to gather information about the impulse response function g. For example, if we pass the Fourier basis f = (e_ℓ)_ℓ through (1), equation (2) becomes:

    x_ℓ = g_ℓ + ε z_ℓ,   ℓ ∈ Z,   (3)

where the z_ℓ are i.i.d. standard (complex-valued) Gaussian random variables independent of the ξ_ℓ, and 0 < ε < 1 is the noise level. In this paper we propose a wavelet estimator of f based on the two sets of noisy Fourier coefficients (2) and (3). In the finite sample implementation of the model (1) at points t_i = i/n, i = 1, ..., n, it is customary to define the noise level as

    ε = σ/√n,   (A_ε)

where σ is the noise standard deviation and n is the sample size. We consider ordinary smooth convolution, where the Fourier coefficients of g decay in a polynomial fashion:

    |g_ℓ| ∼ C|ℓ|^{−ν}.   (C_ν)
Over a wide range of function classes our proposal can recover the unknown function f in a near-optimal fashion. For smooth functions (dense phase) we derive an accuracy of order

    \left( \frac{\log n}{n} \right)^{\alpha}, \quad \text{where } \alpha = \frac{2s}{2(s+\nu)+1},   (4)
performance being measured in Mean Integrated Square Error (MISE). Here n denotes the usual sample size and s plays the role of a smoothness index for our target function f (taken in a large class which includes spatially inhomogeneous functions). Our result confirms that a phase transition occurs for more irregular functions (sparse phase), as in the case of fully known eigen-values [JKPR]. This is further detailed in Section 3. Illustrations of the models (1) and (3) are given in Figures 3 and 4, respectively, using the four test functions depicted in Figure 1. Figure 2 depicts the noise-free version of the data in Figure 3. Most figures and tables presented in this paper can be reproduced using the NEWaveD1.0 software package available at http://www.maths.usyd.edu.au/u/marcr/.
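To fix ideas, here is a minimal numerical sketch of how data from the models (2) and (3) can be simulated under assumptions (A_ε) and (C_ν). The toy signal, the kernel normalisation and all variable names are illustrative assumptions on our part, not part of the NEWaveD package.

```python
import numpy as np

rng = np.random.default_rng(0)

n = 2048                      # sample size
sigma = 0.05                  # noise standard deviation
eps = sigma / np.sqrt(n)      # noise level under (A_eps)
nu, C = 0.5, 0.25             # degree of ill-posedness and constant in (C_nu)

t = np.arange(n) / n
f = np.cos(2 * np.pi * t) + 0.5 * np.sin(6 * np.pi * t)   # toy periodic signal

# Fourier coefficients f_ell = <f, e_ell>; fft(f)/n approximates the
# integral inner product on T = [0, 1].
ell = np.fft.fftfreq(n, d=1.0 / n)     # frequencies 0, 1, ..., -1 (fft order)
f_coef = np.fft.fft(f) / n

# Ordinary smooth kernel with |g_ell| ~ C |ell|^(-nu), as in (C_nu)
g_coef = np.where(ell == 0, 1.0, C * np.maximum(np.abs(ell), 1.0) ** (-nu))

# Model (2): noisy Fourier data y_ell = g_ell f_ell + eps xi_ell
xi = (rng.standard_normal(n) + 1j * rng.standard_normal(n)) / np.sqrt(2)
y_coef = g_coef * f_coef + eps * xi

# Model (3): noisy eigen-values x_ell = g_ell + eps z_ell, independent of xi
z = (rng.standard_normal(n) + 1j * rng.standard_normal(n)) / np.sqrt(2)
x_coef = g_coef + eps * z
```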
1.1 A practical viewpoint
To our mind, the framework of noisy eigen-values in deconvolution is quite practical. Indeed, if one considers a periodic convolution model, then the SVD basis is known to be the Fourier basis. There exist many applications of deconvolution, such as LIDAR (Light Detection And Ranging) remote sensing, Je Park et al. (1997), or the well-known blurred images of the Hubble satellite, Engl et al. (1996). In many cases the convolution kernel is supposed to be known whereas in fact it has been studied and tested with known functions or phantom images. In other words, the knowledge of the kernel is based on an estimation of the true kernel. Hence, from the practical point of view, noisy eigen-values are the standard set-up rather than a special case.

About the noise level in our simulations. Consider the problem of tomography, which is not exactly in our framework but can be used as a benchmark, with the optimal rate of order n^{−2s/(2s+3)}, Johnstone and Silverman (1990); Cavalier (2001). Under our assumption (C_ν) this rate corresponds to ν = 1 (0.5 for inverting the Radon transform plus 0.5 for the curse of dimension). In tomography the number of observations may be of the order of millions, Johnstone and Silverman (1990); this corresponds more or less to our assumptions (C_ν), (A_ε) with ν = 1, n = 2048 and σ = 0.03 as used in Section 4.

Figure 1: Four (inhomogeneous) signals t ↦ f(t), t_i = i/n, i = 1, ..., n = 2048. (a) LIDAR; (b) Bumps; (c) Cusp; (d) Doppler.

Figure 2: Signals of Figure 1 after blurring under (C_ν) with DIP = ν = 0.5 and C = 0.25.
Figure 3: Blurred signals of Figure 2 plus noise with s.d. σ = 0.05; in each case the BSNR defined at (20) is approximately 15 dB.
Figure 4: ℓ ↦ log |x_ℓ|, where x_ℓ are the eigen-values (3) defined under (C_ν) with ν = 0.5 and C = 0.25. Left: noise level ε = 0 (noise free); right: noise level ε = σ/√n = 0.05/√2048 = 1.11 × 10^{−3}.
1.2 Related works
Over the last decade many wavelet methods have been developed to recover f from indirect observations: Donoho (1995); Abramovich and Silverman (1998); Pensky and Vidakovic (1999); Walter and Shen (1999); Johnstone (1999); Cavalier and Koo (2002); Fan and Koo (2002); Kalifa and Mallat (2003). See also Hall, Ruymgaart, van Gaans and van Rooij (2001); Neelamani, Choi and Baraniuk (2004) and the WaveD method of Johnstone, Kerkyacharian, Picard and Raimondo (2004) ([JKPR] in the sequel) on which our paper is based. The mathematical idea behind the 1D WaveD algorithm is described in Donoho and Raimondo (2004) and extended to the 2D setting in Donoho and Raimondo (2005). More generally, the study of stochastic inverse problems is a growing field of research, see e.g. O'Sullivan (1986); Mair and Ruymgaart (1996); van Rooij and Ruymgaart (1996); Mathé and Pereverzev (2001); Cavalier et al. (2002); Cavalier and Tsybakov (2002); Johnstone and Raimondo (2004).
In most of the statistical literature the operator is supposed to be known. This assumption is very restrictive, as in many cases the operator (and its set of eigen-values) is completely unknown (or at least not exactly known). This is an important point, as the operator's eigen-values have a very strong influence on the rate of convergence and on the tuning parameters of statistical estimators. One main statistical issue is then: what happens if the operator is not exactly known? What happens if we have a noisy operator? This problem has been addressed in several papers: Neumann (1997); Efromovich and Koltchinskii (2001); Cavalier and Hengartner (2005); Hoffmann and Reiss (2004); Marteau (2005). In the convolution setting (2) it is related to the notion of blind deconvolution, see e.g. Mathis and Douglas (2003); Pruessner and O'Leary (2003). In our paper, the model (2) is similar to that of Cavalier and Hengartner (2005) with noisy singular values (3). In contrast, our method is entirely different, as it is based on non-linear wavelet estimation whereas Cavalier and Hengartner (2005) used unbiased risk estimation. Our asymptotic result (Section 3.3) agrees with the optimal theory of Efromovich and Koltchinskii (2001) and, more recently, of Hoffmann and Reiss (2004) for general noisy operators. These last two papers use the Galerkin projection method for a general operator with noise not only on the eigen-values but on all the components. On the other hand, our method is based on the noisy SVD approach (3), as in Cavalier and Hengartner (2005). This offers a significant computational advantage: in the implementation of our method we use the fast algorithm of Donoho and Raimondo (2004), which requires only O(n(log n)²) steps and extends naturally to images, Donoho and Raimondo (2005). Section 4 shows that our estimator has very good numerical properties. While less general than the Galerkin projection methods cited previously, our approach seems better suited for applications.
1.3 Paper organisation
We begin in Section 2 with preliminaries on WaveD estimation. The method and main results are presented in Section 3. Numerical performances are reported in Section 4. Proofs are summarised in Section 5.
2 Preliminaries

2.1 The WaveD method
The WaveD estimator [JKPR] is based on hard thresholding of a wavelet expansion as follows (notice that here and in the sequel κ will denote the multiple index (j, k)):

    \hat f = \sum_{\kappa \in \Lambda_n} \hat\beta_\kappa \Psi_\kappa \, I\{|\hat\beta_\kappa| \ge \lambda_j\},   (5)

where λ_j is a threshold and Φ, Ψ denote the (periodised) Meyer scaling and wavelet functions, with the convention that Ψ_{−1} = Φ, see e.g. Meyer (1990), Mallat (1998). The WaveD paradigm stipulates that deconvolution and wavelet transform can be performed simultaneously by computing the wavelet coefficients in the Fourier domain:

    \hat\beta_\kappa = \sum_{\ell \in C_j} \frac{y_\ell}{g_\ell} \, \bar\Psi_\ell^\kappa,   (6)
where Ψ_ℓ^κ = ⟨Ψ_κ, e_ℓ⟩ and C_j = {ℓ : Ψ_ℓ^κ ≠ 0} ⊂ (2π/3) · [−2^{j+2}, −2^j] ∪ [2^j, 2^{j+2}]. Both the wavelet transform (6) and its inverse (5) can be computed in O(n(log n)²) steps using the algorithm of Donoho and Raimondo (2004). The tuning parameters of WaveD are:

• The range of resolution levels (frequencies) where the approximation (5) is used: Λ_n = {(j, k), −1 ≤ j ≤ j_1, 0 ≤ k ≤ 2^j}; here j_1 is the finest resolution level of WaveD. Under assumption (A_ε) with σ = 1, [JKPR] have shown that the optimal choice of j_1 for the case of known g satisfies

    2^{j_1} ≍ (n/\log n)^{1/(1+2\nu)}.   (7)

• The threshold value λ_j has four input parameters:

    \lambda_j = \hat\sigma \, \gamma \, \sigma_j \, c_n,   (8)

  – σ̂: an estimate of the noise standard deviation σ. If y_{J,k} = ⟨Y, Ψ_{J,k}⟩ denote the finest-scale wavelet coefficients of the observed data, then σ̂ = m.a.d.{y_{J,k}}/0.6745, where m.a.d. is the median absolute deviation.

  – γ: a constant which depends on the tail of the noise distribution. For Gaussian noise, the range √2 ≤ γ ≤ √6 gives good results in practice. In software, the default value is √6. Theoretical properties of γ are given in [JKPR].

  – σ_j: a level-dependent scaling factor which depends on the convolution kernel. For a known convolution kernel, [JKPR] have shown that

    \sigma_j := \tau_j = \Big( |C_j|^{-1} \sum_{\ell \in C_j} |g_\ell|^{-2} \Big)^{1/2}   (9)

  yields near-optimal asymptotic properties for the WaveD estimator.

  – c_n: a sample-size-dependent scaling factor reminiscent of the Universal threshold,

    c_n = \Big( \frac{\log n}{n} \Big)^{1/2}.
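As an illustration, the sketch below assembles the threshold (8) from its four inputs, with σ̂ computed from finest-scale coefficients and σ_j = τ_j as in (9). It is a simplified sketch: the frequency band used for C_j ignores the 2π/3 scaling and the negative frequencies (by symmetry of |g_ℓ| they contribute the same average), and the function and argument names are ours, not the NEWaveD API.

```python
import numpy as np

def waved_threshold(g_coef, y_finest, j, n, gamma=np.sqrt(6)):
    """Level-j WaveD threshold lambda_j = sigma_hat * gamma * tau_j * c_n, cf. (8).

    g_coef   : kernel Fourier coefficients, indexed by frequency ell >= 0
               (assumes 2**(j+2) < len(g_coef))
    y_finest : finest-scale wavelet coefficients of the observed data,
               used only for the MAD estimate of the noise s.d.
    """
    # sigma_hat: median absolute deviation of finest-scale coefficients / 0.6745
    sigma_hat = np.median(np.abs(y_finest - np.median(y_finest))) / 0.6745

    # Frequencies where level-j Meyer wavelets live (positive half of C_j)
    ells = np.arange(2 ** j, 2 ** (j + 2) + 1)

    # tau_j = (|C_j|^{-1} sum |g_ell|^{-2})^{1/2}, cf. (9)
    tau_j = np.sqrt(np.mean(np.abs(g_coef[ells]) ** (-2.0)))

    c_n = np.sqrt(np.log(n) / n)          # universal-type scaling factor
    return sigma_hat * gamma * tau_j * c_n
```

Under (C_ν) the factor τ_j grows like 2^{jν}, so the threshold automatically penalises high resolution levels more severely as the problem becomes more ill-posed.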
2.2 Besov spaces of periodic functions
The standard definition of Besov spaces of periodic functions B_{p,r}^s(T) may be found in [JKPR], Section 2.4. Here we shall only recall that Besov spaces are characterised by the behaviour of the wavelet coefficients (as soon as the wavelet is periodic and has enough smoothness and vanishing moments).
Definition 1. For f ∈ L^q(T), q ≥ 1,

    f = \sum_{j,k} \beta_{j,k} \Psi_{j,k} \in B_{p,r}^s(T) \iff \sum_{j \ge 0} 2^{j(s+1/2-1/p)r} \Big[ \sum_{0 \le k \le 2^j} |\beta_{j,k}|^p \Big]^{r/p} < \infty.   (10)
The index s indicates the degree of smoothness of the function. Due to the differential averaging effects of the summation parameters p and r, the Besov spaces capture a variety of smoothness features in a function including spatially inhomogeneous behaviour, see Donoho et al. (1995).
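The sequence-space characterisation (10) is easy to evaluate numerically; the following sketch (with a finite truncation of the sum over j, and our own function name) computes the Besov seminorm of a coefficient array.

```python
import numpy as np

def besov_seminorm(beta_by_level, s, p, r):
    """Finite-level truncation of the Besov sequence (semi)norm in (10).

    beta_by_level : list where beta_by_level[j] holds the 2**j wavelet
                    coefficients beta_{j,k} at resolution level j >= 0
    """
    total = 0.0
    for j, beta_j in enumerate(beta_by_level):
        lp = np.sum(np.abs(np.asarray(beta_j)) ** p) ** (1.0 / p)  # inner l^p norm
        total += (2.0 ** (j * (s + 0.5 - 1.0 / p)) * lp) ** r      # weighted l^r term
    return total ** (1.0 / r)
```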
3 WaveD estimation with noisy eigen-values

3.1 The method
In the case where g is not fully known but one observes the noisy eigen-values (3), we apply the WaveD paradigm (6), replacing the g_ℓ's with the x_ℓ's, i.e. we set

    \tilde\beta_\kappa = \sum_{\ell \in C_j} \frac{y_\ell}{x_\ell} \, \bar\Psi_\ell^\kappa   (11)

as our estimated wavelet coefficients, and we define the WaveD threshold parameter as

    \hat\lambda_j = \hat\sigma \, \gamma \, \hat\sigma_j \, c_n,   (12)

where σ̂, γ and c_n are defined as in Section 2.1 and

    \hat\sigma_j := \hat\tau_j = \Big( |C_j|^{-1} \sum_{\ell \in C_j} |x_\ell|^{-2} \Big)^{1/2}.   (13)

Our main result (Section 3.3) states that the WaveD estimator (5) fitted with coefficients (11) and threshold (12) achieves the near-optimal rate (4), provided that we stop the wavelet expansion early enough. To define the maximum resolution level, let

    M = \min\big\{ \ell,\ \ell \ge 0 : |x_\ell| \le \ell^{1/2} \varepsilon \, (\log 1/\varepsilon^2) \big\}   (14)

denote the maximum Fourier frequency that we permit in the WaveD formula (6). Then we define the maximum wavelet resolution level as

    \hat j_1 = \lfloor \log_2(M) \rfloor - 1,   (15)
where ⌊x⌋ is the largest integer below x. This process is illustrated in Figure 5.

Figure 5: Adaptive level selection by Fourier thresholding. Plain noisy curve: ℓ ↦ log |x_ℓ|, where x_ℓ are the noisy eigen-values (3) of Figure 4 (right plot). Dashed smooth curve: ℓ ↦ log(|ℓ|^{1/2} ε (log 1/ε²)). Here the stopping time (14) is M = 92, which gives an estimated maximum resolution level ĵ_1 = ⌊log₂(92)⌋ − 1 = 5.
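The level selection rule (14)-(15) amounts to a single pass over the noisy eigen-values: stop at the first frequency where |x_ℓ| falls below the threshold curve of Figure 5. A minimal sketch follows (the function name and the fallback for the no-crossing case are our own choices):

```python
import numpy as np

def adaptive_fine_level(x_coef, eps):
    """Data-driven fine level: stopping time M of (14) and j1_hat of (15).

    x_coef : noisy eigen-values x_ell, ell = 0, 1, 2, ..., as in model (3)
    eps    : noise level
    """
    M = len(x_coef) - 1                       # fallback: keep all frequencies
    for ell in range(1, len(x_coef)):
        if np.abs(x_coef[ell]) <= np.sqrt(ell) * eps * np.log(1.0 / eps ** 2):
            M = ell                           # first threshold crossing, (14)
            break
    j1_hat = int(np.floor(np.log2(M))) - 1    # maximum wavelet level, (15)
    return M, j1_hat
```

For the data of Figure 5 this returns M = 92 and ĵ_1 = ⌊log₂ 92⌋ − 1 = 5.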
3.2 Adaptive level selection
In wavelet deconvolution of noisy signals the choice of the fine resolution level is a key tuning parameter. The asymptotic theory of [JKPR] shows that the more ill-posed the problem (the larger ν), the sooner the wavelet expansion must stop (see (7)). This contrasts with the case of direct estimation (ν = 0), where all resolution levels may be kept, Donoho et al. (1995). Hence, in the WaveD tuning of [JKPR], the determination of j₁ is an important practical issue since the definition (7) relies on the 'unknown' parameter ν. From a practical viewpoint our definition (15) is much more relevant since it is fully data-driven and depends only on observable eigen-values (with or without noise). Thus, even in the case of non-noisy eigen-values, this method for choosing the fine resolution level is more precise than the asymptotic condition (7).
3.3 Asymptotic properties
Theorem 1. Suppose that we observe the noisy Fourier coefficients (2), (3) with noise level ε = 1/√n under Assumption (C_ν), and that f belongs to B_{p,p}^s(T) with p ≥ 1, s ≥ 1/p. Then, for γ ≥ 4√(8π) + 1, the WaveD estimator (5) fitted with coefficients (11), threshold (12) with σ̂ = 1 and maximum level (15) is such that, under assumption (A_ε) with σ = 1, c_n = (log(n)/n)^{1/2} and n → ∞,

    E\|\hat f - f\|_2^2 = O\big( c_n^{2\alpha} \big) = O\big( (n^{-1}\log(n))^{\alpha} \big),   (16)

where

    \alpha = \alpha_1 := \frac{2s}{2(s+\nu)+1}, \quad \text{if } s > (2\nu+1)\Big(\frac{1}{p}-\frac{1}{2}\Big),   (17)

    \alpha = \alpha_2 := \frac{2(s-\frac{1}{p}+\frac{1}{2})}{2(s+\nu-\frac{1}{p})+1}, \quad \text{if } \frac{1}{p} \le s < (2\nu+1)\Big(\frac{1}{p}-\frac{1}{2}\Big),   (18)

and

    E\|\hat f - f\|_2^2 = O\big( c_n^{2\alpha_1} \log(n) \big), \quad \text{if } s = (2\nu+1)\Big(\frac{1}{p}-\frac{1}{2}\Big).   (19)
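The dense/sparse dichotomy of Theorem 1 is summarised by the following small helper, a sketch we add for illustration (the boundary case carries the extra log factor of (19)):

```python
def waved_rate_exponent(s, p, nu):
    """MISE rate exponent alpha in Theorem 1, E||f_hat - f||^2 = O((log n / n)^alpha),
    up to the extra log factor on the boundary s = (2 nu + 1)(1/p - 1/2)."""
    boundary = (2 * nu + 1) * (1.0 / p - 0.5)
    if s > boundary:                      # dense phase, (17)
        return 2 * s / (2 * (s + nu) + 1)
    # sparse phase, (18); on the boundary the two exponents coincide
    return 2 * (s - 1.0 / p + 0.5) / (2 * (s + nu - 1.0 / p) + 1)

# Example: s = 1, p = 2, nu = 0.5 is in the dense phase (boundary = 0) and
# gives alpha = 2/4 = 0.5, i.e. a (log n / n)^{1/2} MISE rate.
```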
Remark 1: Linear versus non-linear. It is known that for Besov spaces B_{p,r}^s(T) where p ≥ 2, the rate result (17) can be achieved by linear estimators for known eigen-values, see e.g. Fan and Koo (2002) in the density setting. For noisy eigen-values and Sobolev-type constraints (B_{2,2}^s(T)), Cavalier and Hengartner (2005) have shown that SVD-projection (linear) estimators can achieve the rate result (17). On the other hand, it is known that for Besov spaces B_{p,r}^s(T) where p < 2, non-linear wavelet estimators outperform linear estimators, see e.g. the discussion in Mallat (1998), p. 395, and Fan and Koo (2002). Important classes of signals and images for which non-linear wavelet estimators outperform linear estimators include the Besov spaces B_{1,1}^1(T) and B_{1,∞}^1(T), with B_{1,1}^1(T) ⊂ BV(T) ⊂ B_{1,∞}^1(T), where BV(T) denotes the class of bounded-variation functions. We note that in this case p = 1 and that when 1 ≤ p < 2 there is an elbow in the convergence rate; this is further discussed below.
Remark 2: Near-optimality. The presence of an elbow in the convergence rate in the deconvolution setting was already noticed in [JKPR] for the case of fully known eigen-values, and the two phases (17) and (18) are usually referred to as the dense and sparse phases, respectively. We note that in the case of noisy eigen-values the phase transition occurs along the same smoothness region, s = (2ν + 1)(1/p − 1/2), as for known eigen-values. The rates (exponents) in the dense phase (17) and in the sparse phase (18) are the same as in WaveD estimation with fully known eigen-values [JKPR]. This shows that there is no (high) price to pay for using noisy eigen-values in WaveD estimation. This is confirmed in our simulation study of Section 4. These results are consistent with the existing theory of Efromovich and Koltchinskii (2001); Cavalier and Hengartner (2005); Hoffmann and Reiss (2004). The lower bound derived in Hoffmann and Reiss (2004) shows that our estimator is near-optimal. There is an extra log penalty on the boundary phase (19), as in Härdle, Kerkyacharian, Picard and Tsybakov (1998) (Chap. 10).

Remark 3: General Besov spaces B_{p,r}^s(T) and various L^q-metrics. Using Proposition 3, p. 568, in [JKPR] (Besov embeddings) we see that the rate results (17), (18) and (19) also hold for general Besov spaces B_{p,r}^s(T) where we do not have p = r. Using arguments similar to those of [JKPR] it is possible to extend Theorem 1 to L^q-loss functions, q > 1, deriving the same rates as in [JKPR] with an extra log term on the boundary region.
Remark 4: Different noise levels. Our results are presented with the same noise level ε in the data (2) and in the eigen-values (3). First, it is natural to think that if noisy eigen-values are obtained by passing the Fourier basis as impulse functions then the noise levels should be the same in both cases. Secondly, it is possible to extend the proof of Theorem 1 to the case where the noise levels are different. In this case the results agree with Efromovich and Koltchinskii (2001) or Hoffmann and Reiss (2004), where the rate of convergence corresponds to the larger of the two noise levels.

Figure 6: Four WaveD estimates based on the noisy observations of Figures 3 and 4: (a) LIDAR, (b) Bumps, (c) Cusp, (d) Doppler. Here the adaptive level selection (15) is ĵ_1 = 5, as in Figure 5.
4 Numerical performances

4.1 Implementation
The implementation of the WaveD transforms is described in detail in Donoho and Raimondo (2004) and Donoho and Raimondo (2005). The WaveD estimator as prescribed in Section 3.1 is available from the Noisy Eigen-values WaveD (NEWaveD1.0) software package available at http://www.maths.usyd.edu.au/u/marcr/.
4.2 Model and simulations
We study the effect of noisy eigen-values on finite sample WaveD estimation for different Blurred-Signal-to-Noise ratios (20) within the range 10-25 dB and different degrees of ill-posedness (DIP = ν) within the range 0.1-1. We report the results for ν = 0.5 in Table 1 and for ν = 1 in Table 2. We use four signals borrowed from the wavelet literature and depicted in Figure 1. For comparison purposes we give the results for WaveD estimation from known eigen-values as described in Section 2.1, as well as the results for WaveD estimation from noisy eigen-values as described in Section 3.1. In Table 1 and Table 2 we report the results when the same noise level ε = σ/√n is used in the signal and in the eigen-values. In the NEWaveD1.0 software it is possible to run simulations where the noise level in (3) is smaller than the noise level in (2). For such cases Monte-Carlo results (not reported in the present paper) show that the RMISE lies between the 'Known' and 'Noisy' eigen-values figures given in Table 1 and Table 2.

Table 1: (DIP = 0.5) Monte-Carlo approximations to √(E‖f̂ − f‖²₂).

                              σ     BSNR(dB)  Lidar   Bumps   Cusp    Doppler
Known eigen-values            0.1   10        0.1483  0.8710  0.045   0.2520
Noisy eigen-values            0.1   10        0.1929  0.9431  0.1559  0.3320
Relative increase in RMISE    0.1   10        0.3     0.08    2.2     0.3
Known eigen-values            0.05  15        0.0916  0.7578  0.0326  0.1725
Noisy eigen-values            0.05  15        0.1134  0.7880  0.0751  0.2040
Relative increase in RMISE    0.05  15        0.24    0.04    1.3     0.20
Known eigen-values            0.03  20        0.0649  0.6067  0.0266  0.1284
Noisy eigen-values            0.03  20        0.0768  0.6338  0.0471  0.1446
Relative increase in RMISE    0.03  20        0.18    0.05    0.77    0.13

Table 2: (DIP = 1) Monte-Carlo approximations to √(E‖f̂ − f‖²₂).

                              σ     BSNR(dB)  Lidar   Bumps   Cusp    Doppler
Known eigen-values            0.05  10        0.1336  0.9535  0.0326  0.2786
Noisy eigen-values            0.05  10        0.1627  1.0620  0.0864  0.3597
Relative increase in RMISE    0.05  10        0.21    0.11    1.6     0.30
Known eigen-values            0.03  15        0.1190  0.9152  0.0359  0.2462
Noisy eigen-values            0.03  15        0.1314  0.9663  0.0549  0.2870
Relative increase in RMISE    0.03  15        0.10    0.06    0.53    0.17
Known eigen-values            0.01  24        0.0888  0.8385  0.0155  0.1852
Noisy eigen-values            0.01  24        0.0921  0.8502  0.0197  0.1923
Relative increase in RMISE    0.01  24        0.04    0.01    0.27    0.04
The results presented in Table 1 and Table 2 are based on 1000 independent simulations of the models (2) and (3) under assumption (A_ε) with ε = σ/√n, n = 2048. In Table 1, we simulate data (2), (3) under assumption (C_ν) with ν = 0.5 and C = 0.25. For each signal of Table 1, we use three different noise levels, taking σ = 0.1, 0.05 and 0.03. An illustration of the models (1) and (3) in the medium noise setting (σ = 0.05) is depicted in Figure 3 and Figure 5, respectively. The corresponding WaveD estimates are depicted in Figure 6. For the simulations of Table 1, each signal of Figure 1 has been scaled so that the Blurred-Signal-to-Noise Ratios (in dB) are approximately 10, 15 and 20, where

    BSNR_{dB} = 10 \log_{10} \frac{\|f * g\|^2}{\sigma^2}.   (20)
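For reference, the BSNR of (20) can be computed from the sampled signal and the kernel coefficients as in the sketch below, where we take ‖·‖ to be the L²([0, 1]) norm approximated by the empirical mean of squares (an assumption on our part; function names are illustrative).

```python
import numpy as np

def bsnr_db(f, g_coef, sigma):
    """Blurred-Signal-to-Noise-Ratio (20): 10 log10(||g * f||^2 / sigma^2).

    f      : signal sampled on the grid t_i = i/n
    g_coef : Fourier coefficients of g on the same grid, in fft order
    sigma  : noise standard deviation
    """
    blurred = np.real(np.fft.ifft(np.fft.fft(f) * g_coef))  # circular g * f
    energy = np.mean(blurred ** 2)        # approximates the squared L2 norm
    return 10.0 * np.log10(energy / sigma ** 2)
```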
In Table 2, we simulate data (2), (3) under assumption (C_ν) with ν = 1 and C = 0.25. For each signal of Table 2, we use three different noise levels, taking σ = 0.05, 0.03 and 0.01, so that the Blurred-Signal-to-Noise Ratios (in dB) are approximately 10, 15 and 24.
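The RMISE columns of Tables 1 and 2 are Monte-Carlo averages over independent replications; a generic sketch of the loop (with a hypothetical estimator callable standing in for WaveD with known or noisy eigen-values) is:

```python
import numpy as np

def rmise(estimator, f, n_rep=1000, seed=0):
    """Monte-Carlo approximation to sqrt(E ||f_hat - f||_2^2).

    estimator : callable taking a Generator, drawing fresh data internally
                and returning an estimate f_hat on the same grid as f
    """
    rng = np.random.default_rng(seed)
    errs = [np.mean((estimator(rng) - f) ** 2) for _ in range(n_rep)]
    return np.sqrt(np.mean(errs))
```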
4.3 Results and discussion
The general pattern in Table 1 and Table 2 confirms that the relative increase in RMISE is well under control. A closer look at Table 1 and Table 2 shows that the presence of noise in the eigen-values slightly increases the RMISE. However, even in high noise (low BSNR) the relative increase in RMISE is less than 30%, except for the Cusp where we observe a stronger effect. For BSNR = 20 dB or larger we observe a much smaller effect. This shows that in WaveD estimation one can replace the eigen-values by their noisy version without losing too much estimation precision. Comparing Table 1 and Table 2 for a given noise level, we see that a larger DIP worsens the RMISE, as expected from the theory. On the other hand we note that, for a fixed noise level, increasing ν does not affect the relative increase in RMISE. These results show that WaveD retains its good adaptive properties over a wide range of BSNR and DIP when used with noisy eigen-values.

Acknowledgement. Both authors are grateful for the comments of an Associate Editor and the referees, which have improved the original version significantly. This paper was written when Laurent Cavalier was visiting the University of Sydney, funded by the ACIOpsyc grant and by the University of Sydney.
5 Proofs
Denote

    M_c = \min\big\{ \ell,\ \ell \ge 0 : |g_\ell| \le \ell^{1/2} \varepsilon (\log 1/\varepsilon^2)^{4/3} \big\},   (21)

    M_d = \min\big\{ \ell,\ \ell \ge 0 : |g_\ell| \le \ell^{1/2} \varepsilon (\log 1/\varepsilon^2)^{2/3} \big\},   (22)

    M_1 = \lfloor 2^{j_1} \rfloor,   (23)

where j_1 is defined in (7). Define also j_d and j_c such that

    M_d = \lfloor 2^{j_d} \rfloor \quad \text{and} \quad M_c = \lfloor 2^{j_c} \rfloor.   (24)

Let

    \Omega = c M_d \exp\big( -(\log(1/\varepsilon))^{1+\tau} \big).   (25)

Lemma 2 proves that, with a very large probability, M_c ≤ M ≤ M_d. There is a constant c in the definition of Ω in (25); we will use Ω as a O(·) and simply take the constant c to be, in the end, the sum of all the constants that appear in the proofs. For this reason Ω + Ω = Ω. Moreover, Ω is a very small term as ε → 0.
Lemma 1. Consider the Taylor expansions

    \frac{1}{x_\ell} = \frac{1}{g_\ell} - \frac{\varepsilon z_\ell}{\xi_{\ell 1}^2},   (26)

    \frac{1}{x_\ell^2} = \frac{1}{g_\ell^2} - 2 \frac{\varepsilon z_\ell}{\xi_{\ell 2}^3},   (27)

with |ξ_{ℓ1}|, |ξ_{ℓ2}| within the interval (|g_ℓ| − |εz_ℓ|, |g_ℓ| + |εz_ℓ|). On the event

    B = \bigcap_{\ell=1}^{M_d} \Big\{ \varepsilon |z_\ell| \le \frac{|g_\ell|}{2} \Big\},

we have

    \frac{1}{|\xi_{\ell 1}|^2} \le \frac{4}{|g_\ell|^2} \quad \text{and} \quad \frac{1}{|\xi_{\ell 2}|^3} \le \frac{8}{|g_\ell|^3}.

Moreover, P(B^c) ≤ Ω. Under B, we have

    \frac{|g_\ell|}{2} \le |x_\ell| \le \frac{3|g_\ell|}{2}, \quad \text{for all } \ell = 1, \ldots, M_d.   (28)
Proof. The two decompositions are obtained using Taylor expansions of f(t) = (g + t)^{−1} and f(t) = (g + t)^{−2} around t = 0. By (22) and (C_ν), |g_ℓ| > c√ℓ ε (log 1/ε)^{2/3} for all ℓ = 1, ..., M_d, where 0 < c < 1. Since z_ℓ is Gaussian,

    P(B^c) \le \sum_{\ell=1}^{M_d} P\Big( \varepsilon |z_\ell| > \frac{|g_\ell|}{2} \Big) \le \sum_{\ell=1}^{M_d} P\Big( |z_\ell| > \frac{c\sqrt{\ell}}{2} (\log 1/\varepsilon)^{2/3} \Big) \le \Omega.
Lemma 2. Define the event

    \mathcal{M} = \{ M_c \le M \le M_d \}.   (29)

We have P(\mathcal{M}^c) ≤ Ω, and M_c ≤ M_d ≤ M_1, as ε → 0.
Proof. First,

    P(\mathcal{M}^c) = P(\{M > M_d\} \cup \{M < M_c\}).   (30)

Now,

    P(M > M_d) \le P\Big( \bigcap_{\ell=1}^{M_d} \big\{ |x_\ell| > \sqrt{\ell}\, \varepsilon (\log 1/\varepsilon^2) \big\} \Big) \le P\big( |x_{M_d}| > \sqrt{M_d}\, \varepsilon (\log 1/\varepsilon^2) \big).

Using (28) and (C_ν) under B, as ε → 0, |x_{M_d}| ≤ (3/2)|g_{M_d}| ≤ C|M_d|^{−ν}. By (22) and (C_ν),

    M_d \asymp \big( \varepsilon (\log 1/\varepsilon^2)^{2/3} \big)^{-\frac{2}{2\nu+1}},   (31)

which shows that, as ε → 0, |M_d|^{−ν−1/2} = o(ε (log 1/ε²)), hence that

    P(M > M_d) \le P(B^c) \le \Omega.   (32)

Next,

    P(M < M_c) \le \sum_{\ell=1}^{M_c - 1} P\big( |x_\ell| \le \sqrt{\ell}\, \varepsilon (\log 1/\varepsilon^2) \big).

Combining (28) and (21) under B, as ε → 0, |x_ℓ| ≥ |g_ℓ|/2 ≥ cM_c^{−ν} for any ℓ = 1, ..., M_c − 1. By definition of M_c in (21) and (C_ν), we have

    M_c \asymp \big( \varepsilon (\log 1/\varepsilon^2)^{4/3} \big)^{-\frac{2}{2\nu+1}},   (33)

which shows that, as ε → 0, ε (log 1/ε²) = o(M_c^{−ν−1/2}), hence that P(M < M_c) ≤ P(B^c) ≤ Ω. Combining this with (32) in (30), we obtain the first inequality in Lemma 2. The second result is a direct consequence of (31) and (33).
Lemma 3. Let B(ε) = β_κ − Eβ̃_κ, where β̃_κ is defined at (11), and let τ_j be defined at (9). Then, for any j ≤ j_d, we have

    (B(\varepsilon))^2 = O(\varepsilon^2 \tau_j^2),   (34)

    Var(\tilde\beta_\kappa) = O(\varepsilon^2 \tau_j^2).   (35)
Proof. In what follows, Σ_ℓ means Σ_{ℓ∈C_j}, where C_j = (2π/3) · [−2^{j+2}, −2^j] ∪ [2^j, 2^{j+2}]; the number of terms in the sum is |C_j| = 4π 2^j. In the sequel c denotes a generic constant which may change from line to line. We denote by E_x(Z) = E(Z|x) the conditional expectation of Z given the x_ℓ's at (3); in the same fashion, E_y(Z) = E(Z|y) denotes the conditional expectation of Z given the y_ℓ's at (2). Since the error terms at (2) and (3) are independent random variables with zero mean: E_x(y_ℓ) = Ey_ℓ = f_ℓ g_ℓ and E_y(x_ℓ) = Ex_ℓ = g_ℓ. Hence

    E\tilde\beta_\kappa = E(E_x \tilde\beta_\kappa) = E\Big( E_x \sum_\ell \frac{y_\ell}{x_\ell} \bar\Psi_\ell^\kappa \Big) = E\Big( \sum_\ell \frac{f_\ell g_\ell}{x_\ell} \bar\Psi_\ell^\kappa \Big);

applying Lemma 1 to the last written term and using Plancherel's identity,

    E\tilde\beta_\kappa = \sum_\ell f_\ell \bar\Psi_\ell^\kappa - \varepsilon E \sum_\ell \frac{f_\ell g_\ell z_\ell}{\xi_{\ell 1}^2} \bar\Psi_\ell^\kappa = \beta_\kappa - \varepsilon E \sum_\ell \frac{f_\ell g_\ell z_\ell}{\xi_{\ell 1}^2} \bar\Psi_\ell^\kappa.

Recalling that for the Meyer wavelet we have |Ψ̄_ℓ^κ| ≤ 2^{−j/2}, (34) follows from Lemma 1:

    (B(\varepsilon))^2 = (\beta_\kappa - E\tilde\beta_\kappa)^2 \le c\, \varepsilon^2 2^{-j} \sum_\ell |f_\ell|^2 |g_\ell|^2 \, E\Big( \frac{|z_\ell|^2}{|\xi_{\ell 1}|^4} I_B \Big) + \Omega \le c\, \varepsilon^2 2^{-j} \sum_\ell \frac{|f_\ell|^2}{|g_\ell|^2} + \Omega = O(\varepsilon^2 \tau_j^2).   (36)

For the variance term, the result follows from Lemma 1:

    Var(\tilde\beta_\kappa) \le Var(\tilde\beta_\kappa I_B) + \Omega \le c\, \varepsilon^2 2^{-j} \sum_\ell \frac{1}{|g_\ell|^2} + \Omega = O(\varepsilon^2 \tau_j^2).   (37)
Lemma 4. Let c_ε = ε (log 1/ε²)^{1/2}. Then, for any j ≤ j_d and any γ > 0,

    P\Big( |\tilde\beta_\kappa - \beta_\kappa| > \frac{\gamma \tau_j c_\varepsilon}{2} \Big) \le C (\log 1/\varepsilon^2)^{-1/2} \, (\varepsilon^2)^{\frac{\gamma^2}{128\pi}}.   (38)

Proof. Define

    B_\kappa := \tilde\beta_\kappa - E_x \tilde\beta_\kappa = \sum_\ell \frac{y_\ell - f_\ell g_\ell}{x_\ell} \bar\Psi_\ell^\kappa = \sum_\ell \frac{\varepsilon \xi_\ell}{x_\ell} \bar\Psi_\ell^\kappa,   (39)

and

    V_\kappa := E_x(\tilde\beta_\kappa) - E E_x(\tilde\beta_\kappa) = \sum_\ell f_\ell g_\ell \Big( \frac{1}{x_\ell} - E\frac{1}{x_\ell} \Big) \bar\Psi_\ell^\kappa.   (40)

Using the same notations as in Lemma 3, we have

    \{ |\tilde\beta_\kappa - \beta_\kappa| > \lambda \} \iff \{ |\tilde\beta_\kappa - E_x\tilde\beta_\kappa + E_x\tilde\beta_\kappa - E\tilde\beta_\kappa + E\tilde\beta_\kappa - \beta_\kappa| > \lambda \} \iff \{ |B_\kappa + V_\kappa + B(\varepsilon)| > \lambda \}.

By triangular inequalities,

    \{ |\tilde\beta_\kappa - \beta_\kappa| > \lambda \} \subseteq \{ |B_\kappa| > \lambda - |V_\kappa| - |B(\varepsilon)| \}.   (41)

For any event A we write P_x(A) = E_x(I_A), hence P(A) = E(P_x(A)). From this and (41),

    P( |\tilde\beta_\kappa - \beta_\kappa| > \lambda ) \le E\big( P_x( |B_\kappa| > \lambda - |V_\kappa| - |B(\varepsilon)| ) \big).

By Lemma 1,

    P( |\tilde\beta_\kappa - \beta_\kappa| > \lambda ) \le E\big( P_x( |B_\kappa| > \lambda - |V_\kappa| - |B(\varepsilon)| ) I_B \big) + \Omega.

On the event B,

    |V_\kappa|^2 \le c\, 2^{-j} \sum_\ell |f_\ell|^2 |g_\ell|^2 \Big| \frac{1}{x_\ell} - E\frac{1}{x_\ell} \Big|^2 \le c\, \varepsilon^2 2^{-j} \sum_\ell \frac{|f_\ell|^2}{|g_\ell|^2} = O(\varepsilon^2 \tau_j^2).   (42)

Using (36) and (42) and taking λ = (γτ_j c_ε)/2, we get

    P( |\tilde\beta_\kappa - \beta_\kappa| > \lambda ) \le E\big( P_x( |B_\kappa| > c_\varepsilon \tau_j (\gamma/2 + o(1)) ) I_B \big) + \Omega.   (43)

From (39) we see that, conditionally on x = (x_ℓ),

    B_\kappa \sim N(0, v^2), \qquad v^2 \le \varepsilon^2 2^{-j} \sum_\ell |x_\ell|^{-2} = \varepsilon^2 \, 4\pi \, |C_j|^{-1} \sum_\ell |x_\ell|^{-2}.   (44)

From this and Lemma 1 we see that, on the event B, v² ≤ 16π ε² τ_j². Let Z ∼ N(0, 1); it follows from (43) that

    P( |\tilde\beta_\kappa - \beta_\kappa| > \lambda ) \le 2 P\Big( Z > (\log(1/\varepsilon^2))^{1/2} \Big( \frac{\gamma}{8\sqrt{\pi}} + o(1) \Big) \Big) \le C (\log(1/\varepsilon^2))^{-1/2} (\varepsilon^2)^{\frac{\gamma^2}{128\pi}}.

Proof of Theorem 1. Denote

    \tilde f_1 = \sum_{\kappa: j \le j_1} \tilde\beta_\kappa \Psi_\kappa I\{ |\tilde\beta_\kappa| \ge \gamma \tau_j c_n \},   (45)

    \tilde f_2 = \sum_{\kappa: j \le j_1} \tilde\beta_\kappa \Psi_\kappa I\{ |\tilde\beta_\kappa| \ge \gamma \hat\tau_j c_n \},   (46)

    \tilde f_3 = \sum_{\kappa: j \le \hat j_1} \tilde\beta_\kappa \Psi_\kappa I\{ |\tilde\beta_\kappa| \ge \gamma \hat\tau_j c_n \},   (47)

where τ_j, τ̂_j are given at (9), (13) respectively, and j_1, ĵ_1 are given at (7), (15) respectively. In all three cases above, β̃_κ is defined by (11) and c_n = (log n/n)^{1/2}, whereas γ is a thresholding constant which is specified below. To prove the theorem, we will establish that, for any f which belongs to a Besov space B_{p,p}^s(T), as n → ∞, for γ ≥ 4√(8π) + 1,

    E\|\tilde f_3 - f\|_2^2 = O(r_n),   (48)

where r_n depends on s as seen in (17), (18) and (19). By the triangular inequality,

    E\|\tilde f_3 - f\|_2^2 \le E\|\tilde f_3 - \tilde f_2\|_2^2 + E\|\tilde f_2 - \tilde f_1\|_2^2 + E\|\tilde f_1 - f\|_2^2 := r_3(n) + r_2(n) + r_1(n).   (49)
Now that Lemma 3 and Lemma 4 have been established, the proof that r_1(n) ≤ r_n follows from the application of the Maxiset Theorem (Kerkyacharian and Picard (2000)) for γ ≥ 4√(8π) + 1, as in [JKPR]. The main part of the proof is to establish that, as n → ∞, r_2(n) = O(c_n^{2α}), where α depends on s as in (17), (18) and (19). Let A_1 = {κ = (j, k) : |β̃_κ| ≥ γτ_j c_n} and A_2 = {κ = (j, k) : |β̃_κ| ≥ γτ̂_j c_n}. By definition (45) and (46),

    E\|\tilde f_2 - \tilde f_1\|_2^2 \le E \sum_{\kappa: j \le j_1} |\tilde\beta_\kappa I_{A_1} - \tilde\beta_\kappa I_{A_2}|^2 = E \sum_{\kappa: j \le j_1} |\tilde\beta_\kappa|^2 I_{A_1 \cap A_2^c} + E \sum_{\kappa: j \le j_1} |\tilde\beta_\kappa|^2 I_{A_1^c \cap A_2}.   (50)

Let D = {κ = (j, k), j ≤ j_1 : |β_κ| < τ_j c_n/2}, and denote by D^c its complement. For obvious reasons, we shall refer to D as the variance zone and to D^c as the bias zone. By symmetry (and Lemma 1) we need only bound the last written term in (50):

    E \sum_{\kappa: j \le j_1} |\tilde\beta_\kappa|^2 I_{A_1^c \cap A_2} = E \sum_{\kappa \in D} |\tilde\beta_\kappa|^2 I_{A_1^c \cap A_2} + E \sum_{\kappa \in D^c} |\tilde\beta_\kappa|^2 I_{A_1^c \cap A_2} := S_D + S_{D^c}.   (51)
Variance zone. First, we deal with the sum over D. Using the definition of A_1^c and noting that under (C_ν): λ_j = γτ_j c_n ≍ γ 2^{jν} c_n,

    S_D \le \sum_{j \le j_1} \sum_{k=0}^{2^j-1} \lambda_j^2 \, P(A_2 \cap D) \le 2^{j_1} \lambda_{j_1}^2 P(A_2 \cap D).   (52)

By definition of A_2, D and using (28), we obtain

    P(A_2 \cap D) \le P( |\tilde\beta_\kappa| \ge \hat\lambda_j,\ |\beta_\kappa| < \tau_j c_n/2 ) \le P( |\tilde\beta_\kappa| \ge \lambda_j/2,\ |\beta_\kappa| < \tau_j c_n/2 ) + \Omega.   (53)

Using λ_j = γτ_j c_n together with the triangular inequality and Lemma 4 yields

    P(A_2 \cap D) \le P\Big( |\tilde\beta_\kappa - \beta_\kappa| \ge \Big( \frac{\gamma}{2} - \frac{1}{2} \Big) \tau_j c_n \Big) + \Omega = O(\varepsilon^2),   (54)

provided that γ ≥ 4√(8π) + 1. Combining the bounds (52), (54) and noting that under (C_ν): τ_{j_1} = c 2^{j_1 ν} = c c_n^{−2ν/(1+2ν)} proves that S_D = O(2^{j_1} λ²_{j_1} ε²) = O(ε²) = o(c_n^{2α}), for α in the dense, sparse or boundary case as defined in (17), (18) or (19).

Bias zone. Consider now the sum over D^c in (51).

i) Case p = r = 2 (Sobolev-like): dense case rate with α_1 given at (17). For all κ in D^c, we have that |β_κ| ≥ τ_j c_n/2. Since wavelet coefficients decay as the resolution level increases, one must have a small enough resolution level j for κ = (j, k) to be in D^c. Introduce the resolution level j_a such that

    2^{-j_a(s+1/2)} = \tau_{j_a} c_n.   (55)

Under (C_ν), τ_j ≍ 2^{jν}, hence

    2^{j_a} \asymp c_n^{-\frac{1}{s+\nu+1/2}} = c_n^{-\frac{2}{2s+2\nu+1}}.   (56)

We split the sum S_{D^c} defined in (51) in two parts:

    S_{D^c} \le E \sum_{(j,k): j \le j_a} |\tilde\beta_\kappa|^2 I_{A_1^c \cap A_2} + E \sum_{(j,k): j_a \le j \le j_1} |\tilde\beta_\kappa|^2 I_{A_1^c \cap A_2} := S_1 + S_2.   (57)
Using the definition of A_1^c,

    S_1 \le E \sum_{(j,k): j \le j_a} |\tilde\beta_\kappa|^2 I_{A_1^c} \le \sum_{j \le j_a} \sum_{k=0}^{2^j-1} \gamma^2 \tau_j^2 c_n^2 \le \sum_{j \le j_a} 2^j \gamma^2 \tau_j^2 c_n^2 = O\big( 2^{j_a} \tau_{j_a}^2 c_n^2 \big),   (58)

which by definition (56) of j_a shows that S_1 = O(c_n^{2α_1}), as to be proven in the dense case where p = r = 2. Consider now the sum S_2 in (57); using |β̃_κ|² ≤ 2|β_κ|² + 2|β̃_κ − β_κ|²,

    S_2 \le 2 \Big( \sum_{\kappa: j_a \le j \le j_1} |\beta_\kappa|^2 + E \sum_{\kappa: j_a \le j \le j_1} |\tilde\beta_\kappa - \beta_\kappa|^2 I_{A_2} \Big).   (59)

Using (28), under B we have τ̂_j ≥ τ_j/2; arguments similar to those used for r_1(n) show that

    E \sum_{\kappa: j_a \le j \le j_1} |\tilde\beta_\kappa - \beta_\kappa|^2 I_{A_2} \le O(c_n^{2\alpha_1}) + O(\Omega).   (60)
As for the first term on the RHS of (59), we use the definition (10) of Besov spaces with p = r = 2 as well as (56):

    \sum_{\kappa: j_a \le j \le j_1} |\beta_\kappa|^2 = \sum_{j: j_a \le j \le j_1} 2^{-2js} \, 2^{2js} \sum_{k=0}^{2^j-1} |\beta_{j,k}|^2 = O\big( 2^{-2j_a s} \big) = O\big( c_n^{2\alpha_1} \big).   (61)
From (58)-(61) we obtain S_{D^c} = O(c_n^{2α}), where α is given in (17), as to be proven in the dense case where p = r = 2.

ii) Case p = r < 2. Here we combine the result obtained in i) with the so-called Sobolev embedding B_{p,p}^s ⊂ B_{2,2}^{s'}, where s' = s − 1/p + 1/2.

• The dense case: introducing the resolution level j_b such that

    2^{j_b} = c_n^{-\frac{2s}{(2s+2\nu+1)(s-1/p+1/2)}},   (62)
and noting that j_a ≤ j_b ≤ j_1, we split the sum S_{D^c} defined at (51) into three parts:

    S_{D^c} \le S_1 + E \sum_{(j,k): j_a \le j \le j_b} |\tilde\beta_\kappa|^2 I_{A_1^c \cap A_2} + E \sum_{(j,k): j_b \le j \le j_1} |\tilde\beta_\kappa|^2 I_{A_1^c \cap A_2} := S_1 + S_3 + S_4,   (63)

where S_1 is defined as in (57). By (58) and (56) we have S_1 = O(c_n^{2α_1}). Next we show that S_3 = O(c_n^{2α_1}). Using |β̃_κ|² ≤ 2|β_κ|² + 2|β̃_κ − β_κ|²,

    S_3 \le 2 \Big( \sum_{\kappa: j_b \le j \le j_1} |\beta_\kappa|^2 + E \sum_{\kappa: j_b \le j \le j_1} |\tilde\beta_\kappa - \beta_\kappa|^2 I_{A_2} \Big).   (64)
As seen in (60), the second term in the RHS of (64) is of order O(c_n^{2α_1}). As for the first term on the RHS of (64), we use (10) with s' = s − 1/p + 1/2 and (62):

    \sum_{\kappa: j_b \le j \le j_1} |\beta_\kappa|^2 = \sum_{j: j_b \le j \le j_1} 2^{-2js'} \, 2^{2js'} \sum_{k=0}^{2^j-1} |\beta_{j,k}|^2 = O\big( 2^{-2j_b s'} \big) = O\big( c_n^{2\alpha_1} \big).   (65)
To deal with the sum S_4 in (63) we use arguments similar to those of Härdle, Kerkyacharian, Picard and Tsybakov (1998) (Chap. 10). Using the definition of A_1^c and D^c,

    S_4 \le \sum_{j_a \le j \le j_b} \sum_{k=0}^{2^j-1} \gamma^2 \tau_j^2 c_n^2 \, I\{|\beta_\kappa| > \tau_j c_n/2\} = \sum_{j_a \le j \le j_b} \gamma^2 \tau_j^2 c_n^2 \, \#\{k : |\beta_{j,k}| > \tau_j c_n/2\},   (66)

where #{k : |β_{j,k}| > c} denotes the number of k's ∈ {0, ..., 2^j − 1} such that |β_{j,k}| > c. Using Markov's inequality and the definition (10) of the Besov space B_{p,p}^s, it follows that

    \#\{k : |\beta_{j,k}| > \tau_j c_n/2\} \le (\tau_j c_n/2)^{-p} \sum_{k=0}^{2^j-1} |\beta_{j,k}|^p \le (\tau_j c_n/2)^{-p} \, 2^{-j(s+1/2-1/p)p}.   (67)

Under (C_ν) we have τ_j = O(2^{jν}); from (66)-(67) it follows that

    S_4 = O\Big( \sum_{j_a \le j \le j_b} (\tau_j c_n)^{2-p} \, 2^{-j(s+1/2-1/p)p} \Big) = O\Big( c_n^{2-p} \sum_{j_a \le j \le j_b} 2^{-jp\,(s-(2\nu+1)(\frac{1}{p}-\frac{1}{2}))} \Big).   (68)
Hence in the dense case, when s > (2ν + 1)(1/p − 1/2), by definition (56) of j_a, (68) becomes

    S_4 = O\Big( c_n^{2-p} \, 2^{-j_a p (s-(2\nu+1)(\frac{1}{p}-\frac{1}{2}))} \Big) = O\Big( c_n^{\frac{4s}{2s+2\nu+1}} \Big) = O\big( c_n^{2\alpha_1} \big).   (69)

• The sparse case: introducing the resolution level j_b such that

    2^{j_b} = c_n^{-\frac{2}{1+2(s+\nu-1/p)}},   (70)
and noting that j_a ≤ j_b ≤ j_1, we write S_{D^c} ≤ S_1 + S_3 + S_4 as in (63). Here too we have S_1 = O(c_n^{2α_1}), which by (18) gives S_1 = o(c_n^{2α_2}). Arguments similar to those used in (64), together with (70), show that

    S_3 = O\big( 2^{-2j_b s'} \big) = O\big( c_n^{2\alpha_2} \big).   (71)

To deal with the sum S_4 we recall that (68) holds in both the dense and sparse scenarios:

    S_4 = O\Big( \sum_{j_a \le j \le j_b} (\tau_j c_n)^{2-p} \, 2^{-j(s+1/2-1/p)p} \Big) = O\Big( c_n^{2-p} \sum_{j_a \le j \le j_b} 2^{-jp\,(s-(2\nu+1)(\frac{1}{p}-\frac{1}{2}))} \Big).   (72)

Hence in the sparse case, when s < (2ν + 1)(1/p − 1/2), by definition (70) of j_b, (72) becomes

    S_4 = O\Big( c_n^{2-p} \, 2^{-j_b p (s-(2\nu+1)(\frac{1}{p}-\frac{1}{2}))} \Big) = O\Big( c_n^{\frac{4(s-1/p+1/2)}{1+2(s+\nu-1/p)}} \Big) = O\big( c_n^{2\alpha_2} \big).   (73)
As for the boundary case, when s = (2ν + 1)(1/p − 1/2), it follows from (68) and the definition (62) of j_b that

    S_4 = O\big( c_n^{2-p} \, j_b \big) = O\big( c_n^{2-p} \log(n) \big) = O\big( c_n^{2\alpha_1} \log(n) \big),   (74)

as had to be proved in this case. Combining the results we derived for S_1, S_3 and S_4 with (63) proves that r_2(n) ≤ r_n.
To complete the proof we need to show that r_3(n) ≤ r_n. Using Lemma 2, on the event M we have j_a ≤ j_b ≤ j_c ≤ ĵ_1 ≤ j_d ≤ j_1, where j_c, j_d are defined in (24); j_b is defined at (62) in the dense (or boundary) scenario and at (70) in the sparse scenario (note that if p = 2, j_a = j_b). We have

    r_3(n) = E\|\tilde f_3 - \tilde f_2\|_2^2 \le E \sum_{\kappa: j_b \le j \le j_1} |\tilde\beta_\kappa|^2 I_{A_2} + \Omega = O(S_3),

where S_3 is defined at (63). Using (64)-(65) in the dense or boundary case, we obtain r_3(n) = O(c_n^{2α_1}). Using (71) in the sparse case, we obtain r_3(n) = O(c_n^{2α_2}).
References

Abramovich, F. and Silverman, B. (1998), 'Wavelet decomposition approaches to statistical inverse problems', Biometrika 85(1), 115–129.
Bertero, M. and Boccacci, P. (1998), Introduction to Inverse Problems in Imaging, Institute of Physics, Bristol and Philadelphia.
Cavalier, L. (2001), 'On the problem of local adaptive estimation in tomography', Bernoulli 7, 63–78.
Cavalier, L., Golubev, G., Picard, D. and Tsybakov, A. (2002), 'Oracle inequalities for inverse problems', Annals of Statistics 30, 843–874.
Cavalier, L. and Hengartner, N. (2005), 'Adaptive estimation for inverse problems with noisy operators', Inverse Problems 21, 1345–1361.
Cavalier, L. and Koo, J.-Y. (2002), 'Poisson intensity estimation for tomographic data using a wavelet shrinkage approach', IEEE Trans. Inform. Theory 48, 2794–2802.
Cavalier, L. and Tsybakov, A. (2002), 'Sharp adaptation for inverse problems with random noise', Probab. Theory Related Fields 123(3), 323–354.
Donoho, D. (1995), 'Nonlinear solution of linear inverse problems by wavelet-vaguelette decomposition', Applied Computational and Harmonic Analysis 2, 101–126.
Donoho, D. L., Johnstone, I. M., Kerkyacharian, G. and Picard, D. (1995), 'Wavelet shrinkage: Asymptopia?', Journal of the Royal Statistical Society, Series B 57, 301–369. With discussion.
Donoho, D. L. and Raimondo, M. E. (2005), 'A fast wavelet algorithm for image deblurring', ANZIAM J. 46, C29–C46. http://anziamj.austms.org.au/V46/CTAC2004/Dono.
Donoho, D. and Raimondo, M. (2004), 'Translation invariant deconvolution in a periodic setting', The International Journal of Wavelets, Multiresolution and Information Processing 14(1), 415–423.
Efromovich, S. and Koltchinskii, V. (2001), 'On inverse problems with unknown operators', IEEE Trans. Inform. Theory 47(7), 2876–2894.
Engl, H., Hanke, M. and Neubauer, A. (1996), Regularization of Inverse Problems, Kluwer Academic Publishers.
Fan, J. and Koo, J. (2002), 'Wavelet deconvolution', IEEE Transactions on Information Theory 48(3), 734–747.
Hall, P., Ruymgaart, F., van Gaans, O. and van Rooij, A. (2001), Inverting noisy integral equations using wavelet expansions: A class of irregular convolutions, in 'State of the Art in Probability and Statistics: Festschrift for Willem R. van Zwet', Vol. 36 of Lecture Notes-Monograph Series, Institute of Mathematical Statistics, pp. 533–546.
Härdle, W., Kerkyacharian, G., Picard, D. and Tsybakov, A. (1998), Wavelets, Approximation and Statistical Applications, Vol. 129, Springer.
Hoffmann, M. and Reiss, M. (2004), Nonlinear estimation for linear inverse problems with error in the operator. WIAS Preprint No. 990, http://www.wias-berlin.de/publications/preprints/990/.
Je Park, Y., Whoe Dho, S. and Jin Kong, H. (1997), 'Deconvolution of long-pulse lidar signals with matrix formulation', Applied Optics 36, 5158–5161.
Johnstone, I., Kerkyacharian, G., Picard, D. and Raimondo, M. (2004), 'Wavelet deconvolution in a periodic setting', Journal of the Royal Statistical Society, Series B 66(3), 547–573. With discussion, pp. 627–652.
Johnstone, I. M. (1999), 'Wavelet shrinkage for correlated data and inverse problems: adaptivity results', Statistica Sinica 9(1), 51–83.
Johnstone, I. M. and Raimondo, M. (2004), 'Periodic boxcar deconvolution and diophantine approximation', Annals of Statistics 32(5), 1781–1804.
Johnstone, I. M. and Silverman, B. (1990), 'Speed of estimation in positron emission tomography and related inverse problems', Annals of Statistics 18, 251–280.
Kalifa, J. and Mallat, S. (2003), 'Thresholding estimators for linear inverse problems and deconvolutions', Annals of Statistics 31, 58–109.
Kerkyacharian, G. and Picard, D. (2000), 'Thresholding algorithms and well-concentrated bases', Test 9(2), 283–344.
Mair, B. and Ruymgaart, F. H. (1996), 'Statistical estimation in Hilbert scales', SIAM J. Appl. Math. 56, 1424–1444.
Mallat, S. (1998), A Wavelet Tour of Signal Processing (2nd Edition), Academic Press Inc., San Diego, CA.
Marteau, C. (2005), 'Regularisation in inverse problems with unknown operators', Manuscript.
Mathé, P. and Pereverzev, S. V. (2001), 'Optimal discretization of inverse problems in Hilbert scales. Regularization and self-regularization of projection methods', SIAM J. Numerical Analysis 38, 1999–2021.
Mathis, H. and Douglas, S. (2003), 'Bussgang blind deconvolution for impulsive signals', IEEE Trans. Signal Process. 51, 1905–1915.
Meyer, Y. (1990), Ondelettes et Opérateurs I, Hermann.
Neelamani, R., Choi, H. and Baraniuk, R. (2004), 'ForWaRD: Fourier-wavelet regularized deconvolution for ill-conditioned systems', IEEE Transactions on Signal Processing 52, 418–433.
Neumann, M. (1997), 'On the effect of estimating the error density in nonparametric deconvolution', J. Nonparametric Stat. 7, 307–330.
O'Sullivan, F. (1986), 'A statistical perspective on ill-posed inverse problems', Statistical Science 1, 502–527.
Pensky, M. and Vidakovic, B. (1999), 'Adaptive wavelet estimator for nonparametric density deconvolution', Annals of Statistics 27, 2033–2053.
Pruessner, A. and O'Leary, D. (2003), 'Blind deconvolution using a regularized structured total least norm algorithm', SIAM J. Matrix Anal. Appl. 24, 1018–1037.
van Rooij, A. and Ruymgaart, F. (1996), 'Asymptotic minimax rates for abstract linear estimators', Journal of Statistical Planning and Inference.
Walter, G. and Shen, X. (1999), 'Deconvolution using the Meyer wavelet', Journal of Integral Equations and Applications 11, 515–534.