Complexity–Regularized Image Denoising



Juan Liu and Pierre Moulin
University of Illinois at Urbana-Champaign
Beckman Institute, Coordinated Science Laboratory, and ECE Department
405 N. Mathews Ave., Urbana, IL 61801
Contact author: J. Liu, tel: (217) 244-1089, fax: (217) 244-8371, email: [email protected]
February 19, 2001

Abstract
We develop a new approach to image denoising based on complexity regularization. This technique presents a flexible alternative to the more conventional ℓ₂, ℓ₁, and Besov regularization methods. Different complexity measures are considered, in particular those induced by state-of-the-art image coders. We focus on a Gaussian denoising problem and derive a connection between complexity-regularized denoising and operational rate-distortion optimization. This connection suggests the use of efficient algorithms for computing complexity-regularized estimates. Bounds on denoising performance are derived in terms of an index of resolvability that characterizes the compressibility of the true image. Comparisons with state-of-the-art denoising algorithms are given.∗

∗ Work supported by the National Science Foundation under award MIP-9732995 (CAREER). This work was presented in part at ICIP'97.

Keywords — image restoration, image compression, regularization, minimum description length principle, rate–distortion optimization, wavelets.

EDICS Categories: 2–REST, 2–WAVP


1 Introduction

Due to the imperfection of image acquisition systems and transmission channels, images are often degraded by noise. The presence of noise is visually annoying and makes it more difficult to perform tasks such as segmentation, recognition, and scene interpretation. It is therefore desirable to construct a good estimate of the original image from the noisy observations. In this paper, we consider the typical problem of estimating an image degraded by additive white Gaussian noise (AWGN). The problem is formally described as

    y = f^* + \epsilon,   (1)

where f* is the unknown original image, ε is white Gaussian noise with zero mean and known variance σ², and y denotes the noisy observations. We restrict our attention to digital images with N pixels; in this case, f*, ε, and y are length-N vectors. The problem is to design a good estimator f̂ of the original image based on the observations y.

A classical framework in which to solve such image estimation problems is penalized maximum-likelihood (ML) estimation [1]. The estimator is given by

    \hat{f} = \arg\min_f \left[ -\ln p(y|f) + \mu \Phi(f) \right],   (2)

where Φ(f) is the penalty or regularization functional, which takes into account a priori knowledge about f* to penalize "unlikely" estimates and stabilize the ML estimator. The regularization parameter µ controls the tradeoff between the log-likelihood term and the regularization penalty. The choice of Φ(f) is critical to estimation performance. The estimator f̂ in (2) may also be thought of as the maximum a posteriori (MAP) estimator corresponding to the prior p(f) ∝ e^{−µΦ(f)}.

A conventional choice for the penalty function is the quadratic penalty Φ(f) = ||Cf||² (implicitly assuming a Gaussian prior on f), where C is a differentiating operator [2, 3, 4]. This choice penalizes large local variations in the image. For the Gaussian observational model (1), the resulting estimate f̂ is linear in the data y. However, the estimate often suffers from oversmoothing artifacts [5, 6], because sharp transitions are too heavily penalized by the quadratic penalty. One way to fix this problem is to use a data-adaptive (spatially varying) regularization parameter µ [4, 7]. Other forms of penalty functions have also been studied, such as ℓ₁ norms [8], Besov norms [9], total-variation norms [10, 11], and Huber functions [5, 6], which penalize large local variations less heavily in order to preserve edges.
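To make the linearity concrete (a minimal 1-D sketch, not code from the paper): with Φ(f) = ||Cf||², criterion (2) under model (1) reads ||y − f||²/(2σ²) + µ||Cf||², whose minimizer f̂ = (I + 2µσ²CᵀC)⁻¹y is indeed linear in y.

    import numpy as np

    def quadratic_penalty_denoise(y, sigma2, mu):
        """Minimize ||y - f||^2 / (2 sigma2) + mu ||C f||^2 for a 1-D signal,
        where C is the first-order difference operator. The closed-form
        minimizer f = (I + 2 mu sigma2 C^T C)^{-1} y is linear in y."""
        n = len(y)
        C = np.diff(np.eye(n), axis=0)          # (n-1) x n differencing matrix
        A = np.eye(n) + 2.0 * mu * sigma2 * (C.T @ C)
        return np.linalg.solve(A, y)

    # A noisy step edge: increasing mu removes noise but oversmooths the edge.
    rng = np.random.default_rng(0)
    f_true = np.concatenate([np.zeros(50), np.ones(50)])
    y = f_true + rng.normal(0.0, 0.1, size=100)
    f_hat = quadratic_penalty_denoise(y, sigma2=0.01, mu=5.0)

Large µ visibly smooths the step in this example, which is precisely the oversmoothing artifact discussed above.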

1.1 Complexity Regularization

In this paper, we investigate a new approach that penalizes unlikely images in a more flexible way. We use a measure of image complexity as the regularization penalty Φ(f), following the observation that typical real-world images have low complexity in a data-compression sense. Instead of imposing a fixed, specific form of Φ(f), we are able to use complexity measures based on rather sophisticated, possibly implicit, probability models. The complexity-regularized estimation criterion is stated as

    \hat{f} = \arg\min_{f \in \Gamma} \left[ -\ln p(y|f) + \mu L(f) \right],   (3)

where Γ is a codebook, namely a discrete set of candidate images. Complexity is measured by a codelength L(f) associated with each f ∈ Γ. For mathematical convenience, the codelength is measured in nats (1 nat = 1/ln 2 bits). The codelengths should satisfy Kraft's inequality:

    \sum_{f \in \Gamma} e^{-L(f)} \le 1.

Unlikely images are assigned long codewords and hence are strongly penalized by the criterion (3). Rissanen’s minimum description length (MDL) principle [12, 13] is a well–known instance of complexity regularization, where the regularization parameter µ takes the value 1. The MDL principle has found numerous applications in signal and image estimation problems [14, 15, 16].
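For a small, finite codebook, the criterion (3) can be evaluated by brute force. The sketch below (with a made-up codebook and prior, purely for illustration) uses the AWGN likelihood and codelengths L(f) = −ln p(f), which satisfy Kraft's inequality with equality:

    import numpy as np

    def complexity_regularized_estimate(y, sigma2, mu, codebook, codelengths):
        """Brute-force minimization of criterion (3) under AWGN:
        -ln p(y|f) = ||y - f||^2 / (2 sigma2) up to a constant.
        `codelengths` are in nats and must satisfy Kraft's inequality."""
        assert np.sum(np.exp(-np.asarray(codelengths))) <= 1.0 + 1e-9
        costs = [np.sum((y - f) ** 2) / (2.0 * sigma2) + mu * L
                 for f, L in zip(codebook, codelengths)]
        return codebook[int(np.argmin(costs))]

    # Toy codebook of three candidate "images" with prior probabilities p:
    p = np.array([0.5, 0.3, 0.2])
    codebook = [np.zeros(4), np.ones(4), np.full(4, 2.0)]
    y = np.array([0.9, 1.1, 1.0, 0.8])
    f_hat = complexity_regularized_estimate(y, sigma2=0.25, mu=1.0,  # mu = 1: MDL
                                            codebook=codebook,
                                            codelengths=-np.log(p))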


Complexity regularization provides a broad range of tradeoffs between the complexity of the estimate and its fidelity to the data. Compared to more conventional techniques such as MAP estimation, complexity regularization provides increased flexibility in selecting the penalty: explicit knowledge of a statistical prior is not required. This is an appreciable advantage, because: (1) priors are often not available; (2) priors may be hard to describe analytically and explicitly; (3) mathematically tractable image priors, such as ℓ₂, ℓ₁, and Besov priors, may not be robust and adaptive enough to cope with the variability of typical imagery. If a smooth statistical prior p(f) for the image is explicitly available, and if quantization is fine enough, the ideal codelength (which minimizes coding redundancy) is L(f) = −ln p(f) − ln ∆_V, where ∆_V is the volume of a quantization cell. In this case, the complexity-regularized estimator (3) differs from the MAP estimator based on the prior p(f) only through quantization effects. These effects can be significant; see Sec. 4.
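The MAP connection can be made explicit by substituting the ideal codelength into (3); since −ln ∆_V does not depend on f, it drops out of the minimization:

    \hat{f} = \arg\min_{f \in \Gamma} \left[ -\ln p(y|f) - \mu \ln p(f) - \mu \ln \Delta_V \right]
            = \arg\min_{f \in \Gamma} \left[ -\ln p(y|f) - \mu \ln p(f) \right],

which, for µ = 1, is the MAP criterion restricted to the quantized codebook Γ.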

1.2 Complexity Measures

An essential practical issue is the choice of the codebook that defines the complexity penalty. For a given compression scheme, there is a certain class of images whose statistics are well matched to the image coder. These images can be compressed, and denoised, efficiently. In the image compression literature, much effort has been devoted to designing codebooks that perform well for a variety of images. This research has led to state-of-the-art, wavelet-based codebooks, such as those used in the morphological coder [17] and in the EQ coder [18]; also see the excellent tutorial paper by Ortega and Ramchandran [19]. These coders possess excellent robustness and adaptivity properties. As we shall see, they can be used effectively in complexity-regularized image denoising problems as well.

For certain idealized classes of images, complexity-regularized estimators naturally lead to more conventional estimators. For instance, if images are uniformly distributed over a Besov ball B^σ_{pq}(R), then simple codebooks tailored to this class can be designed (see Sec. 2.1), leading to hard-thresholded estimates [20].

1.3 Related Work

The idea of penalizing estimates with high complexity also appears in Natarajan's Occam filtering technique, which uses lossy data compression to filter noisy data [21]. The noisy data y are used as the input to a lossy compression algorithm whose distortion level is set equal to the noise variance σ²; the output of the compression algorithm is taken as the estimate f̂ (see Fig. 1). The method is named after the principle of Occam's Razor, which states that "the simplest explanation of the observed phenomenon is more likely to be correct." The Occam filter technique has found a theoretical justification in the case of bounded additive noise [21]. However, Natarajan's analysis is not applicable to unbounded noise models such as Gaussian distributions. Unlike (3), his method may also be difficult to extend to problems other than denoising. The basic theory of complexity regularization was developed by Barron [22] in the context of nonparametric regression. Some extensions of that theory are developed in this paper and in [23].
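A minimal sketch of the Occam filter loop (the `compress`/`decompress` callables are placeholders for some lossy codec with a quality knob, not a specific library API): sweep the knob and keep the reconstruction whose distortion best matches the noise variance.

    import numpy as np

    def occam_filter(y, sigma2, compress, decompress, qualities):
        """Occam filtering [21]: return the decompressed output whose
        per-sample distortion (1/N)||y - f||^2 is closest to sigma2."""
        best, best_gap = None, np.inf
        for q in qualities:                     # sweep the codec quality knob
            f = decompress(compress(y, q))
            gap = abs(np.mean((y - f) ** 2) - sigma2)
            if gap < best_gap:
                best, best_gap = f, gap
        return best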

1.4 Applications

Complexity regularization naturally integrates compression into the framework of estimation: the estimates are directly available in compressed form. This is particularly useful in applications where estimates must be compressed due to storage or transmission requirements. In that case, the complexity regularization parameter µ should be at least equal to a certain value µ(R) that yields estimates encoded at a rate of R bits per pixel. This can be done using a broad class of compression algorithms [19]. Instead of treating estimation and compression as two separate steps, a complexity-regularized estimator provides a unified solution.
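Since the rate of the estimate decreases as the complexity penalty grows, µ(R) can be located by a simple search. A sketch under that monotonicity assumption (`encode_rate` is a hypothetical helper returning the rate, in bpp, of the estimate computed with parameter mu):

    import numpy as np

    def mu_for_rate(y, R_target, encode_rate, mu_lo=1e-3, mu_hi=1e3, iters=40):
        """Bisection (on a log scale) for the smallest mu whose estimate is
        encoded at no more than R_target bits per pixel. Assumes the rate
        returned by encode_rate(y, mu) is nonincreasing in mu."""
        for _ in range(iters):
            mu = np.sqrt(mu_lo * mu_hi)         # geometric midpoint
            if encode_rate(y, mu) > R_target:
                mu_lo = mu                      # rate too high: penalize more
            else:
                mu_hi = mu
        return mu_hi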

1.5 Organization of This Paper

In Sec. 2, we apply complexity regularization to image denoising problems with AWGN. We show that complexity–regularized estimation is equivalent to an operational rate–distortion optimization problem. In Sec. 3, we present a theoretical performance analysis for our denoising problem, which establishes a direct link between estimation and compression performance. Experimental results for denoising applications are presented in Sec. 4. Concluding remarks are given in Sec. 5.

2 Complexity-Regularized Denoising Algorithms

This section applies the complexity regularization prescription (3) to the Gaussian observational model (1). The variance σ² of the noise samples ε_i, 0 ≤ i < N, is assumed to be known. (If σ² is unknown, it can be estimated reliably from the data using methods such as in [24].) Under the AWGN model, the estimator (3) takes the form

    \hat{f} = \arg\min_{f \in \Gamma} \left[ \frac{\|y - f\|^2}{2\sigma^2} + \mu L(f) \right],   (4)

where ||x||² = \sum_{i=0}^{N-1} x_i² denotes the squared ℓ₂ norm of a vector x. Two essential questions are how to properly choose the codebook Γ and assign codelengths L(f) to all signal candidates f, and how to solve the problem (4). This constrained optimization problem is high-dimensional, discrete, and in general, computationally intractable. To illustrate the concepts, we present two kinds of complexity measures for which a solution to the optimization problem (4) can be derived.
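The equivalence with operational rate-distortion optimization (cf. Figs. 2 and 3) is immediate: multiplying the bracketed criterion in (4) by 2σ² and expressing the codelength in bits, R(f) = L(f)/ln 2, gives

    \hat{f} = \arg\min_{f \in \Gamma} \left[ \|y - f\|^2 + \nu R(f) \right], \qquad \nu = 2(\ln 2)\sigma^2 \mu,

so computing f̂ amounts to selecting the operating point of slope −1/ν on the coder's operational R(D) curve; the MDL choice µ = 1 gives ν = 2(ln 2)σ².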

2.1 A Simple Complexity Measure

We first consider complexity regularization in an energy-compaction domain such as an orthonormal wavelet domain. Conceptually, complexity regularization in the wavelet domain is the same as in the image domain, except that f, ε, and y in (4) are all wavelet representations. Indeed, any orthonormal transform maps AWGN in the spatial domain into AWGN in the transform domain. Signal representations are normally sparse in the wavelet domain, with only a few (k ≪ N) significant coefficients.

3 Performance Analysis

Our analysis applies to estimators operating with ν > 2(ln 2)σ², whereas an analysis assuming bounded noise as in [22] would require ν > 5b²/3, where b is the length of the interval in which both the f_i's and the y_i's take their values. For 8-bpp (bits per pixel) greylevel images, b = 255. In our analysis, we reserve the notation f̂ for the complexity-regularized estimate, and use f for any element of the codebook Γ. The following quantities are used to measure estimation performance:

• Empirical loss: d̂(f, f*) = (1/N)||y − f||² − (1/N)||y − f*||² measures the difference between the residual MSE of an estimate f and that of the ideal estimate f*.

• Loss: d(f, f*) = E[d̂(f, f*)] = (1/N)||f − f*||² is the MSE in our problem. It is minimized when the estimate is identical to the original image.

• Risk: E[d(f̂, f*)], where the expectation is with respect to p(y|f*).

We give bounds on the MSE of the estimator in terms of an index of resolvability that characterizes the compressibility of the true image f*. The index of resolvability of f* is defined as

    R_\nu(f^*) = \frac{1}{N} \min_{f \in \Gamma} \left[ \|f - f^*\|^2 + \frac{\nu}{\ln 2} L(f) \right].   (5)

This quantity indicates how well the image f* can be approximated, in the ℓ₂ sense, by a moderately complex element of the codebook Γ. The index of resolvability takes the same D + νR form as the complexity regularization criterion (4). It is the x-intercept of the tangent of slope −1/ν to the R(D) curve, see Fig. 3. Hence R_ν(f*) quantifies the performance of a given coder operating on f* for a particular tradeoff (ν) between R and D. In Fig. 4, we plot the index of resolvability of Lena as a function of ν, where the codebook Γ is given by the morphological coder [17]. The index of resolvability quantifies the compressibility of the true image f* and hence depends on the codebook.

The following theorem establishes an upper bound on the risk of the complexity-regularized estimator in terms of the index of resolvability.

Theorem 3.1 Consider the denoising problem (1). The loss d(f̂, f*) of a complexity-regularized estimator f̂ operating with Lagrange multiplier ν > 2(ln 2)σ² satisfies

    \Pr\left[ d(\hat{f}, f^*) \le \frac{\nu + 2(\ln 2)\sigma^2}{\nu - 2(\ln 2)\sigma^2}\, R_\nu(f^*) + \frac{2(\nu/\ln 2)^2 \ln(1/\delta)}{(\nu/\ln 2 - 2\sigma^2)\, N} \right] \ge 1 - \delta,   (6)

for any 0 < δ < 1. The estimation risk is upper-bounded by

    E[d(\hat{f}, f^*)] \le \frac{\nu + 2(\ln 2)\sigma^2}{\nu - 2(\ln 2)\sigma^2}\, R_\nu(f^*) + \frac{2(\nu/\ln 2)^2}{(\nu/\ln 2 - 2\sigma^2)\, N}.   (7)

Proof: See Appendix A.

When N is large, the constant terms 2(ν/ln 2)² ln(1/δ) / [(ν/ln 2 − 2σ²)N] in (6) and 2(ν/ln 2)² / [(ν/ln 2 − 2σ²)N] in (7) are negligible. This is normally the case in image processing applications, so we loosely refer to [(ν + 2(ln 2)σ²)/(ν − 2(ln 2)σ²)] R_ν(f*) as the upper bound on the estimation risk. According to Theorem 3.1, we can view R_ν(f*) as an indicator of the bottom-line performance of a complexity-regularized estimator. While lower bounds on ℓ₂ risk are common in signal and image processing (e.g., Cramér–Rao bounds), upper bounds are less common.
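Both the index of resolvability (5) and the risk bound (7) can be computed directly from a coder's operational (rate, distortion) points. The sketch below uses made-up operating points for illustration; with R in bpp and D in per-pixel MSE, the identity R_ν = min over points of (D + νR) is the D + νR form noted above.

    import numpy as np

    def resolvability(points, nu):
        """Index of resolvability (5) from operational (R, D) points obtained
        by compressing the clean image f*: R in bits/pixel, D in MSE."""
        return min(D + nu * R for (R, D) in points)

    def risk_bound(points, nu, sigma2, N):
        """Upper bound (7) on the MSE of the complexity-regularized estimator;
        requires nu > 2 ln(2) sigma^2."""
        ln2 = np.log(2.0)
        assert nu > 2.0 * ln2 * sigma2
        lead = (nu + 2.0 * ln2 * sigma2) / (nu - 2.0 * ln2 * sigma2)
        const = 2.0 * (nu / ln2) ** 2 / ((nu / ln2 - 2.0 * sigma2) * N)
        return lead * resolvability(points, nu) + const

    # Hypothetical (R, D) points for a 512 x 512 image with sigma^2 = 49:
    pts = [(0.1, 60.0), (0.25, 30.0), (0.5, 14.0), (1.0, 5.0)]
    print(risk_bound(pts, nu=150.0, sigma2=49.0, N=512 * 512))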

Fig. 5 shows the MSE for estimation of Lena and its upper bound [(ν + 2(ln 2)σ²)/(ν − 2(ln 2)σ²)] R_ν(f*). Theorem 3.1 motivates choosing a codebook that yields good compression performance for a variety of images f*. The experimental results in Sec. 4 show that the relationship between estimation and compression performance is very close indeed. We also note that Theorem 3.1 is applicable to estimators that penalize complexity more than MDL does (ν = 2(ln 2)σ² for MDL). While this may seem to be a restrictive condition, experimental results in Sec. 4 show that the MSE viewed as a function of ν exhibits a broad minimum over a range of values including the MDL choice, and that there is a close relationship between R_ν(f*) and the MSE in that range. So the bound given by Theorem 3.1, while tighter than Barron's [22], still appears to be conservative.

4 Experimental Results

We tested our approach on various images, and report results for the 512 × 512 images Lena and Barbara. Both images were contaminated by AWGN with variances σ² = 25, 49, and 100. In our experiments, we considered two wavelet hard-thresholding schemes discussed in Sec. 2.1: the first used the universal threshold σ√(2 ln N) [24], and the second, an MDL estimator, used the threshold σ√(3 ln N). As discussed in Sec. 2, both methods are based on a very simple wavelet model. We also studied as a benchmark the hard-thresholding "oracle", which chooses the threshold value minimizing the MSE. This threshold is obtained by "cheating", i.e., by using the original image for the MSE minimization; the oracle yields the best performance that a simple hard-thresholding operator can achieve. We also compared with Donoho and Johnstone's SureShrink soft-thresholding method [29], which adaptively adjusts the threshold value according to the smoothness of the data. The wavelet thresholding estimators used a four-level wavelet decomposition based on Daubechies' maximally-flat eight-tap filters [30]. None of these estimators applied quantization to the thresholded wavelet coefficients.
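A sketch of the two simple thresholding schemes, using the PyWavelets package (an assumption of ours; the paper does not specify its implementation). Daubechies' maximally-flat eight-tap filter corresponds to PyWavelets' "db4":

    import numpy as np
    import pywt  # PyWavelets

    def hard_threshold_denoise(y, sigma, mdl=False, wavelet="db4", level=4):
        """Wavelet hard thresholding with the universal threshold
        sigma * sqrt(2 ln N) [24], or sigma * sqrt(3 ln N) for the MDL
        variant; no quantization is applied to the kept coefficients."""
        factor = 3.0 if mdl else 2.0
        t = sigma * np.sqrt(factor * np.log(y.size))
        coeffs = pywt.wavedec2(y, wavelet, level=level)
        out = [coeffs[0]]                       # keep the coarse approximation
        for detail in coeffs[1:]:               # threshold the detail subbands
            out.append(tuple(pywt.threshold(d, t, mode="hard") for d in detail))
        return pywt.waverec2(out, wavelet)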


These estimates were compared with complexity-regularized estimators based on three compression schemes: a JPEG coder [31], the morphological wavelet coder of [17], and the EQ wavelet coder of [18]. The morphological coder exploits the spatial clustering of significant wavelet coefficients, using mathematical morphology; substantial compression gains are achieved by splitting the wavelet subbands into spatially clustered significant and nonsignificant groups. The morphological coder uses the biorthogonal 9–7 spline filters [32] and a four-level wavelet decomposition. The EQ coder models the wavelet coefficient field as an independent generalized-Gaussian random field within each subband, with zero mean and slowly spatially-varying variance; its good coding performance comes from the adaptability of the model to local statistics. The EQ coder uses the biorthogonal 10–18 filter set [33].

The regularization parameter ν can be chosen to provide a suitable tradeoff between the bit rate and the distortion, and various methods can be used to select it [4, 34]. Moreover, in applications where the estimate is required to satisfy a rate constraint, e.g., a bit rate that should not exceed R bits per pixel, the parameter ν should be at least equal to a certain value ν(R) for which the rate constraint is met exactly. In this section, to compare with other methods such as the thresholding oracle and SureShrink, we do not concern ourselves with the rate-constrained case, and present only the complexity-regularized estimates obtained with the MDL choice, ν = 2(ln 2)σ².

The MSEs of the denoised images are shown in Tables 1 and 2. The two wavelet coders outperform the JPEG coder as well as the simple universal and MDL wavelet thresholding schemes. This is not surprising, considering the high compression efficiency of the morphological and EQ coders applied to the clean images. The index of resolvability for the clean images is also listed in Tables 1 and 2 and appears to be a good indicator of denoising performance. Indeed, coders with higher R_ν(f*) usually produce higher MSE, and the MSE is bounded by the index of resolvability. None of these results was guaranteed by theory (recall that Theorem 3.1 does not apply to the MDL case). Still, better coders led to better denoising performance in almost all of our experiments.

As can be seen from Tables 1 and 2, SureShrink, which is the state-of-the-art wavelet-based denoising scheme, yields lower MSE (by 3–32%) than our complexity-regularized estimator. The hard-thresholding oracle is also slightly better on average. The higher performance of SureShrink and the oracle is partly due to the fact that, unlike the complexity-regularized estimators, they are free of quantization artifacts. Visually, the SureShrink and hard-thresholding oracle estimates also look a little better. Hence the advantage of complexity regularization may primarily reside in applications requiring storage and/or transmission of the denoised images in compressed form. To compare the estimators on these grounds, we compressed the SureShrink denoised image using the EQ coder. This would be a classical way to compress a noisy image, motivated by results from rate-distortion theory [35]; this two-stage approach finds theoretical justification in Wolf and Ziv's paper [36]. In our experiment, the SureShrink estimate is compressed to the same bit rate as the complexity-regularized estimator using the EQ coder. The resulting MSEs for Lena using compressed SureShrink (Sure–EQ) are about 10% higher (see Table 1). For Barbara, the difference in performance is less prominent.

In Fig. 6, we show denoising results for Lena corrupted by AWGN with variance σ² = 100. The denoised image using universal hard thresholding is shown in Fig. 6b (the results using the larger MDL threshold σ√(3 ln N) were about 25% worse in MSE, and also significantly worse in visual quality). Universal hard thresholding reduced the MSE from 100 to 48.35 and eliminated most of the noise, but unfortunately also oversmoothed the image and produced visually annoying artifacts near the eyes and the mouth. Results using the JPEG coder are shown in Fig. 6d. Although some noise was removed, the image contains significant residual noise and still looks grainy. This modest estimation performance is not surprising, given the similarly modest compression performance of JPEG on the clean Lena. In Figs. 6e and 6f, we show the image denoised using the morphological wavelet coder [17] and the EQ coder [18]; the denoised images contain noticeably fewer visual artifacts. The Sure–EQ estimate is shown in Fig. 6c. For the same bit rate, the image is more blurred (in the hat and feather area) than the complexity-regularized estimates produced by the EQ or the morphological coder. Similar results were observed for Barbara in Fig. 7.

To evaluate the influence of the choice of the R(D) slope parameter ν⁻¹ (cf. Fig. 2), we computed estimates corresponding to many values of ν and display the resulting MSE-versus-ν curve in Fig. 8. This was done for Lena with noise standard deviation σ = 7, using the morphological coder. Observe the presence of a flat, broad minimum in the curve. The MDL choice is ν = 67.93, and the corresponding MSE is 23.52. The "ideal" value ν = 47.5, corresponding to the minimum of the curve, cannot be made available to an estimator without cheating (knowing the true f*). However, Fig. 8 shows that little is lost by not knowing the ideal ν, since the corresponding MSE is 22.41, only 5% lower than that of the MDL estimator.

In general, Natarajan's method produces results similar to ours. Consider for instance the use of our three image coders to denoise Lena when σ = 10. Natarajan's method selects the distortion level to be equal to σ²; the resulting MSEs are 56.46, 38.64, and 34.30 using the JPEG, morphological, and EQ coders, respectively. Our method (the last row in Table 1) yields MSEs slightly lower than Natarajan's.
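The sweep behind Fig. 8 can be sketched as follows. Here `cr_estimate` is a placeholder for a complexity-regularized estimator built on some coder, and the clean image f* is used only to score the MSE, so the minimizing ν is an oracle choice, exactly as discussed above:

    import numpy as np

    def sweep_nu(y, f_true, cr_estimate, nus):
        """MSE-versus-nu curve as in Fig. 8; cr_estimate(y, nu) is assumed to
        return the complexity-regularized estimate for multiplier nu."""
        mses = [np.mean((cr_estimate(y, nu) - f_true) ** 2) for nu in nus]
        i = int(np.argmin(mses))
        return nus[i], mses[i], mses            # oracle nu, its MSE, the curve

    # Example grid bracketing the MDL choice nu = 2 ln(2) sigma^2 (sigma = 7):
    nus = np.linspace(20.0, 90.0, 15)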


5 Concluding Remarks

This paper has applied the concept of complexity regularization to image denoising under an AWGN model. Complexity regularization is not restricted to this particular framework. Our performance analysis of the complexity-regularized estimator (4) is extended in [23] to more general inverse problems, such as image restoration with possibly nonlinear blur, and to noise with non-Gaussian (e.g., Poisson or multiplicative) and non-white statistics. The natural estimation loss in this case is the relative-entropy loss, and it is shown that the estimation risk is still upper-bounded by a linear function of a suitably defined index of resolvability. A theoretical discussion of the optimal codebook design problem is also presented in [23].


A Proof of Theorem 3.1

First we present a lemma.

Lemma A.1 Let {U_i, 0 ≤ i < N} be a set of independent Gaussian random variables with sample average Ū = (1/N) Σ_{i=0}^{N−1} U_i. For all τ ≥ 0 and ε > 0, we have

    \Pr\left[ \bar{U} - E\bar{U} \ge \frac{\tau}{N\varepsilon} + \frac{\varepsilon N \operatorname{Var}(\bar{U})}{2} \right] \le \frac{1}{2}\, e^{-\tau}.   (8)

Proof: Note that Ū − EŪ is Gaussian distributed with mean 0 and standard deviation σ_Ū = √Var(Ū). For all τ ≥ 0 and ε > 0, we have

    \Pr\left[ \bar{U} - E\bar{U} \ge \frac{\tau}{N\varepsilon} + \frac{\varepsilon N \operatorname{Var}(\bar{U})}{2} \right] = Q\left( \frac{ \frac{\tau}{N\varepsilon} + \frac{\varepsilon N \operatorname{Var}(\bar{U})}{2} }{ \sigma_{\bar{U}} } \right),   (9)

by definition of the Gaussian tail function Q(x) = (1/√(2π)) ∫_x^∞ e^{−t²/2} dt. The argument of Q(·) in (9), x = τ/(Nεσ_Ū) + εNσ_Ū/2, has minimum value √(2τ) (attained by minimizing over ε). Furthermore, note that Q(x) is a monotonically decreasing function, and Q(x) ≤ (1/2) e^{−x²/2}. Hence we obtain (8). □

Problem setting

Let

    U_i = |y_i - f_i|^2 - |y_i - f_i^*|^2 = -2(y_i - f_i^*)(f_i - f_i^*) + (f_i - f_i^*)^2, \qquad 0 \le i < N.   (10)

The U_i's are independent (but not identically distributed) Gaussian random variables. Each U_i has mean (f_i − f_i*)² and variance 4σ²(f_i − f_i*)². The sample average is equal to

    \bar{U} = \frac{1}{N} \sum_{i=0}^{N-1} U_i = \frac{1}{N} \sum_{i=0}^{N-1} |y_i - f_i|^2 - \frac{1}{N} \sum_{i=0}^{N-1} |y_i - f_i^*|^2 = \hat{d}(f, f^*).   (11)

The distribution of Ū is Gaussian with mean EŪ = E[d̂(f, f*)] = d(f, f*) and variance Var(Ū) = (1/N²) Σ_{i=0}^{N−1} Var(U_i) = (4σ²/N²) Σ_{i=0}^{N−1} |f_i − f_i*|² = (4σ²/N) d(f, f*).
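As a quick numerical sanity check (a sketch, not part of the original proof), one can simulate the model (1), form Ū = d̂(f, f*) as in (11), and verify the tail bound (8):

    import numpy as np

    # Monte Carlo check of (8) for U_bar = d_hat(f, f*) under model (1).
    rng = np.random.default_rng(0)
    N, sigma, trials = 64, 1.0, 100_000
    f_star = np.zeros(N)
    f = 0.5 * np.ones(N)                        # an arbitrary codebook element
    d = np.mean((f - f_star) ** 2)              # loss d(f, f*) = E[U_bar]
    var_Ubar = 4.0 * sigma**2 * d / N           # variance of U_bar (see above)
    y = f_star + rng.normal(0.0, sigma, size=(trials, N))
    Ubar = np.mean((y - f) ** 2 - (y - f_star) ** 2, axis=1)
    tau = 2.0
    eps = np.sqrt(2.0 * tau) / (N * np.sqrt(var_Ubar))  # near-optimal epsilon
    thresh = tau / (N * eps) + eps * N * var_Ubar / 2.0
    print(np.mean(Ubar - d >= thresh), 0.5 * np.exp(-tau))  # freq <= bound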

First application of (8)

For convenience, we define ν′ = ν/ln 2. Now we apply the inequality (8) to −Ū with τ = L(f) + ln(1/δ) and ε = 1/ν′, where f is any element of Γ. We obtain

    \Pr\left[ d(f, f^*) - \hat{d}(f, f^*) \le \frac{L(f) + \ln(1/\delta)}{N}\, \nu' + \frac{2\sigma^2}{\nu'}\, d(f, f^*) \right] \ge 1 - e^{-L(f)}\, \frac{\delta}{2}.

Let α = 2σ²/ν′, and apply the union-of-events bound over all f ∈ Γ; by Kraft's inequality, the failure probabilities sum to at most δ/2. Hence

    \Pr\left[ \forall f \in \Gamma:\ (1 - \alpha)\, d(f, f^*) \le \hat{d}(f, f^*) + \frac{\nu' L(f)}{N} + \frac{\nu' \ln(1/\delta)}{N} \right] \ge 1 - \frac{\delta}{2}.   (12)

Since f̂ ∈ Γ, we have

    \Pr\left[ (1 - \alpha)\, d(\hat{f}, f^*) \le \hat{d}(\hat{f}, f^*) + \frac{\nu' L(\hat{f})}{N} + \frac{\nu' \ln(1/\delta)}{N} \right] \ge 1 - \frac{\delta}{2}.

Since f̂ minimizes d̂(f, f*) + ν′L(f)/N over all f ∈ Γ, the inequality above can be written as

    \Pr\left[ \forall f \in \Gamma:\ (1 - \alpha)\, d(\hat{f}, f^*) \le \hat{d}(f, f^*) + \frac{\nu' L(f)}{N} + \frac{\nu' \ln(1/\delta)}{N} \right] \ge 1 - \frac{\delta}{2}.   (13)

For the result (13) to be non-trivial, we need α < 1, which implies the condition ν′ > 2σ², or equivalently, ν > 2(ln 2)σ².

Second application of (8)

We apply the inequality (8) again, now to Ū, with parameters τ = ln(1/δ) and ε = 1/ν′. It follows that

    \Pr\left[ \forall f \in \Gamma:\ -d(f, f^*) + \hat{d}(f, f^*) \le \frac{\nu' \ln(1/\delta)}{N} + \frac{2\sigma^2}{\nu'}\, d(f, f^*) \right] \ge 1 - \frac{\delta}{2},

whence

    \Pr\left[ \forall f \in \Gamma:\ \hat{d}(f, f^*) \le (1 + \alpha)\, d(f, f^*) + \frac{\nu' \ln(1/\delta)}{N} \right] \ge 1 - \frac{\delta}{2}.   (14)

Combination of the previous results

Combining the inequalities (13) and (14), and proceeding as in [23, Appendix A], we obtain the bounds (6) and (7). □


References

[1] J. A. O'Sullivan, R. E. Blahut, and D. L. Snyder, "Information-theoretic image formation," IEEE Trans. on Info. Theory, vol. 44, pp. 2094–2123, Oct. 1998.
[2] M. G. Kang and A. K. Katsaggelos, "General choice of the regularization functional in regularized image restoration," IEEE Trans. on Image Processing, vol. 4, pp. 594–602, Apr. 1995.
[3] N. P. Galatsanos and A. K. Katsaggelos, "Methods for choosing the regularization parameter and estimating the noise variance in image restoration and their relation," IEEE Trans. on Image Processing, vol. 1, pp. 322–336, July 1992.
[4] G. Archer and D. M. Titterington, "On some Bayesian/regularization methods for image restoration," IEEE Trans. on Image Processing, vol. 4, pp. 989–995, July 1995.
[5] D. Geman and G. Reynolds, "Constrained restoration and recovery of discontinuities," IEEE Trans. on PAMI, vol. 14, pp. 367–383, Mar. 1992.
[6] M. Nikolova, "Regularisation functions and estimators," in Proc. IEEE Int. Conf. on Image Proc., (Lausanne, Switzerland), pp. I.457–460, Sept. 1996.
[7] M. R. Banham and A. K. Katsaggelos, "Digital image restoration," IEEE Signal Processing Magazine, vol. 14, pp. 24–41, Mar. 1997.
[8] J. Besag, "Towards Bayesian image analysis," Journal of Applied Statistics, vol. 16, no. 3, pp. 395–407, 1989.
[9] A. Chambolle, R. A. DeVore, N. Lee, and B. J. Lucier, "Nonlinear wavelet image processing: variational problems, compression and noise removal through wavelet shrinkage," IEEE Trans. on Image Processing, vol. 7, pp. 319–335, Mar. 1998.
[10] L. Rudin and S. Osher, "Total variation based image restoration with free local constraints," in Proc. of ICIP'94, (Austin, Texas), pp. I.31–35, Oct. 1994.
[11] J. Kalifa and S. Mallat, "Minimax restoration and image deconvolution in mirror wavelet bases," submitted to IEEE Trans. on Image Proc.; also available at http://www.cmap.polytechnique.fr/~kalifa.
[12] J. Rissanen, Stochastic Complexity in Statistical Inquiry. Singapore: World Scientific, 1989.
[13] A. R. Barron, J. Rissanen, and B. Yu, "The minimum description length principle in coding and modeling," IEEE Trans. on Information Theory, vol. 44, pp. 2743–2760, Oct. 1998.
[14] N. Saito, "Simultaneous noise suppression and signal compression using a library of orthonormal bases and the MDL criterion," in Wavelets in Geophysics (E. Foufoula-Georgiou and P. Kumar, eds.), pp. 299–324, New York: Academic Press, 1994.
[15] H. Krim and I. C. Schick, "Minimax description length for signal denoising and optimized representation," IEEE Trans. on Info. Theory, vol. 45, pp. 898–908, Apr. 1999.
[16] P. Moulin and J. Liu, "Analysis of multiresolution image denoising schemes using generalized-Gaussian and complexity priors," IEEE Trans. on Info. Theory, vol. 45, pp. 909–919, Apr. 1999.
[17] S. D. Servetto, K. Ramchandran, and M. T. Orchard, "Image coding based on a morphological representation of wavelet data," IEEE Trans. on Image Processing, vol. 8, pp. 1161–1174, Sept. 1999.
[18] S. LoPresto, K. Ramchandran, and M. T. Orchard, "Image coding based on mixture modeling of wavelet coefficients and a fast estimation-quantization framework," in Data Compression Conference '97, (Snowbird, Utah), pp. 221–230, 1997.
[19] A. Ortega and K. Ramchandran, "Rate-distortion methods for image and video compression," IEEE Signal Processing Magazine, vol. 15, pp. 23–50, Nov. 1998.
[20] P. Moulin, "Model selection criteria and the orthogonal series method for function estimation," presented at Int. Symp. on Information Theory, (Whistler, B.C.), 1995.
[21] B. Natarajan, "Filtering random noise from deterministic signals via data compression," IEEE Trans. on Signal Processing, vol. 43, pp. 2595–2605, Nov. 1995.
[22] A. R. Barron, "Complexity regularization with application to artificial neural networks," in Nonparametric Function Estimation and Related Topics (G. Roussas, ed.), pp. 561–576, Dordrecht: Kluwer, NATO ASI Series, 1991.
[23] P. Moulin and J. Liu, "Statistical imaging and complexity regularization," IEEE Trans. on Info. Theory, vol. 46, pp. 1762–1777, Aug. 2000.
[24] D. L. Donoho, I. M. Johnstone, G. Kerkyacharian, and D. Picard, "Wavelet shrinkage: Asymptopia? (with discussion)," J. R. Statist. Soc. B, vol. 57, pp. 301–369, 1995.
[25] K. Ramchandran, M. Vetterli, and C. Herley, "Wavelets, subband coding, and best bases," Proc. IEEE, vol. 84, pp. 541–560, Apr. 1996.
[26] C. Taswell, "Satisficing search algorithms for selecting near-best bases in adaptive tree-structured wavelet transforms," IEEE Trans. on Signal Processing, vol. 44, pp. 2423–2438, Oct. 1996.
[27] A. R. Barron and T. M. Cover, "Minimum complexity density estimation," IEEE Trans. on Information Theory, vol. 37, pp. 1034–1054, July 1991.
[28] C. C. Craig, "On the Tchebychef inequality of Bernstein," Ann. Math. Statist., vol. 4, pp. 94–102, 1933.
[29] D. L. Donoho and I. M. Johnstone, "Adapting to unknown smoothness via wavelet shrinkage," J. Amer. Statist. Assoc., vol. 90, no. 432, pp. 1200–1224, 1995.
[30] I. Daubechies, Ten Lectures on Wavelets. Society for Industrial and Applied Mathematics, 1992.
[31] "JPEG software codec." Portable Video Research Group, Stanford University, 1997. Available via anonymous ftp from havefun.stanford.edu:pub/jpeg/JPEGv1.1.tar.Z.
[32] M. Antonini, M. Barlaud, P. Mathieu, and I. Daubechies, "Image coding using wavelet transform," IEEE Trans. on Image Processing, vol. 1, pp. 205–221, Apr. 1992.
[33] A. Cohen, I. Daubechies, and J. C. Feauveau, "Biorthogonal bases of compactly supported wavelets," Commun. on Pure and Appl. Math., vol. 45, pp. 485–560, 1992.
[34] A. Neumaier, "Solving ill-conditioned and singular linear systems: A tutorial on regularization." Preprint, Institut für Mathematik, Universität Wien, 1991; also available at http://solon.cma.univie.ac.at/~neum.
[35] O. K. Al-Shaykh and R. M. Mersereau, "Lossy compression of noisy images," IEEE Trans. on Image Processing, vol. 7, pp. 1641–1652, Dec. 1998.
[36] J. K. Wolf and J. Ziv, "Transmission of noisy information to a noisy receiver with minimum distortion," IEEE Trans. on Info. Theory, vol. 16, pp. 406–411, July 1970.

Figure 1: Occam filter. The observed data y are fed to a data compression algorithm whose distortion is set so that ||y − f̂||² = σ²; the output of the algorithm is the estimated data f̂.

Figure 2: Operational R(D) curve for the noisy Lena image (σ² = 49) using the morphological wavelet coder [17]. The complexity-regularized estimator chooses the point with slope 1/(2(ln 2)σ²µ) on the R(D) curve; in particular, the MDL choice corresponds to µ = 1, i.e., slope 1/(2σ² ln 2). Natarajan's Occam filtering technique [21] chooses the point whose horizontal coordinate is an estimate of σ².

Figure 3: Operational R(D) curve for original Lena using the morphological wavelet coder [17]. The tangent of slope −1/ν intersects the distortion axis at R_ν(f*).

Figure 4: The index of resolvability R_ν(f*) of Lena, as a function of ν, using the morphological wavelet coder [17].

Figure 5: The MSE of a complexity-regularized estimate and the upper bound (7), viewed as functions of the scaled regularization parameter ν. This plot was obtained for the Lena image with σ = 7, using the morphological wavelet coder [17].

Figure 6: Top row: (a) noisy Lena, MSE = 99.89; (b) wavelet thresholding with threshold σ√(2 ln N) [24], MSE = 48.35; (c) SureShrink estimate compressed using the EQ coder, MSE = 38.91. Bottom row: complexity-regularized estimators using (d) the JPEG coder, MSE = 51.82; (e) the morphological wavelet coder [17], MSE = 33.65; (f) the EQ coder [18], MSE = 33.78.

Figure 7: Top row: (a) noisy Barbara, MSE = 99.89; (b) wavelet thresholding with threshold σ√(2 ln N) [24], MSE = 119.34; (c) SureShrink estimate compressed using the EQ coder, MSE = 72.84. Bottom row: complexity-regularized estimators using (d) the JPEG coder, MSE = 115.70; (e) the morphological wavelet coder [17], MSE = 82.65; (f) the EQ coder [18], MSE = 77.60.

Figure 8: Influence of the choice of the R(D) slope on the MSE. This plot was obtained for the Lena image with σ = 7, using the morphological wavelet coder [17]. The marked points are the MDL choice and the minimum of the curve.

         Universal   MDL     HT Oracle   SureShrink   Sure–EQ   JPEG            Morph.          EQ
σ = 5    23.62       28.71   16.06       12.76        16.99     22.19 (33.32)   16.28 (26.19)   16.23 (24.10)
σ = 7    33.81       40.66   23.73       20.26        25.78     34.13 (48.02)   23.52 (37.29)   22.96 (34.25)
σ = 10   48.35       57.43   35.56       32.79        38.91     51.82 (67.94)   33.65 (54.73)   33.78 (49.30)

Table 1: MSEs for denoised Lena images using the complexity-regularized estimator (4) with µ = 1, and different wavelet-based denoising schemes. The corresponding indices of resolvability R_ν(f*) are indicated in parentheses. The acronym "HT Oracle" denotes the hard-thresholding oracle. In the "Sure–EQ" column, SureShrink estimates are compressed to the same bit rate as the complexity-regularized estimator using the EQ coder.

         Universal   MDL      HT Oracle   SureShrink   Sure–EQ   JPEG             Morph.           EQ
σ = 5    45.26       59.34    22.38       17.51        26.95     36.98 (65.12)    27.27 (51.33)    25.56 (45.55)
σ = 7    73.58       95.00    38.48       30.59        44.99     52.14 (100.14)   43.33 (78.39)    40.62 (69.47)
σ = 10   119.34      151.24   66.56       53.96        72.84     115.70 (156.50)  82.65 (121.38)   77.60 (107.79)

Table 2: Same estimators as in Table 1, applied to the Barbara image.
