IEEE SIGNAL PROCESSING LETTERS, VOL. 11, NO. 6, JUNE 2004
Generalized Cross Validation for Multiwavelet Shrinkage
Tai-Chiu Hsung and Daniel Pak-Kong Lun
Abstract—Traditional multiwavelet shrinkage denoising techniques require a priori knowledge of the noise variance, which may not be available in some practical situations. By using generalized cross validation (GCV), we propose in this letter a new level-dependent risk estimator for multiwavelet shrinkage that does not require such a priori information. Simulation results verify that the resulting risk estimator gives a better indication for threshold selection than the traditional GCV method. Improved denoising performance is then achieved, particularly for higher multiplicity multiwavelet shrinkage.
TABLE I SQUARE ERROR WITH OPTIMAL THRESHOLDS WHEN APPLYING TO THE DENOISING OF “HYPCHIRPS” WITH SIGNAL LENGTH 2 AND RNR VALUE R
Index Terms—Multiwavelet, parameter estimation, smoothing methods, wavelet transforms, white noise.
I. INTRODUCTION
CONSIDER estimating an unknown deterministic discrete signal $f$ from the noisy observation

$$ y_i = f_i + \varepsilon_i, \qquad i = 1, \ldots, n \tag{1} $$

where $\varepsilon_i$ is independent and identically distributed (iid) zero-mean noise with variance $\sigma^2$. The goal of denoising is to minimize the mean square error (MSE)

$$ \frac{1}{n}\,\|\hat{f} - f\|^2 \tag{2} $$

subject to the condition that the estimated signal $\hat{f}$ is at least as smooth as $f$. Multiwavelet denoising using multivariate shrinkage [1] has been found to give consistently better results than traditional wavelet shrinkage denoising. The improvement comes from the multiwavelet transform, which gives a better signal representation so that noise and signal can be separated much more easily. In addition, the multivariate shrinkage operator effectively exploits the statistical information of the transformed coefficient vectors of the noise, which improves the denoising performance.

In this letter, we consider the orthogonal multiwavelet transform with orthogonal prefilters. Denote the vector filter bank for the $J$-level discrete multiwavelet transform of multiplicity $r$ as $\mathbf{W}$ and the prefilter as $\mathbf{P}$; we can then write the discrete multiwavelet transform for scalar signals as $\mathbf{W}\mathbf{P}$. Hence the discrete multiwavelet transform of the noisy signal becomes $\mathbf{w} = \mathbf{W}\mathbf{P}\,y = \mathbf{v} + \mathbf{z}$, where $\mathbf{v} = \mathbf{W}\mathbf{P}\,f$ and $\mathbf{z} = \mathbf{W}\mathbf{P}\,\varepsilon$. The matrix $\mathbf{W}\mathbf{P}$ is designed such that the $J$-level output transform coefficient vectors are arranged into an $n$ by 1 vector, i.e., $\mathbf{w} = [\ldots, \mathbf{w}_{j,k}^T, \ldots]^T$, where $\mathbf{w}_{j,k}$ is the $k$th transform coefficient vector of $y$ at scale $j$; each vector contains $r$ elements.

It is shown in [2] that, by adaptively applying a different multivariate shrinkage operation to the coefficient vectors at different resolution levels, improved denoising performance can be achieved. More specifically, the level-dependent multivariate shrinkage with parameters $\Gamma_j$ can be expressed as

$$ \hat{\mathbf{W}}_j = \delta_{\Gamma_j}(\mathbf{W}_j), \qquad \mathbf{W}_j = \{\mathbf{w}_{j,k}\}_{k=1}^{n_j}, \quad j = 1, \ldots, J $$

where $\mathbf{W}_j$ denotes the level $j$ transform coefficient vectors of the observation, $\hat{\mathbf{W}}_j$ denotes the shrunk coefficient vectors, and $n_j$ is the number of transformed coefficient vectors at level $j$. The multivariate shrinkage operator [1] on each coefficient vector is defined as

$$ \delta_{\Gamma_j}(\mathbf{w}_{j,k}) = \begin{cases} \mathbf{w}_{j,k}\left(1 - \dfrac{\lambda_j}{\theta_{j,k}}\right), & \theta_{j,k} \ge \lambda_j \\ \mathbf{0}, & \text{otherwise} \end{cases} \tag{3} $$

where

$$ \theta_{j,k} = \mathbf{w}_{j,k}^T \mathbf{V}_j^{-1} \mathbf{w}_{j,k} \tag{4} $$

and $\Gamma_j$ denotes the shrinkage parameters $\{\lambda_j, \mathbf{V}_j\}$. $\lambda_j$ is the threshold for the shrinkage operation performed at a particular level $j$, and $\mathbf{V}_j$ is the covariance matrix of the noise coefficient vectors at resolution level $j$. If the noise is iid, $\mathbf{V}_j$ can be simply obtained by $\mathbf{V}_j = \sigma^2 \mathbf{I}$, where $\mathbf{I}$ is the identity matrix.

The selection of the threshold values $\lambda_j$ is critical to the performance of multiwavelet shrinkage. Recently, we proposed a risk estimator, namely LSURE [2], that accurately estimates the optimal $\lambda_j$. However, similar to other risk estimators for multiwavelet shrinkage [1], [3], it requires a priori knowledge of the noise power, which may not be available in some practical situations. By using generalized cross validation [4], a risk estimator was proposed in [5] for wavelet shrinkage that does not require a priori knowledge of the noise power. However, we cannot borrow such an estimator directly for multiwavelet shrinkage since the shrinkage function is modified. In this letter, we derive the GCV function for multiwavelet shrinkage. Simulation results verify that the resulting risk estimator gives a better indication for threshold selection than the traditional GCV method. Improved denoising performance is then achieved, particularly for higher multiplicity multiwavelet shrinkage.

Manuscript received August 18, 2003; revised October 27, 2003. This work was supported by a grant from the Research Grant Council of the Hong Kong Special Administrative Region, China, and a grant from the Hong Kong Polytechnic University under Project A418. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Paulo Goncalves. The authors are with the Centre for Multimedia Signal Processing, Department of Electronic and Information Engineering, The Hong Kong Polytechnic University, Kowloon, Hong Kong, China. Digital Object Identifier 10.1109/LSP.2004.827924
1070-9908/04$20.00 © 2004 IEEE

TABLE II (DGHO5) PERFORMANCE OF THE ESTIMATED PARAMETER SET AND THE CORRESPONDING AVERAGE SQUARE ERROR FOR LSURE, RSURE, BGCV AND LGCV ON “HYPCHIRPS” WITH RNR = 7 AND 10 AND SIGNAL SAMPLE LENGTH n = 2

II. GCV FOR HIGHER MULTIPLICITY MULTIWAVELET SHRINKAGE
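To make the operator introduced in Section I concrete before its GCV function is derived, the following is a minimal sketch of multivariate shrinkage at one resolution level. It assumes the soft multivariate rule of [1], in which each coefficient vector w is scaled by (1 − λ/θ) when θ = wᵀV⁻¹w is at least λ and is set to zero otherwise. The function name `multivariate_shrink`, the one-vector-per-row array layout, and the identity default for V are illustrative choices, not taken from the letter.

```python
import numpy as np

def multivariate_shrink(W, lam, V=None):
    """Shrink the coefficient vectors of one level (illustrative sketch).

    W   : (n_j, r) array, one r-element coefficient vector per row
    lam : threshold applied to theta = w^T V^{-1} w
    V   : (r, r) noise covariance; identity if omitted (assumption)
    Returns the shrunk vectors and the theta value of each vector.
    """
    W = np.asarray(W, dtype=float)
    r = W.shape[1]
    V = np.eye(r) if V is None else np.asarray(V, dtype=float)
    # theta_k = w_k^T V^{-1} w_k for every row w_k
    theta = np.einsum("ki,ki->k", W, np.linalg.solve(V, W.T).T)
    # Scale factor (1 - lam/theta) where theta >= lam, zero elsewhere
    factor = np.zeros_like(theta)
    keep = (theta >= lam) & (theta > 0.0)
    factor[keep] = 1.0 - lam / theta[keep]
    return W * factor[:, None], theta
```

Because the whole vector is scaled by a single factor, its direction is preserved and only its length changes, which is what distinguishes this rule from thresholding each element separately.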
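By using GCV, a risk estimator for wavelet shrinkage is given in [5] as follows:

$$ \mathrm{GCV}(\lambda) = \frac{\frac{1}{N}\,\|\hat{\mathbf{w}}_\lambda - \mathbf{w}\|^2}{\left[\frac{1}{N}\operatorname{tr}(\mathbf{I} - \mathbf{A}'_\lambda)\right]^2} \tag{5} $$

where $N$ is the total number of signal samples, $\mathbf{w}$ and $\hat{\mathbf{w}}_\lambda$ are the original and the shrunk transform coefficients, and $\mathbf{A}'_\lambda$ is the so-called derivative influence matrix such that $[\mathbf{A}'_\lambda]_{i,l} = \partial \hat{w}_i / \partial w_l$. It was proved that, by minimizing the GCV score, the threshold thus obtained approaches the optimum asymptotically [5].

The same idea can be applied to the level-dependent multivariate shrinkage. Let us define the GCV function for multivariate shrinkage at level $j$ as

$$ \mathrm{LGCV}_j(\lambda_j) = \frac{\frac{1}{N_j}\sum_{k=1}^{n_j}\|\hat{\mathbf{w}}_{j,k} - \mathbf{w}_{j,k}\|^2}{\left[\frac{1}{N_j}\operatorname{tr}(\mathbf{I} - \mathbf{A}'_j)\right]^2}, \qquad N_j = r\,n_j \tag{6} $$

where $\mathbf{w}_{j,k}$ and $\hat{\mathbf{w}}_{j,k}$ are the $k$th original and shrunk coefficient vectors at level $j$, $n_j$ is the number of coefficient vectors at that level, and $r$ is the multiplicity. The numerator in (6) is the mean square norm of the difference between the shrunk and the original transformed coefficient vectors. To evaluate the denominator of (6), we need to compute the trace of the Jacobian of the influence matrix $\mathbf{A}'_j$ for each resolution level $j$. As indicated in [5], it can be shown that $\operatorname{tr}(\mathbf{A}'_j) = \sum_k \sum_i \partial \hat{w}_{j,k,i} / \partial w_{j,k,i}$. So let us consider the computation of $\partial \hat{w}_i / \partial w_i$ at a particular level $j$. To simplify the elaboration, we skip the index $j$ in the following formulations.

First, we compute the partial derivatives of the shrinkage function defined by (3) and (4), i.e., $\hat{\mathbf{w}} = \mathbf{w}\,(1 - \lambda/\theta)$ for $\theta = \mathbf{w}^T\mathbf{V}^{-1}\mathbf{w} \ge \lambda$ and $\hat{\mathbf{w}} = \mathbf{0}$ otherwise, on a transformed coefficient vector $\mathbf{w}$. Let $\mathbf{w} = [w_1, \ldots, w_r]^T$, i.e., $\mathbf{w} = \sum_{i=1}^{r} w_i \mathbf{e}_i$, then we have

$$ \frac{\partial \theta}{\partial w_i} = 2\,\mathbf{e}_i^T \mathbf{V}^{-1} \mathbf{w} \tag{7} $$

where $\mathbf{e}_i$ is an $r \times 1$ vector with all elements zero except the $i$th element, which is one. For $\theta < \lambda$, the shrunk vector is $\hat{\mathbf{w}} = \mathbf{0}$, so the partial derivative of the $i$th element of $\hat{\mathbf{w}}$ w.r.t. $w_i$ equals zero for $i = 1, \ldots, r$. On the other hand, for $\theta \ge \lambda$,

$$ \frac{\partial \hat{w}_i}{\partial w_i} = \left(1 - \frac{\lambda}{\theta}\right) + \frac{\lambda\, w_i}{\theta^2}\,\frac{\partial \theta}{\partial w_i}. $$

Hence,

$$ \frac{\partial \hat{w}_i}{\partial w_i} = 1 - \frac{\lambda}{\theta} + \frac{2\lambda\, w_i\, \mathbf{e}_i^T \mathbf{V}^{-1} \mathbf{w}}{\theta^2}. \tag{8} $$

Equation (8) only considers a particular element of $\hat{\mathbf{w}}$. The trace of $\mathbf{A}'$, or equivalently of the Jacobian of $\hat{\mathbf{w}}$, is the sum of (8) over all elements of the coefficient vectors that are above the threshold. Since $\sum_{i} w_i\,\mathbf{e}_i^T\mathbf{V}^{-1}\mathbf{w} = \theta$, each such vector contributes $r - (r-2)\lambda/\theta$. That is,

$$ \operatorname{tr}(\mathbf{A}') = r\,n_\lambda - (r - 2)\sum_{k:\,\theta_k \ge \lambda} \frac{\lambda}{\theta_k} \tag{9} $$

where $n_\lambda$ is the number of coefficient vectors with $\theta_k \ge \lambda$. It is seen in the formulation above that the quantity $\operatorname{tr}(\mathbf{A}')$ counts the number of coefficient vectors that are not shrunk to zero, minus a term that is proportional to the average shrinking fraction $\lambda/\theta_k$, which must be obtained anyway in performing the shrinkage. It should be noted that the shrinkage fraction is only required for multiwavelets of higher multiplicity. For bi-variate shrinkage ($r = 2$), the GCV cost function (6) is equivalent to that used in wavelet shrinkage. In other words, the traditional GCV function can be used for bi-variate shrinkage but not for multivariate shrinkage of multiplicity greater than two ($r > 2$). This statement will be verified in Section III. Similar to [5], the computation of the GCV cost function and the shrinkage operation do not require a priori information of the noise power, which is the major improvement over the previous multivariate shrinkage approaches [1], [2].

TABLE III (DGHO5) PERFORMANCE OF THE ESTIMATED PARAMETER SET AND THE CORRESPONDING AVERAGE SQUARE ERROR FOR LSURE, RSURE, BGCV AND LGCV ON “BLOCKS” WITH RNR = 7 AND 10 AND SIGNAL SAMPLE LENGTH n = 2

TABLE IV (DGHO4) PERFORMANCE OF THE ESTIMATED PARAMETER SET AND THE CORRESPONDING AVERAGE SQUARE ERROR FOR LSURE, RSURE, BGCV AND LGCV ON “HYPCHIRPS” WITH RNR = 7 AND 10 AND SIGNAL SAMPLE LENGTH n = 2

TABLE V (DGHO4) PERFORMANCE OF THE ESTIMATED PARAMETER SET AND THE CORRESPONDING AVERAGE SQUARE ERROR FOR LSURE, RSURE, BGCV AND LGCV ON “BLOCKS” WITH RNR = 7 AND 10 AND SIGNAL SAMPLE LENGTH n = 2

III. SIMULATION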
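The level-wise GCV score of Section II can be evaluated directly from the coefficient vectors of one level. The sketch below is one possible reading under stated assumptions: the noise is iid, so that V = σ²I and the threshold can be expressed as t = λσ² on the squared norm ‖w‖² (σ² therefore never has to be known); the score is normalized by the number of scalar coefficients r·n_j; and the threshold is picked by a plain grid search. The names `lgcv_score` and `select_threshold` and the synthetic data are hypothetical.

```python
import numpy as np

def lgcv_score(W, t):
    """Level-wise GCV score for threshold t on ||w||^2 (sketch, iid noise assumed).

    W : (n_j, r) array of coefficient vectors at one level, r >= 2
    t : non-negative threshold, playing the role of lambda * sigma^2
    """
    W = np.asarray(W, dtype=float)
    n, r = W.shape
    energy = np.sum(W**2, axis=1)            # ||w_k||^2
    keep = energy > t                        # vectors not shrunk to zero
    frac = np.ones(n)                        # zeroed vectors lose their whole norm
    frac[keep] = t / energy[keep]            # shrinking fraction lambda/theta
    # ||shrunk - original||^2 = frac^2 * ||w||^2 for every vector
    resid = np.sum(frac**2 * energy)
    # trace of the Jacobian: r per kept vector minus (r - 2) * shrinking fraction
    trace = r * np.count_nonzero(keep) - (r - 2) * np.sum(frac[keep])
    dof = 1.0 - trace / (n * r)
    if dof <= 0.0:                           # score undefined when nothing is shrunk
        return np.inf
    return (resid / (n * r)) / dof**2

def select_threshold(W, candidates):
    """Return the candidate threshold with the smallest score, and that score."""
    scores = [lgcv_score(W, t) for t in candidates]
    best = int(np.argmin(scores))
    return candidates[best], scores[best]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.normal(size=(256, 4))            # noise-only vectors of multiplicity 4
    W[:8] += 10.0                            # a few large "signal" vectors
    grid = list(np.linspace(0.5, 40.0, 80))
    t_hat, score = select_threshold(W, grid)
    print("selected threshold on ||w||^2:", t_hat)
```

Working with t on ‖w‖² rather than with λ and σ² separately is a design choice of this sketch; it reflects that only their product enters both the shrinkage and the score when V = σ²I.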
As the first part of our simulation, we illustrate that multiwavelet shrinkage of higher multiplicity usually gives better denoising performance. Let us denote the settings for multiwavelet denoising using the DGH wavelets of multiplicity 4 and 5 and the corresponding nondecimating orthogonal prefilters as DGHO4 and DGHO5, respectively. The setting for the traditional bi-variate shrinkage using the DGH wavelet of multiplicity 2 and the corresponding prefilter [7] is denoted as GHMXIA. The test signal is “Hypchirps” contaminated with iid noise. The noise power is measured by the root signal-to-noise ratio (RNR) [1]. Table I shows the denoising performance of the above multivariate shrinkage algorithms at different noise levels and signal sample lengths using the optimal threshold. It is seen in the table that the higher the multiplicity, the lower the squared error between the denoised and the original signals. This is to be expected, since a multiwavelet of higher multiplicity gives a higher dimensional view of a signal in the wavelet domain and therefore more information to distinguish noise from signal.

Fig. 1. (DGHO5) Square error function versus bGCV and LGCV for “Hypchirps” with RNR = 7 and signal sample length n = 2 .

We then evaluate the performance of different risk estimators for multiwavelet shrinkage of multiplicity 4 and 5. We compare the performance of the LGCV estimator to i) the recently proposed LSURE [2] for multiwavelet shrinkage with the true value of the noise variance; ii) LSURE [2] with the noise variance estimated by a robust statistical method [8], denoted as RSURE; and iii) the GCV borrowed from traditional wavelet shrinkage (bGCV) [5], all with respect to the theoretical optimal conditions. The bGCV for each level is defined in (10) and (11).
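The difference between the borrowed and the level-dependent GCV lies in the denominator. As an illustration only, assume that bGCV counts every surviving scalar coefficient as one degree of freedom (the usual wavelet soft-threshold result of [5]), while LGCV uses the trace derived in Section II. The two traces can then be compared as below. The helper name `influence_traces` and the use of a threshold t on ‖w‖² (iid noise, t = λσ²) are assumptions of this sketch, not the definitions in (10) and (11).

```python
import numpy as np

def influence_traces(W, t):
    """Return (borrowed_trace, level_trace) for one level (illustrative sketch).

    W : (n_j, r) array of coefficient vectors
    t : positive threshold on ||w||^2
    """
    W = np.asarray(W, dtype=float)
    r = W.shape[1]
    energy = np.sum(W**2, axis=1)
    keep = energy > t                                  # vectors not shrunk to zero
    # Borrowed wavelet rule (assumed): one unit per surviving scalar coefficient
    borrowed = r * int(np.count_nonzero(keep))
    # Multivariate rule: subtract (r - 2) times the shrinking fractions
    level = borrowed - (r - 2) * float(np.sum(t / energy[keep]))
    return borrowed, level
```

Under these assumptions the two traces coincide for multiplicity 2 and separate for higher multiplicity, with the borrowed count overstating the trace most when the threshold is small and many vectors survive with a sizable shrinking fraction.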
For RSURE, we first estimate the noise covariance matrix for the first level transform coefficients in each simulation, then obtain the noise variance by averaging its ratio to the covariance matrix used in denoising. The estimated variance is used for the other resolution levels. It is well known that the observations contain both noise and signal. The selection of the estimation parameter, which controls the rejection of “outliers,” cannot be known a priori. We adopt the suggestion in [8].

Simulation results were obtained by averaging the results from 100 trials of multiwavelet denoising on the test signals “Hypchirps” and “Blocks” contaminated with iid noise with RNR = 7 and 10, with the signal sample length n as listed in the table captions. Tables II–V show the simulation results for the test signals with the DGHO4 and DGHO5 settings. The upper half of each table shows the square differences between the estimated thresholds and the optimal ones, and the lower half shows the corresponding averaged square error of the denoised wavelet coefficients. It is seen that in most cases LGCV is close to LSURE and better than bGCV. On the other hand, when comparing RSURE and LGCV, both of which operate with unknown noise power, the performance of RSURE is somewhat signal and noise power dependent. For a simple test signal such as “Blocks,” the estimation of the noise power in RSURE is very accurate. However, for a more complex signal such as “Hypchirps,” the performance of RSURE is much inferior to that of LGCV.

Fig. 1 shows the square error (SE) function versus the bGCV and LGCV for “Hypchirps” contaminated with iid noise with RNR = 7. We can see that the bGCV gives a large error in small threshold regions and that its minimum is often far away from that of the optimal SE function. This verifies that we cannot borrow the traditional GCV-based risk estimator for multivariate shrinkage of multiplicity greater than two. Similar to the GCV approach for wavelet shrinkage, the LGCV is also biased and does not give a good estimate for very small thresholds. However, this does not introduce much of a problem since the threshold is unlikely to fall in that range. Compared with bGCV, much more accurate thresholds can be found by using LGCV.

IV. CONCLUSION
We derived and verified with simulations a new level-dependent risk estimator, LGCV, for multiwavelet denoising using multivariate shrinkage.
The LGCV does not require a priori information of the noise power and closely resembles the square error function. It gives much more accurate parameters for multiwavelet shrinkage than the traditional GCV method.

REFERENCES
[1] T. R. Downie and B. W. Silverman, “The discrete multiple wavelet transform and thresholding methods,” IEEE Trans. Signal Processing, vol. 46, pp. 2558–2561, Sept. 1998.
[2] T.-C. Hsung and D. P.-K. Lun, “Optimal thresholds for multiwavelet shrinkage,” Electron. Lett., vol. 39, no. 5, pp. 473–474, Mar. 2003.
[3] E. Bala and A. Ertuzun, “Applications of multiwavelet techniques to image denoising,” in Proc. IEEE Int. Conf. Image Processing, vol. 3, New York, Sept. 22–25, 2002, pp. 581–584.
[4] G. Wahba, Spline Models for Observational Data. Philadelphia, PA: SIAM, 1990.
[5] M. Jansen, M. Malfait, and A. Bultheel, “Generalized cross validation for wavelet thresholding,” Signal Process., vol. 56, no. 1, pp. 33–44, Jan. 1997.
[6] G. C. Donovan, J. S. Geronimo, and D. P. Hardin, “Orthogonal polynomials and the construction of piecewise polynomial smooth wavelets,” SIAM J. Math. Anal., vol. 30, no. 5, pp. 1029–1056, 1998.
[7] X. G. Xia, “A new prefilter design for discrete multiwavelet transforms,” IEEE Trans. Signal Processing, vol. 46, pp. 1558–1570, June 1998.
[8] N. A. Campbell, “Robust procedures in multivariate analysis I: Robust covariance estimation,” Appl. Statist., vol. 29, no. 3, pp. 231–237, 1980.