Chinese Journal of Electronics Vol.19, No.4, Oct. 2010
Image Denoising Using Bandelets and Hidden Markov Tree Models∗
ZHANG Wenge1,2, WANG Shuang2,3, LIU Fang1,2, GAO Xinbo2 and JIAO Licheng2,3
(1.School of Computer Science and Technology, Xidian University, Xi’an 710071, China)
(2.Key Laboratory of Intelligent Perception and Image Understanding of Ministry of Education of China)
(3.Institute of Intelligent Information Processing, Xidian University, Xi’an 710071, China)
Abstract: In this paper, both the marginal and joint statistics of the second generation Orthogonal bandelet transform (OBT) coefficients of natural images are first studied, and highly non-Gaussian marginal statistics and strong interscale, interlocation and interdirection dependencies among OBT coefficients are found. Then a Hidden Markov tree (HMT) model in the OBT domain, which can effectively capture the dependencies across scales, locations and directions, is developed. The main contribution of this paper is that it exploits the edge direction information of OBT coefficients and proposes an image denoising algorithm (B-HMT) based on the HMT model in the OBT domain. We apply B-HMT to denoise natural images that are contaminated by additive Gaussian white noise, and experimental results show that B-HMT outperforms the Wavelet HMT (W-HMT) and Contourlet HMT (C-HMT) in terms of visual effect and objective evaluation criteria.
Key words: Image denoising, Bandelets, HMT model, Image modeling, Statistics models, Multiscale geometry analysis.
I. Introduction
In recent years, statistical modeling of images has provided useful prior information for image processing. The aim of statistical modeling is to characterize an image with a small number of parameters, so we seek a model that is simple, accurate and tractable. Recently, combining statistical models with wavelets and Multiscale geometry analysis (MGA) tools has become popular in image processing. Crouse et al. use the HMT to model wavelet coefficients, capture both the marginal and joint statistics, and reach state-of-the-art results in image denoising, classification and detection in Ref.[1], but the lack of directions (the wavelet transform has only three: horizontal, vertical and diagonal) limits its application. To utilize more directions, Po and Do in Ref.[2] use the HMT to model contourlet coefficients and effectively capture the dependencies across scales, locations and directions. Although different directions and a flexible number of directions can be used in the contourlet transform at each scale, they must be set before the transform, and it is difficult to choose an appropriate number of directions for each scale. We therefore need a transform that offers many candidate directions and can adaptively choose the best one.
The best candidate is bandelets[3−7]. The bandelet transform is a new MGA tool that adaptively captures the geometry of a regular image along its edges. It offers many candidate directions for each edge and shows potential for image denoising. Like wavelets and MGA tools such as ridgelets, curvelets, contourlets and directionlets, it is multiscale and time-frequency localized. Unlike them, it is adaptive, and both its number of selectable directions and its nonlinear approximation error decay ($O(M^{-\alpha})$, $\alpha \ge 2$) outperform theirs. Wedgelets, another adaptive tool, are beyond the scope of this paper because both their nonlinear approximation error decay and their number of selectable directions are inferior to those of bandelets.
Motivated by W-HMT and C-HMT, we develop a statistical model by using the HMT in the OBT domain. The OBT can adaptively choose the optimal direction, and the HMT with a Gaussian mixture model can capture the dependencies across scales and locations. Combining them and adding local direction information, we propose the B-HMT algorithm and use it to denoise natural images contaminated by additive Gaussian white noise. Intuitively, the denoising results should improve.
The remainder of this paper is organized as follows. Section II defines three important relationships of OBT coefficients. Section III studies the marginal and joint statistics of OBT coefficients. Section IV models OBT coefficients by applying the HMT model in the OBT domain. Section V gives the denoising process, and Section VI analyzes the experimental results. Conclusions are drawn in Section VII.
∗ Manuscript Received May 2009; Accepted Apr. 2010. This work is supported in part by the National Natural Science Foundation of China (No.60702062, No.60703109, No.60970066, No.60972148 and No.60971128), the National High Technology Research and Development Program (863 Program) of China (No.2009AA12Z210, and No.2008AA01Z125), the National Research Foundation for the Doctoral Program of Higher Education of China (No.200807010003), the National Science and Technology Ministry of China (No.XADZ2008159), the Basic Science Research Fund in Xidian University of China (No.JY10000902001), the Fund for Foreign Scholars in University Research and Teaching Programs (the 111 Project) (No.B07048).

II. Bandelet Coefficient Relationships Definition
For each OBT coefficient B, we define three relationships that are used to study its statistics.
Parent-children relationship (PB): the coefficient at the same spatial location in the immediately coarser scale is its parent.
Neighbor relationship (NB): the eight adjacent coefficients in the same subband are its neighbors.
Colleague relationship (CB): the coefficients in the same directional subsquare, which share a common direction, are its colleagues.
A wavelet subband has only one direction and a contourlet subband has pre-designated directions, whereas the bandelet transform segments a subband into many subsquares, and each subsquare has many directions to choose from. For example, a subsquare of size 8 × 8 has 72 selectable directions, and one of size 16 × 16 has 288. By using the directional information of each subsquare efficiently, B-HMT should perform better than W-HMT and C-HMT. The directional information produced during the quadtree segmentation is used to exhibit the colleague relationship clearly.
Fig.1 shows the three relationships. Fig.1(a) shows that each child has one parent and each parent has four children. Fig.1(c) shows part of the geometry of a subband of the natural image “Barbara”, in which black marks the node under consideration, each differently shaded gray subsquare denotes an edge (Inf denotes a smooth area, that is, a subsquare without an edge), and the value in each subsquare denotes its direction. Fig.1(d) shows the corresponding bandelet coefficients. If B is one coefficient in a subsquare of the OBT, the other coefficients in this subsquare are its colleagues, and B and its colleagues share one common direction. Thus the OBT has many directional subsquares, each with many candidate directions, which gives it more directional selectivity than the wavelet and contourlet transforms.

Fig. 1. Bandelet coefficient relationships. (a) parent-children; (b) neighbors; (c) colleagues of geometry; (d) colleagues of OBT

III. Bandelet Statistics
1. Marginal statistics
To investigate the marginal statistics of the OBT coefficients of natural images, Fig.2 shows the histograms of the three finest subbands HH, HL and LH of the image “Barbara” and of one 16 × 16 subsquare from each of them. The distributions of the three finest subbands are highly non-Gaussian: they have a sharp peak at zero and heavy tails on both sides. They also show that the OBT is sparse: most coefficients are close to zero, while a minority of coefficients carry most of the energy. Their kurtosis values are 30.0327, 27.034 and 68.0335, respectively, which are much higher than the kurtosis of 3 of a Gaussian distribution. The distributions of the subsquares are also non-Gaussian, with kurtosis values of 4.4308, 4.2368 and 25.376, respectively. This is a difference between the OBT and the wavelet transform: the directional subsquares of the OBT also show non-Gaussian statistics, which makes it possible to model the directional information of each subsquare with a two-state, zero-mean Gaussian mixture model. We observe the same phenomenon in all subbands and subsquares of other natural images. We therefore conclude that the marginal distributions of the subbands and subsquares of natural images in the OBT domain are non-Gaussian.

Fig. 2. Histograms of three finest subbands HH, HL, LH and their 16×16 subsquares of the image “Barbara”, respectively. Kurtosis: (a) 30.0327; (b) 27.034; (c) 68.0335; (d) 4.4308; (e) 4.2368; (f) 25.376

2. Joint statistics
The marginal statistics describe only the individual behavior of OBT coefficients and do not consider their dependencies, so we study their joint statistics on the natural image “Barbara” from three aspects: interscale, interlocation and interdirection. Fig.3 shows the distribution of bandelet coefficients conditioned on their parents (PB), neighbors (NB) and colleagues (CB). Each conditional distribution reflects one important characteristic of OBT coefficients.

Fig. 3. Conditional distribution of a finest subband of “Barbara”, conditioned on (a) parent P(B|PB); (b) neighbor P(B|NB); (c) colleague P(B|CB)

Persistency: the parent-children dependencies in Fig.3(a) show that large/small values of OBT coefficients tend to propagate through the scales of the quadtree; the children tend to be large if their parent is large.
Clustering: the dependency between the current coefficient B and its neighbors NB shows that OBT coefficients cluster, that is, if the neighbors of an OBT coefficient are large/small, it also tends to be large/small, as Fig.3(b) shows.
Similarity: the dependencies between the current coefficient B and its colleagues CB reveal that the OBT coefficients in a directional subsquare are highly similar. In a non-edge subsquare, the coefficients are homogeneous and have similar magnitudes; in an edge subsquare, the coefficients that form the edge tend to have similar large values, and the coefficients that do not form the edge tend to have similar small values, as Fig.3(c) shows.
From the joint statistics of OBT coefficients, we conclude that there exist strong interscale, interlocation and interdirection dependencies among the OBT coefficients. How to model these dependencies is our task in the subsequent sections.
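The kurtosis comparison used in the marginal-statistics discussion can be illustrated with a minimal sketch. It assumes NumPy and uses synthetic Laplacian samples as a hypothetical stand-in for sparse transform coefficients; it does not compute an OBT, and the function name `kurtosis` is chosen here for illustration only.

```python
import numpy as np

def kurtosis(samples):
    """Non-excess (Pearson) kurtosis E[(x - mu)^4] / sigma^4; a Gaussian gives 3."""
    x = np.asarray(samples, dtype=float).ravel()
    centered = x - x.mean()
    variance = np.mean(centered ** 2)
    return float(np.mean(centered ** 4) / variance ** 2)

rng = np.random.default_rng(0)
gaussian = rng.normal(0.0, 1.0, size=200_000)
# Assumption: Laplacian samples mimic a sharp peak at zero with heavy tails.
laplacian = rng.laplace(0.0, 1.0, size=200_000)

k_gauss = kurtosis(gaussian)
k_laplace = kurtosis(laplacian)
print(f"Gaussian kurtosis:  {k_gauss:.2f}")
print(f"Laplacian kurtosis: {k_laplace:.2f}")
```

A kurtosis clearly above 3, as reported for the subbands and subsquares in Fig.2, indicates heavier tails than a Gaussian with the same variance.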
IV. Image Modeling
Our objective is to develop a probability model that captures both the non-Gaussian statistics and the complex dependencies of OBT coefficients while remaining easy to process. These requirements can be met by the HMT model introduced by Crouse et al. in Ref.[1]. The HMT model is a Hidden Markov model (HMM) that uses a quadtree structure. HMMs have been used successfully in the wavelet domain and the contourlet domain[1,2]. An n-state HMM links each coefficient with a hidden state variable, so each coefficient can be characterized by an n-dimensional state probability vector p and an n-dimensional standard deviation vector σ:
$$p = (p_1, p_2, \cdots, p_n)^T \tag{1}$$
$$\sigma = (\sigma_1, \sigma_2, \cdots, \sigma_n)^T \tag{2}$$
where 1, 2, · · · , n denote the states.
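As a rough illustration of how the vectors p and σ of Eqs.(1) and (2) characterize a coefficient, the sketch below draws a hidden state from p and then a zero-mean Gaussian value with the standard deviation of that state. The two-state values of `p` and `sigma` are hypothetical, and the coefficients are sampled independently, so the tree structure of the HMT is not modeled here.

```python
import numpy as np

def sample_mixture(rng, p, sigma, size):
    """Draw hidden states from p, then zero-mean Gaussian values with sigma[state]."""
    p = np.asarray(p, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    states = rng.choice(len(p), size=size, p=p)        # hidden state per coefficient
    values = rng.normal(0.0, 1.0, size=size) * sigma[states]
    return states, values

# Hypothetical parameters: index 0 is the "small" state, index 1 the "large" state.
p = np.array([0.8, 0.2])
sigma = np.array([1.0, 10.0])

rng = np.random.default_rng(0)
states, values = sample_mixture(rng, p, sigma, 100_000)

# For a zero-mean mixture, the overall variance is sum_m p_m * sigma_m^2.
theoretical_var = float(np.sum(p * sigma ** 2))
empirical_var = float(values.var())
print(theoretical_var, empirical_var)
```

Samples generated this way have a sharp peak at zero (from the small state) and heavy tails (from the large state), which is the kind of marginal behavior the mixture is meant to capture.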
We choose the HMT to model the statistics of OBT coefficients for four reasons. First, by linking the hidden state of each OBT coefficient to those of its four children, the HMT directly models the interscale (parent-children) dependencies of OBT coefficients. Second, by linking neighbors to a common ancestor, the HMT indirectly models the interlocation dependencies of neighboring OBT coefficients. Third, by making all coefficients of a subsquare share one σ, the HMT also models the interdirection dependencies among colleagues. This differs from W-HMT, in which a whole subband shares a common σ, so our model uses more local directional information than W-HMT, is better targeted and, intuitively, should produce a better result. Last but not least, the HMT model in the OBT domain can be trained efficiently with the EM algorithm[8], and the two states of its mixture model correspond to the edges and the smooth areas of natural images, respectively.

1. Capturing non-Gaussian: Gaussian mixture models
To capture the non-Gaussian statistics of OBT coefficients, the B-HMT models the marginal probability density function (pdf) of each OBT coefficient as a Gaussian mixture density, using a hidden state to indicate whether the coefficient is large or small. In practice, a two-state, zero-mean Gaussian mixture model is often used to characterize each OBT coefficient. Fig.4 shows an example of a two-state, zero-mean Gaussian mixture model for a random variable B, where $p_S(1)$ and $1 - p_S(1)$ form the probability mass function of the state variable S, and $\sigma_1^2$ and $\sigma_2^2$ are the variances of the Gaussian pdfs corresponding to the two states.

Fig. 4. An example of two-state, zero-mean Gaussian mixture model for a random variable B

The pdf of an OBT coefficient is
$$f_B(b) = \sum_{m=1,2} p_S(m)\, f_{B|S}(b \mid S = m) \tag{3}$$
where $\sum_{m=1,2} p_S(m) = 1$ and S is the hidden state variable, which controls the amplitude of the OBT coefficient and cannot be observed. A coefficient is small when m = 1; otherwise it is large. Modeling each OBT coefficient in this way fits real OBT coefficients closely.

2. Capturing dependencies: HMT modeling
To capture the interscale, interlocation and interdirection dependencies among OBT coefficients, the B-HMT uses a probabilistic tree to model Markovian dependencies between the hidden states. For an OBT decomposition with J scales, K subbands and L directional subsquares, a B-HMT model contains the following parameters.
$P_{1,k}$ (k = 1, 2, 3): the root state probability vector of subband k at the coarsest scale.
$A_{j,k}$ (j = 2, · · · , J, k = 1, 2, 3): the state transition probability matrix of subband k from scale j − 1 to scale j.
$\sigma_{j,k,l}$ (j = 2, · · · , J, k = 1, 2, 3, l = 1, · · · , L): the Gaussian standard deviation vector of directional subsquare l in subband k of scale j.
The major advantage of the HMT in the OBT domain is that it uses the direction dependency within each directional subsquare, which the W-HMT model does not. In the W-HMT model, each subband has only one direction (vertical, horizontal or diagonal), so it lacks direction information. In the C-HMT model, the number of directions in a subband is designated before the transform, and it is difficult to designate an appropriate number of directions for all subbands at the very start. In contrast, B-HMT segments the coefficients of a subband into many subsquares, and each subsquare has many directions to choose from. Intuitively, it can therefore use local directional information better than W-HMT and C-HMT, and it may outperform them in applications.

V. Denoising Application
1. Obtain the noisy coefficients: OBT
We apply the OBT to the noisy image and obtain the noisy coefficients. Before the transform, the noisy image must be made square and normalized, and the transform parameters must be set. We decompose the noisy image to the second level, set the maximum size of a decomposed subsquare to 16 × 16 and the minimum size to 4 × 4, set the threshold to $T = \sigma\sqrt{2\log N}$, where N is the size of the image, and calculate the geometry, which records the directional information of each subsquare. In the OBT domain, the relationship among the noisy coefficients y, the clean coefficients x and the zero-mean additive white Gaussian noise e is
$$y = x + e \tag{4}$$
so the denoising problem is to estimate x given y.

2. Obtain model parameters: HMT modeling and EM training
We fit the HMT model to the noisy OBT coefficients and obtain a parameter set $\theta_y$. We first use a two-state, zero-mean Gaussian mixture model to characterize each OBT coefficient, use y to initialize the HMT model, and use the variance in each subsquare, which represents the directional information of the subsquare, to update the initial variance at the corresponding location. Then we use the Markov tree to capture the interscale, interlocation and interdirection dependencies, and use the EM algorithm to train the model and obtain the parameter set $\theta_y$. By subtracting the noise variances from the variances in $\theta_y$, we obtain a model $\theta_x$ of the clean OBT coefficients:
$$(\sigma^{(x)}_{(i,j,k,n),m})^2 = \left( (\sigma^{(y)}_{(i,j,k,n),m})^2 - (\sigma^{(e)}_{(i,j,k,n)})^2 \right)_+ \tag{5}$$
where (i, j, k, n) denotes the n-th bandelet coefficient in scale i, subband j and directional subsquare k with state m, and $(z)_+ = z$ for $z \ge 0$ and $(z)_+ = 0$ for $z < 0$. Because of the “tying” of OBT coefficients in each subband, and considering the directional information in each subsquare, all coefficients in subsquare k have the same variance $(\sigma^{(y)}_{i,j,k})^2$. The noise variance $(\sigma^{(e)}_{i,j,k})^2$ can be estimated with the robust median estimator in the finest subband of the OBT, which is also used for the wavelet transform in Ref.[9]:
$$\sigma^{(e)}_{(i,j,k,n)} = \frac{\mathrm{Median}(|Y_{ij}|)}{0.6745}, \quad Y_{ij} \in HH_1 \tag{6}$$
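The noise estimate of Eq.(6), the threshold $T = \sigma\sqrt{2\log N}$ and the variance subtraction of Eq.(5) can be sketched as follows. This is a minimal sketch under stated assumptions: the array `hh1` is a synthetic, pure-noise stand-in for the finest diagonal subband, N is taken to be the total number of pixels, the logarithm is taken to be natural, and all function names and state variances are hypothetical rather than taken from the paper.

```python
import numpy as np

def estimate_noise_sigma(finest_diagonal_coeffs):
    """Robust median estimator of Eq.(6): median(|Y|) / 0.6745."""
    return float(np.median(np.abs(finest_diagonal_coeffs)) / 0.6745)

def universal_threshold(sigma, n_pixels):
    """T = sigma * sqrt(2 log N); N assumed to be the pixel count, log assumed natural."""
    return float(sigma * np.sqrt(2.0 * np.log(n_pixels)))

def clean_variance(noisy_var, noise_var):
    """Eq.(5): subtract the noise variance and keep the positive part."""
    return np.maximum(np.asarray(noisy_var, dtype=float) - noise_var, 0.0)

rng = np.random.default_rng(1)
true_sigma = 20.0
# Assumption: a pure-noise array stands in for the HH1 subband of a noisy image.
hh1 = rng.normal(0.0, true_sigma, size=(256, 256))

sigma_e = estimate_noise_sigma(hh1)
threshold = universal_threshold(sigma_e, 512 * 512)

# Hypothetical per-state variances of the noisy model for one subsquare.
noisy_state_vars = np.array([300.0, 2500.0])
clean_state_vars = clean_variance(noisy_state_vars, sigma_e ** 2)
print(sigma_e, threshold, clean_state_vars)
```

The positive part matters for the small state: when the trained variance is below the estimated noise variance, the clean variance is clipped to zero instead of becoming negative.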
3. Obtain the clean coefficients: Bayesian estimation
With the HMT model $\theta_x$, we can estimate $E[x_{i,j,k,n} \mid y_{i,j,k,n}, \theta_x]$ for each OBT coefficient $b_{i,j,k,n}$ by Bayesian estimation. When the state $S_{i,j,k,n}$ of an OBT coefficient is given, the conditional estimate follows from the formula proposed in Ref.[1]:
$$E[x_{i,j,k,n} \mid y_{i,j,k,n}, \theta_x, S_{i,j,k,n} = m] = \frac{(\sigma^{(x)}_{(i,j,k,n),m})^2}{(\sigma^{(x)}_{(i,j,k,n),m})^2 + (\sigma^{(e)}_{(i,j,k,n)})^2}\, y_{i,j,k,n} \tag{7}$$
Then we use the state probabilities $p(S_{i,j,k,n} = m \mid y_{i,j,k,n}, \theta_x)$, which are obtained from the HMT training, to estimate each clean coefficient:
$$E[x_{i,j,k,n} \mid y_{i,j,k,n}, \theta_x] = \sum_m p(S_{i,j,k,n} = m \mid y_{i,j,k,n}, \theta_x) \times \frac{(\sigma^{(x)}_{(i,j,k,n),m})^2}{(\sigma^{(x)}_{(i,j,k,n),m})^2 + (\sigma^{(e)}_{(i,j,k,n)})^2}\, y_{i,j,k,n} \tag{8}$$
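The estimator of Eqs.(7) and (8) can be sketched for a handful of coefficients. One simplification is made and should be read as an assumption: the state probabilities are computed from an independent two-state mixture via Bayes' rule, not from the trained HMT (which would use the tree's upward-downward recursions). All parameter values and function names are hypothetical.

```python
import numpy as np

def gaussian_pdf(y, var):
    """Zero-mean Gaussian density with variance var."""
    return np.exp(-0.5 * y ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def state_posteriors(y, state_probs, clean_vars, noise_var):
    """Independent-mixture stand-in for p(S = m | y, theta_x)."""
    y = np.asarray(y, dtype=float)[..., None]          # shape (..., 1)
    noisy_vars = clean_vars + noise_var                # variance of y in each state
    weighted = state_probs * gaussian_pdf(y, noisy_vars)
    return weighted / weighted.sum(axis=-1, keepdims=True)

def shrink(y, posteriors, clean_vars, noise_var):
    """Eq.(8): posterior-weighted sum of the per-state gains of Eq.(7)."""
    gains = clean_vars / (clean_vars + noise_var)
    return np.asarray(y, dtype=float) * (posteriors * gains).sum(axis=-1)

# Hypothetical two-state parameters (index 0 = small state, index 1 = large state).
state_probs = np.array([0.8, 0.2])
clean_vars = np.array([1.0, 400.0])
noise_var = 25.0

y = np.array([0.5, 3.0, 40.0])
post = state_posteriors(y, state_probs, clean_vars, noise_var)
x_hat = shrink(y, post, clean_vars, noise_var)
print(post)
print(x_hat)
```

Small observations are attributed mostly to the small state and are shrunk strongly toward zero, while large observations are attributed to the large state and are kept almost unchanged.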
4. Obtain the reconstructed image: inverse OBT
We use the quadtree directional information and the other transform parameters to apply the inverse OBT to the clean coefficients x, which yields the final reconstructed image.
VI. Experimental Results Analysis
1. Experimental results
Two standard test images, Barbara and Lena, of size 512 × 512 are used in our experiments. They are contaminated by additive Gaussian white noise with standard deviations of 10, 20, 30, 40 and 50. We compare the denoising results of our method (B-HMT) with those of the following methods: (1) global hard-threshold in the OBT domain (B-HT), (2) W-HMT, (3) C-HMT. B-HT and B-HMT use $T = \sigma\sqrt{2\log N}$ as the initial threshold, and B-HT also uses $T = \sigma\sqrt{2\log N}$ as the denoising threshold, where N is the size of the image. The OBT decomposes the image to the second scale, the size of the minimum subsquare is 4 × 4, and the maximum is 16 × 16. The HMT uses a two-state, zero-mean Gaussian mixture model to characterize each coefficient, and the training precision is $10^{-4}$. If the width of the image is greater than 256, the coarsest 16 × 16 coefficients are treated as the scale coefficients and left unchanged; otherwise the coarsest 2 × 2 coefficients are treated as the scale coefficients and left unchanged. The denoising performance is evaluated in terms of visual effect and the Peak signal-to-noise ratio (PSNR), which is defined as
$$\mathrm{PSNR} = 10 \log_{10} \frac{255^2}{\left( \sum |X - Y|^2 \right) / N} \tag{9}$$
where X is the original image, Y is the reconstructed image, and N is the size of the image. In theory, the higher the PSNR, the better the denoising effect. The experimental results are shown in Fig.5, Fig.6 and Table 1, with the best results highlighted in bold font.

Table 1. Comparison of performance for image denoising using different algorithms in terms of PSNR
Images   σ   Noisy   B-HT    W-HMT   C-HMT   B-HMT
Barbara  10  28.359  28.771  31.553  31.248  31.588
Barbara  20  23.211  25.984  27.982  27.875  28.169
Barbara  30  20.658  25.476  26.097  26.000  26.561
Barbara  40  19.091  23.478  24.221  23.723  24.503
Barbara  50  18.203  23.024  23.497  23.349  23.736
Lena     10  28.227  30.9    33.667  33.12   33.533
Lena     20  23.656  28.607  30.138  29.797  30.319
Lena     30  21.472  26.479  28.174  28.005  28.46
Lena     40  20.132  25.705  27.105  26.872  27.652
Lena     50  19.216  24.662  26.304  25.758  26.54
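The PSNR of Eq.(9) can be computed with a short function. The sketch below assumes 8-bit images (peak value 255), takes N as the number of pixels, and uses a synthetic image as a hypothetical stand-in for a test image; the function name `psnr` is chosen here for illustration.

```python
import numpy as np

def psnr(original, reconstructed, peak=255.0):
    """PSNR in dB: 10 * log10(peak^2 / MSE), with MSE averaged over all pixels."""
    x = np.asarray(original, dtype=float)
    y = np.asarray(reconstructed, dtype=float)
    mse = np.mean((x - y) ** 2)
    if mse == 0:
        return float("inf")                # identical images
    return float(10.0 * np.log10(peak ** 2 / mse))

rng = np.random.default_rng(2)
# Assumption: a synthetic image; no clipping to [0, 255] is applied after adding noise.
clean = rng.uniform(50.0, 200.0, size=(512, 512))
noisy = clean + rng.normal(0.0, 10.0, size=clean.shape)

value = psnr(clean, noisy)
print(f"PSNR of the noisy image: {value:.2f} dB")
```

Without clipping, the PSNR of a noisy image with noise standard deviation σ is close to $10\log_{10}(255^2/\sigma^2)$, which is about 28.13 dB for σ = 10 and is in line with the "Noisy" column of Table 1.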
Fig. 5. Denoising results of “Barbara” image. (a) “Barbara” image; (b) noisy image (noise standard deviation = 40, PSNR = 19.091 dB); (c) B-HT (PSNR = 23.478 dB); (d) W-HMT (PSNR = 24.221 dB); (e) C-HMT (PSNR =23.723 dB); (f ) our method: B-HMT (PSNR= 24.503 dB)
2. Results analysis
The edge preservation of our method is better than that of W-HMT and C-HMT because we use the OBT, which approximates edges optimally. It segments each high-frequency subband into many subsquares, and each subsquare contains at most one edge and has many directions to select from. This makes the edge approximation of B-HMT better than that of W-HMT, which has only three directions to select from, and that of C-HMT, which decomposes the image into fixed scales with fixed directions.
Fig. 6. Denoising results of “Lena” image. (a) “Lena” image; (b) noisy image (noise standard deviation = 40, PSNR = 20.132 dB); (c) B-HT (PSNR = 25.705 dB); (d) W-HMT (PSNR = 27.105 dB); (e) C-HMT (PSNR = 26.872 dB); (f ) our method: B-HMT (PSNR= 27.652 dB)
The denoising results of B-HMT are also better than those of W-HMT and C-HMT, because we model the interscale, interlocation and interdirection dependencies at the same time. In particular, modeling each directional subsquare uses the local directional information effectively and is well targeted. The good denoising performance of our method is also reflected in the PSNR. Table 1 shows that the PSNR of our method is higher than that of the other methods in all cases except Lena at σ = 10. This is because our method models the dependencies of parent-children, neighbors and colleagues, so it captures the dependencies across scales, locations and directions. The local directional information allows B-HMT to obtain a better denoising result than W-HMT and C-HMT. At the lowest noise level (σ = 10), the PSNR of B-HMT for Lena is lower than that of W-HMT. This is probably because W-HMT is a classical statistical model that reaches state-of-the-art denoising results, whereas our goal is to build an HMT in the OBT domain, so we simply use the universal threshold $T = \sigma\sqrt{2\log N}$ as the initial threshold and do not tune the other initial parameters of B-HMT. As a result, the PSNR of B-HMT is sometimes lower than that of W-HMT.
VII. Conclusion
By using the HMT to model OBT coefficients, this paper proposed an image denoising algorithm, B-HMT, which makes full use of the subsquare direction information of the OBT and captures the dependencies across scales, locations and directions. The denoising results of B-HMT for additive Gaussian white noise outperform those of W-HMT and C-HMT in terms of visual effect and objective evaluation criteria. Finding the best initial threshold for the geometry calculation and optimizing the initial parameters, in order to obtain a better visual effect and a higher performance index, are left for further
research.

References
[1] M. Crouse, R.D. Nowak, R.G. Baraniuk, “Wavelet-based signal processing using hidden Markov models”, IEEE Trans. Signal Process., Vol.46, No.4, pp.886–902, 1998.
[2] Duncan D.Y. Po, Minh N. Do, “Directional multiscale modeling of images using the contourlet transform”, IEEE Trans. Image Processing, Vol.15, No.6, pp.1610–1620, 2006.
[3] W.G. Zhang, F. Liu, L.C. Jiao, “SAR image despeckling via bilateral filtering”, The IET Electronics Letters, Vol.45, No.15, pp.781–783, 2009.
[4] E. Pennec, S. Mallat, “Sparse geometric image representations with bandelets”, IEEE Transactions on Image Processing, Vol.14, No.4, pp.423–438, 2005.
[5] E. Pennec, S. Mallat, “Bandelets image approximation and compression”, SIAM Journal of Multiscale Modeling and Simulation, Vol.4, No.3, pp.992–1039, 2005.
[6] S. Mallat, G. Peyré, “A review of Bandelets methods for geometrical image representation”, Numerical Algorithms, Vol.44, No.3, pp.205–234, 2007.
[7] Wenge Zhang, Fang Liu, Licheng Jiao et al., “SAR image despeckling based on edge detection and feature clustering in bandelets domain”, IEEE Geoscience and Remote Sensing Letters, Vol.7, No.1, pp.131–135, 2010.
[8] A.P. Dempster, N.M. Laird, D.B. Rubin, “Maximum likelihood from incomplete data via the EM algorithm”, J. R. Stat. Soc., Vol.39, pp.1–38, 1977.
[9] D.L. Donoho, I.M. Johnstone, “Ideal spatial adaptation via wavelet shrinkage”, Biometrika, Vol.81, pp.425–455, 1994.

ZHANG Wenge was born in Shaanxi Province, China in 1969. He received the B.S. degree from the Electronic Engineering Institute of PLA, Hefei, China in 1990 and the M.S. degree in computer network from the School of Computer Science and Technology of Xidian University, Xi’an, China, in 2006. Currently, he is a Ph.D. candidate at Xidian University. His research interests include the representation and application of multiscale geometry analysis, image denoising and SAR image processing.

WANG Shuang was born in Shaanxi Province, China in 1978. She received the B.S. degree in electrical engineering in 2000, and the M.S. degree in computer applications and technology and the Ph.D. degree in circuit and system from Xidian University in 2003 and 2007, respectively. Currently, she is an associate professor at Xidian University. Her research interests are multiscale geometry analysis, image processing and high-resolution SAR image processing. (Email: [email protected])

LIU Fang was born in Beijing, China in 1963. She received the B.S. degree in computer science and technology from Xi’an Jiaotong University in 1984 and the M.S. degree in computer science and technology from Xidian University in 1995. Currently, she is a professor at Xidian University. Her research interests include machine learning, optimization problems, image processing and high-resolution SAR image processing.

GAO Xinbo was born in Shandong Province, China in 1972. He received the B.S., M.S. and Ph.D. degrees in signal and information processing from Xidian University, China, in 1994, 1997 and 1999, respectively. Currently, he is a professor at Xidian University. His research interests include machine learning and multimedia information processing and analysis.

JIAO Licheng was born in Shaanxi Province, China in 1959. He received the B.S. degree from Shanghai Jiaotong University, Shanghai, China in 1982 and the M.S. and Ph.D. degrees from Xi’an Jiaotong University, Xi’an, China in 1984 and 1990, respectively. His current research interests include signal and image processing, nonlinear circuit and systems theory, learning theory and algorithms, optimization problems, wavelet theory, and data mining.