Image Denoising by Exploring External and Internal Correlations

IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 24, NO. 6, JUNE 2015


Huanjing Yue, Xiaoyan Sun, Senior Member, IEEE, Jingyu Yang, Member, IEEE, and Feng Wu, Fellow, IEEE

Abstract— Single-image denoising is limited by the small amount of data available within one noisy image. In this paper, we propose a novel image denoising scheme that exploits both internal and external correlations with the help of web images. For each noisy patch, we build internal and external data cubes by finding similar patches in the noisy image and in the web images, respectively. We then reduce noise with a two-stage strategy that uses different filtering approaches in each stage. In the first stage, because the noisy patch may lead to inaccurate patch selection, we propose a graph-based optimization method to improve patch matching accuracy in external denoising, while internal denoising performs frequency truncation on the internal cubes. Combining the internal and external denoised patches yields a preliminary denoising result. In the second stage, we reduce noise by filtering the external and internal cubes, respectively, in the transform domain. In this stage, the preliminary denoising result not only improves patch matching accuracy but also provides reliable estimates of the filtering parameters. The final denoised image is obtained by fusing the external and internal filtering results. Experimental results show that our method consistently outperforms state-of-the-art denoising schemes in both subjective and objective quality measurements; e.g., it achieves a gain of more than 2 dB over BM3D across a wide range of noise levels.

Index Terms— Image denoising, external correlations, internal correlations, web images.

I. INTRODUCTION

Image denoising is a well-known ill-posed problem in image processing and computer vision. Theoretically, it is hard to recover an image precisely from its noisy observation, since the problem is highly under-constrained. During the past few decades, many methods have been proposed to improve single-image based denoising. From pixel-level filtering methods, such as Gaussian filtering, bilateral filtering and total variation regularization, to patch-level filtering methods, such as non-local means [1], block-matching 3D filtering (BM3D) [2], and low-rank regularization [3], single-image based denoising performance has improved greatly, and image details are well recovered when the image is only slightly noisy. However, as the noise level increases, denoising performance drops sharply. The reason is that, although patch-based denoising methods differ in how they improve performance, they share the same strategy: grouping similar patches together and then recovering their common structures. When the noise level is high, patch matching accuracy drops significantly, which leads to over-smoothed denoising results.

Besides single-image based methods, another promising family of denoising methods is learning based, such as fields of experts [4], expected patch log likelihood (EPLL) maximization [5], and neural network training [6]. These methods restore the noisy image by integrating natural image priors into the under-constrained restoration problem. However, their denoising performance is only comparable with that of state-of-the-art single-image based methods such as BM3D. Notably, they use the same database for all kinds of noisy images; that is, no prior about the scene of the noisy image is used. If the noisy image does not satisfy the assumed general priors, annoying artifacts result. This raises a natural question: can we adapt the images in the dataset to the scene of each noisy image? In fact, there are many scenarios in which correlated images can be obtained as an external dataset, for example, landmark images, human faces, medical CT images, text images, and images captured by multi-view or multi-spectral camera systems. In this paper, we demonstrate the feasibility of the proposed scheme using landmark and multi-view images.

Manuscript received July 27, 2014; revised December 10, 2014 and February 28, 2015; accepted February 28, 2015. Date of publication March 12, 2015; date of current version March 27, 2015. This work was supported in part by the Distinguished Young Scholars Program under Grant 61425026, in part by the Normal Project under Grant 61372084, in part by the Major Project under Grant 61390514 through the National Natural Science Foundation of China, and in part by the Reserved Peiyang Scholar Program through Tianjin University, Tianjin, China. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Javier Mateos. (Corresponding authors: Xiaoyan Sun and Jingyu Yang.) H. Yue and J. Yang are with Tianjin University, Tianjin 300072, China. X. Sun is with Microsoft Research Asia, Beijing 100080, China. F. Wu is with the University of Science and Technology of China, Hefei 230026, China. Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TIP.2015.2412373
Correlated images have previously been exploited in many computer vision and image processing problems, such as image colorization [7], image completion [8], [9], image compression [10], sketch-to-photo synthesis [11], [12], image super-resolution [13], [14], deblurring [15] and denoising [16]. However, [16] explores only external correlations, without exploiting internal correlations. Based on these observations, we propose combined image denoising that explores both internal and external correlations, as an extension of our previous work [17]. Specifically, we propose a graph optimization method to improve patch matching accuracy and introduce a more effective filtering method than that of [17]. Compared with [16], our scheme can better take advantage of correlated images captured under different settings (focal length, viewpoint, resolution). Furthermore, our scheme can handle noisy patches that have no matched patches in the external dataset.

1057-7149 © 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

There are two key technical contributions in this paper. First, for each noisy patch, we design different external and internal filtering strategies to remove its noise. In external denoising, we propose a graph-based optimization method to improve patch matching accuracy between a noisy patch and clean patches in external correlated images. In internal denoising, we perform 3D frequency-domain filtering. The two denoising results are then combined in the frequency domain to produce a preliminary denoised image. Second, we propose a two-stage denoising strategy to take full advantage of external and internal correlations. The first-stage denoising result is used to improve image registration, patch matching and the estimation of filtering parameters in the second stage.

The remainder of this paper is organized as follows. Section II gives a brief overview of related work. The proposed framework is described in Section III. Section IV discusses the preprocessing for denoising, namely correlated image retrieval and global registration. Sections V and VI present the methods for exploring external and internal correlations in the first and second stages, respectively. Section VII presents experimental results and analysis. Section VIII discusses limitations and future work. Section IX concludes the paper.

II. RELATED WORK

In this section, we give an overview of existing methods for removing Gaussian noise. Generally, they can be classified into single-image based methods, which exploit the correlations inside the noisy image itself (internal correlations), and learning based methods, which use priors learned from natural images to help denoise (external correlations). Recently, combining internal and external correlations for denoising has been shown to outperform exploiting either type of correlation separately [17]–[19].
In the following, we briefly introduce the three main types of denoising methods.

A. Single Image Based Denoising

Single image based denoising methods aim to recover image details by exploiting the correlations inside the noisy image itself. Early approaches focus on pixel-level denoising, which only explores the correlations between a noisy pixel and its neighboring pixels, such as Gaussian filtering, bilateral filtering, anisotropic diffusion, or total-variation regularization. Subsequently, exploiting the correlations between non-local patches boosted denoising performance, as in non-local means [1], BM3D [2], low-rank regularization [3], learned simultaneous sparse coding (LSSC) [20], and higher-order singular value decomposition [21]. These methods group similar patches in the noisy image and then recover their common structures by spatial-domain averaging [1], transform-domain filtering [2], [21], joint sparsity constraints in a learned dictionary domain [20], or singular value decomposition [3], [21]. They work well for images containing repetitive patterns. However, they cannot handle high noise levels well, since the patch structures are heavily polluted by the noise. Therefore, a multi-scale strategy has been introduced for denoising at high noise levels [22]. Since the noise is uncorrelated across neighboring pixels, down-sampling reduces the noise level and makes image structures more visible. Besides simple down-sampling, there is also a denoising algorithm based on multi-scale sparse representation [23]. Furthermore, M. Zontak et al. propose exploiting patch recurrence across scales to recover the signal [24]. The above denoising methods tend to smooth texture regions. Therefore, [25] proposes a texture-enhanced method that forces the gradient distribution of the denoised image to be close to an estimated gradient distribution. However, it tends to produce much lower PSNR values than state-of-the-art methods, since the estimated gradient distribution may deviate from the ground truth. Unlike these methods, we propose using external correlated images to improve the visual quality of texture regions while also improving objective quality.

B. Learning Based Denoising

Learning based denoising methods, which benefit from priors learned from noise-free images (or from the noisy image itself), have recently become numerous. K-SVD based training has been extended from learning the bases on the noisy image itself to learning them on 10000 natural images. Statistical models learned from large collections of natural image patches, such as high-order MRF models [4] and the finite Gaussian mixture model [5] (trained on 2 million natural image patches), have proved effective for image denoising. In addition, deep learning has shown its power in image denoising: in [6], the mapping from a noisy patch to its noise-free version is learned by a plain multi-layer perceptron trained on a large database (150000 images). All these works demonstrate the effectiveness of using natural image priors to help remove noise.
However, since these methods use general priors for all kinds of noisy images, without considering the content of the noisy image, their performance soon saturates (at a level comparable to BM3D), and they tend to introduce artifacts if the noisy image does not satisfy the assumed priors. Different from them, we propose adaptively changing the external dataset for each noisy image by using content-based image retrieval.

C. Internal and External Combined Denoising

Single image based (internal) and learning based (external) denoising methods have complementary strengths, so they can be combined to improve denoising performance. I. Mosseri et al. propose assigning each patch to internal or external denoising according to its signal-to-noise ratio (SNR) [18]. Later, H. C. Burger et al. propose a learning based approach that uses a neural network to automatically combine the denoising results of an internal and an external method [19]. Our previous work [17] presents a simple and effective frequency-domain fusion method that combines the advantages of internal and external denoising results and outperforms the stand-alone methods.


Fig. 1. The framework of the proposed denoising scheme. There are two stages in our denoising scheme, separated by the blue line in the center. The noisy image is $I$ and its correlated images are $\{I_1^r, I_2^r, \ldots, I_K^r\}$. The denoising results obtained in the first and second stages are denoted as $\tilde{I}^{1st}$ and $\tilde{I}^{2nd}$, respectively.

III. FRAMEWORK OVERVIEW

Given a noise-free image $I^o$ (the superscript $o$ denotes original), its noisy version $I$ is produced as

$$ I = I^o + N, \tag{1} $$

where $N$ is additive zero-mean independent and identically distributed (i.i.d.) Gaussian noise with variance $\sigma^2$. We aim to recover $\tilde{I}$, the estimated noise-free version of $I$, with the help of web images. For a noisy input $I$, we retrieve its correlated images $\{I_1^r, I_2^r, \ldots, I_K^r\}$ (the superscript $r$ denotes reference images) from web images using content-based image retrieval. Given $I$ and $\{I_1^r, I_2^r, \ldots, I_K^r\}$, we propose a two-stage denoising scheme, illustrated in Fig. 1. Each stage includes four components: registration, external denoising, internal denoising, and combination of the two results. Image registration improves the geometric correlation between the noisy image and the external images, which in turn improves the accuracy of the subsequent patch matching. In the first stage, external denoising is graph-cut based patch matching, and internal denoising is hard thresholding of the transform coefficients of 3D cubes built from similar patches retrieved from the noisy image itself. The combination step merges the strengths of the two denoising results.

The first stage produces a preliminary denoising result $\tilde{I}^{1st}$, whose noise is assumed to be greatly attenuated. It therefore helps the second stage in three ways. First, it improves image registration, since more accurate matching points can be obtained. Second, it improves patch matching accuracy in both external and internal denoising. Third, the Wiener filtering parameters can be estimated from $\tilde{I}^{1st}$. In the second stage, external denoising is Wiener filtering of a noisy patch using adaptive transform bases calculated from the matched external patches, and internal denoising is Wiener filtering of noisy 3D cubes with parameters calculated from $\tilde{I}^{1st}$. After combining the external and internal denoising results of the second stage, we obtain the final denoising result $\tilde{I}^{2nd}$. The four modules in the two stages are detailed in Sections IV, V and VI. In this paper, we adopt a frequency-domain combination strategy similar to that in [17].

IV. CORRELATED IMAGE RETRIEVAL AND REGISTRATION

In this section, we briefly introduce our retrieval scheme and alignment method.

A. Correlated Image Retrieval

Previous learning based denoising methods ignore the content priors of a noisy image, which limits their denoising performance. We therefore adopt content-based image retrieval, specifically the scale-invariant feature transform (SIFT) based method proposed in [26], to retrieve correlated images from a large-scale database as our external dataset. Since one large-scale SIFT feature may cover multiple small-scale SIFT features, L. Dai et al. propose bundling one large-scale SIFT feature with many small-scale SIFT features, i.e., using a visual group as one retrieval unit [26]. A visual group is much more robust than a single quantized SIFT feature because the relative positions of the SIFT features are considered in matching. After matching all the visual groups extracted from the noisy image with those extracted from candidate images, we obtain a set of correlated images $\{I_1^r, I_2^r, \ldots, I_K^r\}$. For more details, please refer to [26]. Note that, to reduce the impact of noise on feature extraction, we discard some keypoints with low contrast. Fig. 2 presents the top three retrieved images for three noisy inputs. The results demonstrate that the retrieval method finds correlated images of the same scene under different imaging configurations, for both architecture and natural images.

Fig. 2. Exemplified retrieved correlated images and registration results. For each image group, the small image on the left is the noisy input with noise level $\sigma = 50$. The other six images on the right are the three correlated images (top) and the corresponding registration results (bottom).

Fig. 3. The patch matching accuracy (c) of using a retrieved image (top of (b)) and a registered image (bottom of (b)) as the reference for denoising the input noisy image (a).

B. Correlated Image Registration

As shown in Fig. 2, the correlated images, though similar, are usually taken from different viewpoints, with different focal lengths and illuminations. Searching for matched patches in these images directly not only imposes a considerable computational burden but also decreases matching accuracy, since the best matched patch may differ from the candidate patch in rotation and scale. Although the patch matching algorithm proposed in [27] can search for patches across scales and rotations, it produces incorrect matches here because of its approximate optimization process and the loss of true signal information in the noisy query, as shown in Fig. 6. To solve this problem, we propose an approximate alignment through geometric registration to improve the correlation between the noisy query and the retrieved images. First, we estimate the correspondences of feature points between the noisy image $I$ and each of the correlated images. We adopt the matching criterion proposed in [28] and obtain a set of matching points, denoted as $X = \{x, x^r\}$, where $x$ is a feature point in the noisy query and $x^r$ is its matched point in the retrieved image. Second, an eight-parameter homography matrix $H$ is estimated by RANSAC [29]. Four pairs of matched SIFT points in $X$ are randomly selected to calculate $H$, and the remaining pairs are used to verify $H$. If the Euclidean distance between $x$ and the point obtained by transforming $x^r$ with $H$ is smaller than a given threshold, the pair is called an inlier to $H$; otherwise, it is an outlier. This process is repeated, and the optimal transform matrix $H$ corresponds to the maximum number of inliers. To accelerate the inlier selection, we utilize the accelerated hypothesis generation method
proposed in [30].¹ For more information, please refer to [30]. We then use the inliers, denoted as $X_{in}$, to calculate a global homography transformation $H_k$ for the external image $I_k^r$. Note that if the inlier percentage $|X_{in}|/|X|$, where $|X|$ denotes the number of point pairs in $X$, is less than 30%, we consider $I_k^r$ an uncorrelated image and remove it from our correlated image set.² Finally, we obtain the aligned external image set $\{\tilde{I}_1^r, \tilde{I}_2^r, \ldots, \tilde{I}_K^r\}$ by warping the pixels of $I_k^r$ with $H_k$ $(k = 1, 2, \ldots, K)$. Fig. 2 shows some examples of registered versions of the retrieved images. After registration, the correlated images are aligned with the input noisy image in both scale and viewpoint, which greatly improves the pixel-wise correspondence of patches between the noisy input and the external correlated images.

We evaluate the effect of image registration on patch matching in Fig. 3, which shows the relationship between the hit rate and the error rate, defined as

$$ e = \frac{\|P^o - Q^*\|_2^2}{\|P^o\|_2^2}, \tag{2} $$

where $P$ denotes the noisy version of a clean patch $P^o$ sampled from $I$, and $Q^*$ is the nearest sample to $P$ in the retrieved or registered images. We also remove the DC components of the sampled patches to reduce the effect of illumination changes. The hit rate is defined as the percentage of test patches whose error rates are less than a given error $e$. In this experiment, we randomly choose 5000 patches of size 8 × 8 from the noisy input ($\sigma = 50$), shown in Fig. 3(a). For each patch, we search for its nearest neighbor in the retrieved image (top of Fig. 3(b)) and in the registered image (bottom of Fig. 3(b)), respectively. Fig. 3(c) plots the hit rate against the error $e$ for the two reference images, showing that registration greatly improves patch matching accuracy. In the second-stage registration, more matching points can be obtained since the noise has been greatly attenuated in the first stage, which enables a more accurate registration.

¹ Their code is available here: http://cs.adelaide.edu.au/~tjchin/doku.php?id=code.
² If no correlated images are available after registration, our scheme degrades into internal denoising, namely BM3D.

Fig. 4. A comparison of patch matching results using the baseline method and the proposed method. Left: visualization of translation vectors; each color represents a translation vector whose orientation and magnitude are denoted by the hue and saturation of the color. Right, from the top row to the bottom row: the reconstructed images, the visualized translation vectors in the x and y directions, and the translation vectors in the z direction. Columns (a) and (b): the reconstructed images obtained by the baseline method with the noisy image and the noise-free image as input; the PSNR values are 27.72 dB and 30.89 dB, respectively. Columns (c) and (d): the reconstructed images obtained by the proposed method with the noise-free image and the noisy image as input; the PSNR values are 30.41 dB and 29.32 dB, respectively. To show the details clearly, only part of each reconstructed image is presented; the PSNR values are calculated over the whole image.

We would like to point out that single-model registration is not suitable for images containing
multiple planes. For these images, we adopt the method proposed in [31] to detect multi-homography regions, and align each region with its corresponding homography matrix.

V. COMBINED IMAGE DENOISING: THE FIRST STAGE

The noisy image $I$ is split into overlapping patches $\{P\}$ of size $m \times m$ with step size $\mu$. For each noisy patch $P$, we aim to recover its details with the assistance of similar external and internal patches. Since $P$ is heavily polluted by noise, searching for its similar patches in both the external image set and the noisy image itself is difficult. Therefore, we propose a graph-cut based patch matching strategy to improve external patch matching accuracy, and we use the first-stage result to improve internal patch matching in the second stage.

A. Denoising by Exploring External Correlations

For each noisy patch $P$ in $I$, we search for its most similar patch $Q$ in the correlated image set $\{\tilde{I}_1^r, \tilde{I}_2^r, \ldots, \tilde{I}_K^r\}$ and use it as the denoised version of $P$. We first describe a baseline method and then propose the graph-cut based patch matching method.

1) Baseline Method: An intuitive method is to use the $\ell_2$ distance as the matching criterion. Let $t_i = (x_i, y_i, z_i)$ be the translation vector (in the column, row, and image-index directions) of the $i$th patch from the noisy image to the reference images. We find the optimal translation for each patch by minimizing

$$ E(t) = \sum_i D(P_i, Q(t_i)), \tag{3} $$

where $D(P_i, Q(t_i))$ is the $\ell_2$ distance between patch $P_i$ and the candidate patch $Q$ with translation $t_i$. Note that the mean values of all the candidate patches and of the noisy query are removed to reduce the effect of illumination. Since the reference images are approximately aligned with $I$ geometrically, we restrict the candidate translations to $x_i \in [-16, 16]$ and $y_i \in [-16, 16]$. This constraint is inspired by the observation in [32] that patches tend to recur frequently inside the same image, especially within local search windows, and the aligned reference images can be treated as variants of the noisy input. The constraint therefore not only reduces the computational complexity but also reduces the disturbance from many unrelated patches. Finally, the matched patches $Q(t_i)$, after adding back the DC components of the corresponding noisy queries, are blended together to produce an initial denoising result. However, this approach is problematic because the query patch is noisy. We test one noisy image with $\sigma = 50$. To observe the translation of each patch clearly, we set both the patch size $m$ and the step size $\mu$ to 16. As shown in Fig. 4(a), the reconstructed image (top) is noisy, and its translation vectors $t_i$ (second and third rows), visualized with the color coding of [33] shown on the left side of Fig. 4, are disordered.

Fig. 5. PSNR values vs. $\xi_2$ in the proposed graph-cut based patch matching algorithm.

Fig. 6. Comparison between denoising results using exhaustive search ((a), 27.58 dB), fast patch match ((b), 26.20 dB) and the proposed patch matching method ((c), 28.33 dB). To show the details clearly, only one cropped region of the denoising results is presented.

2) Proposed Method: What do the patch translations look like if we use the noise-free image as the query? Fig. 4(b) presents the reconstructed image and its corresponding patch translation vectors. The ideal patch translation vectors are much more regular than those in Fig. 4(a). This inspires us to impose a smoothness constraint on the translation vectors. Similar strategies are also adopted in optical flow [34]
and SIFT flow [35] estimation. Namely, the translation vectors are obtained by minimizing the energy function

$$ E(t) = \sum_i D(P_i, Q(t_i)) + \beta \sum_{(i,j) \in \mathcal{N}} S(t_i, t_j), \tag{4} $$

where $D(P_i, Q(t_i))$ is defined as in Eq. (3), $S(t_i, t_j)$ is the smoothness term, $\beta$ is a weighting parameter that adjusts the effect of the smoothness constraint, and $(i, j) \in \mathcal{N}$ denotes that the $i$th and $j$th patches are neighbors. In this paper, $\mathcal{N}$ is a four-connected grid. The smoothness term $S$ penalizes differences between neighboring translation vectors and is defined as

$$ S(t_i, t_j) = \begin{cases} \min\left(\|t_i - t_j\|_1, \xi_1\right), & \text{if } z_i - z_j = 0, \\ \xi_2, & \text{otherwise}. \end{cases} \tag{5} $$

We use a truncated L1 norm, which is popular in motion estimation [36], [37] because of its robustness, with threshold $\xi_1$ on the difference between neighboring translations to avoid over-penalizing it. Since we have multiple reference images, if neighboring candidate patches are chosen from different images, we penalize the image difference instead of the $x, y$ difference with a constant factor $\xi_2$. In our experiments, $\xi_1$ and $\xi_2$ are set to 10 and 5, respectively. Eq. (4) can be solved efficiently by the classical graph-cut algorithm, specifically the alpha-expansion algorithm [36]. We apply the proposed method to the noise-free image; the reconstructed image and its translation vectors are shown in Fig. 4(c). The translation vectors are regular, and the PSNR is less than 0.5 dB lower than that of (b). It is impossible to obtain the translation vectors of Fig. 4(b) from the noisy input, but approaching those of Fig. 4(c) is feasible. As shown in Fig. 4(d), the translation vectors obtained from the noisy input are regular and similar to those in Fig. 4(c). The PSNR values also demonstrate that the proposed method significantly outperforms the baseline method with the noisy image as input (a 1.6 dB gain). The parameters $\xi_1$ and $\xi_2$ are set empirically and tuned iteratively by fixing one and updating the other. For example, we set $\xi_1$ to 10 and $\beta$ to 15 when $\sigma$ is 50. Fig. 5 presents the average PSNR values of the denoising results with $\xi_2$ varying from 1 to 20 for images "a", "f" and "i" shown in Fig. 10.

The best results are achieved when $\xi_2$ is 5; therefore, we set $\xi_2$ to 5. In this step, we obtain an initial denoised patch $\hat{P}^e$ for each noisy patch $P$ using the optimized translation vectors $t$.
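To make the energy of Eqs. (4) and (5) concrete, the following NumPy sketch evaluates it on a 4-connected grid of patches and minimizes it with iterated conditional modes (ICM). ICM is a simple greedy local minimizer swapped in here for brevity; it is not the alpha-expansion graph cut [36] used in the paper, which generally reaches lower energies. We assume the data term is the squared $\ell_2$ distance between mean-removed patches, precomputed into a `unary` cost table; the function names `data_cost`, `smoothness`, `energy` and `icm` are hypothetical.

```python
import numpy as np


def data_cost(noisy_patch, ref_patch):
    """D(P_i, Q(t_i)), assumed here to be the squared l2 distance after removing each patch's mean (DC)."""
    a = noisy_patch - noisy_patch.mean()
    b = ref_patch - ref_patch.mean()
    return float(np.sum((a - b) ** 2))


def smoothness(ti, tj, xi1=10.0, xi2=5.0):
    """Eq. (5): truncated L1 penalty within one reference image, constant xi2 across images.

    ti, tj are (x, y, z) translations; z is the reference-image index.
    """
    if ti[2] == tj[2]:
        return min(abs(ti[0] - tj[0]) + abs(ti[1] - tj[1]), xi1)
    return xi2


def energy(labels, unary, candidates, beta, xi1=10.0, xi2=5.0):
    """Eq. (4) on a 4-connected grid of patches.

    unary[r, c, l] holds D(P_rc, Q(candidates[l])); labels[r, c] indexes candidates.
    Each neighbouring pair is counted once (right and lower neighbours).
    """
    H, W, _ = unary.shape
    e = 0.0
    for r in range(H):
        for c in range(W):
            t = candidates[labels[r, c]]
            e += unary[r, c, labels[r, c]]
            if c + 1 < W:
                e += beta * smoothness(t, candidates[labels[r, c + 1]], xi1, xi2)
            if r + 1 < H:
                e += beta * smoothness(t, candidates[labels[r + 1, c]], xi1, xi2)
    return float(e)


def icm(unary, candidates, beta, xi1=10.0, xi2=5.0, max_sweeps=10):
    """Iterated conditional modes: a greedy local minimiser of Eq. (4).

    Stand-in for alpha-expansion. It starts from the baseline labelling of Eq. (3)
    (each patch takes its own best match) and re-labels one patch at a time.
    """
    H, W, L = unary.shape
    labels = unary.argmin(axis=2)
    for _ in range(max_sweeps):
        changed = False
        for r in range(H):
            for c in range(W):
                nbrs = [labels[rr, cc]
                        for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                        if 0 <= rr < H and 0 <= cc < W]
                # Local energy of each candidate label given the current neighbours.
                costs = [unary[r, c, l]
                         + beta * sum(smoothness(candidates[l], candidates[n], xi1, xi2)
                                      for n in nbrs)
                         for l in range(L)]
                best = int(np.argmin(costs))
                if best != labels[r, c]:
                    labels[r, c] = best
                    changed = True
        if not changed:
            break
    return labels


# Usage: three patches in a row; the middle one weakly prefers an outlier translation.
cands = [(0, 0, 0), (8, 8, 0)]
unary = np.array([[[0.0, 10.0], [1.0, 0.0], [0.0, 10.0]]])
labels = icm(unary, cands, beta=1.0)  # -> [[0, 0, 0]]
```

With `beta = 0`, the sketch reduces to the baseline of Eq. (3), where every patch independently keeps its best match; a positive `beta` pulls an isolated, noise-driven label back toward the translation shared by its neighbours, which is the regularizing effect the smoothness term is meant to provide.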
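The last step of the baseline, giving each matched patch the DC of its noisy query and blending the overlapping estimates into an image, can be sketched in NumPy as follows. This is an illustration under assumptions: overlapping estimates are averaged uniformly (the text only says the patches are blended), the last patch on each axis is shifted so that the image border is covered, and the function names are hypothetical.

```python
import numpy as np


def patch_positions(size, m, mu):
    """Top-left offsets of m-wide patches taken with step mu along one axis.

    Assumes size >= m; the last offset is forced to size - m so the border is covered.
    """
    pos = list(range(0, size - m + 1, mu))
    if pos[-1] != size - m:
        pos.append(size - m)
    return pos


def extract_patches(img, m, mu):
    """Overlapping m x m patches keyed by their top-left corner (row, col)."""
    return {(r, c): img[r:r + m, c:c + m].copy()
            for r in patch_positions(img.shape[0], m, mu)
            for c in patch_positions(img.shape[1], m, mu)}


def restore_dc(matched, noisy_query):
    """Replace the matched patch's mean with the mean (DC) of its noisy query."""
    return matched - matched.mean() + noisy_query.mean()


def blend_patches(patches, shape, m):
    """Average all overlapping patch estimates pixel by pixel."""
    acc = np.zeros(shape)
    weight = np.zeros(shape)
    for (r, c), patch in patches.items():
        acc[r:r + m, c:c + m] += patch
        weight[r:r + m, c:c + m] += 1.0
    return acc / weight


# Usage: with identity "matches", blending reproduces the input exactly (up to rounding).
noisy = np.random.default_rng(1).normal(size=(10, 12))
queries = extract_patches(noisy, 4, 3)
matches = {k: restore_dc(q, q) for k, q in queries.items()}
initial = blend_patches(matches, noisy.shape, 4)
```

In the actual pipeline, each value in `matches` would be the external patch $Q(t_i)$ selected by the matching step rather than the query itself.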

Here, the superscript $e$ indicates that $\hat{P}^e$ is obtained by exploring external correlations. We evaluate the proposed patch matching method, with both registration and graph cut, by comparing its denoising results with those of exhaustive patch matching and the fast patch match method [27].³ For exhaustive search, given a noisy patch, we retrieve its nearest neighbor by exhaustive patch matching in the retrieved correlated image, and the retrieved patches are averaged together to generate the denoising result. In this experiment, we use only one reference image. Fig. 6 shows a cropped region of the denoising results produced by the three methods: our result is much cleaner than those of exhaustive search and fast patch match.

B. Denoising by Exploring Internal Correlations

The internal correlations are explored in a way similar to BM3D [2]. For each noisy patch $P$, we search for its $k$ nearest neighbor ($k$-NN) patches in $I$, using the $\ell_2$ distance as the matching criterion, and build a 3D cube $P_{3D}$. We then apply a 3D transform and filter the noise by hard thresholding in the transform domain:

$$ \hat{P}_{3D} = T_{3D}^{-1}\left( \mathcal{H}\left( T_{3D}(P_{3D}), \lambda_{3D}\sigma \right) \right), \tag{6} $$

where $T_{3D}$ represents a 3D transform consisting of a 2D wavelet transform and a 1D Hadamard transform along the third dimension, and $\mathcal{H}(\cdot)$ is the hard-thresholding operator applied to the transform coefficients with threshold $\lambda_{3D}\sigma$. After the inverse 3D transform $T_{3D}^{-1}$, we obtain the denoised internal 3D cube $\hat{P}_{3D}$. We use $\hat{P}^i$ to represent the internal denoising result of the noisy patch $P$; the superscript $i$ denotes that it is obtained by exploring internal correlations.

C. Combining Internal and External Denoising Results

The internal denoising method is good at recovering low-frequency information, since similar patches come from the same noisy image and share similar structures and illumination.
However, its high-frequency information is filtered out by the hard thresholding of transform coefficients. Meanwhile, the proposed external denoising method is good at recovering image details but may be largely distorted at low frequencies, since the retrieved images have different illuminations from the noisy image. The intermediate results shown in Fig. 12 illustrate how the two denoising methods complement each other.

³ Their code is available here: http://gfx.cs.princeton.edu/pubs/Barnes_2010_TGP/index.php.

Therefore, we propose combining the external and internal denoising results $\hat{P}^e$ and $\hat{P}^i$ as follows:

$$ \tilde{P}^{1st} = T_{2D}^{-1}\left( \frac{T_{2D}(\hat{P}^i) \odot \omega^i + T_{2D}(\hat{P}^e) \odot \omega^e}{\omega^i + \omega^e} \right), \tag{7} $$

where $T_{2D}$ and $T_{2D}^{-1}$ are the forward and inverse 2D DCT transforms, respectively, and $\odot$ denotes point-wise multiplication. The matrices $\omega^i$ and $\omega^e$ are the weighting matrices used to combine the frequency coefficients of $\hat{P}^i$ and $\hat{P}^e$. Since the external denoising result keeps more image details and the internal denoising method filters out the high-frequency coefficients, we only combine the first F frequency coefficients of $\hat{P}^e$ and $\hat{P}^i$; the remaining high-frequency coefficients are taken directly from $\hat{P}^e$. The number F is determined by the noise standard deviation σ as follows:

$$F = \zeta\sqrt{\frac{m^2}{\sigma + 1}}, \quad (8)$$

where $m^2$ is the number of pixels in one patch and ζ is a regularization parameter. We have observed that large coefficients in the low-frequency domain are more important for recovering the structure of a patch. Therefore, the weighting matrix $\omega^i$ is proportional to $|T_{2D}(\hat{P}^i)|^2 / |T_{2D}(\hat{P}^e)|^2$ (element-wise), and $\omega^e$ is proportional to $1/\omega^i$. After obtaining the denoised patch $\tilde{P}^{1st}$ for every noisy patch P in I, the patches are averaged together, resulting in the first-stage denoising result $\tilde{I}^{1st}$.

VI. COMBINED IMAGE DENOISING: THE SECOND STAGE

The first-stage denoising result $\tilde{I}^{1st}$ has greatly reduced the noise in I. Therefore, it can help improve the denoising performance in the second stage. A similar strategy is used in BM3D [2], where it achieves a significant gain over the first-stage denoising result. In this section, we first introduce Wiener filtering for image denoising and then apply it to our internal and external denoising scheme.

A. Denoising by Wiener Filtering

Given one noisy patch p (in vector format), it is obtained through the observation model:

$$p = p^o + n, \quad (9)$$

where $p^o$ is a noise-free patch (in vector format) and n is white Gaussian noise with variance $\sigma^2$. They can be regarded as variables generated from stationary random processes; namely, $p^o$ and n are random vectors. We would like to recover the desired $p^o$ from the observation p via a linear filter G that minimizes $E\{\|p^o - Gp\|_2^2\}$, where $E(\cdot)$ represents the expectation operator. The solution of this problem is a Wiener filter [38]:

$$G = R(R + \sigma^2 I)^{-1}, \quad (10)$$

where the correlation matrix $R = E\{p^o {p^o}^T\}$ and I is the identity matrix. Since R is symmetric, it can be represented by eigenvalue decomposition as follows:

$$R = U\Lambda U^T = \sum_{k=1}^{h}\lambda_k u_k u_k^T, \quad (11)$$

where h is the rank of R, $u_k$ is an eigenvector of R, and $\lambda_k$ is the corresponding eigenvalue. U is the matrix of stacked eigenvectors $[u_1, u_2, \ldots, u_h]$, and Λ is a diagonal matrix constructed from the eigenvalues, $\Lambda = \mathrm{diag}[\lambda_1, \lambda_2, \ldots, \lambda_h]$. Therefore, Eq. 10 can be rewritten in eigenvalue decomposition form as follows:

$$G = U\Lambda U^T\left(U\Lambda U^T + \sigma^2 I\right)^{-1} = U\Lambda U^T\left(U(\Lambda + \sigma^2 I)U^T\right)^{-1} = U\Lambda(\Lambda + \sigma^2 I)^{-1}U^T = \sum_{k=1}^{h}\frac{\lambda_k}{\lambda_k + \sigma^2}u_k u_k^T. \quad (12)$$

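We define $\tilde{\Lambda} = \mathrm{diag}\left[\frac{\lambda_1}{\lambda_1 + \sigma^2}, \frac{\lambda_2}{\lambda_2 + \sigma^2}, \ldots, \frac{\lambda_h}{\lambda_h + \sigma^2}\right]$; then the optimal estimate $\tilde{p}$ is obtained by

$$\tilde{p} = U\tilde{\Lambda}U^T p. \quad (13)$$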
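As a sanity check on Eqs. 10–13, the following minimal Python sketch (a toy 2 × 2 case with assumed values for R, σ, and p; not the authors' code) computes the Wiener estimate both directly from Eq. 10 and through the eigenvalue form of Eqs. 12–13, in which each KL coefficient $u_k^T p$ is scaled by $\lambda_k/(\lambda_k+\sigma^2)$. The helper names `eig_sym2`, `wiener_eig`, and `wiener_direct` are hypothetical.

```python
import math


def eig_sym2(a, b, d):
    """Eigenvalues and orthonormal eigenvectors of the symmetric matrix [[a, b], [b, d]]."""
    if abs(b) < 1e-12:  # already diagonal
        return [a, d], [(1.0, 0.0), (0.0, 1.0)]
    half_tr = 0.5 * (a + d)
    disc = math.sqrt(half_tr * half_tr - (a * d - b * b))
    lams = [half_tr + disc, half_tr - disc]
    vecs = []
    for lam in lams:
        v = (b, lam - a)  # satisfies (a - lam) * x + b * y = 0
        norm = math.hypot(v[0], v[1])
        vecs.append((v[0] / norm, v[1] / norm))
    return lams, vecs


def wiener_eig(R, sigma, p):
    """Eq. 13: p_tilde = U diag(lam_k / (lam_k + sigma^2)) U^T p."""
    lams, vecs = eig_sym2(R[0][0], R[0][1], R[1][1])
    out = [0.0, 0.0]
    for lam, u in zip(lams, vecs):
        gain = lam / (lam + sigma ** 2)    # diagonal entry of Lambda-tilde
        coeff = u[0] * p[0] + u[1] * p[1]  # KL coefficient u_k^T p
        out[0] += gain * coeff * u[0]      # map back to the pixel domain with u_k
        out[1] += gain * coeff * u[1]
    return out


def wiener_direct(R, sigma, p):
    """Eq. 10: p_tilde = R (R + sigma^2 I)^{-1} p, using an explicit 2x2 inverse."""
    s2 = sigma ** 2
    a, b, d = R[0][0] + s2, R[0][1], R[1][1] + s2
    det = a * d - b * b
    inv = [[d / det, -b / det], [-b / det, a / det]]
    G = [[sum(R[i][k] * inv[k][j] for k in range(2)) for j in range(2)] for i in range(2)]
    return [G[0][0] * p[0] + G[0][1] * p[1], G[1][0] * p[0] + G[1][1] * p[1]]


R = [[4.0, 2.0], [2.0, 3.0]]  # toy correlation matrix (assumed)
p = [1.0, -0.5]               # toy noisy observation (assumed)
print(wiener_eig(R, 1.0, p))
print(wiener_direct(R, 1.0, p))  # agrees with the line above up to rounding
```

Because R has full rank in this toy example, U is square and orthogonal, so the two routes agree up to rounding; with σ = 0 the filter reduces to the identity, and for σ > 0 every gain lies strictly between 0 and 1.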

Namely, the Wiener filter first applies the KL transform to p using $U^T$ to concentrate its energy, then processes the coefficients using the MMSE-optimized weighting matrix $\tilde{\Lambda}$, and finally transforms the filtered coefficients back into the original domain using U [38]. In practice, we have no knowledge of the correlation matrix R. A common solution is to replace the eigenvector basis U with an orthogonal basis that concentrates the signal energy in a relatively small subspace, e.g., the DCT or wavelet basis. The optimal diagonal matrix $\tilde{\Lambda}$ can then be replaced by hard or soft shrinkage thresholding. Eq. 6 is an example of this alternative filtering method, with the basis of $T_{3D}$ playing the role of U and the operator H generating a shrinkage matrix similar to $\tilde{\Lambda}$. Another solution is PCA-based filtering [39]–[41], which trains the orthogonal basis from the covariance matrix built from nonlocal similar noisy patches.

B. Denoising by Exploring External Correlations

The noise in $\tilde{I}^{1st}$ is assumed to be greatly attenuated. Therefore, for each patch $\tilde{P}^{1st}$ in $\tilde{I}^{1st}$, we can directly retrieve its $k_2$-NN patches $\{Q_1, Q_2, \ldots, Q_{k_2}\}$ from the external image set $\{\tilde{I}_1^r, \tilde{I}_2^r, \ldots, \tilde{I}_K^r\}$ using the baseline method described in Sec. V-A. These patches $\{q_1, q_2, \ldots, q_{k_2}\}$ (in vector format) can be regarded as sample realizations of $p^o$. Note that the DC components of $\{q_1, q_2, \ldots, q_{k_2}\}$ and p are removed before patch matching to reduce illumination effects. The matrix R is therefore estimated as:

$$R = \frac{1}{k_2}\sum_{i=1}^{k_2} q_i q_i^T = \frac{1}{k_2} q_{2D} q_{2D}^T, \quad (14)$$

where $q_{2D} = [q_1, q_2, \ldots, q_{k_2}]$ is the matrix form of the samples. Since the patches $\{q_1, q_2, \ldots, q_{k_2}\}$ have different distances to the ground-truth patch $p^o$, we assign higher weights to more similar patches: the more similar a patch is, the more it contributes to the calculation of R. We use the distance between $q_i$ and the noisy patch p to measure the similarity.
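The sample estimate of Eq. 14, together with the Gaussian-kernel weighting that Eqs. 15–16 introduce next, can be sketched in plain Python as follows. This is an illustrative sketch under assumptions: patches are short vectorized lists, DC removal is a simple mean subtraction, and the names `remove_dc`, `gaussian_weights`, and `correlation` are hypothetical rather than taken from the authors' implementation.

```python
import math


def remove_dc(v):
    """Subtract the patch mean (its DC component) from a vectorized patch."""
    mean = sum(v) / len(v)
    return [x - mean for x in v]


def gaussian_weights(qs, p, sigma):
    """Eq. 15 plus normalization: w_i proportional to exp(-||q_i - p||^2 / sigma^2), sum w_i = 1."""
    d2 = [sum((a - b) ** 2 for a, b in zip(q, p)) for q in qs]
    # Subtracting the smallest distance avoids underflow; the normalized weights are unchanged.
    shift = min(d2)
    raw = [math.exp(-(d - shift) / sigma ** 2) for d in d2]
    total = sum(raw)
    return [w / total for w in raw]


def correlation(qs, weights=None):
    """R = sum_i w_i q_i q_i^T. Uniform w_i = 1/k2 gives Eq. 14; Gaussian w_i gives Eq. 16."""
    k2, n = len(qs), len(qs[0])
    if weights is None:
        weights = [1.0 / k2] * k2
    R = [[0.0] * n for _ in range(n)]
    for w, q in zip(weights, qs):
        for r in range(n):
            for c in range(n):
                R[r][c] += w * q[r] * q[c]
    return R


# Toy 3-pixel "patches" (assumed values, for illustration only).
p = remove_dc([10.0, 12.0, 14.0])
qs = [remove_dc(q) for q in ([20.0, 22.0, 24.0], [5.0, 5.0, 8.0])]
R = correlation(qs)                                  # Eq. 14
R_w = correlation(qs, gaussian_weights(qs, p, 2.0))  # Eq. 16
```

Passing no weights reproduces the uniform 1/k2 average of Eq. 14, while the normalized Gaussian weights give more influence to retrieved patches that lie closer to p, in the spirit of Eq. 16.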


Fig. 7. Comparison of different filtering methods. The first row presents one noisy patch (a), its corresponding denoising result in the first stage (b) and the ground truth patch (c). The second row shows the patch matching results (d) in the second stage with (b) as query and the corresponding spatial filtering result (e, RMSE=28.43). The third row shows the eigenvectors (f) obtained from (d) and the corresponding Wiener filtering result (g, RMSE=21.62).

The similarity coefficient $w_i$ is defined using a common Gaussian kernel function as follows:

$$w_i = \exp\left(-\frac{\|q_i - p\|_2^2}{\sigma^2}\right). \quad (15)$$

The $k_2$ weights are then normalized so that $\sum_{i=1}^{k_2} w_i = 1$. Therefore, the correlation matrix of the weighted patches is given by:

$$R_W = \sum_{i=1}^{k_2} w_i q_i q_i^T = q_{2D} W q_{2D}^T, \quad (16)$$

where $W = \mathrm{diag}[w_1, w_2, \ldots, w_{k_2}]$ is a diagonal matrix. The optimal orthogonal basis U and the eigenvalue matrix Λ are then obtained by eigenvalue decomposition of $R_W$ using Eq. 11. Finally, we obtain the denoised patch $\hat{p}^{e2nd}$ through Eq. 13 and add the original DC component of p back to $\hat{p}^{e2nd}$.

To demonstrate the effectiveness of the proposed filtering method, we compare the proposed Wiener filtering with the spatial-domain filtering adopted in [17]. As shown in Fig. 7, given the noisy patch (a) and its first-stage denoising result (b), we use (b) as the query and obtain its similar patches (d) by retrieval from the external image set. A straightforward filtering method is median filtering along the third dimension of the 3D cube built from the patches shown in (a) and (d). The median filtering result is shown in (e); the root mean square error (RMSE) between (e) and the ground-truth patch (c) is 28.43. The alternative is the proposed Wiener filtering in the transform domain. The transform bases (f) are obtained by eigenvalue decomposition of the patches shown in (d), and Eq. 13 then gives the filtering result (g). The RMSE between (g) and the ground-truth patch (c) is 21.62, much lower than that of the median filtering result.

Additionally, we statistically compare the proposed filtering method with average filtering (the average of the matched patches), nearest neighbor (NN) filtering, and median filtering (adopted in [17]). We randomly sample 5000 patches from test image “f” shown in Fig. 10. The MSE values between the recovered patches and the ground-truth noise-free patches for our proposed method, average, NN, and median filtering are presented by the red, green, black, and blue curves in Fig. 8, respectively. The proposed filtering method consistently produces smaller MSE values than the other three filtering methods at all noise levels.

Fig. 8. Comparison of the proposed filtering method with average filtering, nearest neighbor (NN) filtering, and median filtering in terms of patch MSE values.

C. Denoising by Exploring Internal Correlations

The first-stage denoising result $\tilde{I}^{1st}$ can also be used to improve the internal denoising result. First, it improves patch matching accuracy, resulting in more consistent 3D cubes. Second, it guides the filtering of the noisy patches when calculating the orthogonal bases and filtering parameters. In this stage, we build two 3D internal cubes. One is built by searching for similar patches in $\tilde{I}^{1st}$ and is denoted as $\tilde{P}^{i2nd}_{3D}$. The other is built from the noisy image I using the patches at the same locations as those matched in $\tilde{I}^{1st}$ and is denoted as $P^{i2nd}_{3D}$. According to our previous discussion of Wiener filtering, the optimal solution should first calculate suitable orthogonal bases for the patches in $P^{i2nd}_{3D}$ and then filter the coefficients in the transform domain using Wiener filtering. The orthogonal bases can be obtained through eigenvalue decomposition of the matrix R built from the patches in $\tilde{P}^{i2nd}_{3D}$, as proposed in [42]. However, the effectiveness of the calculated orthogonal bases depends heavily on the patch similarity in $\tilde{P}^{i2nd}_{3D}$, and it is difficult to retrieve many similar patches within a local region of one image. Therefore, [42] adopts a fixed shape-adaptive DCT transform for some cubes. In addition, considering the computational complexity of eigenvalue decomposition, we simply adopt fixed DCT bases, which are sufficient for recovering low-frequency content. The Wiener shrinkage coefficient $W_{wie}$ is determined by:

$$W_{wie} = \frac{\left|T^{wie}_{3D}(\tilde{P}^{i2nd}_{3D})\right|^2}{\left|T^{wie}_{3D}(\tilde{P}^{i2nd}_{3D})\right|^2 + \sigma^2}, \quad (17)$$

where $T^{wie}_{3D}$ represents a 2D DCT transform followed by a 1D Hadamard transform along the third dimension, and $(T^{wie}_{3D})^{-1}$ is the corresponding inverse 3D transform. The internal denoising result is then obtained via Wiener filtering as follows:

$$\hat{P}^{i2nd}_{3D} = \left(T^{wie}_{3D}\right)^{-1}\left(W_{wie}\odot T^{wie}_{3D}(P^{i2nd}_{3D})\right). \quad (18)$$

The Wiener filtering process is the same as that of BM3D [2].
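The difference between the first-stage hard thresholding of Eq. 6 and the second-stage Wiener shrinkage of Eqs. 17–18 can be illustrated with a small sketch. To keep it short, it applies only the 1D orthonormal Hadamard transform along the third (group) dimension and omits the per-patch 2D wavelet or DCT that the paper's $T_{3D}$ and $T^{wie}_{3D}$ also include. The group values, σ, and the helper names (`fwht`, `hard_threshold_group`, `wiener_group`) are assumptions; λ3D = 2.7 is the value reported in Sec. VII-A.

```python
import math


def fwht(v):
    """Orthonormal Walsh-Hadamard transform (length must be a power of 2); it is its own inverse."""
    v = list(v)
    h = 1
    while h < len(v):
        for i in range(0, len(v), 2 * h):
            for j in range(i, i + h):
                v[j], v[j + h] = v[j] + v[j + h], v[j] - v[j + h]
        h *= 2
    scale = 1.0 / math.sqrt(len(v))
    return [x * scale for x in v]


def hard_threshold_group(group, sigma, lam=2.7):
    """Eq. 6 style: zero Hadamard coefficients with |c| < lam * sigma, then invert."""
    k, n = len(group), len(group[0])
    out = [[0.0] * n for _ in range(k)]
    for pix in range(n):
        coeffs = fwht([group[i][pix] for i in range(k)])
        kept = [c if abs(c) >= lam * sigma else 0.0 for c in coeffs]
        for i, x in enumerate(fwht(kept)):
            out[i][pix] = x
    return out


def wiener_group(noisy, basic, sigma):
    """Eqs. 17-18 style: scale noisy coefficients by |B|^2 / (|B|^2 + sigma^2), where B are
    the coefficients of the first-stage (basic) estimate. Assumes sigma > 0."""
    k, n = len(noisy), len(noisy[0])
    out = [[0.0] * n for _ in range(k)]
    for pix in range(n):
        c_noisy = fwht([noisy[i][pix] for i in range(k)])
        c_basic = fwht([basic[i][pix] for i in range(k)])
        shrunk = [cn * cb * cb / (cb * cb + sigma ** 2) for cn, cb in zip(c_noisy, c_basic)]
        for i, x in enumerate(fwht(shrunk)):
            out[i][pix] = x
    return out


# A toy group of k = 4 two-pixel "patches" (assumed values); rows are patches.
noisy = [[10.5, 20.0], [9.5, 20.0], [10.0, 19.0], [10.0, 21.0]]
basic = hard_threshold_group(noisy, sigma=1.0)  # every row becomes [10.0, 20.0]
final = wiener_group(noisy, basic, sigma=1.0)   # close to basic, slightly shrunk toward 0
```

Hard thresholding makes a binary keep-or-kill decision per coefficient, whereas the Wiener weights use the basic estimate to shrink each noisy coefficient smoothly. This is why the second stage needs a reasonably clean first-stage result.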


Fig. 9. Example of combining coefficients and combined results. The internal denoising patch (a) and the external denoising patch (b) are combined in the DCT domain using the corresponding weighting matrices (d) and (e). The combined result (c, MSE=277.7) is much closer to the original patch (f) than the internal (a, MSE=322.9) and external (b, MSE=305.7) denoising patches.
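The DCT-domain combination of Eqs. 7–8, illustrated in Fig. 9, can be sketched in one dimension as follows. This is a simplified illustration rather than the authors' code. It uses a 1D orthonormal DCT in place of the 2D DCT $T_{2D}$, and it takes the "first F coefficients" in natural low-to-high frequency order (a 2D version would need a zig-zag or similar ordering, which the text does not specify). It also sets $\omega^i = |T_{2D}(\hat{P}^i)|^2/|T_{2D}(\hat{P}^e)|^2$ and $\omega^e = 1/\omega^i$ with unit proportionality constants and reads Eq. 8 as $F = \zeta\sqrt{m^2/(\sigma+1)}$, floored to an integer. All function names are hypothetical.

```python
import math


def dct(x):
    """Orthonormal DCT-II of a 1D list (a 1D stand-in for the 2D DCT T_2D)."""
    n = len(x)
    return [math.sqrt((1.0 if k == 0 else 2.0) / n)
            * sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
            for k in range(n)]


def idct(c):
    """Inverse of dct() (the orthonormal DCT-III)."""
    n = len(c)
    return [sum(math.sqrt((1.0 if k == 0 else 2.0) / n) * c[k]
                * math.cos(math.pi * (i + 0.5) * k / n) for k in range(n))
            for i in range(n)]


def num_combined(m, sigma, zeta):
    """Eq. 8 read as F = zeta * sqrt(m^2 / (sigma + 1)), floored and clipped to [0, m^2]."""
    return max(0, min(int(zeta * math.sqrt(m * m / (sigma + 1))), m * m))


def combine(p_int, p_ext, F, eps=1e-12):
    """Eq. 7 on the first F (lowest-frequency) coefficients; the rest come from p_ext."""
    ci, ce = dct(p_int), dct(p_ext)
    out = list(ce)
    for k in range(min(F, len(ce))):
        w_i = (ci[k] ** 2 + eps) / (ce[k] ** 2 + eps)  # omega^i ~ |T(P^i)|^2 / |T(P^e)|^2
        w_e = 1.0 / w_i                                # omega^e ~ 1 / omega^i
        out[k] = (ci[k] * w_i + ce[k] * w_e) / (w_i + w_e)
    return idct(out)


print(num_combined(16, 50, 2), num_combined(16, 50, 5))  # prints 4 11
merged = combine([10.0, 10.0, 10.0, 10.0], [13.0, 9.0, 13.0, 9.0], F=1)
```

Under this reading, m = 16 and σ = 50 give F = 4 in the first stage (ζ = 2) and F = 11 in the second (ζ = 5), so only a handful of the 256 coefficients are blended and the rest are taken from the external patch. In the toy call, the merged DC level lands between the two inputs while the detail comes from `p_ext`.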

Fig. 10. Test images used in our experiments: images “a”, “b”, ..., and “l”.
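Both stages end by averaging the overlapping denoised patches back into an image, with patch size m × m = 16 × 16 and step size μ = 6 (Sec. VII-A). Below is a minimal sketch of that aggregation. It assumes a plain unweighted average, as the text describes, and uses hypothetical helper names. It also appends a final offset so that the image border is covered when the stride does not divide the image size evenly. That boundary handling is an assumption, since the text does not specify it.

```python
def patch_positions(length, m, step):
    """Top-left offsets with stride `step`; the last valid offset is always included.
    Assumes length >= m."""
    pos = list(range(0, length - m + 1, step))
    if pos[-1] != length - m:
        pos.append(length - m)
    return pos


def aggregate(patches, height, width, m):
    """Average overlapping m x m patches; `patches` maps (row, col) offsets to 2D lists."""
    acc = [[0.0] * width for _ in range(height)]
    cnt = [[0] * width for _ in range(height)]
    for (r0, c0), patch in patches.items():
        for dr in range(m):
            for dc in range(m):
                acc[r0 + dr][c0 + dc] += patch[dr][dc]
                cnt[r0 + dr][c0 + dc] += 1
    return [[acc[r][c] / cnt[r][c] if cnt[r][c] else 0.0 for c in range(width)]
            for r in range(height)]


# Offsets along a 40-pixel dimension with m = 16 and mu = 6.
print(patch_positions(40, 16, 6))  # [0, 6, 12, 18, 24]

# Two overlapping 2 x 2 patches on a 2 x 3 canvas: the shared column is averaged.
img = aggregate({(0, 0): [[1.0, 1.0], [1.0, 1.0]], (0, 1): [[3.0, 3.0], [3.0, 3.0]]}, 2, 3, 2)
```

In the toy call, the middle column receives two estimates (1 and 3) and ends up at their mean, 2.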

Similar to the first stage, the denoised patch $\tilde{P}^{2nd}$ is obtained by combining the internal and external denoising patches. Fig. 9 presents an example of the combining coefficients and the corresponding combined results. The MSE values between the original patch and the internal, external, and combined results indicate that the proposed combining method effectively merges the advantages of the two results. These patches are then aggregated to generate the final denoised image $\tilde{I}^{2nd}$.

VII. EXPERIMENTAL RESULTS AND ANALYSIS

In our experiments, we build an image database containing 506,553 images to simulate web images. Of these, 6553 images come from the public datasets Oxford building [43] and INRIA Holiday [44]. The others are landmark images (≥1024 × 768) crawled from Flickr.⁴ We test the denoising performance with σ ranging from 20 to 80 in steps of 10. The test images, shown in Fig. 10, include natural images with rich structures and textures; they are excluded from the database. The correlated images for all the test images are shared in [45]. The correlated images are similar to the input noisy images but differ in viewpoint, scale, and illumination.

⁴ http://www.flickr.com/

A. Parameter Setting and Evaluation

In our experiments, the patch size m × m and the step size μ are set to 16 × 16 and 6, respectively. The smoothness-cost weighting parameter β in patch matching is set according to the noise level, namely β = max(0.5σ − 10, 0). The number of correlated images K is five. The number of similar patches k in internal denoising is 16, and k2 in external denoising is 8. The hard-thresholding parameter λ3D is set to 2.7. The combining coefficient ζ in Eq. 8 is set to 2 and 5 in the first and second stages, respectively. The parameters are tuned on images “a”, “f”, and “i” and applied to all the test images.

The regularization parameter ζ in Eq. 8 is set experimentally. Fig. 11 illustrates the relationship between the denoising result and the combination parameter ζ in the second stage. We test three noise levels (σ = 20, 40, 60) using images “a”, “f”, and “i”. As shown in this figure, the PSNR values improve consistently as ζ increases, but the further gain is small once ζ exceeds 5. Therefore, we combine the low-frequency coefficients with ζ = 5 in the second stage. Similarly, we determine ζ = 2 in the first stage. In general, we prefer a larger ζ when the internal denoising generates good candidates; otherwise, combining more coefficients introduces artifacts, especially when the noise level is high.

Fig. 11. Denoising results vs. ζ in Eq. 8.

The patch size and the number of similar patches are also set experimentally. We test three noise levels (σ = 20, 40, 60) using images “a”, “f”, and “i”. We observe that the denoising results improve as the patch size increases, since a small patch size tends to overfit the noise [19]. When the patch size is larger than 16, the improvement is tiny, so, considering the computational complexity, we set the patch size to 16 × 16. Similarly, we observe that the final denoising results improve only slowly as the number of similar patches k2 increases, especially when k2 is larger than 8. Therefore, we set k2 to 8 in our experiments. For internal denoising, we adopt the same parameters as those used in BM3D.

B. Analysis of the Proposed Scheme

1) Intermediate Results: To illustrate the step-by-step improvement of the denoising performance, in this subsection we present the intermediate results of the proposed scheme in both subjective and objective measurements. Fig. 12 presents the intermediate results for two regions, R1 (top) and R2 (bottom), of image “f” shown in Fig. 10.


Fig. 12. The intermediate results when σ is 50. To show the details clearly, only two crops of the corresponding results are shown. (a), (b), and (c) show the internal denoising result (27.82 dB), external denoising result (30.45 dB), and combined result (30.64 dB) in the first stage, respectively; (d), (e), and (f) show the internal denoising result (30.28 dB), external denoising result (31.12 dB), and combined result (31.34 dB) in the second stage, respectively. The PSNR values are computed over the whole image.

Fig. 13. Comparison of PSNR results on 12 test images at each step, with σ varying from 20 to 80.

The top (bottom) three groups are the internal, external, and combined denoising results in the first (second) stage. In the first stage, the internal denoising result (a) still contains much noise, e.g., in the sky of region R1, and its image details are over-smoothed, e.g., the characters in region R2. The external denoising result (b) contains some blocking artifacts in region R1 due to inaccurate patch matching. The combined result (c) merges the low-frequency and high-frequency details well, resulting in a sharper denoising result with fewer blocking artifacts and less noise. In the second stage, the internal denoising result (d) recovers more detail and is much cleaner than (a), since the first-stage denoising result helps improve patch matching accuracy and guides the Wiener filtering process. The external denoising result (e) recovers more details, and its blocking artifacts fade compared with (b). The combined denoising result (f) takes advantage of both the external and internal denoising results and achieves the best denoising result.

Fig. 13 presents the average PSNR values for the test images shown in Fig. 10, with the noise level varying from 20 to 80. The combination results outperform both the internal and external results at all noise levels. In addition, the results of the second stage are consistently higher than those of the first stage; for example, the second-stage combination result outperforms the first-stage combination result by 0.82 dB when σ is 20. On the other hand, the PSNR increase becomes smaller at high noise levels, so the first-stage denoising result may be acceptable at high noise levels, e.g., σ = 80, when time resources are limited. Another interesting phenomenon is that when the noise level is low (e.g., σ = 20), the internal denoising results are better than the external denoising results in terms of PSNR. The reasons are twofold. First, the external images are taken from different viewpoints and at different scales, so after registration the edge distortion in external patches can be larger than the noise distortion in internal patches. Second, for regions without highly correlated patches in the registered images, the internal denoising results are much better than the external ones. In short, both the visual quality and the PSNR values demonstrate the step-by-step improvement of the proposed method.

2) Evaluation of Correlated Images: Correlated images play an important role in our scheme. In this subsection, we show how the denoising performance changes with the number of correlated images. Fig. 14 shows the denoising performance in PSNR versus the number of correlated images for test images “b”, “f”, and “k” when σ is 50. For each test image, we retrieve K = 5 correlated images and generate denoising results using k correlated images, where k = 1, 2, ..., 5. We observe that 1) the denoising performance improves as the number of correlated images increases, and 2) the improvement diminishes as more correlated images are added. Considering general cases, we set the number of correlated images to 5 in our experiments.

Fig. 14. Denoising results vs. number of correlated images.

3) Evaluation of Graph-Cut-Based Patch Matching: As shown in Fig. 4, the graph-cut-based patch matching greatly improves the external denoising result. In this subsection, we further demonstrate its contribution to the final denoising result. Fig. 15 shows the denoising results with and without graph-cut-based patch matching for image “f”. Our graph-cut-based scheme consistently outperforms the one without graph-cut, especially when the noise level is high.

Fig. 15. Comparison between denoising results with and without graph-cut-based patch matching.

4) Evaluation of Filtering Methods: Ref. [16] also proposes using similar external images as references. However, the internal correlations are ignored in [16]. Also, no dedicated patch matching algorithm is proposed in [16], whereas patch matching plays an important role in our scheme. As demonstrated in Fig. 6, our patch matching with alignment and graph-cut algorithms significantly outperforms the exhaustive patch matching method. Moreover, [16] derives the denoising filter from group sparsity, while our scheme is based on the theory of Wiener filtering. In this subsection, we evaluate the proposed hybrid filtering method in comparison with the filtering method in [16]. Fig. 16 presents the average denoising results in terms of PSNR on 12 test images. We denote the comparison method as [16]∗ since its similar patches are generated using our scheme. Our method consistently outperforms [16]∗ even though [16]∗ uses the same correlated images as our method, which further demonstrates the effectiveness of the proposed hybrid filtering method.

5) Comparison With [17]: The proposed scheme is an extension of our previous work [17], with not only more details and more results but also several technical improvements over [17]. Specifically, we propose a graph-based patch matching method, which greatly improves patch matching accuracy, as demonstrated in Fig. 4. We also use Wiener filtering in the external denoising part, while [17] adopts simple median filtering; Fig. 8 shows that the proposed Wiener filtering significantly outperforms median filtering. Moreover, the proposed method handles a large range of noise levels well, whereas [17] is suited only to high-level noise. As a result, the proposed method achieves better denoising performance than [17] at all noise levels, as shown in Fig. 16.

Fig. 16. Comparison with [16]∗ and [17] on 12 test images, with σ varying from 20 to 80.
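The comparisons above, and those in the next subsection, are reported as PSNR differences in dB. The short sketch below converts a PSNR gain into the implied reduction in mean squared error. It assumes the usual 8-bit definition PSNR = 10 log10(255²/MSE), since the text does not state the peak value, and `psnr` and `mse_ratio` are hypothetical helper names.

```python
import math


def psnr(mse, peak=255.0):
    """PSNR in dB for a given mean squared error, assuming an 8-bit peak value of 255."""
    return 10.0 * math.log10(peak * peak / mse)


def mse_ratio(gain_db):
    """Ratio MSE_new / MSE_old implied by a PSNR gain of gain_db dB at the same peak."""
    return 10.0 ** (-gain_db / 10.0)


for gain in (0.82, 1.7, 2.0):  # gains quoted in the text
    print(f"{gain} dB -> MSE reduced by {100.0 * (1.0 - mse_ratio(gain)):.1f}%")
```

Because PSNR is logarithmic in MSE, a given dB gain corresponds to the same relative MSE reduction regardless of the starting error; for example, a 2 dB gain means an MSE roughly 37% lower.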

C. Comparison With State-of-the-Art

We evaluate the performance of the proposed scheme by comparing it with five state-of-the-art methods: the single-image-based BM3D [2] and low-rank regularization [3], the learning-based EPLL [5], and the combination-based methods [18] and [19]. We test two variants of EPLL: one uses general Gaussian mixture models trained on general natural images, denoted as EPLLg; the other uses specific Gaussian mixture models trained on the correlated images, denoted as EPLLs. For a fair comparison, the same correlated images are used for EPLLs and our method. For [18], we generate results by combining the internal denoising result of BM3D and the external denoising result of EPLLg. The comparison results are generated with the authors' code and recommended parameters. Specifically, the patch size in BM3D, EPLL, and [18] is 8 × 8; the patch size in low-rank [3] ranges from 7 × 7 to 9 × 9 according to the noise level; and the patch size in [19] is 25 × 25.

Table I presents the objective quality of the seven denoising schemes in terms of PSNR and SSIM [46] when σ is 50, with the best results highlighted in bold. Our method clearly outperforms all other methods on all test images. Compared with the single-image-based and learning-based methods, our scheme achieves gains of more than 2 dB in PSNR and 0.1 in SSIM. Although EPLLs is trained on the same correlated images that our method uses, it achieves only a slight gain over EPLLg. Both [18] and [19] combine external and internal denoising results; they produce similar results, which are much better than those of the stand-alone denoising methods. Our scheme still outperforms these two schemes by more than 1.8 dB and 1.7 dB, respectively, which further demonstrates the effectiveness of the proposed two-stage combined filtering and advanced patch matching strategy.

TABLE I. COMPARISON OF PSNR AND SSIM RESULTS WITH SIX SCHEMES WHEN σ IS 50. THE BEST RESULTS ARE HIGHLIGHTED IN BOLD

In Fig. 17, we compare the average denoising results (PSNR in (a) and SSIM in (b)) on 12 test images with σ varying from 20 to 80. Our method consistently outperforms the other six schemes at all noise levels in both PSNR and SSIM. Specifically, compared with BM3D, our method achieves a gain of more than 2 dB over a wide range of noise levels.

Fig. 17. Comparison of denoising results on 12 test images with noise level σ varying from 20 to 80. (a) PSNR values vs. noise level σ; (b) SSIM values vs. noise level σ. Our method consistently outperforms the other six schemes.

We also compare the above denoising methods in terms of visual quality. Fig. 18 presents the denoising results for images “a”, “e”, “f”, and “l” when σ is 50. To show the details clearly, we show only two cropped regions, indicated by red boxes, for each result. Our method generates the most natural and realistic details for all test images. Specifically, for image “f”, our method faithfully reconstructs the dense edges on the wall and the decorated window, while all the other methods fail. The single-image-based BM3D and low-rank regularization tend to produce over-smoothed results, e.g., the patches in images “f” and “l”, because filtering over similar noisy patches not only suppresses the noise but also smooths the patch details, recovering only the common structures shared by the noisy patches. The learning-based EPLLg tends to produce annoying artifacts, such as in the right crop of image “e”, since the general priors may not be suitable. Even if we re-train the GMM on the correlated images for each noisy input, EPLLs still cannot recover the texture details, since the training process loses much of the useful information contained in the training set. The neural combination method [19] presents an elegant way to combine the internal denoising result (BM3D) and the external denoising result [6].

However, it still cannot recover the lost details when the noise level is high, e.g., the patches in images “a” and “l”, because of the common drawbacks of BM3D and [6]. The results of [18] are generated by combining the denoising results of BM3D and EPLLg, and suffer from the same limitation as [19]. In contrast, with the proposed advanced patch matching and filtering strategy, our method combines the advantages of the external and internal denoising results, recovering the most realistic and faithful details. In brief, our method produces the best denoising results in both subjective and objective quality measurements.

Fig. 18. Visual quality comparison of the denoising results for images “a”, “e”, “f”, and “l” when the noise level σ is 50. For each column, the first row is the noisy image. Two crops of the regions indicated by red boxes are shown from the second row to the last row, obtained from the noisy image, the BM3D result [2], the low-rank-based result [3], the EPLLg result [5], the EPLLs result, [19], [18], our method, and the original image, respectively.

D. Multi-View Image Denoising

Our scheme also works well for multi-view image denoising. We test two multi-view images, “Barn” and “Cone”, downloaded from the Middlebury Computer Vision Page,⁵ as shown on the left side of Fig. 19. Each multi-view image set contains 8 views. Assuming that one or more views captured by a multi-view camera system are degraded during capture or wireless transmission, our scheme is able to recover a noisy view with the help of the noise-free views. We simulate this experiment by adding Gaussian noise (similar to [16]) to one image and using the other seven images for denoising. Fig. 19 compares our denoising results with five denoising schemes in terms of PSNR for images “Barn” and “Cone”. To ensure a fair comparison, our method and [16] use the same correlated images. Even so, our method still outperforms [16] by more than 2 dB. In addition, at all tested noise levels, our method outperforms BM3D by more than 7 dB. This further demonstrates that the proposed scheme can obtain excellent results when highly correlated images are available.

⁵ http://vision.middlebury.edu/stereo/

Fig. 19. Left: multi-view images “Barn” and “Cone”. Right: comparison of denoising results for multi-view images with noise level σ varying from 20 to 80.

Fig. 20. Denoising results with noisy correlated images. (a) Input noisy image (σ = 50), (b) our result (31.34 dB) with noise-free correlated images, (c) our result (30.45 dB) with noisy correlated images (σr = 10), (d) V-BM3D result (27.63 dB, for the first frame) with noisy correlated images (σr = 10), and (e) our result without correlated images (28.91 dB). To show the details clearly, only one crop of the highlighted region is shown.

E. Complexity Analysis

In this section, we discuss the computational complexity of the proposed scheme. Our scheme is implemented in Matlab, with some time-consuming parts, including SIFT extraction, image retrieval, graph-cut-based patch matching, and internal denoising, implemented in C. We use a Dell R910 server with four Intel CPUs (2.4 GHz) in the experiments. In our current implementation, retrieval and alignment take about five seconds per correlated image on average. In the first stage of external denoising, the graph-cut-based patch matching takes about 300 seconds for the patches in one noisy image to find their best matches in five correlated images. In the second stage of external denoising, the calculation of eigenvectors is time-consuming: it takes about 370 seconds using our unoptimized Matlab code with 10 cores. The internal denoising parts of the first and second stages are implemented in optimized C code and take about 15 seconds. In total, it takes about 750 seconds to denoise one image with a resolution of 1024 × 768. We also provide the computing times of all the tested methods in Table I. The computing times are measured using the reference codes with default settings on our server. Specifically, the low-rank-based denoising [3], EPLLg, and EPLLs are implemented in Matlab, and the others involve C code.
Among all the tested solutions, BM3D is the fastest, needing only 11 seconds with an optimized C implementation, whereas the learning-based EPLL is the slowest. The computing time of our scheme is comparable with that of [18] but higher than that of [19] when external correlation is used. Please note that all computing times are for reference only: we did not implement the reference algorithms, and the shared implementations may not be optimal in terms of computational complexity. On the other hand, our scheme can be deployed in the cloud, where a huge number of images are available. In that case, we can greatly accelerate our scheme using parallel computing and an optimized implementation.

VIII. LIMITATIONS AND FUTURE WORK

In this section, we discuss the limitations of our scheme and our future work.

A. Limitations

We would like to point out that the correlated images play an important role in our scheme. For general images, such as ‘Lena’ and ‘Barbara’, we cannot obtain correlated images from open image databases. If no correlated images are available, our scheme reduces to a two-stage internal denoising scheme, the same as BM3D. The other issue is that we assume the retrieved correlated images are noise-free. When dealing with noisy reference images (such as in video denoising), our scheme may not achieve the same improvement as with noise-free reference images. Fig. 20 compares the denoising results for image “f” when its retrieved correlated images are noise-free, when they are corrupted by Gaussian noise with σr = 10, and when no correlated images are used, namely the BM3D result. When the retrieved images are noisy, our denoising performance degrades, since the proposed external denoising method is designed for clean reference patches. However, the result is still much better than that of single-image-based BM3D. We also present the denoising result of V-BM3D [47] when the reference images are corrupted by Gaussian noise with σr = 10, as shown in Fig. 20 (d). Since V-BM3D is designed for video denoising, it does not perform well when the reference images are taken at different scales and rotations; its result is still very noisy.⁶

B. Future Work

The proposed combined high-level and low-level patch matching method can be extended to single-image-based denoising in addition to traditional patch matching. One promising direction is to use visual-group-based feature matching to detect similar regions inside one image and then apply geometric transformations to improve their correlation before patch matching.

⁶ Note that the result of BM3D is much better than that of V-BM3D, since the parameters of V-BM3D are designed for video sequences.


In this paper, we evaluate our scheme on landmark and multi-view images. In the future, we would like to extend our work to more types of images, such as multispectral images and facial images. Our current retrieval scheme is based on local features; other advanced image/object retrieval technologies may be introduced to find highly correlated images or regions. Auxiliary information such as GPS data, social tags, and labels could also be used to make the retrieval system more reliable in generic cases. In addition, none of the images retrieved in our current tests are noisy, an assumption that may not hold in real-world applications. We will work on this issue to make our scheme more practical.

IX. CONCLUSION

We have proposed a novel image denoising scheme that exploits both internal and external correlations. Given a noisy image, we first retrieve a set of correlated web images to serve as auxiliary information, instead of relying on general natural image priors. Then, in the external denoising part of the first stage, a graph-cut-based patch matching strategy is used to improve patch matching accuracy. The internal denoising part filters groups of similar noisy patches in the transform domain. Combining the internal and external denoising results in the frequency domain yields a basic denoising result in which the noise is greatly attenuated. This basic result is therefore used to improve the second-stage denoising in three ways: it improves image registration and patch matching, and it provides estimates of the Wiener filtering parameters. The final denoising result is obtained by combining the second-stage external and internal denoising results. Experimental results show that our scheme significantly outperforms five state-of-the-art schemes, both objectively and subjectively, over a wide range of noise levels.

REFERENCES
[1] A. Buades, B. Coll, and J.-M. Morel, “A non-local algorithm for image denoising,” in Proc. IEEE Comput. Soc. Conf. CVPR, Jun. 2005, pp. 60–65.
[2] K. Dabov, A. Foi, V. Katkovnik, and K. Egiazarian, “Image denoising by sparse 3-D transform-domain collaborative filtering,” IEEE Trans. Image Process., vol. 16, no. 8, pp. 2080–2095, Aug. 2007.
[3] W. Dong, G. Shi, and X. Li, “Nonlocal image restoration with bilateral variance estimation: A low-rank approach,” IEEE Trans. Image Process., vol. 22, no. 2, pp. 700–711, Feb. 2013.
[4] S. Roth and M. J. Black, “Fields of experts,” Int. J. Comput. Vis., vol. 82, no. 2, pp. 205–229, 2009.
[5] D. Zoran and Y. Weiss, “From learning models of natural image patches to whole image restoration,” in Proc. IEEE ICCV, Nov. 2011, pp. 479–486.
[6] H. C. Burger, C. J. Schuler, and S. Harmeling, “Image denoising: Can plain neural networks compete with BM3D?” in Proc. IEEE Conf. CVPR, Jun. 2012, pp. 2392–2399.
[7] X. Liu et al., “Intrinsic colorization,” ACM Trans. Graph., vol. 27, no. 5, 2008, Art. ID 152.
[8] O. Whyte, J. Sivic, and A. Zisserman, “Get out of my picture! Internet-based inpainting,” in Proc. 20th Brit. Mach. Vis. Conf., 2009.
[9] J. Hays and A. A. Efros, “Scene completion using millions of photographs,” ACM Trans. Graph., vol. 26, no. 3, 2007, Art. ID 4.
[10] H. Yue, X. Sun, J. Yang, and F. Wu, “Cloud-based image coding for mobile devices—Toward thousands to one compression,” IEEE Trans. Multimedia, vol. 15, no. 4, pp. 845–857, Jun. 2013.
[11] T. Chen, M.-M. Cheng, P. Tan, A. Shamir, and S.-M. Hu, “Sketch2Photo: Internet image montage,” in Proc. ACM SIGGRAPH Asia, 2009, Art. ID 124.
[12] M. Eitz, R. Richter, K. Hildebrand, T. Boubekeur, and M. Alexa, “Photosketcher: Interactive sketch-based image synthesis,” IEEE J. Comput. Graph. Appl., vol. 31, no. 6, pp. 56–66, Nov./Dec. 2011.
[13] L. Sun and J. Hays, “Super-resolution from internet-scale scene matching,” in Proc. IEEE Conf. ICCP, Apr. 2012, pp. 1–12.
[14] H. Yue, X. Sun, J. Yang, and F. Wu, “Landmark image super-resolution by retrieving web images,” IEEE Trans. Image Process., vol. 22, no. 12, pp. 4865–4878, Dec. 2013.
[15] Y. HaCohen, E. Shechtman, and D. Lischinski, “Deblurring by example using dense correspondence,” in Proc. IEEE ICCV, Dec. 2013, pp. 2384–2391.
[16] E. Luo, S. H. Chan, and T. Q. Nguyen, “Image denoising by targeted external databases,” in Proc. IEEE ICASSP, May 2014, pp. 3019–3023.
[17] H. Yue, X. Sun, J. Yang, and F. Wu, “CID: Combined image denoising in spatial and frequency domains using web images,” in Proc. IEEE Conf. CVPR, Jun. 2014, pp. 2933–2940.
[18] I. Mosseri, M. Zontak, and M. Irani, “Combining the power of internal and external denoising,” in Proc. IEEE ICCP, Apr. 2013, pp. 1–9.
[19] H. C. Burger, C. Schuler, and S. Harmeling, “Learning how to combine internal and external denoising methods,” in Pattern Recognition. Berlin, Germany: Springer-Verlag, 2013, pp. 121–130.
[20] J. Mairal, F. Bach, J. Ponce, G. Sapiro, and A. Zisserman, “Nonlocal sparse models for image restoration,” in Proc. IEEE 12th ICCV, Sep./Oct. 2009, pp. 2272–2279.
[21] A. Rajwade, A. Rangarajan, and A. Banerjee, “Image denoising using the higher order singular value decomposition,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 35, no. 4, pp. 849–862, Apr. 2013.
[22] H. C. Burger and S. Harmeling, “Improving denoising algorithms via a multi-scale meta-procedure,” in Proc. 33rd Int. Conf. Pattern Recognit., 2011, pp. 206–215.
[23] T. Gan and W. Lu, “Image denoising using multi-stage sparse representations,” in Proc. 17th IEEE Int. Conf. Image Process. (ICIP), Sep. 2010, pp. 1165–1168.
[24] M. Zontak, I. Mosseri, and M. Irani, “Separating signal from noise using patch recurrence across scales,” in Proc. IEEE Conf. CVPR, 2013, pp. 1195–1202.
[25] W. Zuo, L. Zhang, C. Song, and D. Zhang, “Texture enhanced image denoising via gradient histogram preservation,” in Proc. IEEE Conf. CVPR, Jun. 2013, pp. 1203–1210.
[26] L. Dai, X. Sun, F. Wu, and N. Yu, “Large scale image retrieval with visual groups,” in Proc. 20th IEEE Int. Conf. Image Process. (ICIP), Sep. 2013, pp. 2582–2586.
[27] C. Barnes, E. Shechtman, D. B. Goldman, and A. Finkelstein, “The generalized PatchMatch correspondence algorithm,” in Proc. 11th ECCV, 2010, pp. 29–43.
[28] D. G. Lowe, “Distinctive image features from scale-invariant keypoints,” Int. J. Comput. Vis., vol. 60, no. 2, pp. 91–110, 2004.
[29] M. A. Fischler and R. C. Bolles, “Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography,” Commun. ACM, vol. 24, no. 6, pp. 381–395, 1981.
[30] T. J. Chin, J. Yu, and D. Suter, “Accelerated hypothesis generation for multi-structure robust fitting,” in Proc. 11th ECCV, 2010, pp. 533–546.
[31] A. Delong, A. Osokin, H. N. Isack, and Y. Boykov, “Fast approximate energy minimization with label costs,” Int. J. Comput. Vis., vol. 96, no. 1, pp. 1–27, 2012.
[32] M. Zontak and M. Irani, “Internal statistics of a single natural image,” in Proc. IEEE Conf. CVPR, Jun. 2011, pp. 977–984.
[33] S. Baker, D. Scharstein, J. P. Lewis, S. Roth, M. J. Black, and R. Szeliski, “A database and evaluation methodology for optical flow,” Int. J. Comput. Vis., vol. 92, no. 1, pp. 1–31, 2011.
[34] T. Brox, A. Bruhn, N. Papenberg, and J. Weickert, “High accuracy optical flow estimation based on a theory for warping,” in Proc. 8th Eur. Conf. Comput. Vis. (ECCV), 2004, pp. 25–36.
[35] C. Liu, J. Yuen, and A. Torralba, “SIFT flow: Dense correspondence across scenes and its applications,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 33, no. 5, pp. 978–994, May 2011.
[36] Y. Boykov, O. Veksler, and R. Zabih, “Fast approximate energy minimization via graph cuts,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 23, no. 11, pp. 1222–1239, Nov. 2001.
[37] J. Kim, C. Liu, F. Sha, and K. Grauman, “Deformable spatial pyramid matching for fast dense correspondences,” in Proc. IEEE Conf. CVPR, Jun. 2013, pp. 2307–2314.
[38] S. P. Ghael, A. M. Sayeed, and R. G. Baraniuk, “Improved wavelet denoising via empirical Wiener filtering,” Proc. SPIE, vol. V, pp. 389–399, Oct. 1997.
[39] L. Zhang, W. Dong, D. Zhang, and G. Shi, “Two-stage image denoising by principal component analysis with local pixel grouping,” Pattern Recognit., vol. 43, no. 4, pp. 1531–1549, 2010.
[40] W. Dong, L. Zhang, G. Shi, and X. Wu, “Image deblurring and super-resolution by adaptive sparse domain selection and adaptive regularization,” IEEE Trans. Image Process., vol. 20, no. 7, pp. 1838–1857, Jul. 2011.
[41] W. Dong, L. Zhang, G. Shi, and X. Li, “Nonlocally centralized sparse representation for image restoration,” IEEE Trans. Image Process., vol. 22, no. 4, pp. 1620–1630, Apr. 2013.
[42] K. Dabov, A. Foi, V. Katkovnik, and K. Egiazarian, “BM3D image denoising with shape-adaptive principal component analysis,” in Proc. Workshop Signal Process. Adapt. Sparse Struct. Represent., 2009.
[43] J. Philbin, O. Chum, M. Isard, J. Sivic, and A. Zisserman, “Object retrieval with large vocabularies and fast spatial matching,” in Proc. IEEE Conf. CVPR, Jun. 2007, pp. 1–8.
[44] H. Jégou and M. Douze. (2008). INRIA Holiday Dataset. [Online]. Available: https://developers.google.com/maps/documentation/streetview
[45] Correlated Images. [Online]. Available: https://onedrive.live.com/?cid=674C3A37469082BE&id=674C3A37469082BE%215215, accessed Dec. 10, 2014.
[46] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image quality assessment: From error visibility to structural similarity,” IEEE Trans. Image Process., vol. 13, no. 4, pp. 600–612, Apr. 2004.
[47] K. Dabov, A. Foi, and K. Egiazarian, “Video denoising by sparse 3D transform-domain collaborative filtering,” in Proc. 15th Eur. Signal Process. Conf., 2007, pp. 145–149.

Huanjing Yue received the B.S. degree in electrical engineering from Tianjin University, Tianjin, China, in 2010, where she is currently pursuing the Ph.D. degree in electrical engineering. Her research interests include image compression and image processing.

Xiaoyan Sun (M’04–SM’10) received the B.S., M.S., and Ph.D. degrees in computer science from the Harbin Institute of Technology, Harbin, China, in 1997, 1999, and 2003, respectively. Since 2004, she has been with Microsoft Research Asia, Beijing, China, where she is currently a Lead Researcher with the Internet Media Group. She has authored or co-authored over 60 journal and conference papers and ten proposals to standards. Her current research interests include image and video compression, image processing, computer vision, and cloud computing. She was a recipient of the Best Paper Award of the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY in 2009.

Jingyu Yang (M’10) received the B.E. degree from the Beijing University of Posts and Telecommunications, Beijing, China, in 2003, and the Ph.D. (Hons.) degree from Tsinghua University, Beijing, in 2009. He visited Microsoft Research Asia (MSRA) in 2011 under MSRA’s young scholar supporting program. He visited the Signal Processing Laboratory at the École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland, in 2012 and from 2014 to 2015. He has been a Faculty Member with Tianjin University, China, since 2009, where he is currently an Associate Professor with the School of Electronic Information Engineering. His research interests include image/video processing, 3D imaging, and computer vision. He was selected into the Program for New Century Excellent Talents in University from the Ministry of Education, China, in 2011, and into the Elite Peiyang Scholar Program and the Reserved Peiyang Scholar Program of Tianjin University in 2012 and 2014, respectively.

Feng Wu (M’99–SM’06–F’13) received the B.S. degree in electrical engineering from Xidian University in 1992, and the M.S. and Ph.D. degrees in computer science from the Harbin Institute of Technology in 1996 and 1999, respectively. He was a Principal Researcher and Research Manager with Microsoft Research Asia. He is currently a Professor with the University of Science and Technology of China. He has authored or co-authored over 200 high-quality papers, including several dozen IEEE TRANSACTIONS papers and top conference papers at MobiCom, the Special Interest Group on Information Retrieval, Computer Vision and Pattern Recognition, and the Association for Computing Machinery’s Multimedia conference. He holds 77 granted U.S. patents, and fifteen of his techniques have been adopted into international video coding standards. His research interests include image and video compression, media communication, and media analysis and synthesis. He was a recipient of best paper awards from the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY in 2009, the Pacific-Rim Conference on Multimedia in 2008, and the International Society for Optics and Photonics (SPIE) Visual Communications and Image Processing conference in 2007. Prof. Wu serves as an Associate Editor of the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, the IEEE TRANSACTIONS ON MULTIMEDIA, and several other international journals. He received the 2012 Best Associate Editor Award from the IEEE Circuits and Systems Society. He also served as the TPC Chair of the International Workshop on Multimedia Signal Processing in 2011, Visual Communications and Image Processing in 2010, and the Pacific-Rim Conference on Multimedia in 2009, and as the Special Sessions Chair of the IEEE International Conference on Multimedia and Expo in 2010 and the IEEE International Symposium on Circuits and Systems in 2013.