IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 25, NO. 8, AUGUST 2015


Noise Estimation of Natural Images via Statistical Analysis and Noise Injection

Chongwu Tang, Student Member, IEEE, Xiaokang Yang, Senior Member, IEEE, and Guangtao Zhai, Member, IEEE

Abstract: We develop a framework for estimating the noise level of a natural image using two important statistics: 1) high kurtosis and 2) scale invariance in the transform domain. By exploring these priors of natural image statistics in the 2-D discrete cosine transform (DCT) domain, we reveal the limitations of these statistics for images with highly directional edges or large smooth areas. We then derive a novel two-step estimation scheme for the noise variance: 1) in the preliminary estimation, a combination of the wavelet transform and a nondirectional DCT is used to alleviate the influence of image structures and 2) a noise-injection rectification is further devised to deal with the noise-free image contents. Simulations and a comparative study demonstrate that the algorithm reliably infers the noise variance and is robust over wide ranges of visual content and noise levels, while outperforming some relevant methods. This work can significantly improve the performance of existing denoising techniques that require the noise variance as a critical parameter.

Index Terms: Dual transform, kurtosis, noise estimation, noise injection, scale invariance.

I. INTRODUCTION

Noise reduction is a crucial issue for both theoretical and practical image processing tasks. Although image denoising has been studied for many years and very promising denoising results have been achieved, almost all existing denoisers depend on a priori knowledge of the noise, e.g., its variance. Pertinent research indicates that the performance of state-of-the-art denoising algorithms can drop dramatically given inaccurate noise parameters. Therefore, effective noise estimation is of central importance to image processing and analysis algorithms, such as image quality assessment, trace verification, super resolution, and so on. Natural image noise generally has the following properties: 1) randomness; 2) unpredictability; 3) correlation (between noise signal and image signal); and 4) additivity (power spectrum). Because of the insufficiency of a priori

Manuscript received June 28, 2014; revised October 25, 2014 and November 26, 2014; accepted December 5, 2014. Date of publication December 18, 2014; date of current version July 31, 2015. This work was supported in part by the National Natural Science Foundation of China under Grants 61025005, 60932006, 61001145, 61102098, and 61221001; in part by the Science and Technology Commission of Shanghai Municipality under Grant 12DZ2272600; and in part by the 111 Project under Grant B07022. This paper was recommended by Associate Editor P. Yin. The authors are with the Department of Electronic Engineering, Shanghai Jiao Tong University, Shanghai 200240, China (e-mail: [email protected]; [email protected]; [email protected]). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TCSVT.2014.2380196

information, people usually assume a parametric model to approximate these properties of noise in the contaminated image observation and use a blind method to infer the noise level.

Early attempts date back to Gonzalez and Woods [1], who proposed a noise estimation method based on sampling noisy pixels from smooth regions of the image with relatively constant gray values. This pioneering method is simple and does not need an explicit noise model. However, the estimation can be extremely underconstrained without a priori statistical knowledge of the noise, so later work has focused on estimating the parameters of fixed noise models, among which the additive white Gaussian noise (AWGN) model is by far the most widely used. Accordingly, some more sophisticated algorithms were proposed, which can be classified into three types: 1) spatial domain based; 2) transform domain based; and 3) statistics based.

Spatial domain algorithms usually suppress the image signal first by applying high-pass filtering and then calculate the noise variance from the residual image. Rank et al. [2] used a cascade of two 1-D difference operators to filter the noisy image and then computed the histogram of local variances by dividing the remaining image into subregions; the noise variance was estimated by averaging the weighted histogram. In [3]–[5], Laplacian filtering and Sobel edge extraction were used to get the edge mask. Block-based local variances were calculated and the maximum or mean of the variances was taken as the estimator. Amer and Dubois [6] further proposed a structure-oriented method to enhance the robustness of noise estimation for images with large textural areas. In [7], the noise level was estimated from the gradients of smooth regions for each intensity interval. Moreover, histogram analysis plays an important role in spatial domain algorithms, including methods based on histogram partition [8] and gradient histogram fitting [9]. However, the accuracy of these methods is quite limited since it is difficult to completely discriminate original image structure or texture from noise.

Local spatial domain algorithms were proposed based on the homogeneity assumption, in which the noise variance is calculated directly from the homogeneous blocks [10]. Lukin et al. [11] presegmented the image to obtain a segmentation map and then extracted homogeneous blocks from different image parts to discriminate structures. However, the unsupervised variational classification they adopted has a high computational complexity that is not suitable for real-time applications. Homogeneity is an ambiguous

1051-8215 © 2014 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.


definition, which means that relatively homogeneous blocks may still contain image details. How to identify homogeneous blocks in natural images has become a major challenge for this type of method.

Transform domain algorithms were proposed along with the development of multiresolution analysis and wavelet theory. Donoho and Johnstone [12] proposed a robust noise-level estimator, which is the scaled median of the absolute wavelet coefficients at the highest resolution: $\hat{\sigma}(n) = \mathrm{median}(|y(i)|)/0.6745$, $y(i) \in HH_i$. This estimator is a popular choice for wavelet soft-threshold denoisers, but it tends to overestimate the noise variance when the SNR in the wavelet domain is high. Some modified algorithms were brought forward following a similar idea. Stefano et al. [13] defined nonlinear statistical noise estimation functions according to the characteristics of the image and noise components and proposed a set of training-based methods in the wavelet domain. Zlokolica et al. [14] designed a spatial–temporal noise estimation algorithm by analyzing the distributions of spatial/temporal gradients, which were determined from the finest scales of the spatial/temporal wavelet transform. Liu et al. [15] proposed a framework for automatic color noise estimation from a single image using a piecewise smooth image model. A novel continuous function describing the relationship between noise level and image brightness was proposed, and an upper bound of the noise level was estimated by fitting a lower asymptote to the standard deviations (STDs) of per-segment image variances. These algorithms perform well at the expense of higher computational complexity and a large amount of memory consumption.

Statistics-based algorithms strike a favorable balance between performance and complexity. Through statistical modeling, a priori knowledge of the image can be well exploited.
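As an aside on the transform-domain family, the scaled-median estimator of Donoho and Johnstone quoted above can be illustrated with a minimal sketch. It assumes a first-level Haar (db1) decomposition computed directly on 2 × 2 blocks and a pure-noise test matrix; the function names `haar_diagonal` and `mad_sigma` are hypothetical and are not taken from [12].

```python
import random
import statistics

def haar_diagonal(img):
    """First-level orthonormal Haar diagonal detail coefficients of a 2-D list."""
    rows = len(img) // 2 * 2
    cols = len(img[0]) // 2 * 2
    return [
        (img[r][c] - img[r][c + 1] - img[r + 1][c] + img[r + 1][c + 1]) / 2.0
        for r in range(0, rows, 2)
        for c in range(0, cols, 2)
    ]

def mad_sigma(img):
    """Scaled median of absolute diagonal coefficients: median(|y|) / 0.6745."""
    coeffs = haar_diagonal(img)
    return statistics.median(abs(v) for v in coeffs) / 0.6745

# Assumption: a pure AWGN matrix stands in for the noisy image.
rng = random.Random(0)
sigma_true = 10.0
noise = [[rng.gauss(0.0, sigma_true) for _ in range(128)] for _ in range(128)]
sigma_hat = mad_sigma(noise)
print(f"true sigma = {sigma_true}, estimated sigma = {sigma_hat:.2f}")
```

On a pure-noise matrix the estimate should land close to the true STD. On a real image, any image energy left in the diagonal sub-band inflates the median, which is consistent with the overestimation mentioned above.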
Matching moment [16] is one of the representative methods, in which the second and fourth moments of the noisy image are used to estimate the noise variance. It performs well at low noise levels but tends to underestimate at higher levels. Liu and Lin [17] proposed a novel estimation method based on singular value decomposition (SVD). They asserted that the tail of the singular values (i.e., the later SVD subspaces) can be used as a proper data basis for noise estimation and that image details do not have a significant influence on that part of the subspaces. This assumption is mostly reasonable, but the method still faces some difficulties: 1) there are tuning parameters that need to be adjusted for each specific implementation and 2) the estimation at lower noise levels is not accurate enough. To deal with images with large textural and edge areas, Zoran and Weiss [18] proposed an accurate noise estimation algorithm by analyzing the high-order statistical invariance of natural images. This method was based on the assumption that the kurtosis (the normalized fourth moment) of a natural image's discrete cosine transform (DCT) marginal filter responses (MFRs) is stable across different scales, and it significantly outperformed existing algorithms. However, by examining a large set of natural images, we realize that the scale invariance assumption of DCT MFR coefficients is only valid

in a weak statistical sense. Furthermore, the kurtosis of the DCT MFR coefficients may change drastically across scales for images with prominent edges or large smooth areas [19]. All these facts restrain the robustness and accuracy of this algorithm.

Since the noise level can vary from one iteration to another in many recursive image restoration algorithms, it is always beneficial to have an effective and robust noise estimation algorithm. In this paper, we propose a dual-transform-based two-step noise estimation framework for the AWGN model. First, we extract a high-frequency (HF) wavelet sub-band of the image and calculate its DCT MFRs, excluding those associated with prominent edges. This dual-transform scheme makes the kurtosis of the MFRs more scale invariant. Then, a constrained nonlinear optimization is implemented to yield a preliminary estimation. Second, we use a rectification algorithm proposed in [20] and [21] to alleviate the negative impact of noise-free image contents. A test noise is injected into the noisy image to produce a new noisy image, whose local statistics are calculated and used to generate an offset that rectifies the preliminary estimation. It is especially effective for textural images. We test the proposed algorithm over a wide range of images and noise levels, and show its superiority compared with the methods of [16]–[18].

The rest of this paper is organized as follows. In Section II, we first illustrate the kurtosis scale invariance assumption and its limitations in natural images, and then analyze the texture-related statistics of natural images' transform domain MFR coefficients. The noise estimation algorithm is elaborated in Section III using dual transform and noise injection. Experiments and comparative studies are given in Section IV. Finally, Section V gives the conclusion.

II. SCALE INVARIANCE OF KURTOSIS AND FURTHER ANALYSIS

A. Scale Invariance Assumption and Its Limitations

An interesting research topic in natural image statistics is the analysis of the distribution of image transform coefficients, and the most straightforward study is on moment statistics. Since the commonly used lower moments such as mean and variance do not provide enough information regarding the shape of the distribution, the normalized fourth central moment, named kurtosis in statistical terminology, is widely used in natural image statistics. For a random variable x, its kurtosis is defined as the fourth cumulant divided by the square of the second cumulant, which equals the normalized fourth central moment minus three

$$K(x) = \frac{\kappa_4(x)}{\kappa_2^2(x)} = \frac{\mu_4(x)}{\sigma^4(x)} - 3 \tag{1}$$

where $\kappa_k$ is the kth cumulant function and $\mu_k$ is the kth central moment. In this paper, we prefer the excess kurtosis in (1) instead of the classic fourth standardized moment definition $K(x) = \mu_4(x)/\sigma^4(x)$ to facilitate the following analysis. Kurtosis depicts the peakedness of the distribution, and natural image statistics suggest that the kurtosis of MFRs in a clean image is constant through scales [22] or, more generally, that the kurtosis values for higher frequencies are lower than for the low-frequency ones [23]. Therefore, it is the noise that deteriorates

TANG et al.: NOISE ESTIMATION OF NATURAL IMAGES

Fig. 1. Noise effect on scale invariance of kurtosis. White Gaussian noise of zero mean and different STDs σ(n) = 0, 10, 30 is added to a clean natural image (left), and the kurtosis through different scales (components) is calculated using 8 × 8 DCT bases. We can observe from the scatter plot (right) that kurtosis drops drastically as the noise STD rises. Stronger noise makes the kurtosis converge to zero because the noise then dominates and the excess kurtosis of a Gaussian distribution equals zero. Please refer to the electronic version for color figures.
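The behavior described in the caption of Fig. 1 can be reproduced on synthetic data. The following is only a sketch under stated assumptions: a zero-mean Laplacian sample (with a hypothetical scale of 20) stands in for the filter responses of a clean image, and `excess_kurtosis` is an illustrative helper, not code from the paper.

```python
import random

def excess_kurtosis(samples):
    """Sample excess kurtosis: fourth central moment over squared variance, minus 3."""
    n = len(samples)
    mean = sum(samples) / n
    m2 = sum((v - mean) ** 2 for v in samples) / n
    m4 = sum((v - mean) ** 4 for v in samples) / n
    return m4 / (m2 * m2) - 3.0

rng = random.Random(1)
scale = 20.0  # hypothetical Laplacian scale standing in for clean filter responses
# The difference of two i.i.d. exponentials with mean `scale` is Laplacian(0, scale).
clean = [rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
         for _ in range(50000)]

kurtosis_by_std = {}
for noise_std in (0.0, 10.0, 30.0):
    noisy = [v + rng.gauss(0.0, noise_std) for v in clean]
    kurtosis_by_std[noise_std] = excess_kurtosis(noisy)
    print(f"noise STD {noise_std:5.1f}: excess kurtosis {kurtosis_by_std[noise_std]:.3f}")
```

The excess kurtosis of the noise-free Laplacian sample is close to 3 and shrinks toward zero as the Gaussian component grows, mirroring the trend of the scatter plot.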


Fig. 3. Kurtosis-scale profiles of the natural images shown in Fig. 2. It is evident that the more textured the image, the more constant the kurtosis of its DCT MFR coefficients through scales. Blue rings: FIELD and NATURE.

Fig. 4. Wavelet coefficient distributions of images shown in Fig. 2. Here we partition the coefficients into 1000 intervals according to their values and calculate the normalized histogram.

Fig. 2. Several natural images with different levels of edge, texture, and smoothness. (a) and (b) BIRD and FIELD are used in [18]. (c) SEA comes from the Kodak database [24]. (d) NATURE belongs to the van Hateren natural image database [25]. It is obvious that BIRD has a large proportion of relatively smooth area, in vivid contrast to FIELD, which has strong texture. The same comparison can be made between SEA and NATURE.

the image that makes the kurtosis violate the scale invariance principle in practice. In general, kurtosis drops more drastically as the noise variance rises, which can be attributed to the fact that the noise tends to attenuate the structures of the original image along all directions, thereby making the statistics more uniform. Fig. 1 illustrates this clearly. However, the scale invariance assumption is found to be invalid for a large number of natural images. In Fig. 2, several natural images with different levels of edge, texture, and smoothness are illustrated, and the kurtosis through scales of these images is calculated using 8 × 8 DCT bases, with the scatter plots shown in Fig. 3. On the one hand, the kurtosis distributions of BIRD and SEA obviously violate the scale invariance assumption; on the other hand, FIELD has a more constant kurtosis distribution than BIRD, and so does NATURE compared with SEA. The most important reason for this phenomenon is that FIELD and NATURE have stronger texture than their counterparts. It should be emphasized that constancy here is a relative notion (i.e., FIELD compared with BIRD and NATURE compared with SEA, respectively) and illustrates the kurtosis-scale relationship qualitatively rather than quantitatively.

Moreover, there is a useful prior of natural images: the non-Gaussian properties of natural images' wavelet coefficients can be modeled by a Laplacian distribution or a generalized Gaussian distribution (GGD). GGD is a parametric family of symmetric distributions, which includes Gaussian and Laplacian distributions as special cases and, as limit cases, the δ function and the uniform distribution.
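To make the link between the GGD family and kurtosis concrete, the sketch below evaluates the standard closed-form excess kurtosis of a generalized Gaussian as a function of its shape parameter. It assumes the common parameterization with density proportional to $\exp(-(|x-\mu|/\beta)^{\alpha})$, which the text does not spell out; `ggd_excess_kurtosis` is a hypothetical helper name.

```python
import math

def ggd_excess_kurtosis(alpha):
    """Excess kurtosis of a generalized Gaussian with shape parameter alpha.

    Assumes density proportional to exp(-(|x - mu| / beta) ** alpha); the
    result does not depend on the location mu or the scale beta.
    """
    g = math.gamma
    return g(5.0 / alpha) * g(1.0 / alpha) / g(3.0 / alpha) ** 2 - 3.0

for alpha in (0.7, 1.0, 2.0, 8.0):
    print(f"alpha = {alpha:3.1f}: excess kurtosis = {ggd_excess_kurtosis(alpha):+.3f}")
```

Under this parameterization, α = 2 gives the Gaussian case (excess kurtosis 0) and α = 1 the Laplacian case (excess kurtosis 3); smaller α yields heavier tails, while large α approaches the uniform limit with negative excess kurtosis.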

In a GGD model $GG(\mu, \sigma^2, \alpha)$, the shape parameter α determines the non-Gaussian properties directly. Mallat [26] found that HF wavelet coefficients always followed a GGD with α ≈ 0.7, while Do and Vetterli [27] used GGD to approximate the marginal distribution of wavelet coefficients and found that α was in the range of 0.7–2.0. Joshi and Fischer [28] applied GGD to fit the DCT ac coefficients of natural images; maximum likelihood estimation gave α in the range of 1.0–2.0. However, the GGD assumption becomes weak in smoother images. Fig. 4 shows the first-level diagonal wavelet coefficient distributions of the test images. Apparently, FIELD and NATURE have more GGD-like distributions than their counterparts. In Section II-B, we will explore the textural effect on natural image statistics, including kurtosis and wavelet coefficient distributions.

B. Textural Effects on Natural Image Statistics

Let us focus on the image pair of SEA and NATURE, in which NATURE tends to be more kurtosis-scale invariant and more GGD-like in the wavelet domain, as shown in Figs. 3 and 4. To investigate the impact of the statistics on noise estimation, we test the algorithm in [18] on the two images over several noise levels with the AWGN model, and the results are listed in Table I. The relative error is given by $e_r = |\hat{\sigma}(n) - \sigma(n)|/\sigma(n) \times 100\%$; the mean square error (MSE) and mean absolute difference (MAD) are provided to give comprehensive comparisons, and the smaller metrics are emphasized in boldface. The data verify that the estimation model performs better on the image that is more concordant with the scale invariance and GGD assumptions. To further uncover the textural effect on natural image statistics, we extract textural areas from NATURE and SEA, respectively, to facilitate our analysis, as shown in Fig. 5. We compare the original images with their textural subimages


TABLE I
Noise Estimation Result [18] of SEA and NATURE; 8 × 8 DCT Filter Bases Are Used. The Additive Gaussian White Noise Is Generated Using MATLAB Function RANDN. Para. Is the Noise Parameter Used to Generate the Noise Matrix, and $\sigma(n)$ Represents the Real Noise STD Calculated From the Noise Matrix. $\hat{\sigma}(n)$, $e_{est} = |\hat{\sigma}(n) - \sigma(n)|$, and $e_r$ (%) Denote the Estimated Noise STD, Estimation Error, and Relative Error, Respectively

TABLE II
Transform Invariance of Noise's Statistics. The Size of the Noise Matrix Is 512 × 512
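The error measures named in the caption of Table I and in Section II-B (estimation error, relative error, MSE, and MAD) can be computed as in the sketch below. The function name `estimation_metrics` and the numbers in the usage lines are hypothetical placeholders for illustration and are not values from the paper's tables.

```python
def estimation_metrics(true_stds, estimated_stds):
    """MSE, MAD, and mean relative error (%) between true and estimated noise STDs."""
    errors = [est - true for true, est in zip(true_stds, estimated_stds)]
    count = len(errors)
    return {
        "mse": sum(e * e for e in errors) / count,
        "mad": sum(abs(e) for e in errors) / count,
        "mean_relative_error_percent":
            sum(abs(e) / true for e, true in zip(errors, true_stds)) / count * 100.0,
    }

# Hypothetical values, for illustration only.
true_stds = [5.0, 10.0, 20.0]
estimated_stds = [5.5, 10.5, 19.0]
metrics = estimation_metrics(true_stds, estimated_stds)
print(metrics)
```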

Fig. 7. Kurtosis distributions of NATURE and SEA, compared with their CD sub-bands.

Fig. 5. Enlarged textural parts of SEA and NATURE. From left to right: textural parts 1 and 2 of SEA and textural parts 1 and 2 of NATURE.

Fig. 6. Kurtosis and wavelet coefficient distributions of NATURE and SEA with their respective textural parts.

on the kurtosis distribution of MFRs and the wavelet coefficient distributions in Fig. 6. It is obvious that the stronger the texture of the image, the more constant the kurtosis distribution and the more GGD-like the wavelet coefficient distribution. This texture-related behavior of natural image statistics is expected to improve the noise estimation accuracy.

C. Scale Invariance of Natural Image in Wavelet Domain

The investigation in Section II-B indicates that it is reasonable to assume kurtosis-scale invariance and GGD in the wavelet domain of strongly textured images. However, selecting textural parts of the image is not a good choice because the noise signal is also

discarded because of arbitrary sampling. The wavelet transform turns out to be an ideal tool for extracting HF image sub-bands. Given the first-level wavelet decomposition of an image, the horizontal and vertical detail sub-bands are found to be sensitive to directional structures and are therefore inappropriate for our application, while the diagonal detail sub-band (CD) is expected to be a preferable HF substitute for the original image. In this paper, the db1 basis [29] is used for wavelet decomposition since its unitarity preserves the noise statistics in the transform domain. We verify the statistical invariance of the CD wavelet sub-band in Table II using a randomly generated AWGN matrix n, where $\sigma(n)$ is the STD of n and $\sigma'(n)$ is the STD calculated from the CD wavelet sub-band of n. The difference between the original and the transformed STD is negligible. Kurtosis distributions of the test images and their CD sub-bands are shown in Fig. 7. From the comparisons, we realize that the CD sub-bands are more kurtosis-scale invariant.

Further inspection of Fig. 1 reveals that there are many outliers in the kurtosis scatter plot, which deteriorate the constant distribution, especially at lower noise STDs. These outliers correspond to the frequencies that represent the dominant edges along the horizontal or the vertical direction in images, so the assumed kurtosis-scale invariance of DCT MFRs will be invalid for images with dominant edges in directions aligned with the DCT bases. Considering that natural images always contain nonuniform and anisotropic texture patterns that exclude directional contours such as those in modern architecture, to achieve an even better scale invariance, we select the DCT filters located on the diagonal of the N × N DCT filter matrix, while abandoning the ones that are sensitive to dominant directional edges, as Fig. 8 shows.
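The diagonal filter selection described above can be sketched as follows. This is an illustration under assumptions: "diagonal" is read as the filters whose horizontal and vertical frequency indexes are equal, the orthonormal DCT-II basis is used, and the helper names are hypothetical. The sketch also checks numerically that such unit-norm filters leave the STD of white Gaussian noise unchanged.

```python
import math
import random

def dct_vector(n, k):
    """k-th orthonormal 1-D DCT-II basis vector of length n."""
    scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
    return [scale * math.cos((2 * i + 1) * k * math.pi / (2 * n)) for i in range(n)]

def diagonal_dct_filter(n, k):
    """2-D DCT filter with equal horizontal and vertical frequency index k."""
    v = dct_vector(n, k)
    return [[a * b for b in v] for a in v]

def inner(f, g):
    """Inner product of two equally sized 2-D lists."""
    return sum(a * b for row_f, row_g in zip(f, g) for a, b in zip(row_f, row_g))

n = 8
filters = [diagonal_dct_filter(n, k) for k in range(1, n)]  # k = 0 (DC) is dropped

norms = [inner(f, f) for f in filters]      # each should be 1 (unit norm)
cross = inner(filters[0], filters[1])       # should be 0 (orthogonal filters)

# Response of one filter to white Gaussian noise patches.
rng = random.Random(2)
sigma = 10.0
responses = []
for _ in range(2000):
    patch = [[rng.gauss(0.0, sigma) for _ in range(n)] for _ in range(n)]
    responses.append(inner(filters[2], patch))
mean = sum(responses) / len(responses)
response_std = math.sqrt(sum((r - mean) ** 2 for r in responses) / len(responses))
print(f"noise STD = {sigma}, filter-response STD = {response_std:.2f}")
```

Because each filter has unit norm, a filter response to white noise is a unit-norm linear combination of independent samples, so its variance equals the noise variance. The same argument applies to the orthonormal db1 diagonal sub-band.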
Revised kurtosis distributions of FIELD, BIRD, NATURE, and SEA are shown in Fig. 9, from which we observe that some abrupt points are removed and the distributions are more constant than before. Quantitatively, we verify the improvement of kurtosis invariance using nondirectional DCT


TABLE III
Expectations (E) and STDs (S) of Kurtosis Distributions, Using Exhaustive (E-DCT) and Nondirectional (U-DCT) DCT Bases. Smaller S for Each Image Is Emphasized in Boldface

Fig. 8. Exhaustive 8 × 8 DCT filters. Filters in the dotted line are the nondirectional ones we use.

TABLE IV
Estimation Results and the Accuracy for van Hateren and Kodak Databases. $\hat{\sigma}(n)$ and $\hat{\sigma}_Z(n)$ Represent the Dual-Transform and Zoran's Frameworks. Best Results Are Selected With the Boldface

Fig. 9. Kurtosis distributions of exhaustive and nondirectional DCT filter responses of FIELD, BIRD, NATURE, and SEA. Note that to achieve the same amount of DCT filters between exhaustive and nondirectional conditions, we use 8 × 8 DCT bases for the former and 64 × 64 DCT bases for the latter. DC components are abandoned for both of them.

bases in Table III, in which smaller STDs represent more constant kurtosis distributions. Accordingly, in this paper, we first obtain the CD wavelet sub-band of the noisy image and then derive its 2-D DCT MFRs. Since the DCT is unitary, the dual-transform coefficients can be used directly for the accurate noise estimation algorithm introduced below.

III. ALGORITHM OF THE TWO-STEP NOISE ESTIMATION

Based on the analysis in Section II, we introduce our two-step noise-level estimation scheme in this section. A dual-transform-based preliminary estimation is implemented first, and a novel noise injection-based rectification procedure is carried out subsequently.

A. Dual-Transform-Based Preliminary Estimation

In this paper, we use the AWGN model Y = X + N in the pixel domain, where X is the original image, N is the noise, and

Y is the noisy observation of X. If X and N are independent, we can extract the CD sub-band after the wavelet transform

$$y = x + n \tag{2}$$

where y is the CD sub-band of Y and n is the noise in the transform domain. Note that x is a GGD random variable, $x \sim GG(\mu, \sigma^2(x), \alpha)$, and n is an independent Gaussian random variable, i.e., $n \sim \mathcal{N}(0, \sigma^2(n))$. Considering the independence between x and n, we have

$$\sigma^2(y) = \sigma^2(x) + \sigma^2(n), \qquad \kappa_4(y) = \kappa_4(x) + \kappa_4(n). \tag{3}$$

Using the relationship between the cumulant and the central moments, $\kappa_4(\cdot) = \mu_4(\cdot) - 3\sigma^4(\cdot)$, and the expression of excess kurtosis in (1), we further have

$$\begin{aligned} \mu_4(y) &= \kappa_4(y) + 3\sigma^4(y) \\ &= \kappa_4(x) + \kappa_4(n) + 3\sigma^4(y) \\ &= \mu_4(x) - 3\sigma^4(x) + \mu_4(n) - 3\sigma^4(n) + 3\sigma^4(y) \\ &= K(x)\sigma^4(x) + K(n)\sigma^4(n) + 3\sigma^4(y). \end{aligned} \tag{4}$$

Then, substituting (4) into (1), we derive

$$K(y) = \frac{\sigma^4(x)}{\sigma^4(y)} K(x) + \frac{\sigma^4(n)}{\sigma^4(y)} K(n) \tag{5}$$

where K(x), according to the assumption, is approximately a constant throughout different scales. We can see that the kurtosis K(y) is essentially a weighted combination of K(x) and K(n), whose weights are determined by the ratios between the STDs of the signals. Substituting (3) into (5), we arrive at

$$K(y) = \left(\frac{\sigma^2(y) - \sigma^2(n)}{\sigma^2(y)}\right)^2 K(x) + \left(\frac{\sigma^2(n)}{\sigma^2(y)}\right)^2 K(n) \tag{6}$$

TABLE V
Estimation Results and the Accuracy of Considered Methods for FIELD and BIRD. Noise Levels That Have Failures Are Marked With F and Are Not Considered When Calculating Errors. The Best Result in a Row Is Selected With the Boldface

TABLE VI
Estimation Results and the Accuracy of Considered Methods for NATURE and SEA. Noise Levels That Have Failures Are Marked With F and Are Not Considered When Calculating Errors. The Best Result in a Row Is Selected With the Boldface

Fig. 10. Test images from the Kodak [24] set. From left to right, top to bottom are kodim01–kodim24.

Equation (6) enables us to quantify the relationship between the fourth moments (kurtosis) of the original and the noisy transformed signals using the second moments (variances) of the noise n and the noisy observation y. We emphasize that the only premise of the analysis in this section is an independent additive noise model, as given in (2), and there is no underlying assumption about the distributions of x and n. Note that higher moments can also be analyzed with the same method outlined in this section. However, higher moments are more sensitive to noise and outliers, and many more samples are needed in the estimation. Therefore, only kurtosis is used in this paper. Applying (6) to different frequency indexes for which the kurtosis can be assumed to be constant, we can estimate the noise variance by solving the following constrained nonlinear programming problem:

$$\left(\hat{K}(x), \hat{K}(n), \hat{\sigma}^2(n)\right) = \arg\min_{K(x),K(n),\sigma^2(n)} \sum_{i \in I} \left\| \hat{K}(y_i) - \left(\frac{\hat{\sigma}^2(y_i) - \sigma^2(n)}{\hat{\sigma}^2(y_i)}\right)^2 K(x) - \left(\frac{\sigma^2(n)}{\hat{\sigma}^2(y_i)}\right)^2 K(n) \right\|_1, \quad \text{s.t. } K(x), K(n) \ge -2 \tag{7}$$

where $y_i$ represent different MFRs of the noisy CD sub-band and I denotes the set of selected frequency indexes. Here, $\hat{\sigma}^2(y_i)$ and $\hat{K}(y_i)$ can be directly calculated from $y_i$; we obtain $y_i$ by convolving y with 64 × 64 diagonal DCT filters. The constraint for (7) comes from the fact that the excess kurtosis defined by (1) is never smaller than −2, since $\mu_4(x)/\sigma^4(x) \ge 1$. Note that K(n) = 0 for white Gaussian noise, so the optimization in (7) can be further simplified as

$$\left(\hat{K}(x), \hat{\sigma}^2(n)\right) = \arg\min_{K(x),\sigma^2(n)} \sum_{i \in I} \left\| \hat{K}(y_i) - \left(\frac{\hat{\sigma}^2(y_i) - \sigma^2(n)}{\hat{\sigma}^2(y_i)}\right)^2 K(x) \right\|_1, \quad \text{s.t. } K(x) \ge -2. \tag{8}$$

We perform $\ell_1$ minimization in (7) and (8) to reduce the influence of outliers, while Zoran and Weiss [18] used the $\ell_2$ norm in their algorithm. Table IV lists the estimation results for the Kodak [24] and van Hateren and van der Schaaf [25] databases. We can observe that the dual-transform method already enhances the estimation performance, especially at higher noise levels, but at lower levels, the textures and structures of the noise-free image make it difficult to discriminate noise from image contents, which results in overestimation. In Section III-B, we will introduce a novel noise injection method to address the misestimation problem.

TABLE VII
Estimation Results and the Accuracy of Considered Methods for van Hateren and Kodak Databases. The Best Result in a Row Is Selected With the Boldface

B. Noise Injection-Based Rectification

Natural images always contain abundant textures and structures that make their probability distributions very non-Gaussian in the transform domain. This property also influences the accuracy of noise estimation because it is sometimes difficult to distinguish image texture from noise, and prominent image structures may heavily annihilate the noise statistically. It is therefore reasonable to assume that there is misestimation in the preliminary step, since noise-free image contents may amplify or attenuate the noise effects. To tackle the possible misestimation, high-pass filtering and edge extraction are commonly employed to alleviate the effect of intrinsic image structures. In this paper, we propose a noise injection-based rectification method to improve the estimation performance. Assuming that there is a relationship between the extent of misestimation and the original image structures, we can expect to acquire a rectification value from the preliminary estimation result because misestimation is already inherent in it. In our analysis, a test noise variance $\sigma^2(n_{test})$ is used to generate a test noise matrix $n_{test}$ with the same size as the original noisy image, with $n_{test} \sim \mathcal{N}(0, \sigma^2(n_{test}))$. Then, the test noisy image is obtained by

$$y_{test} = y + n_{test} = x + n + n_{test} = x + n'. \tag{9}$$

Note that we generate the test noise and inject it into the original noisy image in the pixel domain, and then analyze the test noisy image in the wavelet domain. Theoretically, we should have

$$\sigma^2(n') = \sigma^2(n) + \sigma^2(n_{test}) \tag{10}$$

but in practice, the equality does not hold well due to the impact of noise-free image contents: $\sigma^2(n') = \sigma^2(n) + \sigma^2(n_{test}) + Q_{rec}$. The offset $Q_{rec}$ between the two sides of (10) can be attributed to the misestimation. Because noise estimation is blind, and in order to simulate the influence of image contents upon noise in the preliminary estimation, the most appropriate choice of $\sigma^2(n_{test})$ is the preliminary estimate $\hat{\sigma}^2(n)$. Then, we calculate the rectification value as

$$Q_{rec} = \hat{\sigma}^2(n') - \lambda\sigma^2(n_{test}) = \hat{\sigma}^2(n') - \lambda\hat{\sigma}^2(n) \tag{11}$$

where $\hat{\sigma}^2(n')$ is the estimated noise level of $n'$ following the same procedure as in Section III-A and λ is the coefficient that controls the rectification magnitude. Note that $Q_{rec}$ can be either positive or negative. However, the generated test noise matrix may contain negative entries that counteract the original noise and compromise the whole noise estimation algorithm, so we introduce two further constraints: on the one hand, the aforementioned processes of noise generation, injection, and $Q_{rec}$ determination should be implemented several times and the mean value $\bar{Q}_{rec}$ is computed as

Fig. 11. Estimation errors of test images and databases. Blue lines: results of the proposed method. Please refer to the electronic version for color figures.

TABLE VIII
Comparison of Elapsed Times (Second)

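a more reasonable rectification value; on the other hand, we limit the absolute value of $Q_{rec}$ to a maximum as

$$Q_{rec} = \begin{cases} -\gamma, & \bar{Q}_{rec} \le -\gamma \\ \bar{Q}_{rec}, & -\gamma < \bar{Q}_{rec} \le \gamma \\ \gamma, & \bar{Q}_{rec} > \gamma \end{cases} \tag{12}$$

where γ is a positive constant. The final estimation is given by

$$\hat{\sigma}^2_{rec}(n) = \hat{\sigma}^2(n) + Q_{rec}. \tag{13}$$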
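The two steps of this section can be sketched end to end on synthetic data. The code below is an illustration under several assumptions and is not the implementation used in the experiments: each "channel" is a Laplacian sample plus Gaussian noise standing in for one MFR $y_i$, the test noise is injected directly into the channels instead of the pixel domain, and the constrained nonlinear program (8) is replaced by a grid search over $\sigma^2(n)$ in which the optimal K(x) for each candidate is a weighted median. All function names are hypothetical.

```python
import random

def moments(samples):
    """Return (variance, excess kurtosis) of a list of samples."""
    n = len(samples)
    mean = sum(samples) / n
    m2 = sum((v - mean) ** 2 for v in samples) / n
    m4 = sum((v - mean) ** 4 for v in samples) / n
    return m2, m4 / (m2 * m2) - 3.0

def weighted_median(values, weights):
    """Minimizer k of sum_i weights[i] * |values[i] - k|."""
    half = sum(weights) / 2.0
    running = 0.0
    for value, weight in sorted(zip(values, weights)):
        running += weight
        if running >= half:
            return value

def preliminary_estimate(channels, steps=200):
    """Grid-search stand-in for (8): returns the estimated noise variance."""
    stats = [moments(c) for c in channels]
    max_var = min(var for var, _ in stats)  # sigma^2(n) cannot exceed any sigma^2(y_i)
    best_cost, best_var = float("inf"), 0.0
    for step in range(steps):
        noise_var = max_var * step / steps
        weights = [((var - noise_var) / var) ** 2 for var, _ in stats]
        ratios = [kurt / w for (_, kurt), w in zip(stats, weights)]
        # l1-optimal K(x) for this noise_var, clamped to the constraint K(x) >= -2.
        k_x = max(weighted_median(ratios, weights), -2.0)
        cost = sum(abs(kurt - w * k_x) for (_, kurt), w in zip(stats, weights))
        if cost < best_cost:
            best_cost, best_var = cost, noise_var
    return best_var

def rectify(channels, prelim_var, lam=1.5, gamma=0.5, trials=3, rng=None):
    """Noise-injection rectification following (11)-(13)."""
    rng = rng or random.Random(0)
    test_std = prelim_var ** 0.5  # test noise variance set to the preliminary estimate
    offsets = []
    for _ in range(trials):
        injected = [[v + rng.gauss(0.0, test_std) for v in c] for c in channels]
        offsets.append(preliminary_estimate(injected) - lam * prelim_var)  # (11)
    q_mean = sum(offsets) / trials
    q_rec = max(-gamma, min(gamma, q_mean))  # (12)
    return prelim_var + q_rec  # (13)

rng = random.Random(7)
true_noise_std = 5.0
channels = []
for signal_std in (2.0, 4.0, 8.0, 16.0, 32.0):  # hypothetical per-channel signal STDs
    b = signal_std / 2 ** 0.5  # Laplacian scale giving the desired STD
    channels.append([rng.expovariate(1.0 / b) - rng.expovariate(1.0 / b)
                     + rng.gauss(0.0, true_noise_std) for _ in range(30000)])

prelim_var = preliminary_estimate(channels)
final_var = rectify(channels, prelim_var, rng=rng)
print(f"preliminary STD = {prelim_var ** 0.5:.2f}, rectified variance = {final_var:.2f}")
```

The weighted median is used because, for a fixed candidate $\sigma^2(n)$, the $\ell_1$ objective in (8) is a weighted sum of absolute deviations in K(x), and its convexity makes clamping to K(x) ≥ −2 a valid way to enforce the constraint. A general-purpose constrained solver could be substituted for the grid search.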

To find appropriate values of λ and γ, we test (11) and (12) on more than 4000 natural images according to our noise estimation experiment in Section IV-A, and find that the value of λ typically varies in the range of 1.0–2.0 and γ in the range of 0.3–0.8. The experimental results are presented in Section IV.

IV. EXPERIMENTAL RESULTS

A. Noise Estimation Experiments

According to the central limit theorem and noise independence, many types of noise in the pixel domain will mix

into Gaussian noise in the transform domain. Hence, the assumed white Gaussian noise model is not very restrictive, and the proposed noise estimation algorithm is appropriate for most existing noise reduction algorithms designed for the AWGN model.

The proposed two-step estimation scheme is first tested on the images shown in Fig. 2. We compare our algorithm with [18], the widely used matching moment based algorithm [16], and the latest SVD based method [17]. For the SVD method, we choose the recommended parameters, which include 75% of the singular value tail and a test noise level of 50; further details can be found in [17]. In our case, λ and γ are set to 1.5 and 0.5, respectively. In addition, we estimate five times for each noise level and calculate the mean value to account for the randomness of the noise. To verify and compare the estimation methods thoroughly, we add white Gaussian noise to each image over the STD range of 1–50 and estimate the noise levels, respectively. The results are listed in Tables V and VI. The evaluation metrics MSE, MAD, and mean relative error rate $e_r$ are provided over all noise levels. The best results are emphasized in boldface.

From Tables V and VI, we can observe that the proposed method achieves the best performance in most cases. The methods in [17] and [18] always overestimate due to the influence of noise-free image textures, while matching moment is imprecise at higher noise levels since the approximations of $m_2$ and $m_4$ are calculated directly from the pixel values of the noisy image. Comparing the evaluation metrics, the proposed method not only consistently performs better


TABLE IX
Minimal PSNRs for van Hateren and Kodak Databases After BM3D Filtering Using Different Noise Estimators. Column R_N shows the PSNR before denoising. Column R shows the PSNR when the true noise STD is passed to the denoiser. R_T, R_L, R_Z, and R_O represent PSNRs using the estimators in [16]–[18] and this paper. The PSNR values nearest to R are selected as optimums among the denoising results using the four estimators, and are emphasized in boldface.

than the other three across all noise levels but also surpasses [16] and [17] by remarkably large margins.

To test the robustness of the algorithms, we further experiment on the Kodak image suite [24] (a collage of this image set is shown in Fig. 10) and the van Hateren natural image database [25]. Both contain images with different levels of texture. As before, the test images are contaminated by AWGN over the STD range of 1–50, and the results are given in Table VII. We can observe that the proposed algorithm surpasses the other three, as expected, and maintains good unbiasedness over the images in each database. Fig. 11 shows the estimation errors for the test images and the databases, respectively, across the noise levels, in which the superiority of the proposed method is evident.

We further compare the computational complexity in Table VIII. We test images with different resolutions and list the average running times of the four estimators. For smaller images, the SVD method in [17] is the fastest, but it has lower accuracy. In our framework, although a nonlinear programming problem is solved twice, the computational complexity is lower, especially for large test images, because the downsampled subband is used for estimation, while the estimation performance remains the best. Experiments are performed using MATLAB on a 2.93-GHz Intel Core2 PC.

B. Denoising Experiments

As an important prerequisite for denoising methods, the estimated noise variance can be fed into existing denoisers to suppress the noise. For comparison, we test the estimators of [16]–[18] and our proposed estimator within a denoising application using the block matching and 3-D filtering (BM3D) method [30], which represents the state of the art. In BM3D,

a Wiener filter is employed for collaborative filtering, which requires an accurate estimate of the noise STD σ. The peak signal-to-noise ratio (PSNR) of the restored image with respect to the original one is listed in Table IX. Because of space limitations, only the minimal PSNR for each test noise level over selected images from the van Hateren and Kodak databases is presented. Nevertheless, these values represent the behavior of the noise estimation in the worst case and, therefore, the applicability of the estimator. The PSNR is calculated by

\[
\mathrm{PSNR} = 10 \cdot \lg \frac{255^2}{\dfrac{1}{M \cdot N} \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} \left[ f(x, y) - \hat{f}(x, y) \right]^2} \tag{14}
\]

where f(x, y) and f̂(x, y) represent the original and restored images, respectively, each of size M × N. In Table IX, we calculate the PSNRs obtained when the true noise STDs are passed to the denoiser, and select the PSNR values nearest to these true-STD PSNRs as the optimums among the denoising results using the four estimators. From this table, we observe that, for the van Hateren database, the PSNRs of the denoising results using the proposed method achieve the optimums at the test noise levels; they are close to the results obtained with the true noise STD, although not exactly equal to them. The performance of [18] is comparable with ours but slightly inferior. For the Kodak database, although the estimator in [18] performs well at lower noise levels, our method gives competitive estimates at these lower levels and works even better at higher noise levels. In general, the denoising results show that higher noise estimation accuracy leads to higher denoising quality in most cases, so the results using our estimator thoroughly outperform those of the competitors in [16] and [17]. Fig. 12 shows the BM3D denoising results for additive noise, in which the noise levels of the images are pre-estimated using the four estimators. Partly enlarged images are presented for better visual effect. We can


Fig. 12. Denoising results on test images using BM3D. We partly enlarge the images and their restored results for better visual effects. Top row: test images in which the enlarged regions are marked by red rectangles. The second row to the bottom: enlarged regions, from left to right: original images, noisy images, denoising results using estimator in [16], denoising results using estimator in [17], denoising results using estimator in [18], and denoising results using our estimator.

observe that the results restored using the estimator in [16] still contain some noise because of its underestimation, while the results using [17] and [18] are over-restored because of their overestimation, especially in detailed areas. Compared with these estimators, our method achieves a good balance between noise suppression and texture preservation.

C. Experiments on Real Noise Scenarios

Because of the limitations of past capturing and transmission technologies, images were inevitably corrupted by noise introduced by optical devices or imaging media. Such noise is inherent in these original images, so it is meaningful to estimate the noise and restore the clean images.
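As a reference for the denoising comparison of Section IV-B, the PSNR of (14) and the selection rule used for Table IX (choosing the result nearest to the true-STD PSNR R) can be sketched in a few lines. This is only an illustrative sketch, not the authors' MATLAB code: the function names `psnr` and `nearest_to_reference`, the `peak` parameter, and the representation of an image as a list of pixel rows are assumptions made here for self-containment.

```python
import math


def psnr(original, restored, peak=255.0):
    """PSNR in dB between two equally sized grayscale images, following (14).

    Each image is a sequence of M rows with N pixel values per row
    (an assumed representation chosen for this sketch).
    """
    if len(original) != len(restored):
        raise ValueError("images must have the same number of rows")
    squared_error = 0.0
    count = 0
    for row_f, row_f_hat in zip(original, restored):
        if len(row_f) != len(row_f_hat):
            raise ValueError("rows must have the same length")
        for f, f_hat in zip(row_f, row_f_hat):
            squared_error += (f - f_hat) ** 2
            count += 1
    if count == 0:
        raise ValueError("images must not be empty")
    mse = squared_error / count  # (1 / (M*N)) * sum of squared differences
    if mse == 0.0:
        return math.inf  # identical images: the ratio in (14) is unbounded
    return 10.0 * math.log10(peak ** 2 / mse)


def nearest_to_reference(psnr_by_estimator, reference_psnr):
    """Return the estimator label whose PSNR is closest to the reference R."""
    return min(
        psnr_by_estimator,
        key=lambda name: abs(psnr_by_estimator[name] - reference_psnr),
    )


if __name__ == "__main__":
    clean = [[100, 100], [100, 100]]
    denoised = [[105, 95], [105, 95]]
    print(psnr(clean, denoised))
```

Note that "nearest to R" is measured by absolute difference, so a result slightly above R and one slightly below it are treated alike; this matches the wording of the Table IX caption but is otherwise an interpretation.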

We test some images of this kind and illustrate the results in Fig. 13. For the three original images in the leftmost column of this figure, the noise levels are first estimated using our method. The estimated noise STDs are 3.38, 1.54, and 1.68 from top to bottom. These noise parameters are then passed to the BM3D denoiser to yield the restored images. Details of the original, denoised, and absolute difference images are enlarged for easier comparison. In the denoised images, structures and textures are well preserved, while much of the noise is suppressed. In addition, the kurtosis-scale plots in the rightmost column further verify the scale invariance assumption: the kurtosis values across scales are more constant for the denoised images than for the original images, and are distributed along the estimated kurtosis K(x)


Fig. 13. Denoising results of real noise scenarios using BM3D combined with our estimator. From left column to right: original images, not artificially corrupted, in which noise is inherent (regions marked by red rectangles are enlarged for better visual effect); partly enlarged original images; partly enlarged denoising results; scaled absolute difference images; and kurtosis distributions for the original and restored images.

[on the left-hand side of (8)] represented by the dashed lines. (The estimated kurtosis values are 8.18, 20.70, and 18.84 from top to bottom.)

V. CONCLUSION

This paper presents a technique that uses statistical priors from natural image statistics for noise estimation. The kurtosis-scale invariance of natural images in the transform domain is discussed, and the influence of image textures on it is analyzed. Based on these properties, we propose a dual-transform estimation model that cascades the wavelet transform and the nondirectional DCT, in which the noise variance can be effectively estimated via constrained nonlinear programming. A novel noise-injection-based rectification procedure is then introduced to make the estimation more accurate. This two-step estimation scheme is tested with additive Gaussian noise on prevalent image databases and further validated in a denoising application. The experimental results show that the algorithm outperforms state-of-the-art noise estimation methods and is more favorable to image denoising.

REFERENCES

[1] R. C. Gonzalez and R. E. Woods, “Image restoration and reconstruction,” in Digital Image Processing, 3rd ed., M. McDonald, Ed. Upper Saddle River, NJ, USA: Pearson Education, 2008, pp. 313–319.
[2] K. Rank, M. Lendl, and R. Unbehauen, “Estimation of image noise variance,” IEE Proc.-Vis., Image, Signal Process., vol. 146, no. 2, pp. 80–84, Aug. 1999.

[3] J. Immerkær, “Fast noise variance estimation,” Comput. Vis. Image Understand., vol. 64, no. 2, pp. 300–302, 1996.
[4] S.-C. Tai and S.-M. Yang, “A fast method for image noise estimation using Laplacian operator and adaptive edge detection,” in Proc. 3rd Int. Symp. Commun., Control, Signal Process. (ISCCSP), Mar. 2008, pp. 1077–1081.
[5] B. R. Corner, R. M. Narayanan, and S. E. Reichenbach, “Noise estimation in remote sensing imagery using data masking,” Int. J. Remote Sens., vol. 24, no. 4, pp. 689–702, 2003.
[6] A. Amer and E. Dubois, “Fast and reliable structure-oriented video noise estimation,” IEEE Trans. Circuits Syst. Video Technol., vol. 15, no. 1, pp. 113–118, Jan. 2005.
[7] W. Förstner, “Image preprocessing for feature extraction in digital intensity, color and range images,” in Geomatic Method for the Analysis of Data in the Earth Sciences (Lecture Notes on Earth Sciences). Berlin, Germany: Springer-Verlag, 1998.
[8] J. Canny, “A computational approach to edge detection,” IEEE Trans. Pattern Anal. Mach. Intell., vol. PAMI-8, no. 6, pp. 679–698, Nov. 1986.
[9] H. Voorhees and T. Poggio, “Detecting textons and texture boundaries in natural images,” in Proc. 1st ICCV, 1987, pp. 250–258.
[10] G. A. Mastin, “Adaptive filters for digital image noise smoothing: An evaluation,” Comput. Vis., Graph., Image Process., vol. 31, no. 1, pp. 103–121, 1985.
[11] V. V. Lukin, S. K. Abramov, B. Vozel, M. Uss, and K. Chehdi, “Performance analysis of segmentation-based method for blind evaluation of additive noise in images,” in Proc. Int. Kharkov Symp. Phys. Eng. Microw., Millimeter, Submillimeter Waves (MSMW), Jun. 2010, pp. 1–3.
[12] D. L. Donoho and I. M. Johnstone, “Adapting to unknown smoothness via wavelet shrinkage,” J. Amer. Statist. Assoc., vol. 90, no. 432, pp. 1200–1224, 1995.
[13] A. De Stefano, P. R. White, and W. B. Collis, “Training methods for image noise level estimation on wavelet components,” EURASIP J. Appl. Signal Process., vol. 2004, no. 16, pp. 2400–2407, Jan. 2004.


[14] V. Zlokolica, A. Pižurica, and W. Philips, “Noise estimation for video processing based on spatio-temporal gradients,” IEEE Signal Process. Lett., vol. 13, no. 6, pp. 337–340, Jun. 2006.
[15] C. Liu, R. Szeliski, S. B. Kang, C. L. Zitnick, and W. T. Freeman, “Automatic estimation and removal of noise from a single image,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 30, no. 2, pp. 299–314, Feb. 2008.
[16] H. L. Van Trees, “Detection of signals—Estimation of signal parameters,” in Detection, Estimation, and Modulation Theory. New York, NY, USA: Wiley, 2001, pp. 246–286.
[17] W. Liu and W. Lin, “Gaussian noise level estimation in SVD domain for images,” in Proc. IEEE Int. Conf. Multimedia Expo (ICME), Jul. 2012, pp. 830–835.
[18] D. Zoran and Y. Weiss, “Scale invariance and noise in natural images,” in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Sep./Oct. 2009, pp. 2209–2216.
[19] C. Tang, X. Yang, and G. Zhai, “Dual-transform based noise estimation,” in Proc. IEEE Int. Conf. Multimedia Expo (ICME), Jul. 2012, pp. 991–996.
[20] C. Tang, X. Yang, and G. Zhai, “Robust noise estimation based on noise injection,” in Proc. 13th Adv. Multimedia Inf. Process.-Pacific-Rim Conf. Multimedia (PCM), vol. 7674. 2012, pp. 142–152.
[21] C. Tang, X. Yang, and G. Zhai, “Robust noise estimation based on noise injection,” J. Signal Process. Syst., vol. 74, no. 1, pp. 69–78, 2014.
[22] E. Y. Lam and J. W. Goodman, “A mathematical analysis of the DCT coefficient distributions for images,” IEEE Trans. Image Process., vol. 9, no. 10, pp. 1661–1666, Oct. 2000.
[23] M. Bethge, “Factorial coding of natural images: How effective are linear models in removing higher-order dependencies?” J. Opt. Soc. Amer. A, vol. 23, no. 6, pp. 1253–1268, 2006.
[24] Kodak Lossless True Color Image Suite. [Online]. Available: http://r0k.us/graphics/kodak
[25] J. H. van Hateren and A. van der Schaaf, “Independent component filters of natural images compared with simple cells in primary visual cortex,” Proc., Biological Sci., vol. 265, no. 1394, pp. 359–366, Mar. 1998.
[26] S. G. Mallat, “A theory for multiresolution signal decomposition: The wavelet representation,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 11, no. 7, pp. 674–693, Jul. 1989.
[27] M. N. Do and M. Vetterli, “Wavelet-based texture retrieval using generalized Gaussian density and Kullback–Leibler distance,” IEEE Trans. Image Process., vol. 11, no. 2, pp. 146–158, Feb. 2002.
[28] R. L. Joshi and T. R. Fischer, “Comparison of generalized Gaussian and Laplacian modeling in DCT image coding,” IEEE Signal Process. Lett., vol. 2, no. 5, pp. 81–82, May 1995.
[29] I. Daubechies, “Orthonormal bases of wavelets and multiresolution analysis,” in Ten Lectures on Wavelets. Philadelphia, PA, USA: SIAM, 1992, p. 129.
[30] K. Dabov, A. Foi, V. Katkovnik, and K. Egiazarian, “Image denoising by sparse 3-D transform-domain collaborative filtering,” IEEE Trans. Image Process., vol. 16, no. 8, pp. 2080–2095, Aug. 2007.

Chongwu Tang (S’12) received the B.E. degree from Xi’an Petroleum University, Xi’an, China, in 2007 and the M.E. degree from Northwestern Polytechnical University, Xi’an, in 2010. He is currently working toward the Ph.D. degree with Institute of Image Communication and Information Processing, Shanghai Jiao Tong University, Shanghai, China. His research interests include image and video processing, image modeling, and noise analysis.

Xiaokang Yang (S’12–SM’04) received the B.S. degree from Xiamen University, Xiamen, China, in 1994; the M.S. degree from Chinese Academy of Sciences, Shanghai, China, in 1997; and the Ph.D. degree from Shanghai Jiao Tong University, Shanghai, China, in 2000. He was a Research Fellow with the Center for Signal Processing, Nanyang Technological University, Singapore, from 2000 to 2002, and a Research Scientist with Institute for Infocomm Research, Singapore, from 2002 to 2004. He is currently a Full Professor and the Deputy Director of the Department of Electronic Engineering, Institute of Image Communication and Information Processing, Shanghai Jiao Tong University. He participates in international standards, such as MPEG-4, JVT, and MPEG-21. He has authored over 80 refereed papers and holds six patents. His research interests include video processing and communication, media analysis and retrieval, perceptual visual processing, and pattern recognition. Dr. Yang is a member of the Visual Signal Processing and Communications Technical Committee of the IEEE Circuits and Systems Society. He received the Microsoft Young Professorship Award in 2006, the Best Young Investigator Paper Award at the IS&T/SPIE International Conference on Video Communication and Image Processing in 2003, and several awards from the Agency for Science, Technology and Research and the Tan Kah Kee Foundation. He was the Special Session Chair of Perceptual Visual Processing of the IEEE International Conference on Multimedia and Expo in 2006. He was the Local Co-Chair of ChinaCom in 2007 and the Technical Program Co-Chair of the IEEE Workshop on Signal Processing Systems in 2007.

Guangtao Zhai (M’10) received the B.E. and M.E. degrees from Shandong University, Shandong, China, in 2001 and 2004, respectively, and the Ph.D. degree from Shanghai Jiao Tong University, Shanghai, China, in 2009. He was a Student Intern with Institute for Infocomm Research, Singapore, from 2006 to 2007; a Visiting Student with the School of Computer Engineering, Nanyang Technological University, Singapore, from 2007 to 2008; and a Visiting Scholar with the Department of Electrical and Computer Engineering, McMaster University, Hamilton, ON, Canada, from 2008 to 2009, where he was a Post-Doctoral Fellow from 2010 to 2012. In 2011, he was a Visiting Researcher with the Department of Photonics Engineering, Technical University of Denmark, Copenhagen, Denmark. Since 2012, he has been a Research Professor with Institute of Image Communication and Information Processing, Shanghai Jiao Tong University. His research interests include image and video processing, perceptual signal processing, and pattern recognition. Dr. Zhai received the Humboldt Research Fellowship in 2011.