Learning Pixel-Distribution Prior with Wider Convolution for Image Denoising

Peng Liu
University of Florida
[email protected]

Ruogu Fang
University of Florida
[email protected]

Abstract

In this work, we explore an innovative strategy for image denoising by using convolutional neural networks (CNNs) to learn the pixel distribution of noisy data. By increasing a CNN's width, with larger receptive fields and more channels in each layer, CNNs can reveal an ability to learn the pixel distribution, a prior existing across many different types of noise. The key to our approach is the discovery that wider CNNs tend to learn pixel-distribution features, which suggests that the inference mapping relies primarily on such priors rather than on deeper CNNs with more stacked non-linear layers. We evaluate our Wide Inference Networks (WIN) on additive white Gaussian noise (AWGN) and demonstrate that, by learning the pixel distribution in images, WIN-based networks consistently achieve significantly better performance than current state-of-the-art deep CNN-based methods in both quantitative and visual evaluations. Code and models are available at https://github.com/cswin/WIN.

1 Prior: Pixel-Distribution Features

In low-level vision problems, pixel-level features are the most important features. We compare the histograms of different images at various noise levels to investigate whether pixel-level features have a certain consistency. As can be seen from Fig. 1 and Fig. 2, the pixel distributions of noisy images are more similar to each other at the higher noise level σ = 50 than at the lower noise level σ = 10. WIN infers noise-free images based on the learned pixel-distribution features. The higher the noise level, the more similar the pixel-distribution features become, so WIN can learn more pixel-distribution features from images corrupted by higher-level noise. This is why WIN performs even better at higher noise levels, which is verified in Section 3.
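To make this prior concrete, the following minimal NumPy sketch (our own illustration, not code from the paper; the synthetic images and the L1 histogram distance are assumptions chosen for clarity) checks that the histograms of two very different images grow more alike as the AWGN level rises:

```python
import numpy as np

def noisy_histogram(image, sigma, bins=64, rng=None):
    """Corrupt `image` with AWGN of std `sigma`; return its normalized histogram."""
    rng = np.random.default_rng(0) if rng is None else rng
    noisy = np.clip(image + rng.normal(0.0, sigma, image.shape), 0, 255)
    hist, _ = np.histogram(noisy, bins=bins, range=(0, 255), density=True)
    return hist

def l1_distance(h1, h2):
    """L1 distance between histograms; smaller means more similar distributions."""
    return float(np.abs(h1 - h2).sum())

# Two synthetic "images" with very different clean pixel distributions.
x = np.linspace(0, 255, 256)
img1 = np.tile(x, (256, 1))  # smooth gradient: nearly flat histogram
img2 = np.where(np.arange(256)[:, None] < 128, 40.0, 200.0) * np.ones((1, 256))  # bimodal

for sigma in (10, 50):
    d = l1_distance(noisy_histogram(img1, sigma), noisy_histogram(img2, sigma))
    print(f"sigma={sigma:2d}: histogram L1 distance = {d:.4f}")
# The distance should shrink as sigma grows: heavier AWGN pulls both
# histograms toward broad Gaussian-like shapes, i.e. a shared prior.
```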

[Figure 1 panels: histograms of pixel value vs. number of pixels for (a) Ground-truth-I, (b) Noisy-I (σ = 10), (c) Ground-truth-II, (d) Noisy-II (σ = 10).]

Figure 1: Comparison of the pixel-value histograms of two different images corrupted by additive white Gaussian noise (AWGN) at the same noise level σ = 10.

[Figure 2 panels: histograms of pixel value vs. number of pixels for (a) Ground-truth-I, (b) Noisy-I (σ = 50), (c) Ground-truth-II, (d) Noisy-II (σ = 50).]

Figure 2: Comparison of the pixel-value histograms of two different images corrupted by additive white Gaussian noise (AWGN) at the same noise level σ = 50.

2 Wider Convolution Inference Strategy

In Fig. 3, we illustrate the architectures of WIN5, WIN5-R, and WIN5-RB.

2.1 Architectures

The three proposed models share an identical basic structure: L = 5 layers with K = 128 filters of size F = 7 × 7 in most convolution layers, except for the last layer, which has K = 1 filter of size F = 7 × 7. The differences among them are whether batch normalization (BN) and an input-to-output skip connection are included. WIN5-RB has two types of layers, shown in two different colors: (a) Conv+BN+ReLU [19]: for layers 1 to L − 1, BN is added between Conv and ReLU [19]; (b) Conv+BN: in the last layer, a single filter of size F = 7 × 7 is used to reconstruct the residual R(y) ≈ −n. In addition, a shortcut connecting the input (data layer) with the output (last layer) is added to merge the input data with R(y) to form the final recovered image.
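As one concrete reading of this description, here is a minimal PyTorch sketch of the WIN5-RB variant; the "same" padding scheme and the toy input are our assumptions, and the released code at https://github.com/cswin/WIN may differ in such details:

```python
import torch
import torch.nn as nn

class WIN5RB(nn.Module):
    """Sketch of WIN5-RB as described in the text: L = 5 layers, K = 128
    filters of size 7x7, BN in every layer, and an input-to-output skip
    connection. Padding and initialization are assumptions, not from the paper."""

    def __init__(self, channels=1, width=128, kernel=7, depth=5):
        super().__init__()
        pad = kernel // 2  # assumed "same" padding to preserve spatial size
        layers, in_ch = [], channels
        for _ in range(depth - 1):  # layers 1 .. L-1: Conv + BN + ReLU
            layers += [nn.Conv2d(in_ch, width, kernel, padding=pad),
                       nn.BatchNorm2d(width),
                       nn.ReLU(inplace=True)]
            in_ch = width
        # last layer: Conv + BN with a single filter, reconstructing R(y) ~ -n
        layers += [nn.Conv2d(in_ch, channels, kernel, padding=pad),
                   nn.BatchNorm2d(channels)]
        self.body = nn.Sequential(*layers)

    def forward(self, y):
        # the skip connection merges the noisy input with the residual estimate
        return y + self.body(y)

x = torch.randn(1, 1, 64, 64)  # a toy noisy grayscale patch
print(WIN5RB()(x).shape)       # torch.Size([1, 1, 64, 64])
```

WIN5 and WIN5-R would follow the same skeleton, dropping the BatchNorm2d modules and, for WIN5, the skip connection as well.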

2.2 Having a Knowledge Base with Batch Normalization

In this work, we employ batch normalization (BN) to extract pixel-distribution statistical features and to preserve the training data's means and variances within the network for denoising inference, rather than for its usual regularizing effect of improving the generalization of a learned model.

BN, as a regularizer, keeps the data distribution the same as the input's: a Gaussian distribution. This distribution consistency between input and regularizer ensures that more pixel-distribution statistical features can be extracted accurately. Integrating BN [9] into more filters further preserves the prior information of the training set. Indeed, a number of state-of-the-art studies [5, 11, 24] have adopted image priors (e.g., distribution statistics) to achieve impressive performance.

[Figure 3 diagram: three columns, each stacking INPUT, four Conv(+BN)+ReLU layers, a final Conv(+BN) layer, and OUTPUT; WIN5 and WIN5-R use Conv+ReLU layers, WIN5-RB uses Conv+BN+ReLU, and the R variants add an input-to-output skip connection.]

Figure 3: Architectures of (a) WIN5, (b) WIN5-R, (c) WIN5-RB.


Can batch normalization work without a skip connection? In WINs, BN [9] cannot work without the input-to-output skip connection and always overfits. In WIN5-RB's training, BN keeps the distribution of the input data consistent, and the skip connection not only introduces residual learning but also guides the network to extract the features shared in common: the pixel distribution. Without the input data as a reference, BN can have negative effects by forcing every input distribution to be the same, especially when the task is to output a pixel-level feature map. In DnCNN, BN is omitted from the first and last layers, which reduces these negative effects of BN to a certain degree. Meanwhile, DnCNN also emphasizes that a network's generalization ability largely relies on its depth. In Fig. 4, the learned priors (means and variances) are preserved in WINs as a knowledge base for denoising inference. When WIN has more channels to preserve more data means and variances, various combinations of these feature maps can cooperate with residual learning to infer noise-free images more accurately.
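The "knowledge base" role of BN can be illustrated with a small, self-contained PyTorch experiment (ours, not the paper's): the running mean and variance that a BN layer accumulates during training persist as fixed priors, and in evaluation mode these stored values, not the current input's statistics, normalize the data:

```python
import torch
import torch.nn as nn

bn = nn.BatchNorm2d(num_features=4)

bn.train()
for _ in range(100):                        # simulate training batches
    bn(torch.randn(8, 4, 32, 32) * 3 + 5)   # data with mean ~5, std ~3

print(bn.running_mean)  # ~5 per channel: a stored prior from training data
print(bn.running_var)   # ~9 per channel

bn.eval()                             # inference: normalize with stored priors,
out = bn(torch.zeros(1, 4, 32, 32))   # regardless of the current input
print(out.mean().item())              # ~ -5/3, driven by the remembered mean
```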

[Figure 4 diagram: feature maps flowing from INPUT through Layer-1, Layer-2, Layer-3, Layer-4, and Layer-5 to OUTPUT.]

Figure 4: The process of denoising inference via sparse distribution-statistics features. Learned priors (means and variances) are preserved in WINs as a knowledge base for denoising inference. When WIN has more channels to preserve more data means and variances, various combinations of these feature maps can cooperate with residual learning to infer noise-free images more accurately.

3 Experimental Results

Table 1: Average PSNR (dB) / SSIM and run time (seconds) of different methods on BSD200-test [18] (200 images). Note: WIN5-RB-B (blind denoising) is trained on a larger number of patches since data augmentation is adopted. This is why WIN5-RB-B (trained on σ = [0, 70]) can outperform WIN5-RB-S (trained separately on single σ = 10, 30, 50, 70) in some cases.

PSNR (dB) / SSIM
σ  | BM3D [3]     | RED-Net [16] | DnCNN [26]   | WIN5         | WIN5-R       | WIN5-RB-S    | WIN5-RB-B
10 | 34.02/0.9182 | 32.96/0.8963 | 34.60/0.9283 | 34.10/0.9205 | 34.43/0.9243 | 35.83/0.9494 | 35.43/0.9461
30 | 28.57/0.7823 | 29.05/0.8049 | 29.13/0.8060 | 28.93/0.7987 | 30.94/0.8644 | 33.62/0.9193 | 33.27/0.9263
50 | 26.44/0.7028 | 26.88/0.7230 | 26.99/0.7289 | 28.57/0.7979 | 29.38/0.8251 | 31.79/0.8831 | 32.18/0.9136
70 | 25.23/0.6522 | 26.66/0.7108 | 25.65/0.6709 | 27.98/0.7875 | 28.16/0.7784 | 30.34/0.8362 | 31.07/0.8962

Run Time (s)
30 | 1.67         | 69.25        | 13.61        | 15.78        | 20.39        | 15.82        | 15.36
50 | 2.87         | 70.34        | 13.76        | 22.72        | 21.79        | 13.79        | 16.70
70 | 2.93         | 69.99        | 12.88        | 19.28        | 20.86        | 13.17        | 16.10

3.1 Quantitative Results

The quantitative results on the BSD200 test set are shown in Table 1, covering noise levels σ = 10, 30, 50, 70. Moreover, we compare the behavior of WIN5-RB-B, DnCNN, and BM3D in terms of average PSNR on BSD200-test at different noise levels. As can be seen from Fig. 5, WIN5-RB-B (blind denoising), trained for σ = [0, 70], outperforms BM3D [3] and DnCNN [26] at all noise levels and is significantly more stable even at higher noise levels. In addition, as the noise level increases, the performance gain of WIN5-RB-B grows, while the gain of DnCNN over BM3D remains nearly constant. Compared with WINs, DnCNN is composed of even more layers embedded with BN. This observation indicates that the performance gain achieved by WIN5-RB does not come primarily from BN's regularization effect, but from the learned pixel-distribution features and the relevant priors, such as means and variances, preserved in WINs. Both larger kernels and more channels make CNNs more likely to learn pixel-distribution features.
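For reference, the PSNR values reported in Table 1 and the figure captions follow the standard definition sketched below (our own helper, not code from the paper; SSIM additionally requires a structural comparison, typically taken from a library such as scikit-image):

```python
import numpy as np

def psnr(clean, denoised, peak=255.0):
    """PSNR in dB between a ground-truth image and an estimate,
    for float arrays in [0, peak]."""
    mse = np.mean((clean.astype(np.float64) - denoised.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(peak ** 2 / mse)

# Toy check: AWGN with sigma = 50 on a mid-gray image gives PSNR near
# 10*log10(255^2 / 50^2) ~ 14.15 dB, in line with the noisy-input PSNRs
# quoted in the sigma = 50 figure captions below.
rng = np.random.default_rng(0)
clean = np.full((256, 256), 128.0)
noisy = np.clip(clean + rng.normal(0.0, 50.0, clean.shape), 0, 255)
print(f"{psnr(clean, noisy):.2f} dB")
```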

[Figure 5 plot: average PSNR (dB) vs. noise level σ from 10 to 70 for WIN5-RB-B, DnCNN, and BM3D.]

Figure 5: Behavior at different noise levels, measured by average PSNR on BSD200-test. WIN5-RB-B (blind denoising) is trained for σ = [0, 70]; it outperforms BM3D [3] and DnCNN [26] at all noise levels and is significantly more stable even at higher noise levels.

3.2 Visual Results

For visual results, we show images from two datasets, BSD200-test and Set12, with noise levels σ = 10, 30, 50, 70 applied separately. Fig. 6 shows one image from BSD200-test with noise level σ = 10.


[Figure 6 panels with PSNR (dB) / SSIM: (a) Ground-truth; (b) Noise=10 / 28.13dB / 0.7012; (c) BM3D / 33.42dB / 0.9031; (d) RED-Net / 32.49dB / 0.8951; (e) DnCNN / 34.31dB / 0.9186; (f) WIN5 / 33.82dB / 0.9110; (g) WIN5-R / 34.14dB / 0.9142; (h) WIN5-RB / 36.01dB / 0.9589; (i) WIN5-RB-B / 35.23dB / 0.9542.]

Figure 6: Visual results for one image from BSD200-test with noise level σ = 10, along with PSNR (dB) / SSIM. As can be seen, our proposed methods yield more natural and accurate texture details as well as visually pleasing results.

Fig. 7 shows one image from BSD200-test with noise level σ = 30.


[Figure 7 panels with PSNR (dB) / SSIM: (a) Ground-truth; (b) Noise=30 / 18.78dB / 0.2496; (c) BM3D / 30.11dB / 0.8481; (d) RED-Net / 30.43dB / 0.8597; (e) DnCNN / 30.70dB / 0.8661; (f) WIN5 / 30.34dB / 0.8556; (g) WIN5-R / 31.66dB / 0.8780; (h) WIN5-RB / 33.65dB / 0.9010.]

Figure 7: Visual results for one image from BSD200-test with noise level σ = 30, along with PSNR (dB) / SSIM. As can be seen, our proposed methods yield more natural and accurate texture details as well as visually pleasing results.

Fig. 8 shows one image from BSD200-test with noise level σ = 50.


[Figure 8 panels with PSNR (dB) / SSIM: (a) Ground-truth; (b) Noise=50 / 14.78dB / 0.2652; (c) BM3D / 23.10dB / 0.5163; (d) RED-Net / 23.48dB / 0.5610; (e) DnCNN / 23.70dB / 0.5872; (f) WIN5 / 23.88dB / 0.5858; (g) WIN5-R / 24.70dB / 0.6640; (h) WIN5-RB / 26.95dB / 0.8254.]

Figure 8: Visual results for one image from BSD200-test with noise level σ = 50, along with PSNR (dB) / SSIM. As can be seen, our proposed methods yield more natural and accurate texture details as well as visually pleasing results.

Fig. 9 shows one image from BSD200-test with noise level σ = 70.


[Figure 9 panels with PSNR (dB) / SSIM: (a) Ground-truth; (b) Noise=70 / 12.35dB / 0.1509; (c) BM3D / 27.91dB / 0.8172; (d) RED-Net / 29.93dB / 0.8534; (e) DnCNN / 28.38dB / 0.8287; (f) WIN5 / 31.09dB / 0.8865; (g) WIN5-R / 32.17dB / 0.8912; (h) WIN5-RB / 33.82dB / 0.8459; (i) WIN5-RB-B / 34.55dB / 0.9067.]

Figure 9: Visual results for one image from BSD200-test with noise level σ = 70, along with PSNR (dB) / SSIM. As can be seen, our proposed methods yield more natural and accurate texture details as well as visually pleasing results.

Fig. 10 shows one image from Set12 with noise level σ = 10.


[Figure 10 panels with PSNR (dB) / SSIM: (a) Ground-truth; (b) Noise=10 / 28.13dB / 0.7022; (c) BM3D / 34.18dB / 0.9199; (d) RED-Net / 32.95dB / 0.8932; (e) DnCNN / 34.67dB / 0.9262; (f) WIN5 / 34.12dB / 0.9188; (g) WIN5-R / 34.53dB / 0.9235; (h) WIN5-RB / 37.00dB / 0.9553; (i) WIN5-RB-B / 36.32dB / 0.9535.]

Figure 10: Visual results for one image from Set12 with noise level σ = 10, along with PSNR (dB) / SSIM. As can be seen, our proposed methods yield more natural and accurate texture details as well as visually pleasing results.

Fig. 11 shows one image from Set12 with noise level σ = 30.


[Figure 11 panels with PSNR (dB) / SSIM: (a) Ground-truth; (b) Noise=30 / 18.71dB / 0.3263; (c) BM3D / 28.74dB / 0.8095; (d) RED-Net / 28.99dB / 0.8180; (e) DnCNN / 29.13dB / 0.8219; (f) WIN5 / 28.92dB / 0.8143; (g) WIN5-R / 31.50dB / 0.8919; (h) WIN5-RB / 35.65dB / 0.9518; (i) WIN5-RB-B / 34.78dB / 0.9512.]

Figure 11: Visual results for one image from Set12 with noise level σ = 30, along with PSNR (dB) / SSIM. As can be seen, our proposed methods yield more natural and accurate texture details as well as visually pleasing results.

Fig. 12 shows one image from Set12 with noise level σ = 50.


[Figure 12 panels with PSNR (dB) / SSIM: (a) Ground-truth; (b) Noise=50 / 14.59dB / 0.1797; (c) BM3D / 26.23dB / 0.7164; (d) RED-Net / 26.77dB / 0.7379; (e) DnCNN / 26.83dB / 0.7393; (f) WIN5 / 27.99dB / 0.7796; (g) WIN5-R / 30.00dB / 0.8573; (h) WIN5-RB / 33.06dB / 0.9089; (i) WIN5-RB-B / 32.96dB / 0.9285.]

Figure 12: Visual results for one image from Set12 with noise level σ = 50, along with PSNR (dB) / SSIM. As can be seen, our proposed methods yield more natural and accurate texture details as well as visually pleasing results.

References

[1] D. F. Andrews and C. L. Mallows. Scale mixtures of normal distributions. Journal of the Royal Statistical Society. Series B (Methodological), pages 99–102, 1974.
[2] D. Ciregan, U. Meier, and J. Schmidhuber. Multi-column deep neural networks for image classification. In Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on, pages 3642–3649. IEEE, 2012.
[3] K. Dabov, A. Foi, V. Katkovnik, and K. Egiazarian. BM3D image denoising with shape-adaptive principal component analysis. In SPARS'09 - Signal Processing with Adaptive Sparse Structured Representations, 2009.
[4] C. Dong, Y. Deng, C. Change Loy, and X. Tang. Compression artifacts reduction by a deep convolutional network. In Proceedings of the IEEE International Conference on Computer Vision, pages 576–584, 2015.
[5] M. Elad and M. Aharon. Image denoising via sparse and redundant representations over learned dictionaries. IEEE Transactions on Image Processing, 15(12):3736–3745, 2006.
[6] C. Farabet, C. Couprie, L. Najman, and Y. LeCun. Learning hierarchical features for scene labeling. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(8):1915–1929, 2013.
[7] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778, 2016.
[8] J. J. Hopfield. Neural networks and physical systems with emergent collective computational abilities. Proceedings of the National Academy of Sciences, 79(8):2554–2558, 1982.
[9] S. Ioffe and C. Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167, 2015.
[10] V. Jain and S. Seung. Natural image denoising with convolutional networks. In Advances in Neural Information Processing Systems, pages 769–776, 2009.
[11] N. Joshi, C. L. Zitnick, R. Szeliski, and D. J. Kriegman. Image deblurring and denoising using color priors. In Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on, pages 1550–1557. IEEE, 2009.
[12] D. Kiku, Y. Monno, M. Tanaka, and M. Okutomi. Residual interpolation for color image demosaicking. In 2013 IEEE International Conference on Image Processing, pages 2304–2308. IEEE, 2013.


[13] J. Kim, J. K. Lee, and K. M. Lee. Accurate image super-resolution using very deep convolutional networks. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR Oral), June 2016.
[14] A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, pages 1097–1105, 2012.
[15] S. Z. Li. Markov random field modeling in image analysis. Springer Science & Business Media, 2009.
[16] X.-J. Mao, C. Shen, and Y.-B. Yang. Image restoration using convolutional auto-encoders with symmetric skip connections. arXiv preprint arXiv:1606.08921, 2016.
[17] D. Martin, C. Fowlkes, D. Tal, and J. Malik. A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In Computer Vision, 2001. ICCV 2001. Proceedings. Eighth IEEE International Conference on, volume 2, pages 416–423. IEEE, 2001.
[18] D. Martin, C. Fowlkes, D. Tal, and J. Malik. A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In Proc. 8th Int'l Conf. Computer Vision, volume 2, pages 416–423, July 2001.
[19] V. Nair and G. E. Hinton. Rectified linear units improve restricted Boltzmann machines. In Proceedings of the 27th International Conference on Machine Learning (ICML-10), pages 807–814, 2010.
[20] Y. A. Rozanov. Markov random fields. In Markov Random Fields, pages 55–102. Springer, 1982.
[21] K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
[22] J. T. Springenberg, A. Dosovitskiy, T. Brox, and M. Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014.
[23] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1–9, 2015.
[24] J. Xu, L. Zhang, W. Zuo, D. Zhang, and X. Feng. Patch group based nonlocal self-similarity prior learning for image denoising. In Proceedings of the IEEE International Conference on Computer Vision, pages 244–252, 2015.
[25] M. D. Zeiler and R. Fergus. Visualizing and understanding convolutional networks. In European Conference on Computer Vision, pages 818–833. Springer, 2014.
[26] K. Zhang, W. Zuo, Y. Chen, D. Meng, and L. Zhang. Beyond a Gaussian denoiser: Residual learning of deep CNN for image denoising. arXiv preprint arXiv:1608.03981, 2016.
