Multi-Frame Example-Based Super-Resolution Using Locally Directional Self-Similarity

Seokhwa Jeong, Student Member, IEEE, Inhye Yoon, Student Member, IEEE, and Joonki Paik, Senior Member, IEEE

Abstract — This paper presents a multi-frame super-resolution approach to reconstruct a high-resolution image from several low-resolution video frames. The proposed algorithm consists of three steps: i) definition of a local search region for the optimal patch using motion vectors, ii) adaptive selection of the optimum patch based on the low-resolution image degradation model, and iii) combination of the optimum patch and the reconstructed image. As a result, the proposed algorithm can remove interpolation artifacts using directionally adaptive patch selection based on the low-resolution image degradation model. Moreover, super-resolved images without distortion between consecutive frames can be generated. The proposed method provides significantly improved super-resolution performance over existing methods in the sense of both subjective and objective measures, including the peak signal-to-noise ratio (PSNR), structural similarity measure (SSIM), and naturalness image quality evaluator (NIQE). The proposed multi-frame super-resolution algorithm is designed for real-time video processing hardware by reducing the search region for optimal patches, and is suitable for consumer imaging devices, including ultra-high-definition (UHD) digital televisions, surveillance systems, and medical imaging systems, for image restoration and enhancement1.

Index Terms — Super-resolution, patch-based restoration, image enhancement, UHD TV.


I. INTRODUCTION

As the use of high-resolution display devices increases, the demand for higher-quality video content is growing fast for ultra-high-definition televisions (UHD TVs) and high-resolution mobile devices. Although various super-resolution (SR) methods have been proposed in the literature, most of them

1 This work was supported in part by the Technology Innovation Program (Development of Super Resolution Image Scaler for 4K UHD) under Grant K10041900, in part by the ICT R&D program of MSIP/IITP [14-824-09002, Development of global multi-target tracking and event prediction techniques based on real-time large-scale video analysis], and in part by the MSIP (Ministry of Science, ICT & Future Planning), Korea, under the ITRC (Information Technology Research Center) support program (NIPA-2014-H0301-14-1044) supervised by the NIPA (National ICT Industry Promotion Agency).
Seokhwa Jeong is with the Department of Image, Chung-Ang University, 156-756, Seoul, Korea (e-mail: [email protected]).
Inhye Yoon is with the Department of Image, Chung-Ang University, 156-756, Seoul, Korea (e-mail: [email protected]).
Joonki Paik is with the Department of Image, Chung-Ang University, 156-756, Seoul, Korea (e-mail: [email protected]).

are not suitable for consumer products because they require expensive hardware and have indefinite processing times.

There are various image upscaling methods that enhance resolution using digital image processing algorithms. Image interpolation is a simple way to enlarge a low-resolution image, and commonly used interpolation methods include nearest-neighbor, bi-linear, and bi-cubic interpolation [1]. Although interpolation-based methods can be easily implemented and provide fast processing, they cannot reconstruct the original high-frequency components, and they exhibit undesired artifacts such as blurring and jagging.

In order to solve these problems in upscaling images, Freeman et al. proposed an example-based SR algorithm that uses patches from training images, selected by a learned nearest-neighbor search [2]. Glasner et al. applied the concept of self-similarity to the SR process by selecting similar patches in the single input image [3]. Freedman et al. extended the example-based SR method using self-examples and iterative image upscaling [4]. However, these example-based methods cannot guarantee that the super-resolved result has the optimally reconstructed high-frequency component of the target high-resolution (HR) image.

A multi-frame approach was proposed to improve the quality of single-frame SR algorithms by utilizing an increased amount of patch information. Farsiu et al. proposed a fast and robust SR method using maximum a posteriori (MAP) estimation [5]. Bai et al. proposed a patch detection method using structural similarity with brightness values, without a patch registration process [7]. However, this method is difficult to implement in real time because of its computationally expensive search process.

In order to solve the problems of existing SR methods, this paper presents a multi-frame example-based patch selection algorithm combined with self-similarity based on the low-resolution (LR) image degradation model.
The proposed SR algorithm can significantly increase the visual quality of LR images by preserving high-frequency image details without various interpolation or SR artifacts. For that reason, the proposed method can be applied to high-definition display devices such as UHD TVs and high-end consumer mobile imaging devices.

This paper is organized as follows. Section II describes the LR image acquisition model with the theoretical background, and Section III presents the proposed SR algorithm that combines multi-frame patch selection and image restoration. Experimental results are given in Section IV, and Section V concludes the paper.

II. THEORETICAL BACKGROUND

An LR image is acquired through various degradation factors such as lens distortion, the limited resolution of the image sensor, lossy compression, and down-scaling in digital imaging devices. This problem can be addressed by restoring the HR image using an SR algorithm. In other words, SR is a process of image restoration in which one or more LR images are used to generate an HR image. Fig. 1 shows the LR image degradation model followed by the SR process.

III. MULTI-FRAME EXAMPLE-BASED SUPER-RESOLUTION

In this section, the proposed multi-frame example-based SR algorithm is described in detail. The proposed SR algorithm is an extended version of Jeong's work in [8], and consists of three steps: i) definition of a local search region for the optimal patch using motion vectors, ii) adaptive selection of the optimum patch based on the low-resolution image degradation model, and iii) combination of the optimum patch and the reconstructed image. Fig. 2 shows the block diagram of the proposed SR method.

Fig. 1. Image degradation model and HR image acquisition model using SR.

Fig. 2. Block diagram of the proposed SR algorithm.

The mathematical expression of the LR image degradation model is given as

g_i = S H f_i + η_i,    (1)

where f_i represents the MN × 1 row-ordered vector of the M × N original HR image in the i-th frame, g_i the (MN/r²) × 1 row-ordered vector of the (M/r) × (N/r) LR image in the i-th frame, H the MN × MN block-circulant matrix of the point-spread function (PSF), η_i the additive noise in the i-th frame, and S the (MN/r²) × MN matrix of the down-sampling operator.

The matrix S is defined as

S = s ⊗ s,    (2)

where ⊗ represents the Kronecker product, and s the one-dimensional subsampling matrix. There are fast image restoration methods that can generate a high-resolution image f̂ in the frequency domain using the block-circulant property of the linear spatially invariant PSF. However, a spatially varying or adaptive restoration should be implemented using iterative optimization or an approximated finite impulse response (FIR) filter.
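As a concrete illustration, the degradation model in (1) can be simulated with a separable Gaussian PSF followed by decimation; the kernel radius, blur strength, and noise level below are illustrative assumptions, not parameters taken from the paper:

```python
import numpy as np

def degrade(f, r=2, sigma=1.0, noise_std=0.0, seed=0):
    """Sketch of the LR degradation model g_i = S H f_i + eta_i:
    Gaussian blur (H), decimation by r (S), and additive noise (eta_i)."""
    k = np.arange(-2, 3)
    w = np.exp(-k**2 / (2.0 * sigma**2))
    w /= w.sum()                                  # normalized 1-D Gaussian (PSF row)
    # Separable blur: filter along rows, then along columns.
    blurred = np.apply_along_axis(np.convolve, 1, f, w, mode='same')
    blurred = np.apply_along_axis(np.convolve, 0, blurred, w, mode='same')
    g = blurred[::r, ::r]                         # S: keep every r-th pixel
    if noise_std > 0:
        g = g + np.random.default_rng(seed).normal(0.0, noise_std, g.shape)
    return g
```

An M × N input thus yields an (M/r) × (N/r) output, matching the dimensions stated after (1).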

The enlarged or up-scaled image g_E is generated by simple interpolation from the t-th input image g_t. The HR image f̂_t is reconstructed by synthesizing optimally selected patches from the patch pairs P_LH that are generated in the local search region estimated by the motion vector between adjacent frames. Ω represents the position of the local search region. The reconstructed HR image f̂_t is refined by back-projection to obtain the final result image f̂.

A. Definition of the Local Search Region for the Optimal Patch

Freedman et al. used self-similarity for example-based SR to avoid an indefinite amount of search for the best similar patch in a patch dictionary, and it has been experimentally shown that a similar patch can be found by self-similarity [4]. However, example-based SR methods still require a high computational load for searching the patch in a large database. In this paper, a novel patch searching method is presented using local self-similarity in adjacent video frames without creating a patch dictionary.

In order to reduce the computational complexity in multi-frame SR, the proposed method uses motion information to confine the patch searching region to be as small as possible. The local search region is also defined using pre-generated down-scaled images (g_s^{t−1}, g_s^t, and g_s^{t+1}, as shown in Fig. 2) for fast motion vector estimation.
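Given the motion vector of the block containing a patch, confining the search to a small window around the motion-compensated position might look as follows; the margin parameter and the clipping convention are illustrative assumptions, not values from the paper:

```python
def local_search_region(px, py, mv, margin, h, w):
    """Return the clipped bounds (x0, y0, x1, y1) of the local search
    region Omega in the adjacent frame, centered at the patch position
    (px, py) displaced by the block motion vector mv = (dx, dy)."""
    dx, dy = mv
    x0 = max(0, px + dx - margin)        # clip to the image boundary
    y0 = max(0, py + dy - margin)
    x1 = min(w, px + dx + margin)
    y1 = min(h, py + dy + margin)
    return x0, y0, x1, y1
```

Only candidate patches inside this box are compared, which is what keeps the per-patch search cost bounded for real-time hardware.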

Fig. 3. Motion estimation process for local region definition.

Fig. 4. The estimated local region for patch searching.

Although a block matching algorithm is used for motion estimation in this work, any other motion estimation method can be used. The block motion is estimated by computing the sum of absolute differences (SAD) between two temporally adjacent frames based on spiral partial pixel-to-pixel differences (SpiralPDE) [9]. SpiralPDE is a fast motion estimation method using a spiral outward trajectory starting from the center of the search window. The SAD computation is forced to stop if the partial error becomes greater than the previously determined minimum error. The partial SAD accumulates the matching error of a sub-block in the search window, and the accumulated matching error is compared with the minimum block SAD defined as

S_k = Σ_{i=0}^{k} Σ_{j=0}^{N−1} | g_t(i, j) − g_{t−1}(i + x, j + y) |,  k = 0, 1, …, N − 1,    (3)

where N represents the size of a sub-block in the search window, (i, j) the pixel coordinates of the current block, and (x, y) the pixel coordinates of the reference block. g_t and g_{t−1} represent the current and reference frames, respectively.
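A minimal sketch of block matching with partial-distortion early termination in the spirit of (3). The spiral scan order of SpiralPDE is omitted here and replaced by a plain raster scan over the search window, so this is a simplified assumption, not the paper's exact procedure:

```python
import numpy as np

def sad_with_pde(cur, ref, bi, bj, dx, dy, N, best):
    """Partial SAD of an N x N block: accumulate row by row and stop
    early once the running error exceeds the best SAD so far (PDE)."""
    s = 0.0
    for k in range(N):
        s += np.abs(cur[bi + k, bj:bj + N] -
                    ref[bi + k + dy, bj + dx:bj + dx + N]).sum()
        if s >= best:                      # early termination
            return s
    return s

def block_motion(cur, ref, bi, bj, N=8, search=4):
    """Search a (2*search+1)^2 window for the motion vector of the block
    at (bi, bj); the caller must keep the block itself inside the frame."""
    best, mv = np.inf, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            if (0 <= bi + dy and bi + dy + N <= ref.shape[0]
                    and 0 <= bj + dx and bj + dx + N <= ref.shape[1]):
                s = sad_with_pde(cur, ref, bi, bj, dx, dy, N, best)
                if s < best:
                    best, mv = s, (dx, dy)
    return mv, best
```

The early exit is what makes PDE-style matching cheap: hopeless candidates are abandoned after a few rows instead of being fully evaluated.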

The estimated motion vector is compared with its eight neighboring motion vectors as

Σ_{i=1}^{8} | m − m_i | > τ_m,    (4)

where m represents the estimated motion vector of the current block, m_i the eight neighboring motion vectors, and τ_m the pre-specified threshold for the motion error. If the sum in (4) is larger than the threshold, the motion vector m is considered to be wrong, and is replaced by the component-wise median of m_i, for i ∈ {1, …, 8}. The finally estimated local regions are shown in Fig. 4.

B. Patch Searching Based on the Image Degradation Model

The proposed patch searching algorithm is based on the LR image degradation model to remove interpolation artifacts in the up-scaling process. The LR degraded image is first generated using the degradation model given in (1). Next, patch pairs are generated in each frame using the input image and its degraded version. The input image provides the HR patches, while its degraded version is considered as the LR counterpart. Pairs of HR and LR patches, P_LH, in the local region are classified according to the quantized orientations 0°, 45°, 90°, and 135° in order to reduce the searching time [10], [11]. Fig. 5 shows the pair of patches produced using the image degradation model.
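The directional classification of patch pairs into the four quantized orientations can be sketched as follows; the gradient-based binning rule below is an assumption for illustration, since the paper does not specify how the dominant direction is measured:

```python
import numpy as np

def patch_orientation(p):
    """Classify a patch into one of the four quantized edge directions
    (0, 45, 90, or 135 degrees) from its mean gradient orientation.
    The edge direction is taken perpendicular to the gradient."""
    gy, gx = np.gradient(p.astype(float))
    ang = np.rad2deg(np.arctan2(gy.sum(), gx.sum())) % 180.0
    edge = (ang + 90.0) % 180.0                 # rotate gradient -> edge
    return int(45 * round(edge / 45.0)) % 180   # snap to nearest bin
```

Bucketing patches this way means a candidate only has to be compared against patches of the same direction, which is the source of the claimed search-time reduction.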

Fig. 5. Generation of LR and HR patch pairs using the image degradation model.

The input image g_t is assumed to be an HR image. To generate the corresponding LR image, g_t is intentionally blurred and then down-scaled based on the LR image degradation model. In order to add the interpolation artifacts as a degradation factor, the finally degraded image g_D^t is generated by interpolating the down-scaled LR image. Since the LR patches contain the information of the interpolation artifacts, the accuracy of patch searching increases in appropriately degraded images. All the patches are generated in local regions of g_t and g_D^t. The local region Ω in the patch-pair generation process is saved for the successive reconstruction.

The optimal patch is selected by minimizing the patch mismatching error (PME) between g_EP and g_LP^i along the same patch direction. A high PME value tends to decrease the probability of the existence of a similar patch in the local region. For this reason, if the PME value is bigger than a pre-specified threshold, the patch g_EP of the up-scaled image g_E is used instead of the selected patch. The mathematical expression of the PME is given as

E_P = Σ_{x,y} | g_EP(x, y) − g_LP^i(x, y) |,    (5)

where (x, y) represents the patch location, g_EP the patch of the enlarged or up-scaled image g_E, and g_LP^i the LR patch of the local region in the i-th frame.
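The PME-based selection in (5), including the fallback to g_EP when no candidate is similar enough, can be sketched as:

```python
import numpy as np

def select_patch(g_ep, candidates, tau):
    """Pick the candidate LR patch minimizing the PME (Eq. 5), i.e. the
    sum of absolute differences against the up-scaled patch g_EP.
    Falls back to g_EP itself when even the best PME exceeds tau."""
    best_pme, best = np.inf, None
    for c in candidates:
        pme = np.abs(g_ep - c).sum()
        if pme < best_pme:
            best_pme, best = pme, c
    if best is None or best_pme > tau:
        return g_ep, best_pme        # no sufficiently similar patch found
    return best, best_pme
```

In the full algorithm the candidate list would hold only the patches of the matching quantized direction inside the local region Ω.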

C. Optimal Patch Combination and Multi-Frame Image Reconstruction

The proposed image restoration method estimates the HR image by searching for the optimal patch and refining the HR image by back-projection. The optimal patch is selected from each of the input frames, and the selected patches are combined using the weight values defined as

W = (1 / (√(2π) σ)) exp( −(g_EP − g_LP^i)² / (2σ²) ),    (6)

where g_EP represents the patches of g_E, and g_LP^i the LR patch of the local region in the i-th frame. Based on the definition of the weight value in (6), the lower the PME value, the larger the portion of the corresponding patch that is used. Fig. 6 shows the proposed patch combination process.

Fig. 6. The proposed patch combination method.

The proposed self-example-based SR method generates a visually high-resolution result image. However, it can generate over-amplified edge patterns by adding the high-frequency component of the selected patch to the up-scaled image g_E. To solve this problem, a back-projection scheme is applied to refine the reconstructed HR image. Fig. 7 shows the block diagram of the proposed image back-projection algorithm. In the block diagram, solid lines represent the iterative process, and dotted lines represent an open-loop process.

Fig. 7. Image back-projection diagram.

The super-resolved image f̂_t of the proposed multi-frame super-resolution method is down-sampled to the same size as the input image g_t, and the error between the down-sampled image and g_t is then calculated. The error is up-scaled using bi-cubic interpolation to the same size as the super-resolved image f̂_t, and then low-pass filtered. Finally, the Gaussian-filtered error is added to the resulting image f̂_t to increase naturalness. Fig. 8 shows experimental results on the 352 × 288 Foreman image using interpolation and the proposed SR methods. The proposed method uses three temporally adjacent frames. Fig. 8(a) shows the restored result using bi-cubic interpolation, Fig. 8(b) the restored result using the proposed algorithm, and Fig. 8(c) the final result refined by back-projection.

Fig. 8. Comparison of three different SR methods including the proposed one: (a) bi-cubic interpolation, (b) the proposed algorithm, and (c) the proposed algorithm refined by back-projection.

IV. EXPERIMENTAL RESULTS

This section presents experimental results to compare the restoration performance of the proposed algorithm with existing methods. For the objective evaluation, the peak signal-to-noise ratio (PSNR), structural similarity measure (SSIM), and naturalness image quality evaluator (NIQE) [12] are used. For the numerical evaluation, an original HR image is down-sampled by a factor of four to generate the simulated LR image. The simulated LR image is used as the input of each SR algorithm, and the results are compared with the original HR image to compute numerical similarities in the sense of the objective measures PSNR, SSIM, and NIQE.

For the subjective evaluation of SR performance, the 352 × 288 Football image is enlarged by a factor of four using different SR methods, as shown in Fig. 9. As shown in Figs. 9(a) and 9(b), the bi-cubic interpolation method could not successfully restore the high-frequency components, whereas Freeman's method produced a significant amount of artifacts near edges. Although Cho's method could reconstruct the high-frequency component with a smaller amount of SR artifacts, it still exhibited unnaturally amplified edges, as shown in Fig. 9(c). On the other hand, the proposed method successfully restored image details without unnatural edge overshooting or interpolation artifacts, as shown in Fig. 9(d).
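The objective evaluation rests on the PSNR between the original HR image and each SR result; a minimal definition, assuming an 8-bit peak value of 255:

```python
import numpy as np

def psnr(ref, test, peak=255.0):
    """Peak signal-to-noise ratio (in dB) between a reference HR image
    and a super-resolved result."""
    mse = np.mean((ref.astype(float) - test.astype(float)) ** 2)
    return np.inf if mse == 0 else 10.0 * np.log10(peak**2 / mse)
```

SSIM and NIQE are more involved; in practice they would come from an image-quality library rather than be reimplemented.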


Fig. 9. Experimental results of four times upscaling using four different methods: (a) bi-cubic interpolation, (b) Freeman’s example-based SR method [2], (c) Cho’s method [11], and (d) the proposed method.

Fig. 10 shows the result of upscaling a 1920 × 1080 (2K) high-definition (HD) image to a 3840 × 2160 (4K) ultra-high-definition (UHD) image. Fig. 10(a) shows three frames in the 2K video, and Fig. 10(b) shows a 4K resulting image frame of the proposed SR method. For a clearer comparison, Fig. 10(c) shows the 4K result using bi-cubic interpolation, and Fig. 10(d) shows the same region using the proposed SR method. As shown in Fig. 10, the proposed method produces a much sharper and more natural result than the bi-cubic interpolation method.

Fig. 10. Experimental results of 2K-to-4K upscaling for the UHD TV: (a) 2K multiple input images (frames 6, 7, and 8 of the little girl sequence), (b) 4K UHD-resolution image restored by the proposed method, (c) results of bi-cubic interpolation, and (d) results of the proposed SR method.

In order to provide an extended set of comparative analyses with existing SR methods, Table I summarizes test results using single images in the sense of the objective quality measures PSNR, SSIM, and NIQE.

TABLE I
SUMMARY OF PSNR, SSIM, AND NIQE

Input Image Size | Algorithm   | PSNR    | SSIM   | NIQE
352 × 288        | Bi-linear   | 30.7385 | 0.9166 | 6.4124
                 | Freeman [2] | 26.3526 | 0.7568 | 6.0343
                 | Cho [11]    | 26.2648 | 0.8078 | 6.8299
                 | Proposed    | 31.4528 | 0.9318 | 5.3526
352 × 288        | Bi-linear   | 27.7171 | 0.9289 | 5.8670
                 | Freeman [2] | 25.8529 | 0.9076 | 4.9896
                 | Cho [11]    | 25.6105 | 0.9228 | 6.0053
                 | Proposed    | 28.1175 | 0.9424 | 5.1246
352 × 288        | Bi-linear   | 25.8543 | 0.9146 | 5.5574
                 | Freeman [2] | 23.6643 | 0.8608 | 5.0334
                 | Cho [11]    | 24.4426 | 0.8824 | 5.9230
                 | Proposed    | 26.0072 | 0.9263 | 5.0301


Table II summarizes the restored results using 4K video frames for the objective evaluation in the sense of PSNR, SSIM, and NIQE.


TABLE II
SUMMARY OF PSNR, SSIM, AND NIQE

Input Image Size | Algorithm   | PSNR    | SSIM   | NIQE
4K               | Bi-linear   | 26.0472 | 0.8570 | 4.9307
                 | Freeman [2] | 26.5842 | 0.8480 | 4.9564
                 | Cho [11]    | 26.8006 | 0.8963 | 5.9947
                 | Proposed    | 28.4188 | 0.9210 | 4.9463
4K               | Bi-linear   | 33.1453 | 0.9672 | 5.1162
                 | Freeman [2] | 31.2563 | 0.9465 | 4.8056
                 | Cho [11]    | 31.1702 | 0.9693 | 5.1050
                 | Proposed    | 34.7841 | 0.9822 | 4.2860
4K               | Bi-linear   | 31.3533 | 0.9507 | 5.3093
                 | Freeman [2] | 28.3206 | 0.9364 | 3.8111
                 | Cho [11]    | 29.8145 | 0.9526 | 5.2495
                 | Proposed    | 33.6264 | 0.9774 | 4.9274
V. CONCLUSION

The proposed multi-frame super-resolution algorithm provides significantly improved video quality by utilizing inter-frame correlation in searching for the optimal patches. The major contribution of the proposed method is the complete removal of interpolation artifacts using the patch pairs based on the low-resolution image degradation model. In addition, the proposed method is particularly suitable for hardware implementation in consumer imaging devices, since it searches for the optimally similar patch in the local region of adjacent multiple frames. Experimental results show that the proposed algorithm provides better performance than existing state-of-the-art super-resolution algorithms in the sense of both objective and subjective measures. As a result, the proposed method can be used in various applications such as UHD TV up-scalers, high-resolution mobile imaging devices, and consumer visual surveillance systems.

REFERENCES

[1] T. Lehmann, C. Gonner, and K. Spitzer, "Survey: interpolation methods in medical image processing," IEEE Trans. Medical Imaging, vol. 18, no. 11, pp. 1049-1075, November 1999.
[2] W. Freeman, T. Jones, and E. Pasztor, "Example-based super-resolution," IEEE Computer Graphics and Applications, vol. 22, no. 2, pp. 56-65, March/April 2002.
[3] D. Glasner, S. Bagon, and M. Irani, "Super-resolution from a single image," Proc. IEEE Int. Conf. Computer Vision, pp. 349-356, September 2009.
[4] G. Freedman and R. Fattal, "Image and video upscaling from local self-examples," ACM Trans. Graphics, vol. 30, no. 2, pp. 12:1-12:11, April 2011.
[5] S. Farsiu, M. Robinson, M. Elad, and P. Milanfar, "Fast and robust multiframe super resolution," IEEE Trans. Image Processing, vol. 13, no. 10, pp. 1327-1344, October 2004.
[6] S. Farsiu, M. Elad, and P. Milanfar, "Multiframe demosaicing and super-resolution of color images," IEEE Trans. Image Processing, vol. 15, no. 1, pp. 141-159, January 2006.
[7] W. Bai, J. Liu, M. Li, and Z. Guo, "Multi-frame super-resolution using refined exploration of extensive self-examples," Proc. Int. Conf. Multimedia Modeling, vol. 7732, pp. 403-413, January 2013.
[8] S. Jeong, I. Yoon, J. Jeon, and J. Paik, "Multi-frame example-based super-resolution using locally directional self-similarity," Proc. IEEE Int. Conf. Consumer Electronics, pp. 664-665, January 2015.
[9] B. Montrucchio and D. Quaglia, "New sorting-based lossless motion estimation algorithms and a partial distortion elimination performance analysis," IEEE Trans. Circuits Syst. Video Technology, vol. 15, no. 2, pp. 210-220, February 2005.
[10] S. Yang, Y. Kim, and J. Jeong, "Fine edge-preserving technique for display devices," IEEE Trans. Consumer Electronics, vol. 54, no. 4, pp. 1761-1769, November 2008.
[11] C. Cho, J. Jeon, and J. Paik, "Example-based super-resolution using self-patches and approximated constrained least squares filter," Proc. IEEE Int. Conf. Image Processing, pp. 2140-2144, October 2014.
[12] A. Mittal, R. Soundararajan, and A. C. Bovik, "Making a completely blind image quality analyzer," IEEE Signal Processing Letters, vol. 20, no. 3, pp. 209-212, November 2012.

BIOGRAPHIES

Seokhwa Jeong (S'14) was born in Seoul, Korea, in 1988. He received the B.S. degree in electronic engineering from Korea National University of Transportation, Korea, in 2013. He is currently pursuing the M.S. degree in image engineering at Chung-Ang University. His research interests include super-resolution, image enhancement and restoration for display processing, and video stabilization.

Inhye Yoon (S'10) was born in Suwon, Korea, in 1988. She received the B.S. degree in electronic engineering from Kangam University, Korea, in 2010, and the M.S. degree in image engineering from Chung-Ang University, Seoul, Korea, in 2012. She is currently pursuing the Ph.D. degree in image engineering at Chung-Ang University. Her research interests include image enhancement, color constancy, super-resolution, video stabilization, and forensic image processing.

Joonki Paik (M'89-SM'12) was born in Seoul, Korea, in 1960. He received the B.S. degree in control and instrumentation engineering from Seoul National University in 1984, and the M.S. and Ph.D. degrees in electrical engineering and computer science from Northwestern University in 1987 and 1990, respectively. From 1990 to 1993, he was with Samsung Electronics, where he designed image stabilization chip sets for consumer camcorders. Since 1993, he has been with the faculty of Chung-Ang University, Seoul, Korea, where he is currently a Professor in the Graduate School of Advanced Imaging Science, Multimedia and Film. From 1999 to 2002, he was a Visiting Professor at the Department of Electrical and Computer Engineering, University of Tennessee, Knoxville. Dr. Paik was a recipient of the Chester Sall Award from the IEEE Consumer Electronics Society, the Academic Award from the Institute of Electronic Engineers of Korea, and the Best Research Professor Award from Chung-Ang University. He has served the IEEE Consumer Electronics Society as a member of the Editorial Board. Since 2005, he has been the head of the National Research Laboratory in the field of image processing and intelligent systems. In 2008, he worked as a full-time technical consultant for the System LSI Division of Samsung Electronics, where he developed various computational photographic techniques including an extended depth of field (EDoF) system. From 2005 to 2007, he served as Dean of the Graduate School of Advanced Imaging Science, Multimedia, and Film, and as Director of the Seoul Future Contents Convergence (SFCC) Cluster established by the Seoul Research and Business Development (R&BD) Program. Dr. Paik is currently serving as a member of the Presidential Advisory Board for Scientific/Technical Policy of the Korean Government and as a technical consultant to the Korean Supreme Prosecutor's Office for computational forensics.
