Defocus map estimation from a single image via spectrum contrast

Chang Tang,1,* Chunping Hou,1 and Zhanjie Song2

1School of Electronic Information Engineering, Tianjin University, Tianjin 300072, China
2School of Science and SKL of HESS, Tianjin University, Tianjin 300072, China
*Corresponding author: [email protected]

Received April 4, 2013; revised April 17, 2013; accepted April 18, 2013; posted April 18, 2013 (Doc. ID 187792); published May 14, 2013
We present an effective method for defocus map estimation from a single natural image. It is inspired by the observation that defocus significantly affects the spectrum amplitude at object edge locations in an image. By establishing the relationship between the amount of spatially varying defocus blur and the spectrum contrast at edge locations, we first estimate the blur amount at these edge locations; a full defocus map is then obtained by propagating the blur amount at the edge locations over the entire image with a nonhomogeneous optimization procedure. The proposed method takes into consideration not only the effect of light refraction but also the blur texture of an image. Experimental results demonstrate that our method is more reliable in defocus map estimation than various state-of-the-art methods. © 2013 Optical Society of America

OCIS codes: (100.2960) Image analysis; (100.2000) Digital image processing; (330.4595) Optical effects on vision; (110.5200) Photography.
http://dx.doi.org/10.1364/OL.38.001706
Due to the small lenses and sensors of traditional point-and-shoot cameras and their optical imaging characteristics [1], many pictures are captured with a sharp foreground and a blurred background (i.e., a focused foreground and a defocused background). Defocus estimation is useful and important in many computer vision applications, including depth estimation [2], defocus magnification [3], image deblurring [4], refocusing, and image quality assessment.

Existing defocus estimation methods can be categorized into two classes: multiple-image-based methods [5–7] and single-image-based methods [8–10]. The former usually use a set of images captured with multiple camera focus settings, and defocus estimation is then carried out by a machine learning process. The latter mainly estimate accurate defocus at edge locations and then obtain the full defocus map with a propagation method. Multiple-image-based methods have some limitations in practical applications because they suffer from the occlusion problem and require the scene to be static. In contrast, defocus recovery from only a single image is more practical. Elder and Zucker [11] found the locations and blur amounts of edges according to the first- and second-order derivatives of the input image, but they obtained only a sparse defocus map. Bae and Durand [3] extended this work to get a full defocus map from the sparse map by a defocus interpolation method. Namboodiri and Chaudhuri [12] modeled defocus blur as a heat diffusion process and estimated the blur at edge locations using inhomogeneous inverse heat diffusion. Zhuo and Sim [10] calculated the defocus blur amount from the ratio between the gradients of the input and reblurred images at edge locations and then obtained a full defocus map by propagating the blur amount at edge locations to the entire image.

In this Letter, we propose a different and effective defocus estimation method for a single image captured by an uncalibrated conventional camera. Because the amount of blur can be estimated reliably only in areas of an image that have significant frequency content [3], we focus on the spectrum contrast at edge locations
to estimate the defocus blur amount and then obtain the full defocus map by a blur propagation process. In particular, we take into account the chromatic aberration caused by the wavelength-dependent variation of the refractive index of the lens to reduce edge detection errors. To remove high-frequency noise and texture ambiguity, a Gaussian filter with a small kernel is applied before calculating the spectrum contrast. Our defocus map is more accurate and reliable than the results of various other state-of-the-art methods.

Specifically, we first analyze the defocus process, as shown in Fig. 1. We assume that focus and defocus obey the thin lens model. All the rays from a point of an object placed at the focus distance S_f converge to a single sharp point on the image sensor; rays from other distances, such as S_1 and S_2, cannot focus well but spread over multiple sensor points and result in a blurred image. This blur pattern is also called the circle of confusion [1] (C_1 and C_2 in Fig. 1). f_0 is the focal length of the camera lens. Generally, the defocus process can be modeled as the convolution of a sharp image with the point spread function (PSF). Here we take the chromatic aberration into account, and the defocus model is given by

d(X, λ) = f(X, λ) ⊗ p(X, λ) s(λ),   (1)
where f(X, λ) is the ideal focused image, which gives the radiance at each location X = (x, y) on the image sensor plane, and λ is the wavelength of the color rays that refract through the lens. s(λ) is a wavelength sensitivity function that causes the chromatic aberration. The PSF p(X, λ) is usually approximated by a Gaussian function g(x, y, λ, σ), and its Gaussian blur kernel σ is the defocus value that we need to estimate. d(X, λ) is the final defocused image captured by the camera.

Fig. 1. Focus and defocus for the thin lens model.

Before calculating the spectrum contrast, a Gaussian filter with a small kernel is applied to the original image to remove high-frequency noise and to suppress the blur caused by blur texture (e.g., soft shadows or blur patterns). The Gaussian filter is given by

g(x, y, σ_1) = \frac{1}{2πσ_1^2} \exp\left(-\frac{x^2 + y^2}{2σ_1^2}\right),   (2)
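As a concrete illustration (not part of the original Letter), the following Python sketch simulates the defocus model of Eq. (1) for a single channel, dropping the sensitivity term s(λ) and assuming a spatially uniform Gaussian PSF, and then applies the small pre-filter of Eq. (2) channel-wise. The helper names are hypothetical, and σ_1 = 0.5 follows the value used below.

import numpy as np
from scipy.ndimage import gaussian_filter

def defocus_model(sharp, sigma):
    # d(X) = f(X) convolved with a Gaussian PSF of standard deviation sigma (s(lambda) dropped)
    return gaussian_filter(sharp.astype(np.float64), sigma=sigma)

def prefilter(image_rgb, sigma1=0.5):
    # Eq. (2): a small Gaussian applied to each color channel (no filtering across the channel axis)
    return gaussian_filter(image_rgb.astype(np.float64), sigma=(sigma1, sigma1, 0))

# toy example: a vertical step edge blurred by two different amounts of defocus
f = np.zeros((64, 64))
f[:, 32:] = 1.0
d_small = defocus_model(f, sigma=0.5)            # nearly in focus
d_large = defocus_model(f, sigma=3.0)            # strongly defocused
smoothed = prefilter(np.dstack([d_large] * 3))   # pre-filtered three-channel version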
We set the Gaussian kernel σ_1 to 0.5 in our experiments. Note that we calculate the spectrum contrast only at edge locations, and we use the Canny edge detector [13] for edge detection on the input image. Chromatic aberrations are caused by the wavelength dependence of the refractive index of the lens; as a result, the color channels of an RGB image appear shifted relative to one another. A more detailed discussion of these artifacts can be found in the paper by Kang [14]. One can convert the image to gray scale for edge detection, but this is not accurate. To reduce edge detection errors, we perform edge detection on each color channel independently and keep only those edges that are detected within five pixels of each other in the R, G, and B channels. The chromatic aberration function s(λ) in Eq. (1) is wavelength dependent; after this edge detection optimization, we can drop it. Substituting a Gaussian function for the PSF and incorporating the pre-filter of Eq. (2), Eq. (1) can be abbreviated as

d(X) = f(X) ⊗ g\left(X, \sqrt{σ^2 + σ_1^2}\right).   (3)
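The channel-consistent edge detection described above can be sketched as follows (our own illustration, assuming scikit-image and a floating-point RGB array; consistent_edges is a hypothetical helper). Canny edges are computed per channel, and an edge is kept only if all three channels detect an edge within the five-pixel tolerance.

import numpy as np
from scipy.ndimage import binary_dilation
from skimage.feature import canny

def consistent_edges(image_rgb, tol=5):
    # per-channel Canny edge maps on the (pre-filtered) R, G, and B channels
    edges = [canny(image_rgb[:, :, c]) for c in range(3)]
    # dilating an edge map by `tol` pixels marks everything within `tol` pixels of an edge
    struct = np.ones((2 * tol + 1, 2 * tol + 1), dtype=bool)
    near = [binary_dilation(e, structure=struct) for e in edges]
    # keep an edge pixel only if every channel has an edge within the tolerance
    agreement = near[0] & near[1] & near[2]
    return (edges[0] | edges[1] | edges[2]) & agreement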
In the Fourier domain, Eq. (3) can be expressed as

D(u, v) = F(u, v) G\left(u, v, \sqrt{σ^2 + σ_1^2}\right),   (4)

where {d, D}, {f, F}, and {g, G} are Fourier pairs. Allowing for the log-spectrum representation, we define our spectrum contrast as the absolute value of the difference between the spectrum amplitude at one pixel and that of its adjacent pixels:

C_i = \left\| \log\|A_i\| - \frac{1}{N} \sum_{j ∈ B} \log\|A_j\| \right\|,   (5)

where A_i represents the spectrum amplitude of pixel i and B is the neighborhood of pixel i of size N; we set N = 3 × 3 in our experiments. By substituting Eq. (4) into Eq. (5), Eq. (5) can be rewritten as

C_i = \left\| \log\left\|F(u, v) G\left(u, v, \sqrt{σ_i^2 + σ_1^2}\right)\right\| - \frac{1}{N} \sum_{j ∈ B} \log\left\|F(u, v) G\left(u, v, \sqrt{σ_j^2 + σ_1^2}\right)\right\| \right\|.   (6)

Through nonlinear regression analysis, the relationship between σ_i and C_i in Eq. (6) can be approximated as

σ_i = \sqrt{\frac{1}{\exp(c_i)} - σ_1^2},   (7)

where c_i is the inverse Fourier transform of C_i. The σ_i values at the edge locations form a sparse defocus map m̃_i. However, quantization error, noise, and weak edges can result in erroneous defocus estimates. To suppress the influence of these outliers, we apply joint bilateral filtering (JBF) [15] to the sparse defocus map at the edge locations; the JBF rectifies inaccurate defocus values by using the adjacent defocus values along the edge. Figure 2 shows the sparse defocus map refinement using JBF; as can be seen, some errors in the sparse defocus map are well corrected.

Once we have the sparse defocus map at the edge locations, a full defocus map can be recovered by an edge-aware interpolation method [16]. Here, we choose the matting Laplacian optimization method [17] to perform the defocus map propagation. Similar to [17], we formulate the propagation as minimizing the following cost function:

E(m) = m^T L m + λ (m - m̃)^T D (m - m̃),   (8)

where m̃ and m represent the vector forms of the sparse defocus map m̃_i and the full defocus map m_i, respectively, and L is the matting Laplacian matrix. D is a diagonal matrix whose element D_ii is 1 if pixel i is at an edge location and 0 otherwise. The constant λ balances the fidelity and smoothness of the final defocus map; we set λ to 0.005 in our experiments. The (i, j) element of L is defined as

L(i, j) = \sum_{k | (i, j) ∈ ω_k} \left[ δ_{ij} - \frac{1}{|ω_k|} \left( 1 + (I_i - μ_k)^T \left( Σ_k + \frac{ε}{|ω_k|} U_3 \right)^{-1} (I_j - μ_k) \right) \right],

where δ_ij is the Kronecker delta, and I_i and I_j are the colors of the input image I at pixels i and j, respectively.
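The Letter does not spell out how the per-pixel spectrum amplitude A_i is computed. As one possible reading (our assumption, not the authors' implementation), the sketch below forms A_i from a windowed FFT of a small patch around each pixel and then evaluates the spectrum contrast of Eq. (5) over a 3 × 3 neighborhood; the patch size and the helpers log_spectrum_amplitude and spectrum_contrast are hypothetical, and the mapping of Eq. (7) and the JBF refinement are omitted.

import numpy as np
from scipy.ndimage import uniform_filter

def log_spectrum_amplitude(gray, patch=8):
    # log ||A_i||: log of the total FFT amplitude of the patch centred at pixel i (assumed definition)
    h, w = gray.shape
    pad = patch // 2
    padded = np.pad(gray.astype(np.float64), pad, mode='reflect')
    out = np.zeros((h, w))
    for y in range(h):
        for x in range(w):
            p = padded[y:y + patch, x:x + patch]
            out[y, x] = np.log(np.abs(np.fft.fft2(p)).sum() + 1e-8)
    return out

def spectrum_contrast(log_amp):
    # Eq. (5): C_i = | log||A_i|| - mean of log||A_j|| over the 3x3 neighbourhood B |
    return np.abs(log_amp - uniform_filter(log_amp, size=3))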
Fig. 2. Sparse defocus map refinement using JBF. (a) Original input defocus image, (b) sparse defocus map before JBF, and (c) sparse defocus map after JBF.
Fig. 3. Comparison of our method with Bae and Durand's method. (a) Input image, (b) Bae and Durand's method, and (c) our method.

Fig. 4. Comparison of our method with the inverse diffusion method. (a) Input image, (b) inverse diffusion method, and (c) our method.

Fig. 5. Comparison of our method with Zhuo and Sim's method. (a) Input image, (b) Zhuo and Sim's method, and (c) our method.
In the expression for L(i, j), μ_k and Σ_k are the mean and covariance matrix of the colors in window ω_k, U_3 is the 3 × 3 identity matrix, |ω_k| is the size of window ω_k, and ε is a small regularization parameter (close to 0). Readers can refer to [17] for the detailed derivation of the matrix. The full defocus map m can be obtained by solving the following sparse linear system:

(L + λD) m = λD m̃.   (9)
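A minimal sketch of this propagation step (our own illustration): assuming the matting Laplacian L has already been assembled as a SciPy sparse matrix following [17], the sparse linear system of Eq. (9) can be solved directly. The helper propagate_defocus is hypothetical, with λ = 0.005 as in the text; sparse_map holds the JBF-refined m̃_i values and edge_mask marks the edge pixels.

import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import spsolve

def propagate_defocus(L, sparse_map, edge_mask, lam=0.005):
    # solve (L + lam * D) m = lam * D * m_tilde, with D_ii = 1 at edge pixels and 0 elsewhere
    D = sp.diags(edge_mask.ravel().astype(np.float64))
    m_tilde = sparse_map.ravel().astype(np.float64)
    A = (L + lam * D).tocsr()
    b = lam * (D @ m_tilde)
    m = spsolve(A, b)
    return m.reshape(sparse_map.shape)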
Our experiments were conducted on natural images from [10]. Here, we randomly select three of the results, shown in Figs. 3(c), 4(c), and 5(c), and compare our method with three other methods. As can be seen, our method recovers defocus maps corresponding to multiple layers, and the estimated maps preserve the continuous variation of homogeneous regions and the discontinuities at edge areas. Figure 3 compares our method with Bae and Durand's method [3]. There are still some visible errors (the white noisy points) in Bae and Durand's result, whereas our method is more accurate and the final defocus map is smoother. A comparison with the inverse diffusion method [12] is shown in Fig. 4. The result of the inverse diffusion method is not fine enough: the slender flower is not well separated from the background and contains some obvious error blocks. In contrast, our method handles this case well, and the flower is clearly distinguished. In Fig. 5, we compare our method with Zhuo and Sim's method [10]. Zhuo and Sim's method obtains a smooth defocus map but cannot handle the blur texture of the input image: for the flower center marked by a white rectangle in Fig. 5(a), it treats the texture as defocus blur, which results in erroneous defocus estimation in that region. Our method suppresses this ambiguity to a great extent.

In conclusion, we have presented a novel and effective method for defocus map estimation from a single natural image.

This work was supported by the 863 Program of China (grant 2012AA03A301), the National Natural Science Foundation of China (grant 60932007), the Ph.D. Programs Foundation of the Ministry of Education of China (grant 20110032110029), and the Key Projects in the Tianjin Science & Technology Pillar Program (grant 11ZCKFGX02000). The authors thank Prof. Frédo Durand for providing their code.

References
1. E. Hecht, Optics, 4th ed. (Addison-Wesley, 2002).
2. P. Birch, A. Buchanan, R. Young, and C. Chatwin, Opt. Lett. 36, 2194 (2011).
3. S. Bae and F. Durand, Comput. Graph. Forum 26, 571 (2007).
4. A. N. Simonov and M. C. Rombach, Opt. Lett. 34, 2111 (2009).
5. P. Favaro and S. Soatto, IEEE Trans. Pattern Anal. Mach. Intell. 27, 406 (2005).
6. M. Burger, P. Favaro, S. Soatto, and S. J. Osher, IEEE Trans. Pattern Anal. Mach. Intell. 30, 518 (2008).
7. O. Cossairt, C. Zhou, and S. Nayar, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (IEEE, 2010), pp. 1110–1117.
8. W. Zhang and W.-K. Cham, in Proceedings of the IEEE 12th International Conference on Computer Vision Workshops (ICCV Workshops) (IEEE, 2009), pp. 1947–1954.
9. Y.-W. Tai and M. S. Brown, in Proceedings of the IEEE International Conference on Image Processing (ICIP) (IEEE, 2009), pp. 1797–1800.
10. S. Zhuo and T. Sim, Pattern Recogn. 44, 1852 (2011).
11. J. H. Elder and S. W. Zucker, IEEE Trans. Pattern Anal. Mach. Intell. 20, 699 (1998).
12. V. P. Namboodiri and S. Chaudhuri, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (IEEE, 2008), pp. 1–6.
13. J. Canny, IEEE Trans. Pattern Anal. Mach. Intell. PAMI-8, 679 (1986).
14. S. B. Kang, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (IEEE, 2007), pp. 2025–2032.
15. E. Eisemann and F. Durand, ACM Trans. Graph. 23, 673 (2004).
16. A. Levin, D. Lischinski, and Y. Weiss, ACM Trans. Graph. 23, 689 (2004).
17. A. Levin, D. Lischinski, and Y. Weiss, IEEE Trans. Pattern Anal. Mach. Intell. 30, 228 (2008).