NON-PARAMETRIC NATURAL IMAGE MATTING

Muhammad Sarim, Adrian Hilton, Jean-Yves Guillemaut, Hansung Kim
Centre for Vision, Speech and Signal Processing, University of Surrey, Guildford, GU2 7XH, Surrey, United Kingdom
{m.farooqui, a.hilton, j.guillemaut, h.kim}@surrey.ac.uk

ABSTRACT

Natural image matting is an extremely challenging image processing problem due to its ill-posed nature. It often requires skilled user interaction to help define the foreground and background regions. Current algorithms use these predefined regions to build local foreground and background colour models. In this paper we propose a novel approach which uses non-parametric statistics to model image appearance variations. This technique overcomes the limitations of previous parametric approaches, which are purely colour-based and therefore unable to model natural image structure. The proposed technique consists of three successive stages: (i) background colour estimation, (ii) foreground colour estimation and (iii) alpha estimation. Colour estimation uses patch-based matching to efficiently recover the optimum colour by comparison against patches from the known regions. Quantitative evaluation against ground truth demonstrates that the technique produces better results and successfully recovers fine details such as hair, where many other algorithms fail.

Index Terms: Alpha matte, composite, trimap, non-parametric statistics

1. INTRODUCTION

Image matting is widely used in video editing to composite foreground objects. An image I is considered to be a composite of a foreground image F and a background image B. The observed colour of the ith pixel in image I can be modelled by the compositing equation

$$I_i = \alpha_i F_i + (1 - \alpha_i) B_i, \tag{1}$$

where Fi and Bi are the pure foreground and background colours, and alpha (αi ∈ [0, 1]) gives the proportion in which they blend to form the composite colour Ii. Alpha is 0 for pure background and 1 for pure foreground; mixed pixels at the foreground boundary have intermediate alpha values. The compositing equation is underconstrained because Fi, Bi and αi are all unknown: in a three-channel colour space there are three equations but seven unknowns (three for Fi, three for Bi and one for αi).

978-1-4244-5654-3/09/$26.00 ©2009 IEEE

ICIP 2009

The compositing equation can be constrained in a studio environment by using a uniform background, typically green or blue [1]. Assuming that the background colour does not appear in the foreground leads to a trivial solution of equation (1). Natural images, however, have an arbitrary background, and nothing prevents the background colour from also appearing in the foreground. In such images, the background and foreground can be constrained by user interaction, normally in the form of a trimap: a typically hand-drawn partition of the image into three regions, namely definite foreground, definite background and unknown. Trimap-based techniques use the local information in the known foreground and background regions to build foreground and background colour models, from which an alpha value is estimated for every unknown pixel. A common example is the use of Gaussian mixture models [2, 3, 4].

Previous techniques such as [2, 3, 4, 5, 6, 7, 8] use a trimap to solve equation (1) for every pixel in the unknown region by exploiting local information in the known foreground and background regions. In Corel Knockout [5], F and B are assumed to be locally smooth, and α is estimated from a weighted average of nearby known foreground and background pixels. In [2, 3, 4, 6], local foreground and background pixels are used to build colour distributions, which are then used to estimate the foreground colour, background colour and alpha of every unknown pixel. These techniques tend to suffer when the distributions overlap or when the unknown region is wide. In [7], the alpha matte is estimated by solving a Poisson equation with the matte gradient field, which is obtained by taking the gradient of the compositing equation. If F and B are not smooth in the unknown region, errors may occur in the alpha matte, and local edits to the matte gradient field are then required to obtain a satisfactory matte. Techniques such as [8] draw sparse samples of known foreground and background pixels for every unknown pixel. Only the highest-confidence sample pairs, which minimize the matting energy function, are used to estimate α, giving robustness against outliers. Propagation-based approaches such as [9] fit a linear model to the foreground and background colours in a local window, which defines a quadratic cost function in alpha; alpha is then estimated by globally minimizing this cost function.

In this paper, we present a novel approach for estimating an alpha matte using non-parametric statistics. Non-parametric statistics have previously been used to represent local image statistics for inpainting [10] and view interpolation [11]. They provide a mechanism for representing local image features, colours and textures that attempts to preserve the spatial information of natural images. Given a trimap, we estimate foreground and background colours for every pixel in the unknown region using a patch-based similarity criterion. First, the background is estimated using an inpainting technique. A foreground colour is then estimated for every pixel in the unknown region that is dissimilar to the constructed background, by finding the most similar patch in the known foreground. Finally, an alpha matte is generated from the computed foreground and background colours.
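The building blocks described above can be illustrated with a small pure-Python sketch. These are a patch comparison restricted to the usable pixels of a template, an inverse-distance average of the best matches, and the recovery of α from estimated foreground and background colours through equation (1). It is not the authors' Matlab implementation. The function names (`normalized_ssd`, `most_similar`, `weighted_foreground`, `composite`, `alpha_from_colours`) are hypothetical. Patches are assumed to be nested lists of RGB tuples, with `None` marking excluded pixels. Evaluating the vector ratio of equation (6) as a least-squares projection clamped to [0, 1] is an assumption; the paper does not specify how it is computed.

```python
import math
from typing import List, Optional, Sequence, Tuple

RGB = Tuple[float, float, float]
# Template around the pixel being processed: None marks pixels excluded from
# the comparison (unknown pixels in Section 2.1, unmarked ones in Section 2.2).
Template = List[List[Optional[RGB]]]
Patch = List[List[RGB]]  # a fully known candidate patch of the same size


def normalized_ssd(template: Template, patch: Patch) -> float:
    """RGB sum of squared differences over the usable template pixels,
    divided by their number (the normalization in equations (2) and (3))."""
    total, n_used = 0.0, 0
    for t_row, p_row in zip(template, patch):
        for t, q in zip(t_row, p_row):
            if t is None:
                continue  # excluded pixel: contributes nothing
            total += sum((a - b) ** 2 for a, b in zip(t, q))
            n_used += 1
    if n_used == 0:
        raise ValueError("template has no usable pixels")
    return total / n_used


def most_similar(template: Template, candidates: Sequence[Patch]) -> int:
    """Index of the candidate patch with the smallest normalized SSD."""
    return min(range(len(candidates)),
               key=lambda i: normalized_ssd(template, candidates[i]))


def weighted_foreground(p: Tuple[float, float],
                        centres: Sequence[Tuple[float, float]],
                        centre_colours: Sequence[RGB]) -> RGB:
    """Equation (5): average of the best patches' centre colours, weighted by
    the inverse spatial distance from p to each patch centre."""
    weights = []
    for cx, cy in centres:
        dist = math.hypot(cx - p[0], cy - p[1])
        if dist == 0.0:
            raise ValueError("a patch centre coincides with p")
        weights.append(1.0 / dist)
    w_sum = sum(weights)
    return tuple(
        sum(w * colour[k] for w, colour in zip(weights, centre_colours)) / w_sum
        for k in range(3)
    )


def composite(f: RGB, b: RGB, alpha: float) -> RGB:
    """Equation (1), per channel: alpha * f + (1 - alpha) * b."""
    return tuple(alpha * fc + (1.0 - alpha) * bc for fc, bc in zip(f, b))


def alpha_from_colours(c: RGB, f: RGB, b: RGB) -> float:
    """Invert equation (1) for alpha, as in equation (6).

    Assumption: the vector ratio (c - b) / (f - b) is evaluated as the
    least-squares projection of (c - b) onto (f - b), clamped to [0, 1].
    """
    d = [fc - bc for fc, bc in zip(f, b)]
    denom = sum(x * x for x in d)
    if denom == 0.0:
        return 0.0  # f == b: alpha undetermined; treated as background here
    num = sum((cc - bc) * x for cc, bc, x in zip(c, b, d))
    return min(1.0, max(0.0, num / denom))
```

In the paper's setting, `most_similar` would be applied in two places. In Section 2.1 it compares each contour template against every background patch. In Section 2.2 it compares each marked template against every foreground patch. The triplet used in equation (5) would come from sorting the same costs and keeping the first three. The resulting f and the inpainted b then feed `alpha_from_colours`. The exhaustive `min` mirrors the comparison against all overlapping patches, but it is slow, so a practical version would likely need an accelerated search.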

2. NON-PARAMETRIC IMAGE MATTING

Our technique is split into three main steps: (1) building a background for the unknown region, (2) estimating a foreground colour for every pixel in the unknown region that differs from the constructed background by more than a predefined distance threshold and (3) generating an alpha matte.

2.1. Background colour estimation

Image infilling similar to [10] is used to construct a background in the unknown region from the known background. Fig. 1 shows the complete background estimation process. Initially the image is split into three regions: background Φ (black), unknown region Ψ (grey) and foreground Θ (white), as shown in Fig. 1a. The contour of Ψ where the background and the unknown region meet is found, and the background is evolved inwards. Consider a template ψp centred at a pixel p on the contour of Ψ. The pixels in ψp can be split into two sets: pΦ, the background pixels, and pΨ, the unknown pixels. Let φ denote the set of all possible overlapping patches contained in the background Φ with the same dimensions as ψp. The template ψp is compared with every patch in φ, and the pixels pΦ are used to find the most similar patch φq in the set φ by

$$\phi_q = \arg\min_{\phi_i \in \phi} \frac{1}{n_{p_\Phi}}\, d_\Phi(\psi_p, \phi_i), \tag{2}$$

where the distance dΦ(ψp, φi) between patches ψp and φi is the sum of squared differences (SSD) in RGB colour space between the pixels pΦ in ψp and the corresponding pixels in φi. The SSD is normalized by the number of known neighbouring pixels npΦ in the template ψp to ensure that the costs are comparable. The pixels in φq corresponding to the pixels pΨ are copied to fill in the unknown region. The process is iterated until the unknown region is completely filled in, as shown in Fig. 1e.

Fig. 1: Background estimation: (a) template comparison, (b) original image, (c) trimap, (d) unfilled region and (e) filled-in unknown region.

2.2. Foreground colour estimation

Once the background is estimated, a predefined threshold is applied to mark all the pixels in the unknown region that differ from the estimated background. A process similar to equation (2), with some modifications, is applied to these marked pixels to estimate their foreground colour. A template ψp is centred on a pixel p in the unknown region, and only the marked pixels in the template are used in the comparison. The data-set of foreground patches θ in Θ is constructed in the same way as the background patch data-set φ of Section 2.1. The most similar patch θq is found by comparing the marked pixels in the template ψp with the corresponding pixels of each patch θi in the data-set θ:

$$\theta_q = \arg\min_{\theta_i \in \theta} \frac{1}{n_p}\, d_\Theta(\psi_p, \theta_i), \tag{3}$$

where np is the number of marked pixels in the template ψp, used for normalization, and the distance dΘ(ψp, θi) has the same definition as in Section 2.1. Because only part of the template ψp is compared with the patch θi, the search finds a structure in the known foreground region that is similar to the foreground present in ψp. Noise in the foreground region tends to produce segmentation inaccuracies, so an additional optimization step is introduced.

2.2.1. Foreground colour optimization

The normalized sum of squared differences between ψp and θi is given by

$$\delta_i = \frac{1}{n_p}\, d_\Theta(\psi_p, \theta_i). \tag{4}$$

To optimize the foreground colour of pixel p, the values δ are sorted such that δk < δk+1. Let {θ1, θ2, θ3} denote the triplet of most similar patches in the set θ. The foreground colour f of pixel p is estimated by taking a weighted average of the centre pixels of the triplet:

$$f = \frac{w_1 \theta_1^c + w_2 \theta_2^c + w_3 \theta_3^c}{w_1 + w_2 + w_3} \tag{5}$$

where {w1, w2, w3} are weights and {θ1c, θ2c, θ3c} are the centre pixel values of the patches in the triplet. Each weight wi is defined as the inverse Euclidean distance, in the spatial domain, between the pixel p and the centre of patch θi, so closer patches receive higher weights. The process is iterated until the foreground colour has been estimated for all the marked pixels in the unknown region.

[Fig. 2 panels: (a) Original, (b) Trimap, (c) Knockout, (d) Hillman, (e) Poisson, (f) Closed form, (g) Robust, (h) Non-para, (i) Composite]

Fig. 2: Comparison of different techniques on natural images.

2.3. Alpha estimation

All unmarked pixels in the unknown region, which are very similar to the background, are given an alpha value of zero. The estimated background and foreground colours b and f are known from equations (2) and (5), respectively. The alpha value of the ith marked pixel in the unknown region is computed from equation (1) as

$$\alpha_i = \frac{c_i - b_i}{f_i - b_i}, \tag{6}$$

where ci, bi and fi are the composite, estimated background and estimated foreground colours, respectively, of the ith marked pixel in the unknown region. Once an alpha matte is computed, the foreground object can be seamlessly composited onto a new background.

3. EVALUATION

We present a comparison of the proposed technique with other well-known matting algorithms. Two natural images are used for qualitative comparison and three composite images with known α values for quantitative comparison. The images were obtained from the data provided by [6, 8]. Five techniques are used for comparison: (1) Knockout 2 [5], (2) Robust matting (EZmask) [8], both commercially available as Photoshop plug-ins, (3) the Hillman method [3], (4) Global Poisson matting [7] and (5) Closed-form matting [9]. Although the Poisson and Closed-form techniques can be used with limited user interaction in the form of scribbles rather than a trimap, we used a trimap for the sake of a fair comparison.


3.1. Qualitative evaluation

Fig. 2 shows two natural images and the alpha mattes computed using the different techniques. For the first image, all the techniques produced acceptable results. For the second image, all the parametric techniques fail to produce a good alpha matte in the areas where the foreground and background colour distributions overlap or where the unknown region is not smooth. The Robust approach provided a better result but has some artifacts. Our technique produced mattes that are visually smooth, with no visible artifacts in the new composites.

3.2. Quantitative evaluation

Fig. 3 shows three composite images, their ground-truth alpha mattes and the alpha mattes estimated using the different techniques. The ground truths were obtained using the triangulation approach described in [1]. For the first two images, Knockout and Hillman produce good results because the foreground and background colours are distinct, but they fail on the third image because of its complex background. Poisson matting produced an erroneous matte because its global optimization is ill-suited to a complex background. The Robust approach produced a matte with small errors. The Closed-form technique produced good results for the simple backgrounds but performed poorly with the complex background. All these mattes have visible artifacts compared with the ground truth. Our technique produced consistently better mattes for both simple and complex backgrounds, with no visible artifacts. Fig. 4 shows a bar chart of the mean squared error (MSE) of the three composite images in Fig. 3 against the ground truth. The errors are calculated only over the unknown region, with alpha values ranging from 0 to 255. Although MSE does not always correlate with visual matte quality, it still gives a reasonable error comparison. The run time of our Matlab implementation on the considered images is typically around three minutes and depends on the image size and the size of the unknown region. The run time could be reduced further by optimizing the implementation.
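The error measure reported in Fig. 4 can be sketched as follows. It is a mean squared error between estimated and ground-truth alpha, restricted to the trimap's unknown region and computed with alpha on a 0 to 255 scale. This minimal pure-Python sketch is not the authors' evaluation code. The names `matte_mse` and `to_0_255` are hypothetical. Encoding unknown trimap pixels as 128 is an assumed convention, since the paper does not say how its trimaps are stored.

```python
from typing import List, Sequence


def matte_mse(estimated: Sequence[Sequence[float]],
              ground_truth: Sequence[Sequence[float]],
              trimap: Sequence[Sequence[int]],
              unknown_label: int = 128) -> float:
    """Mean squared alpha error over the trimap's unknown region.

    Alpha values are expected on a 0-255 scale; `unknown_label` is the
    (assumed) trimap value that marks unknown pixels.
    """
    if not (len(estimated) == len(ground_truth) == len(trimap)):
        raise ValueError("images must have the same number of rows")
    total, count = 0.0, 0
    for est_row, gt_row, tri_row in zip(estimated, ground_truth, trimap):
        if not (len(est_row) == len(gt_row) == len(tri_row)):
            raise ValueError("rows must have the same length")
        for est, gt, label in zip(est_row, gt_row, tri_row):
            if label == unknown_label:  # known pixels are excluded from the score
                total += (est - gt) ** 2
                count += 1
    if count == 0:
        raise ValueError("trimap contains no unknown pixels")
    return total / count


def to_0_255(alpha: Sequence[Sequence[float]]) -> List[List[float]]:
    """Rescale a matte from [0, 1] to the 0-255 range used for the error."""
    return [[255.0 * a for a in row] for row in alpha]
```

Restricting the sum to unknown pixels keeps the score focused on the pixels each method actually estimates. Definite foreground and background pixels are fixed by the trimap, so including them would mostly add near-zero terms and dilute the differences between methods.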

[Fig. 3 panels: (a) Original, (b) Trimap, (c) Knockout, (d) Hillman, (e) Poisson, (f) Closed form, (g) Robust, (h) Non-para, (i) Ground truth]

Fig. 3: Comparison of different techniques on composite images.

Fig. 4: MSE in alpha against the ground truth for the unknown region.

4. CONCLUSION

A novel patch-based, non-parametric approach to natural image matting has been presented. It combines an inpainting technique for background estimation with patch-based foreground colour estimation. A detailed evaluation shows that our technique has a clear advantage over previous parametric techniques. The algorithm is robust to both complex backgrounds and long foreground strands. Future work will concentrate on developing a more robust matching criterion and incorporating smoothness constraints to further optimize the alpha matte.

5. REFERENCES

[1] A. R. Smith and J. F. Blinn, “Blue screen matting,” in Proc. ACM SIGGRAPH ’96, 23rd Annual Conference on Computer Graphics and Interactive Techniques, 1996, pp. 259–268.
[2] Y. Y. Chuang, B. Curless, D. H. Salesin, and R. Szeliski, “A Bayesian approach to digital matting,” in Proc. IEEE CVPR ’01, vol. 2, Dec. 2001, pp. 264–271.
[3] P. Hillman, J. Hannah, and D. Renshaw, “Alpha channel estimation in high resolution images and image sequences,” in Proc. IEEE CVPR, 2001, pp. 1063–1068.
[4] M. A. Ruzon and C. Tomasi, “Alpha estimation in natural images,” in Proc. IEEE CVPR, June 2000, pp. 18–25.
[5] A. Berman, A. Dadourian, and P. Vlahos, “Method of removing from an image the background surrounding a selected object,” U.S. Patent 6,134,346, 2000.
[6] Y. Y. Chuang, A. Agarwala, B. Curless, D. Salesin, and R. Szeliski, “Video matting of complex scenes,” in Proc. ACM SIGGRAPH, 2002, pp. 243–248.
[7] J. Sun, J. Jia, C.-K. Tang, and H.-Y. Shum, “Poisson matting,” ACM Transactions on Graphics, vol. 23, no. 3, pp. 315–321, 2004.
[8] J. Wang and M. F. Cohen, “Optimized color sampling for robust matting,” in Proc. IEEE CVPR, 2007, pp. 1–8.
[9] A. Levin, D. Lischinski, and Y. Weiss, “A closed form solution to natural image matting,” in Proc. IEEE CVPR, vol. 1, 2006, pp. 61–68.
[10] A. Criminisi, P. Pérez, and K. Toyama, “Object removal by exemplar-based inpainting,” in Proc. IEEE CVPR, vol. 2, 2003, pp. 721–728.
[11] A. Fitzgibbon, Y. Wexler, and A. Zisserman, “Image-based rendering using image-based priors,” in Proc. IEEE ICCV, 2003, pp. 1176–1184.