Weakly Supervised Glasses Removal

Zhicheng Wang, Yisu Zhou and Lijie Wen
School of Software, Tsinghua University, Beijing 100084, P.R. China

ABSTRACT
Glasses removal is an important task in face recognition. In this paper, we propose a weakly supervised method to automatically remove eyeglasses from an input face image. We use sparse coding for face reconstruction and optical flow to find the exact shape of the glasses, and we combine the two processes iteratively to remove the glasses more accurately. The experimental results show that our method works much better than either algorithm alone, and that it can remove various glasses to obtain natural-looking glassless facial images.
Keywords: Glasses removal, Sparse coding, Optical flow, Face reconstruction.

1. INTRODUCTION
In the last decade, face recognition has been one of the most active research topics in computer vision. However, most methods perform poorly in the presence of glasses. To make a face recognition system more robust, it is important to analyze glasses and reduce their effect on recognition. There has been some recent work on automatic glasses removal from face images. Jiang et al. [1] studied the detection of glasses in facial images with a glasses classifier. Wu et al. [2] designed an SVM-based classifier to verify the existence of eyeglasses. Jing and Mariani [3] employed a deformable contour method to detect glasses under a Bayesian framework. Saito et al. [4] removed glasses using principal component analysis (PCA). Further, Wu et al. [5] captured the global correspondence between glasses and non-glasses patterns by modeling their joint distribution in an eigenspace with PCA. However, most of these methods require strongly supervised signals, such as landmarks of the glasses and image pairs of the same person with and without glasses, and such training data are hard to obtain. Because a model can only be trained on a small amount of such data, it cannot provide a robust space for face recovery. In this paper, we apply sparse coding, trained on a large number of non-glasses face images, to reconstruct the face. To describe the shape of the glasses, we apply a novel method based on optical flow. Finally, we combine the two processes iteratively to remove the glasses. Because our method needs only a weakly supervised signal, it is more convenient to apply and implement.

2. PROPOSED METHOD
Our framework consists of two main modules: glasses localization and face reconstruction. Specifically, we apply sparse coding to reconstruct the face and obtain an approximate glasses region by comparing the reconstruction with the input. Accurate localization is then obtained by a novel method based on optical flow. The two steps are repeated iteratively to remove the glasses. Our method needs only weakly supervised signals to train the models, and the experimental results show that the system can remove various glasses to obtain natural-looking glassless facial images.

2.1 Preliminary Knowledge
We first introduce the basic methods used in our framework.

2.1.1 Sparse Coding
Sparse coding factorizes an image into an overcomplete set of basis vectors and a sparse coefficient vector. Overcompleteness allows the model to capture structures and patterns inherent in the input data [6]. The sparsity penalty restricts the model to representations built from these structures and patterns, and in particular prevents the degeneracy introduced by overcompleteness.

Sixth International Conference on Graphic and Image Processing (ICGIP 2014), edited by Yulin Wang, Xudong Jiang, David Zhang, Proc. of SPIE Vol. 9443, 94431J © 2015 SPIE · CCC code: 0277-786X/15/$18 · doi: 10.1117/12.2178815 Proc. of SPIE Vol. 9443 94431J-1 Downloaded From: http://proceedings.spiedigitallibrary.org/ on 03/18/2015 Terms of Use: http://spiedl.org/terms

Sparse coding can be formulated as a learning problem. Given a set of input vectors $x^{(i)}$, we learn bases $D^{(j)}$ (the columns of a dictionary $D$) and codes $c^{(i)}$ by minimizing the cost function

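$$\min_{D,\, c^{(i)}} \sum_i \left\| D c^{(i)} - x^{(i)} \right\|_2^2 + \lambda \sum_i S\!\left(c^{(i)}\right) \quad \text{subject to} \quad \left\| D^{(j)} \right\|_2^2 \le 1,\ \forall j \qquad (1)$$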

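where $S(\cdot)$ is a sparsity cost function that penalizes coefficients for being far from zero. Strictly speaking, “sparse” refers to penalizing the L0 norm (the number of nonzero coefficients), but that is very hard to optimize in practice, so we relax it to the L1 norm, which gives a convex penalty that is much easier to optimize. For our problem, the noise (the glasses) acts as a pixel-wise mask in image space: it corrupts a subset of pixels heavily and leaves the rest untouched. An L1 reconstruction penalty describes such noise better than the squared L2 penalty, because it does not let a few large errors dominate the fit. Replacing the reconstruction term accordingly, we obtain the L1-reconstruction sparse coding problem: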

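$$\min_{D,\, c^{(i)}} \sum_i \left\| D c^{(i)} - x^{(i)} \right\|_1 + \lambda \sum_i \left\| c^{(i)} \right\|_1 \quad \text{subject to} \quad \left\| D^{(j)} \right\|_2^2 \le 1,\ \forall j \qquad (2)$$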
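The encoding half of objective (2), with the dictionary D held fixed, can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes NumPy and `scipy.optimize.minimize` with `method="L-BFGS-B"`, and the name `encode_l1`, the weight `lam` and the smoothing constant `eps` are hypothetical choices. Because |·| is not differentiable at zero, the reconstruction term is smoothed as sqrt(r² + eps), while the L1 penalty on the code is handled exactly by splitting c into two non-negative parts, which L-BFGS-B supports through its bound constraints.

```python
import numpy as np
from scipy.optimize import minimize


def encode_l1(D, x, lam=0.01, eps=1e-4):
    """Sparse code c for x under a fixed dictionary D (columns = bases).

    Approximately minimizes  sum_k |(D c - x)_k| + lam * ||c||_1,
    the per-sample term of Eq. (2). The absolute value in the reconstruction
    term is smoothed as sqrt(r^2 + eps); the L1 penalty on c is exact via
    the split c = c_pos - c_neg with c_pos, c_neg >= 0.
    """
    n_atoms = D.shape[1]

    def objective(z):
        c = z[:n_atoms] - z[n_atoms:]
        r = D @ c - x                          # reconstruction residual
        smooth_abs = np.sqrt(r ** 2 + eps)     # smooth stand-in for |r|
        f = smooth_abs.sum() + lam * z.sum()   # z.sum() equals ||c||_1 at the optimum
        g_c = D.T @ (r / smooth_abs)           # gradient of the data term w.r.t. c
        grad = np.concatenate([g_c + lam, -g_c + lam])
        return f, grad

    z0 = np.zeros(2 * n_atoms)
    bounds = [(0.0, None)] * (2 * n_atoms)     # c_pos >= 0, c_neg >= 0
    res = minimize(objective, z0, jac=True, method="L-BFGS-B", bounds=bounds)
    return res.x[:n_atoms] - res.x[n_atoms:]


# Toy check: a signal built from 3 bases, with 5 "glasses" pixels corrupted.
rng = np.random.default_rng(0)
D = rng.standard_normal((80, 20))
D /= np.linalg.norm(D, axis=0)               # enforce ||D^(j)||_2 = 1
c_true = np.zeros(20)
c_true[[2, 7, 11]] = [1.5, -2.0, 1.0]
x_clean = D @ c_true
x = x_clean.copy()
x[:5] += 5.0                                 # large, sparse corruption
R = D @ encode_l1(D, x)                      # reconstruction Dc
```

In this toy setting the corrupted pixels are expected to keep a large residual instead of being reproduced in R, which is the behaviour the L1 reconstruction term is meant to give; the dictionary update that alternates with encoding is not shown here.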

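L-BFGS-B, as provided in the SciPy library [7], is applied to solve this problem.

2.1.2 Optical Flow
Optical flow methods calculate the motion between two image frames, taken at times $t$ and $t + \Delta t$, at every voxel position. Such methods are called differential because they are based on local Taylor series approximations of the image signal; that is, they use partial derivatives with respect to the spatial and temporal coordinates. To find the optical flow between two image frames (the glasses template and the image D), we suppose that a voxel at location $(x, y, t)$ with intensity $I(x, y, t)$ moves by $\Delta x$, $\Delta y$ and $\Delta t$ between the two frames, and assume that the gray value of a pixel is not changed by the displacement:

$$I(x, y, t) = I(x + \Delta x,\; y + \Delta y,\; t + \Delta t) \qquad (3)$$

Assuming the movement to be small, the right-hand side of (3) can be expanded in a Taylor series around $(x, y, t)$; dropping the higher-order terms and dividing by $\Delta t$ gives the following optical flow constraint:

$$\frac{\partial I}{\partial x} u + \frac{\partial I}{\partial y} v + \frac{\partial I}{\partial t} = 0 \qquad (4)$$

where $(u, v) = (\Delta x / \Delta t,\; \Delta y / \Delta t)$ is the flow at $(x, y)$; for two consecutive frames $\Delta t = 1$, so $(u, v)$ is simply the displacement.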
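As a quick numerical check of the linearization from (3) to (4), the sketch below shifts a smooth synthetic image by a small sub-pixel amount and compares the residual of the linearized constraint with the raw temporal difference. It assumes NumPy only; the Gaussian test image, its size and the shift are arbitrary illustrative choices, and spatial derivatives are approximated with `np.gradient` (central differences).

```python
import numpy as np


def gaussian_image(shape, cx, cy, sigma):
    """Smooth synthetic frame: a Gaussian blob centred at column cx, row cy."""
    rows, cols = np.mgrid[0:shape[0], 0:shape[1]]
    return np.exp(-((cols - cx) ** 2 + (rows - cy) ** 2) / (2 * sigma ** 2))


u, v = 0.1, 0.05                                          # small true displacement (pixels)
I1 = gaussian_image((64, 64), 32.0, 32.0, 5.0)
I2 = gaussian_image((64, 64), 32.0 + u, 32.0 + v, 5.0)    # content moved by (u, v)

I_y, I_x = np.gradient(I1)      # np.gradient returns d/d(row), d/d(col)
I_t = I2 - I1                   # temporal derivative with Delta t = 1

residual = I_x * u + I_y * v + I_t   # left-hand side of Eq. (4)
print(np.abs(I_t).max(), np.abs(residual).max())
```

The residual stays far below the temporal difference itself, because the neglected terms are of second order in the displacement; for large motions the same check would fail, which is why warping-based methods such as [8] linearize only around the current flow estimate.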

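With the further model assumptions described in [8], namely the gradient constancy assumption and the smoothness assumption, it is straightforward to derive an energy functional that penalizes deviations from them. Let $\mathbf{x} := (x, y, t)^\top$ and $\mathbf{w} := (u, v, 1)^\top$. The global deviations from the gray value and gradient constancy assumptions are then measured by the energy

$$E_{\mathrm{Data}}(u, v) = \int_\Omega \Psi\!\left( \left| I(\mathbf{x} + \mathbf{w}) - I(\mathbf{x}) \right|^2 + \gamma \left| \nabla I(\mathbf{x} + \mathbf{w}) - \nabla I(\mathbf{x}) \right|^2 \right) d\mathbf{x} \qquad (5)$$

with $\gamma$ being a weight between both assumptions. The increasing concave function $\Psi(s^2) = \sqrt{s^2 + \epsilon^2}$, with a small constant $\epsilon$, is applied to make the energy robust to outliers. A smoothness term describes the model assumption of a piecewise smooth flow field [8]:

$$E_{\mathrm{Smooth}}(u, v) = \int_\Omega \Psi\!\left( |\nabla_3 u|^2 + |\nabla_3 v|^2 \right) d\mathbf{x} \qquad (6)$$

where $\nabla_3 := (\partial_x, \partial_y, \partial_t)^\top$ is the spatio-temporal gradient. Finally, the total energy is the weighted sum of the data term and the smoothness term, with a regularization parameter $\alpha > 0$:

$$E(u, v) = E_{\mathrm{Data}}(u, v) + \alpha\, E_{\mathrm{Smooth}}(u, v) \qquad (7)$$

A minimizer of $E(u, v)$ must fulfill the Euler–Lagrange equations

$$\Psi'\!\left(I_z^2 + \gamma (I_{xz}^2 + I_{yz}^2)\right)\left(I_x I_z + \gamma (I_{xx} I_{xz} + I_{xy} I_{yz})\right) - \alpha\, \mathrm{div}\!\left(\Psi'\!\left(|\nabla_3 u|^2 + |\nabla_3 v|^2\right) \nabla_3 u\right) = 0 \qquad (8)$$

$$\Psi'\!\left(I_z^2 + \gamma (I_{xz}^2 + I_{yz}^2)\right)\left(I_y I_z + \gamma (I_{yy} I_{yz} + I_{xy} I_{xz})\right) - \alpha\, \mathrm{div}\!\left(\Psi'\!\left(|\nabla_3 u|^2 + |\nabla_3 v|^2\right) \nabla_3 v\right) = 0 \qquad (9)$$

with reflecting boundary conditions, where $I_x := \partial_x I(\mathbf{x}+\mathbf{w})$, $I_y := \partial_y I(\mathbf{x}+\mathbf{w})$, $I_z := I(\mathbf{x}+\mathbf{w}) - I(\mathbf{x})$, $I_{xx}$, $I_{xy}$ and $I_{yy}$ are the second derivatives of $I(\mathbf{x}+\mathbf{w})$, $I_{xz} := \partial_x I(\mathbf{x}+\mathbf{w}) - \partial_x I(\mathbf{x})$ and $I_{yz} := \partial_y I(\mathbf{x}+\mathbf{w}) - \partial_y I(\mathbf{x})$. Solving these equations yields $\mathbf{w}$, which is exactly the optical flow between the two image frames.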
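To make the energy (5)–(7) concrete, the sketch below evaluates a discrete version of it for a given flow field; it does not minimize the energy, which is the job of the Euler–Lagrange equations and of the solver in [9]. Assumptions: NumPy and SciPy, bilinear warping with `scipy.ndimage.map_coordinates`, derivatives from `np.gradient`, and, because only two frames are involved, the spatial gradient in place of the spatio-temporal one. The helper names and the values of `gamma`, `alpha` and `eps` are illustrative, not the paper's settings.

```python
import numpy as np
from scipy.ndimage import map_coordinates


def psi(s2, eps=1e-3):
    """Robust penalizer Psi(s^2) = sqrt(s^2 + eps^2)."""
    return np.sqrt(s2 + eps ** 2)


def warp(img, u, v):
    """Sample img at (x + u, y + v) with bilinear interpolation, i.e. I(x + w)."""
    rows, cols = np.mgrid[0:img.shape[0], 0:img.shape[1]].astype(float)
    return map_coordinates(img, [rows + v, cols + u], order=1, mode="nearest")


def brox_energy(I1, I2, u, v, gamma=1.0, alpha=1.0, eps=1e-3):
    """Discrete E(u, v) = E_Data + alpha * E_Smooth, cf. Eqs. (5)-(7)."""
    I1y, I1x = np.gradient(I1)
    I2y, I2x = np.gradient(I2)
    Iz = warp(I2, u, v) - I1             # gray value constancy residual
    Ixz = warp(I2x, u, v) - I1x          # gradient constancy residual (x)
    Iyz = warp(I2y, u, v) - I1y          # gradient constancy residual (y)
    e_data = psi(Iz ** 2 + gamma * (Ixz ** 2 + Iyz ** 2), eps).sum()

    uy, ux = np.gradient(u)
    vy, vx = np.gradient(v)
    e_smooth = psi(ux ** 2 + uy ** 2 + vx ** 2 + vy ** 2, eps).sum()
    return e_data + alpha * e_smooth


rows, cols = np.mgrid[0:48, 0:48]
I1 = np.exp(-((cols - 24.0) ** 2 + (rows - 24.0) ** 2) / 50.0)
I2 = np.exp(-((cols - 25.0) ** 2 + (rows - 24.0) ** 2) / 50.0)   # blob moved 1 px right
zero = np.zeros_like(I1)
E_true = brox_energy(I1, I2, zero + 1.0, zero)   # correct flow: u = 1, v = 0
E_zero = brox_energy(I1, I2, zero, zero)         # no motion assumed
```

The correct constant flow makes both constancy residuals vanish (up to boundary effects) and has zero flow gradient, so its energy is essentially the floor set by `eps`, whereas assuming no motion leaves a large data term.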


2.2 Preprocessing and Pose Normalization
Suppose we are given a photo of a person’s face with glasses captured “in the wild”. Our preprocessing pipeline aligns the face and warps it to a frontal pose. Once the eyes have been aligned, the approximate region of the glasses can be determined. We define a rectangular area containing the glasses in the input face; to cover various glasses, this area is chosen somewhat larger than a typical pair of glasses. Figure 1 shows this area.
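The paper does not state how large the rectangle is, so the sketch below is purely hypothetical: it derives a box from two aligned eye centres given as (column, row) coordinates, with margins proportional to the inter-ocular distance. The function name `glasses_box` and the margin factors `mx`, `my_up` and `my_down` are illustrative assumptions, not values from this work.

```python
import numpy as np


def glasses_box(left_eye, right_eye, shape, mx=0.6, my_up=0.35, my_down=0.45):
    """Hypothetical glasses rectangle (top, bottom, left, right), clipped to the image.

    left_eye, right_eye: (col, row) eye centres after alignment.
    shape: (height, width) of the aligned face image.
    Margins are fractions of the inter-ocular distance d (illustrative values).
    """
    (lx, ly), (rx, ry) = left_eye, right_eye
    d = np.hypot(rx - lx, ry - ly)                 # inter-ocular distance
    cy = (ly + ry) / 2.0                           # vertical centre of the eyes
    top = int(np.floor(cy - my_up * d))
    bottom = int(np.ceil(cy + my_down * d))
    left = int(np.floor(min(lx, rx) - mx * d))
    right = int(np.ceil(max(lx, rx) + mx * d))
    h, w = shape
    return max(top, 0), min(bottom, h), max(left, 0), min(right, w)


box = glasses_box((30.0, 41.0), (60.0, 41.0), (105, 90))
```

Generous margins trade a little extra background for robustness to large or oddly shaped frames, which matches the intent of making the area "somewhat larger" than the glasses.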

Figure 1. Area of the glasses region

2.3 Non-glasses Face Reconstruction
Since most of the available data are unlabeled, unsupervised learning is preferred. Candidate algorithms include RPCA, PCA, sparse coding [6], nearest neighbours and so on. We choose sparse coding because it recovers face information well (by adjusting the sparsity penalty) and because, unlike PCA methods, the sparsity penalty on the representation vector keeps the model from being too expressive. It can learn more bases than the input dimension and, with a suitably adjusted L1 penalty, still recovers the glasses-covered area well. We train the sparse coding model on a set of images without glasses. When using sparse coding, we make the reconstruction term L1-penalized, since glasses pixels are totally different from facial ones. Using sparse coding is then straightforward: we encode the image with the L1-reconstruction loss and take the reconstruction term as the output. We use sparse penalty
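The paper states that the model is trained on non-glasses faces but does not give the training schedule. The sketch below assumes an alternating scheme and shows only the dictionary-update half: with the codes held fixed, one projected-gradient step on the smoothed L1 reconstruction loss, followed by rescaling every basis so that its L2 norm is at most 1 (the Euclidean projection onto the constraint of (1) and (2)). It deliberately uses plain projected gradient descent rather than L-BFGS-B; the function names, the step size `lr` and the smoothing `eps` are hypothetical.

```python
import numpy as np


def l1_recon_loss(D, X, C, eps=1e-2):
    """Smoothed sum_i ||D c_i - x_i||_1 over all samples (columns of X and C)."""
    return np.sqrt((D @ C - X) ** 2 + eps).sum()


def dictionary_step(D, X, C, lr=1e-4, eps=1e-2):
    """One projected-gradient update of the bases D with the codes C held fixed.

    X: (n_pixels, n_samples) non-glasses faces; C: (n_atoms, n_samples) codes.
    After the gradient step, each column is rescaled so that ||D^(j)||_2 <= 1.
    """
    R = D @ C - X                                   # residuals for all samples
    grad = (R / np.sqrt(R ** 2 + eps)) @ C.T        # gradient of the smoothed loss
    D_new = D - lr * grad
    norms = np.linalg.norm(D_new, axis=0)
    return D_new / np.maximum(norms, 1.0)           # project columns onto the unit ball


rng = np.random.default_rng(1)
X = rng.standard_normal((30, 40))     # toy "images" stored as columns
C = rng.standard_normal((10, 40))     # toy codes
D = rng.standard_normal((30, 10))
D /= np.linalg.norm(D, axis=0)
D_next = dictionary_step(D, X, C)
```

With a step size below the inverse Lipschitz constant of the smoothed loss, a projected-gradient step cannot increase the loss, which makes this a safe (if slow) building block for alternating training.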

2.4 Glasses Localization
The previous operations reconstruct a facial image R without glasses from the input image I, but the results are not perfect. Knowing the exact shape of the glasses helps reconstruction, because the glasses area can then be excluded instead of being rebuilt. We therefore apply the following strategy to obtain better results.

2.4.1 Glasses Shape Estimate
To estimate the region of the glasses, we subtract the image R from the image I to obtain a difference image D, which contains the general shape of the eyeglasses:

$$D = I - R \qquad (10)$$

Because D captures the poorly reconstructed parts of the image, and R lacks bases that could represent eyeglasses, D provides an estimate of the shape of the glasses.

2.4.2 Warped by Optical Flow
With an estimated shape and region of the glasses, the next step is to find the exact glasses shape. We propose a novel solution based on optical flow. We first make a black-and-white glasses template. The template is then warped by optical flow to match the image D. Since D contains only an incomplete outline of the glasses, the template helps predict the exact shape.
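Once a flow field has been estimated between the template and the difference image, the template has to be resampled by it. A minimal sketch of that backward warp, assuming the flow is defined so that the template sampled at x + w matches D at x, and assuming nearest-neighbour sampling to keep the template binary (the function name `warp_template` is hypothetical):

```python
import numpy as np
from scipy.ndimage import map_coordinates


def warp_template(template, u, v):
    """Backward-warp a binary glasses template by the flow (u, v).

    Output pixel (y, x) takes the template value at (y + v, x + u);
    nearest-neighbour sampling (order=0) keeps the result black and white,
    and samples falling outside the template become 0.
    """
    rows, cols = np.mgrid[0:template.shape[0], 0:template.shape[1]]
    coords = [rows + v, cols + u]
    return map_coordinates(template, coords, order=0, mode="constant", cval=0.0)


T = np.zeros((20, 30))
T[8, 10:15] = 1.0                  # a short horizontal "frame" segment
u = np.full(T.shape, 2.0)          # flow of +2 px along x everywhere
v = np.zeros(T.shape)
W = warp_template(T, u, v)         # segment now occupies columns 8..12
```

Bilinear sampling (`order=1`) followed by thresholding would be an equally reasonable choice if a smoother outline is preferred.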

2.5 Iterative Glasses Removal
The methods above estimate whether each pixel belongs to the glasses, which helps us reconstruct the non-glasses face. When sparse coding is applied for face reconstruction, we exclude the entries of the bases that fall in the previously estimated glasses area and use the reduced bases to reconstruct the face again. The process iterates until the estimated glasses shape converges. Figure 2 shows the warp process.
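The control flow of this loop can be sketched compactly. To keep the example short and self-contained, two stand-ins are used and should not be read as the paper's components: ordinary least squares on the unmasked pixels replaces L1 sparse coding, and thresholding |I − R| replaces the template warping of Section 2.4.2. The function name `iterative_removal` and the threshold value are hypothetical.

```python
import numpy as np


def iterative_removal(B, x, thresh=2.0, max_iter=10):
    """Alternate masked reconstruction and mask re-estimation until the mask converges.

    B: (n_pixels, n_atoms) bases; x: (n_pixels,) input face.
    Stand-ins: least squares replaces sparse coding, and thresholding
    |x - R| replaces the optical-flow template warping.
    """
    mask = np.zeros(x.shape, dtype=bool)         # True = estimated glasses pixel
    for _ in range(max_iter):
        keep = ~mask
        # Fit only on pixels outside the current glasses estimate,
        # i.e. drop the corresponding rows (entries) of the bases.
        c, *_ = np.linalg.lstsq(B[keep], x[keep], rcond=None)
        R = B @ c                                 # reconstruction on all pixels
        new_mask = np.abs(x - R) > thresh         # crude glasses-shape estimate
        if np.array_equal(new_mask, mask):        # shape has converged
            break
        mask = new_mask
    return R, mask


rng = np.random.default_rng(2)
B = rng.standard_normal((100, 10))
x_clean = B @ rng.standard_normal(10)
true_mask = np.zeros(100, dtype=bool)
true_mask[:10] = True                             # toy "glasses" pixels
x = x_clean + 5.0 * true_mask
R, mask = iterative_removal(B, x)
```

The first pass is biased by the glasses pixels, but once they are excluded the fit on the remaining pixels is clean, so the mask stops changing and the loop exits.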

(a) glasses template (b) image D (c) warp result
Figure 2. Glasses template and warp process


2.6 Methods for Comparison
In the framework above, optical flow and sparse coding tell us which pixels belong to the glasses and which to the face. For comparison, we also use each of them alone. We design two baseline frameworks: one uses optical flow to locate the glasses and median filtering to remove the glasses pixels directly; the other uses sparse coding to reconstruct the face pixels and marks glasses pixels by their reconstruction error. We compare these methods in the experiments.
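The median-filtering step of the first baseline can be sketched as follows, assuming a boolean glasses mask is already available (for example from the warped template). The paper does not give the filter size, so `size=5` and the function name `median_fill` are assumptions.

```python
import numpy as np
from scipy.ndimage import median_filter


def median_fill(img, mask, size=5):
    """Replace masked (glasses) pixels by the median of their size x size neighbourhood."""
    filtered = median_filter(img, size=size)
    out = img.copy()
    out[mask] = filtered[mask]       # only glasses pixels are modified
    return out


face = np.full((20, 20), 0.8)                   # flat "skin" patch
mask = np.zeros_like(face, dtype=bool)
mask[10, 3:17] = True                           # one-pixel-wide dark frame line
img = face.copy()
img[mask] = 0.1
restored = median_fill(img, mask)
```

This works for thin frames because the glasses pixels are a minority in each window; thick frames or dark lenses exceed that proportion, which is one reason a learned face model is needed.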

3. EXPERIMENTS AND RESULTS
We first downloaded about 60,000 non-glasses face images and used the Face++ API to align the eyes. The aligned images were converted to grayscale and automatically normalized to 90×105 pixels, as shown in Figure 3. For flow estimation we chose Ce Liu’s [9] implementation of Brox et al. [8] and Baker et al. [10], with the following parameters: α=0.05, ratio=0.95, minWidth=30, nOuterFPIterations=20, nInnerFPIterations=10, nCGIterations=20.
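A possible version of the grayscale conversion and size normalization is sketched below under several assumptions not stated in the paper: ITU-R BT.601 luma weights for the conversion, "90×105" read as 90 pixels wide by 105 pixels high, bilinear resizing with `scipy.ndimage.zoom`, and an additional min-max intensity normalization to [0, 1]. The function name `preprocess` is hypothetical.

```python
import numpy as np
from scipy.ndimage import zoom


def preprocess(rgb, out_h=105, out_w=90):
    """Gray-scale, resize and min-max normalize an aligned face crop; return a vector."""
    gray = rgb[..., :3] @ np.array([0.299, 0.587, 0.114])   # BT.601 luma weights
    h, w = gray.shape
    small = zoom(gray, (out_h / h, out_w / w), order=1)      # bilinear resize
    lo, hi = small.min(), small.max()
    small = (small - lo) / (hi - lo) if hi > lo else np.zeros_like(small)
    return small.ravel()                                      # vector x^(i) for sparse coding


rng = np.random.default_rng(3)
x = preprocess(rng.random((210, 180, 3)))   # stand-in for an aligned RGB crop
```

Flattening each normalized image into a vector gives the x^(i) that enter the sparse coding objective.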

Figure 3. Sample no-glasses training images

The test data were collected in the wild from the Internet; Figure 4 shows the aligned results.

Figure 4. The aligned test data used for research

We trained sparse coding on the non-glasses training images and obtained 1000 bases for reconstruction. Figure 5 shows the resulting bases.

Figure 5. Training data (Top) and Part of basis (Bottom) obtained from sparse coding (total 1000)

Several experiments demonstrate the effectiveness of our glasses removal method. We used glasses of different shapes and positions. We first tested the sparse coding method and the optical flow method alone (results in Figures 6 and 7), and then combined them iteratively (results in Figure 8).

Figure 6. Removal result using sparse coding only

Figure 7. Removal result using optical flow and median filter only



Figure 8. Recovery result of combined method

We compare our results with Wu’s method [5]. Figure 9 shows samples that Wu et al. [5] report their method cannot handle well.

(a) our results

(b) Wu’s method results

Figure 9. Comparison with Wu’s method results

4. CONCLUSION AND DISCUSSION
In this paper, we propose a weakly supervised method to remove glasses. The experiments show that the iterative method performs much better than either method alone, and that its results are comparable to those of Wu’s strongly supervised method [5]. In future work, we plan to address the problem of highlights on the glasses. Because the glasses template can be replaced, the framework extends readily to other kinds of glasses.

ACKNOWLEDGEMENTS The work is supported by NSFC Project (No. 61003099), and MOE-CMCC Project (MCM20123011).

REFERENCES
[1] X. Jiang, M. Binkert, B. Achermann, and H. Bunke. Towards Detection of Glasses in Facial Images. Proc. of ICPR, pages 1071–1073, 1998.
[2] C. Y. Wu, C. Liu, and J. Zhou. Eyeglass Existence Verification by Support Vector Machine. Proc. of 2nd Pacific-Rim Conference on Multimedia, 2001.
[3] Z. Jing and R. Mariani. Glasses Detection and Extraction by Deformable Contour. Proc. of ICPR, 2000.
[4] Y. Saito, Y. Kenmochi, and K. Kotani. Estimation of Eyeglassless Facial Images Using Principal Component Analysis. Proc. of ICIP, pages 197–201, 1999.
[5] C. Y. Wu, C. Liu, and J. Zhou. Automatic Eyeglasses Removal from Face Images. IEEE Trans. on PAMI, 2004.
[6] B. A. Olshausen and D. J. Field. Sparse Coding with an Overcomplete Basis Set: A Strategy Employed by V1? Vision Research, 37:3311–3325, 1997.
[7] E. Jones, T. Oliphant, and P. Peterson. SciPy: Open Source Scientific Tools for Python, 2001.
[8] T. Brox, A. Bruhn, N. Papenberg, and J. Weickert. High Accuracy Optical Flow Estimation Based on a Theory for Warping. Proc. of ECCV, pages 25–36, 2004.
[9] C. Liu. Beyond Pixels: Exploring New Representations and Applications for Motion Analysis. Doctoral Thesis, Massachusetts Institute of Technology, May 2009.
[10] S. Baker, D. Scharstein, J. Lewis, S. Roth, M. J. Black, and R. Szeliski. A Database and Evaluation Methodology for Optical Flow. Proc. of ICCV, 2007.