Semi-Automatic User-Specific 3D Face Geometry Modeling From Multiple Views
Bowen Aguayo, Lincoln Emilio
March 27, 2015

TECNUN, Universidad de Navarra
Paseo de Manuel Lardizabal 13, 20018 Donostia - San Sebastián
+34 943 219877
http://www.tecnun.es

Title: Semi-Automatic User-Specific 3D Face Geometry Modeling From Multiple Views
Student: Lincoln Emilio Bowen
Supervisor at TECNUN: Diego Borro
Supervisor at Vicomtech-IK4: Luis Unzueta

Abstract: In this project we present a semi-automatic method to obtain the 3D face geometry model of a specific person. We use one or more images of the person and semi-automatically adjust a deformable generic 3D face model to them. The adjustment is based on matching fiducial landmarks found on the face images to their corresponding control points in the graphical model. The fiducial landmarks are found on the image by a 2D face model alignment approach, which can be further refined manually if required. Then the graphical model is adjusted by an iterative approach in which local deformation and global fitting procedures are combined. Experimental results show that this method is robust and quick, obtaining user-specific 3D faces of good quality compared to other alternatives such as face scanning, and which can be used for tasks such as face tracking or the personalization of animated 3D avatars.

Contents

1 Acknowledgements
2 Introduction
3 Project scope
  3.1 Project description and goals
    3.1.1 Objective
  3.2 Planning of the project
    3.2.1 Tasks
  3.3 Tools

Part I: Analysis of the problem

4 State of the art
  4.1 Introduction to face alignment
    4.1.1 Generative methods
    4.1.2 Discriminative methods
  4.2 3D face geometry modeling
5 Background
  5.1 Non-linear feature extraction function
    5.1.1 Distinctive image features from scale-invariant key-points (SIFT)
    5.1.2 Histogram of oriented gradients (HOG)
  5.2 Multivariate linear regression (MLR)
  5.3 Interpolation using radial basis function (RBF)

Part II: Technical solution

6 Proposed solution
  6.1 Supervised Descent Method (SDM)
    6.1.1 Deformable object model training method
      6.1.1.1 Calculate initial position and re-scale reference shape
      6.1.1.2 Build Gaussian pyramid
      6.1.1.3 Generate perturbation shapes
      6.1.1.4 Generate ΔX
      6.1.1.5 Get descriptor
      6.1.1.6 Obtain the descent direction
      6.1.1.7 Estimate new position
      6.1.1.8 Fitting results
      6.1.1.9 Error results
      6.1.1.10 Summary
    6.1.2 Deformable object model fitting method
  6.2 3D reconstruction
    6.2.1 Calibration and deformation based upon characteristic points
      6.2.1.1 Generic 3D face model
      6.2.1.2 Load images
      6.2.1.3 Face detection
      6.2.1.4 Load 2D face fitting model
      6.2.1.5 Calculate the 3D position from 2D coordinates
      6.2.1.6 RBF interpolation
      6.2.1.7 Recalculation of the normals
      6.2.1.8 Camera calibration
      6.2.1.9 Texture mapping
7 Experiments
  7.1 Equipment
  7.2 Experiment 1
  7.3 Experiment 2
  7.4 Experiment 3

Part III: Discussion

8 Conclusions

A Appendix A
B Appendix B
C Appendix C

List of Figures

3.1 In this case we have three images, each one with its respective landmarks aligned, which will be used to reconstruct the 3D model.
3.2 SDM learns a sequence of generic descent maps {R_k} from the optimal optimization trajectories (indicated by the dotted lines). Each parameter update Δx is the product of R_k and a sample-specific component (y − h(x_k^i)).
5.1 For each octave of scale space, the initial image is repeatedly convolved with Gaussians to produce the set of scale-space images; in that way, potential interest points can be detected depending on their environment, which helps to recognize the most reliable gradients. (a) After each octave, the Gaussian image is down-sampled by a factor of 2. (b) Maxima and minima of the difference-of-Gaussian images are detected by comparing each pixel with its neighbors. (c) An example of the Gaussian pyramid's application.
5.2 For each image sample, the gradient magnitude, m(x, y), and orientation, θ(x, y), are precomputed using pixel differences. The image shown at the right illustrates an orientation histogram formed from the gradient orientations; it has 36 bins covering the 360-degree range of orientations. Peaks in the orientation histogram correspond to dominant directions of local gradients.
5.3 An overview of the HOG feature extraction chain.
6.1 Three images obtained by applying a Gaussian filter and reducing the scale.
6.2 In our case we decided to generate 100 perturbed shapes per image, because our database does not contain many images.
6.3 In our case we decided to generate 100 perturbed shapes per image, due to the small number of images.
6.4 There are 57 landmarks around the face; the location of each one is detailed in a .cp file.
6.5 We decided to use three photos, so that we have the right, frontal and left profiles.
6.6 In our software the images are on the right and the 3D face model is on the left. We draw a rectangle over the images, indicating the position of the face.
6.7 Points that are not displaced correctly can be moved manually by the user.
6.8 A mesh of 5897 vertices, projected on every profile image.
7.1 The line shows the error of fitting 600 images, trained with different numbers of perturbations.
7.2 The errors are: front profile 0.009184652, left profile 0.008857246, right profile 0.010956161.
7.3 The errors in these images are: front profile 0.0162299, left profile 0.018522, right profile 0.00886224.
7.4 The errors in these images are: front profile 0.00311373, left profile 0.0118532, right profile 0.007145.
7.5 The errors in these images are: front profile 0.00918336, left profile 0.00565814, right profile 0.0071834.
7.6 The errors in these images are: front profile 0.00817062, left profile 0.0152546, right profile 0.0211879.
7.7 The errors in these images are: front profile 0.0102739, left profile 0.00554801, right profile 0.0101993.
7.8 The errors in these images are: front profile 0.00229697, left profile 0.00734694, right profile 0.00734694.
7.9 The errors in these images are: front profile 0.00787618, left profile 0.00524045, right profile 0.0109503.
7.10 The errors in these images are: front profile 0.00617725, left profile 0.00425791, right profile 0.0149479.
7.11 The errors in these images are: front profile 0.0104655, left profile 0.0133484, right profile 0.0158946.
7.12 The errors in these images are: front profile 0.0109814, left profile 0.0154281, right profile 0.010894.
7.13 The errors are: front profile 0.015766969, left profile 0.010681506, right profile 0.005555397.
A.1 The errors in these images are: front profile 0.0103844, left profile 0.00437356, right profile 0.00449371.
A.2 The errors in these images are: front profile 0.00634848, left profile 0.00482585, right profile 0.00542265.
A.3 The errors in these images are: front profile 0.00846724, left profile 0.0075797, right profile 0.00729054.
A.4 The errors in these images are: front profile 0.00765903, left profile 0.00491683, right profile 0.00411343.
A.5 The errors in these images are: front profile 0.00663731, left profile 0.00526195, right profile 0.0063399.
A.6 The errors in these images are: front profile 0.00744132, left profile 0.00536637, right profile 0.00556665.
A.7 The errors in these images are: front profile 0.00788626, left profile 0.00593755, right profile 0.005800229.
A.8 The errors in these images are: front profile 0.00914015, left profile 0.00373299, right profile 0.00691171.
A.9 The errors in these images are: front profile 0.0830682, left profile 0.00530921, right profile 0.00471849.
A.10 The errors in these images are: front profile 0.0106373, left profile 0.0054763, right profile 0.00549346.
B.1 Example results on subject 1
B.2 Example results on subject 2
B.3 Example results on subject 3
B.4 Example results on subject 4
B.5 Example results on subject 5
C.1 Example 3D model with its corresponding mesh, subject 1
C.2 Example 3D model with its corresponding mesh, subject 2
C.3 Example 3D model with its corresponding mesh, subject 3
C.4 Example 3D model with its corresponding mesh, subject 4
C.5 Example 3D model with its corresponding mesh, subject 5
C.6 Example 3D model with its corresponding mesh, subject 6
C.7 Example 3D model with its corresponding mesh, subject 7
C.8 Example 3D model with its corresponding mesh, subject 8
C.9 Example 3D model with its corresponding mesh, subject 9
C.10 Example 3D model with its corresponding mesh, subject 10
C.11 Example 3D model with its corresponding mesh, subject 8 with a different expression

1 | Acknowledgements

I would like to use this section to thank, first of all, my project supervisor, Luis Unzueta, for entrusting me with this hard but interesting work, and also Jon Goenetxea for his technical support. I thank them both for the time they dedicated to every problem I came across during this project. I would also like to thank Juan Diego for the support he gave me. Special thanks to my project supervisor at TECNUN, Diego Borro. Special thanks as well to André Gagalowicz who, although I never met him in person, helped me a great deal with both information and material to work with. I thank Andrés Navarro for offering me his help, and I hope this work can be useful to him in the future. I also owe recognition to all the people who helped me by spending five minutes of their time doing the tests, among them Javier Barandiaran, Leire Varona, Andoni Cortés and Ander Arbelaiz. I must admit that this work has had its difficulties, and I cannot finish without at least mentioning the names of the people who at some point offered me their help: Andoni Mujika, Peter Leskovsky, Nerea Aranjuelo, Hugo Álvarez and Orti Senderos. Finally, I thank the company Vicomtech-IK4 for allowing me to develop this application in its facilities.


2 | Introduction

With the increase in personal and web photos nowadays, a fully automatic, highly efficient and robust face alignment method is in demand. As Darren Cosker says in his article [5]: "The visualization, interpretation, recognition, and perception of faces are important elements of our culture. Facial expressions carry important messages involved in communication and therefore play an important role in media."

3D structure modeling, in which 3D information is acquired from 2D data, provides the means to analyze the reconstructed 3D object under any view angle, to realize its 3D geometrical structure, and to render a more realistic and livelier view than the one obtained in the 2D image plane [20]. The 3D world that we imagine can be interpreted as the binding of infinitely many 2D planes from different views and different angles. Since the face is our primary focus of attention in social interaction, it is important to develop robust face model fitting. Face model fitting can therefore be seen as a basic component in many Human-Computer Interaction applications, since it enables facial feature detection, head pose estimation, face tracking, face recognition, and facial expression recognition. These applications need a training stage with several images to build the model, and therefore depend on the selection of images for good fitting on unseen images.

Nowadays the ambition is to give machines the capacity to recognize, distinguish and transform an image from a matrix of pixels into meaningful information. Although this science is still immature, it has achieved many goals, which has allowed different methods to be proposed depending on the required computation speed, the accuracy and the environment in which they are tested.

In this project we present a semi-automatic method to obtain the 3D face geometry model of a specific person. The adjustment is based on matching feature points found on the face images to their corresponding control points in the graphical model. The feature points are found on the image by a 2D face model alignment approach, which can be further refined manually if required. We show that this method is robust and quick, obtaining user-specific 3D faces of good quality compared to other alternatives such as face scanning using marker-based solutions.

3 | Project scope

Our approach for fitting 3D generic face models consists of two steps: (1) the detection of facial features on the image, and (2) the adjustment of the deformable 3D face model such that the projection of its vertices onto the 2D plane of the image matches the locations of the detected facial features [19].

This project is based on the previous work of Xuehan Xiong and Fernando de la Torre [22], integrated with the project of Daria Kalinkina, André Gagalowicz and Richard Roussel [10].

Given a 3D model attached to a face shape S = (x_1, y_1, ..., x_{N_fp}, y_{N_fp})^T consisting of N_fp facial landmarks, our intention is to estimate a shape S that is as close as possible to the true shape Ŝ, so that we can transfer those deformations to the 3D model, i.e. minimizing

$$\| S - \hat{S} \|$$

This method is based on the deformation of a pre-defined generic polygonal face mesh to the specific face of a person, presented in several images taken from different views. This deformation is based upon matching points, and in order to obtain them we use a face alignment method.

The performance of the implementation should be similar to the software that can be downloaded from http://www.humansensing.cs.cmu.edu/intraface/. In the 3D reconstruction part we omit the silhouette contour adaptation and settle for characteristic point selection only.

3.1 Project description and goals

From a set of 2D images, we determine the feature points in each of the images, so that the generic model can adapt to the real face presented in the pictures by matching the projections of the generic characteristic points to those of the images (see Figure 3.1). It is important to remark that texture mapping was not our primary study objective; for that reason we have made a simple texturing, only pasting the three images onto the 3D model depending on the 2D coordinates obtained from the projection of the 3D coordinates of the vertices.

Figure 3.1: In this case we have three images, each one with its respective landmarks aligned, which will be used to reconstruct the 3D model.

The method used to align the points is the Supervised Descent Method (SDM) [22], which consists in teaching the computer a descent direction, in our case a sequence of matrices {R_k}. In order to be able to get 3D information from the images for the deformation of the generic model, we need to calibrate the camera(s) used to obtain the images; in other words, we need to define camera parameters for the model according to each image [10], since at the very beginning the generic mesh does not correspond at all to the real face. So we start the algorithm with the face alignment method.

The first step of the algorithm consists in selecting the vertices of the generic model which will further serve as "characteristic" or "control" points. The main idea is to choose only those vertices that can be easily matched both on the images and on the generic model; for example, they can be the corners of the eyes, the mouth, etc.


After all the positions of the characteristic points have been defined, we perform a 3D stereo reconstruction of the characteristic points based upon the camera calibrations. In other words, for each point we build a set of 3D rays, each starting from the optical center of a camera and going through the corresponding projection of this point in that image; we calculate the middle point of the segment defining the minimum distance between each pair of rays, and take the center of gravity of all these middle points as the new 3D position of the characteristic point.
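The following sketch is a minimal reconstruction of this pairwise ray-midpoint triangulation, under the assumption that each ray's origin (the camera optical center) and direction are already known from the calibration; it is an illustration, not the thesis code.

```python
import numpy as np

def ray_midpoint(o1, d1, o2, d2):
    """Midpoint of the shortest segment between rays o1 + t1*d1 and o2 + t2*d2."""
    o1, d1 = np.asarray(o1, float), np.asarray(d1, float)
    o2, d2 = np.asarray(o2, float), np.asarray(d2, float)
    w = o1 - o2
    a, b, c = d1 @ d1, d1 @ d2, d2 @ d2
    d, e = d1 @ w, d2 @ w
    denom = a * c - b * b
    if abs(denom) < 1e-12:                 # near-parallel rays: no unique answer
        t1, t2 = 0.0, e / c
    else:                                  # closed-form closest-approach parameters
        t1 = (b * e - c * d) / denom
        t2 = (a * e - b * d) / denom
    return 0.5 * ((o1 + t1 * d1) + (o2 + t2 * d2))

def triangulate_point(origins, directions):
    """Gravity center of the midpoints over every pair of rays, as described above."""
    mids = [ray_midpoint(origins[i], directions[i], origins[j], directions[j])
            for i in range(len(origins)) for j in range(i + 1, len(origins))]
    return np.mean(mids, axis=0)
```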

3.1.1 Objective

The two main tasks, as mentioned before, are:

• The feature point alignment (SDM method).
• The 3D reconstruction of the human face (morphological adaptation).

Each one has a general objective:

• General objective for SDM: given a dataset of faces annotated with p facial point landmarks, our objective is to find R_k (R_k ∈ R^(2p×mp), a sequence of descent direction matrices) and b_k (b_k ∈ R^(2p×1)) that minimize the expected loss between the predicted and the optimal landmark displacement. Consider Figure 3.2, where the goal is to minimize a Nonlinear Least Squares (NLS) function, f(x) = (h(x) − y)², where h(x) is a nonlinear function, x is the vector of parameters to optimize, and y is a known scalar.

Figure 3.2: SDM learns a sequence of generic descent maps {R_k} from the optimal optimization trajectories (indicated by the dotted lines). Each parameter update Δx is the product of R_k and a sample-specific component (y − h(x_k^i)).

$$\underset{R_k,\, b_k}{\arg\min} \; \sum_{d^i} \sum_{x_k^i} \left\| \Delta x_*^{ki} - R_k \phi_k^i - b_k \right\|^2 \qquad (3.1.1)$$

where k is the iteration number; i is the index of the training image; d^i ∈ R^(m×n) is an image of m × n pixels; Δx_*^{ki} = x_*^i − x_k^i ∈ R^(2p×1) is the difference between the optimal and the predicted landmark displacement; and φ_k^i ∈ R^(mp×1) is a vector containing the descriptive features obtained from the image descriptor (e.g. SIFT) at the landmark positions, where m refers to the number of parameters that define the descriptor (for example, SIFT contains 128).

– Secondary objective for SDM: to find R_k, this sequence must converge to x_* (the correct positions of the p landmarks), starting from x_0 (which corresponds to an average shape of the initial configuration of the landmarks, obtained by running the face detector):

$$x_k = x_{k-1} + R_{k-1} \phi_{k-1} + b_{k-1} \qquad (3.1.2)$$

where x_k, x_{k−1} ∈ R^(2p×1) are the vectors containing the (x, y) coordinates of the pixels to detect/track (the landmark positions) at iterations k and k − 1, respectively (see the code sketch after this list).

• General objective for morphological adaptation: in order to be able to get 3D information from the images for the deformation of the generic model, we need to define camera parameters for the model according to each image. Calibration is performed by EPnP (Efficient Perspective-n-Point) [12]. The algorithm tries to minimize the difference between the estimated projections of some 3D points of the object and their 2D locations in the image, until the error falls below a threshold or the estimate converges.
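As an illustration of equations (3.1.1) and (3.1.2), the sketch below shows one way to train a cascade level by least squares and then apply the learned cascade; it is a minimal reading of the equations, not the thesis implementation, and extract_features is a hypothetical stand-in for the SIFT/HOG descriptor computed at the current landmark positions.

```python
import numpy as np

def sdm_train_step(delta_X, Phi):
    """One cascade level of equation (3.1.1), solved in closed form.

    delta_X: (N, 2p) optimal displacements x_* - x_k over N training shapes
    Phi:     (N, mp) descriptors phi_k computed at those shapes
    Returns R_k of shape (2p, mp) and b_k of shape (2p,).
    """
    A = np.hstack([Phi, np.ones((Phi.shape[0], 1))])  # append bias column for b_k
    W, *_ = np.linalg.lstsq(A, delta_X, rcond=None)   # linear least squares
    return W[:-1].T, W[-1]

def sdm_fit(image, x0, cascade, extract_features):
    """Equation (3.1.2): x_k = x_{k-1} + R_{k-1} phi_{k-1} + b_{k-1}."""
    x = x0.copy()                          # mean shape placed inside the face box
    for R, b in cascade:
        phi = extract_features(image, x)   # e.g. SIFT/HOG around each landmark
        x = x + R @ phi + b                # one supervised descent step
    return x
```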

3.2 Planning of the project

3.2.1 Tasks

This project is organized as follows:


• The first part covers the duties for the SDM implementation; the steps of this section are:

1. Make a database of images with their respective landmarks (frontal profile, right profile, left profile).
2. Get a mean shape.
3. Normalize the images.
4. For every image, generate a number of perturbations based on the mean shape (coming from a known distribution).
5. Implement a non-linear feature extraction function (e.g. HOG).
6. Solve the multivariate linear regression.

• The second part is about the assignment for the 3D reconstruction; the steps of this section are (a sketch of step 3 follows this list):

1. Load the face detector.
2. Find the position of every landmark in each image (applying SDM).
3. Apply EPnP to get the camera parameters.
4. Compute the deformation values for a set of sparse vertices of the generic model.
5. Interpolate to all other vertices using Radial Basis Functions (RBFs).
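For step 3 of the second part (the sketch referenced above), camera pose estimation with EPnP can be done through OpenCV's solvePnP; the intrinsics and the point arrays below are illustrative placeholders, since the real values come from the detected landmarks and the generic mesh.

```python
import cv2
import numpy as np

# Hypothetical data: 57 control vertices of the generic mesh (3D) and
# their detected landmark positions in one 640x480 view (2D).
model_points = np.random.rand(57, 3).astype(np.float32)
image_points = (np.random.rand(57, 2) * [640, 480]).astype(np.float32)

# A simple pinhole intrinsics guess; a real system would calibrate the
# camera or estimate the focal length from the image size.
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])
dist = np.zeros(4)  # assume no lens distortion

ok, rvec, tvec = cv2.solvePnP(model_points, image_points, K, dist,
                              flags=cv2.SOLVEPNP_EPNP)
R, _ = cv2.Rodrigues(rvec)  # rotation matrix; (R, tvec) is the camera pose
```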

3.3 Tools

Some of the programs/libraries that were used:

• Microsoft Visual Studio 2012 (http://www.visualstudio.com/)
• OpenCV (http://docs.opencv.org/)
• Qt (http://qt-project.org/downloads)
• viulib_omf (Object Model Fitter) v13.10 (http://www.viulib.org)
• VideoMan (http://videomanlib.sourceforge.net)
• CMake (http://www.cmake.org)
• OpenGL (https://www.opengl.org/)

Part I

Analysis of the problem

4 | State of the art

4.1 Introduction to face alignment

The problem of construction and/or alignment of generic deformable models capable of capturing the variability of a non-rigid object is among the most popular and well-studied problems in the field of computer vision. Based on the way the various deformable models are built and on their respective alignment procedures, the existing methodologies can be broadly classified into generative and discriminative¹. The generative methods use an analysis-by-synthesis loop, where the optimization strategy attempts to find the required parameters by maximizing the probability of the input image being constructed by the facial deformable model. On the other hand, discriminative methods rely on the use of discriminative information, discriminative functions, or both. As will be explained below, the two families differ in how they learn during the training stage.

¹This categorization was proposed by Akshay Asthana, Stefanos Zafeiriou and Shiyang Cheng in "Incremental Face Alignment".

4.1.1 Generative methods

The generative or holistic methods for face alignment study the global behavior of the image. This kind of technique has two advantages: robust interpretation is achieved by constraining solutions to be face-like, and the ability to 'explain' an image in terms of a set of model parameters [1][17] provides a natural interface to face recognition applications. The most notable example of this category is Active Appearance Models (AAMs) [3].

4.1.2 Discriminative methods

The discriminative or conditional methods for face alignment are constructed using either discriminative information (i.e. a set of facial landmark classifiers [22]) or discriminative functions [23]. A characteristic of this kind of method is that it uses a cascade of regression functions to map the textural features directly to the shape [22] or to shape parameters [8]. This provides a good way to learn from a dataset while taking the restrictions into account. The methodologies mentioned above achieve real-time performance and succeed at facial landmark localization in uncontrolled, in-the-wild environments.

4.2 3D face geometry modeling

Human faces are remarkably similar in global properties, including size, aspect ratio, and location of the main features, but can vary considerably in details across individuals, gender and race, or due to facial expression.

Over the past decade, aided by advances in computational capacity, great progress has been made in the field of 3D facial reconstruction. Today, scientists in the field of computer vision offer techniques that are capable of reconstructing realistic human facial data from partial or complete photographic images, pencil sketches and skeletal remains. The most sophisticated of these techniques are able to replicate skin texture and color. Such techniques are extensively used in forensic investigations, video games, movie production, etc. More recently, facial reconstruction has been used to create special effects and characters for the movie industry (e.g., the characters of Pirates of the Caribbean: On Stranger Tides), or to replace ailing or deceased actors. Further, facial reconstruction systems have great potential for the development of avatars, which can be found in mobile applications (e.g., Insta3D, an animated 3D avatar app) or in video games (e.g., NBA 2K15 for PS4).

Computerized face recognition is quickly becoming a popular security tool to verify and identify individuals at checkpoints such as airports, and attempts have recently been made to use this technology to authorize transactions over the internet. However, currently available face recognition systems, which are largely based on 2D facial images, suffer from low reliability due to their sensitivity to lighting conditions and to changes in head position and facial expression.

A common way to perform precise motion capture is to use marker-based solutions, but the number of markers needed for capturing complex facial expressions with a high degree of precision in such systems can reach 150-200. As a consequence, the main limitation of this method is a time-consuming preparation phase. On the other hand, in markerless reconstruction methods oriented towards precise recovery of pose, prior knowledge about the object is usually incorporated [14].


In order to extend face landmark estimation from 2D to 3D, different alternatives have been proposed. In [21], a non-rigid structure-from-motion algorithm is proposed to construct the corresponding 3D shape modes of a 2D AAM during the training stage, in order to estimate the head pose angles from the 2D fitting procedure. In [19], the authors present a robust and lightweight method for the automatic fitting of deformable 3D face models to facial images. Their approach fits a generic face model in two steps: (1) the detection of facial features based on local image gradient analysis, and (2) the back-projection of a deformable 3D face model through the optimization of its deformation parameters. Thus, it can estimate the position, orientation, shape and actions of faces, and initialize user-specific face tracking approaches, such as Online Appearance Models (OAM), which have been shown to be more robust than generic user tracking approaches.

The idea of 3D face model expression reconstruction has been addressed in [15][16][9], where the authors present systems that automatically generate a 3D face model from a single frontal image of a face, or from a single reference face shape, with the help of a generic 3D model, allowing various expressions to be synthesized. However, these systems are limited in the number of expressions that can be generated, or by a per-frame time that is too high, which makes face tracking difficult. There have been many proposals for the automatic modeling of a 3D human head; Peter Kán [11] created a novel system for automatic 3D head model creation from two images, based on a parameterized head model with a hierarchical tree of facial features. His model was not very precise at locating facial features, but the idea is very similar to what we propose.


5 | Background

5.1 Non-linear feature extraction function

One way to get features from images, besides their shape, color or size, is to compute their gradients. Both methods mentioned below use histograms of orientations to produce a set of values that serve as the descriptor of a particular patch.

5.1.1 Distinctive image features from scale-invariant key-points (SIFT)

SIFT is a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene. We summarize below how this technique works; if the reader is interested, more detail can be found in [13]. The following are the major stages of computation used to generate the set of image features:

• Scale-space extrema detection: the first stage of computation searches over all scales and image locations. It is implemented efficiently by using a difference-of-Gaussian function to identify potential interest points that are invariant to scale and orientation (Figure 5.1).
• Key-point localization: at each candidate location, a detailed model is fit to determine location and scale. Key-points are selected based on measures of their stability.
• Orientation assignment: one or more orientations are assigned to each key-point location based on local image gradient directions. All future operations are performed on image data that has been transformed relative to the assigned orientation, scale, and location of each feature, thereby providing invariance to these transformations (Figure 5.2).


Figure 5.1: For each octave of scale space, the initial image is repeatedly convolved with Gaussians to produce the set of scale-space images; in that way, potential interest points can be detected depending on their environment, which helps to recognize the most reliable gradients. (a) After each octave, the Gaussian image is down-sampled by a factor of 2. (b) Maxima and minima of the difference-of-Gaussian images are detected by comparing each pixel with its neighbors. (c) An example of the Gaussian pyramid's application.


Figure 5.2: For each image sample, the gradient magnitude, m(x, y), and orientation, θ(x, y), are precomputed using pixel differences. The image shown at the right illustrates an orientation histogram formed from the gradient orientations; it has 36 bins covering the 360-degree range of orientations. Peaks in the orientation histogram correspond to dominant directions of local gradients.

• Key-point descriptor: the local image gradients are measured at the selected scale in the region around each key-point. These are transformed into a representation that allows for significant levels of local shape distortion and change in illumination.
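As a rough illustration of extracting SIFT descriptors at fixed landmark positions (the way SDM uses them as features), the snippet below uses OpenCV; the image path and landmark coordinates are placeholders.

```python
import cv2

img = cv2.imread("face.png", cv2.IMREAD_GRAYSCALE)  # placeholder image path
sift = cv2.SIFT_create()

# Describe fixed positions instead of detected extrema: wrap each landmark
# (x, y) in a KeyPoint whose size sets the described patch scale.
landmarks = [(120.0, 80.0), (180.0, 82.0), (150.0, 130.0)]  # hypothetical points
keypoints = [cv2.KeyPoint(x, y, 32) for x, y in landmarks]
keypoints, descriptors = sift.compute(img, keypoints)
# descriptors has one row of 128 values per landmark
```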

5.1.2 Histogram of oriented gradients (HOG)

HOG is a method for capturing edge or gradient structure that is very characteristic of local shape. The approach is based on scanning a detection window over the image at multiple positions and scales, running an object/non-object classifier at each position. Again, we summarize how this technique works; if the reader is interested, more detail can be found in [7][6].

The method is based on evaluating a dense grid of well-normalized local histograms of image gradient orientations over the image windows. The hypothesis is that local object appearance and shape can often be characterized rather well by the distribution of local intensity gradients or edge directions. The five stages below summarize HOG's computation (Figure 5.3):

• The first stage normalizes the image. It uses gamma compression, computing the square root or the log of each channel; this reduces the effects of local shadowing and illumination variations.
• The second stage computes first-order image gradients. These capture silhouette, contour and some texture information, while providing further resistance to illumination variations. The locally dominant color channel is used, which provides color invariance to a large extent.


Figure 5.3: An overview of the HOG feature extraction chain.

• The third stage pools gradient orientation information locally, in the same way as SIFT (as explained in Section 5.1.1). The image window is divided into small spatial regions called "cells". For each cell we accumulate a local 1-D histogram of gradient or edge orientations over all the pixels in the cell. This combined cell-level 1-D histogram forms the basic "orientation histogram" representation. Each orientation histogram divides the gradient angle range into a fixed number of predetermined bins. The gradient magnitudes of the pixels in the cell are used to vote into the orientation histogram.

• The fourth stage computes normalization, which takes local groups of cells and contrast-normalizes their overall responses before passing them to the next stage. This further improves invariance to illumination, shadowing, and edge contrast. We refer to the normalized block descriptors as Histogram of Oriented Gradients (HOG) descriptors.

• The final stage collects the HOG descriptors from all blocks of a dense overlapping grid of blocks covering the detection window into a combined feature vector for use in the window classifier.
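A minimal sketch of computing such a descriptor with OpenCV's HOGDescriptor follows; the window, block and cell sizes are illustrative defaults, not the settings used in this project.

```python
import cv2

img = cv2.imread("face.png", cv2.IMREAD_GRAYSCALE)  # placeholder image path
patch = cv2.resize(img, (64, 64))                   # the detection window

hog = cv2.HOGDescriptor((64, 64),  # window size
                        (16, 16),  # block size: groups of cells normalized together
                        (8, 8),    # block stride: overlap between blocks
                        (8, 8),    # cell size: pixels voting into one histogram
                        9)         # orientation bins
descriptor = hog.compute(patch)    # concatenated block histograms
# For these settings: 7x7 blocks x 4 cells x 9 bins = 1764 values.
```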

5.2 Multivariate linear regression (MLR)

The linear regression model fits a linear function to a set of data points; it is useful for modeling the relationship between a numeric outcome or dependent variable (Y) and multiple explanatory or independent variables (X). The form of the function is:

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \ldots + \beta_n X_n \qquad (5.2.1)$$

where Y is the target variable; X_1, X_2, ..., X_n are the predictor variables; β_1, β_2, ..., β_n are the coefficients that multiply the predictor variables; and β_0 is a constant.

Unlike common linear regression, which uses a linear predictor function between two variables, in MLR the dependent variable to be predicted is not a single real-valued scalar but an m-length vector of correlated real numbers. As in the standard regression setup, there are n observations, where each observation i consists of k − 1 explanatory variables, grouped into a vector x_i of length k. This can be viewed as a set of m related regression problems for each observation i:

$$y_{i,1} = x_i^T \beta_1 + \epsilon_{i,1}, \quad \ldots, \quad y_{i,m} = x_i^T \beta_m + \epsilon_{i,m}$$

where the errors {ε_{i,1}, ..., ε_{i,m}} are all correlated. It can also be viewed as a single regression problem where the outcome is a row vector y_i^T and the regression coefficient vectors are stacked next to each other:

$$y_i^T = x_i^T B + \epsilon_i^T \qquad (5.2.2)$$

The coefficient matrix B is a k × m matrix where the coefficient vectors β_1, ..., β_m of the individual regression problems are stacked horizontally:

$$B = \begin{bmatrix} \beta_1 & \cdots & \beta_m \end{bmatrix} = \begin{bmatrix} \beta_{1,1} & \cdots & \beta_{1,m} \\ \vdots & & \vdots \\ \beta_{k,1} & \cdots & \beta_{k,m} \end{bmatrix}$$

The classical, frequentist linear least squares solution is to simply estimate the matrix of regression coefficients B̂ using the Moore-Penrose pseudo-inverse:

$$\hat{B} = (X^T X)^{-1} X^T Y \qquad (5.2.3)$$
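A small numerical check of equation (5.2.3) on synthetic data; np.linalg.lstsq computes the same least-squares estimate as the explicit pseudo-inverse, but more stably.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, m = 200, 5, 3                  # observations, predictors, outputs
X = rng.normal(size=(n, k))          # each row is one observation x_i^T
B_true = rng.normal(size=(k, m))     # ground-truth coefficient matrix
Y = X @ B_true + 0.01 * rng.normal(size=(n, m))  # noisy targets

B_pinv = np.linalg.inv(X.T @ X) @ X.T @ Y        # literal equation (5.2.3)
B_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)    # numerically safer route
print(np.allclose(B_pinv, B_hat))                # True: same estimate
```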

5.3 Interpolation using radial basis function (RBF)

Radial basis function methods are a means of approximating multivariate functions. The problem can be stated as: given data in n dimensions that consist of data sites ξ ∈
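As a rough illustration of how RBF interpolation can propagate sparse control-point displacements to all the vertices of a mesh (its role in Section 6.2.1.6), here is a minimal sketch with a linear kernel φ(r) = r; the kernel choice and the synthetic data are assumptions, not the project's settings.

```python
import numpy as np

def rbf_weights(centers, values, phi=lambda r: r):
    """Solve A w = values, where A[i, j] = phi(|c_i - c_j|)."""
    r = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=-1)
    return np.linalg.solve(phi(r), values)

def rbf_eval(points, centers, weights, phi=lambda r: r):
    """Evaluate the interpolant sum_j w_j * phi(|p - c_j|) at each point."""
    r = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=-1)
    return phi(r) @ weights

# Hypothetical usage: deform a generic mesh from its control points.
rng = np.random.default_rng(0)
controls = rng.random((57, 3))               # control vertices on the mesh
displacements = 0.05 * rng.random((57, 3))   # their reconstructed 3D offsets
vertices = rng.random((5897, 3))             # all vertices of the generic mesh

w = rbf_weights(controls, displacements)
deformed = vertices + rbf_eval(vertices, controls, w)
```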