Feature Selection for 2D and 3D Face Recognition

Mohammed Bennamoun, Yulan Guo, Ferdous Sohel

January 17, 2015

Abstract

Face recognition is a popular research topic with a number of applications in several industrial sectors including security, surveillance, entertainment, virtual reality, and human-machine interaction. Both 2D images and 3D data can now be easily acquired and used for face recognition. For any 2D/3D face recognition system, feature extraction and selection play a significant role. Currently, both holistic and local features have been intensively investigated in the literature. In this article, fundamental background knowledge of face recognition, including 2D/3D data acquisition, data preprocessing, feature extraction, classification, and performance evaluation, is presented. The state-of-the-art feature extraction algorithms, including 2D holistic feature, 2D local feature, 3D holistic feature, and 3D local feature extraction algorithms, are then described in detail. Finally, feature selection and fusion techniques are presented. The article covers the complete related aspects of feature selection for 2D and 3D face recognition.

Keywords: Face recognition, feature selection, face identification, face verification, face biometrics, feature extraction.

1 Introduction

Face recognition is an important mechanism that people use every day. A human is able to recognize people by their faces under different conditions, including variations in poses, expressions, and lighting conditions, even from early childhood [1]. During the past few decades, face recognition has attracted much interest from researchers in the areas of computer science, neuroscience and psychology. Although face recognition is an important part of the human perception system and is a routine task for humans, mimicking the face recognition ability of humans using a computer system has proven to be challenging.

A great deal of effort has been expended in developing effective and efficient computer systems for face recognition. Compared to face recognition by humans, automatic machine recognition of faces has several advantages [2]. A computer is able to store a large number of face images in a gallery containing faces of known individuals, remembering more people and recognizing each of them more efficiently than a human because it never gets tired.

Mohammed Bennamoun and Yulan Guo contributed equally to this work and are considered co-first authors.

Figure 1: Five different representations of a human face. The mesh is rendered as a shaded image

Face recognition has a myriad of applications, including security (e.g., system logon, internet access, and file encryption), surveillance (e.g., border control, suspect tracking, and terrorist identification), entertainment (e.g., human-computer interaction, 3D animation, and virtual reality) and medical treatment (e.g., facial surgery and maxillofacial rehabilitation). Compared to other biometrics such as iris images, retinal scans and fingerprints, facial images are more socially acceptable because their acquisition is natural, nonintrusive and contact-free [3, 2].

Two different modalities are commonly used for face recognition: 2D images (including greyscale images and color images) and 3D data (including depth images, pointclouds and meshes). A 2D image is represented by regular lattices [u, v, I(u, v)] (or [u, v, R(u, v), G(u, v), B(u, v)]) with grey values or color channels being stored within each lattice. A 2D image is a function of the scene geometry, the imaging geometry, the illumination conditions and the scene reflectance [3]. Consequently, a 2D facial image is significantly affected by variations in illumination and head pose [3]. In contrast, 3D data represent the geometry of the scene and are thus less sensitive to pose and illumination variations [4, 5]. A pointcloud is a collection of unordered 3D point coordinates that are defined with respect to a particular coordinate frame (e.g., the sensor-centered frame). A range image (also called a depth image) is a 2D image [u, v, d(u, v)], where u, v are the pixel coordinates of a point in the scene and d is the distance from the point to the sensor (or to another reference point) [6]. A mesh is a collection of vertices and edges that define the shape of a face. Mesh faces usually consist of triangles, quadrilaterals or other simple convex polygons, with the triangular mesh representation being the most frequently used mesh type [7]. A triangular mesh is a data structure representing the N_V × 3 matrix of the 3D coordinates of its vertices and the N_F × 3 matrix of the index numbers of the vertices that form a triangle, where N_V and N_F are the numbers of vertices and triangular faces of the range image, respectively [8]. Illustrations of a 2D greyscale image, color image, depth image, pointcloud and triangular mesh of a human face are shown in Fig. 1. In a face recognition system, 2D images and 3D data can either be used separately as an independent cue or integrated into a multimodal 2D/3D framework [9]. 3D data-assisted 2D face recognition has also been investigated in previous studies [10].

Face recognition algorithms have been extensively investigated in previous studies [11, 12, 13, 14]. A number of commercial products for face recognition are currently available [15]. The typical pipeline of a face recognition system is shown in Fig. 2.


Figure 2: The typical pipeline of a face recognition system

The general task of a face recognition system is to identify or verify a person present in the input image using a set of faces (or their corresponding features) stored in a gallery [2]. For face identification, the system determines the identity of the input face by finding its most similar face in the gallery, whereas for face verification, the system accepts or rejects the claimed identity of the input face by checking if the similarity between the probe face and the gallery face is above a predefined threshold. Face recognition is a typical pattern classification problem. First, 2D/3D still or moving images of a face are acquired with an imaging technique. 2D images are usually acquired with digital cameras, while 3D data can be acquired with either active imaging techniques (such as time-of-flight, triangulation, and structured light) or passive imaging techniques (such as stereo vision) [16]. These acquired images are considered inputs to the face recognition system. Next, preprocessing is performed on these input images to smooth noise, remove spikes, fill holes, etc. The region of interest (ROI) in each image is then detected; this process is also known as face detection or face segmentation. Several landmarks, such as the eyes or the nose, on the facial image can also be identified if necessary. Then, features are extracted from each face region. During the offline training phase, the extracted features are stored in the gallery, and during the online recognition phase, these extracted features are fed into the face recognition system. Different classifiers including K-nearest neighbors (KNN), neural networks (NN), support vector machines (SVM), AdaBoost, random forests (RF), and linear regression can be used for face recognition [17, 18, 19]. It is widely agreed upon that feature extraction plays a significant role in many classification systems [20, 2]. On this basis, the focus of this article is on feature selection (including 2D and 3D features) for face recognition.

It is worth pointing out that sparse representation is now a popular tool for face recognition [21, 22, 23]. It treats the recognition problem as the classification of one object among multiple linear regression models. In a sparse-representation-based 3D face recognition system, the choice of features is however no longer critical. Instead, a sufficiently large number of features and a correct calculation of the sparse representation are important. Therefore, simple features (such as down-sampled images or random projections) are sufficient in this case.

The remainder of this article is organized as follows. Section 2 describes the


background concepts and terminology of 2D and 3D face recognition. Sections 3, 4, 5 and 6 present a detailed overview of 2D holistic features, 2D local features, 3D holistic features and 3D local features, respectively. Finally, Section 7 introduces several feature selection and fusion algorithms.

2 Background of Face Recognition

The pipeline of a typical face recognition system is shown in Fig. 2. Each module in the pipeline is described below in detail.

2.1 2D/3D Data Acquisition

2.1.1 2D Images

Many digital cameras are available on the market to acquire 2D images of human faces. Both still images and moving images (videos) can be captured with a modern camera. The acquired images are usually digitized into arrays with a fixed pixel resolution (e.g., 640×480).

2.1.2 3D Data

3D face recognition has several advantages, including its robustness with respect to variations in lighting, head pose, and sensor viewpoint. 3D facial data can be acquired with a 3D scanner using an active technique or a passive technique. Examples of active techniques include triangulation and structured light, while one type of passive technique is the stereo camera, as briefly described below.

Triangulation

In this technique, the scanner shines a laser spot on the facial surface, and a camera is then used to record an image of the spot. Once the center pixel of the spot is calculated, the location of the laser spot is finally determined by the triangle formed by the laser spot, the camera and the laser emitter. To scan a facial surface efficiently, one approach is to scan the light spot over the whole facial surface using mirrors, while another approach is to use a plane rather than a beam of laser light [24, 16]. Although triangulation has a limited range of several meters, its accuracy is relatively high. Several systems using this technique have been developed, including the popular Konica Minolta Vivid 910 (as shown in Fig. 3(a)) and the Cyberware 3030. The triangulation technique can capture 3D facial scans with high resolution and quality. One of the major limitations of the triangulation technique is that it requires a relatively long time to acquire the 3D data, during which time the human subject must stay still [25]. Consequently, this technique is not suitable for 3D video recording.

Structured Light

In this technique, the scanner projects a pattern of light onto a facial surface using an LCD projector or any other light source, and a camera measures the deformations of the pattern on the surface. The distance of every point in the field of view is then calculated based on the pattern deformation. A structured light scanner can scan multiple points simultaneously. Different patterns (such as stripes, grids, elliptical patterns, and speckle patterns) have been studied in the past [24, 16]. Several systems using this technique have been developed, including the Inspeck Mega Capturor II 3D and the Microsoft Kinect (shown in Fig. 3(b)). The structured light technique is able to acquire sequences of 2D and 3D images in real time. However, the acquired 3D data come with a number of holes and artifacts caused by high refraction or low reflection due to the underlying surfaces.

Figure 3: Three popular 3D scanners

Stereo

The scanner uses two (or more) cameras (which are set slightly apart from each other) to look at the facial surface, and the location of each point in the image is determined by matching corresponding points and comparing the information of the two images [26, 16]. Examples of systems using this technique include the Geometrix system, the DI3D (Dimensional Imaging 3D) dynamic face capturing system, the 3DMD dynamic 3D stereo system, and the Bumblebee XB3 (shown in Fig. 3(c)). These scanners are able to simultaneously capture 2D and 3D data of a face. One of the major limitations of the stereo technique is that the accuracy of the reconstructed 3D facial scans is relatively low. This is mainly because it is very challenging to obtain reliable and accurate corresponding points in two facial images due to the relatively uniform appearance of a human face [24]. Besides, the computational efficiency of a stereo system is very low for high resolution images due to its time-consuming point correspondence process.

2.2 2D/3D Data Preprocessing

The raw data captured by the sensors usually contain noise (for 2D/3D), spikes (for 3D) and holes (for 3D). They may also be acquired under different lighting conditions (for 2D), different head poses (for 2D/3D), and with varying resolutions (for 2D/3D). Therefore, a set of preprocessing operations is required before feature extraction and classification.

2.2.1 Noise/Spike Removal

Noise can be generated by the optical components of the sensors (e.g., the lens, the CCD, and the mechanical parts), the external conditions (e.g., ambient light) and the facial properties (e.g., texture and makeup). The removal of surface/image noise is a challenging task because the noise is not easily distinguishable from the details of the underlying surface/texture. An optimal noise removal algorithm is expected to smooth the undesired noise while preserving the details of the image. Popularly used noise removal techniques include 2D Wiener filtering [27], bilateral (smoothing) filtering [10], and bilateral mesh denoising [28].

Spikes (or impulsive noise) are commonly found in 3D data. They are mainly caused by specular surfaces, such as the eyes, the tip of the nose, shiny teeth, and facial oils [3]. Spikes can be removed using a simple thresholding technique or a median filter [10].

2.2.2 Hole Filling

Holes in 3D facial data are mainly caused by spike removal, the specular reflection of the underlying surface (such as from the sclera, pupil or eyelashes), the absorption of light in dark areas, self-occlusion and open mouths. Small holes can be filled using an interpolation technique such as linear interpolation, bi-linear interpolation or polynomial interpolation [10, 27, 29, 30]. Large holes can be filled using the symmetry of the face or principal component analysis (PCA) and morphable facial-model-based techniques [29, 3].

2.2.3 Face Detection

Because raw facial data may contain both the human face and a large background area, face detection is used to segment the region of interest that precisely covers the facial area. Face detection can be performed using the cues of 2D images, 3D data, or a combination thereof. A number of 2D face detection algorithms are available in the literature, including face templates, the Viola-Jones face detector [31], skin detection, and other learning-based algorithms [32]. 3D face detection is usually performed based on nose tip detection [9].

2.2.4 Data Normalization

2D facial image normalization usually includes illumination, scale and orientation normalization. A number of illumination normalization algorithms are available, including histogram equalization, gamma intensity correction, and homomorphic filtering-based algorithms [33]. Scale and orientation normalization can be achieved using facial landmarks, such as the eyes, nose, and mouth [19].

3D facial data normalization usually includes pose correction and resampling. Pose correction is required for holistic feature-based algorithms and pose-sensitive local-feature-based algorithms. Several techniques have been developed for pose correction. PCA-based techniques correct the facial pose by aligning the 3D facial scan to its principal axes, which are calculated using PCA [9]. Sphere fitting-based techniques achieve pose correction by fitting the 3D facial scan to a sphere [34]. Fiducial feature-based techniques use manually or automatically detected fiducial points/lines (such as the nose tip, eyes and nose bridge) to achieve pose correction [29]. Reference face-based techniques use the ICP algorithm to register all 3D faces to a common reference face. Resampling is usually performed to obtain 3D data with a uniform (or desired) resolution and can be achieved by applying interpolation to the depth image. Note that the 3D pointcloud or mesh of a face can easily be converted into a depth image after pose correction.


2.3 2D/3D Feature Extraction

The task of feature extraction is to encode the photometric/geometric information of a 2D/3D image using one or more feature vectors. A good feature should be discriminative, compact, and robust [20]. 2D and 3D features can be broadly classified into two categories: holistic features and local features [20]. Holistic features represent the face using the information of the whole facial image, whereas a local feature only encodes the information of part of the face, such as around the nose, the mouth, or the eyes. Hand-crafted features are widely used in most existing face recognition algorithms, and learning (especially deep learning)-based features are becoming increasingly popular. When feature extraction is used in conjunction with simple classifiers (such as neural networks), the selection of an appropriate feature is considered critical to the success of the face recognition algorithm [23]. This has led to the development of a wide variety of feature extraction methods, as described in the following sections (Sections 3 to 6).

2.4 Classification

The general term face recognition can refer to two different scenarios: identification and verification. In both scenarios, face images of known subjects are first enrolled in a gallery. During the online recognition phase, images of these or other subjects are used as probes to match against the images in the gallery [15].

Face verification (authentication) is a binary classification problem in which a probe is compared against the gallery image with the claimed identity (i.e., one-to-one matching), and the claimed identity is accepted if the similarity score is above a given threshold. Face verification can be used in many applications, such as financial transaction approval and portal control. The task is relatively simple because the user can be assumed to be cooperative.

Face identification is a multiclass classification problem in which a probe is compared against all of the gallery images (i.e., one-to-many matching). The gallery image that is closest to the probe, with a similarity score higher than a given threshold, is used to determine the identity of the probe. Note that the probe individual may or may not be in the gallery. A typical application of face identification is to find a suspect in a crowd. The task of face identification is more challenging than face verification. First, the larger gallery used for identification (compared to verification) results in a decrease in the recognition accuracy. Second, no collaboration from the probe subject can be assumed in the scenario of face identification.

Although a face recognition system can be utilized in the context of either verification or identification, most phases in the pipeline (including image acquisition, preprocessing and feature extraction, as shown in Fig. 2) are exactly the same.

2.5 Criteria for Performance Evaluation

A set of criteria has been proposed in the literature to evaluate the performance of face recognition algorithms.


2.5.1 Performance Criteria for Verification

Commonly used criteria for face verification include the receiver operating characteristic (ROC) curve, the equal error rate (EER), and the verification rate (VR) at the false acceptance rate (FAR) of 0.1% (VR@0.1%FAR). Two types of ROC curves have been used in the literature: one that plots the false rejection rate (FRR) versus the FAR, and one that shows VR versus FAR. The FAR is the percentage of probes that have been falsely accepted, the FRR is the percentage of probes that have been falsely rejected, and the VR is the percentage of probes that have been correctly accepted. VR, FRR and FAR all vary with the acceptance threshold. The ROC curve is generated using a number of different thresholds. The EER is the error rate at the point on the ROC curve where the FRR equals the FAR. Either the EER or VR@0.1%FAR summarizes the verification performance with a single number.
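To make these criteria concrete, the following sketch (an illustrative addition, not part of the cited works) computes the FAR and FRR curves, the EER, and VR@0.1%FAR from genuine and impostor similarity scores by sweeping the acceptance threshold; the synthetic score distributions at the bottom are purely hypothetical.

```python
import numpy as np

def verification_metrics(genuine, impostor, num_thresholds=1000):
    """FAR/FRR curves, EER and VR@0.1%FAR from similarity scores.

    genuine  : similarity scores of probe-gallery pairs of the same subject
    impostor : similarity scores of pairs of different subjects
    """
    genuine = np.asarray(genuine, dtype=float)
    impostor = np.asarray(impostor, dtype=float)
    thresholds = np.linspace(min(genuine.min(), impostor.min()),
                             max(genuine.max(), impostor.max()), num_thresholds)
    # A pair is accepted when its similarity score is above the threshold.
    far = np.array([(impostor >= t).mean() for t in thresholds])  # falsely accepted
    frr = np.array([(genuine < t).mean() for t in thresholds])    # falsely rejected
    eer_idx = np.argmin(np.abs(far - frr))
    eer = (far[eer_idx] + frr[eer_idx]) / 2.0                     # point where FRR == FAR
    valid = far <= 1e-3                                           # FAR of at most 0.1%
    vr_at_far = (1.0 - frr[valid]).max() if valid.any() else 0.0
    return far, frr, eer, vr_at_far

# Purely synthetic scores, for illustration only.
rng = np.random.default_rng(0)
far, frr, eer, vr = verification_metrics(rng.normal(0.8, 0.1, 500),
                                         rng.normal(0.4, 0.1, 5000))
print(f"EER = {eer:.3f}, VR@0.1%FAR = {vr:.3f}")
```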

2.5.2 Performance Criteria for Identification

Commonly used criteria for face identification include the cumulative match characteristic (CMC) curve and the rank-1 recognition rate (R1RR). The CMC curve plots the percentage of correctly recognized probes versus the rank number that is considered a correct match. The R1RR is the percentage of all probes for which the best match in the gallery belongs to the same subject. The R1RR is the most frequently used single-number measure for the evaluation of face identification performance.
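The sketch below (again an illustrative addition) builds the CMC curve and the R1RR from a probe-by-gallery similarity matrix; it assumes a closed-set scenario in which every probe subject has at least one enrolled gallery image.

```python
import numpy as np

def cmc_curve(similarity, probe_ids, gallery_ids):
    """CMC curve from a (num_probes x num_gallery) similarity matrix."""
    similarity = np.asarray(similarity, dtype=float)
    probe_ids, gallery_ids = np.asarray(probe_ids), np.asarray(gallery_ids)
    num_probes, num_gallery = similarity.shape
    cmc = np.zeros(num_gallery)
    for p in range(num_probes):
        order = np.argsort(-similarity[p])            # gallery sorted by decreasing similarity
        rank = np.where(gallery_ids[order] == probe_ids[p])[0][0]
        cmc[rank:] += 1                               # a hit at rank r counts for all ranks >= r
    return cmc / num_probes                           # cmc[0] is the rank-1 recognition rate

# cmc = cmc_curve(sim, probe_ids, gallery_ids); print("R1RR =", cmc[0])
```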

2.6 Challenges in Face Recognition

A number of issues can affect the performance of a face recognition algorithm, as described below. Therefore, appropriate facial features should be carefully selected to address these challenges.

2.6.1 Illumination

Ambient lighting conditions vary significantly between, and even within, days, especially in outdoor environments. Additionally, strong shadows can be cast by a direct lighting source due to the 3D structure of the human face. Skin reflectance properties can also cause illumination variations. Consequently, the same face may take on significantly different appearances due to illumination variation (as shown in Figs. 4(a-b)). It has been observed that changes caused by variations in illumination are often larger than the differences between different individuals [13, 33], which makes 2D face recognition a highly challenging task. In contrast, 3D face recognition is more robust to illumination variations because only the shape/geometrical information (rather than texture information) is used in the system. However, the 3D data acquired using stereo or structured-light-based techniques can also be affected by illumination variations (as shown in Figs. 4(c-d)) [15].


Figure 4: 2D and 3D facial data under different illumination conditions. Figures (c-d) were originally shown in Bowyer et al. 2006 [15]

Figure 5: 2D and 3D facial data for different expressions. Figures (a) and (b) were originally shown in Naseem et al. 2010 [19]

2.6.2 Pose

In many face recognition applications, the probe and gallery images have different poses. For example, a gallery image may contain a frontal face, whereas the probe might contain a rotated face (e.g., out-of-plane rotation). 2D face recognition can be significantly affected by pose variations due to projective deformation and self-occlusion in images [35]. 3D face recognition is more robust to pose changes because the pose can be corrected (see Section 2.2.4). However, if the pose variation is extremely large, the 3D face recognition accuracy can also be reduced due to missing data caused by self-occlusion [36].

2.6.3 Facial Expression

Facial expression is one of the major challenges for both 2D and 3D face recognition algorithms. The geometric structure of the human face can dramatically deform in a complex way under different expressions, resulting in large deformations in both the 2D and 3D data (as shown in Fig. 5) [37, 38]. The recognition accuracy can therefore deteriorate due to the difficulty in differentiating expression deformations from interpersonal disparities [37].

2.6.4 Time Delay

It is challenging to perform face recognition when the time delay between the gallery and the probe is not negligible because the human face changes in a nonlinear way over a long period of time [35]. Additionally, the facial data may be acquired under significantly different imaging conditions (e.g., different lighting and camera setups).


2.6.5 Occlusions

The presence of occlusions (e.g., those caused by caps, sunglasses, and scarves) is one of the major problems faced by both 2D and 3D face recognition algorithms, especially for holistic-feature-based algorithms (see Sections 3 and 5) [35, 19].

3 2D Holistic Features

These methods use the entirety of the 2D face as input to generate feature vectors. The most widely used 2D holistic features include Eigenfaces, Fisherfaces, Laplacianfaces, independent component analysis (ICA), and the discrete cosine transform (DCT).

3.1 Eigenfaces

Kirby and Sirovich [39] introduced this pioneering work to efficiently represent facial images using the common PCA technique (also known as the Karhunen-Loève transform or Hotelling transform). It has been demonstrated that a facial image can be spanned by its eigenvectors in a transformed space. Because these eigenvectors have the same dimensions as the original images, they are also referred to as Eigenpictures in [39] and as Eigenfaces in [40]. Therefore, a facial image can be approximately reconstructed using a small number of eigenvectors and their corresponding coefficients (i.e., projections) in the transformed space. These eigenvectors maximize the variance of the facial image and can be obtained by applying eigenvalue decomposition on the scatter matrix of the facial images. Specifically, PCA minimizes the mean squared error between the original image and the reconstructed image. If an image is reconstructed from all of the Eigenfaces, the mean squared error between the original image and its reconstructed counterpart is zero.

The Eigenface-based 2D face recognition algorithm was initially proposed by Turk and Pentland [40], and it is briefly described below. A set of characteristic facial images is first collected from several individuals. This set should include several images for each individual to account for variations in expression and lighting. Assuming that N_I training images have been collected and that each two-dimensional image has already been transformed into a one-dimensional vector X_i, the covariance matrix C of these training images is calculated using

C = \sum_{i=1}^{N_I} (X_i - \bar{X})(X_i - \bar{X})^T,    (1)

where

\bar{X} = \frac{1}{N_I} \sum_{i=1}^{N_I} X_i.    (2)

The eigenvalue decomposition is then applied to C:

C W = W D,    (3)

where D is a diagonal matrix, with its diagonal entries equal to the eigenvalues of C, and W is a matrix with columns equal to the eigenvectors of C. A limited number of eigenvectors that correspond to the largest eigenvalues are selected as Eigenfaces. These Eigenfaces are used to span a subspace with a lower dimensionality than the original facial space. The three most significant eigenvectors are usually discarded to improve feature robustness with respect to illumination variations. The gallery facial images are then projected onto the subspace along the Eigenfaces:

\hat{X}_i = W_{PCA}^T X_i,    (4)

where W_{PCA}^T is the transpose of W_{PCA}, the matrix formed by the selected Eigenfaces in W.

When a probe image is given to the system, its projections on the Eigenfaces are calculated using Eq. 4 to obtain the probe feature, which is compared against the features in the gallery using a suitable distance metric to produce the face verification/identification results. Eigenface algorithms have been intensively investigated and are regarded as a benchmark in the area of face recognition [25]. They are computationally efficient, and a single example per subject is sufficient to generate the Eigenfaces. However, because these algorithms do not consider the intraclass distribution of the training data, they are not sufficiently discriminative [3].
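A minimal NumPy sketch of this procedure is given below (an illustration under the assumption that the training images have already been vectorised into the rows of a matrix; it is not the authors' implementation). The SVD of the centered data is used instead of an explicit eigendecomposition of the scatter matrix, which yields the same Eigenfaces.

```python
import numpy as np

def train_eigenfaces(X, num_components=50):
    """X: (N_I, N_p) matrix with one vectorised training image per row."""
    mean_face = X.mean(axis=0)
    A = X - mean_face
    # Columns of U are the eigenvectors of the scatter matrix A^T A (the Eigenfaces).
    U, s, _ = np.linalg.svd(A.T, full_matrices=False)
    return mean_face, U[:, :num_components]

def project(X, mean_face, W):
    return (X - mean_face) @ W                 # Eq. 4: coordinates in the Eigenface subspace

# gallery_feats = project(gallery_images, mean_face, W)
# probe_feat    = project(probe_image[None, :], mean_face, W)
# match = np.argmin(np.linalg.norm(gallery_feats - probe_feat, axis=1))
```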

3.2 Fisherfaces

Linear discriminant analysis (LDA) is a supervised learning algorithm that searches for the projection space in which points of different classes are far from each other, whereas points of the same class are close to each other [41]. The Fisherface algorithm was first proposed in [42] using PCA and Fisher's LDA techniques. The procedure of the Fisherface algorithm is described in the following.

Each training image X_i is first projected onto a PCA subspace using Eq. 4. The dimensionality of the PCA subspace is determined by N_I - N_C, where N_I is the number of all training images and N_C is the number of classes (subjects). This operation is performed to ensure that the within-class covariance matrix (which is used in LDA) is non-singular. For the sake of simplicity, we will still use X_i to denote the images in the PCA subspace in the following steps.

The within-class and between-class covariance matrices (i.e., C_W and C_B) of these projected training images are calculated as follows:

C_W = \sum_{i=1}^{N_C} C_i,    (5)

C_B = \sum_{i=1}^{N_C} N_i (\bar{X}_i - \bar{X})(\bar{X}_i - \bar{X})^T,    (6)

where C_i is the covariance matrix of the i-th class, \bar{X} is the mean of all of the projected training images {X_i}, \bar{X}_i is the mean of the projected training images of the i-th class, and N_i is the number of projected training images in the i-th class.

The optimal projection W_{LDA} is selected as the matrix that maximizes the ratio of the determinant of the between-class scatter matrix of the projected images to the determinant of the within-class scatter matrix of the projected images; that is,

W_{LDA} = \arg\max_W \left( \frac{|W^T C_B W|}{|W^T C_W W|} \right).    (7)

W_{LDA} is calculated as the set of generalized eigenvectors of C_B and C_W that correspond to the largest generalized eigenvalues. The generalized eigenvalue decomposition is formulated as

C_W^{-1} C_B W = W D.    (8)

Note that there are at most N_C - 1 non-zero eigenvalues. Next, all gallery and probe facial images are projected onto the Fisherface subspace (with a maximum dimensionality of N_C - 1):

\hat{X}_i = W_{LDA}^T W_{PCA}^T X_i.    (9)

Finally, the probe and gallery facial images are compared in the Fisherface subspace. The Fisherface algorithms usually perform better than the Eigenface algorithms [42] because LDA extracts discriminative features (which are more suitable for classification), whereas PCA selects expressive features (which are more suitable for representation). One drawback of the Fisherface algorithms is that multiple images per person are required for the training of Fisherfaces, which is not always available in some applications [25].
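The following sketch (an illustration with assumed inputs, not the authors' code) follows the steps above: PCA to N_I − N_C dimensions, the within- and between-class scatter matrices, and the generalized eigenvectors of Eq. 8.

```python
import numpy as np

def fisherfaces(X, labels):
    """X: (N_I, N_p) vectorised training images; labels: subject index per row."""
    labels = np.asarray(labels)
    classes = np.unique(labels)
    n_i, n_c = len(labels), len(classes)
    mean_face = X.mean(axis=0)
    U, _, _ = np.linalg.svd((X - mean_face).T, full_matrices=False)
    W_pca = U[:, :n_i - n_c]                        # keep N_I - N_C components
    Y = (X - mean_face) @ W_pca                     # images in the PCA subspace
    mu = Y.mean(axis=0)
    dim = W_pca.shape[1]
    Cw, Cb = np.zeros((dim, dim)), np.zeros((dim, dim))
    for c in classes:
        Yc = Y[labels == c]
        mc = Yc.mean(axis=0)
        Cw += (Yc - mc).T @ (Yc - mc)               # within-class scatter, Eq. 5
        Cb += len(Yc) * np.outer(mc - mu, mc - mu)  # between-class scatter, Eq. 6
    evals, evecs = np.linalg.eig(np.linalg.pinv(Cw) @ Cb)   # Eq. 8
    order = np.argsort(-evals.real)[:n_c - 1]       # at most N_C - 1 non-zero eigenvalues
    W_lda = evecs[:, order].real
    return mean_face, W_pca, W_lda

# features = (images - mean_face) @ W_pca @ W_lda   # Eq. 9
```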

3.3 Laplacianfaces

Only the global structure is preserved when the PCA and LDA techniques are used. To preserve the intrinsic geometry of the facial images and the local structure, a locality preserving projection (LPP) technique was proposed in [41]. The Laplacianface algorithm uses LPP to learn a locality preserving subspace. Consequently, the effects of lighting variations, different facial expressions, and pose variations can be eliminated or reduced [41]. The procedure of the Laplacianface algorithm is described as follows.

Each training image X_i is first projected onto a PCA subspace using Eq. 4. For the sake of simplicity, we will still use X_i to denote the images in the PCA subspace in the following steps.

A graph G of N_I nodes is then constructed, with each node corresponding to a facial image. An edge is connected between nodes i and j when either X_i is among the k nearest neighbors of X_j, or X_j is among the k nearest neighbors of X_i. The constructed nearest neighbor graph is an approximation of the local manifold structure.

Next, the weight matrix S of the graph G is calculated to model the face manifold structure by preserving the local structure. If nodes i and j are connected, then

S_{ij} = e^{-\frac{\|X_i - X_j\|^2}{t}},    (10)

where t is a constant. If nodes i and j are not connected, then S_{ij} = 0.

The eigenvectors and eigenvalues for the generalized eigenvector problem are then calculated as follows:

X L X^T W = \lambda X D X^T W,    (11)

where X = [X_1, X_2, \ldots, X_{N_I}], D is a diagonal matrix whose entries are the column sums of S, and L is the Laplacian matrix (i.e., L = D - S).

Finally, each facial image is projected onto the low-dimensional Laplacianface subspace as follows:

\hat{X}_i = W_{LPP}^T W_{PCA}^T X_i,    (12)

where W_{LPP} is the set of generalized eigenvectors calculated from Eq. 11. W_{PCA} W_{LPP} is the transformation matrix, and its column vectors are called the Laplacianfaces. Because linear mapping preserves the manifold's estimated intrinsic geometry in a linear sense, the Laplacianface algorithm provides a better representation and achieves lower error rates in face recognition compared to Eigenface and Fisherface algorithms [41].
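A compact sketch of the LPP step is shown below (illustrative only; the neighborhood size k, the heat-kernel parameter t and the subspace dimension are assumed values). It operates on images that have already been projected onto the PCA subspace.

```python
import numpy as np

def lpp(Y, k=5, t=1.0, num_components=30):
    """Locality preserving projection on PCA-reduced images Y (one row per image)."""
    n = Y.shape[0]
    dist2 = np.sum((Y[:, None, :] - Y[None, :, :]) ** 2, axis=2)
    knn = np.argsort(dist2, axis=1)[:, 1:k + 1]          # k nearest neighbours of each node
    S = np.zeros((n, n))
    for i in range(n):
        S[i, knn[i]] = np.exp(-dist2[i, knn[i]] / t)     # heat-kernel weights, Eq. 10
    S = np.maximum(S, S.T)                               # symmetrise the adjacency graph
    D = np.diag(S.sum(axis=1))
    L = D - S                                            # graph Laplacian
    X = Y.T                                              # columns are data points
    # Generalized eigenproblem X L X^T w = lambda X D X^T w (Eq. 11);
    # the eigenvectors with the smallest eigenvalues are retained.
    evals, evecs = np.linalg.eig(np.linalg.pinv(X @ D @ X.T) @ (X @ L @ X.T))
    order = np.argsort(evals.real)[:num_components]
    return evecs[:, order].real

# W_lpp = lpp((images - mean_face) @ W_pca)
# laplacianface_features = (images - mean_face) @ W_pca @ W_lpp    # Eq. 12
```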

3.4 Independent Component Analysis

Independent component analysis (ICA) is a generalization of PCA that has been widely used for subspace projection [43, 44]. Compared to PCA, which considers only the second-order moments, ICA also considers higher-order statistics. It can identify independent source components from their linear mixtures, thus providing an independent (rather than uncorrelated) image representation. Much of the important information of an image is encoded in the form of high-order relations between the image pixels. These high-order image statistics offer a significant amount of information for face recognition [45]. Consequently, ICA gives a more discriminating representation than PCA [13]. One popular variant of the ICA algorithm is FastICA. The procedure of the FastICA algorithm [43] can be briefly described as follows.

Let S be the set of vectors of unknown source signals, and let X be the set of vectors of observed mixtures (facial images). The mixing model can then be written as

X = AS,    (13)

where A is an unknown mixing matrix. The task of ICA is to estimate the independent source signals U, which can be achieved by calculating the separating matrix W that corresponds to the mixing matrix A, that is,

U = WX = WAS.    (14)

To calculate the separating matrix W, the observed images X are first whitened, resulting in a set of whitened images Z. Next, the kurtosis of U_i = W_i^T Z is calculated using

kurt(U_i) = E\{(U_i)^4\} - 3\left(E\{(U_i)^2\}\right)^2.    (15)

Finally, the separating vector W_i is obtained by maximizing the kurtosis. Note that the linear projection of the whitened images produced by the matrix W has the maximum non-Gaussianity of the data distribution.

Two different architectures have been introduced for ICA-based face recognition [43, 13]. The first architecture considers the observed facial images X as a linear combination of statistically independent basis images S (which are combined using an unknown matrix A). Consequently, a set of statistically independent source images (i.e., independent image features) is generated for a given set of training images [46]. The second architecture generates statistically independent coefficients that represent the input images. Specifically, the second architecture constructs image filters that produce statistically independent outputs [47]. PCA is used to reduce the dimensionality of the original images in both architectures. It was reported that ICA algorithms achieve a better recognition performance than Eigenface algorithms [13].
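As an illustration of the whitening and kurtosis-maximization steps, a minimal deflation-based sketch is given below (it is not the FastICA implementation used in the cited works; the number of components, iteration counts and convergence tolerance are assumptions).

```python
import numpy as np

def fastica_kurtosis(X, num_components=10, num_iter=200, seed=0):
    """Kurtosis-based ICA on vectorised facial images X (one image per row)."""
    rng = np.random.default_rng(seed)
    Xc = X - X.mean(axis=0)
    # PCA whitening: Z has (approximately) identity covariance.
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    Z = np.sqrt(len(X)) * (Xc @ Vt.T[:, :num_components]) / s[:num_components]
    W = np.zeros((num_components, num_components))
    for i in range(num_components):
        w = rng.normal(size=num_components)
        w /= np.linalg.norm(w)
        for _ in range(num_iter):
            # Fixed-point update that maximizes |kurt(w^T z)| (Eq. 15).
            w_new = (Z * (Z @ w)[:, None] ** 3).mean(axis=0) - 3 * w
            w_new -= W[:i].T @ (W[:i] @ w_new)      # deflate against found components
            w_new /= np.linalg.norm(w_new)
            converged = np.abs(np.abs(w_new @ w) - 1) < 1e-8
            w = w_new
            if converged:
                break
        W[i] = w
    return W @ Z.T                                   # rows are independent components (Eq. 14)

# components = fastica_kurtosis(training_images)
```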

3.5 Discrete Cosine Transform

In human vision, biological signals are transformed into signals conveying magnitude, phase, frequency, and orientation information to the higher visual center of the brain [48]. Because the discrete cosine transform (DCT) can represent all of the aforementioned attributes, it has been widely used for feature extraction from 2D images. There are four types of DCTs in the literature: DCT-I, DCT-II, DCT-III, and DCT-IV. DCT-II is the most popular because it is asymptotically equivalent to PCA for Markov-1 signals with a correlation coefficient close to one [48]. The DCT-II algorithm has already been used for image compression in the standardization of the Joint Photographic Experts Group (JPEG). The procedure of the DCT algorithm [43] can briefly be described as follows.

Given a one-dimensional vector X_i that is transformed from the two-dimensional image by concatenating all of its elements, the feature obtained by DCT is

\hat{X}_i = W X_i,    (16)

where the DCT transformation matrix W is defined as

W(k, n) = \begin{cases} \sqrt{\frac{1}{N_p}}, & k = 0,\ 0 \le n \le N_p - 1 \\ \sqrt{\frac{2}{N_p}} \cos\left(\frac{(2n+1)\pi k}{2 N_p}\right), & 1 \le k \le N_p - 1,\ 0 \le n \le N_p - 1 \end{cases}    (17)

where N_p is the number of pixels in the image X_i, and k and n are the row and column indices, respectively.

It is clear that the DCT decomposes the image X_i into a weighted sum of basis cosine sequences. Conversely, the original image X_i can also be reconstructed from its DCT coefficients. Because illumination variations mainly lie in the low-frequency band, the effects of lighting variations can be minimized using an appropriate subset of the DCT coefficients [49].
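In practice the 2D DCT-II of the face image is usually computed directly; the sketch below (illustrative, with an assumed block size and an assumed number of discarded low-frequency coefficients) keeps a low-frequency block and drops the first few coefficients, which mostly encode illumination.

```python
import numpy as np
from scipy.fft import dctn

def dct_feature(image, block=12, skip=3):
    """Low-frequency 2D DCT-II coefficients of a face image as a feature vector."""
    coeffs = dctn(image.astype(float), type=2, norm='ortho')
    low = coeffs[:block, :block].flatten()   # keep the low-frequency block
    return low[skip:]                        # discard the lowest terms (illumination)

# feature = dct_feature(face_image)          # compare features with, e.g., Euclidean distance
```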


4 2D Local Features

These methods use a partial (local) region of a 2D face as an input to generate feature vectors. A number of 2D local features can be found in the literature [50, 51]. A few widely used features include Gabor jets, the scale invariant feature transform (SIFT), speeded up robust features (SURF), and local binary patterns (LBP). Compared to holistic methods, local feature methods are more robust to illumination variations, viewpoint changes, facial expressions, and inaccuracies in face localization [2, 52].

4.1 Gabor Jets

Wavelet analysis has been widely used to extract spatial frequency features for face detection [31] and face recognition [52]. A number of wavelet bases have been proposed in the literature, among which Gabor wavelets are the most popular because they provide the optimal resolution in both the spatial and frequency domains. Gabor wavelet-based face recognition was investigated in a dynamic link architecture (DLA) [53]. In the DLA algorithm, each face is represented by a rectangular graph with deformable nodes. Gabor wavelets are used to extract a set of local features (Gabor jets) at each node of the graph. Following DLA, an elastic bunch graph matching (EBGM) algorithm was proposed for face recognition [54]. The EBGM algorithm can briefly be described as follows. A face graph is first manually built with nodes located at 25 facial landmarks (including the corners of the eyes/mouth, the centers of the eyes, and the nose tip, as shown in Fig. 6). A face bunch graph is then constructed by feature matching and manual correction. When a probe image is provided, its face graph is automatically generated and compared to all model graphs. The similarity between two faces is measured by the overall distance between the Gabor jets at the corresponding nodes of the two faces.

Figure 6: Landmarks for the elastic bunch graph matching (EBGM) algorithm. Figure originally shown in Albiol et al. 2008 [55]

The process to extract Gabor jets [52] is given below.

Let β be the sharpness of the Gaussian in the y-axis, and let η = f/β be the ratio to the central frequency. The 2D Gabor wavelet is then defined as

\varphi(x, y) = \frac{f^2}{\pi\gamma\eta} \exp\left(-\left(\frac{f^2}{\gamma^2} x_r^2 + \frac{f^2}{\eta^2} y_r^2\right)\right) \exp(j 2\pi f x_r),    (18)

where

x_r = x\cos\theta + y\sin\theta,    (19)

y_r = -x\sin\theta + y\cos\theta.    (20)

Here, f denotes the frequency of the modulating sinusoidal plane wave, and θ denotes the orientation of the major axis of the elliptical Gaussian.

Assuming that \gamma = \eta = \frac{\sigma}{\sqrt{2}\pi}, the 2D Gabor wavelet at location x = (x, y)^T can be rewritten as

\varphi(x) = \frac{1}{2\pi} \frac{\|k\|^2}{\sigma^2} \exp\left(-\frac{\|k\|^2 \|x\|^2}{2\sigma^2}\right) \exp(j k \cdot x),    (21)

where k = 2\pi f \exp(j\theta) is the central frequency component.

A set of Gabor wavelets \varphi_{\Pi(f_u, \theta_v, \gamma, \eta)}(x) can be extracted as features of a 2D image, where

f_u = \frac{f_{max}}{(\sqrt{2})^u}, \quad u = 0, 1, \ldots, U - 1,    (22)

\theta_v = \frac{v\pi}{8}, \quad v = 0, 1, \ldots, V - 1,    (23)

where f_{max} is the maximum frequency, and f_u and θ_v define the frequency and orientation of the Gabor wavelet. A total of U × V Gabor wavelets are generated at each location. An example showing the real parts of the Gabor wavelets with U equal to 5 and V equal to 8 is shown in Fig. 7.

Given an image I, the Gabor wavelet response is calculated as the convolution between the image and the Gabor wavelet:

G_{u,v}(x) = I * \varphi_{\Pi(f_u, \theta_v, \gamma, \eta)}(x).    (24)

Once a set of U × V Gabor wavelets is used, a set of convolution responses can be generated at the location x = (x, y)^T. All of these Gabor wavelet responses are used to form a local feature known as a Gabor jet. The Gabor jet represents the information of a local patch at different scales and orientations. In face recognition, a face image can be represented by the Gabor jets generated at particular feature points or for the whole image [52, 53, 54, 56].
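A small sketch of Gabor jet extraction is given below (illustrative only; the kernel size, σ = 2π and f_max are assumed parameter choices, and convolving the full image for a single landmark is done here purely for clarity).

```python
import numpy as np
from scipy.signal import convolve2d

def gabor_kernel(f, theta, sigma=2 * np.pi, size=31):
    """Gabor wavelet of Eq. 21 sampled on a size x size grid."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    k = 2 * np.pi * f * np.exp(1j * theta)            # central frequency component
    k2, r2 = np.abs(k) ** 2, x ** 2 + y ** 2
    envelope = (k2 / (2 * np.pi * sigma ** 2)) * np.exp(-k2 * r2 / (2 * sigma ** 2))
    return envelope * np.exp(1j * (k.real * x + k.imag * y))   # exp(j k . x)

def gabor_jet(image, point, f_max=0.25, U=5, V=8):
    """Responses of U x V Gabor wavelets (Eqs. 22-24) at one facial landmark."""
    r, c = point
    jet = []
    for u in range(U):
        f = f_max / (np.sqrt(2) ** u)
        for v in range(V):
            response = convolve2d(image, gabor_kernel(f, v * np.pi / 8), mode='same')
            jet.append(response[r, c])
    return np.asarray(jet)                            # complex-valued Gabor jet

# jet = gabor_jet(face_image.astype(float), point=(60, 48))
```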

4.2 Scale Invariant Feature Transform

The scale invariant feature transform (SIFT) algorithm uses a cascading filtering approach to efficiently identify candidate facial keypoints, which are then examined in further detail using more time-consuming operations. Feature descriptors that represent local facial information around the keypoints are then extracted. The SIFT algorithm has been successfully used in a number of applications including object and face recognition [57, 9]. The SIFT algorithm [43] can briefly be described as follows.


Figure 7: The real parts of the Gabor wavelets with U equal to 5 and V equal to 8, where U and V are the numbers of frequencies and orientations, respectively. Figure originally shown in Shen and Bai 2006 [52]

(1) Scale-space Construction

The only possible scale-space kernel under a set of reasonable assumptions is the Gaussian function [58]. Given an image I(x, y), the scale space L(x, y, σ) of the image is defined as

L(x, y, \sigma) = G(x, y, \sigma) * I(x, y),    (25)

where * stands for the convolution operation, and G(x, y, σ) is the Gaussian kernel

G(x, y, \sigma) = \frac{1}{2\pi\sigma^2} e^{-(x^2 + y^2)/2\sigma^2}.    (26)

The difference-of-Gaussian (DOG) images are calculated as the difference between two nearby scales:

D(x, y, \sigma) = (G(x, y, k\sigma) - G(x, y, \sigma)) * I(x, y).    (27)

The process to generate the DOG images is shown in Fig. 8. The original image is incrementally convolved with Gaussian kernels to obtain a set of scale-space images. Neighboring Gaussian images are then subtracted to produce the DOG images. Once an octave is completed, the Gaussian image is down-sampled by a factor of 2, and the above process is repeated.
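The scale-space construction can be sketched as follows (an illustration that applies each Gaussian directly to the input image instead of convolving incrementally; σ = 1.6 and the number of scales per octave are assumed values).

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def dog_octave(image, sigma=1.6, num_scales=5, k=2 ** 0.5):
    """One octave of Gaussian and difference-of-Gaussian images (Eqs. 25-27)."""
    gaussians = [gaussian_filter(image.astype(float), sigma * (k ** i))
                 for i in range(num_scales)]
    dogs = [g2 - g1 for g1, g2 in zip(gaussians, gaussians[1:])]
    return gaussians, dogs

# First octave, then the next one on an image down-sampled by a factor of 2.
# gaussians, dogs = dog_octave(face_image)
# gaussians2, dogs2 = dog_octave(gaussians[-1][::2, ::2])
```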

(2) Scale-space Extrema Detection

Scale-space extrema are detected by searching over all scales and image locations in the DOG images. Each sample point in the DOG images is compared to its eight neighbors in the current image and to 18 neighbors in the neighboring scales (see Fig. 9). The sample point is selected as a potential keypoint if it is larger or smaller than all of its neighbors.


Figure 8: The generation of difference-of-Gaussian (DOG) images. Figure originally shown in Lowe 2004 [57]

Figure 9: Extrema detection in the difference-of-Gaussian (DOG) images. Figure originally shown in Lowe 2004 [57]

(3) Keypoint Localization

A Taylor expansion is applied to the scale-space function, and the interpolated estimate of the location and scale of the extremum is determined. A threshold is then applied to eliminate keypoints with low contrast (which are sensitive to noise). Finally, the ratio of principal curvatures is used to delete those points that are poorly localized along an edge. The remaining points are selected as the final set of stable keypoints.

(4) Orientation Assignment

For each keypoint, the gradient orientations in its local neighborhood are weighted based on their corresponding gradient magnitudes and on a Gaussian-weighted window. These weighted gradient orientations are then aggregated into a histogram. The gradient orientations at the keypoint are determined by the dominant gradient directions that have a maximum value in the histogram. Once one or more orientations have been assigned to each keypoint, the coordinates and the gradient orientations in the local neighborhood of the keypoint are rotated relative to the assigned orientation. Consequently, the resulting keypoint descriptors are invariant to image rotations.


(5) Keypoint Descriptor

A SIFT feature is generated at each keypoint using the gradients in its local neighborhood. First, the local neighborhood of a keypoint is divided into 4×4 regions. In each region, a histogram with eight orientation bins is generated, as shown in Fig. 10. There are a total of 128 (i.e., 4 × 4 × 8) elements in each SIFT feature vector. To achieve robustness against illumination variations, the feature vector is further normalized to a unit vector. The SIFT descriptor was also referred to as histograms of oriented gradients (HOG) in [59] and has been widely used in human detection and face recognition [9, 60, 55, 59].

Figure 10: The generation of the SIFT descriptor. Figure originally shown in Lowe 2004 [57]

4.3 Speeded Up Robust Features

The speeded up robust features (SURF) algorithm provides a scale- and rotation-invariant 2D image feature detector and descriptor [61, 62]. The SURF algorithm achieves a comparable performance to SIFT in terms of repeatability, distinctiveness, and robustness, but exhibits a superior performance in regard to computational efficiency. It uses integral images to speed up the process of image convolution, a Hessian matrix-based measure for feature detection, and a distribution-based method for feature description. Its effectiveness for face recognition has been demonstrated in [63, 64]. The procedure of the SURF algorithm [61, 62] can briefly be described as follows.

4.3.1 Keypoint Detector

An integral image I_\Sigma(x) at a location x = (x, y)^T is simply defined as

I_\Sigma(x) = \sum_{i=1}^{x} \sum_{j=1}^{y} I(i, j).    (28)

Once the integral image is ready, only three additions are sufficient to calculate the sum of the intensities over any rectangular area in the image.
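The integral image and the constant-time box sum can be written in a few lines (an illustrative sketch; the rectangle convention with inclusive bounds is an assumption of this example).

```python
import numpy as np

def integral_image(image):
    """Integral image of Eq. 28: cumulative sums along both axes."""
    return image.astype(float).cumsum(axis=0).cumsum(axis=1)

def box_sum(ii, top, left, bottom, right):
    """Sum of intensities in the rectangle rows top..bottom, cols left..right (inclusive)."""
    total = ii[bottom, right]
    if top > 0:
        total -= ii[top - 1, right]
    if left > 0:
        total -= ii[bottom, left - 1]
    if top > 0 and left > 0:
        total += ii[top - 1, left - 1]
    return total

# ii = integral_image(face_image); s = box_sum(ii, 10, 10, 29, 29)
```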

The Hessian matrix H(x, σ) at pixel x with a scale σ is calculated as

H(x, \sigma) = \begin{bmatrix} L_{xx}(x, \sigma) & L_{xy}(x, \sigma) \\ L_{xy}(x, \sigma) & L_{yy}(x, \sigma) \end{bmatrix},    (29)

where L_{xx}(x, σ), L_{xy}(x, σ), and L_{yy}(x, σ) are the convolutions of the Gaussian second-order derivatives \frac{\partial^2}{\partial x^2}G(x, \sigma), \frac{\partial^2}{\partial x \partial y}G(x, \sigma) and \frac{\partial^2}{\partial y^2}G(x, \sigma) with the image I at pixel x, respectively. These second-order Gaussian derivatives are further approximated with box filters, which can be implemented very efficiently using integral images. These approximations are denoted by D_{xx}, D_{xy}, and D_{yy}. Fig. 11 shows the Gaussian second-order partial derivatives L_{yy}(x, σ) and L_{xy}(x, σ) and their approximations D_{yy} and D_{xy} using box filters.

Figure 11: The Gaussian second-order partial derivatives (L_{yy}(x, σ) and L_{xy}(x, σ)) and their approximations (D_{yy} and D_{xy}) using box filters. Figure originally shown in Bay et al. 2006 [61]

The determinant of the approximated Hessian matrix is calculated as

det(H(x, \sigma)) = D_{xx} D_{yy} - (0.9 D_{xy})^2.    (30)

The filter responses are further normalized with respect to the mask size to guarantee a constant Frobenius norm for different filter sizes. In contrast to SIFT, which uses image pyramids to construct scale spaces, SURF applies filters with different sizes directly onto the original image, so that the scale space is constructed by increasing the filter size rather than by iteratively reducing the image size. Because box filters and integral images are used, the computational time for each scale is exactly the same. The keypoints are finally determined by the maxima of the determinant of the Hessian matrix in each 3×3×3 neighborhood of the scale space along the scale and spatial dimensions.

4.3.2 SURF Descriptor

The first step is to determine a repeatable orientation for the local neighborhood around a keypoint. For that purpose, the Haar wavelet responses in the x and y directions in the neighborhood of the keypoint are calculated. The wavelet responses are then weighted using a Gaussian filter centered at the keypoint. The dominant orientation is finally estimated by calculating the sum of all responses within a sliding orientation window that covers an angle of \frac{\pi}{3}.

To extract an orientation-invariant feature descriptor, a square region centered at the keypoint is selected and oriented along the dominant orientation. The region is then regularly divided into 4 × 4 square sub-regions. Let d_x and d_y denote the Haar wavelet responses in the horizontal direction and the vertical direction, respectively. These wavelet responses are summed over each sub-region, and the same operation is applied to the absolute values of these responses, |d_x| and |d_y|, resulting in a vector (\sum d_x, \sum d_y, \sum |d_x|, \sum |d_y|). The vectors from all of the sub-regions are concatenated to form the SURF feature descriptor. The descriptor is further normalized into a unit vector to achieve invariance with respect to contrast. Consequently, the SURF descriptor is invariant to rotation, scale, brightness and contrast.


4.4 Local Binary Patterns

The local binary pattern (LBP) operator is a widely used feature descriptor for texture classification and face recognition [65, 66, 67]. LBP is highly discriminative, computationally efficient, and invariant to monotonic grey-level changes. The LBP operator assigns a binary label (i.e., 0 or 1) to each pixel (used as the center pixel) of an image by comparing its neighboring pixels to the center pixel value. The descriptor is then generated as the histogram of the binary labels.

Given a pixel I_c in the image, its neighboring pixels (I_0, I_1, \ldots, I_{P-1}) are selected. The LBP response at the pixel I_c is then calculated as

LBP = \sum_{p=0}^{P-1} s(I_p - I_c)\, 2^p,    (31)

where

s(x) = \begin{cases} 1, & x \ge 0 \\ 0, & x < 0 \end{cases}.    (32)

For the basic LBP, the neighboring pixels are selected using a 3×3 square neighborhood around the center pixel I_c. Fig. 12 gives an illustration of the basic LBP operator.

Figure 12: An illustration of the basic LBP

To calculate LBPs at different scales, the LBP operator is extended to encapsulate the information of neighborhoods with different sizes. The extended LBP selects the neighboring pixels as a set of sampling points distributed evenly along a circle with a center I_c and a radius of R (as shown in Fig. 13). The extended LBP is therefore able to encode the neighborhood with different radii and numbers of sampling points. Because the coordinates of each sampling point (relative to the center pixel I_c) are determined by -R\sin\left(\frac{2\pi p}{P}\right) and by R\cos\left(\frac{2\pi p}{P}\right), the sampling point may not lie exactly in the center of a pixel. In this case, the grey value of the sampling point is estimated by bi-linear interpolation. Note that, in the literature, LBP is usually used to denote the basic LBP operator, while LBP_{R,P} is used to represent the extended LBP, where R is the radius of the circle and P is the number of sampling points.

A total of 2^P different output values can be generated by the LBP_{R,P} operator, with some of the values corresponding to the same patterns following rotation. To remove the effect of rotation, a rotation-invariant LBP is proposed [66]:

LBP^{ri}_{R,P} = \min\{ROR(LBP_{R,P}, p) \mid p = 0, 1, \ldots, P - 1\},    (33)

where ROR(x, p) performs a circular bit-wise right shift on the P-bit number x by p times.

Figure 13: The circular (8,1), (16,2), and (8,2) neighborhoods for LBP_{R,P}. Figure originally shown in Ahonen 2006 [65]

The LBP^{ri}_{R,P} operator keeps only the rotation-invariant patterns, resulting in a significant reduction in the number of output patterns. For example, the number of patterns for LBP_{R,8} is 256, whereas the number of patterns for LBP^{ri}_{R,8} is only 36. However, the performance of LBP^{ri}_{R,P} is inferior to LBP_{R,P} [68]. To further improve the discriminative power and rotation invariance of LBP, another extension is proposed that uses only uniform patterns. A local binary pattern is determined to be uniform only if there are at most two 0/1 transitions in the pattern:

LBP^{riu2}_{R,P} = \begin{cases} \sum_{p=0}^{P-1} s(I_p - I_c), & \text{if } U(LBP_{R,P}) \le 2 \\ P + 1, & \text{otherwise} \end{cases}    (34)

where

U(LBP_{R,P}) = |s(I_{P-1} - I_c) - s(I_0 - I_c)| + \sum_{p=1}^{P-1} |s(I_p - I_c) - s(I_{p-1} - I_c)|.    (35)

There are a total of P + 1 uniform binary patterns, each assigned a unique label from 0 to P, with the nonuniform patterns all grouped under the miscellaneous label P + 1. Consequently, P + 2 distinct output values can be produced by LBP^{riu2}_{R,P}.

The final feature descriptor is generated as the histogram of the pattern labels accumulated over a patch of an image. LBP^{riu2}_{R,P} outperforms LBP_{R,P} and LBP^{ri}_{R,P} in terms of rotation invariance, feature dimensionality and discriminative power [68]. For 2D face recognition, the facial image is usually divided into several local regions. LBP descriptors are first extracted from each region and subsequently concatenated to form a final global descriptor of the face that encodes both the appearance and the spatial structures of the facial regions. The descriptor represents the face on three levels of locality. First, the patterns on the pixel level are encoded by the LBP labels. Second, the information on a regional level is represented by the histogram of the pattern labels over a small region. Finally, the global information of the face is described through the concatenation of the regional LBP histograms [69, 65].
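A basic LBP implementation and the regional histogram descriptor can be sketched as follows (illustrative only; the 3×3 neighborhood ordering and the region grid are assumptions of the example).

```python
import numpy as np

def basic_lbp(image):
    """Basic 3x3 LBP codes (Eqs. 31-32) for every interior pixel."""
    img = image.astype(np.int32)
    center = img[1:-1, 1:-1]
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]        # the 8 neighbours of the center
    codes = np.zeros_like(center)
    for p, (dr, dc) in enumerate(offsets):
        nb = img[1 + dr:img.shape[0] - 1 + dr, 1 + dc:img.shape[1] - 1 + dc]
        codes += (nb >= center).astype(np.int32) << p    # s(I_p - I_c) * 2^p
    return codes

def lbp_histogram(codes, num_bins=256):
    hist, _ = np.histogram(codes, bins=num_bins, range=(0, num_bins))
    return hist / max(hist.sum(), 1)

# Face descriptor: concatenate the histograms of rectangular regions of the LBP map.
# descriptor = np.concatenate([lbp_histogram(basic_lbp(face)[r0:r1, c0:c1])
#                              for (r0, r1, c0, c1) in region_grid])
```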

5 3D Holistic Features

These methods use information from the whole face or from large regions of the 3D face to generate feature vectors [36]. Examples of 3D holistic features include Eigenfaces (PCA), Fisherfaces (LDA), ICA, the iterative closest point (ICP) algorithm, extended Gaussian images, canonical forms, spherical harmonic features, and the tensor representation.

5.1 Iterative Closest Point

The iterative closest point (ICP) algorithm has been widely used to perform pointcloud registration for 3D face recognition [9, 70]. The ICP algorithm treats the original 3D facial data as features and does not extract any specific high-level features from the 3D facial data. The ICP algorithm starts with a search of the closest points between two pointclouds. These closest point pairs (i.e., correspondences) are used to calculate a rigid transformation (rotation R and translation t) between these two pointclouds. The estimated transformation is then applied to one of the pointclouds to make the two pointclouds "closer" to each other. The procedure is iterated until the mean square error (MSE) of the correspondences is below a threshold or until a set maximum number of iterations is reached.

Given two pointclouds P = [p_1, p_2, \ldots, p_{N_p}] and Q = [q_1, q_2, \ldots, q_{N_q}], where p_i and q_j are the points in the pointclouds P and Q, respectively, the closest point to p_i in the pointcloud Q can be obtained by

q_{i'} = \arg\min_{q_j \in Q} \|p_i - q_j\|.    (36)

The Euclidean distance is commonly used in Equation 36 to measure the difference between two points. The simplest way to find the closest point is to perform a brute force search. A faster alternative is to use an appropriate data structure or an indexing method, e.g., a k-d tree. The point pair c_i = (p_i, q_{i'}) is considered a correspondence, with its associated distance being denoted as d_i. Consequently, a set of correspondences C = {c_i} can be generated from the pointclouds P and Q. The correspondence set is then used to generate a rigid transformation (R, t). Several filters can be applied to C to further improve the accuracy and robustness of the subsequent transformation estimates. Standard filters include those using the distances between corresponding points, the surface normal variations between corresponding points, and boundary checks.

The task of transformation estimation is to find the rotation matrix R and translation vector t that minimize the MSE e between corresponding points in C:

e = \frac{1}{N_c} \sum_{k=1}^{N_c} \|q_{i'} - R p_i - t\|.    (37)

The rotation matrix R can be calculated using a quaternion-based algorithm or a singular value decomposition (SVD) algorithm [71]. Once R is obtained, the translation vector t is calculated as

t = \bar{q} - R\bar{p},    (38)

where \bar{q} = \frac{1}{N_c}\sum_{k=1}^{N_c} q_{i'} and \bar{p} = \frac{1}{N_c}\sum_{k=1}^{N_c} p_i. The pointcloud P is then transformed by replacing each point p_i with R p_i + t. The aforementioned process is repeated again between the transformed pointcloud P and the original pointcloud Q. The process continues until the MSE e is below a pre-defined threshold or until the number of iterations reaches a maximum set limit. The MSE e can be used alone or in combination with other measures as a matching metric between two 3D faces [9].

The ICP algorithm requires a coarse initial alignment between two 3D faces to ensure that it converges to the global minimum rather than to a local minimum. Pose correction or feature-based initial face alignment should therefore be performed prior to the ICP registration [9]. The ICP algorithm only performs well on rigid objects. Because human faces are non-rigid and can be deformed due to facial expressions, several 3D face recognition approaches have been proposed based on the ICP algorithm to only register the rigid or semi-rigid parts of faces [9, 72, 73]. The ICP algorithm is additionally known to be very time consuming.
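A minimal point-to-point ICP iteration is sketched below (illustrative only; it assumes roughly pre-aligned (N, 3) pointclouds and uses an SVD-based rotation estimate with a k-d tree for the closest-point search).

```python
import numpy as np
from scipy.spatial import cKDTree

def icp(P, Q, max_iter=50, tol=1e-6):
    """Point-to-point ICP between two (N, 3) pointclouds (Eqs. 36-38)."""
    P = np.asarray(P, dtype=float).copy()
    Q = np.asarray(Q, dtype=float)
    tree = cKDTree(Q)                              # k-d tree for closest-point queries
    R_total, t_total = np.eye(3), np.zeros(3)
    prev_error = np.inf
    for _ in range(max_iter):
        dists, idx = tree.query(P)                 # Eq. 36: closest point in Q for each p_i
        matched = Q[idx]
        p_bar, q_bar = P.mean(axis=0), matched.mean(axis=0)
        H = (P - p_bar).T @ (matched - q_bar)      # cross-covariance of the correspondences
        U, _, Vt = np.linalg.svd(H)
        D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
        R = Vt.T @ D @ U.T                         # rotation minimizing Eq. 37
        t = q_bar - R @ p_bar                      # Eq. 38
        P = P @ R.T + t                            # apply the estimated transformation
        R_total, t_total = R @ R_total, R @ t_total + t
        error = (dists ** 2).mean()                # MSE of the current correspondences
        if abs(prev_error - error) < tol:
            break
        prev_error = error
    return R_total, t_total, error                 # the final MSE can serve as a match score

# R, t, mse = icp(probe_points, gallery_points)
```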

5.2 Eigenfaces, Fisherfaces and ICA

Eigenfaces, Fisherfaces and ICA are popular algorithms for 2D face recognition (see Sections 3.1, 3.2 and 3.4), and they have also been successfully extended to 3D face recognition.

5.2.1 Eigenfaces

The Eigenfaces (PCA) algorithm has been widely used to perform 3D face recognition from either depth images or pointclouds [74, 75, 37, 76, 77, 78]. It is regarded as a baseline for the evaluation of 3D face recognition algorithms, for example, in the Face Recognition Grand Challenge (FRGC) v2.0 [79]. The process to generate 3D Eigenfaces is almost the same as for 2D Eigenfaces (see Section 3.1 for more details), with the major difference being that the one-dimensional vector X_i of intensity values in Section 3.1 is replaced by a one-dimensional vector of depth values (for depth images) or a three-dimensional vector of point coordinates (for pointclouds). The face recognition pipeline of 3D Eigenfaces is the same as that of 2D Eigenfaces. A set of 3D Eigenfaces is first generated by performing PCA on the 3D data of the training faces. All gallery 3D faces are then projected onto the subspace spanned by the Eigenfaces. Once a probe 3D face is provided, it is projected onto the subspace and compared to all of the gallery faces to obtain the recognition results.

The PCA algorithm can be used to address facial expressions by including expressive faces in the training dataset, as demonstrated in [80]. The PCA algorithm can also be used to model facial deformations caused by expressions [36]. For example, Al-Osaimi et al. [37] produced patterns of expression deformations using eigenvectors from the training data. These learned patterns are then used to morph out the expression deformations. Consequently, the facial deformations caused by facial expressions are separated from those caused by interpersonal disparities, making the face recognition system more robust against facial expressions.
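As an illustration of this pipeline, the following sketch builds a 3D Eigenface subspace from a set of training depth images and matches a probe by nearest neighbour in that subspace; the array shapes and the choice of 50 components are assumptions of the example rather than values from the cited works.

```python
import numpy as np

def train_eigenfaces(train_depths, n_components=50):
    """train_depths: (N, H, W) array of preprocessed, registered depth images."""
    X = train_depths.reshape(len(train_depths), -1).astype(np.float64)
    mean = X.mean(axis=0)
    # PCA via SVD of the centred data matrix
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    eigenfaces = Vt[:n_components]           # each row is a 3D Eigenface
    return mean, eigenfaces

def project(depth, mean, eigenfaces):
    return eigenfaces @ (depth.reshape(-1) - mean)

def identify(probe_depth, gallery_depths, gallery_ids, mean, eigenfaces):
    """Nearest-neighbour matching of the probe against the projected gallery."""
    gallery_feats = np.stack([project(g, mean, eigenfaces) for g in gallery_depths])
    probe_feat = project(probe_depth, mean, eigenfaces)
    dists = np.linalg.norm(gallery_feats - probe_feat, axis=1)
    return gallery_ids[int(np.argmin(dists))]
```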

5.2.2 Fisherfaces

3D Fisherfaces (also known as Fishersurfaces) are generated by performing LDA on the 3D facial data. The process to generate 3D Fisherfaces is almost the same as that for 2D Fisherfaces (see Section 3.2 for more details). It was demonstrated that, when multiple training 3D faces are available for each subject, the 3D Fisherfaces algorithm produces significantly better results than the 3D Eigenfaces algorithm [81]. It was also shown that the 3D Fisherfaces algorithm is superior to the 2D Fisherfaces algorithm, and that the 2D+3D Fisherfaces algorithm outperforms both [82]. Note that LDA can also be used in combination with PCA or other feature extraction methods, as described in [83, 27, 29].

5.2.3 ICA

ICA has been used to recognize 3D faces from range images [84]. Compared to PCA, ICA uses not only the second-order statistical properties but also the higher-order relations. It projects the 3D facial data onto a set of statistically independent basis vectors. For more details on the ICA technique, the reader is referred to Section 3.4. It was demonstrated that the performance of 3D ICA is better than that of 3D PCA (Eigenfaces) [84].

5.3 Extended Gaussian Image

The extended Gaussian image (EGI) has been used for 3D symmetry detection and 3D face recognition [85, 86]. The first step in extracting the EGI feature is to define a Gaussian sphere, generated by dividing a sphere into a number of cells using the azimuth angle θ and the elevation angle φ of a spherical coordinate system, which are equally divided into N_θ and N_φ bins, respectively. Each cell c_{ij} is determined by its associated angles θ_i and φ_j:

θ_i = \frac{2π}{N_θ} \left( i + \frac{1}{2} \right), i = 0, 1, \ldots, N_θ ,   (39)

φ_j = \frac{π}{N_φ} \left( j + \frac{1}{2} \right), j = 0, 1, \ldots, N_φ .   (40)

The surface normals of a 3D face are then mapped onto the Gaussian sphere. Each surface normal of the 3D face can be represented in a spherical coordinate system by the vector (r, θ_n, φ_n), where the length of the surface normal (i.e., r) is 1 and θ_n and φ_n are the azimuth and elevation angles. The corresponding cell of the surface normal in the Gaussian sphere can be easily determined by checking its azimuth and elevation angles. The number of surface normals falling into each cell of the Gaussian sphere is counted, resulting in an N_θ × N_φ vector. Each entry in the vector therefore represents the frequency of 3D points whose corresponding surface normals are mapped to the corresponding cell of the Gaussian sphere. The vector is then normalized by the number of 3D points, resulting in the EGI feature.
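A compact sketch of this binning procedure is shown below; it assumes the per-point unit normals have already been estimated for the facial pointcloud, and the bin counts (18 × 9) are arbitrary example values.

```python
import numpy as np

def extended_gaussian_image(normals, n_theta=18, n_phi=9):
    """normals: (N, 3) array of unit surface normals of a 3D face.

    Returns the normalized N_theta x N_phi histogram (flattened EGI feature).
    """
    nx, ny, nz = normals[:, 0], normals[:, 1], normals[:, 2]
    theta = np.arctan2(ny, nx) % (2 * np.pi)        # azimuth in [0, 2*pi)
    phi = np.arccos(np.clip(nz, -1.0, 1.0))         # elevation (colatitude) in [0, pi]
    # Map each normal to a cell of the Gaussian sphere
    ti = np.minimum((theta / (2 * np.pi) * n_theta).astype(int), n_theta - 1)
    pj = np.minimum((phi / np.pi * n_phi).astype(int), n_phi - 1)
    hist = np.zeros((n_theta, n_phi))
    np.add.at(hist, (ti, pj), 1.0)
    return (hist / len(normals)).ravel()            # normalize by the number of points
```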

5.4 Canonical Form

Bronstein et al. [87, 88] treated facial surfaces as deformable objects in a Riemannian geometry and assumed that facial expressions can be modeled as isometries of the facial surface. Specifically, the intrinsic geometric properties of facial surfaces are invariant to expression variations. The task of finding an expression-invariant face representation is therefore transformed into the task of finding an isometry-invariant facial representation. On this basis, they proposed an expression-invariant facial representation using bending-invariant canonical forms (as shown in Fig. 14). In the proposed 3D face recognition algorithm, isometric embedding plays a core role. The isometric embedding is achieved by first calculating the geodesic distance between any two points on the facial surface, followed by multi-dimensional scaling (MDS). The process to generate the canonical form of a 3D facial surface is described below.

Let (S, g) and (Q, h) be two isometric Riemannian surfaces that represent two facial surfaces with different expressions of the same subject. If the two surfaces are related by an isometry f, then their geodesic distances are preserved. For each point pair (p_i, p_j) on the surface S and their corresponding pair of points (f(p_i), f(p_j)) on the surface Q, we have d_S(p_i, p_j) = d_Q(f(p_i), f(p_j)).

In practice, the surfaces S and Q can be sampled using different numbers and orders of sampling points. Therefore, it is impossible to directly use the distance matrix of the facial surface as an invariant representation [87]. Instead, isometric embedding is performed to represent the Riemannian surface in an embedding space M^m. Isometric embedding is a mapping φ between two finite metric spaces such that \tilde{d}_{ij} = d_{ij} for all i, j = 1, 2, ..., N:

φ : (\{p_1, p_2, \ldots, p_N\} \subset S, D) \to (\{\tilde{p}_1, \tilde{p}_2, \ldots, \tilde{p}_N\} \subset M^m, \tilde{D}) ,   (41)

where D = \{d_{ij}\} and \tilde{D} = \{\tilde{d}_{ij}\} represent the mutual geodesic distances between the points in the original and embedding spaces, respectively. The canonical form of (S, g) is generated by mapping the image of {p_1, p_2, ..., p_N} under φ using the MDS technique. Note that the canonical form is defined up to an isometry group in the space M^m, so that the ambiguity is up to a translation, rotation and reflection transformation in a Euclidean space. To resolve this ambiguity, each canonical form is aligned. The 3D faces are ultimately matched based on the high-order moments of their canonical forms.

Note that the embedding error is optimized globally and that the canonical form of a 3D face is affected by all of the points of the facial surface. It is therefore important to segment the same region of the facial surface for different 3D faces to achieve a robust performance.

Figure 14: The canonical forms (subfigures f-j) of faces with strong facial expressions (subfigures a-e). Figure originally shown in Bronstein et al. 2005 [87].

5.5 Spherical Harmonic Features

The spherical harmonic features (SHF) were proposed to encode 3D faces using the energies contained in spherical harmonics at different frequencies [34]. Each 3D face is first represented by a spherical depth map (SDM), and an SHF representation is then generated from the SDM.

To generate the SDM of a 3D face, a sphere is first fitted to the pointcloud of the 3D face. The 3D facial surface is then translated to the center of the fitted sphere to achieve translation invariance, and the scale of the Cartesian coordinates of the 3D face is normalized to a uniform length. Next, the coordinates of the input 3D face are transformed into a spherical coordinate system with coordinates (r, θ, φ). The pose of the 3D face is also normalized to a uniform orientation. The SDM of the 3D face is finally extracted by performing interpolation on a grid of (θ, φ).

The spherical harmonics of the SDM are then used to represent the 3D face:

h_{lm} = \int_0^{2π} \int_0^{π} f(θ, φ) \, y_l^m(θ, φ) \sin θ \, dθ \, dφ ,

where f(θ, φ) is the SDM of the 3D face and y_l^m(θ, φ) is defined as

y_l^m(θ, φ) = \begin{cases} \sqrt{2} K_l^m \cos(mφ) P_l^m(\cos θ), & m > 0 \\ \sqrt{2} K_l^m \sin(-mφ) P_l^{-m}(\cos θ), & m < 0 \\ K_l^0 P_l^0(\cos θ), & m = 0 \end{cases}

where K_l^m is a normalization constant and P_l^m is the associated Legendre polynomial. The energies of the coefficients h_{lm} at different frequencies (degrees l) are then collected to form the SHF feature vector [34].
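To make the construction concrete, the sketch below numerically approximates the coefficients h_lm on a regular (θ, φ) grid and sums their squared magnitudes per degree l as a simple energy feature. It uses SciPy's complex spherical harmonics rather than the real basis above, which is an implementation convenience of this sketch and not the formulation of [34]; the grid resolution and l_max are example values.

```python
import numpy as np
from scipy.special import sph_harm

def shf_energies(sdm, l_max=10):
    """sdm: (n_theta, n_phi) spherical depth map on a regular grid,
    with theta (colatitude) in [0, pi] and phi (azimuth) in [0, 2*pi).

    Returns one energy value per degree l = 0..l_max.
    """
    n_theta, n_phi = sdm.shape
    theta = (np.arange(n_theta) + 0.5) * np.pi / n_theta
    phi = (np.arange(n_phi) + 0.5) * 2 * np.pi / n_phi
    T, P = np.meshgrid(theta, phi, indexing="ij")
    area = np.sin(T) * (np.pi / n_theta) * (2 * np.pi / n_phi)   # quadrature weights

    energies = np.zeros(l_max + 1)
    for l in range(l_max + 1):
        for m in range(-l, l + 1):
            Y = sph_harm(m, l, P, T)          # scipy takes (m, l, azimuth, colatitude)
            h_lm = np.sum(sdm * np.conj(Y) * area)
            energies[l] += np.abs(h_lm) ** 2
    return energies
```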

6 3D Local Features

These methods extract features from local regions (e.g., landmarks, curves, or patches) of the 3D facial surface.

6.1 Landmark-based Features

These algorithms use the positions of, and the geometric relationships between, a set of facial landmarks as features. Landmarks are commonly detected from the Gaussian curvature K and the mean curvature H of the facial surface. For example, the nose tip is expected to be a peak with a Gaussian curvature K > τ_K and a mean curvature H < τ_H, each eye cavity is expected to be a pit with K > τ_K and H > τ_H, and the nose bridge is expected to be a saddle region with K < τ_K and H < τ_H, where τ_K and τ_H are thresholds close to zero [72, 92]. Landmarks can also be detected using the shape index values of the facial points [70], the curvature properties of the facial profile curves [93], or the radial symmetry of the 3D facial surface [94]. The major disadvantage of the landmark-based features is their sparsity. Specifically, part of the useful facial information is not represented by these features [36].

A representative example of the landmark-based features is the anthropometric facial distance feature [83]. Gupta et al. [83] used established anthropometric facial proportions of the human face and the textural and/or shape information of the facial points to detect 10 anthropometric fiducial points (as shown in Fig. 15). The 3D Euclidean and geodesic distances between these anthropometric facial fiducial points are calculated and used as facial features. The geodesic distances along the facial surface are calculated using Dijkstra's shortest path algorithm [95]. From each of the sets of Euclidean and geodesic distance features, a subset of the most discriminative distance features is selected using the stepwise linear discriminant analysis algorithm [96]. The selected Euclidean and geodesic distance features are then combined, and the stepwise linear discriminant analysis algorithm is used again to identify the final combined set of anthropometric facial distance features. These features are further projected onto a low-dimensional space using the linear discriminant analysis technique (see Section 3.2). Note that a frontal upright position of the face is required to achieve robust fiducial point detection [83].

Figure 15: The 10 anthropometric facial landmarks used for feature extraction on a color image (subfigure (a)) and a range image (subfigure (b)). Figure originally shown in Gupta et al. 2010 [83].

Likewise, Gordon [97] used the left and right eye widths; the eye separation; the total width of the eyes; the nose height, width, and depth; and the head width as features for face recognition. Moreno et al. [98] segmented the facial surface into regions using the HK segmentation method and proposed 86 features (including the areas, distances, angles, and average curvature of each segmented region) for face recognition. Xu et al. [99] used Gaussian-Hermite moments to encode the areas around a set of landmarks (e.g., the mouth, nose, and left and right eyes).

6.2 Curve-based Features

These algorithms use a set of curves as features to represent a 3D face. The curves can be contours or profiles. Contours are closed and non-intersecting curves with different lengths, whereas profiles are open curves with a starting and an end point [36]. Curve-based features are usually more discriminative than landmark-based features because they are denser (less sparse) and because they encode more geometric information about the underlying surface. Additionally, curve-based features are able to capture shape information from different subregions of the facial surface, which improves the robustness with respect to facial expressions. However, these algorithms rely on the accurate localization of the profiles, and part of the facial shape information is still not represented by these features [36].

6.2.1 Contour-based Features

Facial contours can be classified into iso-depth curves, iso-radius curves, and iso-geodesic curves.

Iso-depth Curves
Iso-depth curves are obtained by extracting the intersections between the 3D facial surface and a set of planes that move along a specified direction. Let S be a facial surface and F : S → R be a continuous function so that a set of level curves {C_λ} of F can be extracted as C_λ = {p ∈ S | F(p) = λ}. In [100], F is selected as the depth value, i.e., F(p) = z_p, where z_p is the z component of the point p ∈ R^3. Consequently, the extracted level curves are iso-depth curves because the points on each curve have the same depth value. The 3D facial surface S can be reconstructed from these iso-depth curves by S = ∪_λ C_λ, {C_λ | λ ∈ R^+}. In practice, the surface S can be approximated by the level curves {C_λ} using a limited number of samples of λ. Fig. 16 (a) shows an example of a 3D facial surface and its corresponding iso-depth curves.

For a smooth surface S, the depth function F is smooth and easy to calculate, and the resulting iso-depth curves {C_λ} of F are planar curves that are easy to compare. The shapes of these curves are invariant to rigid transformations of the 3D face in the plane that is perpendicular to the z-axis. To compare two 3D faces S^1 and S^2, the geodesic length d(C_λ^1, C_λ^2) between the corresponding iso-depth curves C_λ^1 and C_λ^2 must first be calculated. The distance between S^1 and S^2 is defined as the Euclidean mean d_e or the geometric mean d_g of the distances between the corresponding iso-depth curves of the two facial surfaces:

d_e(S^1, S^2) = \left( \sum_{λ \in Λ} d(C_λ^1, C_λ^2)^2 \right)^{1/2} ,   (50)

d_g(S^1, S^2) = \left( \prod_{λ \in Λ} d(C_λ^1, C_λ^2) \right)^{1/|Λ|} ,   (51)

where the set Λ is obtained by uniform sampling of the facial depth values. Iso-depth curves have also been investigated in [101] for 3D face recognition.

Figure 16: Contour-based features: (a) iso-depth curves; (b) iso-geodesic curves. Figure originally shown in Samir et al. 2006 [100] and in Samir et al. 2009 [102].

Iso-radius Curves
Iso-radius curves are generated by extracting the intersections between the 3D facial surface and a set of spheres or cylinders with different radii. In [103], an iso-radius curve was extracted from the intersection between the 3D facial surface and a cylinder with a center axis parallel to the z-axis. Different curves can be generated by changing the radius of the cylinder.

Iso-geodesic Curves
An iso-geodesic curve is defined by the points that have equal geodesic distances to a reference point (e.g., the nose tip). Iso-geodesic curves are more robust to facial expressions. Let S be a facial surface, and let p_r be a prominent reference point (e.g., the nose tip) on S. A function dist : S × S → R is defined as the geodesic distance function from the reference point p_r to any point on the facial surface S. A set of level curves {C_λ} of dist can then be extracted as

C_λ = \{ p \in S \mid dist(p_r, p) = λ \} \subset S, \quad λ \in [0, \infty) .

In [102], the definition of a curve was expanded to include a thin strip rather than a curve, so that C_λ includes all the points with distances to the reference point p_r in the range [λ − δ, λ + δ]. The width of the strip is defined by 2δ, and it is usually small. Iso-geodesic curves are invariant to rigid transformations and can be used to reconstruct the 3D facial surface [102]. Riemannian analysis is then used to compare the iso-geodesic curves between two 3D facial surfaces. Fig. 16 (b) shows an example of a 3D facial surface and its corresponding iso-geodesic curves. Iso-geodesic curves have also been investigated in [104, 105, 106, 101, 107, 108] for 3D face recognition.

6.2.2 Profile-based Features

In [103], vertical and horizontal profiles were extracted as the intersections of the 3D facial surface with sets of planes parallel to the yz and xz planes, respectively. The range data along each profile are used as the feature descriptor of the profile, and the Euclidean distance is used to compare two profiles. Good results are achieved by the vertical profiles in the central region, which contains the nose, the mouth and the inner corners of the eyes. The horizontal profiles of the upper part of the face achieve a better recognition performance than those of the lower part. It has been found that the central vertical profile outperforms all the other profiles [103, 109, 93].

In [109, 110], the central and lateral vertical profiles were extracted by looking for the vertical symmetry axis of the Gaussian curvature values of the 3D facial surface [111]. The local curvature values along the profiles were used to form the feature vectors. In [112, 113], the symmetry plane of the facial surface was determined by surface alignment. Two horizontal profiles were then extracted across the nose and forehead areas, and the partial Hausdorff distance was subsequently used to match the profiles.

In [114], angular radial signatures (ARSs) were proposed to represent a 3D face. ARSs are defined as a set of curves originating from the nose tip with different angles from the x-axis. To speed up the feature extraction process, a binary mask is first defined on the xy plane (see Fig. 17 (a)). The binary mask contains 17 paths, each containing 20 points. Each path corresponds to one direction of the ARSs and can be considered as the projection of the corresponding ARS onto the xy plane. The depth value at each point of these paths is used to form the feature descriptor. Consequently, a 3D facial surface is represented by 17 ARSs, each of dimensionality 20, as shown in Fig. 17 (b). These ARS features are fed to an SVM for face recognition. The major advantage of this algorithm is its high computational efficiency. In [115], radial curves are investigated for 3D face recognition.

Figure 17: An illustration of the angular radial signature (ARS). (a) The binary mask used for ARS extraction. (b) The 17 ARSs extracted from the semi-rigid region of a face. Figure originally shown in Lei et al. [114].

6.3 Patch-based Features

These algorithms generate local features using the geometric information of several local patches of the 3D facial surface. A number of patch-based features have been proposed, with a few representative examples listed below. For additional 3D keypoint detection and local feature description algorithms, the reader should refer to the review papers [116, 20, 117].

6.3.1 Point Signatures

Point signatures were initially proposed for 3D object recognition [118, 20] and then extended to the area of 3D face recognition [56, 119, 25]. Given a point p, a 3D curve C can be obtained as the intersection between the 3D facial surface S and a sphere of radius r centered at the point p. A plane P is fitted to the curve C, and the normal n_1 of the fitted plane is calculated. Next, a new plane P' is obtained by translating the fitted plane P to the point p along the direction n_1. A planar curve C' is generated by projecting the 3D curve C onto the plane P', and a signed distance is then calculated for each point on C by measuring the distance between the point on C and its corresponding point on C'. A reference direction n_2 is selected as the unit vector from p to the projected point on C' with the largest positive distance. Consequently, the orientation of the curve C is determined by the local reference frame formed by n_1, n_2 and n_1 × n_2. Finally, each point on C is characterized by d(θ), where d is the signed distance from the point to its corresponding point on C' and θ is the clockwise rotation angle about n_1 from the reference direction n_2. The angle θ is sampled in the range [0°, 360°] with an interval of Δθ (e.g., 10°). The discrete set of values d(θ_i), i = 1, 2, ..., N_θ, is used to represent the local facial surface.

6.3.2 Fitted Local Surface

Mian et al. [120] proposed 3D keypoint detection and fitted-local-surface-based feature description methods for 3D face recognition. Given a pointcloud of a face [x_i, y_i, z_i], i = 1, 2, ..., N_p, the face is subsampled at uniform intervals in the x and y dimensions, and a local surface L is then cropped from the facial pointcloud at each sample point p using a sphere of radius r. The principal axes of the local surface L are calculated using the PCA technique, and the local surface is aligned with its principal axes. The difference δ between the lengths of the local surface along the first two principal axes is used to detect a set of keypoints. If the difference δ is above a threshold, the sample point p is considered a keypoint; otherwise, it is rejected. For each keypoint, a local feature is extracted from its neighborhood L'. A surface is fitted to the points in L' using the D'Errico algorithm, and the fitted surface is then sampled on a uniform 20 × 20 lattice, resulting in a feature vector with a dimension of 400. The feature vector is further compressed through a projection onto a subspace defined by the eigenvectors corresponding to the largest eigenvalues. The pose invariance of these feature vectors is guaranteed by the use of local reference frames.

6.3.3 Signature of Histograms of Orientations

The signature of histograms of orientations (SHOT) has been successfully used for 3D face recognition [121] and 3D object recognition [122, 123, 20]. Given a keypoint p and its neighboring points p_i within a radius r, a weighted covariance matrix C is calculated as

C = \frac{1}{\sum_{i: d_i \le r} (r - d_i)} \sum_{i: d_i \le r} (r - d_i)(p_i - p)(p_i - p)^T ,   (52)

where d_i = ‖p_i − p‖. Three eigenvectors x^+, y^+ and z^+ are obtained from an eigenvalue decomposition of the covariance matrix C and are disambiguated to generate a repeatable local reference frame. The disambiguated x-axis is defined by

x = \begin{cases} x^+, & |S_{x^+}| \ge |S_{x^-}| \\ x^-, & \text{otherwise} \end{cases} ,   (53)

where S_{x^+} = {i : d_i ≤ r and (p_i − p) · x^+ ≥ 0}, S_{x^-} = {i : d_i ≤ r and (p_i − p) · x^- > 0}, and x^- is the opposite vector of x^+. The disambiguated z-axis is generated using the same procedure, whereas the y-axis is obtained as z × x. The neighboring points are then aligned with the local reference frame to achieve invariance with respect to rigid transformations.

Next, an isotropic spherical grid is used to divide the spherical support of the keypoint into 8 azimuth divisions, 2 elevation divisions and 2 radial divisions. In total, there are 32 volumetric regions, as shown in Fig. 18. In each region, a local histogram is generated by accumulating point counts into bins according to the variations between the normals at the points within the region and the normal at the keypoint. The local histograms of all the regions are concatenated and normalized to form the final feature descriptor.


Figure 18: Signature of histograms of orientations (SHOT). Figure originally shown in Tombari et al. 2010 [122]
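The weighted covariance and sign disambiguation of Equations 52 and 53 translate almost directly into code; the sketch below computes the local reference frame only (the histogram part is omitted) and is an illustrative reading of the method, not the reference implementation of [122].

```python
import numpy as np

def shot_local_reference_frame(p, neighbors, r):
    """p: (3,) keypoint; neighbors: (N, 3) points with ||p_i - p|| <= r."""
    diff = neighbors - p
    d = np.linalg.norm(diff, axis=1)
    w = r - d
    C = (diff.T * w) @ diff / w.sum()               # weighted covariance, Eq. (52)
    eigvals, eigvecs = np.linalg.eigh(C)
    order = np.argsort(eigvals)[::-1]               # columns by decreasing eigenvalue
    x_plus, z_plus = eigvecs[:, order[0]], eigvecs[:, order[2]]

    def disambiguate(axis):
        # Keep the sign pointing towards the majority of the neighbors, Eq. (53)
        n_pos = np.count_nonzero(diff @ axis >= 0)
        return axis if n_pos >= len(diff) - n_pos else -axis

    x = disambiguate(x_plus)
    z = disambiguate(z_plus)
    y = np.cross(z, x)                              # y-axis obtained as z x x
    return x, y, z
```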

6.3.4 3D Geometric Features

Lei et al. [124] represented each facial region (e.g., the nose, the eye-forehead, and the mouth areas) using multiple spatial triangles. One vertex of each triangle was selected as the nose tip, while the other two vertices were randomly selected from the local facial region, as shown in Fig. 19 (a). Four types of low-level geometric features were then generated from these triangles, as shown in Fig. 19 (b). The first feature is defined as the angle A between the two lines determined by the two randomly selected vertices and the nose tip. The second feature is defined as the radius C of the circle circumscribed to the triangle. The third feature is defined as the distance D between the two randomly selected vertices, and the last feature is defined as the angle N between the line defined by the two randomly selected vertices and the z-axis. Once the four types of features have been generated, each is first normalized to the range (−1, +1) and aggregated into a histogram with dimension m. The four histograms are concatenated to form the final feature descriptor. Several other geometric attributes (including mask angles, pairwise geodesic distances, triangle areas, Gaussian curvatures and mean curvatures) were extracted in [125] to represent a 3D face.

Figure 19: An illustration of the 3D geometric features. (a) One of the triangles used for feature extraction. (b) Four types of low-level geometric features. Figure originally shown in Lei et al. 2013 [124].
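A possible implementation of the four triangle attributes for a single random triangle is sketched below; the aggregation into m-bin histograms over many sampled triangles follows the description above, and all names, bin counts and normalization ranges are illustrative assumptions.

```python
import numpy as np

def triangle_features(nose_tip, v1, v2):
    """Angle A, circumradius C, distance D and slant angle N for one triangle."""
    a, b = v1 - nose_tip, v2 - nose_tip
    # A: angle at the nose tip between the two randomly selected vertices
    A = np.arccos(np.clip(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)), -1, 1))
    # D: distance between the two randomly selected vertices
    D = np.linalg.norm(v2 - v1)
    # C: circumradius of the triangle (product of side lengths / (4 * area))
    sides = np.array([D, np.linalg.norm(a), np.linalg.norm(b)])
    area = 0.5 * np.linalg.norm(np.cross(a, b))
    C = np.prod(sides) / (4 * area + 1e-12)
    # N: angle between the segment v1-v2 and the z-axis (line direction, so use abs)
    seg = (v2 - v1) / (np.linalg.norm(v2 - v1) + 1e-12)
    N = np.arccos(np.clip(abs(seg[2]), 0, 1))
    return A, C, D, N

def histogram_descriptor(values, m=16, value_range=(-1.0, 1.0)):
    """Aggregate one normalized feature type over many triangles into an m-bin histogram."""
    hist, _ = np.histogram(values, bins=m, range=value_range, density=True)
    return hist
```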

6.3.5 Mesh Scale Invariant Feature Transform

The mesh scale invariant feature transform (MeshSIFT) algorithm consists of four major components: keypoint detection, orientation assignment, local feature description, and feature matching [126]. First, a scale space is constructed by smoothing the input mesh S with a set of approximated Gaussian filters. The mean curvature H is calculated at each vertex i and at each scale s in the scale space, and the difference between any two subsequent scales is computed. The extrema (i.e., the minima and maxima) in the scale space are taken as the locations of keypoints, and the scale s at which an extremum is detected is selected as the scale of the keypoint. Fig. 20 (a) shows detected keypoints on a 3D facial surface.

The normal vectors of the neighboring vertices of a keypoint are then projected onto the tangent plane at the keypoint, as shown in Fig. 20 (b). These projected normal vectors are gathered into a weighted histogram with 360 bins, in which the highest peak of the histogram and those peaks with values above 80% of the highest peak are selected as canonical orientations. A feature descriptor is then generated in the form of histograms over nine circular subregions, as shown in Fig. 20 (c). In each subregion, two 8-bin histograms are computed: the histogram of shape index values, H_SI, and the histogram of the slant angles (i.e., the angles between the projected normal vectors and the canonical orientation). The histograms of all subregions are concatenated to form the final feature descriptor. Finally, feature matching is performed by calculating the angles between each query feature descriptor and all the probe feature descriptors.

Figure 20: An illustration of the MeshSIFT algorithm. (a) Keypoint detection. (b) Orientation assignment. (c) Local feature description. Figure originally shown in Smeets et al. 2013 [126].

6.3.6 Mesh Histogram of Oriented Gradients

The mesh histogram of oriented gradients (MeshHOG) algorithm has been successfully used for 3D face recognition and 3D shape matching [127, 121]. Given a keypoint p, its support region N(p) is defined using a geodesic ball of radius r. A local reference frame is then built using the unit vector n orthogonal to the tangent plane T_i at the point p and a pair of orthonormal vectors {a, a × n} residing in the plane T_i. To determine the unit vector a, the neighboring points in N(p) are projected onto the plane T_i. By considering the gradient magnitudes and the geodesic distances from the point p, the projected points in each bin are counted to generate a polar histogram. The vector a is chosen as the direction associated with the dominant bin of the histogram.

The gradient vectors of all neighboring points in N(p) are projected onto the three planes associated with the local reference frame, as shown in Fig. 21 (a). For each plane, a two-level histogram is computed. First, the plane is divided into b_s polar slices, as shown in Fig. 21 (b), so that each neighboring point falls into one of the slices during projection. The space in each spatial slice is further divided into b_o orientation slices, as shown in Fig. 21 (b). The projected gradient vectors ∇f(p_j) of the points p_j ∈ N(p) in the corresponding spatial slice are used to determine the orientation slices. The final descriptor is generated by concatenating all the histograms from the three orthonormal planes and is further normalized using the L2 norm to make the descriptor invariant to mesh sampling.

Figure 21: An illustration of the MeshHOG algorithm. (a) The three orthogonal planes. (b) The spatial polar slices and orientation slices on a plane. Figure originally shown in Zaharescu et al. 2012 [127].

6.3.7 Rotational Projection Statistics

The rotational projection statistics (RoPS) feature has been successfully used in many tasks (including 3D modeling and 3D object recognition) with prominent performance [128, 129, 130, 131]. Given a keypoint p and a support radius r, the local surface S, which contains N_t triangles and N_v vertices, is cropped from the facial mesh. Assuming that the i-th triangle contains the vertices p_{i1}, p_{i2} and p_{i3}, the scatter matrix of the i-th triangle is calculated as

C_i = \frac{1}{12} \sum_{j=1}^{3} \sum_{k=1}^{3} (p_{ij} - p)(p_{ik} - p)^T + \frac{1}{12} \sum_{j=1}^{3} (p_{ij} - p)(p_{ij} - p)^T .   (54)

The overall scatter matrix of the local surface S is calculated as

C = \sum_{i=1}^{N_t} ω_{i1} ω_{i2} C_i ,   (55)

where

ω_{i1} = \frac{ |(p_{i2} - p_{i1}) \times (p_{i3} - p_{i1})| }{ \sum_{i=1}^{N_t} |(p_{i2} - p_{i1}) \times (p_{i3} - p_{i1})| } ,   (56)

ω_{i2} = \left( r - \left\| p - \frac{p_{i1} + p_{i2} + p_{i3}}{3} \right\| \right)^2 .   (57)

Three eigenvectors {v_1, v_2, v_3} are obtained by performing an eigenvalue decomposition of the overall scatter matrix C. The sign of each eigenvector is then determined to eliminate the sign ambiguity. The unambiguous vectors \tilde{v}_1 and \tilde{v}_3 are defined as

\tilde{v}_1 = v_1 \cdot \mathrm{sign}\left( \sum_{i=1}^{N_t} ω_{i1} ω_{i2} \left( \frac{1}{6} \sum_{j=1}^{3} (p_{ij} - p) \cdot v_1 \right) \right) ,   (58)

\tilde{v}_3 = v_3 \cdot \mathrm{sign}\left( \sum_{i=1}^{N_t} ω_{i1} ω_{i2} \left( \frac{1}{6} \sum_{j=1}^{3} (p_{ij} - p) \cdot v_3 \right) \right) ,   (59)

where sign(·) denotes the signum function, which extracts the sign of a real number. \tilde{v}_2 is defined as \tilde{v}_3 × \tilde{v}_1. Consequently, a unique and unambiguous local reference frame is constructed for the keypoint p using \tilde{v}_1, \tilde{v}_2 and \tilde{v}_3.

Assuming that the points on the local surface S constitute a pointcloud Q = {q_1, q_2, ..., q_{N_v}}, they can be transformed with respect to the local reference frame to achieve rotation invariance, resulting in a transformed pointcloud Q' = {q'_1, q'_2, ..., q'_{N_v}}. First, the pointcloud Q' is rotated about the x-axis by an angle θ_k, resulting in a rotated pointcloud Q'(θ_k). This pointcloud Q'(θ_k) is then projected onto the three coordinate planes (i.e., the xy, xz and yz planes) to obtain three projected pointclouds Q̃'_i(θ_k), i = 1, 2, 3. For each projected pointcloud Q̃'_i(θ_k), a 2D bounding rectangle is obtained and subsequently divided into L × L bins. The number of points falling into each bin is then counted to yield an L × L distribution matrix D. The information in D is further encoded using a vector that contains five statistics (i.e., four central moments and a Shannon entropy). The three vectors from the xy, xz and yz planes are then concatenated to form a sub-feature f_x(θ_k). Three sets of sub-features are obtained by rotating the pointcloud Q' about the x-, y- and z-axes by a set of angles. The overall feature descriptor is generated by concatenating the sub-features of all of the rotations.
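The statistics part of the descriptor (rotation, projection, 2D binning, moments and entropy) can be sketched as follows for rotations about the x-axis only; the bin count L = 5 and the number of rotation angles are example values, and the full descriptor would repeat this for the y- and z-axes as described above.

```python
import numpy as np

def projection_statistics(points2d, L=5):
    """Five statistics (four central moments and Shannon entropy) of an LxL bin map."""
    hist, _, _ = np.histogram2d(points2d[:, 0], points2d[:, 1], bins=L)
    D = hist / hist.sum()
    i, j = np.meshgrid(np.arange(L), np.arange(L), indexing="ij")
    mi, mj = (D * i).sum(), (D * j).sum()
    moments = [(D * (i - mi) ** a * (j - mj) ** b).sum()
               for a, b in [(1, 1), (2, 1), (1, 2), (2, 2)]]     # central moments
    entropy = -np.sum(D[D > 0] * np.log(D[D > 0]))               # Shannon entropy
    return np.array(moments + [entropy])

def rops_subfeatures_x(Q_local, angles=np.linspace(0, np.pi, 3, endpoint=False), L=5):
    """Q_local: (N, 3) points already expressed in the local reference frame."""
    feats = []
    for theta in angles:
        c, s = np.cos(theta), np.sin(theta)
        Rx = np.array([[1, 0, 0], [0, c, -s], [0, s, c]])        # rotation about the x-axis
        Qr = Q_local @ Rx.T
        for axes in [(0, 1), (0, 2), (1, 2)]:                    # xy, xz, yz projections
            feats.append(projection_statistics(Qr[:, axes], L))
    return np.concatenate(feats)
```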

6.4 Other Features

Other patch-based features include the geodesic polar representation [132], multi-ring geometric histograms [121, 133], and TriSI [134, 135]. The geodesic polar representation is robust to isometric deformations and is therefore appropriate for expression-invariant 3D face recognition [132]. The multi-ring geometric histograms are robust to translations and rotations and do not require the computation of a reference frame [121, 133]. The TriSI feature is highly descriptive and is robust to noise, varying mesh resolutions, occlusion, and clutter [134, 135]. Moreover, 2D features can also be extracted from 2D images derived from 3D facial meshes, e.g., Gabor filter coefficients [136, 137, 89], wavelet coefficients [30, 138], Haar-like features [89] and LBP features [89].

7 Feature Selection and Fusion

Feature selection is used to find the subset of the most discriminative features that leads to the best recognition performance [139, 140]. Feature fusion is used to improve the face recognition performance by integrating information from different features. Feature selection and fusion are commonly used in the following situations: 1) Multisensor fusion: features calculated from different sensor modalities (e.g., 2D facial images and 3D facial surfaces [9]) are used for face recognition. 2) Integration of multiple data models: either 2D or 3D facial data are modeled using different approaches, and the features extracted using the different approaches can be pooled to improve the face recognition performance [89]. 3) Multiregion fusion: features extracted from different regions of a face are selected and fused to boost the recognition performance [9, 124, 141].

7.1 Feature Selection

When several features are available for the description of a facial image, feature selection can be applied to select an optimal subset of discriminative features [34]. A number of feature selection algorithms have been proposed in the literature, with several representative examples described below.

7.1.1 Genetic Algorithm

In a genetic algorithm, a given feature subset is represented as a binary string (i.e., a chromosome) of length N_f. A zero or one in the i-th position of the chromosome denotes the absence or presence of the i-th feature in the subset. Each chromosome is then evaluated to determine how likely it is to survive and breed into the next generation. New chromosomes can be generated from old chromosomes by crossover or mutation. Crossover denotes the operation in which parts of two different parent chromosomes are mixed to generate offspring, and mutation denotes the operation in which bits of a single parent are randomly perturbed to breed a child [142, 143, 139].

7.1.2 Relief-F

Relief-F selects features by estimating the contribution of these features according to their ability to distinguish between the nearest instances [34]. First, the weight W(f) of each feature f is initialized to 0, and the maximum iteration number is defined as T. Next, an instance R_i is selected, its k nearest neighbors H_j, j = 1, 2, ..., k, from the same class as R_i are calculated, and its k nearest neighbors M_j(c), j = 1, 2, ..., k, from each class c ≠ class(R_i) are computed. The weight W(f) for each feature f is updated as

W(f) = W(f) - \sum_{j=1}^{k} \frac{d(f, R_i, H_j)}{T k} + \sum_{c \ne class(R_i)} \frac{K \sum_{j=1}^{k} d(f, R_i, M_j(c))}{T k} ,   (60)

K = \frac{p(c)}{1 - p(class(R_i))} ,   (61)

where d(f, I_1, I_2) is the distance between the two instances I_1 and I_2 with respect to feature f, p(c) is the prior probability of class c, and 1 − p(class(R_i)) is the sum of the prior probabilities of all classes c ≠ class(R_i). The above process is repeated T times and results in the final weight vector W over all features.
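The update rule in Equations 60 and 61 can be implemented compactly as below; features are assumed to be scaled to [0, 1] so that the per-feature distance is simply the absolute difference, and all names are illustrative.

```python
import numpy as np

def relief_f(X, y, T=100, k=5, rng=None):
    """X: (N, n_features) with values scaled to [0, 1]; y: (N,) class labels."""
    rng = np.random.default_rng(rng)
    n, n_feat = X.shape
    classes, counts = np.unique(y, return_counts=True)
    prior = dict(zip(classes, counts / n))
    W = np.zeros(n_feat)
    for _ in range(T):
        i = rng.integers(n)
        Ri, ci = X[i], y[i]
        dists = np.abs(X - Ri).sum(axis=1)        # distance of R_i to all instances
        dists[i] = np.inf                          # exclude R_i itself
        # k nearest hits (same class)
        hits = np.argsort(np.where(y == ci, dists, np.inf))[:k]
        W -= np.abs(X[hits] - Ri).sum(axis=0) / (T * k)
        # k nearest misses per other class, weighted by class priors (Eq. 61)
        for c in classes:
            if c == ci:
                continue
            misses = np.argsort(np.where(y == c, dists, np.inf))[:k]
            K = prior[c] / (1.0 - prior[ci])
            W += K * np.abs(X[misses] - Ri).sum(axis=0) / (T * k)
    return W
```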

7.1.3 Max Dependency

In an unsupervised case where the classifiers are not specified, achieving a minimal classification error usually requires the maximal statistical dependency of the target class c on the data distribution in the subspace of the selected features [144]. This scheme is therefore called the maximal dependency feature selection scheme. A widely used approach to realize maximal dependency is the maximal relevance feature selection algorithm, where relevance is commonly characterized in terms of mutual information.

Given N_f features F = {f_1, f_2, ..., f_{N_f}}, the task of feature selection is to find a feature subset S with m features that exhibits a maximum dependency on the target class c. The maximal dependency criterion selects features satisfying max D(S, c), where

D = I(\{f_i, i = 1, 2, \ldots, m\}; c) .   (62)

When m equals 1, the selected feature is the one that maximizes I(f_j; c), where I(f_j; c) is the mutual information between the feature f_j and the class c. When m is larger than 1, an incremental search approach can be used to add one feature at a time. Assuming that we already have the set S_{m-1} with m − 1 features, the m-th feature is selected as the one that achieves the largest increase in I(S; c), where

I(S_m; c) = \int \int p(S_m, c) \log \frac{p(S_m, c)}{p(S_m) p(c)} \, dS_m \, dc .   (63)

7.1.4 Minimal Redundancy Maximal Relevance

Given N_f features F = {f_1, f_2, ..., f_{N_f}} and the target class c, the feature selection task is to find a subset S with m features from the N_f-dimensional observation space that optimally characterizes c [144]. The Minimal-Redundancy-Maximal-Relevance (mRMR) algorithm selects a subset of features by considering both the ability of the features to identify the classification label and the redundancy among the features [144, 34, 133]. Given two features f_1 and f_2, their mutual information I(f_1, f_2) is defined in terms of their probability densities p(f_1), p(f_2) and p(f_1, f_2) as follows:

I(f_1, f_2) = \int \int p(f_1, f_2) \log \frac{p(f_1, f_2)}{p(f_1) p(f_2)} \, df_1 \, df_2 .   (64)

The Maximal Relevance criterion selects features satisfying max D(S, c), where

D = \frac{1}{|S|} \sum_{f_i \in S} I(f_i; c) .

To reduce redundancy, the Minimal Redundancy condition is used to select mutually exclusive features that satisfy min R(S), where

R = \frac{1}{|S|^2} \sum_{f_i, f_j \in S} I(f_i, f_j) .   (65)

The mRMR criterion combines the above two constraints. The operator Φ(D, R) is defined as D − R, and the features are selected to maximize the value of Φ(D, R). In practice, incremental search methods are usually used to select the near-optimal features defined by Φ. Assuming that we already have S_{m-1} with m − 1 features, the feature selection task is performed by selecting the m-th feature from the set {F − S_{m-1}} that maximizes Φ as follows:

Φ = I(f_j; c) - \frac{1}{m - 1} \sum_{f_i \in S_{m-1}} I(f_j, f_i) .   (66)
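An incremental mRMR selection following Equation 66 is sketched below; mutual information is estimated here by simply discretizing each feature into a few bins, which is an implementation convenience of this sketch rather than the estimator used in [144].

```python
import numpy as np
from sklearn.metrics import mutual_info_score

def discretize(x, bins=8):
    return np.digitize(x, np.histogram_bin_edges(x, bins=bins)[1:-1])

def mrmr(X, y, m):
    """Select m feature indices from X (N x n_features) for target labels y."""
    Xd = np.apply_along_axis(discretize, 0, X)
    n_feat = X.shape[1]
    relevance = np.array([mutual_info_score(Xd[:, j], y) for j in range(n_feat)])
    selected = [int(np.argmax(relevance))]                 # start with the most relevant feature
    while len(selected) < m:
        best_j, best_phi = None, -np.inf
        for j in range(n_feat):
            if j in selected:
                continue
            redundancy = np.mean([mutual_info_score(Xd[:, j], Xd[:, i]) for i in selected])
            phi = relevance[j] - redundancy                # Eq. (66)
            if phi > best_phi:
                best_j, best_phi = j, phi
        selected.append(best_j)
    return selected
```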

7.1.5 Stepwise Linear Discriminant Analysis

The stepwise linear discriminant analysis (SLDA) algorithm [96] selects a subset of the most discriminative features by maximizing a statistical discrimination criterion [83]. A commonly used criterion is Wilks' lambda, which is defined as the ratio of the within-group sum of squares to the total sum of squares. The SLDA algorithm begins with an empty subset of features, and at each step a feature is either added to or removed from the subset. A feature in the subset is removed if its removal does not significantly decrease the discriminative power of the set of selected features according to the statistical discrimination criterion. If no feature is removed in the step, then feature addition is considered, in which a feature that is not in the subset but that can significantly improve the discrimination power of the selected features is added to the subset. This procedure of feature addition and removal is repeated until no feature can be added or removed. Similar approaches include the sequential forward selection [142] and sequential backward selection [139, 140] algorithms. Other feature selection algorithms include exhaustive search [140], wrappers [145], best individual features [139], simulated annealing [142, 143], and random forests [34].

7.2 Feature Fusion

Feature fusion is required when more than one feature is used for face recognition. Fusion can be performed at the sensor level, feature level, score level, rank level, or decision level.

7.2.1 Sensor-level Fusion

The raw data acquired using multiple sensors are processed and integrated to generate new data that can then be used for feature extraction. For example, 2D texture information and 3D geometric information can be fused to obtain a textured 3D image of the face that can subsequently be used for feature extraction and matching.

7.2.2 Feature-level Fusion

Different approaches have fused features at the feature level to generate a new feature to represent a subject. For example, geometric features generated from different regions of a face (e.g., the nose or forehead) can be concatenated to form the overall feature of the face [124]. The features of the 2D image and the 3D pointcloud of a face can also be concatenated to generate the fused feature.

7.2.3 Score-level Fusion

For score-level fusion, each individual feature f_i is fed to the classifier to obtain a match score s_i between two facial images. Several rules can be used to fuse the scores obtained from the different features. The sum rule calculates the final fusion score s as s = \sum_{i=1}^{N_f} w_i s_i, where N_f is the number of different features and w_i is the weight assigned to the feature f_i. These weights can be obtained during a training phase, or they can be set equal. Similarly, the product rule calculates the final fusion score s as s = \prod_{i=1}^{N_f} w_i s_i. The minimum rule uses the smallest of all the individual scores as the final fusion score, that is, s = \min_{i=1}^{N_f} (s_i). The minimum rule is highly dependent on data normalization because features with smaller overall matching scores dominate the final score [146].

7.2.4 Rank-level Fusion

In rank-level fusion, each feature f_i of a probe face is matched against the features f_i of all the gallery faces, resulting in a set of similarity scores that are then ranked in descending order. Let r_{ij} be the rank assigned to subject j in the gallery using the i-th feature f_i, i = 1, 2, ..., N_f and j = 1, 2, ..., N_o. The final rank r_j for subject j can be calculated using different fusion rules.

Consensus Voting
The consensus voting method returns a vote for the closest match in the gallery for each feature. The subject with the highest number of votes is declared to be the best match [146]. Consensus voting can be further improved using confidence scores, so that if there are ties, the subject with the highest total confidence is selected as the best match.

The Highest Rank Fusion
The fused rank of a subject is calculated as the lowest rank over the different features:

r_j = \min_{i=1}^{N_f} (r_{ij}) .   (67)

The subject with the lowest fused rank is declared the best match. To further break ties between subjects, the fusion rule is usually modified as

r_j = \min_{i=1}^{N_f} (r_{ij}) + ϵ_j ,   (68)

where

ϵ_j = \frac{\sum_{i=1}^{N_f} r_{ij}}{K} ,   (69)

with K being selected to ensure that ϵ_j is small.

Borda Count Rank Fusion
The fused rank of a subject is calculated as the sum of the ranks obtained by the different features:

r_j = \sum_{i=1}^{N_f} r_{ij} .   (70)

The subject with the lowest fused rank is declared the best match. Compared to the highest rank fusion method, the Borda count rank fusion method takes into account the variability in the ranks of all features. However, this method assumes that the features are statistically independent and that all of them perform well. The fused rank therefore reflects the average performance of all features, which makes the Borda count rank fusion method highly affected by weak features. To overcome this drawback, the Nanson function [147] can be used. One approach is to first eliminate the weakest rank and to subsequently calculate the regular Borda count on the remaining ranks. Another approach is to eliminate all ranks whose corresponding similarity scores are below a threshold.
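A small sketch of the highest-rank and Borda-count rules over a rank matrix is given below; the tie-breaking constant K is an example value.

```python
import numpy as np

def highest_rank_fusion(ranks, K=1000.0):
    """ranks: (N_f, N_o) matrix of ranks r_ij (1 = best) per feature and gallery subject."""
    eps = ranks.sum(axis=0) / K            # tie-breaking term, Eq. (69)
    fused = ranks.min(axis=0) + eps        # Eq. (68)
    return int(np.argmin(fused))           # index of the best-matching gallery subject

def borda_count_fusion(ranks):
    fused = ranks.sum(axis=0)              # Eq. (70)
    return int(np.argmin(fused))
```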

7.2.5 Decision-level Fusion

For a probe face, a class label (i.e., accept or reject in the verification scenario, or the identity of a subject in the identification scenario) is produced by each feature. The final class label can be obtained using different rules, including majority voting and the behavior knowledge space [148].

8 Summary

2D and 3D face recognition is a popular research topic in the area of computer vision and pattern recognition. Within any face recognition system, feature extraction and selection play a critically important role. In this article, some background related to face recognition algorithms is first presented, followed by several representative examples of local and holistic 2D/3D features. Finally, multiple feature selection and fusion techniques are briefly discussed for completeness.

Although numerous features are already available in the literature, future work in this area is still expected. First, as 3D scanners are becoming cheaper, more reliable and more user-friendly, 3D features will be used more frequently due to their robustness to, e.g., lighting variations. Second, with the rise of deep learning techniques, high-level features learned using deep neural networks are expected to become even more popular. Third, binary features for 3D data are expected, as they are highly compact and computationally efficient.

Acknowledgment

Mohammed Bennamoun and Yulan Guo contributed equally to this work and are considered co-first authors.

References

[1] M. Daoudi, A. Srivastava, and R. Veltkamp. 3D Face Modeling, Analysis and Recognition. John Wiley & Sons, 2013.
[2] R. Chellappa, C. L. Wilson, and S. Sirohey. Human and machine recognition of faces: A survey. Proceedings of the IEEE, 83(5):705–741, 1995.
[3] A. S. Mian and N. Pears. 3D face recognition. In 3D Imaging, Analysis and Applications, pages 311–366. Springer, 2012.
[4] Y. Guo, J. Wan, M. Lu, and W. Niu. A parts-based method for articulated target recognition in laser radar data. Optik - International Journal for Light and Electron Optics, 124(17):2727–2733, 2013.
[5] Y. Guo, F. Sohel, M. Bennamoun, J. Wan, and M. Lu. RoPS: A local feature descriptor for 3D rigid objects based on rotational projection statistics. In 1st International Conference on Communications, Signal Processing, and their Applications, pages 1–6, 2013.
[6] A. K. Jain and C. Dorai. 3D object recognition: Representation and matching. Statistics and Computing, 10(2):167–182, 2000.
[7] S. J. Owen. A survey of unstructured mesh generation technology. In 7th International Meshing Roundtable, volume 3, 1998.
[8] A. S. Mian, M. Bennamoun, and R. Owens. Three-dimensional model-based object recognition and segmentation in cluttered scenes. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(10):1584–1601, 2006.
[9] A. S. Mian, M. Bennamoun, and R. Owens. An efficient multimodal 2D-3D hybrid approach to automatic face recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(11):1927–1943, 2007.
[10] N. Erdogmus and J. Dugelay. 3D assisted face recognition: Dealing with expression variations. 2014.
[11] R. Ksantini, B. Boufama, D. Ziou, and B. Colin. A novel Bayesian logistic discriminant model: An application to face recognition. Pattern Recognition, 43(4):1421–1430, 2010.
[12] E. Boyer, A. M. Bronstein, M. M. Bronstein, et al. SHREC 2011: Robust feature detection and description benchmark. In Eurographics Workshop on Shape Retrieval, pages 79–86, 2011.

[13] W. Zhao, R. Chellappa, P. J. Phillips, and A. Rosenfeld. Face recognition: A literature survey. ACM Computing Surveys, 35(4):399–458, 2003.
[14] D. Smeets, P. Claes, D. Vandermeulen, and J. G. Clement. Objective 3D face recognition: Evolution, approaches and challenges. Forensic Science International, 201(1):125–132, 2010.
[15] K. W. Bowyer, K. Chang, and P. Flynn. A survey of approaches and challenges in 3D and multi-modal 3D+2D face recognition. Computer Vision and Image Understanding, 101(1):1–15, 2006.
[16] Y. Guo, J. Zhang, M. Lu, J. Wan, and Y. Ma. Benchmark datasets for 3D computer vision. In The 9th IEEE Conference on Industrial Electronics and Applications, 2014.
[17] G. Fanelli, M. Dantone, J. Gall, A. Fossati, and L. Van Gool. Random forests for real time 3D face analysis.
[18] R. Ksantini, B. S. Boufama, and I. S. Ahmad. A new KSVM+KFD model for improved classification and face recognition. Journal of Multimedia, 6(1):39–47, 2011.
[19] I. Naseem, R. Togneri, and M. Bennamoun. Linear regression for face recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(11):2106–2112, 2010.
[20] Y. Guo, M. Bennamoun, F. Sohel, M. Lu, and J. Wan. 3D object recognition in cluttered scenes with local surface features: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 36(11):2270–2287, 2014.
[21] J. Wright, A. Y. Yang, A. Ganesh, S. S. Sastry, and Y. Ma. Robust face recognition via sparse representation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(2):210–227, 2009.
[22] A. Wagner, J. Wright, A. Ganesh, Z. Zhou, H. Mobahi, and Y. Ma. Toward a practical face recognition system: Robust alignment and illumination by sparse representation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(2):372–386, 2012.
[23] J. Wright, Y. Ma, J. Mairal, G. Sapiro, T. S. Huang, and S. Yan. Sparse representation for computer vision and pattern recognition. Proceedings of the IEEE, 98(6):1031–1044, 2010.
[24] J. Kittler, A. Hilton, M. Hamouz, and J. Illingworth. 3D assisted face recognition: A survey of 3D imaging, modelling and recognition approaches. In IEEE Conference on Computer Vision and Pattern Recognition Workshops, pages 114–114, 2005.
[25] S. Gupta, M. K. Markey, and A. C. Bovik. Advances and challenges in 3D and 2D+3D human face recognition. Pattern Recognition in Biology, pages 63–103, 2007.

[26] C. Hernandez Esteban and F. Schmitt. Multi-stereo 3D object reconstruction. In First International Symposium on 3D Data Processing Visualization and Transmission, pages 159–166, 2002.
[27] H. Mohammadzade and D. Hatzinakos. Iterative closest normal point for 3D face recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(2):381–397, 2013.
[28] S. Fleishman, I. Drori, and D. Cohen-Or. Bilateral mesh denoising. In ACM Transactions on Graphics, volume 22, pages 950–953, 2003.
[29] L. Spreeuwers. Fast and accurate 3D face recognition. International Journal of Computer Vision, 93(3):389–414, 2011.
[30] G. Passalis, P. Perakis, T. Theoharis, and I. A. Kakadiaris. Using facial symmetry to handle pose variations in real-world 3D face recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(10):1938–1951, 2011.
[31] P. Viola and M. J. Jones. Robust real-time face detection. International Journal of Computer Vision, 57(2):137–154, 2004.
[32] C. Zhang and Z. Zhang. A survey of recent advances in face detection. Technical report, Microsoft Research, 2010.
[33] X. Zou, J. Kittler, and K. Messer. Illumination invariant face recognition: A survey. In First IEEE International Conference on Biometrics: Theory, Applications, and Systems, pages 1–8, 2007.
[34] P. Liu, Y. Wang, D. Huang, Z. Zhang, and L. Chen. Learning the spherical harmonic features for 3-D face recognition. IEEE Transactions on Image Processing, 22(3):914–925, 2013.
[35] A. F. Abate, M. Nappi, D. Riccio, and G. Sabatino. 2D and 3D face recognition: A survey. Pattern Recognition Letters, 28(14):1885–1906, 2007.
[36] D. Smeets, P. Claes, J. Hermans, D. Vandermeulen, and P. Suetens. A comparative study of 3-D face recognition under expression variations. IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, 42(5):710–727, 2012.
[37] F. Al-Osaimi, M. Bennamoun, and A. Mian. An expression deformation approach to non-rigid 3D face recognition. International Journal of Computer Vision, 81(3):302–316, 2009.
[38] A. M. Bronstein, M. M. Bronstein, and R. Kimmel. Expression-invariant 3D face recognition. In Audio- and Video-Based Biometric Person Authentication, pages 62–70. Springer, 2003.

[39] M. Kirby and L. Sirovich. Application of the Karhunen-Loeve procedure for the characterization of human faces. IEEE Transactions on Pattern Analysis and Machine Intelligence, 12(1):103–108, 1990.
[40] M. Turk and A. Pentland. Eigenfaces for recognition. Journal of Cognitive Neuroscience, 3(1):71–86, 1991.
[41] X. He, S. Yan, Y. Hu, P. Niyogi, and H. Zhang. Face recognition using Laplacianfaces. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(3):328–340, 2005.
[42] P. N. Belhumeur, J. P. Hespanha, and D. Kriegman. Eigenfaces vs. Fisherfaces: Recognition using class specific linear projection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(7):711–720, 1997.
[43] J. Kim, J. Choi, J. Yi, and M. Turk. Effective representation using ICA for face recognition robust to local distortion and partial occlusion. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(12):1977–1981, 2005.
[44] F. Sohel, M. Bennamoun, and M. Hahn. Spatial shape error concealment utilising image texture. In 2011 6th IEEE Conference on Industrial Electronics and Applications (ICIEA), pages 265–270. IEEE, 2011.
[45] S. G. Kong, J. Heo, B. R. Abidi, J. Paik, and M. A. Abidi. Recent advances in visual and infrared face recognition - a review. Computer Vision and Image Understanding, 97(1):103–135, 2005.
[46] A. J. Bell and T. J. Sejnowski. An information-maximization approach to blind separation and blind deconvolution. Neural Computation, 7(6):1129–1159, 1995.
[47] A. J. Bell and T. J. Sejnowski. The independent components of natural scenes are edge filters. Vision Research, 37(23):3327–3338, 1997.
[48] Z. M. Hafed and M. D. Levine. Face recognition using the discrete cosine transform. International Journal of Computer Vision, 43(3):167–188, 2001.
[49] W. Chen, M. J. Er, and S. Wu. Illumination compensation and normalization for robust face recognition using discrete cosine transform in logarithm domain. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 36(2):458–466, 2006.

[50] K. Mikolajczyk and C. Schmid. A performance evaluation of local descriptors. In IEEE Conference on Computer Vision and Pattern Recognition, volume 2, pages II-257, 2003.
[51] K. Mikolajczyk and C. Schmid. A performance evaluation of local descriptors. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(10):1615–1630, 2005.
[52] L. Shen and L. Bai. A review on Gabor wavelets for face recognition. Pattern Analysis and Applications, 9(2-3):273–292, 2006.
[53] M. Lades, J. C. Vorbruggen, J. Buhmann, J. Lange, C. von der Malsburg, R. P. Wurtz, and W. Konen. Distortion invariant object recognition in the dynamic link architecture. IEEE Transactions on Computers, 42(3):300–311, 1993.
[54] L. Wiskott, J.-M. Fellous, N. Kuiger, and C. Von Der Malsburg. Face recognition by elastic bunch graph matching. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(7):775–779, 1997.
[55] A. Albiol, D. Monzo, A. Martin, J. Sastre, and A. Albiol. Face recognition using HOG-EBGM. Pattern Recognition Letters, 29(10):1537–1543, 2008.
[56] Y. Wang, C.-S. Chua, and Y.-K. Ho. Facial feature detection and face recognition from 2D and 3D images. Pattern Recognition Letters, 23(10):1191–1202, 2002.
[57] D. G. Lowe. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2):91–110, 2004.
[58] J. J. Koenderink. The structure of images. Biological Cybernetics, 50(5):363–370, 1984.

[59] N. Dalal and B. Triggs. Histograms of oriented gradients for human detection. In IEEE Conference on Computer Vision and Pattern Recognition, volume 1, pages 886–893, 2005.
[60] O. Déniz, G. Bueno, J. Salido, and F. De la Torre. Face recognition using histograms of oriented gradients. Pattern Recognition Letters, 32(12):1598–1603, 2011.
[61] H. Bay, T. Tuytelaars, and L. Van Gool. SURF: Speeded up robust features. In 9th European Conference on Computer Vision, pages 404–417, 2006.
[62] H. Bay, A. Ess, T. Tuytelaars, and L. Van Gool. Speeded-up robust features (SURF). Computer Vision and Image Understanding, 110(3):346–359, 2008.
[63] P. Dreuw, P. Steingrube, H. Hanselmann, H. Ney, and G. Aachen. SURF-Face: Face recognition under viewpoint consistency constraints. In British Machine Vision Conference, pages 1–11, 2009.
[64] G. Du, F. Su, and A. Cai. Face recognition using SURF features. In 6th International Symposium on Multispectral Image Processing and Pattern Recognition, pages 749628–749628, 2009.
[65] T. Ahonen, A. Hadid, and M. Pietikäinen. Face description with local binary patterns: Application to face recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(12):2037–2041, 2006.
[66] T. Ojala, M. Pietikainen, and T. Maenpaa. Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(7):971–987, 2002.
[67] L. Liu and P. W. Fieguth. Texture classification from random features. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(3):574–586, 2012.
[68] L. Liu, Y. Long, P. Fieguth, S. Lao, and G. Zhao. BRINT: Binary rotation invariant and noise tolerant texture classification. IEEE Transactions on Image Processing, 2013.
[69] T. Ahonen, A. Hadid, and M. Pietikäinen. Face recognition with local binary patterns. In European Conference on Computer Vision, pages 469–481. Springer, 2004.

481. Springer, 2004. [70] X. Lu, A. K. Jain, and D. Colbry. Matching 2.5D face scans to 3D models.

IEEE Transactions on Pattern Analysis and Machine Intelligence,

28(1):3143, 2006. [71] P. J. Besl and N. D. McKay. A method for registration of 3-D shapes.

Transactions on Pattern Analysis and Machine Intelligence, 256, 1992.

48

IEEE

14(2):239

[72] K. I. Chang, K. W. Bowyer, and P. J. Flynn. Adaptive rigid multi-region selection for handling expression variation in 3D face recognition. In IEEE Conference on Computer Vision and Pattern Recognition Workshops, pages 157–157, 2005.

[73] K. I. Chang, W. Bowyer, and P. J. Flynn. Multiple nose region matching for 3D face recognition under varying facial expression. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(10):1695–1700, 2006.

[74] N. Mavridis, F. Tsalakanidou, D. Pantazis, S. Malassiotis, and M. Strintzis. The HISCORE face recognition application: Affordable desktop face recognition based on a novel 3D camera. In International Conference on Augmented Virtual Environments and 3D Imaging, pages 157–160, 2001.

[75] K. Chang, K. Bowyer, and P. Flynn. Face recognition using 2D and 3D facial data. In ACM Workshop on Multimodal User Authentication, pages 25–32, 2003.

[76] C. Xu, Y. Wang, T. Tan, and L. Quan. A new attempt to face recognition using 3D eigenfaces. In The 6th Asian Conference on Computer Vision, volume 2, pages 884–889, 2004.

[77] F. Tsalakanidou, D. Tzovaras, and M. G. Strintzis. Use of depth and colour eigenfaces for face recognition. Pattern Recognition Letters, 24(9):1427–1435, 2003.

[78] F. Al-Osaimi, M. Bennamoun, and A. Mian. Integration of local and global geometrical cues for 3D face recognition. Pattern Recognition, 41(3):1030–1040, 2008.

[79] P. J. Phillips, P. J. Flynn, T. Scruggs, K. W. Bowyer, J. Chang, K. Hoffman, J. Marques, J. Min, and W. Worek. Overview of the face recognition grand challenge. In IEEE Conference on Computer Vision and Pattern Recognition, volume 1, pages 947–954, 2005.

[80] T. Russ, C. Boehnen, and T. Peters. 3D face recognition using 3D alignment for PCA. In IEEE Conference on Computer Vision and Pattern Recognition, volume 2, pages 1391–1398, 2006.

[81] T. Heseltine, N. Pears, and J. Austin. Three-dimensional face recognition: A fishersurface approach. In Image Analysis and Recognition, pages 684–691. Springer, 2004.

[82] C. BenAbdelkader and P. A. Griffin. Comparing and combining depth and texture cues for face recognition. Image and Vision Computing, 23(3):339–352, 2005.

[83] S. Gupta, M. K. Markey, and A. C. Bovik. Anthropometric 3D face recognition. International Journal of Computer Vision, 90(3):331–349, 2010.

[84] C. Hesher, A. Srivastava, and G. Erlebacher. A novel technique for face recognition using range imaging. In 7th International Symposium on Signal Processing and Its Applications, volume 2, pages 201–204, 2003.

[85] H.-S. Wong, K. Cheung, and H. Ip. 3D head model classification by evolutionary optimization of the Extended Gaussian Image representation. Pattern Recognition, 37(12):2307–2322, 2004.

[86] C. Sun and J. Sherrah. 3D symmetry detection using the extended Gaussian image. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(2):164–168, 1997.

[87] A. M. Bronstein, M. M. Bronstein, and R. Kimmel. Three-dimensional face recognition. International Journal of Computer Vision, 64(1):5–30, 2005.

[88] A. M. Bronstein, M. M. Bronstein, and R. Kimmel. Expression-invariant representations of faces. IEEE Transactions on Image Processing, 16(1):188–197, 2007.

[89] Y. Wang, J. Liu, and X. Tang. Robust 3D face recognition by local shape difference boosting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(10):1858–1870, 2010.

[90] R. S. Llonch, E. Kokiopoulou, I. Tošić, and P. Frossard. 3D face recognition with sparse spherical representations. Pattern Recognition, 43(3):824–834, 2010.

[91] P. J. Besl and R. C. Jain. Segmentation through variable-order surface fitting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 10(2):167–192, 1988.

[92] A. Colombo, C. Cusano, and R. Schettini. 3D face detection using curvature analysis. Pattern Recognition, 39(3):444–455, 2006.

[93] L. Zhang, A. Razdan, G. Farin, J. Femiani, M. Bae, and C. Lockwood. 3D face authentication and recognition based on bilateral symmetry analysis. The Visual Computer, 22(1):43–55, 2006.

[94] M. L. Koudelka, M. W. Koch, and T. D. Russ. A prescreener for 3D face recognition using radial symmetry and the Hausdorff fraction. In IEEE Conference on Computer Vision and Pattern Recognition Workshops, pages 168–168, 2005.

[95] E. W. Dijkstra. A note on two problems in connexion with graphs. Numerische Mathematik, 1(1):269–271, 1959.

[96] S. Sharma. Applied Multivariate Techniques. John Wiley & Sons, Inc., 1995.

[97] G. Gordon. Face recognition based on depth and curvature features. In IEEE Conference on Computer Vision and Pattern Recognition, pages 808–810, 1992.

[98] A. Moreno, A. Sánchez, J. F. Vélez, and F. J. Díaz. Face recognition using 3D surface-extracted descriptors. In Irish Machine Vision and Image Processing Conference, volume 2003, 2003.

[99] C. Xu, Y. Wang, T. Tan, and L. Quan. Automatic 3D face recognition combining global geometric features with local shape variation information. In 6th IEEE International Conference on Automatic Face and Gesture Recognition, pages 308–313, 2004.

[100] C. Samir, A. Srivastava, and M. Daoudi. Three-dimensional face recognition using shapes of facial curves. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(11):1858–1863, 2006.

[101] S. Jahanbin, H. Choi, Y. Liu, and A. C. Bovik. Three dimensional face recognition using iso-geodesic and iso-depth curves. In 2nd IEEE International Conference on Biometrics: Theory, Applications and Systems, pages 1–6, 2008.

[102] C. Samir, A. Srivastava, M. Daoudi, and E. Klassen. An intrinsic framework for analysis of facial surfaces. International Journal of Computer Vision, 82(1):80–95, 2009.

[103] T. Nagamine, T. Uemura, and I. Masuda. 3D facial image analysis for human identification. In IAPR International Conference on Computer Vision and Application, pages 324–327, 1992.

[104] S. Berretti, A. Del Bimbo, and P. Pala. Description and retrieval of 3D face models using iso-geodesic stripes. In The 8th ACM International Workshop on Multimedia Information Retrieval, pages 13–22, 2006.

[105] S. Berretti, A. Del Bimbo, and P. Pala. 3D face recognition using iso-geodesic stripes. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(12):2162–2177, 2010.

[106] A. Srivastava, C. Samir, S. H. Joshi, and M. Daoudi. Elastic shape models for face analysis using curvilinear coordinates. Journal of Mathematical Imaging and Vision, 33(2):253–265, 2009.

[107] S. Feng, H. Krim, and I. Kogan. 3D face recognition using Euclidean integral invariants signature. In IEEE/SP 14th Workshop on Statistical Signal Processing, pages 156–160, 2007.

[108] L. Li, C. Xu, W. Tang, and C. Zhong. 3D face recognition by constructing deformation invariant image. Pattern Recognition Letters, 29(10):1596–1602, 2008.

[109] C. Beumier and M. Acheroy. Automatic 3D face authentication. Image and Vision Computing, 18(4):315–321, 2000.

[110] C. Beumier and M. Acheroy. Face verification from 3D and grey level clues. Pattern Recognition Letters, 22(12):1321–1329, 2001.

[111] J.-Y. Cartoux, J.-T. LaPresté, and M. Richetin. Face authentication or recognition by profile extraction from range images. In Workshop on Interpretation of 3D Scenes, pages 194–199, 1989.

[112] Y. Wu, G. Pan, and Z. Wu. Face authentication based on multiple profiles extracted from range data. In Audio- and Video-Based Biometric Person Authentication, pages 515–522. Springer, 2003.

[113] G. Pan, Y. Wu, Z. Wu, and W. Liu. 3D face recognition by profile and surface matching. In International Joint Conference on Neural Networks, volume 3, pages 2169–2174, 2003.

[114] Y. Lei, M. Bennamoun, M. Hayat, and Y. Guo. An efficient 3D face recognition approach using local geometrical signatures. Pattern Recognition, 47(2):509–524, 2014.

[115] H. Drira, B. Ben Amor, A. Srivastava, M. Daoudi, and R. Slama. 3D face recognition under expressions, occlusions and pose variations. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2013.

[116] F. Tombari, S. Salti, and L. Di Stefano. Performance evaluation of 3D keypoint detectors. International Journal of Computer Vision, 102(1):198–220, 2013.

[117] Y. Guo, M. Bennamoun, F. Sohel, M. Lu, J. Wan, and J. Zhang. Performance evaluation of 3D local feature descriptors. In The 12th Asian Conference on Computer Vision, 2014.

[118] C. S. Chua and R. Jarvis. Point signatures: A new representation for 3D object recognition. International Journal of Computer Vision, 25(1):63–85, 1997.

[119] C. S. Chua, F. Han, and Y. K. Ho. 3D human face recognition using point signature. In 4th IEEE International Conference on Automatic Face and Gesture Recognition, pages 233–238, 2000.

[120] A. S. Mian, M. Bennamoun, and R. Owens. Keypoint detection and local feature matching for textured 3D face recognition. International Journal of Computer Vision, 79(1):1–12, 2008.

[121] S. Berretti, N. Werghi, A. del Bimbo, and P. Pala. Matching 3D face scans using interest points and local histogram descriptors. Computers & Graphics, 2013.

[122] F. Tombari, S. Salti, and L. Di Stefano. Unique signatures of histograms for local surface description. In European Conference on Computer Vision, pages 356–369. Springer, 2010.

[123] S. Salti, F. Tombari, and L. Di Stefano. SHOT: Unique signatures of histograms for surface and texture description. Computer Vision and Image Understanding, in press, 2014.

[124] Y. Lei, M. Bennamoun, and A. A. El-Sallam. An efficient 3D face recognition approach based on the fusion of novel local low-level features. Pattern Recognition, 46(1):24–37, 2013.

[125] X. Li and H. Zhang. Adapting geometric attributes for expression-invariant 3D face recognition. In Shape Modeling International, pages 21–32, 2007.

[126] D. Smeets, J. Keustermans, D. Vandermeulen, and P. Suetens. meshSIFT: Local surface features for 3D face recognition under expression variations and partial data. Computer Vision and Image Understanding, 117(2):158–169, 2013.

[127] A. Zaharescu, E. Boyer, and R. Horaud. Keypoints and local descriptors of scalar functions on 2D manifolds. International Journal of Computer Vision, 100:78–98, 2012.

[128] Y. Guo, F. Sohel, M. Bennamoun, M. Lu, and J. Wan. Rotational projection statistics for 3D local surface description and object recognition. International Journal of Computer Vision, 105(1):63–86, 2013.

[129] Y. Guo, F. Sohel, M. Bennamoun, J. Wan, and M. Lu. An accurate and robust range image registration algorithm for 3D object modeling. IEEE Transactions on Multimedia, 16(5):1377–1390, 2014.

[130] Y. Guo, M. Bennamoun, F. Sohel, J. Wan, and M. Lu. 3D free form object recognition using rotational projection statistics. In IEEE 14th Workshop on the Applications of Computer Vision, pages 1–8, 2013.

[131] Y. Guo, M. Bennamoun, F. Sohel, M. Lu, and J. Wan. An integrated framework for 3D modeling, object detection and pose estimation from point-clouds. IEEE Transactions on Instrumentation and Measurement, in press, 2014.

[132] I. Mpiperis, S. Malassiotis, and M. G. Strintzis. 3-D face recognition with the geodesic polar representation. IEEE Transactions on Information Forensics and Security, 2(3):537–547, 2007.

[133] S. Berretti, N. Werghi, A. del Bimbo, and P. Pala. Selecting stable keypoints and local descriptors for person identification using 3D face scans. The Visual Computer, pages 1–18, 2014.

[134] Y. Guo, F. Sohel, M. Bennamoun, M. Lu, and J. Wan. TriSI: A distinctive local surface descriptor for 3D modeling and object recognition. In 8th International Conference on Computer Graphics Theory and Applications, pages 86–93, 2013.

[135] Y. Guo, F. Sohel, M. Bennamoun, J. Wan, and M. Lu. A novel local surface feature for 3D object recognition under clutter and occlusion. Information Sciences, 293(2):196–213, 2015.

[136] Y. Wang and C.-S. Chua. Face recognition from 2D and 3D images using 3D Gabor filters. Image and Vision Computing, 23(11):1018–1028, 2005.

[137] I. A. Kakadiaris, G. Passalis, G. Toderici, M. N. Murtuza, Y. Lu, N. Karampatziakis, and T. Theoharis. Three-dimensional face recognition in the presence of facial expressions: An annotated deformable model approach. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(4):640–649, 2007.

[138] O. Ocegueda, T. Fang, S. K. Shah, and I. A. Kakadiaris. 3D face discriminant analysis using Gauss-Markov posterior marginals. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(3):728–739, 2013.

[139] A. K. Jain and D. Zongker. Feature selection: Evaluation, application, and small sample performance. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(2):153–158, 1997.

[140] A. K. Jain, R. P. W. Duin, and J. Mao. Statistical pattern recognition: A review. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(1):4–37, 2000.

[141] C. C. Queirolo, L. Silva, O. R. P. Bellon, and M. Pamplona Segundo. 3D face recognition using simulated annealing and the surface interpenetration measure. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(2):206–219, 2010.

[142] B. Taati and M. Greenspan. Local shape descriptor selection for object recognition in range data. Computer Vision and Image Understanding, 115(5):681–694, 2011.

[143] L. Davis. Genetic Algorithms and Simulated Annealing. 1987.

[144] H. Peng, F. Long, and C. Ding. Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(8):1226–1238, 2005.

[145] R. Kohavi and G. H. John. Wrappers for feature subset selection. Artificial Intelligence, 97(1):273–324, 1997.

[146] T. C. Faltemier, K. W. Bowyer, and P. J. Flynn. A region ensemble for 3-D face recognition. IEEE Transactions on Information Forensics and Security, 3(1):62–73, 2008.

[147] P. Fishburn. A note on a note on Nanson's rule. Public Choice, 64(1):101–102, 1990.

[148] Š. Raudys and F. Roli. The behavior knowledge space fusion method: Analysis of generalization error and strategies for performance improvement. In Multiple Classifier Systems, pages 55–64. Springer, 2003.
