
3D and 2D face recognition using integral projection curves based depth and intensity images

Ammar Chouchane* and Mebarka Belahcene
Faculty of Science and Technology, Department of Electrical Engineering, University of Mohamed Khider, Biskra, BP 145 RP, 07000 Biskra, Algeria
Email: [email protected]
Email: [email protected]
*Corresponding author

Salah Bourennane
Institut Fresnel, UMR CNRS 7249, Ecole Centrale Marseille, France
Email: [email protected]

Abstract: This paper presents an automatic face recognition system based on depth and intensity information that operates in the presence of illumination, expression and pose variations. At first, the registration of the 3D faces is achieved using the iterative closest point (ICP) algorithm. The nose tip is then located using the maximum intensity method, since this point usually has the largest depth value; however, unnecessary data such as shoulders, hair, neck and parts of clothes can disturb this step, and to cope with this issue we propose integral projection curves (IPC)-based facial area segmentation to extract the facial area. After that, principal component analysis (PCA) combined with the enhanced fisher model (EFM) is used to obtain the feature matrix vectors. Finally, the classification is performed using distance measurement and a support vector machine (SVM). The experiments are implemented on two face databases, CASIA3D and GavabDB; our results show that the proposed method achieves a high recognition performance.

Keywords: IPC-based facial area segmentation; nose tip; 2D and 3D face recognition; PCA; principal component analysis; EFM; enhanced fisher model; SVM; support vector machine.

Reference to this paper should be made as follows: Chouchane, A., Belahcene, M. and Bourennane, S. (2015) '3D and 2D face recognition using integral projection curves based depth and intensity images', Int. J. Intelligent Systems Technologies and Applications, Vol. 14, No. 1, pp.50–69.

Biographical notes: Ammar Chouchane received his Master's degree in Electronic Telecommunication from the Department of Electrical Engineering, University of Mohammed Khider Biskra, Algeria, in 2011. Currently, he is a PhD student at the same university. His research interests are in image processing, feature extraction, face detection, tensor analysis, classification, computer vision, biometric techniques and face recognition systems.

Copyright © 2015 Inderscience Enterprises Ltd.


Mebarka Belahcene received his PhD from the University of Mohammed Khider Biskra, Algeria, in 2013. Currently, he is a Professor in the Electrical Engineering Department at the same university. His research interests are in signal processing, image processing, image compression, classification, biometric techniques and face recognition systems.

Salah Bourennane received his PhD in Signal Processing from the Institut National Polytechnique de Grenoble, France, in 1990. Currently, he is a Full Professor at the Ecole Centrale de Marseille, France. His research interests are in statistical signal processing, array processing, image processing, tensor signal processing and performance analysis.

1  Introduction

In recent years, automatic face recognition has become one of the most active research fields in both computer vision and pattern recognition. Face recognition is a biometric approach that uses automatic methods and techniques to verify the identity of persons based on human characteristics and traits. Several problems and challenges arise in 2D face recognition, such as illumination, expression and pose variations (Segundo et al., 2007). To address these problems and to increase the recognition accuracy, researchers use 3D face images, which represent the geometric information. A face recognition system is composed of three principal stages: face detection, feature extraction and recognition. Face detection is a very important step: it locates the facial area precisely, which then allows the face to be extracted from the background, and high detection accuracy in this first step is essential for the whole recognition system. Face recognition combining 3D shape and 2D intensity/colour information has attracted growing interest, since the combination of 2D and 3D information provides an opportunity to improve face recognition performance (Xu et al., 2009). The point cloud data is an M × 3 matrix, and its projection onto the x–y grid gives the depth (2.5D) image, where each pixel in the x–y plane stores the z depth value. A 2.5D image takes the form of a greyscale image in which darker pixels correspond to the background while the brightest pixels correspond to the points nearest to the camera. In the facial scan of a 3D surface representation, there is additional information (shoulders, hair, neck and some parts of clothes) that is not usable and should be removed. In this paper, we study the effect of depth and intensity information on a face recognition system in the presence of expression, illumination and pose variations. Our system includes three principal procedures: detection and pre-processing, feature extraction and classification. Figure 1 shows the overview of the proposed face recognition system using IPC-based facial area segmentation.


Figure 1    Overview of our proposed method (see online version for colours)

Firstly, the registration of the 3D faces is achieved using the ICP algorithm. Then, the depth (3D) and intensity (2D) images are generated from the 3D surface. To remove the unusable information and to segment the facial area from the background, we propose horizontal and vertical IPC-based facial area segmentation. The nose tip is a key point for segmenting the facial area from the background and plays a very important role in 3D face recognition. The nose tip generally has the largest z value in the depth image, so we locate its coordinates with the maximum intensity method and crop the facial area from the face image using an elliptic mask centred at this point. After that, our pre-processing includes, first, removing noise with a median filter and filling holes by linear interpolation for the depth images and, second, a greyscale transformation followed by histogram equalisation for the 2D intensity images to reduce the influence of illumination variations. Then, the depth and intensity images are fused to reach a higher performance. Secondly, in the feature extraction step, PCA and EFM are used to obtain the feature matrix vectors, which are sent to the classifier. Finally, we use two methods for classification and comparison of the features: distance measurement (correlation measuring) and the SVM. Then, the decision is made to reject or accept the person according to a specific protocol (Tables 1 and 2). The main contributions of this paper can be summarised as follows:

•   The evaluation of a novel face detection method based on IPC-based facial area segmentation to build an efficient automatic face recognition system using 3D depth and 2D intensity images, with which we can cope with challenges where the expressions, illumination and pose variations in the training and testing images are very different.

•   A summary of the literature related to face detection methods and face recognition using depth and intensity images.

•   The adaptation of PCA to reduce the large dimensionality of the feature vectors and extract the discriminant features: PCA is followed by the EFM algorithm to improve the discrimination ability of similar features.

•   The combination of two classification methods: distance measurement and SVM.

The rest of the paper is organised as follows: Section 2 is an overview of the previous works. Section 3 introduces face alignment and registration. In Section 4, we present the facial area segmentation and pre-processing stage, which is very important in our system. Section 5 presents the feature extraction method. In Section 6, we describe the classification process. Section 7 contains the experimental results and the comparisons with the existing methods. Finally, in Section 8, we give the conclusions and future works.

Table 1    Protocols used in CASIA3D

Dataset    | Protocol 1 (phase 1)                                | Protocol 2 (phase 2)
           | Customer                      | Impostor            | Customer                           | Impostor
Training   | 500 images (1, 4, 8, 9, 10)   | 0 images            | 600 images (1, 4, 8, 9, 10, 16)    | 0 images
Evaluation | 500 images (2, 6, 7, 14, 15)  | 195 images (1 : 15) | 600 images (2, 6, 7, 14, 15, 17)   | 208 images (1 : 15 + 18)
Test       | 500 images (3, 5, 11, 12, 13) | 150 images (1 : 15) | 600 images (3, 5, 11, 12, 13, 20)  | 160 images (1 : 15 + 21)

Table 2    Protocol used in GavabDB

Dataset  | Customer             | Impostor
Training | 55 images (1)        | 0 images
Test     | 165 images (2, 4, 6) | 42 images (1 : 7)

2  Related work

In this section, we review the important works on face recognition based on 2D and 3D data. We mainly focus on the different face detection methods and the effect of the detection process on the performance of these systems.

Segundo et al. (2007) developed a new method for face segmentation to detect the eye corners, nose tip, nose base and nose corners using only range images as input. Facial features are detected by combining an adapted method for 2D facial feature extraction with surface curvature information; a high nose tip detection rate of 99.95% was obtained. Xu et al. (2009) used both depth and intensity Gabor images to construct a robust classifier for face recognition under expression and pose variations. A novel hierarchical selection scheme embedded in linear discriminant analysis (LDA) and AdaBoost learning is proposed to select the most effective and most robust features to construct a strong classifier. García-Mateos and Vicente Chicote (2001) proposed a method to detect the facial area using horizontal and vertical IPC in intensity images; in our work, we apply this method, but our input consists of 3D depth images. Mousavi et al. (2008) proposed a 3D face recognition system based on range (depth) data. The nose tip is the reference point of the facial region; this important point is determined by thresholding the z coordinate values of the 3D depth image. The feature vectors are extracted using 2DPCA, after which the similarity score between the feature vectors is calculated; finally, to select the closest match, the authors use the SVM classifier. The GavabDB face database is used to test the proposed method.


Xiaoguang and Jain (2006) proposed a feature extractor based on the directional maximum to estimate the nose tip location and the pose angle simultaneously. A nose profile model is used to select the best candidates for the nose tip and, assisted by a statistical feature location model, a multimodal scheme is presented to extract eye and mouth corners. The invalid points in X, Y and Z are filtered out by a matrix M, and the facial area is segmented by thresholding the horizontal and vertical IPC of the matrix M. Xu et al. (2006) proposed a novel method for locating the nose tip: a hierarchical feature extraction scheme combining local features detects the nose tip coordinates and estimates the nose ridge by the included angle curve (IAC). 'Effective energy' is introduced to describe the local distribution of pixels neighbouring the nose tip, and SVM is then used to select the correct nose tips; a successful nose tip detection rate of 99.3% is achieved. García-Mateos et al. (2002) used vertical and horizontal IPC to model the visual appearance of human faces. Model-based detection is done by fitting the model to an unknown pattern; if a good fit is obtained, the object is detected. This procedure was used to separate face and non-face candidate regions.

Yong-An et al. (2010) utilised geometric features for 3D face recognition. Geometric distances among the points around the nose tip are calculated to represent the facial geometry, a number of candidates are selected from the gallery according to these geometric features, and ICP is performed to match the probe with the candidates and make the final decision. Yue et al. (2010) proposed a novel framework for 3D face recognition in the presence of expression variations. Correlation feature analysis of 3D LBP features is used as an intrinsic representation; spectral regression is then used to select the effective features and to combine them with a classifier. Experiments were implemented on the CASIA3D face database and a recognition rate of 94% was obtained. Taghizadegan et al. (2012) proposed a 3D face recognition method that is robust to changes in facial expression. The background and some unnecessary regions are eliminated by thresholding the z coordinate values of the 3D depth data using the Otsu method, and the maximum intensity method is used to locate the nose tip point. After conversion of the images to a standard size, 2DPCA is used to obtain the feature matrix vectors; finally, the Euclidean distance is applied for comparison of the features and classification. Zhou et al. (2012) combined 2D intensity information and 3D depth information to construct an efficient face recognition system. In the pre-processing step, they use a novel pose normalisation method for the 3D range data, which is then transformed into a depth image. After that, 2DPCA, 2DFLD, LBP and their combination are used to extract features from both the 2D intensity and 3D depth images and the similarity scores are calculated; finally, fusion of scores based on a weighted sum rule is used to obtain a further improvement. Xiaoxing et al. (2009) presented a face recognition method based on sparse representation under expression variations using low-level geometric features. In this work, a feature pooling and ranking scheme is designed to collect various types of low-level geometric features and rank them according to their sensitivity to facial expressions.

In our work, IPC-based facial area segmentation is applied to both 3D depth images and 2D intensity images.
The input data is a 3D point cloud of a human face; more details about how we use these methods are presented in Section 4.2. Our automatic face recognition system contains two phases, the training phase and the testing phase. During the training phase, the facial images are enrolled and the training data is prepared for the classifier; the training phase is performed only once. In the recognition phase, the new test data is classified against the training data learned and prepared in the training phase. The processing steps in the two phases, training and test, are the same. Figure 1 illustrates the different steps of the proposed method through the training and test phases.

3  Face alignment and registration

The ICP algorithm was first proposed by Besl and McKay (1992). In our work, ICP is used to register the 3D faces. We use a reference face model with neutral expression and frontal pose, and all faces are aligned to this model to reduce the effect of pose and expression variations. ICP is an iterative algorithm that minimises the mean square error (MSE) between two sets of 3D point clouds (the reference face and the input face) by applying a rigid transformation to the input face. The drawbacks of ICP are the need for a reference face model and the large computation time required for the alignment.
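To make the registration step concrete, the following is a minimal point-to-point ICP sketch in Python (NumPy/SciPy): nearest-neighbour matching against the reference cloud followed by a least-squares rigid transform. It is an illustration under our own assumptions (array layout, iteration count, convergence tolerance), not the implementation used in the paper.

```python
# Minimal point-to-point ICP sketch; the reference face model is assumed to be
# an N x 3 array, and iteration/tolerance settings are illustrative assumptions.
import numpy as np
from scipy.spatial import cKDTree

def icp_align(source, reference, n_iter=30, tol=1e-6):
    """Rigidly align an N x 3 source point cloud to a reference cloud."""
    src = source.copy()
    tree = cKDTree(reference)              # nearest-neighbour search structure
    prev_mse = np.inf
    for _ in range(n_iter):
        dist, idx = tree.query(src)        # closest reference point for each source point
        matched = reference[idx]
        mu_s, mu_m = src.mean(0), matched.mean(0)
        H = (src - mu_s).T @ (matched - mu_m)
        U, _, Vt = np.linalg.svd(H)        # least-squares rotation (Kabsch)
        R = Vt.T @ U.T
        if np.linalg.det(R) < 0:           # avoid reflections
            Vt[-1] *= -1
            R = Vt.T @ U.T
        t = mu_m - R @ mu_s
        src = src @ R.T + t                # apply the rigid transform
        mse = np.mean(dist ** 2)
        if abs(prev_mse - mse) < tol:      # stop when the MSE stops decreasing
            break
        prev_mse = mse
    return src
```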

4  Facial area segmentation and pre-processing

4.1 Depth and intensity images

With the development of 3D capturing equipment, it has become faster and easier to obtain 3D shape and 2D texture information to represent a real 3D face (Xu et al., 2009). A 3D point cloud is the input of our system; each point Pi in the cloud has coordinates (xi, yi, zi), i = 1, …, N. As mentioned earlier, the 3D face is represented by the depth image, also called a 2.5D image, which stores at most one depth value (z-direction) for every point in the (x, y) plane. These images have several advantages over 2D intensity images and 3D meshes, especially under illumination change (Zhou et al., 2012). In our work, the fusion of 2D and 3D images is used to further improve the performance of the face recognition system.
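As a rough illustration of how the 2.5D depth image can be generated from the point cloud, the sketch below bins the points into an x–y grid and keeps the largest z value per pixel; the grid resolution is an assumption and not a value taken from the paper.

```python
# Sketch: M x 3 point cloud -> 2.5D depth image; width/height are assumed values.
import numpy as np

def point_cloud_to_depth(points, width=120, height=160):
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    # Map x and y coordinates to integer pixel indices on a regular grid
    col = np.clip(((x - x.min()) / (x.max() - x.min() + 1e-9) * (width - 1)).astype(int), 0, width - 1)
    row = np.clip(((y - y.min()) / (y.max() - y.min() + 1e-9) * (height - 1)).astype(int), 0, height - 1)
    depth = np.zeros((height, width))
    # Keep the largest z per pixel, i.e., the point nearest to the camera
    np.maximum.at(depth, (row, col), z - z.min())
    return depth
```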

4.2 IPC-based facial area segmentation

The first phase in the face recognition system is to detect the facial region against the background. Our proposed method is based on IPC-based facial area segmentation. Additional information is always present in 3D facial data, such as shoulders, hair, neck and some parts of clothes; this unusable data must be removed to achieve high detection performance. Integral projection has been used in problems such as face detection and facial feature location (García-Mateos and Vicente Chicote, 2001; García-Mateos et al., 2002). In addition, the main facial features (i.e., eyebrows, eyes, nose and mouth) can be accurately located by analysing the horizontal and vertical IPC (García-Mateos and Vicente Chicote, 2001). Hua-Bin et al. (2012) proposed a related technique for hand vein feature extraction and recognition based on greyscale vertical integral projection and wavelet decomposition. An integral projection is a one-dimensional pattern obtained through the sum of a given set of pixels along a given direction (García-Mateos et al., 2002).


Horizontal and vertical IPCs are the most commonly used, although a projection can be taken along any direction: the integral projection along an arbitrary angle α is denoted by αP(i), in which case the input image must first be rotated by α. First, the grey-level input image is transformed into a binary image using the Otsu (1975) method. The Otsu algorithm assumes that the image to be thresholded contains two classes of pixels (a bi-modal histogram of foreground and background) and calculates the optimum threshold separating those two classes so that their combined spread (intra-class variance) is minimal; the computed threshold is then used for the binarisation of the input image. Given a segmented greyscale input image I(x, y), the horizontal and vertical IPCs are defined as follows (García-Mateos et al., 2002):

HP(y) = \sum_{x} I(x, y)    (1)

VP(x) = \sum_{y} I(x, y)    (2)

The vertical and horizontal IPCs of a depth image are shown in Figure 2, where Figure 2(a) illustrates the binarised version of the input depth image. In Figure 2(b), the x-axis represents the y coordinates and the y-axis the integral of the depth value (z-direction); in Figure 2(c), the x-axis represents the x coordinates and the y-axis the integral of the depth value. The horizontal IPC gives the location (Xmin, Xmax) of the facial area along the x-axis of the input image, and the vertical IPC gives the location (Ymin, Ymax) along the y-axis. On the basis of the rectangle M defined by [Xmin … Xmax, Ymin … Ymax], the facial area is segmented. Figure 2(d) shows the facial area segmented using the rectangle M; it can be seen that IPC-based facial area segmentation is an efficient way to remove the additional information and to separate the facial area from its background.
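The sketch below summarises this segmentation step, assuming scikit-image for the Otsu threshold; the 10% cut-off used to read (Xmin, Xmax) and (Ymin, Ymax) off the projection curves is our own assumption, since the paper does not state how the curve limits are extracted.

```python
# IPC-based facial area segmentation sketch; the 0.1 cut-off on the projection
# curves is an assumed heuristic for locating (Xmin, Xmax) and (Ymin, Ymax).
import numpy as np
from skimage.filters import threshold_otsu

def segment_face_ipc(depth):
    binary = depth > threshold_otsu(depth)     # Otsu binarisation of the depth image
    col_proj = binary.sum(axis=0)              # one value per x (sum over rows)
    row_proj = binary.sum(axis=1)              # one value per y (sum over columns)
    xs = np.where(col_proj > 0.1 * col_proj.max())[0]
    ys = np.where(row_proj > 0.1 * row_proj.max())[0]
    x_min, x_max, y_min, y_max = xs[0], xs[-1], ys[0], ys[-1]
    face = depth[y_min:y_max + 1, x_min:x_max + 1]   # rectangle M
    return face, (x_min, x_max, y_min, y_max)
```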

4.3 Pre-processing

At the beginning, the coordinates of the nose tip point are estimated. For each pixel, the mean of the grey-level values in a 3 × 3 window centred on it is computed. The nose is the most distinct feature in 3D facial data (largest z value), so the window with the largest mean is considered the nose region and its central pixel is taken as the nose tip point. In most cases this point is indeed the nose tip, but sometimes the depth value of the chin, shoulders, hair, neck or parts of clothes is larger than that of the nose, which leads to errors in the nose tip position. To overcome this problem, IPC-based facial area segmentation is used to eliminate large parts of this unusable data (see Figures 3(b) and 4(b)), which would otherwise corrupt the nose tip coordinates. The pre-processing of the facial images then enhances the facial information in both the depth and intensity images. Generally, the data obtained from 3D laser scanners are noisy; the 3D depth images contain spike noise and also holes in some parts of the image. In this paper, a median filter is used to eliminate the spike noise and the holes are filled by linear interpolation of the neighbouring pixels, as illustrated in Figures 3(c) and 4(c) for the depth and intensity images, respectively.
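The following sketch mirrors this description: a 3 × 3 mean filter to locate the nose tip, a median filter for spike noise and linear interpolation to fill holes. The use of SciPy's griddata and the treatment of zero-valued pixels as holes are assumptions, since the paper does not specify these details.

```python
# Pre-processing sketch: nose tip localisation, spike-noise removal, hole filling.
import numpy as np
from scipy.ndimage import uniform_filter, median_filter
from scipy.interpolate import griddata

def locate_nose_tip(depth):
    smoothed = uniform_filter(depth, size=3)          # mean of each 3 x 3 neighbourhood
    return np.unravel_index(np.argmax(smoothed), depth.shape)   # (row, col) of the nose tip

def denoise_and_fill(depth, hole_value=0.0):
    clean = median_filter(depth, size=3)              # remove spike noise
    holes = clean == hole_value                       # assumed hole marker
    rows, cols = np.indices(clean.shape)
    # Fill holes by linear interpolation from the valid neighbouring pixels
    clean[holes] = griddata((rows[~holes], cols[~holes]), clean[~holes],
                            (rows[holes], cols[holes]), method='linear')
    return clean
```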


Figure 2    Detection of facial area: (a) binarisation of depth image; (b) horizontal integral projection; (c) vertical integral projection and (d) the segmented facial area (see online version for colours)

Figure 3    Detection and pre-processing of the depth image: (a) the input depth image; (b) segmented facial area using IPC; (c) removing noise and filling holes and (d) the elliptical mask

Figure 4    Detection and pre-processing of the intensity image: (a) the input intensity image; (b) segmented facial area using IPC; (c) removing noise and filling holes and (d) the elliptical mask with the histogram equalisation (see online version for colours)


The intensity images are processed with a similar method as the depth images but, in addition, histogram equalisation (see Figure 4(d)) is used to reduce the effect of illumination variation after cropping the facial area. Most works on face recognition use an elliptical template and ignore the regions outside the elliptical mask to remove uncertain information (Xu et al., 2009). In this work, we use this elliptical mask to crop the facial region; the centre of the mask is the nose tip localised previously. The elliptical mask for the depth and intensity images is illustrated in Figures 3(d) and 4(d), respectively. The fusion of different information or feature sets increases the accuracy of biometric systems (Ross and Govindarajan, 2005). In this work, intensity (2D) and depth (3D) information are fused to obtain a richer representation of the facial image. We use fusion by combination based on simple rules (sum, product, min, max and mean) and retain the combination that gives the best results.
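A compact sketch of the cropping and fusion step is given below. The ellipse radii and the mean combination rule are illustrative assumptions (the paper evaluates several rules and keeps the best one), and histogram equalisation is taken from scikit-image.

```python
# Elliptical cropping around the nose tip and depth/intensity fusion (mean rule).
import numpy as np
from skimage import exposure

def elliptical_mask(shape, center, a=35, b=45):       # assumed semi-axes (pixels)
    rows, cols = np.indices(shape)
    cy, cx = center
    return ((cols - cx) / a) ** 2 + ((rows - cy) / b) ** 2 <= 1.0

def fuse_depth_intensity(depth, intensity, nose_tip):
    mask = elliptical_mask(depth.shape, nose_tip)
    intensity_eq = exposure.equalize_hist(intensity)  # reduce illumination effects
    d = (depth - depth.min()) / (depth.max() - depth.min() + 1e-9)   # normalise to [0, 1]
    fused = 0.5 * (d + intensity_eq)                  # simple mean-rule fusion
    return np.where(mask, fused, 0.0)                 # ignore pixels outside the ellipse
```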

5  Feature extraction: PCA and EFM

Any biometric system has a very important phase based on the reduction of space or dimensionality. PCA, also known as the Karhunen-Loeve transform and applied to face recognition by Turk and Pentland (1991), is one of the most successful and widely used techniques. The main idea of this algorithm is to reduce the large dimensionality of the data to its smaller intrinsic dimensionality through the PCA projection (transformation) matrix UPCA, which maximises the retained variance while minimising the MSE. All M facial images are collected in a single matrix A = [A1, A2, …, AM]; each image is represented by a column vector Ai = [Ai1, Ai2, …, AiN]^T of size N. Let Ā be the average of the M vectors, which represents the centre of gravity of the matrix A. The main steps for computing the PCA projection matrix UPCA are summarised as follows:



•   Calculate the average vector:

    \bar{A} = \frac{1}{M} \sum_{i=1}^{M} A_i    (3)

•   Adjust the data in the matrix A by subtracting the average from each image vector:

    Q_i = A_i - \bar{A}, \quad i = 1, …, M    (4)

•   Calculate the covariance matrix C:

    C = \sum_{i=1}^{M} (A_i - \bar{A})(A_i - \bar{A})^T = X X^T, \quad X = [Q_1, Q_2, Q_3, …, Q_M]    (5)

•   Calculate the eigenvalues V and eigenvectors U of the covariance matrix C.

•   Sort the eigenvectors in descending order of the corresponding eigenvalues.

•   UPCA contains the first k eigenvectors corresponding to the k largest eigenvalues.
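For illustration, the sketch below computes UPCA with NumPy following the steps above. It forms the scatter matrix directly, whereas in practice, with N much larger than M, one would work with the M × M matrix X^T X instead; that implementation detail is not discussed in the paper.

```python
# PCA projection matrix sketch following equations (3)-(5).
import numpy as np

def pca_projection(A, k):
    """A: N x M matrix whose columns are the M vectorised training images."""
    mean = A.mean(axis=1, keepdims=True)        # average vector (3)
    Q = A - mean                                # centred data (4)
    C = Q @ Q.T                                 # scatter/covariance matrix (5)
    eigvals, eigvecs = np.linalg.eigh(C)        # eigh: C is symmetric
    order = np.argsort(eigvals)[::-1]           # sort by decreasing eigenvalue
    U_pca = eigvecs[:, order[:k]]               # keep the k leading eigenvectors
    return U_pca, mean

# Projecting an image vector a onto the PCA subspace: y = U_pca.T @ (a - mean)
```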

The principal aim of the PCA algorithm is to represent the original data in a lower-dimensional space. However, it has a drawback: PCA does not take into account the discrimination between classes (persons). In contrast, the LDA proposed by Belhumeur et al. (1997) performs a true separation of classes; this algorithm looks for the directions that are most efficient for discriminating between classes. The main aim of LDA is to maximise the distance between classes while minimising the intra-class variance. However, there is another problem: LDA tends to over-fit the training data, which generalises poorly to new test data. To resolve this problem, we use the EFM algorithm (Liu and Wechsler, 2002). EFM improves the generalisation capacity of LDA and preserves a suitable balance between the selection of the eigenvalues and the requirement that the eigenvalues of the intra-class dispersion matrix are not too small. We can summarise the steps of the EFM algorithm as follows:

•   For all samples of all M classes, the intra-class (Sw) and inter-class (Sb) dispersion matrices are defined as follows:

    S_w = \sum_{i=1}^{M} \sum_{j=1}^{n_i} (A_i^j - m_i)(A_i^j - m_i)^T    (6)

    S_b = \sum_{i=1}^{M} n_i (m_i - m)(m_i - m)^T    (7)

    In the above equations, A_i^j is the jth sample of class i, m_i is the average of the samples in class i, m is the average of all samples and n_i is the number of samples in class i.

•   Calculate the eigenvalues (Λ) and eigenvectors (V) of the Sw matrix.

•   Calculate the new inter-class matrix:

    K_b = \Lambda^{-1/2} V^T S_b V \Lambda^{-1/2}    (8)

•   Calculate the eigenvalues (Λb) and eigenvectors (Vb) of Kb.

•   Calculate the global transformation matrix UEFM:

    U_{EFM} = V \Lambda^{-1/2} V_b    (9)

In this paper, PCA is applied first and followed by EFM: the output of PCA is the input to EFM. Through this process, a small number of highly discriminant features is extracted. After computing the EFM transformation matrix UEFM, each dataset is projected into the eigenvector subspace. These features are then sent to the classification step.
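A hedged sketch of EFM applied to the PCA features is shown below, following equations (6)-(9); the small epsilon added before inverting the eigenvalues is our own numerical-stability assumption.

```python
# EFM sketch on PCA features; eps is an assumed regulariser, not from the paper.
import numpy as np

def efm_transform(Y, labels, eps=1e-6):
    """Y: k x M matrix of PCA features (columns); labels: class index per column."""
    labels = np.asarray(labels)
    m = Y.mean(axis=1, keepdims=True)
    k = Y.shape[0]
    Sw = np.zeros((k, k))
    Sb = np.zeros((k, k))
    for c in np.unique(labels):
        Yc = Y[:, labels == c]
        mc = Yc.mean(axis=1, keepdims=True)
        Sw += (Yc - mc) @ (Yc - mc).T                  # intra-class dispersion (6)
        Sb += Yc.shape[1] * (mc - m) @ (mc - m).T      # inter-class dispersion (7)
    lam, V = np.linalg.eigh(Sw)                        # eigen-decomposition of Sw
    whiten = V @ np.diag(1.0 / np.sqrt(lam + eps))     # V * Lambda^(-1/2)
    Kb = whiten.T @ Sb @ whiten                        # new inter-class matrix (8)
    _, Vb = np.linalg.eigh(Kb)
    return whiten @ Vb                                 # global transform U_EFM (9)
```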

6  Classification

The classification step is very important in our system. In this section, we are interested in two types of classifications: the first is based on the similarity measure and the second is based on the SVM classifier.

6.1 Distance measurement classification

Similarity measures are used in face recognition systems to compare two feature vectors produced by the feature extraction module and to quantify how similar they are. Various experiments in the literature have shown that the Euclidean distance is surpassed by other measures (Belahcene, 2013); one of the best is the normalised correlation, defined by:

S(A, B) = \frac{A^T B}{\|A\| \, \|B\|}    (10)

Function (10) computes the cosine of the angle between the two feature vectors A and B; a high value of the normalised correlation indicates a strong similarity between the two vectors.

6.2 Support vector machine (SVM) classification

SVM is a statistical learning technique for data analysis and pattern recognition proposed by Cortes and Vapnik (1995). It can address a variety of problems such as classification, regression and fusion, and is widely used in computer vision and pattern recognition. The objective of SVM is to find a separation that minimises the classification error on the training set. The purpose of the classifier is to classify an element x, here x = (C1, …, CN). In this work, there are two classes, customer and impostor, whose labels are y = 1 for the customer class and y = −1 for the impostor class. The classifier determines f such that y = f(x); the separating surface is the hyperplane:

W k(x) + b = 0    (11)

The concept used to minimise the classification error is the margin: the distance between the separating hyperplane and the closest training elements, called support vectors.
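The sketch below puts the two classifiers side by side: the normalised correlation of equation (10) and an SVM. Using scikit-learn's SVC with a linear kernel is an assumption, since the paper does not name a library or kernel.

```python
# Normalised-correlation matching and SVM classification (scikit-learn assumed).
import numpy as np
from sklearn.svm import SVC

def normalised_correlation(a, b):
    # Equation (10): cosine of the angle between the two feature vectors
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def train_svm(features, labels):
    """features: one EFM feature vector per row; labels: +1 customer, -1 impostor."""
    clf = SVC(kernel='linear')     # separating hyperplane W k(x) + b = 0
    clf.fit(features, labels)
    return clf                     # clf.predict(new_features) gives the decision
```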

6.3 Performance measurement

Biometric systems need to be evaluated to estimate their performance. In our work, the performance of the face recognition system is measured using the following classification errors: false rejection rate (FRR), false acceptance rate (FAR) and equal error rate (EER). A false rejection occurs when the system rejects a person who is a customer, and a false acceptance occurs when the system accepts a person who is an impostor, where:

FRR = (Num of False Rejections) / (Num of Customers)    (12)

FAR = (Num of False Acceptances) / (Num of Impostors)    (13)

Moreover, it is necessary to calculate two types of distances, the Entra_Distance and the Extra_Distance: the Entra_Distance is the distance between two face images of the same person and the Extra_Distance is the distance between two face images of two different persons. The minimum similarity corresponds to a maximum distance between the features of the two images. A classification threshold is defined for each person; this threshold determines the minimum similarity between two facial images required to admit that they correspond to the same person. The threshold is given by:

Threshold = (max(Entra_Distance) + min(Extra_Distance)) / 2    (14)

To verify the stability of the system, the EER is calculated; this operating point corresponds to the value where FAR is equal to FRR. Finally, to examine the sensitivity of our system, the recognition rate (RR) is calculated as follows:

RR = 100 − (FRR + FAR)    (15)

After the pre-processing of the facial images, all images are normalised to the same size of 91 × 71 pixels (the size of the elliptical mask). The facial space is obtained after feature extraction using PCA followed by EFM. In the classification process, and following the protocols of each database, the similarity scores based on the Entra_Distance and Extra_Distance between the training set and the testing set are calculated.
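As a small worked example of equations (12)-(15), the sketch below computes FRR, FAR and RR from similarity scores and locates the EER by a simple threshold sweep; the sweep itself is an implementation assumption.

```python
# FRR, FAR, RR and EER from similarity scores (threshold sweep is an assumption).
import numpy as np

def error_rates(customer_scores, impostor_scores, threshold):
    frr = 100.0 * np.mean(customer_scores < threshold)    # customers wrongly rejected (12)
    far = 100.0 * np.mean(impostor_scores >= threshold)   # impostors wrongly accepted (13)
    rr = 100.0 - (frr + far)                               # recognition rate (15)
    return frr, far, rr

def equal_error_rate(customer_scores, impostor_scores):
    # EER: operating point where FAR and FRR coincide
    best_gap, eer = np.inf, None
    for t in np.linspace(0.0, 1.0, 1001):
        frr, far, _ = error_rates(customer_scores, impostor_scores, t)
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2.0
    return eer
```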

7  Experimental results

7.1 Results of detection stage

In this section, we present the experimental results of the proposed method, which is applied on two 3D face databases, CASIA3D and GavabDB. CASIA3D (WRL format) contains 123 different persons with 37 or 38 scans (models) each. For each scan, a 2D intensity image and a 3D depth image are generated. CASIA3D covers several variations of pose, expression and illumination, as well as combined variations of expression under illumination and pose under expression. We use only 15 models per person to test our method: five scans with illumination variation, five scans with expression variation (laugh, smile, anger, surprise and eye close) and five scans with expression variation under illumination variation. The second database, GavabDB (Moreno and Sanchez, 2004), contains 549 three-dimensional images of facial surfaces from 61 different individuals, with nine images per person. Each image is given as a 3D point-cloud mesh of the facial surface without texture, unlike the CASIA3D database, which also contains intensity images. GavabDB provides several variations such as pose and facial expression. The nine images of each individual are: two frontal views with neutral expression, two x-rotated views (35°, looking up and looking down, respectively) with neutral expression, two y-rotated views (90°, left and right profiles, respectively) with neutral expression, two frontal images with expression variation (smile and laugh) and one random gesture chosen by the user. In the detection stage, five models per individual are used: two frontal views with neutral expression, two frontal views with expression (laugh and smile) and one random gesture chosen by the user. The detection accuracies on CASIA3D and GavabDB are reported in Tables 3 and 4, respectively. For Table 3, we have:

•   IV (615 images): illumination variations, including lighting up, down, left and right.

•   EV (615 images): expression variations, including smile, laughter, anger, surprise and eyes closed.

•   EVI (615 images): expression and illumination variations.

•   D: depth image; I: intensity image.

Table 3    Detection accuracy in CASIA3D

Dataset | Number of persons | Number of images/person | Correctly detected (D / I) | Incorrectly detected (D / I) | Detection accuracy % (D / I)
IV      | 123               | 5                       | 605 / 609                  | 10 / 6                       | 98.37 / 99.02
EV      | 123               | 5                       | 607 / 608                  | 8 / 7                        | 98.69 / 98.86
EVI     | 123               | 5                       | 592 / 597                  | 23 / 18                      | 96.26 / 97.07

Table 4    Detection accuracy in GavabDB

Dataset | Number of persons | Number of images/person | Correctly detected | Incorrectly detected | Detection accuracy (%)
A       | 61                | 2                       | 116                | 6                    | 95.08
B       | 61                | 2                       | 114                | 8                    | 93.44
C       | 61                | 2                       | 55                 | 6                    | 90.16

For Table 4, we have:

•   A: two frontal views with neutral expression.

•   B: two frontal views with expression (laugh and smile).

•   C: one random gesture chosen by the user.

On the basis of the results presented in Tables 3 and 4, we can draw the following conclusions:

•   The highest detection accuracy, 99.02%, is achieved under illumination variation (Table 3), so the proposed method is very efficient in the presence of illumination variation.

•   The lowest rate, 90.16% (Table 4), is obtained on GavabDB for the random gesture chosen by the user; such images may contain a head orientation or an obstacle in front of the face, which decreases the accuracy.

•   It is worth emphasising that intensity images are more robust than depth images under illumination variations in the detection step; this is owing to the histogram equalisation applied to these images in the pre-processing step.

•   Experimental results on the CASIA3D database show that the proposed approach works very well with depth and intensity images and can deal with variations of illumination (99.02%) and expression (98.86%).

For comparison, García-Mateos and Vicente Chicote (2001) obtained a detection accuracy of 95.1%; the authors used 2D images as the input to their system, whereas our system takes the 3D point cloud and generates the depth and intensity images used as input, achieving a detection accuracy of 99.02%. The reference point in our system is the nose tip, which has the largest depth value. One limitation of the proposed approach occurs when, for some persons, the depth value of the chin or hair is larger than that of the nose; this decreases the detection accuracy.


7.2 Results of recognition system

This section describes the experimental results of the recognition system. The study comprises three parts that examine the performance of 2D and 3D information in the face recognition system. In the first experiment, we investigate the influence of illumination and expression variation on CASIA3D based on protocol 1, presented in Table 1 (phase 1); the second experiment is performed on GavabDB to evaluate the performance of our system in the presence of small pose variation (looking up and down, 35°). Finally, our system is tested in the presence of large pose variation left/right (20–30°, 50–60°) on CASIA3D based on protocol 2, presented in Table 1 (phase 2).

7.2.1 Protocol and database partitions

Facial databases must include several challenges and variations; this allows us to test several techniques and algorithms against many difficulties and problems. Our experiments are implemented on two 3D face databases, CASIA3D and GavabDB. As will be seen, the expressions, illumination and pose variations in the training and testing images are very different. CASIA3D is characterised by many complex variations that are difficult for any algorithm. In this paper, we have studied the effect of illumination variations (images 1, 2, 3, 4, 5), expressions (images 6, 7, 8, 9, 10), combined changes of expression under illumination (images 11, 12, 13, 14, 15) and pose variation (images 16, 17, 18, 20, 21), where the five scans of illumination variations include: office light, up light, down light, left light and right light. The five scans of expression variations include: laugh, smile, anger, surprise and eye close, whereas the five scans of pose variations include: a frontal pose (0°), turn right (20–30°), turn left (20–30°), turn right (50–60°) and turn left (50–60°). Scan 19 of each person in the CASIA3D database shows the person rotated to the right by 80–90°; in our work, this pose is not taken into account. Some examples of expression and pose variations in the CASIA3D database are shown in Figures 5 and 6, respectively. We used 20 images per subject (person) across the two phases, for a total of 2460 images. As a starting point, we study the influence of illumination and expression variations; the pose variations are then added in the second experiment.

In the first phase, the 123 persons are separated into two classes, customer and impostor. The customer class contains 100 subjects and the impostor class is subdivided into 13 impostors for the evaluation and 10 impostors for the test. In the training set, we have the 100 customers with five images each (1, 4, 8, 9, 10); this set does not contain impostors (0 images). In the evaluation set, the customer class contains the same 100 persons of the training set but with five other conditions (2, 6, 7, 14, 15), and the impostor class contains 13 persons with all 15 images (1 : 15). In the test set, the customer class contains the same 100 persons of the training set but with five other conditions (3, 5, 11, 12, 13), and the impostor class contains 10 persons with all 15 images (1 : 15). The second phase proceeds in the same way, but in protocol 2 we add the pose variations (16, 17, 18, 20, 21). The protocols used for the CASIA3D database in phase 1 and phase 2 are depicted in Table 1.


Figure 5    Expression variations in CASIA3D database: (a) smile; (b) laugh; (c) anger; (d) surprise and (e) eye close (see online version for colours)

Figure 6    Pose variations in CASIA3D database

The second experiment is performed on the GavabDB face database; as seen previously, it contains 61 different individuals with nine images per person. We take into account seven models for each person, for a total of 427 depth images, where images 1 and 2 are frontal views with neutral expression, images 3 and 4 are frontal views with expression variation, images 5 and 7 are rotated views (+35° looking up and −35° looking down) and image 6 is a random gesture chosen by the user. The 61 persons of the GavabDB database are subdivided into two classes, customer and impostor. The customer class contains 55 persons and the impostor class contains six persons. In the training set, we take one image with frontal view under neutral expression per customer; the training set does not contain impostors (0 images). In the test set, the customer class contains the same 55 persons of the training set with three other conditions (2, 4, 6), and the impostor class contains six persons with all seven images (1 : 7). The protocol used for the GavabDB database is shown in Table 2.

7.2.2 Discussion

From our experimental results, we can conclude the following. First, the size of the database plays a significant role in the performance and stability of the recognition system. Figures 7 and 8 illustrate the variation of EER and RR, respectively, as a function of the number of features (EFM) with the SVM classifier and the distance measurement (dis). The RR and EER curves (see Figures 7 and 8) are more stable on CASIA3D (2460 images) than on GavabDB (427 images). For example, for a number of features equal to 10 we obtained RR = 81.67% on GavabDB and RR = 93.13% on CASIA3D with the SVM classifier; likewise, with the distance measurement we obtained RR = 26.81% on GavabDB and RR = 89.60% on CASIA3D.

Figure 7    Variation of EER according to feature number (see online version for colours)

Figure 8    Variation of RR according to feature number (see online version for colours)

Second, in the presence of pose variation with an orientation angle ≤ 35°, good results are achieved, RR = 95.85% and EER = 2.07%; these results are close to those obtained for the frontal pose (angle = 0°), where we obtained RR = 96.75% and EER = 1.6%. Therefore, pose variation up to 35° does not affect our system. Moreover, we find that for angles ≥ 35° (up to 60°) the recognition system is less efficient, and we obtained RR = 89.63% and EER = 3.16%. In this work, ICP is used to align the pose angle; this algorithm can only correct small pose angles of less than about 30–35°. Third, two classifiers, SVM and distance measurement, are used; in all experiments, SVM remains the more efficient classifier, as shown in Figures 7 and 8, and gives the best RR and EER on both databases. The promising results obtained in this paper in the presence of several challenges (illumination, expression and pose variation) are due, on the one hand, to the good detection process based on IPC facial area segmentation and, on the other hand, to PCA followed by EFM in the feature extraction step: PCA reduces the large dimensionality of the data space to its smaller intrinsic dimensionality and EFM preserves a suitable balance between the selection of the eigenvalues and the requirement that the eigenvalues of the intra-class dispersion matrix are not too small. The number of features, which is the number of rows of the global transformation matrix UEFM, is varied against RR and EER to determine the optimal setting. As shown in Figures 7 and 8, with a small number of features (30–35) we obtain the best recognition performance (RR and EER) in all experiments. Our best recognition accuracies on the two databases with the different challenges are recapitulated in Table 5.

Table 5    The best recognition accuracy for different experiments

Rates    | GavabDB | CASIA3D (phase 1) | CASIA3D (phase 2)
EER (%)  | 2.07    | 1.60              | 3.16
FAR (%)  | 0.50    | 1.45              | 3.20
FRR (%)  | 3.63    | 1.80              | 7.16
RR (%)   | 95.85   | 96.75             | 89.63

Challenges and variations of the datasets: GavabDB: expression, pose variation looking up/down (35°) and the random gesture chosen by the user; CASIA3D phase 1: illumination, expression, and combined changes of expression under illumination; CASIA3D phase 2: illumination, expression, combined changes of expression under illumination and pose variation left/right (20–30°, 50–60°).

Table 6 compares our recognition rates with the state-of-the-art; in our experiments on the two databases GavabDB and CASIA3D, we obtained encouraging results that are close to the best reported in the literature.

Table 6    Comparison of RR on GavabDB and CASIA3D with state-of-the-art

Methods | Database | Recognition rate/challenges
Moreno et al. (2005): HK segmentation algorithm, PCA and SVM | GavabDB | 90.16%: neutral expression
Xiaoxing et al. (2009): sparse representation, low-level features, geodesic distance | GavabDB | 93.33%: expression variation
Drira et al. (2013): elastic radial curves, PCA | GavabDB | 94.54%: expression variation
Mahoor and Abdel-Mottaleb (2009): robust Hausdorff distance, ICP | GavabDB | 90.16%: neutral expression; 77.9%: under gesture and light face rotation; 94.68%: neutral + expression
Mousavi et al. (2008): 2DPCA, SVM | GavabDB | 94.67%: neutral + expression; 78%: neutral + expression; 91%: neutral + expression

Table 6    Comparison of RR on GavabDB and CASIA3D with state-of-the-art (continued)

Methods | Database | Recognition rate/challenges
Our method: IPC (horizontal and vertical), PCA with EFM, SVM | GavabDB | 95.85%: neutral + expression, pose variation (looking up/down) and random gesture chosen by the user
Xu et al. (2009): Gabor filter, LDA, AdaBoost | CASIA3D | 93.3%: expression and illumination; 91.0%: large pose variations left/right 50–60°
Yong-An et al. (2010): geometric features, ICP, LDA, geodesic distance | CASIA3D | 91.1%: gallery set contained one image each of 90 persons with a neutral expression; the probe set contains the other 2610 models
Yue et al. (2010): 3DLBP, spectral regression, nearest neighbour classifiers | CASIA3D | 94.17%: expression variation
Zhou et al. (2012): 2DPCA, LBP, cosine similarity | CASIA3D | 94.68%: expression and illumination (first 30 people)
Our method: IPC-based facial area segmentation, PCA with EFM, SVM | CASIA3D | 96.75%: illumination, expression, and expression under illumination; 89.63%: illumination, expression, expression under illumination and pose variation left/right (20–30°, 50–60°)

8  Conclusion and future work

In this paper, we proposed an automatic 2D and 3D face recognition system using depth and intensity images. The facial region is detected using IPC-based facial area segmentation, in which additional unusable information is eliminated. Then, the nose tip is located with the maximum intensity method to crop the facial area using an elliptic mask centred at this point. After that, the images are pre-processed. Feature matrix vectors are obtained after feature extraction using an efficient method, PCA followed by EFM, which contributes to the improvement of our system. Then, distance measurement and SVM are used for the classification. Finally, the experiments were conducted on the CASIA3D and GavabDB face databases. The comparison with existing methods demonstrates that we obtain promising results. Our method offers good accuracy and robustness to illumination, expression and small pose variations up to 35°. Pose correction techniques to deal with large pose variations are left for future work.

Acknowledgements

The authors thank the reviewers for their careful reading and their helpful comments, which contributed to improve the quality of this paper.


References

Belahcene, M. (2013) Authentification and Identification in Biometrics, Unpublished PhD thesis, Electrical Engineering Department, Mohamed Khider University, Biskra, Algeria.

Belhumeur, P.N., Hespanha, J.P. and Kriegman, D.J. (1997) 'Eigenfaces vs. fisherfaces: recognition using class specific linear projection', IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 19, No. 7, pp.711–720.

Besl, P.J. and McKay, N.D. (1992) 'A method for registration of 3-D shapes', IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 14, No. 2, pp.239–256.

Cortes, C. and Vapnik, V. (1995) 'Support-vector networks', Machine Learning, Vol. 20, No. 3, pp.273–297.

Drira, H., Ben Amor, B., Srivastava, A., Daoudi, M. and Slama, R. (2013) '3D face recognition under expressions, occlusions, and pose variations', IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 35, No. 9, pp.2270–2283.

García-Mateos, G. and Vicente Chicote, C. (2001) 'Face detection on still images using HIT maps', in Bigun, J. and Smeraldi, F. (Eds.): Audio- and Video-Based Biometric Person Authentication, Springer Berlin Heidelberg, Vol. 2091, pp.102–107.

García-Mateos, G., Ruiz, A. and Lopez-de-Teruel, P. (2002) 'Face detection using integral projection models', in Caelli, T., Amin, A., Duin, R.W., de Ridder, D. and Kamel, M. (Eds.): Structural, Syntactic, and Statistical Pattern Recognition, Springer Berlin Heidelberg, Vol. 2396, pp.644–653.

Hua-Bin, W., Liang, T. and Jian, Z. (2012) 'Novel algorithm for hand vein feature extraction and recognition based on vertical integral projection and wavelet decomposition', 2nd International Conference on Electronics, Communications and Networks, pp.1928–1931.

Liu, C. and Wechsler, H. (2002) 'Gabor feature based classification using the enhanced fisher linear discriminant model for face recognition', IEEE Transactions on Image Processing, Vol. 11, No. 4, pp.467–476.

Mahoor, M.H. and Abdel-Mottaleb, M. (2009) 'Face recognition based on 3D ridge images obtained from range data', Pattern Recognition, Vol. 42, No. 3, pp.445–451.

Moreno, A. and Sanchez, A. (2004) 'GavabDB: a 3D face database', Proc. 2nd COST275 Workshop on Biometrics on the Internet, Vigo, Spain, pp.75–80.

Moreno, A.B., Sanchez, A., Velez, J. and Diaz, J. (2005) 'Face recognition using 3D local geometrical features: PCA vs. SVM', Proceedings of the 4th International Symposium on Image and Signal Processing and Analysis, ISPA, Zagreb, Croatia, pp.185–190.

Mousavi, M.H., Faez, K. and Asghari, A. (2008) 'Three dimensional face recognition using SVM classifier', Seventh IEEE/ACIS International Conference on Computer and Information Science, Portland, OR, pp.208–213.

Otsu, N. (1975) 'A threshold selection method from gray-level histograms', Automatica, Vol. 11, Nos. 285–296, pp.23–27.

Ross, A.A. and Govindarajan, R. (2005) 'Feature level fusion of hand and face biometrics', Defense and Security, International Society for Optics and Photonics, pp.196–204.

Segundo, M.P., Queirolo, C., Bellon, O.R.P. and Silva, L. (2007) 'Automatic 3D facial segmentation and landmark detection', 14th International Conference on Image Analysis and Processing, Modena, Italy, pp.431–436.

Taghizadegan, Y., Ghassemian, H. and Naser-Moghaddasi, M. (2012) '3D face recognition method using 2DPCA Euclidean distance classification', ACEEE International Journal on Control System and Instrumentation, Vol. 3, No. 1, pp.1–5.

Turk, M.A. and Pentland, A.P. (1991) 'Face recognition using eigenfaces', IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Proceedings CVPR'91, Maui, HI, pp.586–591.


Xiaoguang, L. and Jain, A.K. (2006) 'Automatic feature extraction for multiview 3D face recognition', 7th International Conference on Automatic Face and Gesture Recognition, FGR, Southampton, pp.585–590.

Xiaoxing, L., Tao, J. and Zhang, H. (2009) 'Expression-insensitive 3D face recognition using sparse representation', IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR, Miami, FL, pp.2575–2582.

Xu, C., Li, S., Tan, T. and Quan, L. (2009) 'Automatic 3D face recognition from depth and intensity Gabor features', Pattern Recognition, Vol. 42, No. 9, pp.1895–1905.

Xu, C., Tan, T., Wang, Y. and Quan, L. (2006) 'Combining local features for robust nose location in 3D facial data', Pattern Recognition Letters, Vol. 27, No. 13, pp.1487–1494.

Yong-An, L., Yong-Jun, S., Gui-Dong, Z., Taohong, Y., Xiu-Ji, X. and Hua-Long, X. (2010) 'An efficient 3D face recognition method using geometric features', 2nd International Workshop on Intelligent Systems and Applications (ISA), Wuhan, pp.1–4.

Yue, M., QiuQi, R., Xueqiao, W., Meiru, M. et al. (2010) 'Robust 3D face recognition using learn correlative features', 10th International Conference on Signal Processing (ICSP), IEEE, Beijing, pp.1382–1385.

Zhou, J., Li, Y. and Wang, J. (2012) '2D&3D-ComFusFace: 2D and 3D face recognition by scalable fusion of common features', Journal of Computer Science and Network Security, Vol. 12, No. 3, pp.30–36.