Illumination, Pose and Occlusion Invariant Face Recognition from Range Images Using ERFI Model

Suranjan Ganguly, Department of Computer Science and Engineering, Jadavpur University, Kolkata, India
Debotosh Bhattacharjee, Department of Computer Science and Engineering, Jadavpur University, Kolkata, India
Mita Nasipuri, Department of Computer Science and Engineering, Jadavpur University, Kolkata, India

International Journal of System Dynamics Applications, 4(2), 1-20, April-June 2015. DOI: 10.4018/ijsda.2015040101
ABSTRACT

In this paper, the pivotal contribution of the authors is to recognize 3D face images from range images in the unconstrained environment, i.e. under varying illumination, pose, and occlusion, which are considered the most challenging tasks in the domain of face recognition. During this investigation, face images have been normalized in terms of pose registration as well as occlusion restoration using the ERFI (Energy Range Face Image) model. 3D face images are inherently illumination invariant due to their point-based representation of data along three axes. Here, other than a quantitative analysis, a subjective analysis is also carried out. Synthesized datasets have been built to investigate the recognition performance on the Frav3D and Bosphorus databases using SIFT and SURF like features, and a weighted fusion of these individual feature sets is also done. These feature sets have been classified by K-NN and the Sequence Matching Technique, achieving maximum recognition rates of 99.17% and 98.81% for the Frav3D and Bosphorus databases respectively.

Keywords:
2.5D Range Image, 3D Face Image, Face Registration, Frav3D and Bosphorus Databases, K-NN, Occlusion Restoration, Sequence Matching Technique, SIFT, SURF
1. INTRODUCTION

Face recognition is an important biometric (Nandi et al., 2014) modality as well as a challenging task in the domain of computer vision. Although numerous biometric data exist, such as fingerprints, iris, DNA, palm prints, ear shape, and heart rate, the human face image has gained much of the researchers' attention due to its uniqueness, easy availability, and acquisition without the consent of the individual. In the last decades, enormous advancements have already been made in this area.
Moreover, due to advancements in sensing technology (i.e. acquisition mechanisms) and the availability of sufficient computing power, 3D face image based recognition techniques have also gained (Scheenstra et al., 2005; Ganguly et al., 2015b) much of the researchers' attention. The performance of any face recognition system suffers mainly for three reasons, namely (1) pose, (2) illumination, and (3) occlusion. The illumination (or light shading) problem can be handled efficiently by 3D face images: due to their inherent properties, 3D images preserve the face data along three axes (X, Y, and Z) as point clouds, i.e. depth data (Z) in the X-Y plane. Unlike 2D images, 3D images are not affected by the illumination of the face with different light sources (or shading). However, even in a set of frontal face images, the presence of occluded faces will certainly degrade the performance of any well-established recognition algorithm, and pose variations create a similar situation to that of occluded faces: due to rotations along yaw, pitch, and roll, some portion of the face region is suppressed, which ultimately causes a poor recognition rate.

To deal with these challenges, i.e. pose and occlusion, the authors detail various investigations in this paper. An input 3D face image is processed to create its corresponding depth map (Ganguly et al., 2014a; Conde et al., 2006), or 2.5D range face image, which preserves only the depth values. Then, the ERFI (Ganguly et al., 2014b) model is used to register rotated faces to a frontal (or near frontal) position. In the case of occluded faces, various techniques, such as GPCA, the ERFI model, and eigenface images, are employed to restore the missing part of the face image after successful reconstruction of the occluded region(s). After that, a synthesized face dataset is created consisting only of frontal range face images that are neutral, expressive (i.e. with facial actions), registered, or restored. These are recognized by K-NN and the Sequence Matching Technique using SIFT (Lenc et al., 2013) and SURF based feature extraction. Moreover, a weighted fusion mechanism is followed to create a new fused (or hybrid) feature vector aiming at a higher recognition rate. Other than validating the algorithm on the synthesized dataset, the recognition algorithm has been examined on different sub-groups of the two databases (either occluded or illuminated, rotated, and frontal with various expressions) along with the range face images of the original databases, to establish the superiority of the performance in terms of recognition results. In Figure 1, the overall description of the proposed recognition scheme is illustrated. Hence, the contribution of the authors in this paper can be summarized as follows:
• This algorithm can handle pose variations and occlusions. Among the three challenging tasks, illumination is inherently managed by 3D face images (especially by range face images), and the remaining two challenges are addressed throughout this research work;
• The feature vectors generated by the SIFT and SURF detectors are also fused. Two weighted fusion based feature vectors have been created: in the first hybrid feature vector, 60% weight is given to SURF and 40% to SIFT; the second feature vector uses the reverse weighting ratio;
Figure 1. Proposed framework of robust face recognition mechanism
• In addition, the authors have selected two classifiers, namely K-NN and the sequence matching technique, for classification. K-NN is a well-established and popular classifier (Bhatia et al., 2010). On the other hand, the significant role of the weighted selection of SIFT and SURF features in the weighted fused feature vector has also been illustrated. Sequential processing of the feature points by the sequence matching technique exhibits the importance of the key points selected for discrimination.
The rest of the paper is organized as follows. In Section 2, an overview of the related work on both challenges (i.e. pose and occlusion) is given. The proposed method is detailed in Section 3. In Section 4, the experimental results are presented and discussed. The conclusion and future scope of the present work are described in Section 5.
2. RELATED WORK

In this section, the authors review some recent and relevant research activities focused on restoration of occluded regions, especially on the Bosphorus database. In (Bagchi et al., 2014), the authors proposed pose and occlusion invariant 3D face recognition from range images with a success rate of 91.30%; they used ICP based face registration and PCA based face restoration mechanisms, and the restored face images were classified by regular points. Wang et al., 2014 proposed an expression invariant face recognition mechanism that registers small pose variations and handles facial expression with sparse representation; an LDA-based feature is then extracted from the dual-tree complex wavelet transform and classified by a nearest neighbor classifier. They tested their algorithm on the Bosphorus and FRGC databases and achieved 98.86% and 95.03% recognition rates respectively. Li et al., 2014 used principal curvature to localize 3D key points where the curvature values are high, along with the Histogram of Gradient (HOG), the Histogram of Shape index (HOS), and the Histogram of Gradient of Shape index (HOGS). Their reported recognition rates on the Bosphorus database are 96.56%, 98.82%, 91.14%, and 99.21% for the entire database and for the expression, pose, and occlusion subsets respectively. They also tested their algorithm on the FRGC v2.0 database. In (Salahshoor et al., 2012), the authors described a dynamic mask, either elliptical or circular in shape, used to crop around the nose tip; they used 3D eigenfaces, Gabor filters with five scales and eight orientations, and PCA based feature extraction, classified by a nearest neighbor classifier. Their main focus was an expression invariant face recognition algorithm, and the maximum recognition result highlighted is 85.36%. In (Maes et al., 2010), the authors claimed a 97.7% correct identification rate from frontal face images and 93.7% for all images of the Bosphorus database; they used RANSAC for pose normalization and SIFT (meshSIFT) for feature extraction.

In the case of the Frav3D face database, Ganguly et al., 2015a proposed a wavelet and decision fusion based face recognition system with a 96.25% recognition rate. There, the authors fused the sub-images coming from the Haar wavelet transform, and the recognition outcomes of various classifiers (ANN with different parameter setups and K-NN) were fused using a majority voting technique; finally, the Wilcoxon signed-rank test was applied. Bornak et al., 2010 also proposed an expression invariant method with a 95.8% recognition rate on the Frav3D database, focusing only on the nose region. Hajati, F., (2010) proposed a Geodesic Texture Warping mechanism for pose invariant 3D face recognition using 2.5D range face images; applied to the Frav3D database, it achieved a 90.3% recognition rate. Belghini et al. (2012) applied Gaussian Hermite Moments (GHM) on
global as well as local depth information. A back-propagation neural network classified the GHM features, and a maximum recognition rate of 95% was achieved.
3. PROPOSED MECHANISM

The proposed mechanism involves a number of modules: creation of range face images from 3D images, face registration, occlusion removal and restoration, SIFT and SURF based feature extraction (and their hybridization), and K-NN and sequence matching technique based recognition. These modules can be categorized into four groups, namely: (1) Input 3D Face Images, (2) Face Image Normalization, (3) Feature Estimation, and (4) Recognition. In Figure 2, the relation and data flow among these modules are depicted. As described in the introduction, the authors' contribution relies on two of these modules: face image normalization and feature estimation. The creation of a 2.5D range face image from an input 3D face image, and its effect, has already been established by Ganguly et al., 2014a; Ganguly et al., 2015b. The range image captures the depth values of the input 3D image over the X-Y plane; in the range image, these values are scaled into the range 0 to 255, giving 256 depth levels, which describe more surface features than intensity data (a small sketch of this depth-to-range conversion appears after the list below). Again, K-NN (Hart, 1968) and ANN are traditional machine learning (Hassanien et al., 2014) tools often used as supervised classifiers, and SIFT and SURF are popular for estimating local features. Hence, the contribution in these two modules plays several significant roles, such as:
• Any unconstrained range face image, either rotated or occluded, can be input to the system. Using the proposed mechanism, these images will be normalized into frontal or near frontal range face images;
• Due to the face normalization process, the recognition rate also increases compared to that of unregistered and occluded face images;
• The authors (Ganguly et al., 2014b) have claimed a property of range face images: the maximum depth value is always held by the pronasal (or nose tip), which is considered the closest point to the scanner. After successful face restoration (as well as registration) (Araki, T. et al., 2015), this property, which had been suppressed by occlusion (and rotation), is again preserved. Among the four occlusion variations in the Bosphorus database (Savran et al., 2008), the faces occluded by glasses are not detected by the authors; however, the remaining three types of occluded faces and their restorations preserve this property. This phenomenon is shown in Figure 3 with a randomly selected subject's face that is occluded by hair;
Figure 2. Explanation of the proposed methodology
Figure 3. Maximum depth value is preserved by pronasal
• This research work is concerned with the robustness of face recognition in the unconstrained environment. Therefore, other than face image normalization (i.e. pose registration and restoration), the range face images have not been further processed by any smoothing technique. Due to scanning variations, there may be holes, spikes, and other artifacts on the face images, and these have been retained during this investigation phase. The effects of various linear and non-linear filters on range face images have already been discussed in (Ganguly et al., 2015d);
• Feature estimation is done individually with SIFT and SURF, implemented on the Shape Index (SI) (Cantzler et al., 2001) derived from 3D face images.
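The depth-to-range conversion described above can be sketched as follows. This is a minimal illustration, assuming the scan is available as an N x 3 NumPy array of (X, Y, Z) points; the grid size and the keep-the-closest-point policy per cell are assumptions of this sketch rather than the authors' exact procedure.

```python
import numpy as np

def point_cloud_to_range_image(points, grid_shape=(100, 100)):
    """Project a 3D point cloud (N x 3: X, Y, Z) onto the X-Y plane and
    scale the retained depth (Z) values to 0-255, i.e. a 2.5D range image."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]

    # Map X and Y coordinates onto integer grid cells.
    span_x = float(x.max() - x.min()) or 1.0
    span_y = float(y.max() - y.min()) or 1.0
    cols = ((x - x.min()) / span_x * (grid_shape[1] - 1)).astype(int)
    rows = ((y - y.min()) / span_y * (grid_shape[0] - 1)).astype(int)

    # Keep the largest Z per cell, i.e. the point closest to the scanner.
    range_img = np.zeros(grid_shape, dtype=float)
    np.maximum.at(range_img, (rows, cols), z - z.min())

    # Scale depth to the 256 levels described in the text.
    if range_img.max() > 0:
        range_img *= 255.0 / range_img.max()
    return range_img.astype(np.uint8)
```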
3.1. Face Image Normalization

In this work, face normalization is the process by which unregistered and occluded input faces are transformed into frontal or near frontal images. Hence, the recognition process becomes more efficient in terms of recognition rate. This section is divided into two sub-sections: in the first, the rotated faces are registered by the proposed ERFI model; in the second, the authors describe various approaches to restoring the occluded face region(s).
3.1.1. Pose Registration

Pose registration is crucial for face recognition. Due to pose variations, some facial regions are suppressed, which is a disadvantage for robust recognition. In the case of
2D face images, having only intensity values makes it impossible to register the face images; in 3D images, however, the availability of depth data, i.e. Z data in the X-Y plane, is used to rotate them. In the ERFI (Energy Range Face Image) (Ganguly et al., 2014b) model based face registration mechanism, the authors rotate the unregistered face image using a corresponding-point based technique. As the name suggests, the ERFI model is an average face model that preserves the overall depth data from all available frontal range face images of the subject whose face is to be registered. Besides face registration, this face model can also be used for face restoration. Frontal face images with facial expressions are also included in the ERFI model; thus, face deformation due to expression is averaged out (or minimized), and one crucial facial key-point, the 'pronasal', remains unaltered, making the model invariant to facial expression as well. This property has been utilized to register face images by corresponding points (i.e. the nose tip or 'pronasal'). A detailed description of this model based face registration on the Frav3D database has already been given by the authors (Ganguly et al., 2014b); this article is an extension of that investigation. The same approach has also been applied to the Bosphorus database, achieving a registration accuracy of 75.23%. A detailed statistical analysis of the ERFI model based registration is given in Table 4.
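As a rough illustration of the two ERFI operations described above, the sketch below builds the average model from a subject's frontal range images and aligns a probe face by translating its maximum-depth pixel (the pronasal) onto the model's. The pure-translation alignment is a simplification of this sketch: the actual registration also has to handle rotations along yaw, pitch, and roll, which are omitted here.

```python
import numpy as np

def build_erfi_model(frontal_range_images):
    """ERFI sketch: the model is the average depth of all frontal range
    images of one subject (equal-sized float arrays assumed)."""
    return np.mean(np.stack(frontal_range_images), axis=0)

def register_by_pronasal(face, erfi_model):
    """Translate the face so that its pronasal (maximum-depth pixel,
    the closest point to the scanner) matches the model's pronasal.
    np.roll wraps around the border, which this sketch ignores."""
    fy, fx = np.unravel_index(np.argmax(face), face.shape)
    my, mx = np.unravel_index(np.argmax(erfi_model), erfi_model.shape)
    return np.roll(face, shift=(my - fy, mx - fx), axis=(0, 1))
```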
3.1.2. Face Restoration

Face restoration is also a complex task, involving several sequential processes, namely: occlusion detection, occlusion localization, occlusion removal, and restoration of the missing parts. In (Ganguly et al., 2015c), the authors have already proposed a depth based occlusion detection and localization methodology. After this algorithm localizes the occluded region on the range face image, the corresponding region is excluded from the original image, and the remaining region is processed further for occlusion restoration and recognition. In Figure 4, the excluded regions of occluded faces are presented for randomly selected subjects from the Bosphorus database.

Figure 4. Removal of occluded regions from range face images

Now, the depth values must be restored before the faces can be recognized. To this end, the authors have considered two reference face images (or models) and the GPCA method; the significance of these techniques is discussed in Section 4 in terms of restoration accuracy. After the successful exclusion of the occluded region, it is required to restore back the original
depth values in those regions. However, restoration of depth values is an important as well as challenging task. During this investigation, aiming at a robust recognition algorithm, the authors have considered three different approaches, namely: the ERFI model, eigenfaces, and GPCA. These approaches are discussed in the following sub-sections.
3.1.2.1. Using ERFI Model

As discussed earlier, the ERFI model preserves the average depth data of a set of range images. Hence, restoring the missing values with the ERFI model produces average depth data in the occluded region(s), which should be comparable to non-occluded face images. In Figure 5(b), the restored images using the ERFI model (shown in Figure 5(a)) are displayed.
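A minimal sketch of this restoration step is given below, assuming a boolean mask marking the excluded (occluded) pixels; the function name and mask representation are assumptions of this illustration.

```python
import numpy as np

def restore_with_erfi(occluded_face, occlusion_mask, erfi_model):
    """Fill the excluded region with the average depth values stored in
    the ERFI model; occlusion_mask is True where depth was removed."""
    restored = occluded_face.astype(float).copy()
    restored[occlusion_mask] = erfi_model[occlusion_mask]
    return restored
```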
3.1.2.2. Using Eigenface

The eigenface (Sirovich et al., 1987; Turk et al., 1991) approach is based on eigenvectors when applied to image processing, image analysis, or, more specifically, computer vision. The eigenvectors are derived from the covariance matrix of the face data, and the eigenfaces associated with the different eigenvalues define a core basis set for all the considered face images. This characteristic inspired the authors to employ it during the restoration phase. From the set of eigenvalues, it is known that the eigenvector of the maximum eigenvalue forms a more accurate primary component than the remaining ones; hence, the authors have considered this primary face image, corresponding to the maximum eigenvalue, for reconstructing the occluded region. The restored range face images from this mechanism are shown in Figure 6.
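The paper states only that the primary face image of the maximum eigenvalue is used for reconstruction; the sketch below therefore assumes one conventional reading, in which the visible pixels are projected onto the principal eigenface and the occluded pixels are filled from that projection.

```python
import numpy as np

def principal_eigenface(training_faces):
    """Eigenface of the largest eigenvalue: SVD of the centered,
    vectorized training faces; the first right singular vector is the
    principal eigenface (singular values, hence eigenvalues, decrease)."""
    X = np.stack([f.ravel() for f in training_faces]).astype(float)
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[0]

def restore_with_eigenface(occluded_face, mask, mean_vec, eig_vec):
    """Fill masked pixels from the mean face plus the principal eigenface
    scaled by a least-squares fit on the visible pixels (our reading)."""
    x, m = occluded_face.ravel().astype(float), mask.ravel()
    vis = ~m
    coeff = np.dot((x - mean_vec)[vis], eig_vec[vis]) / (
        np.dot(eig_vec[vis], eig_vec[vis]) + 1e-12)
    x[m] = (mean_vec + coeff * eig_vec)[m]
    return x.reshape(occluded_face.shape)
```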
3.1.2.3. Using GPCA

Gappy PCA, or GPCA (Colombo et al., 2009), is an already well-established technique for restoring occluded regions. Here, the authors have used GPCA to restore the depth values of the occluded regions. It is a variation of PCA that follows Equations 1 to 3:

Figure 5. Restored range face images using ERFI
Figure 6. Restored range face images from eigenface
$$R = \operatorname{abs}\left(M + (V_R \times C_R)\right) \qquad (1)$$

Here, $R$ is the restored image, $M$ is the mean training image, $V_R$ is the reconstructed eigenvectors of the training images, and $C_R$ are the coefficients. The reconstructed eigenvectors and coefficients can be formulated using Equations 2 and 3:

$$V_R = \frac{1}{C} \times V \times \operatorname{abs}(A) \qquad (2)$$

$$C_R = C^{T} \times V_R \qquad (3)$$
where $C$ is the centered image, $V$ the eigenvectors, and $A$ the eigenvalues of the training images. In Figure 7, the restored images of Figure 3(b) obtained using GPCA are presented. Given these various face restoration mechanisms, their restoration performance must be compared to establish the superiority of one restoration algorithm. At this stage, the synthesized dataset is then created using the registration and restoration algorithms with the highest accuracies.
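Equations 1 to 3 compress several steps; the sketch below follows the more conventional gappy-PCA formulation of Colombo et al. (2009), in which the PCA coefficients are estimated from the visible pixels only, and the abs() of Equation 1 keeps the restored depths nonnegative. The matrix layout and component count are assumptions of this sketch.

```python
import numpy as np

def gappy_pca_restore(occluded_face, mask, mean_face, eigenfaces, n_components=20):
    """Gappy-PCA sketch: mask is True where depth is missing; eigenfaces
    is an (n_eigenfaces x n_pixels) matrix of vectorized eigenvectors."""
    x = occluded_face.ravel().astype(float)
    m = mean_face.ravel().astype(float)
    vis = ~mask.ravel()

    V = eigenfaces[:n_components]
    # Least-squares fit of the coefficients on the visible pixels only:
    # minimize || (x - m)_vis - V_vis^T c ||^2   (the 'gappy' step)
    c, *_ = np.linalg.lstsq(V[:, vis].T, (x - m)[vis], rcond=None)

    # Equation 1 style reconstruction: R = abs(M + V_R x C_R).
    restored = np.abs(m + V.T @ c)

    out = x.copy()
    out[~vis] = restored[~vis]      # keep the visible depths untouched
    return out.reshape(occluded_face.shape)
```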
3.2. Feature Estimation

It is needless to say that feature estimation is a crucial stage for recognizing objects (especially face images) correctly. The key attributes for recognizing faces can be estimated either by a holistic approach or by a feature-based approach. Here, the authors have followed the feature-based technique, where SIFT and SURF are applied on the Shape Index (described in Equation 4) face space:

Figure 7. Restored range face images from GPCA
$$SI_R = \frac{-2}{\pi} \arctan\left(\frac{P_{max} + P_{min}}{P_{max} - P_{min}}\right) \qquad (4)$$
where $P_{max}$ and $P_{min}$ are the maximum and minimum principal curvatures. Curvatures of the face image are calculated in the second-order derivative domain, which utilizes the X, Y position of each Z value (i.e. the depth values of the range image representation). Moreover, the principal curvatures (which are perpendicular to each other) are the eigenvalues of the Weingarten matrix. In general, the SI ranges between -1 and 1 and describes various geometric surface classes (explained in Table 1): Spherical Cup, Rut, Saddle Rut, Saddle, Saddle Ridge, Ridge, and Dome. Hence, the SI preserves local characteristics of the face image that prove more significant for SIFT and SURF based feature extraction.

In the first stage of the feature estimation process, the MRoI (Minimum Region of Interest) is extracted. The significance of the MRoI over the RoI is that the processed range image is itself the RoI of the input 3D face image; from this RoI, the Minimum RoI is extracted such that all the extreme landmarks, like the exocanthions (left and right), cheilions (left and right), glabella, etc., are preserved. Figure 8 describes this MRoI extraction using the depth map of a randomly selected subject from the database. The MRoI extraction process has only been implemented on the Frav3D dataset. For the Bosphorus database, only the points of the 3D face image that are not equal to the sentinel value −1000000000.000000 are processed for creating the range face image; points with this value are outliers of the face image (the neck region, etc.) that have been discarded by this empirical selection.

Table 1. Description of SI range

| Class | SI |
|---|---|
| Spherical Cup | [-1, -0.625) |
| Rut | [-0.625, -0.375) |
| Saddle Rut | [-0.375, -0.125) |
| Saddle | [-0.125, 0.125) |
| Saddle Ridge | [0.125, 0.375) |
| Ridge | [0.375, 0.625) |
| Dome | [0.625, 1) |
Figure 8. Extraction of MRoI for feature estimation purpose
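A compact sketch of Equation 4 on a depth map is given below. It approximates the Weingarten matrix by the per-pixel Hessian of the depth map, a common simplification that is exact only for small surface gradients; the sign convention follows the equation as printed above.

```python
import numpy as np

def shape_index(depth):
    """Shape Index (Equation 4) from a 2.5D depth map; output in [-1, 1]."""
    gy, gx = np.gradient(depth.astype(float))
    gxy, gxx = np.gradient(gx)     # second derivatives along rows/cols
    gyy, _ = np.gradient(gy)

    # Per-pixel eigenvalues of the Hessian [[gxx, gxy], [gxy, gyy]],
    # taken here as the principal curvatures P_max and P_min.
    half_tr = (gxx + gyy) / 2.0
    disc = np.sqrt(np.maximum(half_tr ** 2 - (gxx * gyy - gxy ** 2), 0.0))
    p_max, p_min = half_tr + disc, half_tr - disc

    denom = p_max - p_min
    denom[denom == 0] = 1e-9       # flat regions: avoid division by zero
    return (-2.0 / np.pi) * np.arctan((p_max + p_min) / denom)
```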
Therefore, the range face images that have been set up are already MRoIs. Now, on the extracted range face images, the authors have implemented SIFT (Lowe, 2004) and SURF (Bay et al., http://www.vision.ee.ethz.ch/~surf/eccv06.pdf) based feature extraction. Rather than implementing these techniques directly on the range face image, the authors have computed the SI, which preserves a detailed local description of the face image.
3.2.1. Using SIFT

In general, the SIFT feature (Zuchun, 2013) (or descriptor: the key points that describe the input image) is used on gray profiles, where local gradient directions are computed. As shown in Figure 6, range face images have a gray-like profile with normalized depth values between 0 and 255. Here, the authors have implemented SIFT on the SI derived from the range face images. In this mechanism, the key points are extracted at scale-space extrema using the DoG (Difference of Gaussians) operation in a DoG pyramid. In this research work, the different DoG scales are grouped into 3 octaves. The SIFT key points detected on the SI are depicted in Table 2, where a randomly selected subject's neutral range face image is used to illustrate the SIFT descriptor.
3.2.2. Using SURF

The SURF feature is primarily inspired by the SIFT descriptor (Bay et al., 2008) and is also faster. The Hessian matrix is the basis of this descriptor. The SURF feature also preserves the local curve points of face images through second-order partial derivatives; for a point on the range face image, it thus captures the local curvature rather than a continuous curve. Since the second-order derivative is the gradient of the gradient, the SURF feature is in-plane scale and rotation invariant, which inspired the authors to consider these key points for recognition. During the application of the SURF feature, scale-spaces are applied on an image pyramid in which each image is processed; the scale-space is divided into a number of octaves. In this research work, the authors have used an octave size of 3 with a scale level of 4. The SURF key points detected on the SI are depicted in Table 3. Here also, a randomly selected subject's neutral range face image (as shown in Table 2) is used to illustrate the SURF descriptor.
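Both detectors are available in OpenCV, which offers one way to reproduce this step; note that SURF is patented and only shipped in opencv-contrib builds, and the octave parameters below simply echo the values stated in the text.

```python
import cv2
import numpy as np

def extract_keypoints(shape_index_img):
    """Detect SIFT and SURF key points on a Shape Index image."""
    # The detectors expect an 8-bit image; map SI from [-1, 1] to [0, 255].
    img = cv2.normalize(shape_index_img, None, 0, 255,
                        cv2.NORM_MINMAX).astype(np.uint8)

    sift = cv2.SIFT_create()                    # DoG pyramid, cf. Section 3.2.1
    kp_sift, desc_sift = sift.detectAndCompute(img, None)

    # SURF with octave size 3 and scale level 4, as stated in the text.
    surf = cv2.xfeatures2d.SURF_create(nOctaves=3, nOctaveLayers=4)
    kp_surf, desc_surf = surf.detectAndCompute(img, None)
    return (kp_sift, desc_sift), (kp_surf, desc_surf)
```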
3.2.3. Using Weighted Fusion

Individual experiments using SIFT and SURF on various sets of range face images exhibit the efficiency of the algorithm in terms of recognition rate. Hence, the authors have also explored further feature-level computation using a 'weighted fusion' technique. In the weighted fusion approach, the authors implement a ratio-based mechanism with 40-60 (and 60-40) weightage for the SIFT-SURF combination; neither a plain SURF-SIFT nor a plain SIFT-SURF combination provides any significant improvement in recognition rate. In this mechanism, 40% (or 60%) of the SIFT key points' magnitude values and 60% (or 40%) of the SURF key points' magnitude values are added to form the new weighted feature vectors; a small sketch follows below. The visualization of the weighted fusion is given in Figure 9. Hence, the observations from four different feature vectors (SIFT, SURF, weighted fusion 40-60, and weighted fusion 60-40) on five different sub-datasets (synthesized, frontal, rotated, either occluded or illuminated, and the original databases) exhibit the robustness of the recognition algorithm. The success rates of these various experiments are summarized in Table 7 of Section 4.
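A minimal sketch of the weighted fusion is shown below, assuming the SIFT and SURF magnitude vectors have already been truncated to the same length (the paper's top-80 selection); the helper name is an assumption of this illustration.

```python
import numpy as np

def weighted_fusion(sift_vec, surf_vec, w_sift=0.4, w_surf=0.6):
    """Combine SIFT and SURF magnitude values in a 40-60 ratio; swapping
    the weights (0.6 / 0.4) yields the second hybrid feature vector."""
    n = min(len(sift_vec), len(surf_vec))
    return w_sift * np.asarray(sift_vec[:n]) + w_surf * np.asarray(surf_vec[:n])

# The reverse-ratio vector described in the text:
# fused_60_40 = weighted_fusion(sift_vec, surf_vec, w_sift=0.6, w_surf=0.4)
```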
Table 2. Detected key points from SIFT descriptor
3.2.4. Recognition

In this stage, the feature vectors are classified by K-NN and the sequence matching technique. A feature vector accumulating the top 80 key points from each feature extraction mechanism is used for this investigation; this choice of feature vector is empirical. Before classifying the face images based on features, single-fold cross-validation is set up by grouping the samples into two sets, namely a training set (denoted Tr) and a testing set (denoted Ts). 'Tr' contains the feature vectors of the odd-numbered face images, whereas the feature vectors of the even-numbered images are included in 'Ts'. K-NN (often as useful as SVM) (Parveen, 2006) is a well-established, classical supervised learning method used here for recognition. During this research work, the Euclidean distance is used in K-NN. This metric is the simplest form of the Pythagorean formula and hence is well suited to the classification task in feature space. A minimal sketch of this classifier appears below.
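The following K-NN sketch uses the Euclidean metric described above; the array layout is an assumption of this illustration.

```python
import numpy as np

def knn_classify(train_feats, train_labels, test_feat, k=1):
    """train_feats: (n_samples x n_features); test_feat: (n_features,)."""
    # Euclidean distance: the 'Pythagorean' metric mentioned in the text.
    dists = np.sqrt(((train_feats - test_feat) ** 2).sum(axis=1))
    nearest = np.argsort(dists)[:k]
    labels, counts = np.unique(np.asarray(train_labels)[nearest],
                               return_counts=True)
    return labels[np.argmax(counts)]   # majority vote among the k neighbors
```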
Table 3. Detected key points from SURF descriptor
Figure 9. Weighted fusion technique
Along with this, the authors have also used a novel classifier, the sequence matching technique (Dezhong et al., 2008), for the classification task. During classification using the sequence matching method, the comparison function is modified as shown in Equation 5:
$$f\left(T_r(i), T_s(j)\right) = \operatorname{Max}\left(\max(T_r), \max(T_s)\right) - \operatorname{abs}\left(T_r(i) - T_s(j)\right) \qquad (5)$$
and:
$$S(T_r, T_s) = f\left(T_r(i), T_s(j)\right) \qquad (6)$$
where i = 1 to n (the length of 'Tr') and j = 1 to m (the length of 'Ts'). In each iteration of Ts (j = 1 : m), the sequence is matched against Tr (i = 1 : n). The training subject whose feature vector yields the maximum number of zeros (or values less than 1.056) of f against the testing feature vector, with a known class label, is recognized as the member of that class. Hence, the unknown face is recognized. A literal sketch of this comparison appears below.
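This sketch follows Equations 5 and 6 and the paper's matching criterion literally: it counts the comparisons whose f-value falls below the authors' empirical 1.056 tolerance, and the caller picks the training subject with the highest count. The vectorized layout is an assumption of this illustration.

```python
import numpy as np

def sequence_match_count(tr, ts, tol=1.056):
    """Count near-zero values of f(Tr(i), Ts(j)) over all (i, j) pairs."""
    tr, ts = np.asarray(tr, float), np.asarray(ts, float)
    ceiling = max(tr.max(), ts.max())
    # Equation 5: f = Max(max(Tr), max(Ts)) - abs(Tr(i) - Ts(j))
    f = ceiling - np.abs(tr[None, :] - ts[:, None])
    return int((f < tol).sum())

# Caller: the training subject with the maximum count is the match, e.g.
# best = max(gallery.items(), key=lambda kv: sequence_match_count(kv[1], probe))
```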
4. DISCUSSION OF EXPERIMENTAL RESULT

Here, the authors describe a detailed analysis of the experiments on various aspects of face recognition. Different approaches to building suitable feature vectors for different subsets of range face images are evaluated on two traditional and well-established datasets, namely Frav3D and Bosphorus.
4.1. Database Description

The choice of databases plays another important role in establishing the algorithms. To meet the goal of this research work, proposing a novel face recognition algorithm invariant to pose, expression, illumination, and occlusion, these two popular and well-accepted databases have been selected. The Bosphorus database contains approximately 14 3D face images per subject that are rotated along yaw, pitch, and roll. Other than pose variations, there are approximately 34 face images per subject with various facial actions (Savran et al., 2008). Facial actions include expressions of the lower and upper facial units involving facial properties such as the nose, eyebrows, lips, eyes, jaw, cheeks, and so on. In addition, this database contains four different types of occluded face images. In Figure 10, these variations are presented using 2D image representations. Furthermore, the Frav3D database (Conde et al., 2006) contains face images with expression, pose, and illumination variations. Although 3D face images are free from illumination variations, illumination has been considered so that all possible challenges are covered. In Figure 11, the variations of the Frav3D database are likewise shown in 2D representations.
4.2. Registration and Restoration Accuracy

The registration accuracy has already been described by the authors in (Ganguly et al., 2014b) in terms of the Manhattan distance between the positions of the pronasal landmark in the reference face image (i.e. the ERFI model) and in the registered image. Other than this technique, the authors have computed the total error (TE) and the RMS (root-mean-square) error between the registered and reference face images. The accuracy of the registration is measured in terms of the error made in this process.
Figure 10. Description of Bosphorus database
Figure 11. Description of the Frav3D database
In Table 4, the registration accuracies from these techniques are summarized. The equations for the total error and RMS error are shown in Equations 7 and 8, respectively:

$$TE = \sum_{i=1}^{x}\sum_{j=1}^{y} \left| R_g(i,j) - R_r(i,j) \right| \qquad (7)$$
where TE is the total error, $R_g$ denotes the registered image, $R_r$ is the reference image, and x and y are the numbers of rows and columns.
Table 4. Registration accuracy from ERFI model

| Database | Total Error | RMS Error |
|---|---|---|
| Frav3D | 1.6684 | 13.3401 |
| Bosphorus | 3.8667 | 53.0167 |
$$RMS = \sqrt{\frac{1}{x \times y} \sum_{i=1}^{x}\sum_{j=1}^{y} \left( R_g(i,j) - R_r(i,j) \right)^2} \qquad (8)$$
Other than the error-distance measurement between the landmarks, TE and RMS are used to determine the average deviation of depth values between the registered image and the ERFI model. This deviation emphasizes the accuracy of the re-positioned depth values with respect to the model on the 2D grid. In the case of restoration, the accuracy is also noted in terms of error, i.e. the difference between the restored image and a reference image, computed via entropy (Shannon, 1951), shown in Equation 9; the resulting face restoration error is given in Equation 10, and the restoration errors from the various approaches are summarized in Table 5. Entropy is calculated to capture the overall information present in the image. Instead of entropy, a simple difference map might seem sufficient for measuring the error; however, because the different restoration methods involve different mathematical computations, the restored values are never exactly the same as the reference model, so a simple difference map fails. Finally, an entropy-based analysis is carried out based on the probability of occurrence of particular depth values in the ERFI model and the restored image:

$$E = -\sum p \times \log_2(p) \qquad (9)$$

where E is the entropy and p is the probability of a particular depth value.
$$RE = E_M - E_{Rs} \qquad (10)$$
where RE is the restoration error, and $E_M$ and $E_{Rs}$ represent the entropy of the model and of the restored image, respectively. From Table 5, it is evident that the ERFI model based face restoration has the minimum restoration error. Therefore, the synthesized dataset consists of ERFI model based registered and restored face images. Other than this quantitative analysis, the authors have also carried out a qualitative analysis.

Table 5. Summarization of accuracies of various occlusion restoration techniques

| Method | Restoration Error | Reference Image |
|---|---|---|
| ERFI | 0.01876 | ERFI model |
| Eigenface | 1.667 | Eigenvector of maximum eigenvalue |
| GPCA | 0.0256 | Neutral image |
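The quantitative measures of Equations 7 to 10 are simple to restate in code; the sketch below assumes 8-bit range images with 256 depth levels.

```python
import numpy as np

def total_error(reg, ref):
    """Equation 7: summed deviation between registered and reference images."""
    return np.abs(reg.astype(float) - ref.astype(float)).sum()

def rms_error(reg, ref):
    """Equation 8: root-mean-square deviation over the x*y grid."""
    d = reg.astype(float) - ref.astype(float)
    return np.sqrt((d ** 2).mean())

def depth_entropy(img):
    """Equation 9: Shannon entropy over the 256 depth levels."""
    hist = np.bincount(img.astype(np.uint8).ravel(), minlength=256)
    p = hist[hist > 0] / hist.sum()
    return -(p * np.log2(p)).sum()

def restoration_error(model, restored):
    """Equation 10: entropy difference between model and restored image."""
    return depth_entropy(model) - depth_entropy(restored)
```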
This qualitative analysis assesses how closely the registered and restored face images resemble the reference face image according to human perception. Table 6 describes the rating scale considered for this analysis. On this scale, the ERFI model based face registration and restoration achieves a 'Fine' rating, face restoration using the eigenface belongs to the 'Marginal' rating, and GPCA also achieves a 'Fine' rating.
4.3. Recognition Accuracy

Recognition on the synthesized dataset, which consists of face images registered and restored by the ERFI model, is used to measure the success rate of the algorithm. Moreover, the authors have sub-divided the databases into several sets, as shown in Figures 10 and 11 for the Bosphorus and Frav3D databases respectively. Furthermore, the authors have considered the original datasets for measuring overall face recognition performance from 3D face images in the presence of occlusion, pose, illumination, and expression variations. In Table 7, the authors have compiled this variety of experiments. Recognizing faces under a number of challenges as well as on the synthesized dataset demonstrates the robustness of the algorithm. All these investigations have been repeated for the two databases, with the two classifiers applied to the four feature sets. The numerical values in the table represent the recognition rates of the different combinations described earlier.
Table 6. Rating scale for qualitative measurement

| Value | Rating | Description |
|---|---|---|
| 1 | Excellent | The processed image is the same as the reference image |
| 2 | Fine | The processed image is close to the original image |
| 3 | Marginal | The processed image is acceptable |
| 4 | Poor | The restoration is not very effective |
| 5 | Unusable | Nothing is restored |

5. CONCLUSION AND FUTURE SCOPE

Here, the authors have computed the Shape Index (SI) from range face images to accumulate local shape deformation from the depth values; more detailed information is then obtained using SIFT and SURF feature extraction. Furthermore, the proposed algorithm has been validated on two databases that contain most of the challenging issues, such as pose, expression, occlusion, and illumination. Moreover, a synthesized face dataset has been created during the investigation process, and the proposed mechanism has been tested on different sub-groups. Implementing the sequence matching technique also highlights the significance of the feature selection. Individual analysis of face registration and restoration likewise highlights their impact on robust face recognition. In this correspondence, it has been observed that, although feature selection with the SIFT operator exhibits better performance than SURF, the weighted fusion of these two local features is the best among the three (SIFT, SURF, and the fused one). Hence, it can be concluded that the weighted fusion vector based face recognition scheme yields promising recognition rates on the various sub-categorized datasets, i.e. face images with expression, pose, illumination, as
Table 7. Analysis of recognition rates (%) from the proposed mechanism

Frav3D Database

| Dataset | Classifier | Weighted Fusion 40-60 Vector | Weighted Fusion 60-40 Vector | SIFT | SURF |
|---|---|---|---|---|---|
| Synthesized | K-NN | 98.57 | 98.02 | 97.71 | 96.17 |
| Illuminated | K-NN | 99.17 | 99.14 | 97.01 | 97.1 |
| Rotated | K-NN | 92.22 | 91.0 | 90.24 | 90.01 |
| Original | K-NN | 90.16 | 89.55 | 90.22 | 89.21 |
| Frontal | K-NN | 96.34 | 97.43 | 96.44 | 96.34 |
| Synthesized | Sequence Matching Technique | 96.21 | 96.21 | 96.21 | 94.89 |
| Illuminated | Sequence Matching Technique | 95.89 | 95.80 | 94.19 | 92.9 |
| Rotated | Sequence Matching Technique | 89.29 | 88.21 | 86.87 | 84.79 |
| Original | Sequence Matching Technique | 87.33 | 87.33 | 88.14 | 85.1 |
| Frontal | Sequence Matching Technique | 94.57 | 94.17 | 93.98 | 90.08 |

Bosphorus Database

| Dataset | Classifier | Weighted Fusion 40-60 Vector | Weighted Fusion 60-40 Vector | SIFT | SURF |
|---|---|---|---|---|---|
| Synthesized | K-NN | 98.78 | 98.81 | 96.55 | 95.45 |
| Occluded | K-NN | 82.66 | 81.66 | 79.54 | 78.94 |
| Original | K-NN | 96.01 | 96.42 | 92.49 | 91.31 |
| Rotated | K-NN | 80.60 | 80.05 | 77.09 | 76.01 |
| Frontal | K-NN | 98.88 | 98.8 | 96 | 97.33 |
| Synthesized | Sequence Matching Technique | 96.01 | 95.8 | 97.07 | 96.17 |
| Occluded | Sequence Matching Technique | 79.9 | 80.5 | 75.87 | 74.87 |
| Original | Sequence Matching Technique | 85.04 | 84 | 83.11 | 84.65 |
| Rotated | Sequence Matching Technique | 78.14 | 77.49 | 75.33 | 75.3 |
| Frontal | Sequence Matching Technique | 95.7 | 96.89 | 94.57 | 94.79 |
well as occlusion, from the two databases. However, occluded faces with glasses are not correctly detected and restored. In some cases, the ERFI model based face registration mechanism also fails to register faces rotated to extreme poses. In the future, the authors aim to overcome these issues by proposing a more robust and accurate algorithm and implementing a real-time face recognition system.
ACKNOWLEDGMENT

The authors are thankful to the project supported by DeitY (Letter No.: 12(12)/2012-ESD), MCIT, Govt. of India, at the Department of Computer Science and Engineering, Jadavpur University, India, for providing the necessary infrastructure for this work.
REFERENCES

Araki, T., Ikeda, N., Dey, N., Chakraborty, S., Saba, L., Kumar, D., & Suri, J. S. et al. (2015). A comparative approach to four different image registration techniques for quantitative assessment of coronary artery calcium lesions using intravascular ultrasound. Computer Methods and Programs in Biomedicine, 118(2), 158–172. doi:10.1016/j.cmpb.2014.11.006 PMID:25523233

Bagchi, P., Bhattacharjee, D., & Nasipuri, M. (2014). Robust 3D Face Recognition In Presence of Pose And Partial Occlusions or Missing Parts. International Journal in Foundations of Computer Science & Technology (IJFCST), 4(4), 21–35. doi:10.5121/ijfcst.2014.4402

Bay, H., Ess, A., Tuytelaars, T., & Van Gool, L. (2008). Speeded Up Robust Features. Computer Vision and Image Understanding, 110(3), 346–359. doi:10.1016/j.cviu.2007.09.014

Bhatia, N., & Vandana. (2010). Survey of Nearest Neighbor Techniques. IJCSIS, 8(2), 302–305.

Bornak, B., Rafiei, S., Sarikhani, A., & Babaei, A. (2010). 3D Face Recognition by Used Region-Based with Facial Expression Variation. In 2nd International Conference on Signal Processing Systems (ICSPS), pp. V1-710–V1-713. doi:10.1109/ICSPS.2010.5555402

Cantzler, H., & Fisher, R. B. (2001). Comparison of HK and SC curvature description methods. In Proceedings of the Third International Conference on 3-D Digital Imaging and Modeling, pp. 285–291. doi:10.1109/IM.2001.924458

Colombo, A., Cusano, C., & Schettini, R. (2009). Gappy PCA Classification for Occlusion Tolerant 3D Face Detection. Journal of Mathematical Imaging and Vision, 35(3), 193–207. doi:10.1007/s10851-009-0165-y

Conde, C., Serrano, A., & Cabello, E. (2006). Multimodal 2D, 2.5D & 3D Face Verification. In IEEE International Conference on Image Processing, pp. 2061–2064. doi:10.1109/ICIP.2006.312863

Dezhong, Z., & Fayi, C. (2008). Face Recognition based on Wavelet Transform and Image Comparison. In International Symposium on Computational Intelligence and Design, pp. 24-2. doi:10.1109/ISCID.2008.42

Ganguly, S., Bhattacharjee, D., & Nasipuri, M. (2014a). 2.5D Face Images: Acquisition, Processing, and Application. In Computer Networks and Security, pp. 36–44, International Conference on Communication and Computing (ICC 2014). ISBN: 9789351072447.

Ganguly, S., Bhattacharjee, D., & Nasipuri, M. (2014b). Range Face Image Registration Using ERFI from 3D Images. In Proceedings of the 3rd International Conference on Frontiers of Intelligent Computing: Theory and Applications (FICTA), Advances in Intelligent and Soft Computing, pp. 323–333. doi:10.1007/978-3-319-12012-6_36

Ganguly, S., Bhattacharjee, D., & Nasipuri, M. (2015a). Wavelet and Decision Fusion Based 3D Face Recognition from Range Image. International Journal of Applied Pattern Recognition. (in press)
Ganguly, S., Bhattacharjee, D., & Nasipuri, M. (2015b). 3D Image Acquisition and Analysis of Range Face Images for Registration and Recognition. In Emerging Perspectives in Intelligent Pattern Recognition, Analysis, and Image Processing. IGI Global. (in press)

Ganguly, S., Bhattacharjee, D., & Nasipuri, M. (2015c). Depth based Occlusion Detection and Localization from 3D Face Image. International Journal of Image, Graphics and Signal Processing, 7(5), 20–31. MECS.

Ganguly, S., Bhattacharjee, D., & Nasipuri, M. (2015d). Automatic Analysis of Smoothing Techniques by Simulation Model Based Real-Time System for Processing 3D Human Faces. International Journal of Embedded Systems and Applications, 4(4), 13–23. doi:10.5121/ijesa.2014.4402

Hajati, F., Raie, A. A., & Gao, Y. (2010). Pose-Invariant 2.5D Face Recognition using Geodesic Texture Warping. In 11th International Conference on Control, Automation, Robotics and Vision, pp. 1837–1841. doi:10.1109/ICARCV.2010.5707848

Hart, P. E. (1968). The Condensed Nearest Neighbor Rule. IEEE Transactions on Information Theory, 14(3), 515–516. doi:10.1109/TIT.1968.1054155

Hassanien, A. E., Tolba, M., & Azar, A. T. (2014). Advanced Machine Learning Technologies and Applications: Second International Conference, AMLTA 2014, Cairo, Egypt, November 28-30, 2014, Proceedings. Communications in Computer and Information Science, Vol. 488. Springer-Verlag, Berlin/Heidelberg. ISBN: 978-3-319-13460-4. doi:10.1007/978-3-319-13461-1

Lenc, L., & Král, P. (2013). A Combined SIFT/SURF Descriptor for Automatic Face Recognition. In Sixth International Conference on Machine Vision (ICMV 2013), pp. 90672C-1–90672C-6. doi:10.1117/12.2052804

Li, H., Huang, D., Morvan, J. M., Wang, Y., & Chen, L. (2014). Towards 3D Face Recognition in the Real: A Registration-Free Approach Using Fine-Grained Matching of 3D Keypoint Descriptors. International Journal of Computer Vision, pp. 1–15. doi:10.1007/s11263-014-0785-6

Lowe, D. G. (2004). Distinctive Image Features from Scale-Invariant Keypoints. International Journal of Computer Vision, 60(2), 91–110. doi:10.1023/B:VISI.0000029664.99615.94

Maes, C., Fabry, T., Keustermans, J., Smeets, D., Suetens, P., & Vandermeulen, D. (2010). Feature Detection on 3D Face Surfaces for Pose Normalisation and Recognition. In Fourth IEEE International Conference on Biometrics: Theory, Applications and Systems (BTAS), pp. 1–6. doi:10.1109/BTAS.2010.5634543

Nandi, S., Roy, S., Dansana, J., Karaa, W. B. A., Ray, R., Chowdhury, S. R., & Dey, N. (2014). Cellular Automata based Encrypted ECGhash Code Generation: An Application in Inter human Biometric Authentication System. I.J. Computer Network and Information Security, 6(11), 1–12.

Parveen, P. (2006). Face Recognition Using Multiple Classifiers. In 18th IEEE International Conference on Tools with Artificial Intelligence, pp. 179–186. doi:10.1109/ICTAI.2006.59

Pears, N., Liu, Y., & Bunting, P. 3D Imaging, Analysis and Applications, pp. 347–348.

Prescribed Weingarten Curvature Equations. URL: http://www.math.zju.edu.cn/swm/STW%28Weingarten101208%29.pdf

Salahshoor, S., & Faez, K. (2012). 3D Face Recognition Using an Expression Insensitive Dynamic Mask. In ICISP, LNCS 7340, 253–260.

Savran, A., Alyüz, N., Dibeklioğlu, H., Çeliktutan, O., & Gökberk, B. (2008). Bosphorus Database for 3D Face Analysis. In BIOID 2008, LNCS 5372, pp. 47–56.

Scheenstra, A., Ruifrok, A., & Veltkamp, R. C. (2005). A Survey of 3D Face Recognition Methods. In Audio- and Video-Based Biometric Person Authentication, Lecture Notes in Computer Science, Vol. 3546, pp. 891–899.
Shannon, C. E. (1951). Prediction and Entropy of Printed English. The Bell System Technical Journal, 30(1), 50–64. doi:10.1002/j.1538-7305.1951.tb01366.x

Sirovich, L., & Kirby, M. (1987). Low-dimensional procedure for the characterization of human faces. Journal of the Optical Society of America A, 4(3), 519–524. doi:10.1364/JOSAA.4.000519 PMID:3572578

Turk, M., & Pentland, A. (1991). Eigenfaces for recognition. Journal of Cognitive Neuroscience, 3(1), 71–86. doi:10.1162/jocn.1991.3.1.71 PMID:23964806

Wang, X., Ruan, Q., Jin, Y., & An, G. (2014). Three-dimensional face recognition under expression variation. EURASIP Journal on Image and Video Processing, pp. 1–11. doi:10.1186/1687-5281-2014-51

Weingarten matrix. Retrieved on 12 Feb, 2015 from http://www.math.zju.edu.cn/swm/STW%28Weingarten101208%29.pdf

Zuchun, D. (2013). An Effective Keypoint Selection Algorithm in SIFT. International Journal of Signal Processing, Image Processing and Pattern Recognition, 6(2), 155–164.
Suranjan Ganguly received the MTech (Computer Technology) degree from Jadavpur University, India, in 2014, and the BTech (Information Technology) degree in 2011. His research interests include image processing and pattern recognition. He was a project fellow on a UGC (Govt. of India) sponsored major research project at Jadavpur University. Currently, he is a project fellow on a DeitY (Govt. of India, MCIT) funded research project at Jadavpur University.

Debotosh Bhattacharjee received the MCSE and PhD (Eng.) degrees from Jadavpur University, India, in 1997 and 2004 respectively. He was associated with different institutes in various capacities until March 2007, after which he joined his alma mater, Jadavpur University. His research interests pertain to the applications of computational intelligence techniques like fuzzy logic, artificial neural networks, genetic algorithms, and rough set theory in face recognition, OCR, and information security. He is a life member of the Indian Society for Technical Education (ISTE, New Delhi) and the Indian Unit for Pattern Recognition and Artificial Intelligence (IUPRAI), and a senior member of the IEEE (USA).

Mita Nasipuri received her BETelE, METelE, and PhD (Engg.) degrees from Jadavpur University, in 1979, 1981 and 1990, respectively. Prof. Nasipuri has been a faculty member of J.U. since 1987. Her current research interests include image processing, pattern recognition, and multimedia systems. She is a senior member of the IEEE, U.S.A., Fellow of I.E. (India) and W.B.A.S.T., Kolkata, India.