ARTICLE International Journal of Advanced Robotic Systems
Face Recognition Algorithm for Photographs and Viewed Sketch Matching Using Score-Level Fusion Regular Paper
So Ra Cho, Gi Pyo Nam and Kang Ryoung Park* Division of Electronics and Electrical Engineering, Dongguk University, 26, Pil-dong 3-ga, Chung-gu, Seoul 100-715, Republic of Korea * Corresponding author E-mail:
[email protected]
Received 18 Jun 2012; Accepted 10 Jul 2012 DOI: 10.5772/51470 © 2012 Cho et al.; licensee InTech. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Abstract For criminal searches, the necessity of matching photographs with sketches is increasing. Previously, matching was performed manually by a human observer, a time-consuming process whose accuracy can be affected by the level of human expertise. Therefore, we propose a new face recognition algorithm for photographs and sketches. This research is novel in the following three ways. First, to overcome the decrease in matching accuracy due to pose and illumination variation, we use eye alignment and retinex filtering to normalize pose, size and illumination. Second, we compare the performance of various face recognition methods, such as principal component analysis (PCA), local binary pattern (LBP), local non-negative matrix factorization (LNMF), support vector machine-discriminant analysis (SVM-DA) and modified census transform (MCT), for the matching of photographs and viewed sketches. Third, these five face recognition methods are combined on the basis of score-level fusion to enhance matching accuracy, thereby overcoming the performance limitations of single face recognition methods. Experimental results using a CUHK dataset showed that the accuracy of the proposed method is better than that of uni-modal face recognition methods.
Keywords Face Recognition, Photographs, Viewed Sketches.
1. Introduction

Recently, face recognition systems have been used in many fields, including user identification for access control, border control, computer and mobile devices, and information security. For criminal searches, the necessity of matching photographs with sketches is increasing. Previously, matching was performed manually by a human observer, a time-consuming process whose accuracy can be affected by the level of human expertise. Accordingly, automatic searching for matches of photographs against sketches is important. Therefore, we propose a new face recognition algorithm for photographs and sketches. Photographs and sketches have many differences, such as shadow, texture and shape. As a result, it is very difficult to match these heterogeneous face images with high accuracy. To overcome these problems, numerous
methods of matching viewed sketches have been studied previously. Tang et al. proposed a method of transforming photo images into sketches, thereby reducing the difference between these two types of images and enhancing matching accuracy [1]. Liu et al. proposed a face recognition system based on face sketches. They generated pseudo-sketches that locally preserve the geometry between photo and sketch images using locally linear embedding; for recognition, kernel-based nonlinear discriminant analysis was used [2]. Tang et al. synthesized face sketches from photos using eigentransformation and matched them using Bayesian classifiers [3]. Wang et al. proposed a face photo-sketch synthesis and recognition process using a multi-scale Markov random fields (MRF) model [4]. Most of the previous work used single face recognition methods for matching photos and sketches; however, the resulting performance enhancement is limited owing to the heterogeneous characteristics of these two types of face images.

To overcome these problems, we propose a new face recognition algorithm for photographs and sketches. This research is novel in the following three ways. First, to overcome the decrease in matching accuracy due to pose and illumination variation, we use eye alignment and retinex filtering to normalize pose, size and illumination. Second, we compare the performance of various face recognition methods, such as principal component analysis (PCA), local binary pattern (LBP), local non-negative matrix factorization (LNMF), support vector machine-discriminant analysis (SVM-DA) and
modified census transform (MCT), for the matching of photographs and viewed sketches. Third, these five face recognition methods are combined based on score-level fusion to enhance matching accuracy, thereby overcoming the performance limitations of single face recognition methods. This paper is organized as follows. Section 2 describes the proposed method. The experimental results and conclusions are presented in Sections 3 and 4, respectively.

2. Proposed Method

2.1 Overview of the Proposed Method

An overview of the proposed face recognition system is shown in Figure 1. First, the face regions of photographs and viewed sketches are detected by an Adaboost face detector. Then, eye alignment and reassignment of the facial areas are performed on the basis of the Adaboost eye detection algorithm. Next, the facial areas derived from photographs and viewed sketches are normalized in terms of size and illumination, and five sets of facial features are extracted using PCA, LBP, SVM-DA, LNMF and MCT, respectively. Then, five dissimilarities between the features of the photographs and those of the viewed sketches are calculated. Finally, the five calculated dissimilarities are combined on the basis of score-level fusion, and the system searches for the genuine photo in the database on the basis of the combined score.
Figure 1. Overview of the proposed system
2.2 Preprocessing

Typically, an adaptive boosting (Adaboost) algorithm is used to detect face regions [5]. It combines multiple weak classifiers based on Haar-like features into a single strong classifier. Figure 2 illustrates face region detection using the Adaboost method.
Figure 2. Examples of face regions detected using the Adaboost algorithm. (a), (c) Original photo and corresponding sketch image. (b), (d) Detected face regions
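As a concrete illustration of this step, the following Python sketch detects a face with OpenCV's Haar-cascade classifier, an Adaboost-based detector in the spirit of [5]. The use of OpenCV, the cascade file and the parameter values are our assumptions for illustration, not details specified in this paper.

import cv2

def detect_face(image_path):
    """Return the largest detected face region as (x, y, w, h), or None."""
    # OpenCV's bundled frontal-face cascade is an Adaboost cascade of
    # Haar-like weak classifiers (an assumed stand-in for the detector in [5]).
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None
    return max(faces, key=lambda f: f[2] * f[3])  # keep the largest face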
Conventional face recognition systems are very sensitive to pose and illumination variation. To overcome these problems, the proposed system uses eye alignment and retinex filtering to normalize pose and illumination, respectively. In order to normalize pose, the two eye regions are detected using the Adaboost algorithm within a predefined area inside the located facial region. From the line that passes through both eye centres, the in-plane rotation angle of the face image can be calculated, and the face image is rotated until this angle becomes 0, as shown in Figure 3 (b). Then, the face region is redefined as follows: if the distance between the two eyes is calculated as l, the face region is reassigned such that both its width and height become 2l, as shown in Figure 3 (b). This method compensates for the size variations of the facial area.
Figure 3. Example of pose normalization. (a) Original image. (b) Normalized image
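The alignment step can be sketched as follows, assuming the two eye centres have already been located by the eye detector. The offsets that place the eyes inside the 2l × 2l crop are our assumptions, since the paper does not give exact crop coordinates, and boundary checks are omitted for brevity.

import numpy as np
import cv2

def align_face(gray, left_eye, right_eye):
    """Rotate so the eye line becomes horizontal, then crop a 2l x 2l box."""
    (lx, ly), (rx, ry) = left_eye, right_eye
    angle = np.degrees(np.arctan2(ry - ly, rx - lx))   # in-plane tilt of the eye line
    center = ((lx + rx) / 2.0, (ly + ry) / 2.0)        # midpoint between the eyes
    M = cv2.getRotationMatrix2D(center, angle, 1.0)
    rotated = cv2.warpAffine(gray, M, gray.shape[::-1])
    l = np.hypot(rx - lx, ry - ly)                     # inter-eye distance
    x0, y0 = int(center[0] - l), int(center[1] - l / 2)  # assumed eye placement
    face = rotated[y0:y0 + int(2 * l), x0:x0 + int(2 * l)]
    return cv2.resize(face, (32, 32))                  # size normalization (Sec. 2.2)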
Owing to differences in the Z-distance between the face and camera, the pose-normalized face regions exhibit individual variations in size. Therefore, the size of the pose-normalized face region is further normalized to 32 × 32 pixels, as shown in Figure 4. In general, photos and sketches present differences in visual appearance, such as shadow, texture and shape, as shown in Figure 4 (a) and (b). Furthermore, illumination variations can exist across the face region. All of these factors can reduce face recognition accuracy. Therefore, the differences in visual appearance and the illumination variations of the face region are normalized using a retinex filter, as illustrated in Figure 5. The optimal sigma value of the retinex filter is determined experimentally by considering the accuracy of face recognition.
Figure 4. Examples of face regions after size normalization. (a) Photographs. (b) Viewed sketches

Figure 5. Example of face regions after illumination normalization using retinex filtering. (a) Result images of Figure 4 (a). (b) Result images of Figure 4 (b)
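A minimal single-scale retinex sketch is given below. The paper states only that the sigma of the retinex filter is determined experimentally, so the default value and the final rescaling are our assumptions.

import numpy as np
import cv2

def single_scale_retinex(gray, sigma=10.0):
    """Single-scale retinex: log(image) minus log(estimated illumination)."""
    img = gray.astype(np.float64) + 1.0                  # avoid log(0)
    illumination = cv2.GaussianBlur(img, (0, 0), sigma)  # smooth illumination estimate
    r = np.log(img) - np.log(illumination)               # reflectance component
    r = (r - r.min()) / (r.max() - r.min() + 1e-12)      # rescale to [0, 1]
    return (r * 255).astype(np.uint8)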
2.3 Feature Extraction

For the matching of photographs and viewed sketches, the features of the detected face region are extracted using PCA, LBP, LNMF, SVM-DA and MCT.

PCA is a global method for representing facial features as eigen-coefficients, which are calculated from trained eigenfaces (eigenvectors). This method has been widely used for face recognition [7, 8]. Based on our experiments,
1,024 eigenfaces and the corresponding 1,024 eigen-coefficients are extracted as facial features by PCA.
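A minimal sketch of the PCA feature extraction follows, computing the eigenfaces from an SVD of the mean-centered training matrix; the function and variable names are ours, and the number of usable eigenfaces is bounded by the number of training images.

import numpy as np

def pca_features(train_faces, face, n_components=1024):
    """Project a flattened 32x32 face onto trained eigenfaces.

    train_faces: (N, 1024) matrix of flattened training faces. The rank of
    the centered data limits the usable components to min(N - 1, 1024).
    """
    mean = train_faces.mean(axis=0)
    # The right singular vectors of the centered data are the eigenfaces.
    _, _, vt = np.linalg.svd(train_faces - mean, full_matrices=False)
    eigenfaces = vt[:n_components]               # up to n_components x 1024
    return eigenfaces @ (face.ravel() - mean)    # eigen-coefficients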
LBP is generally used to extract facial features locally [9], because it is more robust to illumination variations than PCA. We use an LBP kernel of 3 × 3 pixels [6]; this method compares the centre pixel of the kernel with its eight adjacent pixel values. At each position where the LBP kernel is applied, if the grey level of the centre pixel of the 3 × 3 kernel is less than (or equal to) that of a surrounding pixel, the corresponding surrounding pixel is assigned 1; if not, it is assigned 0. Consequently, we obtain eight binary codes from each position where the LBP kernel is applied. Since the LBP kernel is slid in the horizontal and vertical directions, the total number of binary codes extracted from the 32 × 32 pixel face region is 7,200 (30 moving steps in the horizontal direction × 30 moving steps in the vertical direction × 8 binary codes).
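Following this description, a direct (unoptimized) sketch of the LBP code extraction might look like this; only numpy is assumed.

import numpy as np

def lbp_codes(face32):
    """8-bit LBP codes over a 32x32 face: 30 x 30 positions x 8 bits = 7,200 bits."""
    codes = np.zeros((30, 30), dtype=np.uint8)
    neighbours = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
                  (1, 1), (1, 0), (1, -1), (0, -1)]   # clockwise from top-left
    for y in range(1, 31):
        for x in range(1, 31):
            centre, bits = face32[y, x], 0
            for k, (dy, dx) in enumerate(neighbours):
                if face32[y + dy, x + dx] >= centre:  # centre <= neighbour -> 1
                    bits |= 1 << k
            codes[y - 1, x - 1] = bits
    return codes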
NMF is a part‐based representation with only additive forms because all the pixel values of the basis and coefficients are non‐negative [10]. LNMF is a revised version of NMF that not only allows a non‐subtractive (part‐based) representation, but also makes localized features distinctive [10]. We use 1,024 bases, and the corresponding 1,024 coefficients, for recognition.
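As an illustration only, the following sketch uses scikit-learn's standard NMF, since an off-the-shelf LNMF implementation is not commonly available; it shows the part-based, non-negative factorization but does not reproduce the locality constraints of [10], and the component count is reduced so that it remains feasible for small training sets.

from sklearn.decomposition import NMF

def nmf_features(train_faces, face, n_components=128):
    """train_faces: (N, 1024) non-negative matrix of flattened faces."""
    model = NMF(n_components=n_components, init="nndsvda", max_iter=400)
    model.fit(train_faces)                          # learn non-negative bases
    return model.transform(face.reshape(1, -1))[0]  # non-negative coefficients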
Conventional LDA is based on the assumption that all the data classes share the same density function and have normal distributions. If this assumption is not satisfied, LDA produces incorrect classifications. To overcome this problem, SVM-DA, an enhanced method of LDA that combines SVM and LDA, has been proposed [11]. We use 105 bases obtained from the 32 × 32 pixel face images, and the corresponding 105 coefficients, for recognition.
MCT is similar to LBP, but it uses the average pixel value of the 3 × 3 kernel instead of the value of the centre pixel [12]. We use an MCT kernel of 3 × 3 pixels. At each position where the MCT kernel is applied, if the average grey level of the 3 × 3 kernel is less than (or equal to) that of a surrounding pixel, the corresponding pixel is assigned 1; if not, it is assigned 0. Consequently, we obtain nine binary codes from each position where the MCT kernel is applied. Since the MCT kernel is slid in the horizontal and vertical directions, the total number of binary codes extracted from the 32 × 32 pixel face region is 8,100 (30 moving steps in the horizontal direction × 30 moving steps in the vertical direction × 9 binary codes).
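The MCT codes can be sketched analogously to LBP, with the 3 × 3 kernel mean replacing the centre pixel as the reference value; again only numpy is assumed.

import numpy as np

def mct_codes(face32):
    """9-bit MCT codes: compare all nine kernel pixels with the 3x3 mean,
    giving 30 x 30 positions x 9 bits = 8,100 bits."""
    codes = np.zeros((30, 30), dtype=np.uint16)
    for y in range(1, 31):
        for x in range(1, 31):
            patch = face32[y - 1:y + 2, x - 1:x + 2].astype(np.float64)
            mean, bits = patch.mean(), 0
            for k, v in enumerate(patch.ravel()):
                if v >= mean:                 # kernel mean <= pixel -> 1
                    bits |= 1 << k
            codes[y - 1, x - 1] = bits
    return codes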
2.4 Feature Dissimilarity Calculation and Score-Level Fusion

Euclidean distance (ED) is used to measure the dissimilarity between the eigen-coefficients of the enrolled photographs and those of the recognized sketch images. ED is appropriate for the PCA, LNMF and SVM-DA methods because their eigen-coefficients are real numbers. Because the features of the LBP and MCT methods are expressed as binary codes, Hamming distance (HD) is used to measure the dissimilarity between the binary codes of these two methods. HD counts the average number of unmatched bits between the extracted binary codes of the photo and the sketch, based on an exclusive-OR operation [13].

The five distance values calculated using PCA, LNMF, SVM-DA, LBP and MCT are combined using a score-level fusion method. Since each distance measure has a different range of values, the results are normalized into the range of 0-100 using min-max scaling prior to score-level fusion. Four fusion rules, MIN, MAX, SUM and PRODUCT [14], are compared. The MIN and MAX rules select the minimum and maximum score among the five scores, respectively. The SUM rule takes the sum of all the scores as the final dissimilarity, and the PRODUCT rule takes their product. By combining the scores of multiple recognition methods, we overcome the performance limitation caused by the heterogeneous characteristics of photographs and sketches.
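The following sketch covers the distance measures and the fusion rules; the per-method minimum and maximum distances used for min-max scaling are assumed to be estimated from the training set, and all names are ours.

import numpy as np

def euclidean_distance(a, b):
    """ED between real-valued coefficient vectors (PCA, LNMF, SVM-DA)."""
    return float(np.linalg.norm(np.asarray(a, dtype=np.float64) -
                                np.asarray(b, dtype=np.float64)))

def hamming_distance(codes_a, codes_b, bits=8):
    """Average number of unmatched bits between two binary-code maps
    (bits=8 for LBP codes, bits=9 for MCT codes), via exclusive-OR [13]."""
    xor = np.bitwise_xor(codes_a.astype(np.uint32), codes_b.astype(np.uint32))
    return sum(np.count_nonzero((xor >> k) & 1) for k in range(bits)) / xor.size

def min_max_scale(d, d_min, d_max):
    """Map a raw distance into the 0-100 range prior to fusion; d_min and
    d_max are assumed to be estimated per method from training data."""
    return 100.0 * (d - d_min) / (d_max - d_min)

def fuse(scores, rule="product"):
    """Combine the five normalized dissimilarities with one fusion rule [14]."""
    s = np.asarray(scores, dtype=np.float64)
    return {"min": s.min, "max": s.max, "sum": s.sum, "product": s.prod}[rule]()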
3. Experimental Result

To evaluate the proposed method, we selected 505 face images from the AR and CUHK student databases of the CUHK dataset [15], consisting of 212 viewed sketches and 293 photographs from 212 classes, as listed in Table 1. The database also includes images of faces wearing glasses. 253 images (106 viewed sketches and 147 photographs) were used for training; the remaining 252 images (106 viewed sketches and 146 photographs) were used for testing. Figure 6 provides examples from the databases. Since there is no open database of forensic sketches (sketches drawn from witness memory without a photo), this research uses the open CUHK dataset, which includes viewed sketches and photographs.

Database types            Photographs        Viewed sketches
                          Female    Male     Female    Male
AR database               82        93       42        52
CUHK student database     34        84       34        84
Total number of images    293                212

Table 1. Composition of databases used for our experiments
Figure 6. Examples of the databases (the images in the first and second rows are photographs and corresponding viewed sketches, respectively). (a) CUHK student database. (b) AR database
We measured face recognition accuracy using the PCA, LBP, LNMF, SVM‐DA and MCT methods. The matching results are summarized in Table 2. Top 1 refers to the percentage of cases in which the genuine face is ranked first by the matching algorithm. Since the number of classes is 212, the rank is represented from Top 1 to Top 212.
Rank      PCA       LBP       SVM-DA    LNMF      MCT
Top 1     77.358    64.151    43.396    28.302    68.868
Top 5     88.679    84.906    74.528    56.604    86.792
Top 10    93.396    89.623    83.962    67.925    90.566
Top 15    95.283    91.509    88.679    72.642    92.453
Top 20    96.226    92.453    92.453    77.358    94.340

Table 2. Recognition rates for matching photographs to viewed sketches using the testing data set (unit: %)
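For reference, the Top-k rates in Tables 2 and 3 can be computed from a probe-by-gallery dissimilarity matrix as sketched below; the data layout and names are our assumptions.

import numpy as np

def top_k_rates(dist_matrix, genuine_idx, ks=(1, 5, 10, 15, 20)):
    """dist_matrix: (n_probes, n_gallery) dissimilarities; genuine_idx[i]
    is the gallery index of probe i's true identity."""
    ranks = []
    for i, row in enumerate(dist_matrix):
        order = np.argsort(row)               # ascending dissimilarity
        ranks.append(int(np.where(order == genuine_idx[i])[0][0]) + 1)
    ranks = np.asarray(ranks)
    return {k: 100.0 * np.mean(ranks <= k) for k in ks}  # percentages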
As shown in Table 2, the highest Top 1 recognition rate, 77.358%, was obtained using the PCA method. The LBP and MCT methods showed similar performance. LNMF showed the lowest Top 1 accuracy, at 28.302%.
In the next test, to improve on the matching accuracy of uni-modal recognition, we combined the five matching scores of PCA, LBP, SVM-DA, LNMF and MCT based on score-level fusion, as shown in Table 3. The Top 1 accuracy rates of the MAX, MIN and SUM rules were 66.038%, 85.849% and 77.358%, respectively. The accuracy of the PRODUCT rule was the highest at 93.396%. By comparing Tables 2 and 3, we can confirm that accuracy is considerably enhanced by combining multiple scores.
Rank      MAX rule    MIN rule    SUM rule    PRODUCT rule
Top 1     66.038      85.849      77.358      93.396
Top 5     83.962      89.623      91.509      93.396
Top 10    90.566      92.453      93.396      93.396
Top 15    92.453      95.283      93.396      93.396
Top 20    93.396      95.283      94.340      94.340

Table 3. Score-level fusion results (unit: %)
Figure 7. Examples of matching cases ranked as Top 1 and Top 16, respectively. (a) Cases of correct matching as Top 1. (b) Case of incorrect matching as Top 16
Figure 7 shows examples of matched photo images and their corresponding sketches. Although the two images in Figure 7 (b) are from the same person, they were not correctly matched as Top 1; instead, they were matched as Top 16. This kind of matching error can be resolved with additional help from human observers who review the Top 20 candidates.

4. Conclusion

In this paper, we proposed a new face recognition method for matching photographs and viewed sketches in criminal searches. We compared the accuracies of various face recognition methods, including PCA, LBP, LNMF, SVM-DA and MCT; among them, the accuracy of PCA was the highest. In addition, we improved the matching accuracy by combining the scores of the five face recognition methods using score-level fusion. In future work, we intend to combine synthesis methods that produce sketch images from photos with the proposed score-level fusion method, by which performance can be further enhanced.

5. Acknowledgments

This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education, Science and Technology (No. 2012-0001982), in part by the Public Welfare and Safety research program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education, Science and Technology (No. 2011-0020976), and in part by a grant from the R&D Program (Industrial Strategic Technology Development) funded by the Ministry of Knowledge Economy (MKE), Republic of Korea. The authors are also deeply thankful to all interested persons of MKE and KEIT (Korea Evaluation Institute of Industrial Technology) (10040018, Development of 3D Montage Creation and Age-specific Facial Prediction System).
6. References

[1] Tang X, Wang X (2004) Face Sketch Recognition. IEEE Transactions on Circuits and Systems for Video Technology. 14: 50-57.
[2] Liu Q, Tang X, Jin H, Lu H, Ma S (2005) A Nonlinear Approach for Face Sketch Synthesis and Recognition. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition. 1005-1010.
[3] Tang X, Wang X (2003) Face Sketch Synthesis and Recognition. Proceedings of the 9th IEEE International Conference on Computer Vision. 687-694.
[4] Wang X, Tang X (2009) Face Photo-Sketch Synthesis and Recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence. 31: 1955-1967.
[5] Viola P, Jones M. J (2004) Robust Real-Time Face Detection. International Journal of Computer Vision. 57: 137-154.
[6] Nam G. P, Kang B. J, Park K. R (2010) Robustness of Face Recognition to Variations of Illumination on Mobile Devices Based on SVM. KSII Transactions on Internet and Information Systems. 4: 25-44.
[7] Belhumeur P. N, Hespanha J. P, Kriegman D. J (1997) Eigenfaces vs. Fisherfaces: Recognition Using Class Specific Linear Projection. IEEE Transactions on Pattern Analysis and Machine Intelligence. 19: 711-720.
[8] Turk M, Pentland A (1991) Eigenfaces for Recognition. Journal of Cognitive Neuroscience. 3: 71-86.
[9] Ahonen T, Hadid A, Pietikainen M (2004) Face Recognition with Local Binary Patterns. Proceedings of the European Conference on Computer Vision. 3021: 469-481.
[10] Li S. Z, Hou X. W, Zhang H. J, Cheng Q. S (2001) Learning Spatially Localized, Parts-Based Representation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 1: 207-212.
[11] Kim S.-K, Park Y. J, Toh K.-A, Lee S (2010) SVM-Based Feature Extraction for Face Recognition. Pattern Recognition. 43: 2871-2881.
[12] Froba B, Ernst A (2004) Face Detection with the Modified Census Transform. Proceedings of the IEEE International Conference on Automatic Face and Gesture Recognition. 91-96.
[13] Daugman J (2004) How Iris Recognition Works. IEEE Transactions on Circuits and Systems for Video Technology. 14: 21-30.
[14] Dass S. C, Nandakumar K, Jain A. K (2005) A Principled Approach to Score Level Fusion in Multimodal Biometric Systems. Proceedings of the International Conference on Audio- and Video-Based Biometric Person Authentication. 1049-1058.
[15] http://mmlab.ie.cuhk.edu.hk/facesketch.html (accessed on July 23, 2012).