Using color for face verification
Mariusz Leszczynski
Warsaw University of Technology, Faculty of Electronics and Information Technology, 00-665 Warszawa, Nowowiejska 15/19, Poland

ABSTRACT
This paper presents research on the importance of color information in a face verification system. Four of the most popular color spaces were used: RGB, YIQ, YCbCr, and luminance, and were compared using four types of discriminant classifiers. Experiments conducted on facial databases with complex backgrounds, different poses and lighting conditions show that color information can improve verification accuracy compared to the traditionally used luminance information. To achieve the best performance we recommend multi-frame verification with images encoded in the YIQ color space.

Keywords: RGB, YIQ, YCbCr, luminance, face verification, biometrics, PCA, LDA, DLDA, Discriminant Analysis Diagram (DAD).
1. INTRODUCTION
In most cases, face recognition applications based on 2D images use only luminance information. The reasons for this were processing efficiency, compatibility with existing databases, storage requirements and sensor cost. Today these constraints are far less important: nearly all systems use color cameras, and disc capacity and processing power have increased. This creates new potential for such systems. First, we may use several images taken from a video camera. The second opportunity to obtain more information is to use color. The motivation for our work was to create a methodology that combines these two extensions to improve verification accuracy.
2. PREPROCESSING
To obtain the best face representation from the input image, several preprocessing steps are needed.

2.1 Face detection and localization
The accurate detection and localization of human faces in arbitrary scenes is a very important step in a fully automatic verification system: errors in localization affect the whole process. One of the most popular face detection algorithms is AdaBoost, introduced by Viola and Jones in 2001 [1]. In our work we use an algorithm based on a discrete approximation of the Gabor transform [2]. This algorithm is more complex but gives better results. A comparison of face verification results on manual and automatic localization is given in [3].
Fig. 1. Example of face localization.
The result, with eye, nose and mouth localization, is presented in Fig. 1. The rectangle covers the area taken as the face.
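For illustration, the sketch below runs a pretrained Viola-Jones (AdaBoost) cascade from OpenCV, the detector family mentioned above; this is a stand-in, not the Gabor-jet detector of [2], and the input file name is a placeholder.

```python
import cv2

# Pretrained Viola-Jones (AdaBoost) frontal-face cascade shipped with OpenCV.
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

image = cv2.imread("scene.jpg")  # placeholder input image
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# Detect faces at multiple scales; each hit is an (x, y, w, h) rectangle.
faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

# Draw the rectangle covering the area taken as the face (cf. Fig. 1).
for (x, y, w, h) in faces:
    cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)
```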
2.2 Face normalization
Images produced by the face detector have different resolutions (owing to different distances from the camera). To be able to compare these faces, a normalization step is needed. Following the MPEG-7 face descriptor [4], images are normalized to the size 46x56 based on fixed eye-centre positions. The eye centres automatically marked by the detector in the original image are mapped to fixed coordinates in the normalized image, namely (16, 24) for the left eye and (31, 24) for the right eye. Next, a part of the original image (the face and its neighbourhood) is down-scaled according to the ratio between the inter-eye distance in the original image and the fixed distance of 15 pixels, and possibly rotated so that the face has an upright orientation in the normalized image.
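A minimal sketch of this normalization, assuming OpenCV is available and that the detector supplies the two eye centres; the similarity transform jointly realizes the scaling (inter-eye distance to 15 pixels) and the upright rotation:

```python
import cv2
import numpy as np

def normalize_face(image, left_eye, right_eye):
    """Warp a face so the detected eye centres land at the fixed
    coordinates (16, 24) and (31, 24) of a 46x56 normalized image."""
    src = np.float32([left_eye, right_eye])
    dst = np.float32([[16, 24], [31, 24]])
    # Rotation + uniform scale + translation estimated from the eye pair.
    M, _ = cv2.estimateAffinePartial2D(src, dst)
    return cv2.warpAffine(image, M, (46, 56))
```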
2.3 Color space
In this article the four color spaces most popular in video material are used: RGB, YIQ, YCbCr and luminance (Y). RGB is an additive color model in which red, green, and blue components are added together to reproduce an array of colors. It is a convenient color model for computer graphics because the human visual system works in a similar way. It is commonly used in consumer-grade digital cameras, HD video cameras, computer monitors, etc. The first TV systems transmitted only a luminance component, and even today some industrial cameras store only monochromatic information. Luminance information is used in most face recognition applications: even if color information is present, the system converts the image to monochromatic form using a simple transformation formula, for example the common mapping:
\[ Y = 0.2989\,R + 0.5870\,G + 0.1140\,B \tag{1} \]
The YIQ model is used in NTSC color TV; it is downward compatible with black-and-white TV, where only Y is used. In the NTSC color space, image data consists of three components: luminance (Y), hue (I), and saturation (Q); I is the orange-blue axis and Q is the purple-green axis. The conversion matrix from RGB is defined by:
\[ \begin{bmatrix} Y \\ I \\ Q \end{bmatrix} = \begin{bmatrix} 0.299 & 0.587 & 0.114 \\ 0.596 & -0.275 & -0.321 \\ 0.212 & -0.523 & 0.311 \end{bmatrix} \begin{bmatrix} R \\ G \\ B \end{bmatrix} \tag{2} \]
The YCbCr color space is also used in digital video (e.g. MPEG-2/4) and image formats (JPEG). This model is closely related to the one used in the PAL television system. The conversion formula is:
\[ \begin{bmatrix} Y \\ C_b \\ C_r \end{bmatrix} = \begin{bmatrix} 0.257 & 0.504 & 0.098 \\ -0.148 & -0.291 & 0.439 \\ 0.439 & -0.368 & -0.071 \end{bmatrix} \begin{bmatrix} R \\ G \\ B \end{bmatrix} + \begin{bmatrix} 16 \\ 128 \\ 128 \end{bmatrix} \tag{3} \]
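The three conversions (1)-(3) can be applied per pixel with a few matrix products; a minimal NumPy sketch, assuming R, G, B values in [0, 255]:

```python
import numpy as np

# RGB -> YIQ conversion matrix, Eq. (2)
RGB2YIQ = np.array([[0.299,  0.587,  0.114],
                    [0.596, -0.275, -0.321],
                    [0.212, -0.523,  0.311]])

# RGB -> YCbCr conversion matrix and offset, Eq. (3)
RGB2YCBCR = np.array([[ 0.257,  0.504,  0.098],
                      [-0.148, -0.291,  0.439],
                      [ 0.439, -0.368, -0.071]])
YCBCR_OFFSET = np.array([16.0, 128.0, 128.0])

def to_luminance(rgb):
    """Eq. (1): weighted sum of the R, G, B channels; rgb has shape (..., 3)."""
    return rgb @ np.array([0.2989, 0.5870, 0.1140])

def to_yiq(rgb):
    """Eq. (2): linear map from RGB to YIQ, applied per pixel."""
    return rgb @ RGB2YIQ.T

def to_ycbcr(rgb):
    """Eq. (3): linear map plus offset, applied per pixel."""
    return rgb @ RGB2YCBCR.T + YCBCR_OFFSET
```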
3. FEATURE EXTRACTION
The next step in a recognition/verification system is feature extraction. Of course, we could take all pixels, but this approach is not recommended: the facial description should not be large, nor contain too many irrelevant details (background, etc.). One possible solution is to transform the data into the spectral domain and select its most significant part. In this experiment we use the 2D Discrete Fourier Transform (DFT). The Fourier magnitude image for a face image is shown in Fig. 2. The most significant coefficients are located in the corner blocks. The real and imaginary parts of the DFT coefficients from the upper-left and upper-right corners are stacked into one vector. The optimal window sizes, established in previous experiments, are 9x7 and 8x7 (upper left and upper right, respectively).
Fig. 2. Fourier magnitude images for a face image (from left: Y, I and Q components).
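A sketch of this feature extraction for one 46x56 component image; treating the window sizes as rows x columns is our assumption:

```python
import numpy as np

def dft_features(face, ul=(9, 7), ur=(8, 7)):
    """Stack the real and imaginary parts of low-frequency DFT coefficients
    taken from the upper-left and upper-right corner blocks of the spectrum."""
    spectrum = np.fft.fft2(face)
    block_ul = spectrum[:ul[0], :ul[1]]    # upper-left corner window
    block_ur = spectrum[:ur[0], -ur[1]:]   # upper-right corner window
    coeffs = np.concatenate([block_ul.ravel(), block_ur.ravel()])
    return np.concatenate([coeffs.real, coeffs.imag])
```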
4. DISCRIMINANT FEATURE EXTRACTION
Biometric pattern verification is conceptually different from traditional class-membership verification. It involves the following observations:
1. We always deal with a subset of the whole collection of classes.
2. The number of classes used at training time of the recognition system is small and usually different from the classes recognized at deployment time.
Since natural human-centered pattern classes cannot be used in biometric person verification systems, another categorization has to be sought. It appears that the differences of human features between biometric measurements of the same person (within-class differences) and between different persons (between-class differences) create a consistent categorization into two specific classes. The specificity of these two classes follows from the fact that their means are both equal to zero. Moreover, the within-class feature variation (var_w) can sometimes be greater than the between-class feature variation (var_b), i.e. usually the squared within-class errors are of the same magnitude as the squared between-class errors. Therefore, it is natural to look for a linear transformation W : R^N → R^n of the original measurements x ∈ R^N (e.g. the vectorized pixel matrix of a face image or its 2D frequency representation) into a target feature vector z = W^t x for which intra-class differences are decreased while inter-class differences are increased. This is the problem of classical Linear Discriminant Analysis (LDA) [5].

4.1 Regularization of LDA by Projections in Error Spaces
In this section a novel point of view on LDA regularization is presented, which uses the concept of projections in error subspaces. It unifies three approaches in one consistent scheme and integrates them with Dual Linear Discriminant Analysis (DLDA) [6] and Principal Component Analysis. There are three types of errors in this approach, defined w.r.t. any data matrix Y = [y_1, ..., y_L] with fixed class assignments I_j, j = 1, ..., J:
1. Grand error: the difference of a data vector y_k and the grand mean vector of Y. This can be modelled by the global centering operation C_g:
\[ \bar{y} = \frac{1}{L}\sum_{i=1}^{L} y_i, \qquad C_g(y_k) := y_k - \bar{y}, \quad k = 1, \ldots, L \tag{4} \]
2. Intra-class error: the difference of a data vector y_k, k ∈ I_j, and its class mean \(\bar{y}^{(j)}\):
\[ \bar{y}^{(j)} = \frac{1}{L_j}\sum_{i \in I_j} y_i, \qquad C_w(y_k) := y_k - \bar{y}^{(j)}, \quad k \in I_j,\ j = 1, \ldots, J \tag{5} \]
3. Inter-class error: the difference of a class mean \(\bar{y}^{(j)}\) and the grand mean \(\bar{y}\):
\[ C_b(\bar{y}^{(j)}) := \bar{y}^{(j)} - \bar{y}, \quad j = 1, \ldots, J \tag{6} \]
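The three centering operations (4)-(6) in a short NumPy sketch, with one sample per column of Y and an integer label array of length L:

```python
import numpy as np

def error_matrices(Y, labels):
    """Return the centered matrices C_g(Y), C_w(Y), C_b(Y) of Eqs. (4)-(6)."""
    grand = Y.mean(axis=1, keepdims=True)
    Cg = Y - grand                                          # grand errors, Eq. (4)
    classes = np.unique(labels)
    Cw = np.hstack([Y[:, labels == j]
                    - Y[:, labels == j].mean(axis=1, keepdims=True)
                    for j in classes])                      # intra-class errors, Eq. (5)
    Cb = np.hstack([Y[:, labels == j].mean(axis=1, keepdims=True) - grand
                    for j in classes])                      # inter-class errors, Eq. (6)
    return Cg, Cw, Cb
```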
The error vectors span error linear subspaces denoted as follows:
\[ E_g(Y) = \mathrm{span}(C_g(Y)), \qquad E_w(Y) = \mathrm{span}(C_w(Y)), \qquad E_b(Y) = \mathrm{span}(C_b(Y)) \tag{7} \]
The singular bases U^{(g)}, U^{(w)}, U^{(b)} of the error linear subspaces are obtained from the Singular Value Decomposition (SVD [7]) of the matrices C_g(Y), C_w(Y), C_b(Y), respectively:
\[ C_g(Y) = U^{(g)}\Sigma^{(g)}(V^{(g)})^t, \qquad C_w(Y) = U^{(w)}\Sigma^{(w)}(V^{(w)})^t, \qquad C_b(Y) = U^{(b)}\Sigma^{(b)}(V^{(b)})^t \tag{8} \]
where the diagonal square matrices Σ^{(·)} are of size equal to the rank of the corresponding centered data matrix C_·(Y). In the case of grand and intra-class centering the singular values are ordered from maximal to minimal, while in the case of inter-class centering the standard SVD order is inverted: the first element on the diagonal is minimal. Let Y ∈ R^{a×L}. Then we identify all singular subspaces of dimension a' < dim(E(Y)) of the error spaces by the projection operators which map the space R^a onto R^{a'}, the space of projection coefficients w.r.t. the singular base U_{a'} restricted to the first a' vectors:
1. P^{(g)}_{a,a'}: projection onto the grand error singular subspace of dimension a';
2. P^{(w)}_{a,a'}: projection onto the intra-class error singular subspace of dimension a';
3. P^{(b)}_{a,a'}: projection onto the inter-class error singular subspace of dimension a'.
An additional operation required after projection is component-wise scaling by the inverses of the first singular values, which form a diagonal matrix:
1. S^{(g)}_{a'}: scaling of the projected vector in the grand error singular subspace of dimension a';
2. S^{(w)}_{a'}: scaling of the projected vector in the intra-class error singular subspace of dimension a';
3. S^{(b)}_{a'}: scaling of the projected vector in the inter-class error singular subspace of dimension a'.
In matrix terms, the projection and scaling operations have the form:
\[ P_{a,a'}(x) = U_{a'}^{t}\,x, \qquad S_{a'}(y) = \Sigma_{a'}^{-1}\,y \tag{9} \]
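A sketch of Eqs. (8)-(9): the singular base of an error matrix is truncated to its first a' vectors, and the projection is followed by the inverse-singular-value scaling:

```python
import numpy as np

def projection_and_scaling(C, a_prime):
    """Given a centered error matrix C, return a function computing
    S_{a'}(P_{a,a'}(x)) = Sigma_{a'}^{-1} U_{a'}^t x, per Eq. (9)."""
    U, s, _ = np.linalg.svd(C, full_matrices=False)   # Eq. (8)
    U_trunc = U[:, :a_prime]      # singular base restricted to a' vectors
    inv_s = 1.0 / s[:a_prime]     # inverse singular values (diagonal of S)
    return lambda x: inv_s * (U_trunc.T @ x)
```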
Using the above notation, all LDA and PCA transformations known to the authors can be defined via the diagram in Fig. 3, which is the extraction part of the discriminant analysis diagram (DAD). The diagram is a labelled directed graph with a distinguished source node and sink node; its nodes represent the source, the target, and the intermediate discriminant features of the recognized object. Through the source node the original input feature vector x is delivered, while from the sink node the discriminant output feature y_e = W_e^t x is received.
Fig. 3. LDA and PCA type feature extraction part of discriminant analysis diagram.
Each edge label represents one of four vector transformations: data centering, orthogonal projection onto a linear subspace, vector component scaling, and orthogonal projection onto the unit sphere. A path from the source node to the sink node defines a basic discrimination scheme, identified by the sequential composition of the operations assigned to the path edges. The upper path of the graph illustrates the classical LDA approach, the middle one PCA, and the lower one the classical DLDA method.

4.2 Transformations for Matching
In the matching stage additional transformations are performed before a distance function is applied. In our approach the distance function is defined by the Euclidean norm. One operation used in the matching stage, and not defined yet, is the vector length normalization N_a, which can be geometrically interpreted as the projection onto the unit sphere in R^a:
\[ N_a(x) := \frac{x}{\|x\|} \tag{10} \]
Then four different matching transformations are defined. The extraction and matching parts of the DAD are linked in Fig. 4. A particular application can be tuned against the DAD and the best path selected. In the next section we illustrate this DAD-based face verification using PCA, R-LDA [8], [9] (based on classical LDA), and the LDA and DLDA methods extended by the matching part.
Fig. 4. Complete discriminant analysis diagram.
5. EXPERIMENTAL RESULTS
For our experiments we selected two test sets of normalized color images from the following databases (Fig. 5):
{1}. Altkom (80 persons, 1680 images), Banca (52 persons, 474 images), Valid (106 persons, 1575 images) and the WUT database (143 persons, 769 images), which together give 391 persons with 4525 images. The pictures were taken under different lighting conditions and, except for Altkom, at different time intervals.
{2}. Xm2vts, with representative photos from two sessions (294 persons with 2629 images) taken from digital video recordings. During these sessions the subjects were asked to rotate their head from the centre to the right, then up, then down, finally returning it to the centre, while illumination conditions remained uniform and a homogeneous blue background was kept constant throughout all sessions.
Fig. 5. Face databases – from top: Altkom, Banca, Valid, WUT, Xm2vts.
In this approach every color component has its own feature extraction and discrimination path. For each of them, the Euclidean distance to the person's descriptor is calculated, and the distances are aggregated using the quadratic mean. If this aggregated distance is smaller than a threshold, the query image is accepted as belonging to that person. To quantify verification performance, the Receiver Operating Characteristic (ROC) is used. It shows the tradeoff between two types of error by plotting estimates of the False Rejection error against the False Acceptance error as a parametric function of the distance threshold. The Equal Error Rate (EER) marked on the characteristic is the value at which both accept and reject errors are equal. Figs. 6, 7 and 8 show experimental results for the RGB, YCbCr and YIQ color spaces, confronted with luminance information, using the PCA, LDA, R-LDA and DLDA classification methods presented in the DAD. It appears that for the LDA and DLDA methods, color information improves verification results compared to the traditionally used luminance. Additionally, for all color spaces DLDA gives the best verification results. As shown in Fig. 9, for both test sets the YIQ color space yields the fewest verification errors (reductions of 30% for dataset {1} and 48% for dataset {2}, respectively). Having a series of pictures, we can also attempt verification based on several facial images. In this case the person's identity is accepted if at least half of the query images are accepted. As in the previous experiments, the YIQ model gives the best results (Fig. 10); additionally, this scenario brings further error reductions, especially for the Xm2vts database, with EER = 5×10^-5.
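A hedged sketch of the decision rule described above: per-component Euclidean distances aggregated by the quadratic mean for a single image, and the at-least-half acceptance vote for multi-frame verification; feature extraction and descriptor storage are assumed to be done elsewhere:

```python
import numpy as np

def accept_image(query_feats, person_feats, threshold):
    """Single-image decision: per color component, the Euclidean distance
    to the person's descriptor, aggregated with the quadratic mean (RMS)."""
    d = [np.linalg.norm(q - p) for q, p in zip(query_feats, person_feats)]
    return np.sqrt(np.mean(np.square(d))) < threshold

def accept_person(query_frames, person_feats, threshold):
    """Multi-frame decision: accept the claimed identity if at least half
    of the query images are individually accepted."""
    votes = [accept_image(f, person_feats, threshold) for f in query_frames]
    return sum(votes) >= len(votes) / 2
```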
Fig. 6. Image ROC for database {1} (left) and {2} (right) using RGB color space.
Fig. 7. Image ROC for database {1} (left) and {2} (right) using YCbCr color space.
Fig. 8. Image ROC for database {1} (left) and {2} (right) using YIQ color space.
Fig. 9. Image ROC for database {1} (left) and {2} (right) using DLDA method.
Fig. 10. Person ROC for database {1} (left) and {2} (right) using DLDA method.
6. CONCLUSION
In this paper the impact of color on face verification performance was analyzed. It was shown that the use of color information can improve verification accuracy compared to the same scheme using only luminance information. We recommend the proposed method based on all three components of the YIQ space with the DLDA classification method and, if possible, on several query images. Under these conditions we obtain a significant error reduction, from 29% up to even 99.6%, compared to the luminance information used in most face verification applications. Our experiments, conducted on two different data sets with varying lighting conditions, backgrounds, poses and time intervals, indicate that the proposed methods are general and should work well with other databases.
7. ACKNOWLEDGEMENTS The work presented was developed within VISNET 2, a European Network of Excellence (http://www.visnet-noe.org), funded under the European Commission IST FP6 Programme and was also supported by the Foundation for the Development of Radio-communication and Multimedia Techniques (Fundacja Wspierania Rozwoju Radiokomunikacji i Technik Multimedialnych).
8. REFERENCES
1. Viola, P. and Jones, M., "Rapid object detection using a boosted cascade of simple features," Conference on Computer Vision and Pattern Recognition (CVPR), 511-518 (2001).
2. Naruniec, J. and Skarbek, W., "Face Detection by Discrete Gabor Jets and Reference Graph of Fiducial Points," Rough Sets and Knowledge Technology, 187-194, Springer, Berlin/Heidelberg (2007).
3. Naruniec, J., Skarbek, W. and Rama, A., "Face Detection and Tracking in Dynamic Background of Street," Int. Conf. on Signal Processing and Multimedia Applications (SIGMAP 2007), Barcelona, Spain (2007).
4. ISO/IEC 15938-3:2002, "Information technology – Multimedia content description interface – Part 3: Visual" (2002).
5. Fukunaga, K., [Introduction to Statistical Pattern Recognition], Academic Press (1992).
6. Skarbek, W., Kucharski, K. and Bober, M., "Dual LDA for Face Recognition," Fundamenta Informaticae 61, 303-334 (2004).
7. Golub, G. and Van Loan, C., [Matrix Computations], The Johns Hopkins University Press (1989).
8. Lu, J., Plataniotis, K. N. and Venetsanopoulos, A. N., "Face Recognition Using LDA Based Algorithms," IEEE Transactions on Neural Networks 14(1), 195-200 (2003).
9. Lu, J., Plataniotis, K. N. and Venetsanopoulos, A. N., "Regularization Studies of Linear Discriminant Analysis in Small Sample Size Scenarios with Application to Face Recognition," Pattern Recognition Letters 26(2), 181-191 (2005).