FEATURE EXTRACTION USING DISCRETE COSINE TRANSFORM FOR FACE RECOGNITION

S. Dabbaghchian1, A. Aghagolzadeh1, and M. S. Moin2
1- Faculty of Electrical and Computer Engineering, University of Tabriz, Tabriz, Iran.
2- Iranian Telecommunication Research Center, Tehran, Iran.
[email protected], [email protected]

ABSTRACT
A simple statistical analysis is used to select the discriminant coefficients of the Discrete Cosine Transform (DCT) for face recognition. The proposed procedure differs from the traditional zigzag or zonal masking: it searches for the coefficients that discriminate between classes better than the others. Our algorithm also accounts for the wide range over which the DCT coefficients are distributed. Several coefficient selection algorithms are compared. Simulation results on the ORL and Yale face databases demonstrate the success of the proposed approach.

1. INTRODUCTION
Various approaches have been proposed for face recognition [1, 2]. From one point of view, they can be divided into two general groups: feature-based and holistic. Discrete transforms are widely used for data redundancy reduction as the first step of holistic approaches. Among the deterministic discrete transforms, the DCT is very close to the Karhunen-Loeve Transform (KLT) [3] and has strong data decorrelation ability [4]. Moreover, fast algorithms exist for computing the DCT, which is not the case for the KLT. These properties make the DCT a powerful transform for image processing applications, including face recognition. Ramasubramanian and Venkatesh used a combination of the DCT, PCA, and characteristics of the Human Visual System for encoding and recognition of faces [5]. In [6], various combinations of the DCT, PCA, and LDA were surveyed. In [7], the DC coefficient or the first three low-frequency coefficients were truncated to reduce the effect of illumination variation. In [8], polynomial coefficients were derived from the 2D-DCT coefficients of spatially neighboring blocks. In most of the applications mentioned above, the coefficients are selected by a static method such as zigzag or zonal masking. Such approaches are not necessarily efficient for every application and every database. In our proposed method, a coefficient selection (CS) mask is first obtained in the training phase by a simple statistical analysis of

1-4244-0779-6/07/$20.00 ©2007 IEEE

the DCT coefficients of all training facial images in the database. This mask is then used for CS during the test phase. Our feature extraction approach is database dependent: it finds the most discriminant coefficients for each database. Furthermore, the DCT coefficients are distributed over a relatively wide range of real values, which is a disadvantage when a minimum Euclidean distance (ED) classifier is used. Our proposed method modifies the ED classifier and improves the recognition rate.

2. FEATURE EXTRACTION
Feature extraction by the DCT consists of two stages: first, the DCT is applied to an image, and second, some of the coefficients are selected. In this section we focus on the second stage and propose efficient CS approaches. Applying the DCT to an M × N image yields a 2D M × N matrix called the coefficient matrix. The DCT by itself does not reduce the data dimension; rather, it compacts most of the signal information into a small fraction of the coefficients. Here the DCT is applied to the entire image. CS is the essential step that strongly influences the efficiency of the algorithm. We categorize CS approaches into two groups: static and data-dependent. Static CS approaches select coefficients with a fixed mask (fixed locations), such as zigzag or zonal masking; Fig. 1a-b shows these approaches. In data-dependent CS approaches, the CS mask (the locations of the selected coefficients) is determined by data analysis. Although the static approaches are simple, they are not efficient. We show that the data-dependent approaches improve the recognition rate in exchange for a small increase in computation. To arrive at an efficient CS algorithm, two questions must be answered: which coefficients, and how many, should be selected? We believe there is no general answer; it depends on the specific application.
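The two CS families described above can be sketched as follows. This is a minimal illustration, not the paper's code: the 2-D DCT is built from SciPy's 1-D DCT, and `zigzag_mask` is one common way to realize the static zigzag selection; both static and data-dependent CS then reduce to indexing the coefficient matrix with a boolean mask.

```python
import numpy as np
from scipy.fftpack import dct

def dct2(img):
    # 2-D DCT computed as separable 1-D type-II DCTs along rows and columns
    return dct(dct(img, axis=0, norm='ortho'), axis=1, norm='ortho')

def zigzag_mask(shape, n):
    # static CS: mark the n lowest-frequency positions in zigzag order
    M, N = shape
    order = sorted(((i, j) for i in range(M) for j in range(N)),
                   key=lambda p: (p[0] + p[1], p[0] if (p[0] + p[1]) % 2 else p[1]))
    mask = np.zeros(shape, dtype=bool)
    for i, j in order[:n]:
        mask[i, j] = True
    return mask

def select(coeffs, mask):
    # both static and data-dependent CS end up as boolean-mask indexing
    return coeffs[mask]
```

A data-dependent method differs only in how the mask is computed; the selection step, e.g. `select(dct2(img), mask)`, is identical.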
For example, the DC coefficient, which is useful in image compression, may be harmful in face recognition because of illumination variation across the database images. Our suggestion for reaching an efficient CS algorithm in a specific application is as follows. First, the desired properties of the coefficients are defined based on the application. Then a function is defined to convert these qualitative properties into quantitative values. Finally, the locations of the coefficients with the desired values are used to construct a CS mask. In the face recognition

[Figure 1. (a) zigzag, (b) zonal masking, (c) bands of mode 1]

problem, the selected coefficients must be able to separate the faces of different individuals. We call the function that measures this property the Discriminant Function (DF).

2.1. Frequency Band Modes

We can process the DCT coefficients either one by one or in frequency bands. We define three modes, mode 1 to mode 3, for dividing the coefficients into bands. Suppose the DCT coefficient matrix is given as

  [ ac_{1,1}  ac_{1,2}  ...  ac_{1,N} ]
  [ ac_{2,1}  ac_{2,2}  ...  ac_{2,N} ]
  [   ...                    ...      ]
  [ ac_{M,1}  ac_{M,2}  ...  ac_{M,N} ]  (M × N)

Mode 3 [9]: b_k = { ac_{i,j} | (i = k, j ≤ k) or (j = k, i ≤ k) },  k = 1, 2, ..., min(M, N)   (2)

Theoretically, mode 1 has an advantage over the other two modes: it evaluates coefficients locally, whereas modes 2 and 3 evaluate them globally. In modes 2 and 3, if a coefficient with good discriminant ability lies in a band next to coefficients with weak discriminant ability, the DF assigns the band a small value, because all coefficients in the band contribute to the discriminant value; the ability of the strong coefficient is thus lost. In mode 1, the discriminant ability of each coefficient is evaluated independently.

2.2. Discriminant Function

The discriminant ability of a coefficient results from two properties: variation between classes and variation within classes. Large between-class variation and small within-class variation increase the discriminant ability. One possible definition of the DF is therefore the ratio of the between-class variation to the within-class variation: the DF is then large for bands with high discriminant ability and small for the others. Statistical measures are usually used as the criterion; we use the statistical mean and variance


[Figure 2. Procedure of our proposed approach]

to define the DF. Fig. 2 illustrates the whole procedure of our proposed face recognition approach. The complete algorithm and the DF computation are as follows.
Step 1) Randomly divide the database images into a train set and a test set. An equal number of samples must be selected from each class.
Step 2) Apply the DCT to the images of the train set, using the following equation for an image of size M × N with intensity function f(x, y):

Mode 1: In this mode, each band contains only one coefficient. For simplicity we use the word "band" for single coefficients too. A mathematical description of mode 1 is cumbersome; it is illustrated by Fig. 1c.
Mode 2: b_k = { ac_{i,j} | i + j = k },  k = 2, 3, ..., min(M, N)   (1)
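The three band modes can be sketched directly from their definitions. This is an illustrative implementation (not from the paper), using the paper's 1-based indices i, j inside 0-indexed NumPy arrays:

```python
import numpy as np

def bands_mode1(C):
    # mode 1: every coefficient forms its own single-element band
    return [np.array([v]) for v in C.ravel()]

def bands_mode2(C):
    # mode 2 (eq. 1): band k holds coefficients ac_{i,j} with i + j = k,
    # 1-based indices, k = 2 .. min(M, N)
    M, N = C.shape
    return [np.array([C[i - 1, j - 1] for i in range(1, M + 1)
                      for j in range(1, N + 1) if i + j == k])
            for k in range(2, min(M, N) + 1)]

def bands_mode3(C):
    # mode 3 (eq. 2): band k is the L-shaped set formed by row k and
    # column k, each taken up to index k
    M, N = C.shape
    return [np.array([C[i - 1, j - 1] for i in range(1, M + 1)
                      for j in range(1, N + 1)
                      if (i == k and j <= k) or (j == k and i <= k)])
            for k in range(1, min(M, N) + 1)]
```

Note the band sizes: mode 1 bands always have one element, mode 2 band k has up to k − 1 elements (the anti-diagonal), and mode 3 band k has 2k − 1 elements (the L shape).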


F(u, v) = (1/sqrt(MN)) α(u) α(v) Σ_{x=0}^{M−1} Σ_{y=0}^{N−1} f(x, y) cos[(2x+1)uπ / 2M] cos[(2y+1)vπ / 2N],
  u = 0, 1, ..., M−1,  v = 0, 1, ..., N−1   (3)

where α(ω) = 1/sqrt(2) for ω = 0 and α(ω) = 1 otherwise.
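The 2-D DCT of Step 2 can be evaluated directly as a sketch. The 1/sqrt(MN) normalization with α(0) = 1/sqrt(2) is an assumption reconstructed from the text; a production implementation would use a fast separable DCT instead of this O(M²N²) double loop.

```python
import math

def dct2_direct(f):
    # direct evaluation of the 2-D DCT formula; f is a list of rows
    # (assumed normalization: 1/sqrt(M*N) with alpha(0) = 1/sqrt(2))
    M, N = len(f), len(f[0])
    alpha = lambda w: 1 / math.sqrt(2) if w == 0 else 1.0
    F = [[0.0] * N for _ in range(M)]
    for u in range(M):
        for v in range(N):
            s = sum(f[x][y]
                    * math.cos((2 * x + 1) * u * math.pi / (2 * M))
                    * math.cos((2 * y + 1) * v * math.pi / (2 * N))
                    for x in range(M) for y in range(N))
            F[u][v] = alpha(u) * alpha(v) * s / math.sqrt(M * N)
    return F
```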

Step 3) Choose one of the three modes described above and separate the DCT coefficients into bands b_k.
Step 4) Select a band from all training images. Let the DCT coefficients of band k for sample s of class c be denoted X_c^s:
  X_c^s = { b_k for sample s of class c }   (4)
Step 5) Determine the within-class variance as follows.
5-a) Calculate the mean of each class:
  X̄_c = E(A_c),  A_c = { X_c^s }_{s=1,2,...,S},  c = 1, 2, ..., C   (5)
where E(.) is the expectation operator, S is the number of samples in each class, and C is the number of classes in the database. We assume equal probability for all samples.
5-b) Calculate the variance of the samples in each class:
  V_c(k) = tr( E[ (A_c − X̄_c)^T (A_c − X̄_c) ] ),  c = 1, 2, ..., C   (6)
where tr(.) denotes the trace of a matrix.
5-c) Average the variances over all classes:
  V_W(k) = (1/C) Σ_{c=1}^{C} V_c(k)   (7)
V_W(k) is the within-class variance.
Step 6) Determine the between-class variance.
6-a) Calculate the mean of all samples in all classes:
  X̄ = E(A),  A = { X_c^s }_{s=1,2,...,S; c=1,2,...,C}   (8)
6-b) Calculate the variance of all samples in all classes:
  V_B(k) = tr( E[ (A − X̄)^T (A − X̄) ] )   (9)
where V_B(k) is the between-class variance for band k.
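Steps 5 and 6 can be sketched for one band as follows. This is an illustrative NumPy version (not from the paper) in which the per-class and global scatter traces are computed as mean squared deviations; `cs_mask` below is a hypothetical helper for the top-n band selection of Step 9.

```python
import numpy as np

def discriminant_function(X):
    # X has shape (C, S, d): C classes, S samples per class, and d
    # coefficients in the band (d = 1 for mode 1)
    class_means = X.mean(axis=1, keepdims=True)             # per-class mean, eq. (5)
    Vc = ((X - class_means) ** 2).sum(axis=2).mean(axis=1)  # trace of class scatter, eq. (6)
    VW = Vc.mean()                                          # within-class variance, eq. (7)
    grand_mean = X.mean(axis=(0, 1))                        # global mean, eq. (8)
    VB = ((X - grand_mean) ** 2).sum(axis=2).mean()         # between-class variance, eq. (9)
    return VB / VW                                          # DF value, eq. (10)

def cs_mask(df_values, n):
    # step 9 (hypothetical helper): indices of the n largest DF values
    return np.argsort(np.asarray(df_values))[::-1][:n]
```

A band whose class means are well separated relative to the in-class spread yields a large DF; a band where the classes overlap yields a DF near 1.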


[Figure 3. Comparison of various data-dependent and static CS approaches. (a) ORL, mean of 10 runs. (b) ORL, variance of the results. (c) Yale, mean of 10 runs. (d) Yale, variance of the results.]

Step 7) Determine the DF value for band k:
  DF(k) = V_B(k) / V_W(k)   (10)
Step 8) Repeat steps 4-7 for all bands.
Step 9) Identify the bands with large DF values and generate a CS mask.
Step 10) Use the CS mask to generate feature vectors for the train and test sets.
Step 11) Classify the test images and compute the false (or true) recognition rate.

3. CLASSIFIER
Using the Euclidean distance (ED) as the similarity criterion has a disadvantage and fails in some cases. Consider two vectors X and Y:
  X = [x_1 x_2 ... x_n],  Y = [y_1 y_2 ... y_n]

The ED for these vectors is calculated as
  d(X, Y) = Σ_{i=1}^{n} (x_i − y_i)^2   (11)

Any pair of elements (x_i, y_i) with large values relative to the other elements dominates the summation in equation (11) and diminishes the role of the other elements. To solve this problem, a modified ED is proposed:
  d_w(X, Y) = Σ_{i=1}^{n} (x_i − y_i)^2 / m_i   (12)
where m_i is a weight that adjusts the effect of each pair of elements on the summation in equation (12). M is defined as the mean of all feature vectors of the train set:
  M = [m_1 m_2 ... m_n] = E(F)   (13)
  F = { F_c^s },  c = 1, 2, ..., C,  s = index of samples in the train set
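The weighted ED classifier can be sketched as below. The paper does not spell out the matching rule beyond the distance itself, so this sketch assumes nearest-neighbour classification; `classify_nn` is an illustrative name, not from the paper.

```python
import numpy as np

def weighted_ed(x, y, m):
    # eq. (12): each squared difference is divided by the weight m_i
    return float(np.sum((x - y) ** 2 / m))

def classify_nn(test_vec, train_vecs, train_labels):
    # nearest neighbour under the weighted ED; the weight vector is the
    # mean of the train feature vectors, as in eq. (13)
    m = train_vecs.mean(axis=0)
    dists = [weighted_ed(test_vec, t, m) for t in train_vecs]
    return train_labels[int(np.argmin(dists))]
```

Note that for real DCT features some m_i can be zero or negative; a practical implementation might use |m_i| or add a small epsilon, but the code above follows equation (12) as given.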

where F is the set of feature vectors of the train set, generated in step 10 of the algorithm. We call d_w the weighted ED, since it scales each squared difference of paired elements by a weight. Given that the DCT coefficients are distributed over a relatively wide range of real values, it is reasonable to expect the weighted ED to improve on the plain ED.

4. SIMULATION RESULTS
Two popular face databases, ORL (http://www.camorl.co.uk) and Yale (http://cvc.yale.edu), were used in our simulations. Table 1 shows the database specifications and simulation details. The train and test images were selected randomly. Results were obtained over 10 runs of this procedure with different train and test sets, and the mean and variance of the results are reported. A large mean true recognition rate, a small variance, and a small number of coefficients indicate a preferable method. Fig. 3 shows the true recognition rate versus the number of coefficients for the various approaches. The simulation results show that the database-dependent CS approaches are more efficient than the static CS approaches, and that mode 1 outperforms the other two modes, as discussed intuitively above. The results of using the simple ED and the weighted


[Figure 4. Comparison of simple and weighted ED classifiers: (a) ORL, mean of 10 runs. (b) ORL, variance of the results. (c) Yale, mean of 10 runs. (d) Yale, variance of the results.]

Table 1. Database and simulation details
  Database | number of classes (C) | number of samples in each class (S) | train | test
  ORL      | 40                    | 10                                  | 5     | 5
  Yale     | 15                    | 11                                  | 6     | 5

ED classifiers are compared in Fig. 4. The results on the ORL database are similar to each other, but there is a great improvement on the Yale database.

ACKNOWLEDGEMENT: This research has been supported by the Iranian Telecommunication Research Center, Tehran, Iran.

REFERENCES

[1] W. Zhao, R. Chellappa, P. J. Phillips, and A. Rosenfeld, "Face Recognition: A Literature Survey," ACM Computing Surveys, Vol. 35, pp. 399-458, 2003.
[2] R. Chellappa, C. L. Wilson, and S. Sirohey, "Human and Machine Recognition of Faces: A Survey," Proceedings of the IEEE, Vol. 83, No. 5, pp. 705-740, 1995.
[3] M. Turk and A. Pentland, "Eigenfaces for Recognition," Journal of Cognitive Neuroscience, Vol. 3, No. 1, pp. 71-86, 1991.
[4] Z. M. Hafed and M. D. Levine, "Face Recognition Using the Discrete Cosine Transform," International Journal of Computer Vision, Vol. 43, No. 3, pp. 167-188, 2001.
[5] D. Ramasubramanian and Y. V. Venkatesh, "Encoding and Recognition of Faces Based on the Human Visual Model and DCT," Pattern Recognition, Vol. 34, pp. 2447-2458, 2001.
[6] A. Pnevmatikakis and L. Polymenakos, "Comparison of Eigenface-Based Feature Vectors under Different Impairments," International Conference on Pattern Recognition (ICPR), Vol. 1, pp. 296-299, 2004.
[7] M. J. Er, W. Chen, and S. Wu, "High Speed Face Recognition Based on Discrete Cosine Transform and RBF Neural Networks," IEEE Transactions on Neural Networks, Vol. 16, No. 3, pp. 679-691, 2005.
[8] C. Sanderson and K. K. Paliwal, "Features for Robust Face-Based Identity Verification," Signal Processing, Vol. 83, pp. 931-940, 2003.
[9] X. Y. Jing and D. Zhang, "A Face and Palmprint Recognition Approach Based on Discriminant DCT Feature Extraction," IEEE Transactions on Systems, Man, and Cybernetics, Vol. 34, No. 6, pp. 2405-2415, 2004.
[10] P. N. Belhumeur, J. P. Hespanha, and D. J. Kriegman, "Eigenfaces vs. Fisherfaces: Recognition Using Class Specific Linear Projection," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 19, No. 7, pp. 711-720, 1997.
[11] D. Zhong and I. Defee, "Pattern Recognition in Compressed DCT Domain," Proceedings of the IEEE International Conference on Pattern Recognition (ICPR), pp. 2031-2034, 2004.
[12] N. Ahmed, T. Natarajan, and K. R. Rao, "Discrete Cosine Transform," IEEE Transactions on Computers, Vol. 23, No. 1, pp. 90-93, 1974.
[13] S. Zhao and R. R. Grigat, "Multiblock Fusion Scheme for Face Recognition," International Conference on Pattern Recognition (ICPR), Vol. 1, pp. 309-312, 2004.
[14] W. Chen, M. J. Er, and S. Wu, "PCA and LDA in DCT domain," Pattern Recognition Letters, Vol. 26, pp. 2474-2482, 2005.