Automatic Parameter Selection for Eigenfaces

Ronny Tjahyadi, Wanquan Liu and Svetha Venkatesh
School of Computing, Curtin University of Technology
GPO Box U1987, Perth WA 6845
Email: {tjahyadi, wanquan, svetha}@cs.curtin.edu.au

Abstract

In this paper, we investigate parameter selection issues for Eigenfaces, focusing on the selection of the eigenvectors and of the classification threshold. We propose a systematic approach to selecting the eigenvectors based on the relative errors of the eigenvalues. In addition, we design a method for selecting the classification threshold that effectively utilizes the information obtained from the training database. Experiments were conducted on the ORL and AMP face databases, with results indicating that the automatic eigenvector and threshold selection methods provide optimal recognition in terms of precision and recall rates. Furthermore, we show that the eigenvector selection method outperforms the energy and stretching dimension methods in terms of the number of selected eigenvectors and computation cost.

1. Introduction

Research into face recognition has flourished in recent years due to the increased need for surveillance and more secure systems, and has attracted a multidisciplinary research effort. Many approaches have been proposed to represent and recognize faces under various viewing conditions, and numerous algorithms have been developed, such as statistical-based, neural network and feature-based algorithms [4]. Eigenfaces is a popular face recognition method based on Principal Component Analysis (PCA) [16]. This approach is one of the preferred techniques in face recognition due to its simplicity, its ability to perform in real-time situations and its robustness under varying illumination. Many extensions of Eigenfaces have been proposed, such as Modular Eigenspaces [12], the Global Eigen approach [9] and Bayesian modeling [11]. There are several open issues in Eigenfaces. The first is eigenvector selection [16, 15, 17]. Selecting appropriate eigenvectors to create the eigenspace is crucial to the computational cost, threshold selection and performance of Eigenfaces. Although several approaches have been introduced to address this issue [17], there is still no systematic method for selecting the appropriate number of eigenvectors, and the existing heuristics do not guarantee that the high-dimensional space is well approximated by a low-dimensional subspace. The next issue is threshold selection [16, 3, 14]. Manual threshold selection based on ad hoc heuristics and Receiver Operating Characteristic (ROC) curves are the common approaches. However, such methods are not suitable for automatic face recognition systems as they require a manual step and are tedious in practice. In this paper, we address these basic issues in Eigenfaces, namely eigenvector and threshold selection. First, we develop a systematic approach for selecting the eigenvectors based on the relative errors of the eigenvalues. Second, we propose a technique to automatically select the threshold, which utilizes the information obtained from the training database. The performance of the proposed techniques is investigated using five datasets derived from the ORL and AMP databases. Finally, the performance of the eigenvector selection method is compared with other existing methods.

This paper is organized as follows. Section 2 gives a brief description of Eigenfaces and summarizes related work on eigenvector and threshold selection. Section 3 describes the proposed methods for selecting the eigenvectors and the classification threshold. Experimental results are presented in Section 4. Finally, Section 5 concludes the paper.

2. Preliminaries

2.1. Eigenfaces

The motivation for Eigenfaces came from the work of Sirovich and Kirby in 1987 [7]. Turk and Pentland [16] developed the PCA-based face recognition system known as Eigenfaces. PCA extracts the eigenvectors and eigenvalues from a covariance matrix constructed from the original training images. The first orthogonal dimension of this eigenspace captures the greatest amount of variance in the database, whilst its last dimension captures the least. The training images are then projected into the eigenspace, thus creating a lower dimensional representation. To test whether an image corresponds to images in the training database, the test image is projected into the eigenspace and the Euclidean distance between the test image and each training image is used as the basis for matching [16].
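The pipeline just described can be summarized in a short sketch. This is a minimal illustration of a standard PCA/eigenface implementation under our own naming and array conventions, not the authors' exact code; it assumes grayscale training images flattened into column vectors.

```python
import numpy as np

def train_eigenfaces(images):
    """Build an eigenspace from training images, each a 2-D grayscale array."""
    A = np.stack([img.ravel().astype(float) for img in images], axis=1)  # N^2 x M data matrix
    mean_face = A.mean(axis=1, keepdims=True)                            # average image
    Phi = A - mean_face                                                  # mean-centred images
    # Eigen-decompose the small M x M matrix Phi^T Phi rather than the N^2 x N^2 covariance.
    eigvals, V = np.linalg.eigh(Phi.T @ Phi)
    order = np.argsort(eigvals)[::-1]                                    # descending eigenvalues
    eigvals, V = eigvals[order], V[:, order]
    U = Phi @ V                                                          # eigenfaces, N^2 x M
    U /= (np.linalg.norm(U, axis=0, keepdims=True) + 1e-12)
    return mean_face, U, eigvals

def project(image, mean_face, U, m):
    """Project an image onto the first m eigenfaces."""
    return U[:, :m].T @ (image.ravel().astype(float)[:, None] - mean_face)

def nearest_match(test_image, train_projections, mean_face, U, m):
    """Index of the closest training image by Euclidean distance in the eigenspace."""
    w = project(test_image, mean_face, U, m)
    return int(np.argmin([np.linalg.norm(w - wk) for wk in train_projections]))
```

Classification against a threshold (Section 3.3) then decides whether the smallest distance is close enough for the test image to be accepted as a known face.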

2.2. Eigenvector Selection

The Eigenfaces method faces the problem of selecting the eigenvectors that best represent the subspace [16, 17, 15]. Several approaches have been proposed for selecting the eigenvectors; however, none of them adapts to the variance structure of a particular database. Five of these approaches are described below [17]:

1. Standard eigenspace projection: All eigenvectors corresponding to non-zero eigenvalues are used to create the eigenspace.

2. Removing the last 40% of the eigenvectors: Since the eigenvectors are sorted by descending eigenvalue, this method removes the last 40% of the eigenvectors, which contain the least amount of variance among the images [13].

3. Removing the first eigenvector: This method removes the first eigenvector, on the assumption that the information in this eigenvector is affected by lighting conditions and degrades classification [13].

4. Energy dimension: This method uses the minimum number of eigenvectors needed to guarantee that the energy (e) exceeds a threshold; a typical threshold is 0.9. The accumulated energy of the i-th eigenvector is the ratio of the sum of the first i eigenvalues over the sum of all the eigenvalues [6].

e_i = (Σ_{j=1}^{i} λ_j) / (Σ_{j=1}^{k} λ_j)    (1)

where k is the number of non-zero eigenvalues.

5. Stretching dimension: This method calculates the stretch (s) of an eigenvector. The stretch of the i-th eigenvector is the ratio of the i-th eigenvalue λ_i over the maximum eigenvalue λ_1 [6]. A common threshold for the stretching dimension is 0.01.

s_i = λ_i / λ_1    (2)

Removing unnecessary eigenvectors also reduces the computation cost of Eigenfaces. The first and third eigenvector selection methods do not remove the unnecessary eigenvectors that carry the least amount of variance among the images, and thus retain a high computation cost. More fundamentally, none of these methods adapts to the variance structure of a given database. A sketch of the energy and stretching criteria follows below.
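For reference, both criteria can be evaluated directly from the sorted eigenvalues. The following is a minimal sketch under the typical thresholds quoted above (0.9 and 0.01); the function names and the assumption that the eigenvalues are already sorted in descending order are ours.

```python
import numpy as np

def energy_dimension(eigvals, energy_threshold=0.9):
    """Smallest number of eigenvectors whose accumulated energy e_i (equation (1)) reaches the threshold."""
    eigvals = np.asarray(eigvals, dtype=float)
    energy = np.cumsum(eigvals) / eigvals.sum()
    return int(np.searchsorted(energy, energy_threshold) + 1)

def stretching_dimension(eigvals, stretch_threshold=0.01):
    """Number of eigenvectors whose stretch s_i = lambda_i / lambda_1 (equation (2)) exceeds the threshold."""
    eigvals = np.asarray(eigvals, dtype=float)
    return int(np.sum(eigvals / eigvals[0] > stretch_threshold))
```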

2.3. Threshold Selection

Selecting the optimum threshold for facial classification is crucial to the performance of the recognition system [16, 3, 14]. Each face database requires a unique threshold value. Ad hoc methods are commonly used for threshold selection; most of them are manual and require extra testing to validate the selected threshold. Another way to select the classification threshold is to use Receiver Operating Characteristic (ROC) curves [5, 10, 8]. An ROC graph illustrates the trade-off between True Positives (TP) and False Positives (FP), and the ROC curve is obtained using a subset of the training database. The point on the ROC curve where the TP value is high and the FP value is low is selected as the classification threshold. This threshold has been shown to perform well [10], but its main shortcoming is that it is found manually, and it is not suitable for a changing database, in particular when new images are added or old ones are removed.

3. Methodology

In this section, we present a systematic approach for eigenvector and threshold selection using the information obtained from the training database.

3.1. Eigenvector Selection

The motivation for selecting a subset of eigenvectors (M′) is to find a good low-dimensional subspace approximation of the high-dimensional space. As the change of the eigenvalues reflects the capacity of the subspace spanned by the corresponding eigenvectors, we expect to obtain an optimal subspace by analysing the changes in the eigenvalues. Here, we present a method which uses the relative errors of the eigenvalues to achieve an optimal subspace. The eigenvector selection algorithm is based on the relative errors obtained from the 60% of the eigenvalues containing the greatest amount of variance among the images [13]. As demonstrated by [13], 60% of the eigenvalues represent the whole eigenspace quite well. The relative error (R_k) is defined as

R_k = (λ_k − λ_{k+1}) / λ_k × 100%    (3)

where k = 1, . . . , N_l and N_l represents 60% of the eigenvalues. With these data, K-means clustering is then used to cluster {R_1, . . . , R_{N_l}} into two classes. The class that contains the relative errors corresponding to the larger eigenvalues is then chosen, and the index of the smallest eigenvalue in the chosen class is selected as the number of eigenvectors (M′) required to create the eigenspace. All these operations can be done automatically.
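A sketch of this selection rule is given below. It uses a small hand-rolled two-class K-means on the one-dimensional relative errors (any standard K-means implementation would serve equally well); the function name, the initialisation of the cluster centres and the tie-breaking are our own choices.

```python
import numpy as np

def select_num_eigenvectors(eigvals, fraction=0.6, n_iter=100):
    """Choose M' from the relative errors R_k of the leading eigenvalues (equation (3))."""
    eigvals = np.asarray(eigvals, dtype=float)
    Nl = int(fraction * len(eigvals))                                   # use 60% of the eigenvalues
    R = (eigvals[:Nl] - eigvals[1:Nl + 1]) / eigvals[:Nl] * 100.0       # relative errors R_1..R_Nl

    # Two-class K-means on the 1-D relative errors.
    centres = np.array([R.max(), R.min()])
    for _ in range(n_iter):
        labels = np.argmin(np.abs(R[:, None] - centres[None, :]), axis=1)
        new_centres = np.array([R[labels == c].mean() if np.any(labels == c) else centres[c]
                                for c in (0, 1)])
        if np.allclose(new_centres, centres):
            break
        centres = new_centres

    # The chosen class is the one containing R_1 (the relative error of the largest eigenvalue);
    # M' is the largest (1-based) index of a relative error falling in that class.
    chosen = labels[0]
    return int(np.max(np.nonzero(labels == chosen)[0]) + 1)
```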

3.2. Computation Cost Analysis

In this section, we investigate the computational advantage of using the proposed eigenvector selection approach. Let M denote the total number of images in the training data, where each Γ_i is N × N. First, the average image is computed as

Ψ = (1/M) Σ_{i=1}^{M} Γ_i    (4)

with dimension N² × 1. Hence, we construct the normalized images Φ_i = Γ_i − Ψ and A = [Φ_1, Φ_2, ..., Φ_M] with dimension N² × M. Let the covariance matrix be B = AᵀA.

The eigenspace (u_l) is defined as

u_l = Σ_{k=1}^{M} v_{lk} Φ_k,  l = 1, 2, ..., M    (5)

with dimension N² × M, where the v_{lk} are eigenvectors of B. The projection of an image w_k is

w_k = u_kᵀ (Γ − Ψ),  k = 1, 2, ..., M′    (6)

The computation cost reduction for an image corresponds to the projections

w_k = u_kᵀ (Γ − Ψ),  k = M′, M′ + 1, ..., M_2    (7)

that no longer need to be computed, where M_2 > M′ is the number of eigenvectors obtained in standard Eigenfaces and M′ is the number of eigenvectors obtained by the proposed approach. The computation cost reduction for a single image is therefore roughly of order O((M_2 − M′) × N²). Because N is usually very large, if M_2 − M′ is also large the cost reduction becomes significant, and it is carried over into large databases, threshold selection and the comparison process.
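As a quick sanity check of that order-of-magnitude claim, each dropped eigenvector saves one dot product of length N² per projected image. The numbers below are purely illustrative: M_2 = 61 and M′ = 17 are taken from Table 2, and the 112 × 92 image size is an assumption about the ORL data, not something stated in the paper.

```python
def projection_saving(m2, m_prime, n_squared):
    """Multiply-accumulate operations saved per image when projecting onto M' instead of M_2 eigenvectors."""
    return (m2 - m_prime) * n_squared

# Assumed 112 x 92 ORL image size; M_2 = 61 (energy dimension) vs M' = 17 (proposed), from Table 2.
print(projection_saving(61, 17, 112 * 92))   # 453376 fewer operations per projected image
```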

3.3. Threshold Selection

The proposed threshold for Eigenfaces is computed from intra- and inter-class information gathered from the training database. The intra class (D_i) is a set of distances between images of the same individual, computed as shown in Algorithm 1. This class gives an indication of how similar the images of the same individual are. The inter class (P_i) is a set of distances between the images of an individual and the images of all other individuals in the training database, also computed in Algorithm 1. This class indicates how different each image of an individual is from the images of other individuals in the database. The individual threshold values (θ_i) are then calculated from each individual's intra- and inter-class information, as described in Algorithm 2. The minimum of θ_i over all individuals is then defined as the classification threshold (δ) for the database. The training database requires a suitable number of images per individual in order to obtain sufficient distance information; a larger training dataset should result in a better estimate of the threshold. The proposed algorithm for threshold selection is as follows. Denote

I = {1, . . . , number of individuals}
K = {1, . . . , number of images per individual}

Algorithm 1: Finding an individual's intra and inter classes

Select the number of eigenvectors (M′) with the algorithm in Section 3.1.
for each image F_ik, where i ∈ I and k ∈ K, do
    Compute w_ik, the projection of the k-th image of the i-th individual into the eigenspace defined in (6), using the M′ eigenvectors.
    Compute the intra distances d_ik^{ik′} = ||w_ik − w_ik′||², where i ∈ I, k, k′ ∈ K and k ≠ k′,
    and the inter distances p_ik^{jl} = ||w_ik − w_jl||², where j ∈ I, j ≠ i and l ∈ K.
end
Get the intra class D_i = {d_ik^{ik′} | k ≠ k′, k, k′ ∈ K} and the inter class P_i = {p_ik^{jl} | j ∈ I, j ≠ i, k, l ∈ K}, and sort D_i and P_i in ascending order.
Compute D_i^max = max_{k,k′∈K} {d_ik^{ik′} | k ≠ k′}; D_i^max is a measure of generalisation among the images of individual i.
Compute P_i^min = min_{j∈I, j≠i} {p_ik^{jl} | k, l ∈ K}; P_i^min is a measure of the difference between individual i and all others.
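The sketch below follows Algorithm 1 under our own data layout: a dictionary mapping each individual to a dictionary of eigenspace projections (one per image), with squared Euclidean distances as in the definitions of d and p above. Names and structure are illustrative, not the authors' code.

```python
import numpy as np

def intra_inter_classes(projections):
    """projections[i][k]: eigenspace projection w_ik of image k of individual i.

    Returns, for every individual i, the sorted intra distances D_i, the sorted
    inter distances P_i, and the summary values D_i^max and P_i^min.
    """
    people = list(projections)
    summary = {}
    for i in people:
        D = [float(np.sum((projections[i][k] - projections[i][kp]) ** 2))
             for k in projections[i] for kp in projections[i] if k != kp]
        P = [float(np.sum((projections[i][k] - projections[j][l]) ** 2))
             for k in projections[i]
             for j in people if j != i
             for l in projections[j]]
        summary[i] = {"D": np.sort(D), "P": np.sort(P),
                      "D_max": max(D), "P_min": min(P)}
    return summary
```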

Algorithm 2: Finding the classification threshold

Denote

TP′_i = Q_i / ||D_i|| × 100%
FP′_i = L_i / ||P_i|| × 100%

where Q_i is the number of elements in D_i that are less than a given threshold and ||D_i|| is the number of elements in D_i; for example, if an individual i has s images, then ||D_i|| = (s − 1) × s. Similarly, L_i is the number of elements in P_i that are less than a given threshold and ||P_i|| is the number of elements in P_i; for example, with n individuals and s images per individual, ||P_i|| = (n − 1) × s².

Using D_i^max and P_i^min, the individual threshold (θ_i) is defined as follows. If D_i^max < P_i^min, then θ_i is set to the midpoint

θ_i = (D_i^max + P_i^min) / 2    (8)

If D_i^max > P_i^min, then false positives (FP′_i) will occur, so we need to find a value between P_i^min and D_i^max that maximises TP′_i and minimises FP′_i. As shown in Figure 1, at points on the ROC curve where the derivative is less than 1, small changes in TP′_i bring significant changes in FP′_i, whereas at points where the derivative is greater than 1, small changes in FP′_i bring significant changes in TP′_i. Therefore, a point on the ROC curve where the derivative equals 1 balances TP′_i and FP′_i. The ROC graph in this paper is based on TP′ and FP′ rather than on the True Positive (TP) and False Positive (FP) rates as defined in the existing literature.

Figure 1. ROC graph

The number of elements in the intra class depends on the number of images per individual; the more images per individual, the better the approximation of the individual threshold. However, a greater number of images also incurs a higher computational cost: trying every value in D_i as a candidate threshold costs O(||D_i||). To overcome this, only a subset of D_i is used to evaluate the TP′_i values in this paper.

We have obtained D_i = {d_{i1}, d_{i2}, . . . , d_{i||D_i||}} with d_{i1} ≤ d_{i2} ≤ . . . ≤ d_{i||D_i||}. Our aim is to find indices x_1, x_2, . . . , x_10 such that D_i can be grouped into D_i = {D^i_{x_1}, D^i_{x_2}, . . . , D^i_{x_10}}, where

D^i_{x_j} = {d_{i,x_{j−1}+1}, . . . , d_{i,x_j}} for j = 1, 2, . . . , 10, with x_0 = 0 and x_10 = ||D_i||. Further, x_j = [x̄_j], where x̄_j satisfies (x̄_j / ||D_i||) × 100 = j × 10 and [x̄_j] denotes rounding x̄_j to the nearest integer. We can then take the candidate thresholds (γ^i_j) = {d_{i,x_1}, d_{i,x_2}, . . . , d_{i,x_10}} and compute TP′_i and FP′_i for each element of (γ^i_j). The derivative (T_ij) is calculated as

T_ij = (ΔTP′_ij / ΔFP′_ij) × (max(FP′_i) / 100%)    (9)

where max(FP′_i) is the maximum value of FP′_i; ΔTP′_ij = TP′_{i(j+1)} − TP′_ij for i ∈ I and j = 1, 2, . . . , 9; ΔFP′_ij = FP′_{i(j+1)} − FP′_ij for i ∈ I and j = 1, 2, . . . , 9; and TP′_ij and FP′_ij are the TP′_i and FP′_i values obtained when the threshold γ^i_j is chosen, j = 1, 2, . . . , 10. If ΔFP′_ij = 0 we define T_ij = 0. From the equation above, one can see that T_ij is decreasing in j = 1, 2, . . . , 9 for each i ∈ I. We therefore only need to find a j_0 ∈ {1, 2, . . . , 9} such that T_{i,j_0−1} > 1 and T_{i,j_0} < 1; θ_i is then chosen as the value γ^i_{j_0−1}. If none of the T_ij values is less than 1, θ_i is set to γ^i_10. Once θ_i has been calculated for every individual in the training database, the classification threshold (δ) is set to min_i(θ_i).
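Putting Algorithm 2 together with the Algorithm 1 sketch above gives the following outline. The ten-way quantile split of D_i and the handling of the boundary case follow our reading of the description; treat the index arithmetic as an interpretation rather than the authors' exact procedure.

```python
import numpy as np

def individual_threshold(D, P, D_max, P_min):
    """Individual threshold theta_i from sorted intra distances D and inter distances P."""
    if D_max < P_min:                                        # equation (8): classes do not overlap
        return (D_max + P_min) / 2.0

    # Overlap: evaluate TP' and FP' at ten candidate thresholds, the 10%..100% quantiles of D.
    idx = [max(int(round(j * len(D) / 10.0)) - 1, 0) for j in range(1, 11)]
    gammas = [float(D[x]) for x in idx]
    TP = np.array([np.sum(D < g) / len(D) * 100.0 for g in gammas])
    FP = np.array([np.sum(P < g) / len(P) * 100.0 for g in gammas])

    # Derivative T_ij of equation (9); stop at the first point where it drops below 1.
    for j in range(9):                                       # 0-based j corresponds to the paper's j = 1..9
        dTP, dFP = TP[j + 1] - TP[j], FP[j + 1] - FP[j]
        T = 0.0 if dFP == 0 else (dTP / dFP) * (FP.max() / 100.0)
        if T < 1.0:
            return gammas[max(j - 1, 0)]                     # theta_i = gamma_{j0 - 1}
    return gammas[-1]                                        # no T_ij below 1: use gamma_10

def classification_threshold(summary):
    """Database threshold delta = min over individuals of theta_i (Algorithm 1 output as input)."""
    return min(individual_threshold(s["D"], s["P"], s["D_max"], s["P_min"])
               for s in summary.values())
```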

4. Experimental Results

4.1. Face Databases

Experiments were carried out on five datasets created from the ORL [2] and AMP [1] face databases, as shown in Table 1. The ORL face database consists of 400 images of 40 individuals with 10 images each. All the images were taken against a dark homogeneous background under various lighting conditions, facial expressions (open/closed eyes, smiling/not smiling) and facial details (glasses/no glasses). The faces are in a frontal upright position with some head tilting, rotation and scaling. The AMP database contains 975 images of 13 individuals with 75 images each. The images were captured over a range of facial expressions, including smiling, frowning, stern, laughing, shocked and neutral. The query effectiveness is evaluated using precision and recall statistics.

Face Database | Dataset | # of persons | # of images/person | # of training images | # of testing images
ORL | 1 | 36 | 5 | 180 | 220
ORL | 2 | 36 | 5 | 180 | 220
ORL | 3 | 30 | 5 | 150 | 250
AMP | 1 | 11 | 15 | 164 | 810
AMP | 2 | 11 | 10 | 110 | 865

Table 1. Five datasets created from the ORL and AMP face databases

4.2. Eigenvector Selection

The eigenvector selection is based on the relative errors of the eigenvalues. Figure 2 shows the relative errors of the eigenvalues for the first ORL dataset. As illustrated in Figure 2, class 1 consists of the relative errors corresponding to the larger eigenvalues. The index of the smallest eigenvalue in this class is 17, which is then set as the number of eigenvectors (M′). The performance of the selected M′ value with the proposed threshold method is shown in Figure 3. The precision and recall for this M′ value are 85.24% and 86.40% respectively. For comparison, we implemented the Energy and Stretching Dimension methods [17] to select M′; the thresholds for these methods were selected using the method in Section 3.3. The comparative results are shown in Table 2. The Energy Dimension achieves 79.64% precision and 89.87% recall, while the Stretching Dimension achieves 86.49% precision and 85.91% recall. These rates are comparable to the proposed algorithm; however, the proposed algorithm uses far fewer eigenvectors and has a much lower computation cost for comparable classification results. In simulation, the first ORL dataset requires 29.22 seconds with the proposed algorithm, whilst it requires 65.99 seconds for the Energy Dimension and 72.03 seconds for the Stretching Dimension. The simulation was conducted on an AMD Athlon 1.7 GHz with 512 MB DDR RAM running Windows XP and Matlab version 6.5; the time measured is the total time required for training and testing each dataset.

Figure 2. The relative errors classification for the first ORL dataset (relative error (%) plotted against eigenvalue index, showing class 1 and class 2)

Figure 3. The accuracy (precision and recall, %) for various numbers of eigenvectors on the first ORL dataset

4.3. Threshold Selection

The threshold value varies depending on the number of eigenvectors chosen to create the eigenspace. Prior to the threshold selection, the eigenvector selection is performed to find the suitable number of eigenvectors (M′).

Method | M′ | Precision (%) | Recall (%) | Time (secs)
Proposed Algorithm | 17 | 85.24 | 86.40 | 29.22
Energy Dimension | 61 | 79.64 | 89.87 | 65.99
Stretching Dimension | 70 | 86.49 | 85.91 | 72.03

Table 2. Comparison with other eigenvector selection methods on the first ORL dataset

The appropriate M′ for the first ORL dataset was selected as 17. Figure 4 shows the individual threshold (θ_i) values for the first ORL dataset derived using our threshold selection criteria. The classification threshold (δ) was chosen as the minimum of the θ_i values, which was 1.68. As shown in Figure 5, the precision and recall rates of the first ORL dataset vary under different threshold values; the selected threshold gives recognition with precision and recall rates of 85.24% and 86.40% respectively. The threshold selection performs well, as the precision and recall rates are balanced.

Figure 4. The individual threshold (θ_i) values for the first ORL dataset

Figure 5. The performance (precision and recall, %) under various threshold values for the first ORL dataset

4.4. Evaluation of the Threshold and Eigenvector Selection on Other Datasets

The eigenvector and threshold selection achieved optimal recognition on the first ORL dataset. In this section, we evaluate the performance of the eigenvector and threshold selection

with four different datasets. The performance results for these datasets are shown in Table 3. The precision and recall for the second ORL dataset are 92.41% and 83.91% respectively. The third ORL dataset yields 76.47% precision and 79.76% recall. The first AMP dataset is near optimal, with 99.85% precision and 100% recall, while the second AMP dataset achieves 100% in both precision and recall. These results show that our eigenvector and threshold selection methods perform well with few eigenvectors. The results of the energy and stretching dimension methods for selecting the eigenvectors are shown in Table 4. Both methods achieve precision and recall rates comparable to the proposed eigenvector selection method. In contrast, the proposed method removes unnecessary eigenvectors when creating the eigenspace and is thus less computationally expensive than the other eigenvector selection methods.

Dataset | M′ | Threshold | Precision (%) | Recall (%) | Time (secs)
ORL 2 | 12 | 1.77 | 92.41 | 83.91 | 27.18
ORL 3 | 10 | 2.17 | 76.47 | 79.76 | 34.66
AMP 1 | 15 | 0.59 | 99.85 | 100 | 43.98
AMP 2 | 13 | 0.44 | 100 | 100 | 35.33

Table 3. The performance with four different datasets

Dataset | Method | M′ | Precision (%) | Recall (%) | Time (secs)
ORL 2 | Energy Dimension | 71 | 95.48 | 83.15 | 80.79
ORL 2 | Stretching Dimension | 74 | 95.48 | 83.15 | 83.29
ORL 3 | Energy Dimension | 66 | 75.84 | 81.82 | 80.17
ORL 3 | Stretching Dimension | 66 | 75.84 | 81.82 | 80.40
AMP 1 | Energy Dimension | 22 | 99.85 | 100 | 71.02
AMP 1 | Stretching Dimension | 33 | 99.85 | 100 | 88.31
AMP 2 | Energy Dimension | 20 | 100 | 100 | 51.63
AMP 2 | Stretching Dimension | 33 | 98.62 | 100 | 88.31

Table 4. The results of other eigenvector selection methods on the other datasets

5. Conclusions

This paper has investigated the Eigenfaces algorithm. Specifically, we studied the eigenvector selection issue and introduced an alternative approach for selecting the number of eigenvectors. In addition, we proposed a method for selecting the classification threshold using the distance information obtained from the training database. Experiments were conducted on the ORL and AMP face databases. The results show that the automatic eigenvector and threshold selection methods provide optimal recognition performance. In addition, the eigenvector selection method has been shown to outperform the energy and stretching dimension methods in terms of computational cost and the number of selected eigenvectors, while achieving comparable precision and recall rates.

References

[1] Advanced Multimedia Processing (AMP face database). [Online] http://amp.ece.cmu.edu.
[2] AT&T Laboratories Cambridge (ORL face database). [Online] http://www.cam-rol.co.uk/facedatabase.html.
[3] T. E. Campos, R. S. Feris, and R. M. J. Cesar. Eigenfaces versus eigeneyes: First steps toward performance assessment of representations for face recognition. Lecture Notes in Artificial Intelligence, 1793:197–206, 2000.
[4] R. Chellappa, C. L. Wilson, and S. Sirohey. Human and machine recognition of faces: A survey. Proceedings of the IEEE, 83(5):705–740, 1995.
[5] Z. M. Hafed and M. D. Levine. Face recognition using the discrete cosine transform. International Journal of Computer Vision, 43(3):167–188, 2001.
[6] M. Kirby. Data Analysis: An Empirical Approach to Dimensionality Reduction and the Study of Patterns. Wiley, New York, 2000.
[7] M. Kirby and L. Sirovich. Application of the Karhunen-Loève procedure for the characterization of human faces. IEEE Transactions on Pattern Analysis and Machine Intelligence, 12(1):103–108, 1990.
[8] J. Kittler, Y. P. Li, and J. Matas. On matching scores for LDA-based face verification. In Proceedings of the British Machine Vision Conference (BMVC 2000), 2000.
[9] L. Lorente and L. Torres. A global eigen approach for face recognition. In International Workshop on Very Low Bit-rate Video Coding, Urbana, Illinois, USA, 1998.
[10] G. L. Marcialis and F. Roli. Fusion of LDA and PCA for face verification. In Biometric Authentication, volume LNCS 2359, pages 30–37, 2002.
[11] B. Moghaddam, T. Jebara, and A. Pentland. Bayesian modeling of facial similarity. In Advances in Neural Information Processing Systems (NIPS'98), pages 910–916, 1998.
[12] B. Moghaddam and A. Pentland. Face recognition using view-based and modular eigenspaces. In Automatic Systems for the Identification and Inspection of Humans, volume 2277. SPIE, 1994.
[13] H. J. Moon and P. J. Phillips. Analysis of PCA-based face recognition algorithms. In EEMCV98, 1998.
[14] G. Shakhnarovich, J. W. Fisher, and T. Darrell. Face recognition from long-term observations. In Proceedings of the European Conference on Computer Vision, volume 3, pages 851–868, 2002.
[15] M. Turk. A random walk through eigenspace. IEICE Transactions on Information and Systems, E84-D(12):1586–1595, 2001.
[16] M. Turk and A. Pentland. Eigenfaces for recognition. Journal of Cognitive Neuroscience, 3(1):71–86, 1991.
[17] W. S. Yambor. Analysis of PCA-based and Fisher discriminant-based image recognition algorithms. Technical Report CS-00-013, Computer Science Department, Colorado State University, Fort Collins, 2000.