Extended feed forward neural networks with random weights for face recognition

Jing Lu, Jianwei Zhao, Feilong Cao∗

Department of Information Sciences and Mathematics, China Jiliang University, Hangzhou 310018, Zhejiang Province, China

Abstract

Face recognition has long been an active topic in pattern recognition and computer vision. In most recognition pipelines, images or extracted features are converted into vectors before classification. This vectorization tends to distort the correlation among neighbouring elements of the image matrix. This paper designs a classifier, called the two-dimensional neural network with random weights (2D-NNRW), that takes matrix data directly as input and preserves the structure of the image matrix. Specifically, the proposed classifier employs left and right projecting vectors in place of the usual high-dimensional input weight in the hidden layer, which keeps the correlation among elements, and adopts the idea of the neural network with random weights (NNRW) to learn all the parameters. Experiments on several well-known databases show that 2D-NNRW captures the structural character of face images and performs well for face recognition.

Keywords: Face recognition; Classifier; Neural network with random weights (NNRW); Matrix data.

∗ Corresponding author. [email protected]

Preprint submitted to Neurocomputing, December 23, 2013

1. Introduction

Strong adaptability, high security, and non-contact interaction give face recognition great potential in applications such as public security, intelligent access control, and criminal investigation. Face recognition has therefore become an increasingly active topic in pattern recognition and computer vision.

A traditional face recognition system generally contains four steps: face detection, image preprocessing, feature extraction, and classification, of which feature extraction and classification are the core. After decades of development, many effective feature extraction methods and classifiers exist for automatic face recognition [1]. Classical feature extraction methods include eigenfaces [2], fisherfaces [3], independent component analysis (ICA) [4], laplacianfaces [5], and kernel methods [6, 7]. Popular classifiers include the nearest neighbor (NN) classifier [8, 9], the support vector machine (SVM) [10, 11], and the feed-forward neural network (FNN) [12].

These feature extraction and classification methods operate only on vector input. That is, before they can be applied to face recognition, the matrix data of a face image must be converted into a row or column vector. This conversion usually destroys the relationships among elements of the original matrix, which may affect the extracted features and the subsequent classification results. Recently, researchers have proposed two-dimensional feature extraction methods that operate on the matrix data directly, e.g. two-dimensional principal component analysis (2DPCA) [13, 14] and two-dimensional linear discriminant analysis (2DLDA) [15, 16, 17]. These methods have been shown to extract information from neighbouring elements effectively while reducing the computational complexity of the extraction.
On the other hand, existing classifiers such as SVM and FNN require the extracted feature matrices to be converted into column vectors, which usually destroys the neighbouring information of the feature matrix and lowers the recognition rate. The NN classifier can classify feature matrices directly, because the distance it uses is the same whether the data are arranged as a matrix or as a vector, but its structure is so simple that it usually cannot achieve the expected recognition rate. It is therefore meaningful to study classifiers designed for matrix data, i.e. 2D input.

To classify matrix data directly and preserve the matrix (2D) feature structure, we propose a novel classifier, named the two-dimensional neural network with random weights (2D-NNRW), and apply it to face recognition. To construct the classifier, we employ a special kind of feed-forward network first introduced in [18], the neural network with random weights (NNRW). These networks learn quickly because their input weights and biases are chosen randomly, yet they can still achieve good classification performance [18, 19, 20, 21, 22, 23, 24]. The proposed model employs a left projecting vector and a right projecting vector to regress each matrix datum to its label, which is inspired by a recent matrix-input classifier named multiple rank regression (MRR) [25], and uses the random idea to train the weights. As a result, the proposed classifier can achieve higher accuracy than some vector-based regression methods, and it trains quickly because the projecting vectors and biases of the hidden layer are chosen randomly.

The rest of this paper is organized as follows. In Section 2, we propose the new classifier 2D-NNRW, which builds on NNRW, together with its algorithm for maintaining the structural properties of matrix data. In Section 3, we first report recognition rates on several well-known face image databases to verify the effectiveness of 2D-NNRW, then analyze the nature of the proposed model, and finally combine 2D-NNRW with a 2D feature extraction method to obtain a new face recognition tool and compare it with other methods. Conclusions are given in Section 4.

2. The Proposed Method

2.1. The Notations

First, we list some important notations in Table 1. Other notations and their meanings are explained when they are first used.

Table 1: Notations

Notation                               Meaning
$m$                                    The first dimensionality of matrix data
$n$                                    The second dimensionality of matrix data
$N$                                    Number of training data
$c$                                    Number of classes
$y_i \in \mathbb{R}^d$                 The $i$-th training vector data
$X_i \in \mathbb{R}^{m \times n}$      The $i$-th training matrix data
$t_i \in \mathbb{R}^c$                 The label vector of $X_i$ or $y_i$
$\phi : \mathbb{R} \to \mathbb{R}$     The activation function
$L$                                    The number of hidden nodes
$b_j \in \mathbb{R}$                   The bias of the $j$-th hidden node
$u_j \in \mathbb{R}^m$                 The left projecting vector on the $j$-th hidden node
$v_j \in \mathbb{R}^n$                 The right projecting vector on the $j$-th hidden node

2.2. A Brief Review of NNRW

It is well known that an FNN with a single hidden layer can be mathematically modeled as

\[
f_L(y) = \sum_{i=1}^{L} \beta_i \, \phi\left(w_i^\top y + b_i\right), \tag{1}
\]

where $L$ is the number of hidden nodes, $\phi : \mathbb{R} \to \mathbb{R}$ is the activation function, $y = (y^1, y^2, \ldots, y^d)^\top \in \mathbb{R}^d$ is the input vector, $w_i = (w_{i1}, w_{i2}, \ldots, w_{id})^\top \in \mathbb{R}^d$ is the input weight connecting the $i$-th hidden node to the input, $b_i \in \mathbb{R}$ is the bias of the $i$-th hidden node, and $\beta_i$ is the output weight, $i = 1, \ldots, L$.

In conventional FNN theory, the hidden layer parameters $w_i$, $b_i$ and the output weights $\beta_i$ ($i = 1, \ldots, L$) must all be freely adjustable. In supervised learning, the hidden layer parameters and the output weights are trained and tuned for the given set of training samples. One well-known algorithm for training the weights and biases is error back-propagation (BP), which uses gradient descent to adjust all the weights and biases. However, BP generally converges slowly because it is iterative, and it easily falls into local minima.

A fast learning algorithm for FNNs with a single hidden layer, called the neural network with random weights (NNRW), was first proposed in [18] and developed further in [19], [20], [21], [22], [23], and [24]. Its main idea is as follows: for a given set of training samples, choose the input weights and biases randomly, i.e. treat each element of the input weights and biases as a random variable; the output weights can then be calculated using the Moore-Penrose generalized inverse. Specifically, the output weights can be expressed as follows (see [21]):

\[
\hat{\beta} = \arg\min_{\beta} \sum_{i=1}^{N} \left\| \sum_{j=1}^{L} \beta_j \, \phi\left(w_j^{*\top} y_i + b_j^{*}\right) - t_i \right\|_2^2,
\]

where $w_j^{*}$ and $b_j^{*}$ are the randomly generated input weights and biases, $j = 1, 2, \ldots, L$, $\beta = [\beta_1, \beta_2, \ldots, \beta_L]^\top$, and $\{(y_i, t_i)\}_{i=1}^{N}$ is the set of training samples and their corresponding true outputs. Rewriting the above model in matrix form, we have

\[
\hat{\beta} = \arg\min_{\beta} \left\| H\beta - T \right\|_F^2,
\]

where

\[
H = \begin{bmatrix}
\phi(w_1^{*\top} y_1 + b_1^{*}) & \cdots & \phi(w_L^{*\top} y_1 + b_L^{*}) \\
\vdots & \cdots & \vdots \\
\phi(w_1^{*\top} y_N + b_1^{*}) & \cdots & \phi(w_L^{*\top} y_N + b_L^{*})
\end{bmatrix}
\]

is the hidden output matrix, and

\[
T = \begin{bmatrix} t_1^\top \\ \vdots \\ t_N^\top \end{bmatrix}.
\]

Then $\hat{\beta} = H^{\dagger} T$, where $H^{\dagger}$ is the Moore-Penrose generalized inverse of $H$.

Remark 2.1 For classification with NNRW, the target value of each sample is a label vector. That is, if there are $c$ classes, we usually set $t_i = (0, \ldots, 0, 1, 0, \ldots, 0) \in \mathbb{R}^c$, where the 1 appears only in the position of the class to which the $i$-th sample belongs.

Remark 2.2 In recognition, for a new input $z \in \mathbb{R}^d$, the output of the learned NNRW is computed as

\[
f_L(z) = \sum_{i=1}^{L} \hat{\beta}_i \, \phi\left(w_i^{*\top} z + b_i^{*}\right) \in \mathbb{R}^c.
\]

The class of $z$ is then given by the position of the maximum component of $f_L(z)$.
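The NNRW procedure reviewed above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation: the sigmoid activation, the uniform distribution on [-1, 1] for the random weights and biases, and the names `train_nnrw` and `predict_nnrw` are assumptions made for the example, and the data are synthetic.

```python
import numpy as np


def sigmoid(a):
    # Assumed activation function; the text only requires phi: R -> R.
    return 1.0 / (1.0 + np.exp(-a))


def train_nnrw(Y, labels, n_classes, n_hidden, rng):
    """Fit output weights for fixed random hidden parameters.

    Y: (N, d) array of input vectors; labels: (N,) integer class indices.
    """
    d = Y.shape[1]
    W = rng.uniform(-1.0, 1.0, size=(d, n_hidden))  # random input weights w_j*
    b = rng.uniform(-1.0, 1.0, size=n_hidden)       # random biases b_j*
    H = sigmoid(Y @ W + b)                          # hidden output matrix, (N, L)
    T = np.eye(n_classes)[labels]                   # one-hot label vectors t_i
    beta = np.linalg.pinv(H) @ T                    # beta_hat = H^dagger T
    return W, b, beta


def predict_nnrw(model, Z):
    """Return the index of the largest output component for each row of Z."""
    W, b, beta = model
    return np.argmax(sigmoid(Z @ W + b) @ beta, axis=1)


# Synthetic two-class data (hypothetical, for illustration only).
rng = np.random.default_rng(0)
labels = np.repeat([0, 1], 15)
Y = rng.standard_normal((30, 8)) + 2.0 * labels[:, None]
model = train_nnrw(Y, labels, n_classes=2, n_hidden=60, rng=rng)
train_acc = np.mean(predict_nnrw(model, Y) == labels)
print(f"training accuracy: {train_acc:.2f}")
```

Only the output weights are fitted here; the hidden parameters are drawn once and never updated, which is what removes the iterative BP step.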

2.3. The Proposed 2D-NNRW

Traditional face recognition methods often concatenate each image into a row or column vector and then apply a vector-based classifier to those vectors. With the development of image processing techniques, however, researchers have extended feature extraction methods from 1D (vector) to 2D (matrix), for example 2DPCA and 2DLDA. These feature extraction methods preserve the structural character of the elements in the original images, so they are more suitable for image classification than other methods [26]. The remaining problem is how to classify these matrix-based features. To preserve the structural information of the extracted features, it is meaningful to design a classifier for matrix input.

For a given set of matrix features $\{(X_i, t_i)\}_{i=1}^{N}$, where $X_i \in \mathbb{R}^{m \times n}$ and $t_i \in \mathbb{R}^c$, we construct the following 2D-FNN with a single hidden layer as an approximator:

\[
f_L(X) = \sum_{j=1}^{L} \beta_j \, \phi\left(u_j^\top X v_j + b_j\right), \tag{2}
\]

where $X \in \mathbb{R}^{m \times n}$, $u_j \in \mathbb{R}^m$, $v_j \in \mathbb{R}^n$, $b_j \in \mathbb{R}$, and $\beta_j \in \mathbb{R}^c$, $j = 1, 2, \ldots, L$.

Remark 2.3 Let $x = \mathrm{Vec}(X)$ and $w_j = \mathrm{Vec}(u_j v_j^\top)$, where $\mathrm{Vec}(\cdot)$ stacks an $m \times n$ matrix into a column vector with $mn$ elements. Then

\[
u_j^\top X v_j = \mathrm{Tr}(u_j^\top X v_j) = \mathrm{Tr}(X v_j u_j^\top) = \mathrm{Tr}\left((u_j v_j^\top)^\top X\right) = \left(\mathrm{Vec}(u_j v_j^\top)\right)^\top \mathrm{Vec}(X) = w_j^\top x.
\]

So the 2D-FNN is equivalent to an FNN as in (1) with $mn$ inputs.

Remark 2.4 Although the 2D-FNN is equivalent to an FNN, it does not need to convert a matrix into a column vector. It therefore preserves the structural information among the elements of the matrix, which is very important for the subsequent classification.

Remark 2.5 For the same image $X$, the 2D-FNN and the FNN have different numbers of parameters to be calculated: only $(m + n + 1 + c)L$ for the 2D-FNN, compared with $(mn + 1 + c)L$ for the FNN.
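The identity in Remark 2.3 and the parameter counts in Remark 2.5 are easy to check numerically. The sketch below uses arbitrary small sizes for the identity; for the counts it plugs in, purely for illustration, a 192 × 168 image with c = 38 classes and L = 500 hidden nodes. The helper names `n_params_2d` and `n_params_1d` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 4, 3
X = rng.standard_normal((m, n))
u = rng.uniform(-1.0, 1.0, size=m)
v = rng.uniform(-1.0, 1.0, size=n)

bilinear = u @ X @ v                          # u^T X v
w = np.outer(u, v).reshape(-1, order="F")     # Vec(u v^T), columns stacked
x = X.reshape(-1, order="F")                  # Vec(X), columns stacked
trace_form = np.trace(np.outer(u, v).T @ X)   # Tr((u v^T)^T X)


def n_params_2d(m, n, c, L):
    """Parameter count of the 2D-FNN in Remark 2.5."""
    return (m + n + 1 + c) * L


def n_params_1d(m, n, c, L):
    """Parameter count of the equivalent vector-input FNN in Remark 2.5."""
    return (m * n + 1 + c) * L


print(np.isclose(bilinear, w @ x), np.isclose(bilinear, trace_form))
print(n_params_2d(192, 168, 38, 500), n_params_1d(192, 168, 38, 500))
```

The equivalent weight `w` has mn entries but only m + n degrees of freedom, since it is the vectorization of a rank-1 matrix.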

Remark 2.6 When $X$ is an $m \times 1$ matrix, $u_j$ is an $m \times 1$ column vector and $v_j$ is a $1 \times 1$ vector, i.e. a number, so the 2D-FNN reduces to the FNN given in (1).

Now we design a learning algorithm to determine all the weights and biases. Inspired by the idea of NNRW [18], we choose the left and right projecting vectors $u_j^{*}$ and $v_j^{*}$ and the bias $b_j^{*}$ randomly, $j = 1, 2, \ldots, L$. That is, each element of the projecting vectors and biases obeys some distribution on $(0, 1)$. The remaining problem is how to determine the output weights $\beta_j$, $j = 1, 2, \ldots, L$. After the projecting vectors and biases are chosen, the interpolation problem becomes a system of linear equations:

\[
G\beta = T + \varepsilon,
\]

where

\[
G = \begin{bmatrix}
\phi(u_1^{*\top} X_1 v_1^{*} + b_1^{*}) & \cdots & \phi(u_L^{*\top} X_1 v_L^{*} + b_L^{*}) \\
\vdots & \cdots & \vdots \\
\phi(u_1^{*\top} X_N v_1^{*} + b_1^{*}) & \cdots & \phi(u_L^{*\top} X_N v_L^{*} + b_L^{*})
\end{bmatrix} \tag{3}
\]

is the hidden output matrix, $\varepsilon$ is the error of the system,

\[
\beta = \begin{bmatrix} \beta_1^\top \\ \vdots \\ \beta_L^\top \end{bmatrix},
\quad \text{and} \quad
T = \begin{bmatrix} t_1^\top \\ \vdots \\ t_N^\top \end{bmatrix}.
\]

So the output weights can be obtained from the following optimization problem:

\[
\hat{\beta} = \arg\min_{\beta} \sum_{i=1}^{N} \left\| \sum_{j=1}^{L} \beta_j \, \phi\left(u_j^{*\top} X_i v_j^{*} + b_j^{*}\right) - t_i \right\|_2^2.
\]

By the least squares method, $\hat{\beta} = G^{\dagger} T$, where $G^{\dagger}$ is the Moore-Penrose generalized inverse of $G$.

We call the above algorithm the two-dimensional neural network with random weights (2D-NNRW), where 2D indicates that it is designed for matrix-input data. The concrete procedure is given in Algorithm 2.7.

Algorithm 2.7: 2D-NNRW
Input: A set of sample data $\{(X_i, t_i) \mid X_i \in \mathbb{R}^{m \times n}, t_i \in \mathbb{R}^c, i = 1, \ldots, N\}$, the number $L$ of hidden nodes, and the activation function $\phi$.
Step 1. Randomly generate the left projecting vectors $u_j^{*}$, right projecting vectors $v_j^{*}$ and biases $b_j^{*}$, $j = 1, 2, \ldots, L$.
Step 2. Compute the hidden output matrix $G$ as in (3), whose element in position $(i, j)$ is $\phi(u_j^{*\top} X_i v_j^{*} + b_j^{*})$.
Step 3. Calculate the output weights $\hat{\beta} = G^{\dagger} T$.
Output: The determined network $f_L(X) = \sum_{j=1}^{L} \hat{\beta}_j \, \phi\left(u_j^{*\top} X v_j^{*} + b_j^{*}\right)$.

Remark 2.8 NNRW generates $mn$-dimensional random vectors $w_j$, $j = 1, 2, \ldots, L$, to form arbitrary linear combinations of the elements of an $m \times n$ image $X_i$. In contrast, 2D-NNRW generates the $m$-dimensional row vectors $u_j^\top$ and the $n$-dimensional column vectors $v_j$: it first forms a linear combination of the rows of $X_i$, and then a linear combination of the entries of the resulting row vector. Thus, to some extent, it preserves the structure of the original matrix data. Its benefit for classification is shown in the experiments of Section 3.

Remark 2.9 Since the input weights and biases are generated randomly, a network learned in a single run is unstable [22]. So we usually take the average over $p$ runs as the final network.
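The three steps of Algorithm 2.7 can be sketched as follows. This is an illustrative NumPy version under stated assumptions, not the authors' code: the sigmoid activation, the uniform distribution on [-1, 1] for the random projecting vectors and biases, the function names, and the synthetic data are all choices made for the example.

```python
import numpy as np


def sigmoid(a):
    # Assumed activation function phi.
    return 1.0 / (1.0 + np.exp(-a))


def hidden_matrix(X, U, V, b):
    """G[i, j] = phi(u_j^T X_i v_j + b_j) for X of shape (N, m, n)."""
    pre = np.einsum("mj,imn,nj->ij", U, X, V) + b
    return sigmoid(pre)


def train_2d_nnrw(X, labels, n_classes, n_hidden, rng):
    _, m, n = X.shape
    # Step 1: random left/right projecting vectors (columns of U, V) and biases.
    U = rng.uniform(-1.0, 1.0, size=(m, n_hidden))
    V = rng.uniform(-1.0, 1.0, size=(n, n_hidden))
    b = rng.uniform(-1.0, 1.0, size=n_hidden)
    # Step 2: hidden output matrix G as in (3).
    G = hidden_matrix(X, U, V, b)
    # Step 3: output weights via the Moore-Penrose generalized inverse.
    T = np.eye(n_classes)[labels]
    beta = np.linalg.pinv(G) @ T
    return U, V, b, beta


def predict_2d_nnrw(model, X):
    """Class index = position of the maximum component of f_L(X)."""
    U, V, b, beta = model
    return np.argmax(hidden_matrix(X, U, V, b) @ beta, axis=1)


# Synthetic matrix data: three hypothetical class patterns plus noise.
rng = np.random.default_rng(0)
n_classes, per_class, m, n = 3, 8, 6, 5
patterns = rng.standard_normal((n_classes, m, n))
labels = np.repeat(np.arange(n_classes), per_class)
X = patterns[labels] + 0.3 * rng.standard_normal((labels.size, m, n))
model = train_2d_nnrw(X, labels, n_classes, n_hidden=80, rng=rng)
train_acc = np.mean(predict_2d_nnrw(model, X) == labels)
print(f"training accuracy: {train_acc:.2f}")
```

Note that the images are never flattened: `np.einsum` evaluates every bilinear form $u_j^\top X_i v_j$ directly on the $(N, m, n)$ array.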

3. Performance Evaluation

3.1. Databases

The ORL database [27]. The ORL database contains images of 40 distinct subjects, with 10 different images per subject. For some subjects, the images were taken at different times, under varying lighting conditions, and with different facial expressions or facial details.

The FERET database [28]. The images of the FERET database were collected in a semi-controlled environment. The database contains a total of 14,126 images from 1199 individuals. For some individuals, over two years elapsed between their first and last sittings, with some subjects being photographed multiple times. Here, we chose 72 subjects with 6 frontal images per person for the experiments.

The Aberdeen database (see http://pics.psych.stir.ac.uk/2D_face_sets.htm). The Aberdeen database includes 687 colour faces from Ian Craw at Aberdeen. Each of the 90 individuals has between 1 and 18 images. Almost all are frontal images with variations in lighting.

The Extended Yale B database [29, 30]. The Extended Yale B database contains 2414 frontal face images of 38 individuals. The cropped and normalized 192 × 168 images were captured under various laboratory-controlled lighting conditions. In this paper, we take the images with the most neutral light sources as training data and the images that are not too dark as testing data.

3.2. Recognition Rate Comparison of the Proposed 2D-NNRW with NNRW

In this subsection, we carry out experiments to show that the proposed 2D-NNRW achieves a higher recognition rate than NNRW; in other words, our method is more suitable for face image recognition.

[Figure 1: two panels, (a) ORL Database and (b) FERET Database. x-axis: Number of Hidden Neurons; y-axis: Recognition Rate. Curves: NNRW Test, 2D-NNRW Test, NNRW Train, 2D-NNRW Train.]

Figure 1: Recognition rate comparison of the proposed 2D-NNRW with NNRW under different numbers of hidden nodes on the ORL and FERET databases

Figure 1 compares the recognition rates of the proposed 2D-NNRW and NNRW under different numbers of hidden nodes on the ORL and FERET databases. Here, the final rates are averages over 20 runs. To reflect the performance of the two classifiers as clearly as possible, we did not carry out feature extraction before classification; that is, we classified the original images directly. In Figure 1 (a), we used the first 5 images of each subject for training and the remaining images for testing. In Figure 1 (b), we selected 72 subjects that have at least 6 frontal images taken at different times, randomly chose 3 images of each of these subjects for training, and used the remaining images for testing.

From Figure 1, the recognition rate of the proposed 2D-NNRW is clearly higher than that of NNRW. As the number of hidden nodes increases, the recognition rates of both classifiers stabilize. Although the gap between the two recognition rates narrows, the advantage of 2D-NNRW remains.

Table 2: Recognition rate comparison of the proposed 2D-NNRW with NNRW on the ORL database (%)

Tests     NNRW training   NNRW testing   2D-NNRW training   2D-NNRW testing
Test 1    100             89.80          100                91.90
Test 2    100             89.35          100                91.23
Test 3    100             84.65          100                87.35
Test 4    100             87.63          100                89.53

Table 3: Recognition rate comparison of the proposed 2D-NNRW with NNRW on the FERET database (%)

Tests     NNRW training   NNRW testing   2D-NNRW training   2D-NNRW testing
Test 1    100             81.00          100                87.06
Test 2    100             82.82          100                88.45
Test 3    100             80.30          100                86.16
Test 4    100             84.19          100                88.31

Changing the training images of each subject, we carried out the same experiments on the ORL and FERET databases. The results are shown in Table 2 and Table 3. Here we used 1000 hidden nodes, and the recognition rates are averages over 20 runs.
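The averaging protocol used for these results (train the randomly initialized network several times and report the mean rate) can be sketched as below. This is a hedged illustration on synthetic data, not a reproduction of the reported experiments: the sigmoid activation, the uniform distribution on [-1, 1], the use of the run index as random seed, and the names `run_once` and `average_rate` are assumptions.

```python
import numpy as np


def run_once(X_tr, y_tr, X_te, y_te, n_classes, n_hidden, seed):
    """Train one randomly initialized 2D-NNRW and return its test accuracy."""
    rng = np.random.default_rng(seed)
    _, m, n = X_tr.shape
    U = rng.uniform(-1.0, 1.0, size=(m, n_hidden))
    V = rng.uniform(-1.0, 1.0, size=(n, n_hidden))
    b = rng.uniform(-1.0, 1.0, size=n_hidden)

    def hidden(X):
        # Entry (i, j) is sigmoid(u_j^T X_i v_j + b_j).
        pre = np.einsum("mj,imn,nj->ij", U, X, V) + b
        return 1.0 / (1.0 + np.exp(-pre))

    beta = np.linalg.pinv(hidden(X_tr)) @ np.eye(n_classes)[y_tr]
    pred = np.argmax(hidden(X_te) @ beta, axis=1)
    return float(np.mean(pred == y_te))


def average_rate(X_tr, y_tr, X_te, y_te, n_classes, n_hidden, n_runs=20):
    """Mean and standard deviation of the test accuracy over n_runs runs."""
    rates = np.array([
        run_once(X_tr, y_tr, X_te, y_te, n_classes, n_hidden, seed)
        for seed in range(n_runs)
    ])
    return rates.mean(), rates.std()


# Synthetic stand-in for a face database: 3 class patterns plus noise.
rng = np.random.default_rng(0)
n_classes, m, n = 3, 6, 5
patterns = rng.standard_normal((n_classes, m, n))
y_tr = np.repeat(np.arange(n_classes), 5)
y_te = np.repeat(np.arange(n_classes), 5)
X_tr = patterns[y_tr] + 0.3 * rng.standard_normal((y_tr.size, m, n))
X_te = patterns[y_te] + 0.3 * rng.standard_normal((y_te.size, m, n))
mean_rate, std_rate = average_rate(X_tr, y_tr, X_te, y_te, n_classes, n_hidden=40)
print(f"mean rate over 20 runs: {mean_rate:.3f}, std: {std_rate:.3f}")
```

Only the random hidden parameters change between runs; the training and testing split is held fixed, so the spread measures the effect of the random initialization alone.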

From Tables 2 and 3, the improvement in recognition rate obtained by the proposed method is not due to chance.

Figure 2 compares the recognition rates of the proposed 2D-NNRW and NNRW under different numbers of hidden nodes on the Aberdeen database. Here we chose 60 subjects that have at least 4 frontal images, and for each subject used the two images under nearly neutral illumination for training and the remaining two images for testing. In the figure, 2D denotes 2D-NNRW, 1D denotes NNRW, and the number that follows is the number of hidden nodes. Besides showing the recognition rates under different numbers of hidden nodes, we also repeated each experiment 10 times and recorded, each time, the average recognition rate over 20 runs. The purpose is to show that the 20-run average is relatively stable, so that it can be taken as the recognition rate of each network. In the rest of the paper, we use the average recognition rate over 20 runs as the final recognition rate.

[Figure 2: x-axis: Times (1 to 10); y-axis: Recognition Rate. Curves: 2D-500, 2D-600, 2D-700, 2D-800, 1D-500, 1D-600, 1D-700, 1D-800.]

Figure 2: Recognition rate comparison of 2D-NNRW with NNRW under different numbers of hidden nodes and repeated times on the Aberdeen database

From Figure 2, we reach the same conclusion as on the ORL and FERET databases: the proposed 2D-NNRW performs better than NNRW for face image classification.

At the end of this subsection, we further compare performance under different numbers of training and testing samples. The results are shown in Figure 3. Figure 3 (a) shows the recognition rates with the number of training samples ranging from 3 to 7 on the ORL database, and Figure 3 (b) shows the recognition rates with the number of training samples ranging from 2 to 5 on the FERET database. As in Figure 2, 2D denotes 2D-NNRW, 1D denotes NNRW, and the number that follows is the number of hidden nodes. Under the same number of hidden nodes, 2D-NNRW always performs much better than NNRW. With more hidden nodes (900 in the experiment), the recognition rates of both networks increase with the number of training samples. With fewer hidden nodes (500 in the experiment), however, the recognition rate of NNRW drops dramatically as the number of training samples increases, whereas 2D-NNRW keeps a relatively stable and higher performance.

[Figure 3: two panels, (a) ORL Database and (b) FERET Database. x-axis: Number of Training Samples; y-axis: Recognition Rate (%). Curves: 1D-900, 2D-900, 1D-500, 2D-500.]

Figure 3: Recognition rate comparison of the proposed 2D-NNRW with NNRW under different numbers of training samples on the ORL and FERET databases

3.3. Performance Analysis for 2D-NNRW

Since all our experiments showed that 2D-NNRW performs better than NNRW, what is the essential difference between them? Is it really the structural character of 2D-NNRW that matters? To answer this, we analyze the proposed 2D-NNRW.

From Remark 2.3, the structure of 2D-NNRW is equivalent to that of NNRW, and the only difference between them is the rule for generating the input weights. The experiments in the previous subsection indicate that our generation rule for input weights is more suitable for face image classification. Let us study this rule further.

In NNRW, the input weights $w_j$ and biases $b_j$ are random, obeying some continuous probability distribution; the usual choice is the uniform distribution on the interval $[-1, 1]$. In the proposed 2D-NNRW, however, the equivalent input weight is obtained as $w_j = \mathrm{Vec}(u_j v_j^\top)$, where $u_j$ and $v_j$ are two random column vectors obeying the uniform distribution on $[-1, 1]$. Hence each element of $w_j$ obeys the product distribution of two uniform distributions, $j = 1, 2, \ldots, L$. The probability density function of this distribution is

\[
p_w(z) = \int_{-\infty}^{+\infty} \frac{1}{|y|} \, p\!\left(\frac{z}{y}\right) p(y) \, dy.
\]

When the interval is taken to be $[-1, 1]$, we have $p(x) = 1/2$ on that interval, and

\[
p_w(z) =
\begin{cases}
-\frac{1}{2} \ln z, & 0 < z \le 1; \\
-\frac{1}{2} \ln(-z), & -1 \le z < 0.
\end{cases}
\]

Furthermore, $u_j v_j^\top$ is a rank-1 matrix. As mentioned in Section 2, it can preserve the structural character among the matrix elements. So which of the two makes the classification performance different: the distribution or the structural character?
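The product density above can be checked with a quick simulation. Integrating $p_w$ over $[-t, t]$ gives $P(|Z| \le t) = t(1 - \ln t)$ for $0 < t \le 1$, which the sketch below compares with the empirical frequency for the product of two independent uniform samples on [-1, 1]. The function names and the sample size are assumptions made for the example.

```python
import numpy as np


def product_density(z):
    """p_w(z) = -(1/2) ln|z| for 0 < |z| <= 1, and 0 elsewhere."""
    z = np.asarray(z, dtype=float)
    inside = (np.abs(z) > 0) & (np.abs(z) <= 1)
    safe = np.where(inside, np.abs(z), 1.0)  # avoid log(0) outside the support
    return np.where(inside, -0.5 * np.log(safe), 0.0)


def product_cdf_abs(t):
    """P(|Z| <= t) = t (1 - ln t) for 0 < t <= 1, obtained by integrating p_w."""
    return t * (1.0 - np.log(t))


rng = np.random.default_rng(0)
n_samples = 200_000
z = rng.uniform(-1.0, 1.0, n_samples) * rng.uniform(-1.0, 1.0, n_samples)

for t in (0.1, 0.5, 0.9):
    empirical = np.mean(np.abs(z) <= t)
    print(f"t={t}: empirical={empirical:.4f}, analytic={product_cdf_abs(t):.4f}")
```

The density is unbounded near zero, so the equivalent weights of 2D-NNRW concentrate around small magnitudes far more than uniformly drawn weights do.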

[Figure 4: x-axis: Times (1 to 10); y-axis: Recognition Rate. Curves: 1D-500, P-500, 2D-500, 1D-900, P-900, 2D-900.]

Figure 4: Impact from distribution

Figure 4 shows the recognition rates of NNRW with input weights obeying the uniform distribution and the $p_w$ distribution (weight vectors with this distribution can be obtained from products of the elements of two uniformly distributed vectors), together with the recognition rate of 2D-NNRW, under 500 and 900 hidden nodes on the ORL database. The symbol P-500 denotes the recognition rate of NNRW with input weights obeying the $p_w$ distribution and 500 hidden nodes; P-900 is defined similarly. From Figure 4, although the distribution has some impact on the results, the improvement in the recognition rate of the proposed 2D-NNRW is not due to the change of distribution alone.

We made the same comparison on the Extended Yale B and FERET databases. The results are shown in Table 4, where Group 1 and Group 2 denote the experiments with 500 and 900 hidden nodes, respectively, and NNRW-P denotes NNRW with input weights obeying the $p_w$ distribution. From Table 4, the proposed 2D-NNRW achieves higher recognition rates than the original network. Although the new distribution of input weights also improves the rates of NNRW, the proposed 2D-NNRW has the highest rates. So the distribution alone does not account for the difference between 2D-NNRW and NNRW.

Table 4: Recognition rate comparison on the Extended Yale B and FERET databases with the input weights in NNRW obeying the uniform distribution (%)

            Group 1                         Group 2
Database    NNRW    NNRW-P   2D-NNRW       NNRW    NNRW-P   2D-NNRW
YaleB       87.24   91.98    97.00         95.31   96.22    98.65
FERET       72.85   80.12    86.46         87.78   89.72    91.94

Next, we further illustrate the performance of 2D-NNRW by changing the distributions in NNRW and 2D-NNRW from the uniform distribution to the standard normal distribution. The results are shown in Table 5. As in Table 4, Group 1 and Group 2 denote the recognition rates with 500 and 900 hidden nodes, respectively, and NNRW-P denotes NNRW with input weights obeying the product distribution of two standard normal distributions.

Table 5 shows that when the input weights in NNRW obey the standard normal distribution, the effect of the product distribution on NNRW is small, and in some cases the rate of NNRW-P is even lower than that of NNRW. However, the proposed 2D-NNRW still clearly outperforms the others. Hence it is really the structural character of 2D-NNRW that accounts for the gain in recognition rate. This also gives us confidence that the proposed 2D-NNRW, based on matrix-input data, is more suitable for face image classification.

Table 5: Recognition rate comparison on different databases with the input weights in NNRW obeying the standard normal distribution (%)

            Group 1                         Group 2
Database    NNRW    NNRW-P   2D-NNRW       NNRW    NNRW-P   2D-NNRW
ORL         59.10   61.05    70.05         80.15   80.80    82.15
YaleB       81.68   82.89    95.68         93.52   92.96    98.32
FERET       64.58   64.98    80.72         85.21   84.54    89.68

Finally, we compare the stability of NNRW and the proposed 2D-NNRW. Intuitively, 2D-NNRW should be more stable than NNRW, since the total number of random parameters in 2D-NNRW is $(m + n + 1)L$, far fewer than the $(mn + 1)L$ in NNRW. That is, there are fewer uncertain factors in 2D-NNRW than in NNRW. The experimental results in Table 6 support this belief. In this experiment, we recorded the standard deviation over 20 runs; Group 1 and Group 2 again denote the results of NNRW and 2D-NNRW with 500 and 900 hidden neurons, respectively.

Table 6: Standard deviation comparison of the proposed 2D-NNRW with NNRW on different databases

            Group 1               Group 2
Database    NNRW     2D-NNRW      NNRW     2D-NNRW
ORL         0.0387   0.0271       0.0189   0.0122
YaleB       0.0196   0.0163       0.0088   0.0061
FERET       0.0320   0.0198       0.0150   0.0099
Aberdeen    0.0296   0.0191       0.0218   0.0128

From Table 6, for all four databases and under the same number of nodes, the standard deviation of 2D-NNRW is always smaller than that of NNRW, which is in accordance with our analysis.

3.4. Combining a 2D Feature Extraction Method with 2D-NNRW

Feature extraction and classification are two key steps in face recognition. In this subsection, we combine 2D-NNRW with a 2D feature extraction method, bi-directional two-dimensional principal component analysis (B2DPCA) [31], to obtain a novel face recognition method.

Table 7 compares the recognition rates of different classifiers with the same feature extraction tool, B2DPCA, on three databases. That is, we compare the proposed B2DPCA+2D-NNRW with B2DPCA+NN (nearest neighbor) and B2DPCA+SVM. We designed this experiment because NN is also a classifier that can be combined with a 2D feature extraction method directly, and because we want to show that combining a 2D feature extraction method with a 1D classifier is less favourable. The symbol in parentheses is the metric used in NN; we chose the best metric for each database.

Table 7: Recognition rate comparison of the classifiers with B2DPCA as the feature extraction tool (%)

Database    NN             SVM      2D-NNRW
ORL         91.50 ($L_2$)  92.00    94.12
FERET       86.11 ($L_1$)  74.07    93.24
Aberdeen    88.33 ($L_1$)  70.00    94.29

From Table 7, the proposed B2DPCA+2D-NNRW always has the highest recognition rate among the three methods. As for B2DPCA+NN and B2DPCA+SVM, B2DPCA+SVM performs slightly better than B2DPCA+NN only on the ORL database, while on the other two databases it performs much worse. Therefore, our new classifier is effective and has potential practical utility.


4. Conclusions

Since traditional classifiers for face recognition are usually designed for vector data, one must first convert the face images or their features into vectors, which unavoidably destroys the structural correlation among elements and may in turn affect recognition performance. Furthermore, although 2D feature extraction methods exist, there are few effective 2D classifiers. In this paper, we designed a new matrix-input classifier, 2D-NNRW, for face recognition. In the proposed 2D-NNRW, we used a left projecting vector and a right projecting vector to replace the high-dimensional input weight vector of an FNN with a single hidden layer, which preserves the structural information of the matrix data, and learned the weights with the random idea. The recognition rate comparisons showed that the proposed 2D-NNRW improves recognition performance, and our analysis further showed that it is the structural character that contributes to this improvement.

Acknowledgments. The research was supported by the National Natural Science Foundation of China (Nos. 61101240, 61272023, 91330118).

References

[1] F. Camastra, A. Vinciarelli, Automatic face recognition, In: Machine Learning for Audio, Image, and Video Analysis: Theory and Applications, Springer (2008) 381-411.
[2] M. Turk, A. Pentland, Eigenfaces for recognition, J. Cognitive Neuroscience 3(1) (1991) 71-86.
[3] P.N. Belhumeur, J.P. Hespanha, D.J. Kriegman, Eigenfaces vs. fisherfaces: Recognition using class specific linear projection, IEEE Trans. Pattern Analysis and Machine Intelligence 19(7) (1997) 711-720.
[4] M.S. Bartlett, J.R. Movellan, T.J. Sejnowski, Face recognition by independent component analysis, IEEE Trans. Neural Networks 13(6) (2002) 1450-1464.
[5] X. He, S. Yan, Y. Hu, P. Niyogi, H. Zhang, Face recognition using laplacianfaces, IEEE Trans. Pattern Analysis and Machine Intelligence 27(3) (2005) 328-340.
[6] M.H. Yang, N. Ahuja, D. Kriegman, Face recognition using kernel eigenfaces, In Proc. IEEE Int. Conf. Image Processing (2000) 37-40.
[7] M.H. Yang, Kernel eigenfaces vs. kernel fisherfaces: Face recognition using kernel methods, In Proc. Int. Conf. Automatic Face and Gesture Recognition (2002) 215-220.
[8] B. Poon, M.A. Amin, H. Yan, Performance evaluation and comparison of PCA based human face recognition methods for distorted images, Int. J. Machine Learning and Cybernetics 2(4) (2011) 245-259.
[9] V.P. Vishwakarma, Illumination normalization using fuzzy filter in DCT domain for face recognition, Int. J. Machine Learning and Cybernetics (2013) 1-18.
[10] G. Guo, S.Z. Li, K. Chan, Face recognition by support vector machines, In Proc. Fourth IEEE Int. Conf. Automatic Face and Gesture Recognition (2000) 196-201.
[11] J. Qin, Z.S. He, A SVM face recognition method based on Gabor-featured key points, In Proc. IEEE Int. Conf. Machine Learning and Cybernetics 8 (2005) 5144-5149.
[12] M.J. Er, S. Wu, J. Lu, H.L. Toh, Face recognition with radial basis function (RBF) neural networks, IEEE Trans. Neural Networks 13(3) (2002) 697-710.
[13] J. Yang, D. Zhang, A.F. Frangi, J. Yang, Two-dimensional PCA: A new approach to appearance-based face representation and recognition, IEEE Trans. Pattern Analysis and Machine Intelligence 26(1) (2004) 131-137.
[14] D. Zhang, Z.H. Zhou, (2D)2PCA: Two-directional two-dimensional PCA for efficient face representation and recognition, Neurocomputing 69(1) (2005) 224-231.
[15] M. Li, B. Yuan, 2D-LDA: A statistical linear discriminant analysis for image matrix, Pattern Recognition Letters 26(5) (2005) 527-532.
[16] P. Sanguansat, W. Asdornwised, S. Jitapunkul, S. Marukatat, Two-dimensional linear discriminant analysis of principle component vectors for face recognition, In Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing 2 (2006) 345-348.
[17] C. Lu, S.J. An, W.Q. Liu, X.D. Liu, An innovative weighted 2DLDA approach for face recognition, Journal of Signal Processing Systems 65(1) (2011) 81-87.
[18] W.F. Schmidt, M.A. Kraaijveld, R.P.W. Duin, Feed forward neural networks with random weights, In Proc. 11th IAPR Int. Conf., Vol. II, Conf. B: Pattern Recognition Methodology and Systems (1992) 1-4.
[19] B. Igelnik, Y.H. Pao, Additional perspectives of feedforward neural-nets and the functional-link, Technical Report 93-115, Center for Automation and Intelligent Systems, Case Western Reserve University, 1993; also in Proc. IJCNN'93, Nagoya, Japan (Oct. 25-29, 1993) 2284-2287.
[20] Y.H. Pao, G.H. Park, D.J. Sobajic, Learning and generalization characteristics of the random vector functional-link net, Neurocomputing 6(2) (1994) 163-180.
[21] B. Igelnik, Y.H. Pao, Stochastic choice of basis functions in adaptive function approximation and the functional-link net, IEEE Trans. Neural Networks 6(6) (1995) 1320-1329.
[22] S. McLoone, M.D. Brown, G. Irwin et al., A hybrid linear/nonlinear training algorithm for feedforward neural networks, IEEE Trans. Neural Networks 9 (1998) 669-684.
[23] S. McLoone, G. Irwin, Improving neural network training solutions using regularisation, Neurocomputing 37(1) (2001) 71-90.
[24] I.Y. Tyukin, D.V. Prokhorov, Feasibility of random basis function approximators for modeling and control, In Proc. IEEE Conf. Control Applications (CCA) & Intelligent Control (ISIC) (2009) 1391-1396.
[25] C. Hou, F. Nie, D. Yi, Y. Wu, Efficient image classification via multiple rank regression, IEEE Trans. Image Processing 22(1) (2013) 340-352.
[26] X. Wang, C. Huang, X. Fang, J. Liu, 2DPCA vs. 2DLDA: Face recognition using two-dimensional method, In Proc. Int. Conf. Artificial Intelligence and Computational Intelligence 2 (2009) 357-360.
[27] F.S. Samaria, A.C. Harter, Parameterisation of a stochastic model for human face identification, In Proc. Second IEEE Workshop on Applications of Computer Vision (1994) 138-142.
[28] P.J. Phillips, H. Wechsler, J. Huang, P.J. Rauss, The FERET database and evaluation procedure for face-recognition algorithms, Image and Vision Computing 16(5) (1998) 295-306.
[29] A.S. Georghiades, P.N. Belhumeur, D.J. Kriegman, From few to many: Illumination cone models for face recognition under variable lighting and pose, IEEE Trans. Pattern Analysis and Machine Intelligence 23(6) (2001) 643-660.
[30] K.C. Lee, J. Ho, D.J. Kriegman, Acquiring linear subspaces for face recognition under variable lighting, IEEE Trans. Pattern Analysis and Machine Intelligence 27(5) (2005) 684-698.
[31] A.A. Mohammed, R. Minhas, Q.M. Jonathan Wu, M.A. Sid-Ahmed, Human face recognition based on multidimensional PCA and extreme learning machine, Pattern Recognition 44(10) (2011) 2588-2597.