ACCV 2014
#*** CONFIDENTIAL REVIEW COPY. DO NOT DISTRIBUTE.
A Simple Geometric Model for Age-invariant Face Recognition
Anonymous ACCV 2014 submission
Paper ID ***
Abstract. Whilst facial recognition systems are vulnerable to different acquisition conditions, most notably lighting effects and pose variations, their level of vulnerability to the effects of facial aging has yet to be studied adequately. The Face Recognition Vendor Test (FRVT) 2012 annual statement estimated the deterioration in face recognition system performance due to the effect of aging. There was about 5. . .
1 Introduction
Biometric verification technology plays a significant role in the field of information privacy and security, and has recently gained increasing attention from researchers in the computer vision community. Biometrics comprises the techniques used to distinctly identify individuals according to some form of unique physiological or behavioral characteristic. These include fingerprints, facial recognition, palm geometry, iris identification, vein recognition, signature recognition, and voice recognition [1]. Facial recognition is considered the most ideal biometric technique [1]. It performs the authentication task based on the personal facial features of human beings. Related studies have involved face detection, facial expression recognition [2], face tracking, and identity recognition. The accuracy of facial recognition is usually affected by substantial intra-class variations due to underlying factors including age, pose, lighting, and expression. As a result, the majority of current work focuses on minimizing the effects of such variations, which could deteriorate the overall performance of facial recognition [3][4]. Among these works, the effects of facial aging on the performance of facial recognition systems have not been thoroughly studied. Hence, there is a need to develop facial recognition algorithms which are invariant to aging variations [5]; these are called age-invariant facial recognition methods. Facial aging is a compound process that changes both the texture and the shape of the facial area. Shape variations include craniofacial growth, whereas texture variations include skin coloration, lines, and wrinkles. Shape and texture are both considered common facial aging patterns. Since the aging process occurs throughout different ages, it can be classified into the above-mentioned aging patterns.
As a result, an age-invariant facial recognition method should account for these aging patterns [6]. Applications of age-invariant facial recognition include lost-children investigations, human-computer interaction, and passport photo
verification. These applications possess two basic attributes: 1) a large difference in age between gallery and probe photos (photos acquired during the enrollment and authentication phases, respectively), and 2) the inability to acquire the person's facial photo in order to update the template [5]. Furthermore, real-world conditions remain challenging, mainly due to the great deal of variation in the face acquisition process. For example, criminal investigations and public safety organizations must frequently match a probe photo against registered photos in a database, which may exhibit significant differences in facial characteristics as a direct consequence of the presence of different age groups. Numerous attempts have been made to overcome such challenges, and research on age-invariant face recognition has increased since 2002. Among the early studies dedicated to age-invariant facial recognition from digital images is Kwon and Lobo's study [7]. In their system, they used certain facial anthropometric measurements to represent shape variations due to aging; texture variations were addressed using the thickness of facial lines and wrinkles. Ling et al. [8], [9] investigated the effect of age variations on the performance of facial recognition during a passport picture authentication procedure. They presented a non-generative concept that extracts a face operator using gradient orientations of images at several image resolutions; support vector machines are then used to perform age-invariant face verification. Lanitis et al. [10], [11] presented a method which performs age estimation using the Active Appearance Model (AAM). They developed a model for facial aging that incorporates both texture intensity and shape variation information. Geng et al. [12], [13] studied an aging pattern subspace based on the concept that similar faces age in similar ways. Guo et al.
[14] proposed a learned age-manifold pattern for facial age estimation. The aging facial features are extracted and altered locally, and a robust regressor is used to predict the ages of human beings. Furthermore, Fu and Huang [15] proposed a manifold-learning technique in which a low-dimensional manifold is identified from a group of facial images arranged according to age. To perform facial age estimation, they applied quadratic and linear regression functions to the low-dimensional feature vectors from the manifolds. Ramanathan and Chellappa [16] presented a two-step strategy to model the aging process of adults; the model comprises facial texture and shape variations, with shape variations modelled by developing physical layouts which characterize the capabilities of the facial muscular tissues. Drygajlo et al. [17] established the effective use of a Q-stack classifier to perform facial aging authentication. Mahalingam and Kambhamettu [18] developed a probabilistic method for age-invariant face verification in which descriptors of facial features are extracted from a hierarchical appearance of face photos; these descriptors are then applied in a probabilistic model designed for verification applications. Park et al. [19], [20] introduced a facial aging simulation approach which learns shape and texture patterns of aging using Principal Component Analysis (PCA) coefficients. Suo et al. [21] proposed a dynamic model to imitate the process of facial aging. The proposed model is a facial growth model influenced by the characteristics of age and
hair. The model describes each photo using a multi-layer And-Or graph which incorporates changes of hairstyle; it also captures deformations, shape, and wrinkle variations within the facial components. Singh et al. [22] employed a facial aging approach that projects the probe and gallery facial images into a polar coordinate space, which decreases the changes in facial characteristics due to aging. Tiddeman et al. [23] adopted a face age simulation approach; they presented a wavelet transformation to model composite facial images. Burt and Perrett [24] proposed a face age simulation technique that uses texture and shape information to form composite facial images spanning several age ranges, and evaluated and analyzed particular facial cues that are influenced by age. Ramanathan et al. [25] presented an age-difference Bayesian classifier for use with passport photo verification applications. Most of the previously proposed approaches attempt to eliminate such challenges by imitating facial aging models; however, these are still far from hands-on utilization. Our system is designed to deal with the effects of facial aging on the performance of facial recognition systems. We propose a simple, fast, yet robust geometric approach. Our work includes developing a mathematical and geometric model to maintain the degree of similarity between six triangular regions connecting and characterizing the main facial features. The system is anticipated to work with real-time applications, such as airport identity verification systems, where the overall processing time and false positive rate are critical factors. The remainder of this paper is organized as follows: Section 2 describes the proposed face recognition geometric model. Section 3 is dedicated to the feature selection methods, data normalization methods, and classifiers used throughout the experimental part.
The results and discussion are presented in Section 4. The conclusion and future work are presented in Section 5.
2 The Proposed Geometric Model
The proposed geometric model consists of several phases; a block diagram of the main phases is illustrated in Fig. 1. First, the test image is passed to the system through a terminal (either a digital camera or, in the case of a surveillance system, a video camera). Face detection is the initial phase of every facial recognition system. In our work, the well-known Viola and Jones face detector [26] is used for detecting and cropping the facial area of the sample images; such a detector is effective for real-time applications. After detecting the facial area, a total of 12 facial feature points is localized. The FG-NET [27] database images are available with annotations of the feature points, where the coordinates of such points accompany the database. Fig. 2 shows two sample images collected from the FG-NET face aging database with annotations of the facial feature points. As Fig. 2 shows, there is a total of 64 facial feature (focal) points in each annotated sample image in the FG-NET database. Nevertheless, only 12 focal points are considered in our proposed model, as illustrated in Fig. 4. The annotated version of the FG-NET database is accompanied by a list of the focal point coordinates for each sample
image. The coordinates of the selected 12 focal points are passed to the system, and the remaining focal point coordinates of the sample images are ignored. The localized focal points are used to form 6 different triangular regions surrounding the main facial features, namely the nose, eyes, and mouth. Subsequently, the parameters of the triangular regions (i.e., perimeters and areas) are calculated using the standard formulas. Finally, the parameters of the triangles are passed to a set of developed mathematical equations to build the feature vectors for the sample images. The main phases of our proposed model are discussed in more detail in the following.
Fig. 2. Annotated Sample Images from the FG-NET Database.
2.1 Face Detection and Cropping
The facial area is extracted from the sample images using the Viola and Jones face detector [26], based on the boosted-cascades approach. This approach implements the AdaBoost boosting method, in which simple features that discriminate between non-face and face areas in an image are combined into cascades of sophisticated classifiers. A rectangular window of pixels in the image is compared with the corresponding rectangular window in the previous image. This procedure is performed by moving the rectangle in the present image upward, downward, right, and left. Subsequently, image matching is carried out based on the level of energy; a lower level of energy in the output increases the probability that a person has been moving in this direction. Such outputs are used to construct a human detector which can be trained using the AdaBoost method [26]. The detector runs on a test
image, and the score for each window is estimated by summing the classifier scores from every level of the cascade. The image window having the maximum score is assumed to be a face region in the test image. After localizing the facial area, a bounding box is created around the face. Then, automatic cropping of the face is performed using the coordinates of the bounding box provided by the face detector. A sample image from the FG-NET database before and after applying face detection and cropping is shown in Fig. 3.
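The cropping step can be sketched as follows; the toy 6×6 image and the bounding-box values are illustrative assumptions, while in practice the box would come from the Viola and Jones detector described above.

```python
def crop_face(image, box):
    """Crop a face region from an image (a list of pixel rows) using an
    (x, y, width, height) bounding box as returned by a face detector."""
    x, y, w, h = box
    return [row[x:x + w] for row in image[y:y + h]]

# Toy 6x6 "image" whose pixels record their own (row, col) position.
image = [[(r, c) for c in range(6)] for r in range(6)]
face = crop_face(image, (1, 2, 3, 2))   # x=1, y=2, width=3, height=2
# face is 2 rows of 3 pixels, starting at row 2, column 1
```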
Fig. 3. Face Sample Image from the FG-NET Database Before and After Face Detection and Cropping.
2.2 Face Focal Points Localization and Triangular Features Extraction
Craniofacial anthropometry is the scientific discipline that encompasses cranium measurements and characteristics, facial measurements, and landmarks (facial focal points) [28]. Our work considers 12 such focal points, mainly those that represent the corners and centers of the main facial features, as illustrated in Fig. 4. Such focal points are usually localized using an Active Appearance Model (AAM) [29], which designates 64 distinct facial focal points. As mentioned before, the FG-NET database images are available with annotations of the focal points, where the coordinates of such points are provided. In the field of craniofacial anthropometry, facial focal points are given scientific notations [30]. Examples of such notations are: En (endocanthion): the inner corner of the eye fissure, where the eyelids come in contact with each other. G (glabella): the most prominent point in the median sagittal plane between the supraorbital ridges. Ex (exocanthion): the outer corner of the eye fissure, where the eyelids meet. Gn (gnathion): the lowest point on the lower chin contour, in the facial midline.
315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332
333
333
334
334
335
335
336
336
337
337
338
338
339
339
340
340
341
341
342
342
343
343
344
344
345
345
346
346
347
347
348
348
349
349
350
350
351
351
352
352
353
353
Fig. 4. The Main Focal Points in the Proposed Model.
2.3 Calculating the Parameters of the Triangles

Once the facial focal points are localized, the triangle vertex coordinates are determined. Then, the Euclidean distances [31][32] between the triangle vertex coordinates are calculated in order to compute the perimeter and area of each triangle. Fig. 5 depicts the six triangular regions which represent the geometric features in our proposed model.
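This step can be sketched as follows; the vertex coordinates are illustrative placeholders rather than FG-NET annotations. The perimeter follows from the three Euclidean side lengths, and the area from Heron's formula.

```python
import math

def triangle_params(p1, p2, p3):
    """Perimeter and area of a triangle given its vertex coordinates."""
    side = lambda a, b: math.dist(a, b)          # Euclidean distance [31][32]
    s1, s2, s3 = side(p1, p2), side(p2, p3), side(p3, p1)
    perimeter = s1 + s2 + s3
    s = perimeter / 2.0                          # semi-perimeter
    area = math.sqrt(s * (s - s1) * (s - s2) * (s - s3))  # Heron's formula
    return perimeter, area

perimeter, area = triangle_params((0, 0), (4, 0), (0, 3))
# 3-4-5 right triangle: perimeter 12.0, area 6.0
```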
Fig. 5. The Triangular Features.
The triangle parameters are denoted Pr and Ar, representing the perimeters and areas, respectively, where the subscript r designates the particular triangle; for instance, P1 and A1 stand for the perimeter and area of triangle number 1. As a final stage, the areas and perimeters of the triangles are passed to a number of mathematical equations to create a feature vector for each sample image. These mathematical equations are discussed in the following section.
2.4 Developing the Mathematical Triangle Relationships
In computational geometry, triangles are considered similar when their shapes are similar but their dimensions are not necessarily identical [33]. This fact motivated us to develop mathematical relations between the 6 triangular regions. The human population of our planet was recently estimated at about 7 billion; it is therefore unrealistic to carry out a direct one-to-one comparison of the triangle parameters to perform face recognition. Instead, we propose using a set of triangle proportion ratios. Accordingly, a set of 15 mathematical equations is developed. These equations are used to estimate the degree of similarity between one triangle and another. Any two triangles are considered similar if the geometric relationship in equation (1) is fulfilled:
\[ \frac{Ar_i}{Ar_j} = \frac{pr_i^2}{pr_j^2} \tag{1} \]

where Ar and pr denote the triangle areas and perimeters, respectively, and i and j are the designations of the two triangles. To develop the triangle similarity proportion, equation (2) is used. The triangle similarity proportion, designated TSP, is used to quantify the degree of similarity between every two triangles, where i and j are positive integers representing the triangle indexes such that i, j = 1, ..., 6:

\[ TSP = \frac{Ar_i \times pr_j^2}{Ar_j \times pr_i^2} \tag{2} \]

A statistical analysis was carried out over the data collected from the dataset. The dataset consists of the feature vectors created using our proposed geometric model. The results of the statistical analysis revealed that there is a considerable distinction between the data related to the different subjects. The similarity proportion between every two triangles is calculated using the TSP. For example, the TSP between triangle 1 and triangle 2 is given as follows:

\[ TSP(T_1, T_2) = \frac{Ar_1 \times pr_2^2}{Ar_2 \times pr_1^2} \tag{3} \]

A set of thirty TSP relationships is calculated for all of the sample images collected from the FG-NET database. The output of the TSP relationships is used to build a row feature vector in the dataset; each row vector corresponds to one sample image. The set of row feature vectors related to the same subject is used to build a class for that particular subject. Since the FG-NET database consists of 82 subjects, a dataset of 82 classes is created.
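A minimal sketch of building one row feature vector from the six triangles' parameters via equation (2) follows; the area and perimeter values are illustrative placeholders, not FG-NET measurements.

```python
def tsp(Ar, pr, i, j):
    """Triangle similarity proportion, equation (2)."""
    return (Ar[i] * pr[j] ** 2) / (Ar[j] * pr[i] ** 2)

# Illustrative areas and perimeters for triangles 1..6 (0-indexed here).
Ar = [6.0, 5.0, 4.5, 7.2, 3.1, 2.8]
pr = [12.0, 11.0, 10.5, 13.0, 8.9, 8.4]

# The thirty ordered pairs (i != j) give the row feature vector of one image.
feature_vector = [tsp(Ar, pr, i, j)
                  for i in range(6) for j in range(6) if i != j]
# Two similar triangles (Ar_i/Ar_j == pr_i^2/pr_j^2) yield a TSP of 1.
```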
3 Feature Selection, Classification, and Data Normalization Techniques
In this section we first discuss the main concept of data pre-processing and its applications; data pre-processing is a first step for several data mining applications [34]. A number of data pre-processing techniques, including normalization, scaling, and standardization, are discussed. Since the set of the most important features in any model can be determined using feature selection algorithms, we then discuss the concept of feature selection, and in particular a regularized-trees feature selection approach known as Random Forest feature selection. Finally, a number of common classification algorithms are discussed, namely K-Nearest Neighbor, Naïve Bayes, and Support Vector Machines (SVM).
3.1 Data Normalization and Standardization
Data normalization usually attempts to give all features the same weight. Normalization is ideal for classification algorithms, including neural networks, as well as distance-based methods, for example nearest-neighbor classification and clustering. In distance-based classification methods, normalization helps prevent features with large ranges from outweighing features with smaller ranges. Additionally, normalization is beneficial when there is no prior knowledge of the data [35]. Typically, data sets need to be normalized or standardized to bring all the variables into proportion with each other. The most popular techniques for feature normalization are [36]: basic rescaling, and feature standardization (zero mean and unit variance for every single feature within the dataset).

Basic Rescaling. With basic rescaling, the main objective is to re-scale the data along each data dimension to ensure that the final data vectors lie within the range [0, 1] or [-1, 1] (depending on the dataset). This is often useful for later processing, since several default parameters (e.g., epsilon in PCA whitening) treat the data as if it has been scaled to a standard range [37].

Feature Standardization. Feature standardization is the process of setting each dimension of the data (independently) to have zero mean and unit variance. This is the most popular approach to data normalization and is often used by supervised learning algorithms (e.g., with Support Vector Machines (SVMs), feature standardization is usually recommended as an effective preprocessing stage) [38]. In practice, feature standardization requires calculating the mean of each dimension (across the dataset) and subtracting it from each data element in that dimension; subsequently, each dimension is divided by its standard deviation [39].
3.2 Z-score Normalization

It is commonly useful to analyze how far, in terms of standard deviations, a data element deviates from the mean; this approach is known as z-score normalization [40]. Although many datasets have a relatively normal distribution, it is a useful approach for comparing data elements coming from different populations which may have different means and standard deviations [41]. The z-score offers a useful measurement for comparing data elements from different data sets. The formula used for calculating the z-score is as follows:

\[ z = \frac{\chi - \mu}{s} \tag{4} \]

where z is the z-score, χ is the data element, µ is the mean of the data set, and s is the standard deviation [38].
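Equation (4) applied over a data set can be sketched as follows; the sample values are illustrative.

```python
import statistics

def z_scores(data):
    """z-score normalization, equation (4): (x - mean) / std dev."""
    mu = statistics.mean(data)
    s = statistics.pstdev(data)          # population standard deviation
    return [(x - mu) / s for x in data]

scores = z_scores([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
# mean is 5.0 and the population std dev is 2.0, so 2.0 maps to -1.5
```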
3.3 Min-Max Scaling

The simplest feature scaling approach is to re-scale the data to the range [0, 1] or [-1, 1]. The approach is used to ensure that features related to different classes remain distinct. The range of the data is usually specified based on the characteristics of the data. The standard formula of min-max scaling is represented by equation (5):

\[ \chi' = \frac{\chi - \min(\chi)}{\max(\chi) - \min(\chi)} \tag{5} \]

where χ is the original data and χ' is the normalized data [38].

3.4 Column Vector Normalization

Column vector normalization of the data is defined as the square of the sum of all column elements in a dataset divided by the sum of squares of all the elements [38], as illustrated in equation (6):

\[ \chi'_{ji} = \frac{\left(\sum_{i=1}^{n} \chi_{ji}\right)^2}{\sum_{i=1}^{n} \chi_{ji}^2} \tag{6} \]

where χ_{ji} is a column element before normalization, χ'_{ji} is the column element after normalization, j is a positive integer such that j = 1, ..., m, representing an index over the dataset columns, m is the number of columns in the dataset, i is a positive integer such that i = 1, ..., n, representing the index of a column element, and n is the total number of elements in each column.

3.5 Scaling to Unit Length

An alternative data scaling approach commonly used in machine learning is to scale the elements of a feature vector so that the vector has a length of one. This requires dividing each element by the Euclidean length of the vector. In some applications (e.g., histogram features) it may be more effective to use the L1 norm (i.e., Manhattan or city-block norm) of the feature vector [38]:

\[ \chi' = \frac{\chi}{\|\chi\|} \tag{7} \]

where χ is the original data and χ' is the normalized data.
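Equations (5) and (7) can be sketched together as follows; the input vectors are illustrative.

```python
import math

def min_max(data):
    """Min-max scaling into [0, 1], equation (5)."""
    mn, mx = min(data), max(data)
    return [(x - mn) / (mx - mn) for x in data]

def unit_length(vec, norm="l2"):
    """Scale a feature vector to unit length, equation (7); the L1
    (city-block) norm is an option for e.g. histogram features."""
    length = (math.sqrt(sum(x * x for x in vec)) if norm == "l2"
              else sum(abs(x) for x in vec))
    return [x / length for x in vec]

scaled = min_max([10.0, 15.0, 20.0])   # [0.0, 0.5, 1.0]
unit = unit_length([3.0, 4.0])         # [0.6, 0.8], Euclidean length 1
```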
3.6 Feature Selection
In statistics and machine learning, feature selection, also known as attribute selection, variable selection, or variable subset selection, is defined as the approach of deciding on a subset of appropriate features to be used in model construction. When using any feature selection method, the main assumption is that the data include many redundant or insignificant features; insignificant features are the ones which offer little information compared to the selected features. Feature selection methods are often associated with feature extraction methods: typically, feature extraction generates a full set of features using functions, whilst feature selection generates a subset of the features [42]. Typically, the features generated using a decision tree or an ensemble of trees are known to be redundant. A recent effective technique known as regularized trees [43] can be used for feature selection. A regularized tree prefers, for splitting the present node, variables already used by prior tree nodes. Regularized trees need to construct only a single tree model (or a single tree-ensemble model), and thus tend to be computationally efficient [43]. Regularized trees can handle numerical and categorical features, nonlinearities, and interactions. They are invariant to the scales (units) of the attributes and insensitive to outliers; therefore, they require only minor data pre-processing, such as normalization [43]. The Regularized Random Forest (RRF) [43] is one form of regularized trees. Because of the attractive capabilities of Random Forest as a feature selection method, we decided to use it for feature selection in our experiments. The Random Forest feature selection algorithm [44] is discussed in detail in the subsequent section.
3.7 Random Forest Feature Selection
Random forest is a classification method which possesses outstanding efficiency. It was first introduced by Leo Breiman [44]. From a practical point of view, it can be defined as a classifier which combines a collection of classification trees in an integrated scheme to quantify the significance of every feature [45]. Random forest can be used when the number of variables is higher than the number of observations, and also when there are more than two classes, and it produces measures of variable importance. The set of classification trees is constructed by means of bootstrap samples of the data; additionally, at every node split, the list of candidate variables is an arbitrary subset of the variables. For tree construction, random forest makes use of both bagging (bootstrap aggregation, an effective method for merging unstable learners [46]) and randomly selected variables. All the constructed trees are left unpruned in order to obtain low-bias trees. In this way, random variable selection and bagging lead to low correlation between the individual trees. Typically, the algorithm builds an ensemble that achieves both low bias and low variance (by averaging across a large collection of low-bias, high-variance, and minimal
correlation trees). Random forest offers outstanding classification capabilities compared to support vector machines. It offers a number of properties which make it appropriate under the following conditions: it is usable when there are more attributes than observations; it works effectively in both two-class and multiclass settings; it offers useful predictive capability even when the majority of the predictor features are noise, so it does not require pre-selecting the features [46]; it does not over-fit; it can deal with a combination of continuous and categorical predictors; it captures feature interactions among the predictor variables; its outcome is invariant to monotonic transformations of the predictors; and it delivers measures of variable importance.
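The two sources of randomness used in tree construction, bagging and per-split random variable subsets, can be sketched as follows; the row and variable counts are illustrative assumptions, and the full unpruned tree learner is omitted.

```python
import random

def bootstrap_sample(rows, rng):
    """Bagging: sample the training rows with replacement."""
    return [rng.choice(rows) for _ in rows]

def random_variable_subset(n_vars, k, rng):
    """Random subset of candidate variables for one node split."""
    return rng.sample(range(n_vars), k)

rng = random.Random(0)
rows = list(range(10))                           # indices of training rows
boot = bootstrap_sample(rows, rng)               # one tree's training set
oob = [r for r in rows if r not in boot]         # out-of-bag rows for that tree
split_vars = random_variable_subset(30, 5, rng)  # candidates at one node
```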
3.8 Definition of the Random Forest Feature Importance Measures
The first measure of feature importance is calculated by permuting the Out-Of-Bag (OOB) data. For each tree, the prediction error on the OOB portion of the data is recorded; the same procedure is then repeated after permuting each predictor variable. The difference between the recorded prediction errors before and after permuting every predictor variable is averaged over all of the trees and normalized by the standard deviation of the recorded differences. Whenever the standard deviation of the differences is equal to zero for a variable, the normalization is not carried out. The other measure is the overall reduction in node impurity that results from splitting on the variable, averaged across the trees. For classification with random forest, the node impurity is calculated using the Gini index; for regression, it is calculated from the residual sum of squares [47]. The output of the random forest feature selection algorithm is often visualized by the "Gini importance" [48], which can also be used as an indicator of feature relevance. This feature importance score offers a relative ranking of the set of features. At every node τ of a binary tree T in the random forest, the optimal split is searched for using the Gini impurity i(τ), a computationally effective approximation of the entropy. The Gini impurity measures how well a candidate split isolates the samples of the two classes in a particular node. The class fraction at node τ is

ρk = ηk / η    (8)
The quantity in equation (8) is the fraction ρk of samples from class k = 0, 1 out of the total of η samples at node τ, and the Gini impurity is calculated as

i(τ) = 1 − ρ1² − ρ0²    (9)
Splitting on a threshold tθ of a variable θ sends the samples to two sub-nodes τl and τr, with sample fractions ρl = ηl/η and ρr = ηr/η. The resulting decrease in Gini impurity is:

Δi(τ) = i(τ) − ρl i(τl) − ρr i(τr)    (10)
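The Gini impurity and its decrease under a candidate split can be computed directly from class labels. The following is a minimal pure-Python sketch; the function names are ours, not from the paper.

```python
from collections import Counter

def gini(labels):
    """Gini impurity i(τ) = 1 − Σ_k ρ_k² over the samples at a node."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def gini_decrease(labels, left, right):
    """Decrease Δi(τ) = i(τ) − ρ_l·i(τ_l) − ρ_r·i(τ_r) for one candidate split."""
    n = len(labels)
    return gini(labels) - len(left) / n * gini(left) - len(right) / n * gini(right)

# A perfectly separating split removes all impurity, so Δi equals i(τ).
parent = [0, 0, 1, 1]
print(gini(parent))                           # 0.5
print(gini_decrease(parent, [0, 0], [1, 1]))  # 0.5
```

At each node, a tree-growing procedure evaluates this decrease for every candidate (variable, threshold) pair and keeps the split with the largest value.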
An extensive search is performed over all variables θ available at each node (random forest usually limits this search to a random subset of the available features [48]) and over all possible thresholds tθ; the pair (θ, tθ) that yields the highest decrease Δi is selected. The decrease in Gini impurity due to the optimal split is recorded and accumulated, for each variable separately, at every node of every tree T. This quantity, known as the Gini importance IG, indicates how frequently a specific feature θ was selected for a split and how discriminating its value is:

IG(θ) = Σ_T Σ_τ Δiθ(τ, T)    (11)
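The permutation-based measure described at the start of this subsection can also be sketched compactly: shuffle one column, re-measure the prediction error, and report the increase. This is an illustrative sketch under assumed toy data, without the per-tree OOB bookkeeping of a full random forest.

```python
import random

def permutation_importance(predict, X, y, col, rng):
    """Error increase after randomly permuting one predictor column."""
    def error(rows):
        return sum(predict(r) != t for r, t in zip(rows, y)) / len(y)
    base = error(X)
    shuffled = [list(r) for r in X]        # copy so X is untouched
    vals = [r[col] for r in shuffled]
    rng.shuffle(vals)
    for r, v in zip(shuffled, vals):
        r[col] = v
    return error(shuffled) - base

# Hypothetical data: column 0 carries the label, column 1 is pure noise.
X = [[0.0, 0.3], [1.0, 0.9], [0.1, 0.8], [0.9, 0.1]]
y = [0, 1, 0, 1]
predict = lambda r: int(r[0] > 0.5)
print(permutation_importance(predict, X, y, 1, random.Random(0)))  # 0.0
```

Permuting the noise column never changes the predictions, so its importance is exactly zero, while permuting an informative column can only raise the error here.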
3.9 Classifiers
Classification is the task of deciding to which of a set of classes (sub-populations) a new observation belongs, on the basis of a training set of data whose class memberships are known. Typically, the individual observations are converted into a set of quantifiable attributes. These attributes may be ordinal (e.g. "large", "medium", or "small"), categorical (e.g. "A", "B", "AB", or "O" for blood type), integer-valued (e.g. the number of occurrences of a particular word in an email), or real-valued (e.g. a blood pressure measurement) [49]. An algorithm that performs classification, especially effectively, is called a classifier. The term classifier sometimes also denotes the mathematical function implemented by a classification algorithm, which maps input data to a class. In machine learning, classification is considered an instance of supervised learning, i.e. learning from a training set of correctly classified observations. The corresponding unsupervised procedure is known as clustering (or cluster analysis); it groups data into classes according to some measure of inherent similarity (e.g. the distance between instances considered as vectors in a multi-dimensional vector space) [49]. In statistics, classification is usually carried out using logistic regression or a similar procedure. The attributes of the observations are called explanatory variables (or regressors, independent variables, etc.), and the classes being predicted are called outcomes, the possible values of the dependent variable. In machine learning, the observations are usually referred to as instances, the explanatory variables are called features (grouped into a feature vector), and the possible categories being predicted are called classes [50].
Bayesian Classifiers. Bayesian classification techniques offer a natural way of incorporating any available information about the relative sizes of the different sub-populations within the overall population [50]. Bayesian methods are typically computationally expensive, and approximations to Bayesian clustering rules were developed before Markov chain Monte Carlo computation became practical. Some Bayesian techniques compute group-membership probabilities; these provide a more informative outcome than a simple attribution of a single group label to each new observation [50].

Binary and Multiclass Classifiers. There are two modes of classification: binary classification and multiclass classification. In binary classification, the better understood task, only two classes are involved, whereas multiclass classification consists of assigning an object to one of several classes [51]. Since many classification techniques have been developed specifically for binary classification, multiclass classification usually requires combining several binary classifiers.

Linear Classifiers. A wide group of classification algorithms can be expressed in terms of a linear function that assigns a score to each potential category k by combining the feature vector of an instance with a vector of weights using a dot product. The predicted category is the one with the maximum score. This type of score function is known as a linear predictor function and has the following general form:
score(Xi, k) = βk · Xi    (12)
Examples of such algorithms are linear discriminant analysis (LDA) and support vector machines (SVM). A selected set of classifiers, each representing one of the above-mentioned classification approaches, is discussed briefly below and used to perform classification in the experimental part: the K-Nearest Neighbor classifier, the Naïve Bayes classifier, and Support Vector Machines (SVM) represent the binary/multiclass, Bayesian, and linear classifier families, respectively [52].
3.10 K-Nearest Neighbor Classifier
In the field of pattern recognition, the k-nearest neighbor (k-NN) algorithm is a non-parametric technique for classifying objects based on the closest training examples in the feature space. k-NN is a form of instance-based (or lazy) learning, in which the function is only approximated locally and all computation is deferred until classification. The k-nearest neighbor technique is considered the simplest of all machine learning algorithms: an object is classified by a majority vote of its neighbors (where k is a small positive integer). When k is equal to one, the object is simply assigned to the class of its nearest neighbor. The same technique can be used for regression by setting the property value of the object to the mean of the values of its k nearest neighbors. It is often useful to weight the contributions of the neighbors so that nearer neighbors contribute more to the mean than more distant ones [53].
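The majority-vote rule just described fits in a few lines. This is a generic sketch (plain Euclidean distance, toy data), not the WEKA implementation used in the experiments:

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    `train` is a list of (feature_vector, label) pairs; distances are
    plain Euclidean, as in the standard k-NN formulation.
    """
    nearest = sorted(train, key=lambda p: math.dist(p[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

train = [([0.0, 0.0], "a"), ([0.1, 0.2], "a"),
         ([1.0, 1.0], "b"), ([0.9, 1.1], "b")]
print(knn_predict(train, [0.2, 0.1], k=3))  # a
```

With k=1 this reduces to nearest-neighbor classification; distance-weighted voting replaces the Counter with weights proportional to 1/distance.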
3.11 Naïve Bayes Classifier
The Naïve Bayes classification algorithm is a simple, easy-to-use probabilistic classifier based on applying Bayes' rule. Bayes' theorem offers a way of calculating the posterior probability Pr(br | χr) from Pr(br), Pr(χr), and Pr(χr | br). The Naïve Bayes algorithm assumes that the effect of the value of a predictor χr on a given class is independent of the values of all the other predictors; this assumption is known as class conditional independence [54].

Pr(br | χr) = Pr(χr | br) Pr(br) / Pr(χr)    (13)

Under this assumption, the posterior for a class br given predictors χr1, ..., χrn is proportional to the product of the individual likelihoods and the prior:

Pr(br | χr) ∝ Pr(χr1 | br) × Pr(χr2 | br) × ... × Pr(χrn | br) × Pr(br)    (14)
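A categorical Naïve Bayes classifier needs only the class counts (priors) and per-feature value counts (likelihoods). The following is an illustrative sketch on invented toy data, without the smoothing a production implementation would add:

```python
from collections import Counter, defaultdict

def train_nb(samples):
    """Count-based estimates of the priors Pr(b) and likelihoods Pr(χ_i | b).

    `samples` is a list of (feature_tuple, label) pairs with categorical features.
    """
    priors = Counter(label for _, label in samples)
    likes = defaultdict(Counter)           # (label, position) -> value counts
    for feats, label in samples:
        for i, v in enumerate(feats):
            likes[(label, i)][v] += 1
    return priors, likes

def predict_nb(priors, likes, feats):
    """Pick the class maximizing Pr(b) · Π_i Pr(χ_i | b), as in equation (14)."""
    def posterior(label):
        p = priors[label]
        for i, v in enumerate(feats):
            p *= likes[(label, i)][v] / priors[label]
        return p
    return max(priors, key=posterior)

samples = [(("sunny", "hot"), "no"), (("sunny", "mild"), "no"),
           (("rain", "mild"), "yes"), (("rain", "hot"), "yes")]
priors, likes = train_nb(samples)
print(predict_nb(priors, likes, ("rain", "mild")))  # yes
```

The per-feature independence assumption is what lets the joint likelihood factor into this product of one-dimensional counts.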
3.12 Support Vector Machines (SVM)
In the field of machine learning, Support Vector Machines (SVMs, also known as support vector networks) are supervised learning models and algorithms that analyze data and recognize patterns; they are used for both classification and regression analysis. The standard SVM takes a set of input data and predicts, for each input, which of two possible classes it belongs to, which makes it a non-probabilistic binary linear classifier [55]. Given a set of training examples, each labelled as belonging to one of two classes, an SVM training algorithm builds a model that assigns new examples to one class or the other. An SVM model is a representation of the examples as points in space, mapped so that the examples of the separate classes are divided by a gap that is as wide as possible. New examples are then mapped into the same space and predicted to belong to a class based on which side of the gap they fall on. In addition to performing linear classification, SVMs can efficiently perform non-linear classification using what is known as the kernel trick, by implicitly mapping their inputs into high-dimensional feature spaces [55].
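The kernel trick can be verified numerically: a polynomial kernel evaluated in the input space equals the dot product of explicitly mapped feature vectors, without ever building those vectors. A small sketch for the homogeneous degree-2 kernel K(x, y) = (x · y)² on 2-D inputs:

```python
import math

def poly2_features(x):
    """Explicit degree-2 monomial map φ(x) = (x1², x2², √2·x1·x2) for 2-D input."""
    x1, x2 = x
    return [x1 * x1, x2 * x2, math.sqrt(2) * x1 * x2]

def kernel(x, y):
    """Polynomial kernel K(x, y) = (x · y)², computed without building φ."""
    return (x[0] * y[0] + x[1] * y[1]) ** 2

x, y = (1.0, 2.0), (3.0, 0.5)
explicit = sum(a * b for a, b in zip(poly2_features(x), poly2_features(y)))
print(explicit, kernel(x, y))  # both 16.0
```

An SVM only ever needs inner products between training points, so substituting K for the dot product trains a linear separator in the implicit feature space at input-space cost.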
4 Experiments
The experiments are conducted on the publicly available FG-NET face aging database [29]. It contains a total of 1,002 color and grayscale high-resolution images of 82 subjects of several ethnicities, with considerable variety in expression, pose, and lighting conditions. The image size is roughly 400×500 pixels, the age range is 0–69 years, and on average there are 12 images per subject. For classification we used WEKA's built-in classifiers (WEKA is a collection of machine learning algorithms for data mining tasks) [56]. The experiments are conducted in two phases. In the first phase we investigate the significance of the proposed geometric features in different age groups; the main goal of this part is to show which features contribute the most during each age group. For this purpose the FG-NET database is divided into three subsets, representing the childhood, teenage, and adulthood age groups respectively. In the second part of the experiments, a number of classification algorithms, namely the K-Nearest Neighbor (KNN), Naïve Bayes, and Support Vector Machine (SVM) classifiers, are used to estimate and validate the performance of the system. In the next section the feature subsets produced by the Random Forest feature selection are illustrated, and the classification results are presented.

4.1 Feature Selection Experiments
The FG-NET database is divided into three subsets based on the protocol adopted in [53] for validating face recognition systems on the FG-NET database, as follows. FGNET-8 consists of all the data collected at ages between 0 and 8; it includes 290 facial images from 74 subjects, from which 580 intra-person pairs and 6,000 inter-person pairs are randomly generated for verification. FGNET-18 consists of all the data collected at ages between 8 and 18; it includes 311 facial images from 79 subjects, from which 577 intra-person pairs and 6,000 inter-person pairs are randomly generated for verification. FGNET-Adult consists of all the data collected at ages 18 to 69, mainly frontal views; it includes 272 images from 62 subjects, from which 665 intra-person pairs and about 6,000 inter-person pairs are randomly generated for verification. This protocol for dividing the FG-NET database is used to determine which features contribute the most to discriminating between the subject classes during each age group. To accomplish this, a Random Forest algorithm is used to quantify the importance of each feature and to select a subset of the most significant features among the thirty extracted features for each age group.
4.2 Classification Experiments
In this part of the experiments, classification is conducted using three classifiers. The ability of each classifier to discriminate between the classes (which are built from the proposed geometric features) is evaluated; in pattern recognition, such a procedure helps decide which classifier is the most appropriate for particular data patterns [54]. As mentioned before, classification is conducted on the WEKA platform using three supervised learning classification algorithms, namely K-Nearest Neighbor (KNN), Naïve Bayes, and Support Vector Machines (SVM). The performance of the three classifiers is evaluated in terms of classification accuracy, True Positive (TP) rate, False Positive (FP) rate, Precision, Recall, and F-Measure. This part of the experiments is conducted in four phases, each using a different data pre-processing method. In the first phase, the feature vectors are scaled using the min-max method. In the second phase, the data are normalized using the Z-score method, where normalization is applied using the global standard deviation and mean of the whole data set. In the third phase, Z-score normalization is instead performed over each column independently, normalizing each column of the data set using its individual standard deviation and mean. Finally, in the fourth phase, the feature vectors are scaled using the unit-length rule. The different pre-processing techniques are considered in order to evaluate their effect on the classification accuracy of each of the three classifiers.
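The four pre-processing rules can be stated in a few lines each. A minimal sketch on a single feature vector (the per-column variant simply applies the z-score function to each column of the data matrix separately, using that column's own mean and standard deviation):

```python
import math
import statistics as st

def min_max(v):
    """Min-max scaling of a feature vector to the range [0, 1]."""
    lo, hi = min(v), max(v)
    return [(x - lo) / (hi - lo) for x in v]

def z_score(v, mu=None, sigma=None):
    """Z-score normalization; pass a global mean/std, or use the vector's own."""
    mu = st.mean(v) if mu is None else mu
    sigma = st.pstdev(v) if sigma is None else sigma
    return [(x - mu) / sigma for x in v]

def unit_length(v):
    """Scale a feature vector to unit Euclidean norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

v = [2.0, 4.0, 6.0]
print(min_max(v))      # [0.0, 0.5, 1.0]
print(z_score(v))      # zero mean, unit variance
print(unit_length(v))  # Euclidean norm 1
```

Z-scored data have mean 0 and standard deviation 1, which removes the units of measurement; unit-length scaling instead removes overall magnitude while preserving direction.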
5 Results and Discussion
The results of applying the Random Forest feature selection algorithm to the three subsets of the FG-NET database are introduced first. The importance of each of the thirty features extracted from each sample image is quantified using the Gini index: the higher the numerical value of the Gini index, the greater the significance of the feature. This measure is important because it shows how much each feature contributes to discriminating between the dataset classes. Graphs showing the average Gini index value of each feature in the subsets are illustrated; the Y axis represents the average Gini index value of each feature, whereas the X axis represents the set of features, designated Fi, where the index i is a positive integer with i = 1, ..., 30 (for example, F1 represents the first feature in the data set). Next, the classification results on the entire FG-NET database using the different pre-processing methods are illustrated and discussed, and the results of combining each pre-processing method with each classifier are reported. Finally, the classification results for each FG-NET subset are illustrated in terms of classification accuracy and False Positive (FP) rate; for these, the data are pre-processed using Z-score normalization, since it yielded the best results among the pre-processing methods, and feature selection is performed on the normalized data using the Random Forest feature selection method.
5.1 Feature Selection Results
This subsection shows the results of feature selection using the Random Forest algorithm. First, the feature selection results for the entire FG-NET database are illustrated using graphs; in each graph, the most important feature is highlighted in black to distinguish it from the rest. As mentioned before, the significance of a feature is quantified by the average value of its Gini index. The results of applying the Random Forest feature selection algorithm to the FGNET-8 subset are illustrated in the second part of this section, followed by the feature selection results for the FGNET-18 and FGNET-Adult subsets.

FGNET-8 Feature Selection Results. Here we discuss the feature ranking results for the FGNET-8 subset, which represents the age group between 0 and 8 years; all sample images in this subset show subjects during their childhood stage. Feature selection results for the FGNET-8 subset are illustrated in Fig. 6.
Fig. 6. FGNET-8 Feature Selection Results using Random Forest.
Using the Random Forest feature selection algorithm, the top five features for this subset are, in order: Feature 18, Feature 20, Feature 28, Feature 1, and Feature 16. The ranking of features in the FGNET-8 subset shows that the most important feature is Feature 18, which is produced using equation (15) as follows:
F18 = TSP(Tr4, Tr1) = (Ar4 × pr1²) / (Ar1 × pr4²)    (15)
Feature 18 (F18) represents the triangle similarity proportion between triangle 4 (Tr4) and triangle 1 (Tr1), illustrated in Fig. 7.
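Given the landmark coordinates, each triangle's area and perimeter follow from elementary geometry, and the similarity proportion of equation (15) is a single ratio. A sketch with hypothetical landmark points (the real features use the annotated FG-NET focal points):

```python
import math

def perimeter(tri):
    """Perimeter of a triangle given as three (x, y) landmark points."""
    return sum(math.dist(tri[i], tri[(i + 1) % 3]) for i in range(3))

def area(tri):
    """Triangle area via the shoelace formula."""
    (x1, y1), (x2, y2), (x3, y3) = tri
    return abs(x1 * (y2 - y3) + x2 * (y3 - y1) + x3 * (y1 - y2)) / 2.0

def tsp(tri_a, tri_b):
    """Triangle similarity proportion, following the form of equation (15):
    TSP(Tr_a, Tr_b) = (Area_a · perimeter_b²) / (Area_b · perimeter_a²)."""
    return (area(tri_a) * perimeter(tri_b) ** 2) / (area(tri_b) * perimeter(tri_a) ** 2)

# For similar triangles the area/perimeter² ratio is scale-invariant, so TSP = 1.
t1 = [(0, 0), (4, 0), (0, 3)]
t2 = [(0, 0), (8, 0), (0, 6)]  # t1 scaled by a factor of 2
print(tsp(t1, t2))  # 1.0
```

Because area scales with the square of length and perimeter scales linearly, TSP is invariant to uniform scaling of both triangles, which is what makes the ratios comparable across face images of different sizes.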
Fig. 7. Feature 18 (F18) Triangles Combination.
As Fig. 7 shows, F18 incorporates the distances between the inner corners of the eyes and the tip of the nose, as well as the distances between the outer corners of the eyes and the center of the mouth. This indicates that the ratio between those two regions is very important during childhood for discriminating between one subject and another.

FGNET-18 Feature Selection Results.
In Fig. 8 the ranking of the features for the FGNET-18 subset is illustrated. The top five features are, in order: Feature 1, Feature 20, Feature 3, Feature 18, and Feature 19. The most important feature is Feature 1 (F1), which is represented by equation (16).
F1 = TSP(Tr1, Tr2) = (Ar1 × pr2²) / (Ar2 × pr1²)    (16)
Fig. 8. FGNET-18 Feature Selection Results using Random Forest.
It can be seen from Fig. 9 that the most important feature in the FGNET-18 subset includes the distances between the inner corners of the eyes and the tip of the nose, as well as the distances between the center point between the eyes and the corners of the nose. This indicates that the ratio between those two regions is very important during the teenage period for discriminating between one subject and another.
Fig. 9. Feature 1 (F1) Triangles Combination.
FGNET-Adult Feature Selection Results. As Fig. 10 shows, using the Random Forest feature selection method the top five most important features are, in order: Feature 1, Feature 11, Feature 2, Feature 13, and Feature 24. The most important feature is again Feature 1, represented by equation (16). Although the sets of top five features for the FGNET-18 and FGNET-Adult subsets differ, they share the same most important feature, namely Feature 1 (F1). This result indicates that the ratio between the triangles represented by Feature 1 is the most discriminating feature between the ages of 18 and 69.
Fig. 10. FGNET-Adult Feature Selection Result using Random Forest.
FGNET Complete Set Feature Selection Results. The feature ranking graph for the entire data set is depicted in Fig. 11; the ranking is performed using the Random Forest variable importance algorithm. The top five features are, in order: Feature 1, Feature 15, Feature 18, Feature 19, and Feature 24.
Fig. 11. FGNET Complete Set Feature Selection using Random Forest.
The most important feature across all the age groups is Feature 1 (F1), which is represented by equation (16); it is also the feature selected from the total feature set of the entire database. Since the FG-NET database spans several age groups, this means that Feature 1 is the most important feature in our proposed model across different age groups. The magnitudes of the most significant features extracted from different samples of the same subject vary; however, the rate at which the feature magnitudes change due to facial aging is constant for each subject, which is a very important factor for discriminating between the classes of different subjects. The results of applying the Random Forest feature selection method to the FG-NET subsets clearly show that, for each age group, there are specific facial features that contribute the most to discriminating between the different subjects.
5.2 Classification Results
In this section we discuss the results of evaluating the performance of the proposed system using several classifiers. Three classifiers are used in our experiments, namely KNN, Naïve Bayes, and SVM, and a number of pre-processing methods are used alternately with each classifier. The classification results are presented in terms of classification accuracy, TP rate, FP rate, Precision, Recall, and F-Measure. The results obtained when the min-max scaling method is used for pre-processing the feature vectors are illustrated in Table 1: the feature vectors are scaled using the min-max method and then passed to each of the three classifiers.
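For reference, the reported metrics follow from the confusion-matrix counts. A minimal sketch for one class treated as "positive" (WEKA reports the weighted average of these per-class values; the example labels are invented):

```python
def metrics(true, pred, positive):
    """Accuracy, TP rate, FP rate, precision, recall, and F-measure
    for one class treated as the positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(true, pred))
    fp = sum(t != positive and p == positive for t, p in zip(true, pred))
    fn = sum(t == positive and p != positive for t, p in zip(true, pred))
    tn = len(true) - tp - fp - fn
    acc = (tp + tn) / len(true)
    tpr = recall = tp / (tp + fn)      # TP rate and recall coincide
    fpr = fp / (fp + tn)
    prec = tp / (tp + fp)
    f1 = 2 * prec * recall / (prec + recall)
    return acc, tpr, fpr, prec, recall, f1

true = ["a", "a", "b", "b"]
pred = ["a", "b", "b", "b"]
print(metrics(true, pred, "a"))  # (0.75, 0.5, 0.0, 1.0, 0.5, 0.666...)
```

Note that weighted recall always equals the weighted TP rate, which is why the Recall and TP Rate columns match row by row in the tables below.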
Table 1. Classification Results of the Min-Max Scaling Method.

Classifier     Accuracy (%)  TP Rate  FP Rate  Precision  Recall  F-Measure
KNN            95.34         0.953    0.002    0.957      0.953   0.954
Naïve Bayes    86.95         0.870    0.005    0.885      0.870   0.874
SVM            56.17         0.562    0.009    0.487      0.562   0.499
The highest accuracy (95.34%) is achieved by the KNN classifier. The classification results obtained when the Z-score method is used for normalizing the feature vectors are illustrated in Table 2. The goal of this pre-processing method is to eliminate the units of measurement by transforming the data into new scores with a mean of 0 and a standard deviation of 1.
Table 2. Classification Results of the Z-Score Normalization.

Classifier     Accuracy (%)  TP Rate  FP Rate  Precision  Recall  F-Measure
KNN            97.56         0.976    0.000    0.977      0.976   0.976
Naïve Bayes    79.91         0.799    0.004    0.804      0.799   0.799
SVM            48.17         0.482    0.011    0.506      0.482   0.482
1071
1072
1072
1073
1073
1074
1074
1075 1076 1077 1078 1079
As Table 2 shows, the ranking of the classifiers' performances under Z-score normalization is the same as under min-max scaling. Again, the KNN classifier reported the best classification results: using Z-score normalization, its classification accuracy increased to above 97%. Normalizing the feature vectors using column vector normalization also yielded a good classification accuracy of above 94%, as shown in Table 3.
Table 3. Classification Results of the Column Vector Normalization.

Classifier     Accuracy (%)  TP Rate  FP Rate  Precision  Recall  F-Measure
KNN            94.26         0.943    0.019    0.944      0.943   0.942
Naïve Bayes    91.8          0.918    0.027    0.919      0.918   0.918
SVM            72.95         0.730    0.092    0.835      0.730   0.656
The classification results of the unit-length scaling pre-processing method are illustrated in Table 4. The best classification results for the SVM classifier are achieved when the feature vectors are scaled to unit length as a pre-processing step. Although the KNN classifier achieves classification accuracy above 90% with the other pre-processing methods, the SVM classifier performs best under unit-length scaling.
Table 4. Classification Results of the Unit-Length Scaling.

Classifier     Accuracy (%)  TP Rate  FP Rate  Precision  Recall  F-Measure
KNN            83.48         0.835    0.002    0.849      0.835   0.837
Naïve Bayes    76.89         0.769    0.003    0.804      0.769   0.777
SVM            86.22         0.862    0.002    0.884      0.862   0.856
The best classification results overall are reported when the feature vectors are normalized using Z-score normalization and classified with the KNN classifier. The results indicate that different patterns of data call for different pre-processing methods and different classifiers. In our model, the combination of the Z-score normalization method and the KNN classifier is adopted.
6 Conclusions and Future Work
This work proposed a novel age-invariant face recognition system based on a geometric approach. The proposed geometric features are mainly triangular regions that surround and connect the main facial features (i.e. eyes, nose, and mouth). A set of six triangular features was extracted from the sample face images by connecting particular facial focal points. Twelve focal points were used in our model, representing the corners and center points of the main facial features. The coordinates of the focal points are provided with the annotated version of the FG-NET database; these coordinates were used to calculate the area and perimeter of each triangular region using the standard Euclidean distance formula. Next, a set of mathematical relationships called Triangle Similarity Proportions (TSP) was developed to measure the degree of similarity between every two triangles. The significance of the proposed triangular features resides in their ability to capture the unique craniofacial growth rate of each subject; additionally, such geometric features show how the main facial features are correlated with each other across different age spans. A total of thirty features was created for each sample image in the database using the proposed TSP relationships. All the feature vectors belonging to the same subject were used to form a class for that subject; thus, a dataset of 82 classes was created and passed to the feature selection and classification algorithms. Feature selection was performed using the Random Forest algorithm, which produced a different set of the most significant features for each age group, indicating that there are different facial aging patterns for different age groups. The proposed system's performance was evaluated in terms of classification accuracy, TP rate, FP rate, Precision, Recall, and F-Measure, using different data pre-processing methods with different classifiers over the developed dataset. The highest classification accuracy was achieved when the Z-score normalization method was used for pre-processing the feature vectors and the KNN classifier was used for classification. The system's performance in terms of classification accuracy and processing time was satisfying and promising towards robust real-time age-invariant face recognition. The experimental results show that the proposed geometric features are unique for each subject and that there is a unique craniofacial growth rate for each individual. The results also indicate that the facial aging process affects the way the main facial features are correlated with each other.
In future work, we plan to evaluate the proposed system using different feature selection methods in conjunction with multiple classification methods. The system will also be evaluated on the largest publicly available face aging database, the MORPH face aging database, to test its ability to handle large populations.
References
1. Authors: The frobnicatable foo filter (2012). ACCV12 submission ID 512. Supplied as additional material accv12-512-frfofi.pdf.
2. Authors: Frobnication tutorial (2012). Supplied as additional material accv12-512-frtut.pdf.
3. Alpher, A.: Frobnication. J. of Foo 12 (2002) 234–778.
4. Alpher, A., Fotheringham-Smythe, J.P.N.: Frobnication revisited. J. of Foo 13 (2003) 234–778.
5. Herman, S., Fotheringham-Smythe, J.P.N., Gamow, G.: Can a machine frobnicate? J. of Foo 14 (2004) 234–778.