A Simple Geometric Model for Age-invariant Face Recognition

Anonymous ACCV 2014 submission

Paper ID ***

Abstract. Whilst facial recognition systems are vulnerable to different acquisition conditions, most notably lighting effects and pose variations, their level of vulnerability to the effects of facial aging has yet to be studied adequately. The Face Recognition Vendor Test (FRVT) 2012 annual statement estimated the deterioration in face recognition system performance due to the effect of aging. There was about 5. . .

1 Introduction

Biometric verification technology plays a significant role in the field of information privacy and security. Recently, biometric verification has gained increasing attention from researchers in the computer vision field. Biometrics comprises the techniques used to distinctly identify individuals according to some form of unique physiological or behavioral characteristics. These include fingerprints, facial recognition, palm geometry, iris identification, vein recognition, signature recognition, and voice recognition [1]. Facial recognition is considered the most ideal biometric technique [1]. It performs the authentication task based on the personal facial features of human beings. Related studies have involved face detection, facial expression recognition [2], face tracking, and identity recognition. The accuracy of facial recognition is usually affected by substantial intra-class variations due to underlying factors including age, pose, lighting, and expression. As a result, the majority of current work focuses on minimizing the effects of such variations, which could otherwise deteriorate the overall performance of facial recognition [3][4]. Among these works, the effects of facial aging on the performance of facial recognition systems have not been thoroughly studied. Hence, there is a need to develop facial recognition algorithms that are invariant to aging variations [5]. These algorithms are called age-invariant facial recognition methods. Facial aging is a compound process that changes both the texture and the shape of the facial area. Shape variations include craniofacial growth, whereas texture variations include skin coloration, lines, and/or wrinkles. Shape and texture are both considered common facial aging patterns. Since the aging process occurs throughout different ages, it can be classified into the above-mentioned aging patterns.
As a result, an age-invariant facial recognition method should account for these aging patterns [6]. The applications of age-invariant facial recognition include lost children investigations, human computer interaction, and passport photo


verification. These applications share two basic attributes: 1) a large difference in age between gallery and probe photos (photos acquired during the enrollment and authentication phases), and 2) the inability to acquire the person's facial photo in order to update the template [5]. Furthermore, real-world conditions remain challenging, mainly due to the great deal of variation in the face acquisition process. For example, criminal offense inspections and public safety organizations must frequently match a probe photo against registered photos in a database, which may exhibit significant differences in facial characteristics as a direct consequence of the presence of different age groups. Numerous attempts have been made to overcome such challenges, and research on age-invariant face recognition has increased since 2002. Among the early studies dedicated to age-invariant facial recognition from digital images is Kwon and Lobo's study [7]. In their system, they used certain facial anthropometric measurements to represent shape variations due to aging. Moreover, they addressed texture variations using the thickness of facial lines and wrinkles. Ling et al. [8], [9] investigated the effect of age variations on the performance of facial recognition during a passport picture authentication procedure. They presented a non-generative approach that extracts a face operator using gradient orientations of images at several image resolutions. Support vector machines are then used to perform age-invariant face verification. Lanitis et al. [10], [11] presented a method that performs age estimation using the Active Appearance Model (AAM). They developed a model for facial aging that incorporates both texture intensity and shape variation information. Geng et al. [12], [13] studied an aging pattern subspace based on the concept that similar faces age in similar ways. Guo et al.
[14] proposed a manifold learning scheme for facial age estimation, in which the aging facial features are extracted and adjusted locally; they used a robust regressor to predict the ages of human beings. Furthermore, Fu and Huang [15] proposed a manifold learning technique in which a low-dimensional manifold is identified from a group of facial images arranged according to age. To perform facial age estimation, they applied quadratic and linear regression functions to the low-dimensional feature vectors from the particular manifolds. Ramanathan and Chellappa [16] presented a two-step strategy to model the aging process of adults. Their model is composed of facial texture and shape variations; shape variations are modeled by developing physical layouts that characterize the capabilities of the facial muscular tissues. Drygajlo et al. [17] established the effective use of a Q-stack classifier to perform facial aging authentication. Mahalingam and Kambhamettu [18] developed a probabilistic method for age-invariant face verification, in which facial feature descriptors are extracted from the hierarchical appearance of face photos and then applied in a probabilistic model designed for verification applications. Park et al. [19], [20] introduced a facial aging simulation approach that learns shape and texture patterns of aging using Principal Component Analysis (PCA) coefficients. Suo et al. [21] proposed a dynamic model to imitate the process of facial aging. The proposed model is a facial growth model influenced by the characteristics of age and

hair. The model describes each photo using a multi-layer And-Or graph that incorporates changes of hairstyle. It also captures deformation, shape, and wrinkle variations within the facial components. Singh et al. [22] employed a facial aging approach that projects the probe and gallery facial images into a polar coordinate space, decreasing the changes in facial characteristics due to facial aging. Tiddeman et al. [23] adopted a face age simulation approach; they presented a wavelet transformation to model composite facial images. Burt and Perrett [24] proposed a face age simulation technique that uses texture and shape information to form composite facial images covering several age ranges. They evaluated and analyzed particular facial cues that are influenced by age. Ramanathan et al. [25] presented an age-difference Bayesian classifier for use in passport photo verification applications. Most of the previously proposed approaches attempt to eliminate these challenges by imitating facial aging models; however, they are still far from hands-on utilization. Our system is designed to deal with the effects of facial aging on the performance of facial recognition systems. We propose a simple, fast, yet robust geometric approach. Our work includes developing a mathematical and geometric model to maintain the degree of similarity between six triangular facial features connecting and characterizing the main facial features. The system is intended for real-time applications, such as airport identity verification systems, where the overall processing time and false positive rate are critical factors. The remainder of this paper is organized as follows: Section 2 describes the proposed face recognition geometric model. Section 3 is dedicated to the feature selection methods, data normalization methods, and classifiers used throughout the experimental part.
The results and discussion are presented in Section 4. The conclusion and future work are given in Section 5.

2 The Proposed Geometric Model

The proposed geometric model consists of several phases. A block diagram of the main phases of our proposed geometric model is illustrated in Fig. 1. First, the test image is passed to the system through a terminal (either a digital camera or a video camera, in the case of a surveillance system). Face detection is the initial phase of every facial recognition system. In our work, the well-known Viola and Jones face detector [26] is used for detecting and cropping the facial area of the sample images. Such a detector is effective for real-time applications. After detecting the facial area, a total of 12 facial feature points are localized. The FG-NET [27] database images are available with annotations of the feature points, where the coordinates of such points accompany the database. Fig. 2 shows two sample images collected from the FG-NET face aging database with annotations of the facial feature points. As Fig. 2 shows, there is a total of 64 facial feature (focal) points in each annotated sample image in the FG-NET database. Nevertheless, only 12 focal points are considered in our proposed model, as illustrated in Fig. 4. The annotated version of the FG-NET database is accompanied by a list of the focal point coordinates for each sample

image. The coordinates of the selected 12 focal points are passed to the system, and the remaining focal point coordinates of each sample image are ignored. The localized focal points are used to form 6 different triangular regions surrounding the main facial features, namely the nose, eyes, and mouth. Subsequently, the parameters of the triangular regions (i.e., perimeters and areas) are calculated using the standard formulas. Finally, the parameters of the triangles are passed to a set of developed mathematical equations to build the feature vectors for the sample images. The main phases of our proposed model are discussed in more detail in the following.

Fig. 2. Annotated Sample Images from the FG-NET Database.


2.1 Face Detection and Cropping

The facial area is extracted from the sample images using the Viola and Jones face detector [26], which follows the boosted-cascades approach. This approach employs the AdaBoost boosting method, in which simple features that discriminate between non-face and face areas in an image are combined into cascades of increasingly sophisticated classifiers. A rectangular detection window is slid across the image at multiple positions and scales, and each window is evaluated by the stages of the cascade; windows rejected by an early stage are discarded immediately, so that only promising face candidates reach the later, more expensive classifiers [26]. The detector runs on a test

image, and the score for each window is estimated by summing the classifier scores from every level of the cascade. The image window with the maximum score is assumed to be a face region in the test image. After localizing the facial area, a bounding box is created around the face. Then, automatic cropping of the face is performed using the coordinates of the bounding box provided by the face detector. A sample image from the FG-NET database before and after applying face detection and cropping is shown in Fig. 3.
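As a rough sketch of this detect-and-crop phase, the cropping step can be written as follows. The detector itself is abstracted behind a hypothetical callable (in practice this would be a Viola-Jones cascade, e.g. OpenCV's `CascadeClassifier`); the stub box coordinates in the usage example are purely illustrative.

```python
# Sketch of the detect-and-crop phase. `detect` stands in for a Viola-Jones
# cascade detector and is assumed to return (x, y, w, h) bounding boxes,
# best-scoring first; only the cropping arithmetic is real here.
def crop_face(image, detect):
    boxes = detect(image)
    if not boxes:
        return None                      # no face found in the test image
    x, y, w, h = boxes[0]                # keep the highest-scoring window
    return [row[x:x + w] for row in image[y:y + h]]

# Usage with a stub detector that reports one 2x2 box at (1, 1):
img = [[0, 1, 2, 3],
       [4, 5, 6, 7],
       [8, 9, 10, 11]]
face = crop_face(img, lambda im: [(1, 1, 2, 2)])   # -> [[5, 6], [9, 10]]
```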

Fig. 3. Face Sample Image from the FG-NET Database before and After Face Detection and Cropping.


2.2 Face Focal Points Localization and Triangular Features Extraction

Craniofacial anthropometry is the scientific discipline that encompasses cranium measurements and characteristics, facial measurements, and landmarks (facial focal points) [28]. Our work considers 12 such focal points, mainly those that represent the corners and centers of the main facial features, as illustrated in Fig. 4. Such focal points are usually localized using an Active Appearance Model (AAM) [29], which designates 64 distinct facial focal points. As mentioned before, the FG-NET database images are available with annotations of the focal points, where the coordinates of such points are provided. In the field of craniofacial anthropometry, facial focal points are given scientific notations [30]. Examples of such notations are: En (endocanthion): the inner corners of the eye fissures, at the location where the eyelids come in contact with each other. G (glabella): the most prominent point in the median sagittal plane between the supraorbital ridges. Ex (exocanthion): the outer corners of the eye fissures, at the point where the eyelids meet. Gn (gnathion): lies on the facial midline and is the lowest point on the circumference of the lower chin area.

Fig. 4. The Main Focal Points in the Proposed Model.


2.3 Calculating the Parameters of the Triangles

Once the facial focal points are localized, the triangle vertex coordinates are determined. Then, the Euclidean distances [31] [32] between the triangle


vertex coordinates are calculated in order to compute the perimeter and area of each triangle. Fig. 5 depicts the six triangular regions that represent the geometric features in our proposed model.
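As a brief illustration, this "standard formulas" step can be sketched as follows: the side lengths are Euclidean distances between the vertices, and the area follows from the shoelace formula. The sample coordinates are illustrative, not FG-NET annotations.

```python
import math

# Perimeter and area of a triangle from its three vertex coordinates:
# sides are Euclidean distances, area follows from the shoelace formula.
def triangle_parameters(p1, p2, p3):
    dist = lambda a, b: math.hypot(a[0] - b[0], a[1] - b[1])
    perimeter = dist(p1, p2) + dist(p2, p3) + dist(p3, p1)
    area = abs((p2[0] - p1[0]) * (p3[1] - p1[1])
               - (p3[0] - p1[0]) * (p2[1] - p1[1])) / 2.0
    return perimeter, area

# A 3-4-5 right triangle: perimeter 12, area 6.
P, A = triangle_parameters((0, 0), (3, 0), (3, 4))
```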

Fig. 5. The Triangular Features.

The triangle parameters are denoted Pr and Ar, which represent the perimeters and areas respectively, where the subscript r designates the particular triangle. For instance, P1 and A1 stand for the perimeter and area of triangle number 1. As a final stage, the areas and perimeters of the triangles are passed to a number of mathematical equations to create a feature vector for each sample image. These mathematical equations are discussed in the following section.


2.4 Developing the Mathematical Triangle Relationships


In computational geometry, triangles are considered similar when their shapes are similar but their dimensions are not necessarily identical [33]. This fact motivated us to develop mathematical relations between the 6 triangular regions. The human population of our planet has recently been estimated at about 7 billion; it is therefore unrealistic to carry out a direct one-to-one comparison of the triangle parameters to perform face recognition. Instead, we propose using a set of triangle proportion ratios, for which a set of 15 mathematical equations is developed. These equations are used to estimate the degree of similarity between one triangle and another. Two triangles are considered similar if the geometric relationship in equation (1) is fulfilled:

$$\frac{Ar_i}{Ar_j} = \frac{pr_i^2}{pr_j^2} \quad (1)$$

where Ar and pr are the triangle areas and perimeters respectively, and i and j are the designations of the two triangles. To quantify the degree of similarity between every two triangles, the triangle similarity proportion (TSP) relationship in equation (2) is used, where i and j are positive integers representing the triangle indexes (i, j = 1, …, 6):

$$TSP = \frac{Ar_i \times pr_j^2}{Ar_j \times pr_i^2} \quad (2)$$

A statistical analysis was carried out on the data collected from the dataset, which consists of the feature vectors created using our proposed geometric model. The results of the statistical analysis revealed a considerable distinction between the data of different subjects. The similarity proportion between every two triangles is calculated using the TSP relationship. For example, the TSP between triangle 1 and triangle 2 can be written as follows:

$$TSP(T_1, T_2) = \frac{Ar_1 \times pr_2^2}{Ar_2 \times pr_1^2} \quad (3)$$

A set of thirty TSP relationships is calculated for each of the sample images collected from the FG-NET database. The output of the TSP relationships is used to build a row feature vector in the dataset, where each row vector corresponds to one sample image. The set of row feature vectors related to the same subject is used to build a class for that particular subject. Since the FG-NET database consists of 82 subjects, a dataset of 82 classes is created.
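A sketch of how one row feature vector could be assembled from the six triangles' parameters via the TSP of equation (2): the ordered pairs (i, j) with i ≠ j give the thirty values per sample. The numeric parameters below are illustrative, not real FG-NET measurements.

```python
# TSP of equation (2): (Ar_i * pr_j^2) / (Ar_j * pr_i^2).
def tsp(area_i, perim_i, area_j, perim_j):
    return (area_i * perim_j ** 2) / (area_j * perim_i ** 2)

# One row feature vector: TSP over all ordered pairs of distinct triangles.
def feature_vector(areas, perims):
    n = len(areas)                        # six triangles in the model
    return [tsp(areas[i], perims[i], areas[j], perims[j])
            for i in range(n) for j in range(n) if i != j]

areas  = [6.0, 8.0, 5.0, 7.5, 4.0, 9.0]      # illustrative triangle areas
perims = [12.0, 14.0, 11.0, 13.0, 10.0, 15.0]
vec = feature_vector(areas, perims)           # 6 * 5 = 30 TSP values
```

Note that the TSP of a triangle with itself is 1, which is why only distinct pairs are used.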


3 Feature Selection, Classification, and Data Normalization Techniques


In this section, we first discuss the main concept of data pre-processing and its applications; data pre-processing is a first step in several data mining applications [34]. A number of data pre-processing techniques, including normalization, scaling, and standardization, are discussed. Since the set of the most important features in any model can be determined using feature selection algorithms, we discuss the concept of feature selection, and in particular a regularized-trees feature selection approach known as Random Forest feature selection. Finally, a number of common classification algorithms are discussed, namely K-Nearest Neighbor, Naive Bayes, and Support Vector Machines (SVM).


3.1 Data Normalization and Standardization

Data normalization usually attempts to give all the features the same weight. Normalization is ideal for classification algorithms, including neural networks, as well as distance-based methods such as nearest-neighbor classification and clustering. In distance-based classification methods, normalization helps prevent features with large ranges from dominating features with smaller ranges. Additionally, normalization is beneficial when there is no prior knowledge of the data [35]. Typically, data sets need to be normalized, or sometimes standardized, to bring all the variables into proportion with each other. The most popular techniques for feature normalization are [36]: basic rescaling, and feature standardization (zero mean and unit variance for every single feature in the dataset).

Basic Rescaling. With basic rescaling, the main objective is to rescale the data along each data dimension so that the final data vectors lie within the range [0, 1] or [-1, 1] (depending on the dataset). This is often ideal for later processing, since several default parameters (e.g., epsilon in PCA whitening) treat the data as if it has been scaled to some standard range [37].

Feature Standardization. Feature standardization is the process of setting each dimension of the data (independently) to have zero mean and unit variance. This is the most popular approach to data normalization and is often used by supervised learning algorithms (e.g., with Support Vector Machines (SVMs), feature standardization is usually recommended as an effective preprocessing stage) [38]. In practice, feature standardization requires calculating the mean of each dimension (across the dataset) and subtracting it from each data element in that dimension. Subsequently, each dimension is divided by its standard deviation [39].
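The feature standardization procedure just described can be sketched in a few lines of plain Python; the column values below are illustrative rather than real feature data.

```python
# Per-dimension feature standardization: subtract each column's mean,
# then divide by its standard deviation (zero mean, unit variance).
def standardize_columns(columns):
    out = []
    for col in columns:
        mean = sum(col) / len(col)
        std = (sum((x - mean) ** 2 for x in col) / len(col)) ** 0.5
        out.append([(x - mean) / std for x in col])
    return out

cols = standardize_columns([[1.0, 2.0, 3.0], [10.0, 20.0, 60.0]])
```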

3.2 Z-score Normalization

It is often useful to measure how far, in terms of standard deviations, a data element deviates from the mean. This approach is known as z-score normalization [40]. Although many datasets have a relatively normal distribution,


it is a useful approach for comparing data elements coming from different populations that may have different means and standard deviations [41]. The z-score offers a useful measurement for comparing data elements from different data sets. The formula for calculating the z-score is as follows:

$$z = \frac{\chi - \mu}{s} \quad (4)$$

where z is the z-score, χ is the data element, μ is the mean of the data set, and s is the standard deviation [38].
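A small worked example of equation (4); the data set is illustrative (mean 5, standard deviation 2).

```python
# z-score of equation (4): how many standard deviations x lies from the mean.
def z_score(x, data):
    mu = sum(data) / len(data)
    s = (sum((v - mu) ** 2 for v in data) / len(data)) ** 0.5
    return (x - mu) / s

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]   # mean 5, std 2
z = z_score(9.0, data)                            # (9 - 5) / 2 = 2.0
```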

3.3 Min-Max Scaling

The simplest feature scaling approach is to rescale the data to the range [0, 1] or [-1, 1]. This approach is used to ensure that features related to different classes remain distinct. The target range is usually specified based on the characteristics of the data. The standard formula for min-max scaling is given in equation (5):

$$\chi' = \frac{\chi - \min(\chi)}{\max(\chi) - \min(\chi)} \quad (5)$$

where χ is the original data and χ' is the normalized data [38].

3.4 Column Vector Normalization

Column vector normalization divides the square of the sum of all elements in a dataset column by the sum of the squares of the elements [38], as illustrated in equation (6):

$$\chi'_{ji} = \frac{\left(\sum_{i=1}^{n} \chi_{ji}\right)^2}{\sum_{i=1}^{n} \chi_{ji}^2} \quad (6)$$

where χji is a column element before normalization, χ'ji is a column element after normalization, j = 1, …, m indexes the dataset columns, m is the number of columns in the dataset, i = 1, …, n indexes the elements within a column, and n is the total number of elements in each column.

3.5 Scaling to Unit Length

An alternative data scaling approach commonly used in machine learning is to scale the elements of a feature vector so that the vector has a length of one. This requires dividing each element by the Euclidean length of the vector. In some applications (e.g., histogram features) it may be more effective to use the L1 norm (i.e., the Manhattan or city-block norm) of the feature vector [38]:

$$\chi' = \frac{\chi}{|\chi|} \quad (7)$$

where χ is the original vector and χ' is the normalized vector.
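Two of the scaling formulas above, min-max rescaling to [0, 1] (equation (5)) and unit-length scaling by the Euclidean (L2) norm (equation (7)), can be sketched as follows; the sample values are illustrative.

```python
# Min-max rescaling of equation (5): map the data onto [0, 1].
def min_max(data):
    lo, hi = min(data), max(data)
    return [(x - lo) / (hi - lo) for x in data]

# Unit-length scaling of equation (7): divide by the Euclidean norm.
def unit_length(vec):
    norm = sum(x * x for x in vec) ** 0.5
    return [x / norm for x in vec]

scaled = min_max([2.0, 4.0, 6.0, 10.0])   # -> [0.0, 0.25, 0.5, 1.0]
unit = unit_length([3.0, 4.0])            # -> [0.6, 0.8]
```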


3.6 Feature Selection


In statistics and machine learning, feature selection, also known as attribute selection, variable selection, or variable subset selection, is defined as the approach of deciding on a subset of appropriate features to be used in model construction. When using any feature selection method, the main assumption is that the data include many redundant or insignificant features; insignificant features are those that offer little information compared to the selected features. Feature selection methods are often associated with feature extraction methods. Typically, feature extraction generates a full set of features using functions, whilst feature selection generates a subset of the features [42]. The features generated using a decision tree, or an ensemble of trees, are known to be redundant. A recent effective technique known as regularized trees [43] can be used for feature selection. A regularized tree prefers, when splitting the current node, variables from the same set already used by the prior tree nodes. Regularized trees need to construct only a single tree model (or a single tree ensemble model), and thus tend to be computationally effective [43]. Regularized trees can handle numerical and categorical features, nonlinearities, and interactions. They are invariant to the scales (units) of the attributes and insensitive to outliers; therefore, they require only minor data pre-processing, such as normalization [43]. The Regularized Random Forest (RRF) [43] is one form of regularized trees. Because of the interesting capabilities of Random Forest as a feature selection method, we decided to use it for feature selection in our experiments. The Random Forest feature selection algorithm [44] is discussed in detail in the subsequent section.


3.7 Random Forest Feature Selection

Random forest is a classification method with outstanding efficiency, first introduced by Leo Breiman [44]. From a practical point of view, it can be defined as a classifier that combines a collection of classification trees in an integrated scheme to quantify the significance of every feature [45]. Random forest can be used when the number of variables is higher than the number of observations, and also when there are more than two classes; it produces measures of the importance of the variables. The set of classification trees is constructed by means of bootstrap samples of the data. Additionally, at every node split, the list of candidate predictor variables is an arbitrary subset of the variables. For tree construction, Random Forest makes use of both bagging (bootstrap aggregation, an effective method for merging unstable learners [46]) and randomly selected variables. All the constructed trees are left unpruned in order to obtain low-bias trees. By doing so, random variable selection and bagging lead to low correlation between the individual trees. The algorithm thus builds an ensemble that attains both low bias and low variance (by averaging across a large collection of low-bias, high-variance, and minimal


correlation trees). Random forest offers outstanding capabilities for classification compared to support vector machines. It offers a number of features which makes it appropriate for the following conditions: It is usually used when there are more attributes compared to observations. It works effectively in both multiclass and also two-class conditions. It offers useful predictive capabilities even if the majority of predicted features is noise. Thus, it does not require preselecting the features [46]. Does not over-fit. It can deal with a combination of continuous and categorical predictors. Feature interactions among the predictor variables. Typically, the outcome is invariant towards monotonic changes associated with the predictors. Delivers measures related to variables importance.
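The properties above can be sketched with scikit-learn's Random Forest implementation. This is an illustrative sketch on synthetic data; the experiments in this paper use the WEKA/R implementations, and the API names below belong to scikit-learn, not to our system:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for our 30 geometric features: only a few
# columns are informative, the rest are redundant or pure noise.
X, y = make_classification(n_samples=300, n_features=30,
                           n_informative=5, n_redundant=5,
                           random_state=0)

# Bagging plus a random subset of variables at each split;
# trees are left unpruned by default (low bias).
forest = RandomForestClassifier(n_estimators=200, random_state=0)
forest.fit(X, y)

# Mean decrease in Gini impurity, one normalized score per feature.
importances = forest.feature_importances_
top5 = np.argsort(importances)[::-1][:5]   # five most important features
```

Ranking `importances` and keeping the top-scoring columns is exactly the subset-selection step described above.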


3.8

Definition of the Random Forest Feature Importance Measures

The first measure of feature importance is calculated by permuting the Out-Of-Bag (OOB) data. For each tree, the prediction error on the OOB portion of the data is recorded; the same procedure is then repeated after permuting each predictor variable. The difference between the recorded prediction errors before and after permuting a predictor variable is averaged over all trees and normalized by the standard deviation of the recorded differences. Whenever the standard deviation of the differences is zero for a variable, the normalization is not carried out. The other measure is the overall reduction in node impurity that results from splitting on the variable, averaged over all trees. For classification, Random Forest calculates node impurity using the Gini index; for regression, it is calculated as the residual sum of squares [47]. The output of the Random Forest feature selection algorithm is often visualized by the "Gini importance" [48], which can also be used as an indicator of feature relevance. This feature importance score offers a relative ranking of the set of features. At every node τ of a binary tree T in the random forest, the best split is searched for using the Gini impurity i(τ), a computationally efficient approximation of the entropy, which measures how well a potential split separates the samples of the two classes in that node. Let

ρk = ηk / η    (8)

denote the fraction of samples of class k = 0, 1 out of the total of η samples at node τ. The Gini impurity is then calculated as

i(τ) = 1 − ρ1² − ρ0²    (9)

Splitting node τ on a threshold tθ of a variable θ sends the samples to two sub-nodes τl and τr, with sample fractions ρl = ηl/η and ρr = ηr/η, and decreases the Gini impurity by

Δi(τ) = i(τ) − ρl i(τl) − ρr i(τr)    (10)
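The permutation measure can be approximated as follows. Note that scikit-learn's `permutation_importance` shuffles columns on a held-out set rather than on the OOB samples, so this is an illustrative variant of the OOB procedure described above, on synthetic data:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=400, n_features=10,
                           n_informative=4, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_tr, y_tr)

# Shuffle each predictor column in turn and record the drop in
# accuracy; the mean drop over repeats is the importance score.
result = permutation_importance(forest, X_te, y_te,
                                n_repeats=10, random_state=0)
scores = result.importances_mean       # one score per feature
```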


During an extensive search over all variables θ available at each node (random forest usually limits this search to a random subset of the available features [48]), and over all possible thresholds tθ, the pair (θ, tθ) that leads to the highest value of Δi is determined. The reduction in Gini impurity due to this optimal split is recorded and accumulated for each variable independently at every node τ within each tree T. The quantity represented by equation (11), known as the Gini importance IG, indicates how frequently a specific feature θ has been selected for a split as well as how discriminative its value is:

IG(θ) = Σ_T Σ_τ Δiθ(τ, T)    (11)
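Equations (8)-(10) can be checked with a few lines of Python; this is a toy sketch of the impurity arithmetic, not the forest's internal implementation:

```python
def gini(labels):
    """Gini impurity i(tau) = 1 - rho_1^2 - rho_0^2, eqs. (8)-(9)."""
    n = len(labels)
    rho1 = sum(labels) / n     # fraction of class-1 samples at the node
    rho0 = 1.0 - rho1
    return 1.0 - rho1 ** 2 - rho0 ** 2

def gini_decrease(parent, left, right):
    """Impurity decrease of a split, eq. (10)."""
    n = len(parent)
    return (gini(parent)
            - len(left) / n * gini(left)
            - len(right) / n * gini(right))

# A perfectly mixed node (impurity 0.5) split into two pure children
# loses all of its impurity, so the decrease is 0.5.
drop = gini_decrease([0, 0, 1, 1], [0, 0], [1, 1])
```

Accumulating such decreases per variable over every node of every tree yields the Gini importance of equation (11).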

3.9

Classifiers


Classification is the task of deciding to which of a set of classes (sub-populations) a new observation belongs, given a training set of data whose class memberships are known. Typically, the individual observations are converted into a set of quantifiable attributes. Such attributes may be ordinal (e.g. "large", "medium", or "small"), categorical (e.g. "A", "B", "AB", or "O" for blood type), integer-valued (e.g. the number of occurrences of a particular word in an email), or real-valued (e.g. a blood pressure measurement) [49]. An algorithm that performs classification, especially in a concrete implementation, is called a classifier. The term classifier sometimes also refers to the mathematical function, implemented by a classification algorithm, that maps input data to a class. In machine learning, classification is considered an example of supervised learning, i.e. learning from a training set of correctly classified observations. The corresponding unsupervised procedure is known as clustering (or cluster analysis), which groups data into classes according to some measure of inherent similarity (e.g. the distance between instances viewed as vectors in a multi-dimensional vector space) [49]. In statistics, classification is usually carried out using logistic regression or a similar procedure; the attributes of the observations are called explanatory variables (or regressors, independent variables, etc.), and the classes being predicted are referred to as outcomes, the possible values of the dependent variable. In machine learning, the observations are usually referred to as instances, the explanatory variables are called features (grouped into a feature vector), and the potential categories being predicted are called classes [50].
Bayesian Classifiers Bayesian classification techniques offer a natural way of incorporating any available information about the relative sizes of the different sub-populations within the overall population [50]. Bayesian methods are typically computationally expensive, and approximations to Bayesian clustering rules were developed before Markov chain Monte Carlo computations came into use. Some Bayesian techniques compute group membership probabilities; such methods offer a more descriptive outcome than a simple classification limited to giving one group label for each new observation [50].

Binary and Multiclass Classifiers There are two modes of classification: binary classification and multiclass classification. In binary classification, the better-studied task, only two classes are involved, whilst multiclass classification consists of assigning an object to one of several classes [51]. Since many classification techniques have been developed specifically for binary classification, multiclass classification usually requires combining several binary classifiers.

Linear Classifiers A wide group of classification algorithms can be expressed in terms of a linear function that assigns a score to each potential category k. This is accomplished by combining the feature vector of an instance with a vector of weights, using a dot product; the predicted category is the one with the maximum score. This type of score function is known as a linear predictor function and has the following general form:

score(Xi, k) = βk · Xi    (12)


Examples of such algorithms are linear discriminant analysis (LDA) and support vector machines (SVM). A selected set of classifiers, each representing one of the above-mentioned classification approaches, is discussed briefly below and used to perform classification in the experimental part: the K-Nearest Neighbor classifier, the Naïve Bayes classifier, and Support Vector Machines (SVM) represent binary/multiclass classifiers, Bayesian classifiers, and linear classifiers, respectively [52].
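Equation (12) amounts to one dot product per class followed by an argmax. A minimal sketch, with weight values made up purely for illustration:

```python
import numpy as np

# One weight vector beta_k per class; these values are hypothetical.
betas = np.array([[ 0.5, -1.0,  0.2],   # class 0
                  [-0.3,  0.8,  0.1],   # class 1
                  [ 0.0,  0.1, -0.4]])  # class 2

def predict(x):
    """score(x, k) = beta_k . x for every k; return the argmax class."""
    scores = betas @ x          # eq. (12), one dot product per class
    return int(np.argmax(scores))

label = predict(np.array([1.0, 2.0, 0.0]))   # scores: -1.5, 1.3, 0.2
```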


3.10

K-Nearest Neighbor Classifier


In the field of pattern recognition, the k-nearest neighbor (k-NN) algorithm is a non-parametric technique for classifying objects based on the nearest training examples in the feature space. k-NN is a form of example-based learning, or lazy learning, in which the function is only approximated locally and all computation is deferred until classification. The k-nearest neighbor technique is considered the simplest of all machine learning algorithms: an object is classified by a majority vote among its neighbors, where k is a small positive integer. When k equals one, the object is simply assigned to the class of its nearest neighbor. The same technique can be used for regression by setting the property value of the object to the mean of the values of its k nearest neighbors. It is often useful to weight the contributions of the neighbors, so that nearer neighbors contribute more to the mean than more distant ones [53].
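A majority-vote k-NN classifier fits in a few lines. This is a self-contained sketch on toy data; the experiments use WEKA's implementation:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    """Classify x by majority vote among its k nearest training points."""
    dists = np.linalg.norm(X_train - x, axis=1)   # Euclidean distances
    nearest = np.argsort(dists)[:k]               # indices of k neighbours
    votes = Counter(int(y_train[i]) for i in nearest)
    return votes.most_common(1)[0][0]

# Two tiny clusters; a query point near the first cluster inherits
# its label through the vote.
X = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.2, 4.9]])
y = np.array([0, 0, 1, 1])
pred = knn_predict(X, y, np.array([0.2, 0.1]), k=3)
```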


3.11


Naïve Bayes Classifier


The Naïve Bayes classification algorithm is an easy-to-use probabilistic classification algorithm based on Bayes' rule. Bayes' theorem offers a way of calculating the posterior probability Pr(br | χr) from Pr(br), Pr(χr), and Pr(χr | br). The Naïve Bayes classification algorithm presumes that the effect of the value of a particular predictor χr on a given class is independent of the values of all the other predictors. This assumption is known as class conditional independence [54].

Pr(br | χr) = Pr(χr | br) Pr(br) / Pr(χr)    (13)

Pr(br | χr1, ..., χrn) ∝ Pr(χr1 | br) × Pr(χr2 | br) × ... × Pr(χrn | br) × Pr(br)    (14)
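Equations (13) and (14) in action, using scikit-learn's Gaussian Naïve Bayes on toy data; this is an illustrative sketch, not the WEKA classifier used in the experiments:

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Two well-separated 1-D classes; GaussianNB models Pr(chi | b) for
# each class and applies Bayes' rule under the class conditional
# independence assumption.
X = np.array([[0.0], [0.2], [0.1], [5.0], [5.1], [4.9]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = GaussianNB().fit(X, y)
pred = clf.predict([[0.15]])[0]             # query near the class-0 cluster
posterior = clf.predict_proba([[0.15]])[0]  # Pr(b | chi), sums to 1
```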

3.12

Support Vector Machines (SVM)


In the field of machine learning, Support Vector Machines (SVMs, also known as support vector networks) are supervised learning models and algorithms that analyze data and recognize patterns, used for classification as well as regression analysis. The standard SVM takes a set of input data and predicts which of two possible classes forms the output [55]; this makes it a non-probabilistic binary linear classifier. Given a set of training examples, each labelled as belonging to one of two groups, an SVM training algorithm builds a model that assigns new examples to one of the two classes. An SVM model is a representation of the examples as points in a space, mapped so that the examples of the separate classes are divided by a gap that is as wide as possible. New examples are then mapped into the same space and predicted to belong to a class based on which side of the gap they fall on. In addition to performing linear classification, SVMs can efficiently perform non-linear classification using what is known as the kernel trick, implicitly mapping their inputs into high-dimensional feature spaces [55].
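The kernel trick can be demonstrated on the classic XOR pattern, which no linear separator can handle but an RBF-kernel SVM solves easily. A scikit-learn sketch with illustrative hyper-parameters:

```python
import numpy as np
from sklearn.svm import SVC

# XOR-like data: the two classes are not linearly separable in the
# input space, but the RBF kernel implicitly maps the points into a
# high-dimensional space where a separating hyperplane exists.
X = np.array([[0, 0], [1, 1], [0, 1], [1, 0]], dtype=float)
y = np.array([0, 0, 1, 1])

clf = SVC(kernel="rbf", gamma=2.0, C=10.0).fit(X, y)
preds = clf.predict(X)        # separates all four points correctly
```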

4

Experiments


The experiments are conducted on the publicly available FG-NET face aging database [29]. It contains a total of 1,002 color and grayscale high-resolution images of eighty-two subjects of several ethnicities, with a considerable variety of expression, pose, and lighting conditions. The image size is roughly 400x500 pixels, the age range is 0-69 years, and on average there are 12 images per subject. For classification we used WEKA's built-in classifiers (WEKA is a collection of machine learning algorithms for data mining tasks) [56]. The experiments are conducted in two phases. In the first phase we investigate the significance of the proposed geometric features in different age groups; the main goal of this part of the experiments is to show which features contribute the most in each age group. For this purpose the FG-NET database is divided into three subsets, representing the childhood, teenage, and adulthood age groups, respectively. In the second part of the experiments, a number of classification algorithms, namely the K-Nearest Neighbor (KNN), Naïve Bayes, and Support Vector Machines (SVM) classifiers, are used to estimate and validate the performance of the system. In the next section the feature subsets produced by the Random Forest feature selection are illustrated, followed by the classification results.

4.1

Feature Selection Experiments


The FG-NET database is divided into three subsets based on the protocol adopted in reference [53] for validating face recognition systems on the FG-NET database, as follows. FGNET-8 consists of all the data collected at ages 0 to 8; it includes 290 facial images from 74 subjects, from which 580 intra-person pairs and 6,000 inter-person pairs are randomly generated for verification. FGNET-18 consists of all the data collected at ages 8 to 18; it includes 311 facial images from 79 subjects, from which 577 intra-person pairs and 6,000 inter-person pairs are randomly generated for verification. FGNET-Adult consists of all the data collected at ages 18 to 69, mainly frontal views; it includes 272 images from 62 subjects, from which 665 intra-person pairs and about 6,000 inter-person pairs are randomly generated for verification. This protocol for dividing the FG-NET database is used to determine which features contribute the most to discriminating between the subject classes in each age group. To accomplish this, the Random Forest algorithm is used to quantify the importance of each feature and to select a subset of the most significant features among the thirty extracted features for each age group.

4.2

Classification Experiments

In this part of the experiments, classification is conducted using three classifiers, and the ability of each classifier to discriminate between the classes (which are built from the proposed geometric features) is evaluated. In pattern recognition such a procedure is useful for deciding which classifier is the most appropriate for particular data patterns [54]. As mentioned before, classification is conducted on the WEKA platform using three supervised learning classification algorithms, namely K-Nearest Neighbor (KNN), Naïve Bayes, and Support Vector Machines (SVM). The performance of the three classifiers is evaluated in terms of classification accuracy, True Positive (TP) rate, False Positive (FP) rate, Precision, Recall, and F-Measure. This part of the experiments is conducted in four phases, each using a different data pre-processing method. In the first phase, the feature vectors are scaled using the min-max method. In the second phase, the data are normalized using a Z-score method, where the normalization is applied using the global standard deviation and mean of the whole data set. Alternatively, in the third phase the Z-score normalization is performed over each column independently, normalizing each column of the data set using its individual standard deviation and mean. Finally, in the fourth phase the feature vectors are scaled using the unit length rule. The different pre-processing techniques are considered in order to evaluate their effect on the classification accuracy of each of the three classifiers.
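The four pre-processing phases can be written directly in NumPy; a sketch on a toy matrix, with feature vectors as rows:

```python
import numpy as np

X = np.array([[1.0, 200.0],
              [2.0, 300.0],
              [3.0, 400.0]])   # toy feature vectors, one per row

# Phase 1: min-max scaling of each column to [0, 1].
minmax = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

# Phase 2: global Z-score, using the mean/std of the whole data set.
z_global = (X - X.mean()) / X.std()

# Phase 3: column-wise Z-score, each column with its own mean/std.
z_col = (X - X.mean(axis=0)) / X.std(axis=0)

# Phase 4: unit-length scaling, each feature vector divided by its norm.
unit = X / np.linalg.norm(X, axis=1, keepdims=True)
```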


5

Results and Discussion

This section introduces the results of applying the Random Forest feature selection algorithm to the three subsets of the FG-NET database. The importance of the thirty features extracted from each sample image is quantified using the Gini index: the higher the numerical value of the Gini index, the greater the significance of the feature. This measure is important because it shows how much each feature contributes to discriminating between the dataset classes. Graphs showing the average Gini index of each feature in the subsets are presented; the Y axis represents the average Gini index value of each feature, and the X axis represents the set of features, designated Fi, where the feature index i is a positive number with i = 1 ... 30 (for example, F1 represents the first feature in the data set). Next, the classification results over the entire FG-NET database using the different pre-processing methods are presented and discussed, reporting the results of combining each pre-processing method with each classifier. Finally, the classification results for each FG-NET subset are given in terms of classification accuracy and False Positive (FP) rate. The data are pre-processed using Z-score normalization, since it yielded the best results compared with the other pre-processing methods, and feature selection is performed on the normalized data using the Random Forest feature selection method.


5.1

Feature Selection Results

This subsection presents the results of feature selection using the Random Forest algorithm. First, the feature selection results for the entire FG-NET database are illustrated using graphs, in which the most important feature is highlighted in black to distinguish it from the rest. As mentioned before, the significance of a feature is quantified by its average Gini index value. The results of applying the Random Forest feature selection algorithm to the FGNET-8 subset are illustrated in the second part of this section, followed by the feature selection results for the FGNET-18 and FGNET-Adult subsets, respectively.

FGNET-8 Feature Selection Results Here we discuss the feature ranking results for the FGNET-8 subset. This subset represents the age group between 0 and 8 years; all the sample images in this subset show subjects in their childhood stage. The feature selection results for the FGNET-8 subset are illustrated in Fig. 6.


Fig. 6. FGNET-8 Feature Selection Results using Random Forest.


Using the Random Forest feature selection algorithm, the top five features for this subset are, in order: Feature 18, Feature 20, Feature 28, Feature 1, and Feature 16. The ranking of features in the FGNET-8 subset shows that the most important feature is Feature 18, which is produced using equation (15) as follows:

F18 = TSP(Tr4, Tr1) = (Ar4 × pr1²) / (Ar1 × pr4²)    (15)
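A TSP feature of this form can be computed directly from landmark coordinates; the triangle vertices below are hypothetical, not actual FG-NET annotation points. Because area scales with the square of length while perimeter scales linearly, the ratio is scale-invariant: similar triangles always give a TSP of 1.

```python
import math

def area(a, b, c):
    """Triangle area from vertex coordinates (shoelace formula)."""
    return abs((b[0]-a[0]) * (c[1]-a[1]) - (c[0]-a[0]) * (b[1]-a[1])) / 2.0

def perimeter(a, b, c):
    """Sum of the three Euclidean side lengths."""
    return math.dist(a, b) + math.dist(b, c) + math.dist(c, a)

def tsp(tri_a, tri_b):
    """Triangle similarity proportion (Ar_a * pr_b^2) / (Ar_b * pr_a^2)."""
    return (area(*tri_a) * perimeter(*tri_b) ** 2) / \
           (area(*tri_b) * perimeter(*tri_a) ** 2)

tri_small = [(0, 0), (4, 0), (0, 3)]   # hypothetical landmark triangle
tri_big = [(0, 0), (8, 0), (0, 6)]     # the same triangle scaled by 2
ratio = tsp(tri_big, tri_small)        # scale cancels out
```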

889 890

891

891

892

892

893

893

894

894

895

895

896

896 897

897 898 899

Feature18 (F18 ) represents the triangle similarity proportion between triangle 4 (T r4 ) and triangle 1 (T r1 ) illustrated in Fig. 7.


Fig. 7. Feature 18 (F18 ) Triangles Combination.


As Fig. 7 shows, F18 incorporates the distances between the inner corners of the eyes and the tip of the nose, as well as the distances between the outer corners of the eyes and the center of the mouth. This indicates that the ratio between those two regions is very important during childhood for discriminating between one subject and another.

FGNET-18 Feature Selection Results In Fig. 8 the ranking of the features for the FGNET-18 subset is illustrated. The top five features are, in order: Feature 1, Feature 20, Feature 3, Feature 18, and Feature 19. The most important feature is Feature 1 (F1), which is represented by equation (16).


F1 = TSP(Tr1, Tr2) = (Ar1 × pr2²) / (Ar2 × pr1²)    (16)


Fig. 8. FGNET-18 Feature Selection Results using Random Forest.


It can be seen from Fig. 9 that the most important feature in the FGNET-18 subset includes the distances between the inner corners of the eyes and the tip of the nose. It also includes the distances between the center point between the eyes and the corners of the nose. This indicates that the ratio between those two regions is very important during the teenage period for discriminating between one subject and another.


Fig. 9. Feature 1 (F1 ) triangles combination.


FGNET-Adult Feature Selection Results As Fig. 10 shows, using the Random Forest feature selection method the top five most important features are, in order: Feature 1, Feature 11, Feature 2, Feature 13, and Feature 24. The most important feature is again Feature 1, which is represented by equation (16). Although the sets of top five features for the FGNET-18 and FGNET-Adult subsets differ, they share the same most important feature, namely Feature 1 (F1). This result indicates that the ratio between the triangles represented by Feature 1 is the most discriminating feature between the ages of 18 and 69.


Fig. 10. FGNET-Adult Feature Selection Result using Random Forest.


FGNET Complete Set Feature Selection Results The feature ranking graph for the entire data set is depicted in Fig. 11. The feature ranking is performed using the Random Forest variable importance algorithm, and the top five features are, in order: Feature 1, Feature 15, Feature 18, Feature 19, and Feature 24.


Fig. 11. FGNET Complete Set Feature Selection using Random Forest.


The most important feature across all the age groups is Feature 1 (F1), which is represented by equation (16) and is selected from the total set of features of the entire database. Since the FG-NET database covers several age groups, this means that Feature 1 is the most important feature in our proposed model across different age groups. The magnitudes of the most significant features extracted from different samples of the same subject vary; however, the rate at which the feature magnitudes change due to facial aging is constant for each subject, which is a very important factor for discriminating between the classes of different subjects. The results of applying the Random Forest feature selection method to the FG-NET subsets clearly show that for each age group there are specific facial features that contribute the most to discriminating between the different subjects.



5.2


Classification Results


In this section we discuss the results of evaluating the performance of the proposed system using several classifiers. Three classifiers are used in our experiments, namely KNN, Naïve Bayes, and SVM, and a number of pre-processing methods are used alternately with each classifier. The classification results are presented in terms of classification accuracy, TP rate, FP rate, Precision, Recall, and F-Measure. The results of using the min-max scaling method for pre-processing the feature vectors are illustrated in Table 1: the feature vectors are scaled using the min-max method and then passed to each of the three classifiers.


Table 1. Classification Results of the Min-Max Scaling Method.

Classifier     Accuracy (%)  TP Rate  FP Rate  Precision  Recall  F-Measure
KNN            95.34         0.953    0.002    0.957      0.953   0.954
Naïve Bayes    86.95         0.870    0.005    0.885      0.870   0.874
SVM            56.17         0.562    0.009    0.487      0.562   0.499


The highest accuracy (95.34%) is achieved by the KNN classifier. The classification results obtained when the Z-score method is used for normalizing the feature vectors are illustrated in Table 2. The goal of this pre-processing method is to eliminate the units of measurement by transforming the data into new scores with a mean of 0 and a standard deviation of 1.


Table 2. Classification Results of the Z-Score Normalization.

Classifier     Accuracy (%)  TP Rate  FP Rate  Precision  Recall  F-Measure
KNN            97.56         0.976    0.000    0.977      0.976   0.976
Naïve Bayes    79.91         0.799    0.004    0.804      0.799   0.799
SVM            48.17         0.482    0.011    0.506      0.482   0.482


As Table 2 shows, the ranking of the classifiers' performance under Z-score normalization is the same as under min-max scaling. Again, the KNN classifier reported the best classification results; using the Z-score normalization, its classification accuracy increased to above 97%. Normalizing the feature vectors using column vector normalization also yielded a good classification accuracy of above 94%, as shown in Table 3.


Table 3. Classification Results of the Column Vector Normalization.

Classifier     Accuracy (%)  TP Rate  FP Rate  Precision  Recall  F-Measure
KNN            94.26         0.943    0.019    0.944      0.943   0.942
Naïve Bayes    91.80         0.918    0.027    0.919      0.918   0.918
SVM            72.95         0.730    0.092    0.835      0.730   0.656


The classification results of the unit length scaling pre-processing method are illustrated in Table 4. The best classification results for the SVM classifier are achieved when the feature vectors are scaled to unit length as a pre-processing step. Although the KNN classifier is able to achieve a classification accuracy of more than 90% with the other pre-processing methods, its accuracy drops to 83.48% under unit length scaling.

Table 4. Classification Results of the Unit Length Scaling.

Classifier     Accuracy (%)  TP Rate  FP Rate  Precision  Recall  F-Measure
KNN            83.48         0.835    0.002    0.849      0.835   0.837
Naïve Bayes    76.89         0.769    0.003    0.804      0.769   0.777
SVM            86.22         0.862    0.002    0.884      0.862   0.856


1106 1107 1108 1109 1110 1111 1112

The best classification results are reported when the feature vectors are normalized using Z-score normalization and classified using the KNN classifier. The results indicate that for different patterns of data, there are different preprocessing methods and different classifiers that are most appropriate for those patterns of data. In our model a combination of Z-score normalization method and KNN classifier is adopted.

6

Conclusions and Future Work


This work proposed a novel age-invariant face recognition system based on a geometric approach. The proposed geometric features are mainly triangular regions that surround and connect the main facial features (i.e., eyes, nose, and mouth). A set of six triangular features was extracted from the sample face images by connecting particular facial focal points. Twelve focal points were used in our model; they represent the corners and center points of the main facial features. The coordinates of the focal points are attached to the annotated version of the FG-NET database. These coordinates were used to calculate the area and perimeter of each triangular region using the standard Euclidean distance formula. Subsequently, a set of mathematical relationships called Triangle Similarity Proportions (TSP) was developed to measure the degree of similarity between every pair of triangles. The significance of the proposed triangular features resides in their ability to reveal the unique craniofacial growth rate of each subject. Additionally, such geometric features show how the main facial features are correlated with each other across different age spans. A total of thirty features was created for each sample image in the database using the proposed TSP relationships. All feature vectors belonging to the same subject were grouped to form one class per subject; thus, a dataset of 82 classes was created and passed to the feature selection and classification algorithms. Feature selection was performed using the Random Forest (RF) algorithm, which produced a different set of the most significant features for each age group. This indicates that there are different facial aging patterns for different age groups.

The proposed system was evaluated in terms of classification accuracy, TP rate, FP rate, precision, recall, and F-measure. Different data pre-processing methods were combined with different classifiers over the developed dataset. The highest classification accuracy was achieved when Z-score normalization was used to pre-process the feature vectors and the KNN classifier was used for classification. The system performance in terms of classification accuracy and processing time is satisfying and promising towards robust real-time age-invariant face recognition. The experimental results show that the proposed geometric features are unique to each subject and that there is a unique craniofacial growth rate for each individual. The results also indicate that the facial aging process affects the way the main facial features are correlated with each other.
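The geometric core of the pipeline above can be sketched as follows. The landmark coordinates here are hypothetical, and the area-to-perimeter proportion shown is only one plausible illustration of comparing two triangles; the paper defines its own TSP relationships:

```python
import math

def dist(p, q):
    """Standard Euclidean distance between two 2-D landmark points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def triangle_perimeter(a, b, c):
    """Perimeter as the sum of the three pairwise Euclidean distances."""
    return dist(a, b) + dist(b, c) + dist(c, a)

def triangle_area(a, b, c):
    """Shoelace formula for the area of a triangle from its vertices."""
    return abs((b[0] - a[0]) * (c[1] - a[1])
               - (c[0] - a[0]) * (b[1] - a[1])) / 2.0

def similarity_proportion(t1, t2):
    """An illustrative proportion between two triangles: the ratio of
    their area-to-perimeter ratios (not the paper's exact TSP)."""
    r1 = triangle_area(*t1) / triangle_perimeter(*t1)
    r2 = triangle_area(*t2) / triangle_perimeter(*t2)
    return r1 / r2

# Hypothetical landmark triangles (e.g. eye corners + nose tip,
# and nose corners + mouth center); coordinates are made up.
t_eyes_nose = ((0, 0), (4, 0), (2, 3))
t_nose_mouth = ((0, 0), (2, 0), (1, 1.5))  # a half-scale copy of the first
print(triangle_area(*t_eyes_nose))         # 6.0
print(similarity_proportion(t_eyes_nose, t_nose_mouth))
```

Because the second triangle is a half-scale copy of the first, its area shrinks by a factor of four while its perimeter shrinks by two, so the proportion above evaluates to 2. Ratios of this kind stay informative about relative facial geometry even as the face grows.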
In our future work, we plan to evaluate the proposed system using different feature selection methods in conjunction with multiple classification methods. The system will also be evaluated over the largest publicly available face aging database, namely the MORPH face aging database, to test its ability to handle large populations.