N° : 02/2015. D/EL
PEOPLE'S DEMOCRATIC REPUBLIC OF ALGERIA
Ministry of Higher Education and Scientific Research
UNIVERSITY OF SCIENCE AND TECHNOLOGY HOUARI BOUMEDIENE
FACULTY OF ELECTRONICS AND COMPUTER SCIENCE
THESIS
Presented in partial fulfillment of the requirements for the degree of Doctor of Science
IN: ELECTRONICS
Speciality: Signal and Image Processing
By: Nassim ABBAS
SVM Classifier Combination for Handwritten Recognition
This thesis was publicly defended on 2nd February 2015 in front of the examination committee composed of:

Mrs Amina SERIR, Prof., USTHB: President
Mr Youcef CHIBANI, Prof., USTHB: Thesis Supervisor
Mr Mohamed Amine CHIKH, Prof., Univ. Tlemcen: Examiner
Mr Abdelkrim MEZIANE, MRA, CERIST: Examiner
Mr Mohand LAGHA, MCA, Univ. Blida: Examiner
Mrs Hassiba NEMMOUR, MCA, USTHB: Examiner
SVM CLASSIFIER COMBINATION FOR HANDWRITTEN RECOGNITION

Nassim Abbas

Abstract

Support Vector Machine (SVM) classifiers are considered to be among the most efficient for handwritten recognition. However, the main limitation of SVMs lies in the choice of a suitable descriptor: for the same application, an SVM may respond differently depending on the descriptor used. Hence, various methods have been proposed for combining multiple sources of information in order to improve the accuracy of handwritten recognition. The work in this thesis focuses on the development and implementation of various schemes for combining SVM classifiers using the Dempster-Shafer theory (DST) of evidence and the Dezert-Smarandache theory (DSmT) of plausible and paradoxical reasoning. When the DSmT is used in conjunction with SVMs (DSmT-SVM), two main problems occur. The first is related to the choice of the estimation model. The second concerns the difficulty of extending DSmT-SVM to multi-class classification: a large number of focal elements is produced, which makes the use of the DSmT impossible in practice. Both DST and DSmT allow dealing with conflicts between the responses of the classifiers while attempting to select the best responses. Two main applications are considered for evaluating the effective use of both theories: handwritten signature verification and handwritten digit recognition. In this context, we propose four solutions for improving handwritten recognition.

The first solution is based on a combination scheme using bi-class SVM classifiers through the DSmT. This scheme is applied to writer-dependent off-line handwritten signature verification and to combined off-line and on-line handwritten signature verification. This implementation offers better security and improved performance, especially in the case of simultaneous verification of off-line and on-line signatures.

The second solution is a combination scheme using two one-class SVM classifiers through the DST and the DSmT. In this scheme, we exploit an intelligent learning technique that uses only genuine signatures. Its implementation allows us to significantly reduce the errors of writer-independent off-line handwritten signature verification.

The third solution is a supervised combination model based on the DSmT for multi-class classification. This model allows us to use the DSmT effectively in conjunction with a multi-class SVM implementation based on bi-class SVMs for handwritten digit recognition.

The fourth solution consists in incorporating one-class SVM classifiers into the DSmT based combination framework in order to reduce the huge number of focal elements that occurs when bi-class SVM classifiers are used. We therefore propose a learning technique based on a one-class SVM classifier for each target class. The combination of these classifiers is then performed by the DSmT for each class independently of the other classes, which extends the applicability of the DSmT to multi-class classification. To prove the effective use of the proposed scheme, a case study is conducted on handwritten digit recognition.
COMBINATION OF SVM CLASSIFIERS FOR HANDWRITTEN RECOGNITION

Nassim Abbas

Summary

Support Vector Machine (SVM) classifiers are considered the most efficient for handwritten recognition. However, the main limitation of their use lies in the appropriate choice of descriptor: for the same application, an SVM may respond differently depending on the chosen descriptor. Various methods have therefore been proposed for combining several sources of information in order to improve the accuracy of handwritten recognition. The work proposed in this thesis essentially concerns the development and implementation of various schemes for combining SVM classifiers using the Dempster-Shafer theory of evidence (DST) and the Dezert-Smarandache theory of plausible and paradoxical reasoning (DSmT). When the DSmT is used in association with SVMs (DSmT-SVM), two major problems are encountered. The first is related to the choice of the estimation model. The second concerns the difficulty of extending this DSmT-SVM association to multi-class classification: a large number of focal elements is produced, making the use of the DSmT impossible. Both theories, DST and DSmT, allow conflicts between the responses of the classifiers to be handled while attempting to select the best responses. Two main applications are considered for evaluating the effective use of the two theories: handwritten signature verification and handwritten digit recognition. In this context, we propose four solutions for improving handwritten recognition.

The first solution proposes a combination scheme of two bi-class SVM classifiers using the DSmT. This scheme is experimented on a writer-dependent handwritten signature verification application and validated on two case studies: off-line/off-line verification and simultaneous off-line/on-line verification.

The second solution concerns the design of a new writer-independent handwritten signature verification scheme through the combination of two one-class SVM classifiers using the DST and the DSmT. In this scheme, we exploit an intelligent learning technique that uses only genuine signatures, which allows a considerable reduction of the verification errors.

The third solution consists in implementing a supervised combination model based on the DSmT for multi-class classification. It allows the DSmT to be used effectively in association with a multi-class implementation based on the bi-class SVM classifier for handwritten digit recognition.

The fourth solution addresses the main problem related to the huge number of focal elements produced by the combination of bi-class SVM classifiers. In order to reduce the computational complexity, we propose a learning technique based on a one-class SVM classifier for each target class. A combination of these classifiers is then performed by the DSmT for each class independently of the other classes. This scheme extends the applicability of the DSmT to multi-class classification. To demonstrate the effective use of the proposed scheme, a case study is conducted on handwritten digit recognition.
ACKNOWLEDGEMENTS
The work presented in this thesis was carried out at the Speech Communication and Signal Processing Laboratory (LCPTS), Faculty of Electronics and Computer Science, University of Science and Technology Houari Boumediene (USTHB), in preparation of a PhD in Signal and Image Processing.

First, I would like to sincerely thank my supervisor, Professor Youcef CHIBANI, for having proposed and directed this work, for having let me join his team, and for having given me the means to conduct my research in very good conditions, in addition to his support and encouragement throughout this thesis.

I also thank Professor Mohamed CHERIET, director of the Synchromedia Laboratory at the École de Technologie Supérieure (ÉTS) of the University of Québec, Canada, who co-directed my work between 2009 and 2010. I thank him for having hosted me for an internship in the Laboratory for Imagery, Vision and Artificial Intelligence (LIVIA). I warmly thank Professor Arnaud MARTIN for having hosted me for a training course at the Institut Universitaire de Technologie (IUT) of the University of Rennes 1, France, and for enlightening me on the foundations of the theory of belief functions. I am grateful to Jean DEZERT, researcher at the French Aeronautics, Space and Defense Research Lab (ONERA), and Florentin SMARANDACHE, professor of mathematics at the University of New Mexico (UNM), USA, for their help, advice and guidance.

I wish to express my gratitude to Mrs. Amina SERIR, Professor at USTHB, who will chair the jury of my thesis, which is a great honor for me. My warm thanks go to the members of the jury who have agreed to spare their time to examine my work: Mr. Mohamed Amine CHIKH, Professor at the University of Abou Bekr Belkaid Tlemcen (UABBT), Mr. Abdelkrim MEZIANE, Senior Research Officer at the Centre for Research on Scientific and Technical Information (CERIST), Mr. Mohand LAGHA, Lecturer at the University of Saad Dahlab Blida (USDB), and Ms. Hassiba NEMMOUR, Lecturer at USTHB.

My sincere thanks also go to all members of the LCPTS laboratory, especially Bilal HADJADJI, Abdenour SEHAD and Youcef BRIK, for their help and support. I particularly want to thank Mrs. Siham BENZOUAI, Maître Assistant classe A at the National School of Marine Sciences and Coastal Management (ENSSMAL), for her support and encouragement, without forgetting Ms. Amel SMIEL from the Agence Thématique de Recherche en Sciences de la Nature et de la Vie. I also want to thank my friend Mehdi NEGGAZI, PhD student at the National Polytechnic School of Algiers, for welcoming me to his home in Houston, Texas (USA) and for having put at my disposal all the means necessary to finalize this thesis. Many thanks to the students I co-supervised with Prof. Y. CHIBANI, who helped me progress faster in my research.

Last but not least, I would like to express my sincere thanks and warm gratitude to my family for their support and encouragement, which have allowed me to reach the end of this thesis. I hope they will be proud of me for this work.

To all the above-mentioned individuals and all the others that I have not mentioned but whom I know, from my heart, thank you!
Table of contents

Introduction  1

Chapter 1  Approaches of Classifier Combination for Handwritten Recognition
1.1. Introduction  6
1.2. Classification approaches  7
1.2.1. Template matching techniques  9
1.2.2. Structural approaches  9
1.2.3. Connectionist approaches  9
1.2.4. Statistical approaches  10
1.2.4.1. Modeling approaches  10
a. Nonparametric methods  10
b. Parametric methods  10
1.2.4.2. Discriminative approaches  10
1.2.5. Review of the SVM classifier  11
1.2.5.1. Case of linearly separable data  11
1.2.5.2. Case of non-linearly separable data  13
1.3. Classifier combination  16
1.3.1. Combination levels  16
1.3.1.1. Combination at class level  16
1.3.1.2. Combination at rank level  16
1.3.1.3. Combination at measure level  16
1.3.2. Combination schemes of multi-classifiers  17
1.3.2.1. Sequential combination  17
1.3.2.2. Parallel combination  17
1.3.2.3. Hybrid combination  18
1.4. General parallel combination scheme for handwritten recognition  19
1.4.1. Estimation of masses  19
1.4.2. Overview of belief model based combination rules  20
1.4.2.1. Notations  20
1.4.2.2. Combination rule based on the PT  21
1.4.2.3. Combination rule based on the DST  22
1.4.2.4. Combination rule based on the DSmT  23
1.4.3. Decision making  25
1.5. Summary  26
Chapter 2  A DSmT Based System for Writer-Dependent Handwritten Signature Verification
2.1. Introduction  28
2.2. Review of PCR5 combination rule  31
2.3. System description  32
2.3.1. Pre-processing  33
2.3.2. Feature generation  34
2.3.2.1. Features used for combining individual off-line HSV systems  34
2.3.2.2. Features used for combining individual off-line and on-line HSV systems  36
2.3.3. Classification based on SVM  37
2.3.3.1. Review of SVMs  37
2.3.3.2. Decision rule  38
2.3.4. Classification based on DSmT  39
2.3.4.1. Estimation of masses  40
2.3.4.2. Combination of masses  40
2.3.4.3. Decision rule  41
2.4. Description of datasets and performance criteria  42
2.4.1. Description of datasets  42
2.4.1.1. CEDAR signature database  42
2.4.1.2. NISDCC signature database  42
2.4.2. Performance criteria  43
2.4.3. SVM model  43
2.4.3.1. SVM models used for combined individual off-line HSV systems  43
2.4.3.2. SVM models used for combined individual off-line and on-line HSV systems  44
2.5. Experimental results and discussion  44
2.6. Conclusion  47
Chapter 3  A DSmT Based System for Writer-Independent Handwritten Signature Verification
3.1. Introduction  48
3.2. Related works  51
3.3. System description  53
3.3.1. Pre-processing  53
3.3.2. Feature generation  54
3.3.2.1. Discrete cosine transform based descriptor  54
3.3.2.2. Curvelet transform based descriptor  56
3.3.3. Classification based on OC-SVM  59
3.3.3.1. Review of OC-SVM classifier  59
3.3.3.2. Writer-independent verification scheme  60
3.3.3.3. Generating vectors of (dis)similarity measures  61
3.3.3.4. Decision rule in OC-SVM framework  62
3.3.4. Classification based on DSmT  62
3.3.4.1. Estimation of masses  63
3.3.4.2. Combination of masses  65
3.3.4.3. Decision criterion  66
3.4. Experimental results  68
3.4.1. Experimental protocol  68
3.4.2. Validation of OC-SVM models  69
3.4.3. Performance criteria  69
3.4.4. OC-SVM models used for combined individual writer-independent HSV systems  70
3.4.5. Determination of parameters in relation with both descriptors during the validation phase  70
3.4.5.1. Selecting the optimal number of DCT coefficients and the corresponding decision threshold  70
3.4.5.2. Selecting the optimal decomposition level of CT and the corresponding decision threshold  72
3.4.6. Verification results and discussion  73
3.5. Conclusion  75
Chapter 4  The Effective Use of the DSmT for Multi-Class Classification
4.1. Introduction  77
4.2. Methodology  82
4.2.1. Classification based on SVM  82
4.2.2. Classification based on DSmT  83
4.2.2.1. Estimation of masses  84
4.2.2.2. Combination of masses  86
4.2.2.3. Decision rule  87
4.3. Experimental results  88
4.3.1. Database description and performance evaluation  88
4.3.2. Pre-processing  88
4.3.3. Feature generation  89
4.3.4. Validation of SVM models  89
4.3.5. Quantitative results and discussion  90
4.3.5.1. Comparative analysis of features  90
4.3.5.2. Performance evaluation of the proposed combination framework  91
4.4. Conclusion  97
Chapter 5  A DSmT Based Combination Scheme for Multi-Class Classification
5.1. Introduction  98
5.2. Related works  101
5.3. Effective combination scheme of one-class classifiers  102
5.4. Multi-class classification scheme based on belief function theories  104
5.4.1. Classification based on OC-SVM  104
5.4.1.1. Review of OC-SVM  105
5.4.1.2. Extension of OC-SVM for constructing multi-class OC-SVM  107
5.4.2. Estimation of masses  108
5.4.2.1. Estimation technique using PT framework  108
5.4.2.2. Estimation technique using DST framework  108
5.4.2.3. Estimation technique using DSmT framework  109
5.4.3. Combination of masses  110
5.4.4. Decision rule  112
5.5. Database and algorithms used for validation  113
5.5.1. Database description and performance criteria  113
5.5.2. Methods used for generating features  114
5.5.3. Algorithm used for validation of OC-SVM models  114
5.6. Experimental results  115
5.6.1. Performance evaluation of the proposed descriptors  116
5.6.2. Performance evaluation of the proposed combination scheme  117
5.7. Conclusion  124

Conclusion  125
Bibliography  128
List of tables

1.1  Example of kernel functions  15
2.1  Set of dynamic characteristics. $s = \{P_{t_1}, P_{t_2}, \dots, P_{t_n}\}$ denotes an on-line signature composed of $n$ events $P_{t_i}(x_i, y_i, t_i)$, where $x_i$, $y_i$, $Pr_i$, $Az_i$ and $Al_i$ denote the x-position, y-position, pen pressure, azimuth and elevation angles of the pen at the $i$-th time instant $t_i$, respectively  37
2.2  Error rates (%) obtained for individual and combined HSV systems  45
2.3  Error rates (%) obtained for individual and combined HSV systems  46
3.1  Optimal parameters of the OC-SVM models obtained according to the proposed validation approach  70
3.2  Influence of the number of DCT coefficients on the different error rates during the validation phase  71
3.3  Influence of the decomposition level $j$ on the different error rates during the validation phase  72
3.4  Experimental results of the proposed algorithms  74
4.1  Partitioning of the USPS dataset  88
4.2  Optimal parameters of the UG-SVMs classifier  90
4.3  Mean error rates of the SVM classifiers using different methods of feature generation  91
4.4  Ranges of conflict variations measured between both SVM-OAA implementations using (BF,FF,GF) and UGF-based descriptors  95
4.5  Error rates of the proposed framework with PCR6 combination rule using (BF,FF,GF) and UGF descriptors  96
5.1  Cardinality of the combination space  103
5.2  Optimal parameters of the OC-SVM classifiers using UG features  115
5.3  Recognition rates of the MC-OC-SVM classifier using different methods of generating features  116
5.4  FAR errors provided by the three sources of information for each class  118
5.5  Comparison of the recognition performance of Sum, DS and PCR6 combination rules using the pair of sources (S1, S2)  122
5.6  Comparison of the recognition performance of Sum, DS and PCR6 combination rules using the pair of sources (S1, S3)  122
5.7  Comparison of the recognition performance of Sum, DS and PCR6 combination rules using three sources (S1, S2, S3)  123
List of figures

1.1  Character recognition system  8
1.2  Principle of separation through a kernel function in SVM framework  11
1.3  Classification between two classes using hyperplanes: (a) the arbitrary hyperplanes l, m and n; (b) the optimal hyperplane of separation with a large margin identified by the canonical hyperplanes through the support vectors  12
1.4  Separating hyperplanes in the case of non-linearly separable data, where H is any hyperplane, HO is the optimal hyperplane and SV are the support vectors  14
1.5  Sequential combination of L classifiers  17
1.6  Parallel combination of classifiers  18
1.7  Hybrid combination of classifiers  18
1.8  Proposed parallel combination scheme within the general belief function theory framework  19
2.1  Structure of the combined individual HSV systems  33
2.2  Preprocessing steps: (a) scanning, (b) binarization, (c) elimination of the useless information  34
2.3  Steps for generating the feature vector from the Radon transform  35
2.4  Steps for generating the feature vector from the Ridgelet transform  35
2.5  Visualization of different grid sizes  36
2.6  Signature samples from the CEDAR database  42
2.7  Signature samples from the NISDCC signature collection  43
2.8  Performance evaluation of the individual off-line HSV systems  45
2.9  Conflict between off-line and on-line signatures for the writers 3, 7 and 10, respectively  46
2.10  Performance evaluation of the individual off-line and on-line HSV systems  46
3.1  Structure of the combined individual systems for writer-independent HSV  53
3.2  Normalization of a scanned signature image of size 606 × 378 to 1024 × 1024  54
3.3  Steps for generating the feature vector from the DCT  56
3.4  Examples for characterizing an object with edges through a curvelet  56
3.5  (a) Curvelet to wedge transformation using the Fourier transform and (b) spectral partitioning of the Curvelet transform  57
3.6  Steps for generating the feature vector from the Curvelet transform  59
3.7  Flowchart of writer-independent verification using an OC-SVM classifier  60
3.8  An effective belief function theories based combination scheme for writer-independent signature verification  63
3.9  Error rates of the OC-SVM classifier associated with the DCT based descriptor using different values of the decision threshold during the validation phase  71
3.10  Error rates of the OC-SVM classifier associated with the CT based descriptor using different values of the decision threshold during the validation phase  72
3.11  Conflict between both OC-SVM classifiers using DCT and CT-based descriptors for testing signatures  75
4.1  Structure of the combination scheme using SVM and DSmT  82
4.2  DSmT-based parallel combination for multi-class classification  83
4.3  Some samples with their alleged classes from the USPS database  88
4.4  Measured conflict between both SVM classifiers using (BF,FF,GF) and UGF-based descriptors for the digits belonging to 0  92
4.5  Measured conflict between both SVM classifiers using (BF,FF,GF) and UGF-based descriptors for the digits belonging to 1  92
4.6  Measured conflict between both SVM classifiers using (BF,FF,GF) and UGF-based descriptors for the digits belonging to 2  92
4.7  Measured conflict between both SVM classifiers using (BF,FF,GF) and UGF-based descriptors for the digits belonging to 3  93
4.8  Measured conflict between both SVM classifiers using (BF,FF,GF) and UGF-based descriptors for the digits belonging to 4  93
4.9  Measured conflict between both SVM classifiers using (BF,FF,GF) and UGF-based descriptors for the digits belonging to 5  93
4.10  Measured conflict between both SVM classifiers using (BF,FF,GF) and UGF-based descriptors for the digits belonging to 6  94
4.11  Measured conflict between both SVM classifiers using (BF,FF,GF) and UGF-based descriptors for the digits belonging to 7  94
4.12  Measured conflict between both SVM classifiers using (BF,FF,GF) and UGF-based descriptors for the digits belonging to 8  94
4.13  Measured conflict between both SVM classifiers using (BF,FF,GF) and UGF-based descriptors for the digits belonging to 9  95
5.1  General concept of the proposed combination scheme  103
5.2  Belief function theories-based multi-classification scheme  104
5.3  Pattern classification based on OC-SVM  107
5.4  Training and validation of the OC-SVM models  114
5.5  Measures of conflict between the OC-SVM classifiers for the handwritten digits belonging to 0  118
5.6  Measures of conflict between the OC-SVM classifiers for the handwritten digits belonging to 1  119
5.7  Measures of conflict between the OC-SVM classifiers for the handwritten digits belonging to 2  119
5.8  Measures of conflict between the OC-SVM classifiers for the handwritten digits belonging to 3  119
5.9  Measures of conflict between the OC-SVM classifiers for the handwritten digits belonging to 4  120
5.10  Measures of conflict between the OC-SVM classifiers for the handwritten digits belonging to 5  120
5.11  Measures of conflict between the OC-SVM classifiers for the handwritten digits belonging to 6  120
5.12  Measures of conflict between the OC-SVM classifiers for the handwritten digits belonging to 7  121
5.13  Measures of conflict between the OC-SVM classifiers for the handwritten digits belonging to 8  121
5.14  Measures of conflict between the OC-SVM classifiers for the handwritten digits belonging to 9  121
List of abbreviations

SVM  Support Vector Machines
DST  Dempster-Shafer Theory
DSmT  Dezert-Smarandache Theory
NNs  Neural Networks
MLP  Multi Layer Perceptrons
GMM  Gaussian Mixture Model
HMM  Hidden Markov Model
K-NN  K-Nearest Neighbors
MSE  Mean Squared Error
OH  Optimal Hyperplane
DS  Dempster-Shafer
DSm  Dezert-Smarandache
PT  Probabilistic Theory
bpa  basic probability assignment
bba  basic belief assignment
gbba  generalized basic belief assignment
PCR6  Proportional Conflict Redistribution rule no. 6
DSmC  DSm Classic rule (also called conjunctive consensus rule)
HSV  Handwritten Signature Verification
WT  Wavelet Transform
UG  Uniform Grid
RBF  Radial Basis Function
MD  Membership Degree
DSmP  Dezert-Smarandache Probability
CEDAR  Center of Excellence for Document Analysis and Recognition
NISDCC  Norwegian Information Security laboratory and Donders Centre for Cognition
ICDAR  International Conference on Document Analysis and Recognition
FAR  False Accepted Rate
FRR  False Rejected Rate
HTER  Half Total Error Rate
EER  Equal Error Rate
OC-SVM  One-Class Support Vector Machines
DCT  Discrete Cosine Transform
NN  Nearest Neighbor
ROC  Receiver Operating Characteristic
EoCs  Ensembles of Classifiers
KNORA  K-Nearest-Oracles
CT  Curvelet Transform
NFFT  Nonequispaced Fast Fourier Transform
PNG  Portable Network Graphics
AER  Average Error Rate
ESMS  Evidence Supporting Measure of Similarity
OAA  One-Against-All
DSmH  Dezert-Smarandache Hybrid rule
OAO  One-Against-One
DDAG  Decision Directed Acyclic Graph
ML  Maximum Likelihood
USPS  United States Postal Service
ERC  Error Rate per Class
MER  Mean Error Rate
ERCs  Error Rate per Simple Class
ERCc  Error Rate per Complementary Class
FF  Foreground Features
BF  Background Features
GF  Geometric Features
UGF  Uniform Grid Features
B-SVM  Bi-class Support Vector Machines
RR  Recognition Rate
MRR  Mean Recognition Rate
MC-OC-SVM  Extension of OC-SVM into the Multi-Class (MC) classification framework
List of notations

$\mathcal{F}$  Feature space
$\Psi(\cdot)$  Nonlinear mapping
$x$  Input pattern characterizing an object (word, signature, numeral, character, etc.)
$N$  Number of learning examples in the binary SVM framework
$x_i$  Learning sample
$y_i$  Instance-label
$\langle \cdot , \cdot \rangle$  Dot product
$K(\cdot,\cdot)$  Kernel function
$\mathbb{R}^n$  $n$-dimensional vector space over the field of the real numbers
$b$  Scalar computed by using any support vector
$\alpha_i$  Lagrange multiplier
$C$  User-defined parameter that controls the tradeoff between the machine complexity and the number of nonseparable points
$S_v$  Number of support vectors
$f(\cdot)$  Decision function of the binary SVM classifier
$t$  Decision threshold
$\Theta$  Frame of discernment (also called discernment space)
$2^\Theta$  Power set
$D^\Theta$  Hyper-power set
$G$  Set of elements belonging to $\Theta$, $2^\Theta$ or $D^\Theta$
$\theta_i$  Simple class belonging to the frame of discernment $\Theta$
$\bar{\theta}_i$  Complementary class of $\theta_i$
$\theta_{gen}$  Genuine class of signatures
$\theta_{imp}$  Impostor class of signatures
$A$  Simple or compound class belonging to $D^\Theta$ or $2^\Theta$
$Card(A)$  Classical cardinality corresponding to the number of simple classes belonging to $A$
$C_{\mathcal{M}}(A)$  DSm cardinality corresponding to the number of parts of $A$ in the Venn diagram
$\emptyset$  Classical/universal empty set
$\mathcal{M}$  Shafer's model defined by the exhaustive and exclusive constraints
$\Phi_{\mathcal{M}}$  Set of all elements of $D^\Theta$ which have been forced to be empty in Shafer's model $\mathcal{M}$
$\Phi$  Set of all relatively and absolutely empty elements
$F$  Set of focal elements
$m_i(\cdot)$  Mass function (i.e. bpa, bba or gbba) issued from the $i$-th information source $S_i$
$\oplus$  Combination operator defined within the PT, DST or DSmT framework
$m_c(\cdot)$  Combined masses within the PT, DST or DSmT framework
$m_{Sum}(A)$  Combined mass of $A$ obtained by means of the Sum rule
$m_{DS}(A)$  Combined mass of $A$ obtained by means of the DS rule
$m_{PCR6}(A)$  Combined mass of $A$ obtained by means of the PCR6 rule
$m_{\wedge}(A)$  Combined mass of $A$ obtained by means of the DSmC rule
$K_c$  Conflict measured between $p$ information sources
$DSmP_{\epsilon}(\theta_i)$  Dezert-Smarandache probability of the $i$-th simple class $\theta_i$
$\epsilon$  Tuning parameter of the Dezert-Smarandache probability
$d(\theta_i)$  Membership degree of a signature to the class $\theta_i$, $i \in \{gen, imp\}$, provided by the information source $S_d$
$P_k(\theta_i / x)$  Posterior probability that the correct class is $\theta_i$ for the pattern $x$, issued from the information source $S_k$
$\beta_i^d$  Confidence factor of $\theta_i$ corresponding to the information source $S_d$
$N_r$  Number of projection points or lines in the Radon matrix
$N_{\theta}$  Number of orientations or columns in the Radon matrix
$L$  Decomposition level of the WT
$T_{rad}(\cdot,\cdot)$  Radon transform operator
$E_i^{rad}$  Coefficient issued from the Radon transform
$T_{rid}(\cdot,\cdot,\cdot)$  Ridgelet transform operator
$E_i^{rid}$  Coefficient issued from the Ridgelet transform
$C_k^{dct}(u,v)$  Coefficient issued from the DCT at the frequencies $u$ and $v$
$\bar{C}_k^{dct}$  Normalized DCT coefficient using the softmax function
$U_j(r,\theta)$  Wedge (Curvelet) defined within the Fourier domain, where $r$ is the Cartesian coronae and $\theta$ is the projection angle
$j_{opt}$  Optimal decomposition level using the Curvelet transform (i.e. wrapping method)
$E_w^{cur}$  Energy of wedges
$\bar{E}_w^{cur}$  Normalized Curvelet coefficient
$f_{OC}(\cdot)$  Decision function of the one-class SVM classifier
$\rho$  Distance of the hypersphere from the origin
$m$  Cardinal of the training dataset in the OC-SVM framework
$\nu$  Percentage of data considered as outliers
$\mathrm{OC\text{-}SVM}_i$  One-class SVM classifier trained with the $i$-th descriptor
$t_{opt}$  Optimal threshold associated with the $i$-th $\mathrm{OC\text{-}SVM}_i$ classifier, determined during the validation phase
$g_i(\cdot)$  Reassigned output of the $i$-th $\mathrm{OC\text{-}SVM}_i$ classifier using a logarithmic function
$g_i^*(\cdot)$  Output of the $i$-th $\mathrm{OC\text{-}SVM}_i$ classifier selected from $N_{bscores}$ responses
$\theta_{descriptor_i}$  Class of (dis)similarity vectors between off-line signatures which are characterized by the $i$-th descriptor
$\varepsilon$  Tuning parameter of the extended version of Appriou's model
$\beta_i$  Sum of false accepted rates (FAR) made by the $i$-th $\mathrm{OC\text{-}SVM}_i$, $i = 1, \dots, p$, classifiers, which are trained with $p$ sources of information, respectively
$n$  Cardinal of the discernment space $\Theta$
$Dedekind(\cdot)$  Sequence of Dedekind's numbers
$\mathrm{SVM}_k^i$  Binary SVM classifier trained with the $i$-th descriptor which separates data issued from the two classes $\theta_k$ and $\bar{\theta}_k$
$x_k^i$  $k$-th training sample issued from the $i$-th class $\theta_i$
$y_k^i$  Instance-label of the corresponding training sample $x_k^i$
$Z_b$  Normalization factor introduced in the axiomatic approach in order to respect the mass definition
$f_i^b(\cdot)$  $i$-th output of the binary SVM classifier separating data issued from the two classes $\theta_i$ and $\bar{\theta}_i$, issued from the source $S_b$
$\mathrm{OC\text{-}C}_i^k$  OC-SVM classifier trained with samples belonging to the $i$-th target class $\theta_i$ issued from the information source $S_k$
$g_i^k(\cdot)$  Reassigned output of the $i$-th $\mathrm{OC\text{-}SVM}_i$ classifier, using a logarithmic function, fed by the information source $S_k$
$\gamma_j$  $j$-th RBF parameter used in the kernel of the OC-SVM classifier
$ER_j$  Error rate corresponding to $\gamma_j$, computed during the validation phase
Introduction

Automatic reading of handwritten documents is currently an active research topic. Its main objective is to convert images of handwritten documents into a digital representation for archiving, searching, modifying, re-using and transmitting the information contained in the image. Applications are increasingly numerous and varied: analysis of the handwritten signature for authenticating individuals [Plamondon, 1989], [Leclerc, 1994], [Dimauro, 2004], [Impedovo, 2008], automatic mail sorting [Srihari, 1993], [Gilloux, 1993], [Chen, 1995], [Cohen, 1994], [El-Yacoubi, 1996], [Filatov, 1998], [Gader, 1997], [Gilloux, 1995a], [Kim, 1997], [Kim, 1998], [Kundu, 1998], [Lecolinet, 1990], automatic transcription of historical documents [Romero, 2007], [Toselli, 2010], [Berg-Kirkpatrick, 2013], automatic reading and sorting of forms [Clavier, 2000], [Ramdane, 2003], [Mandal, 2005], [Milewski, 2006], and even the automatic processing of bank checks [Impedovo, 1997], [Dimauro, 1998], [Dzub, 1998], [Gilloux, 1992], [Guillevic, 1998], [Han, 1997], [Knerr, 1996], [Knerr, 1998], [Leroux, 1997], [Paquet, 1993].

Generally, a handwritten recognition system is composed of four main modules: acquisition, preprocessing, feature generation and classification. The first module is the acquisition module, which depends on the data acquisition mode: on-line or off-line. Hence, handwriting recognition systems are commonly categorized into two application areas:

a. On-line handwriting recognition, for which the text is entered with a stylus in continuous tracing on a sensitive surface. Acquired data are then available in the form of successive time points. In this case, the signal is restricted to a one-dimensional representation and the recognition system can benefit from both the temporal and dynamic information of the tracing [Starner, 1994], [Tappert, 1990].

b. Off-line handwriting recognition, for which the text is transcribed on paper sheets. The data are acquired using a scanner or a camera producing grey-level images. Compared to on-line systems, off-line systems have only two-dimensional static information, whose variable tracing thickness becomes an additional constraint to be taken into account [Plamondon, 2000].

The preprocessing module reduces the possible artifacts contained in the handwritten data in order to make them standard and ready for feature generation. This module is generally composed of several steps: background elimination, noise reduction, size normalization, skeletonization, etc. [Kumar, 2012].
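As an illustration of such a preprocessing chain, the following minimal sketch binarizes a grey-level image, crops it to the ink bounding box and normalizes its size. It is only a hedged example built with NumPy: the mean-based threshold, the target size and the helper name preprocess are assumptions made for illustration, not the actual pipeline used in this thesis.

import numpy as np

def preprocess(gray_image: np.ndarray, size: int = 64) -> np.ndarray:
    """Binarize a grey-level image and normalize its size (illustrative only)."""
    # Global threshold at the mean grey level (a stand-in for Otsu or any
    # other binarization method); ink pixels become 1, background 0.
    binary = (gray_image < gray_image.mean()).astype(np.uint8)
    # Crop to the bounding box of the ink to eliminate useless background.
    rows, cols = np.nonzero(binary)
    if rows.size:
        binary = binary[rows.min():rows.max() + 1, cols.min():cols.max() + 1]
    # Nearest-neighbour size normalization to a fixed square grid.
    r_idx = (np.arange(size) * binary.shape[0] / size).astype(int)
    c_idx = (np.arange(size) * binary.shape[1] / size).astype(int)
    return binary[np.ix_(r_idx, c_idx)]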
The feature generation module extracts pertinent features through various methods, which are mainly classified as global or local. Global features describe an entire image and include the discrete Radon transform [Coetzer, 2004], the Hough transform [Kaewkongka, 1999], the discrete Wavelet transform [Deng, 1999], the Contourlet transform [Yang, 2007a], [Hamadene, 2012], horizontal and vertical projections [Fang, 2003], and smoothness features [Fang, 2001]. Local features are extracted at stroke and substroke levels and include unballistic motion and tremor information in stroke segments [Guo, 2001], stroke "elements" [Fang, 2003], local shape descriptors [Sabourin, 1997], and pressure and slant features [Quek, 2002]. Finally, the classification module assigns a pattern to a predefined class. Several classification methods have been proposed, such as template matching techniques [Deng, 1999], [Fang, 2003], [Guo, 2001], minimum distance classifiers [Fang, 2001], [Sabourin, 1997], support vector machines [Justino, 2005], hidden Markov models [Justino, 2001], [Justino, 2005], [Coetzer, 2004], neural networks [Kaewkongka, 1999], [Quek, 2002], etc.

Despite research in various application areas of pattern recognition [Jain, 2000], [Duda, 2001], [Cheriet, 2007], handwriting recognition remains an open and important problem. In many applications, various constraints do not allow an efficient joint use of classifiers and feature generation methods, leading to inaccurate performance. The main reasons come from two aspects. First, for a specific application problem, each of these classifiers could attain a different degree of success, but maybe none of them is totally perfect, or even as good as expected for practical applications. The second aspect is that, for a specific recognition problem, numerous types of features could usually be used to represent and recognize patterns [Xu, 1992], [Cheriet, 2007]. Given these constraints, the concept of classifier combination has been proposed as a new direction for enhancing the robustness of recognition systems [Kittler, 1998]. In this thesis, we are interested in the latest developments in classifier combination within the handwriting recognition framework.

The idea of combining classifier outputs for designing a system with high reliability is not new; it has attracted the interest of the scientific community since the 19th century. More than 200 research works had already been cited by Clemen [Clemen, 1989], and the original idea is attributed to Laplace in 1818 [Laplace, 1847]. In [Carney, 1999], the first application of a combination of neural networks is attributed to Nilsson [Nilsson, 1965].
This technique has become increasingly used [Srihari, 1982], [Hull, 1983], [Hull, 1988], [Mandler, 1988], [Lam, 1988] as a way to improve the robustness of handwriting recognition systems, particularly in several applications: recognition of handwritten numerals [Duin, 2000], [Huang, 1995], [Jain, 2000], [Kang, 1997], [Kittler, 1998], characters and words [Ho, 1994], signature verification [Sabourin, 1994], [Zois, 1999], identification of forms [Clavier, 2000], etc. More recently, systems based on the combination of classifiers for handwriting recognition have made great progress and research works have been successfully implemented. These systems are designed according to the type of combined classifier outputs, the nature of the classifiers used, as well as the adopted combination schemes. Nevertheless, the first two approaches require developing a large number of classifiers and feature generation methods. Though there are numerous works in this area [Jain, 2000], [Duda, 2001], [Cheriet, 2007], they did not highlight the incontestable superiority of one method over another in either the feature generation step or the classification step. Therefore, rather than trying to optimize a single classifier by choosing the best features for a given problem, researchers found it more interesting to combine multiple classifiers [Duin, 2000], [Hansen, 1990], [Ho, 1994], [Jain, 2000], [Xu, 1992]. Indeed, the combination of classifiers allows exploiting the redundant and complementary nature of the responses issued from different classifiers [Anisimovich, 1997], [Battati, 1994], [Ji, 1997], [Rahman, 2000], [Tumer, 1996].

Usually, neural or Bayesian classifiers are widely used. More recently, classifiers based on Support Vector Machines (SVM) have raised the interest of researchers in different areas of pattern recognition for their performance, judged significantly higher than that of other traditional classifiers. Basically, an SVM, also called bi-class SVM (B-SVM), is a supervised learning algorithm introduced by Vapnik et al. [Vapnik, 1995] for separating only two classes. It is based on the principle of structural risk minimization (SRM), which addresses two central problems of statistical learning theory: controlling the efficiency of the classifier and the phenomenon of overfitting. The basic idea is to use kernel functions to represent the data of the original space in a potentially much higher dimensional feature space and to find a hyperplane that separates the two classes while maximizing the margin between them [Vapnik, 1995]. Later, an alternative to the standard B-SVM algorithm, called one-class SVM (OC-SVM), was proposed in [Schölkopf, 2001] to adapt the latter to one-class classification problems. As in the case of the B-SVM classifier, the OC-SVM classifier represents the input data in a potentially much higher dimensional feature space through a kernel function and tries to
find a hypersphere that encloses most of the learning data within a minimum volume. It also attempts to separate the majority of the points from the origin through a maximum margin, considering the origin as the only member of the second class [Bergamini, 2009].

However, the main limitation of using SVMs is related to the choice of a suitable descriptor. Indeed, for the same application, the SVM may respond differently depending on the descriptor used. Hence, various methods have been used for combining multiple sources of information in order to improve the accuracy of handwritten recognition. In this thesis, the proposed works focus on the development and implementation of various schemes for combining SVM classifiers using the Dempster-Shafer theory (DST) of evidence and the Dezert-Smarandache theory (DSmT) of plausible and paradoxical reasoning. When using the DSmT in conjunction with SVMs (DSmT-SVM), two main problems occur. The first problem is related to the choice of the estimation model. The second problem concerns the difficulty of extending DSmT-SVM to multi-class classification. Indeed, a large number of focal elements is produced, making the use of the DSmT impossible. Both DST and DSmT allow dealing with conflicts between the responses of the classifiers while attempting to select the best responses. Two main applications are considered for evaluating the effective use of both theories: handwritten signature verification and handwritten digit recognition.

Indeed, probabilistic approaches are generally able to represent uncertain knowledge but are unable to easily model information that is imprecise, incomplete, or not totally reliable. Moreover, they often lead to confusing both concepts of uncertainty and imprecision with the probability measure. Furthermore, modeling through these approaches allows reasoning only on singletons, which represent the different hypotheses (classes), under the closed-world assumption. Therefore, new original theories dealing with uncertainty and imprecise information have been introduced, such as the fuzzy set theory [Zadeh, 1968], evidence theory [Shafer, 1976], possibility theory [Dubois, 1988] and, more recently, the theory of plausible and paradoxical reasoning [Smarandache, 2004], [Smarandache, 2006a], [Smarandache, 2009], etc.

As part of this thesis, we focus on the combination of SVM classifiers in the context of handwritten recognition of signatures and numerals based on the DST and the DSmT. The work presented in this thesis is organized as follows:

In the first chapter, we give general points on the approaches of classifier combination used for handwriting recognition, in particular the SVM classifiers and the combination theories based on the DST and the DSmT.
The second chapter presents a DSmT based combination scheme for writer-dependent handwritten signature verification, where two case studies are addressed for validating the effective use of the DSmT in conjunction with B-SVM classifiers.

In the third chapter, we propose a DSmT based combination scheme for writer-independent handwritten signature verification, where an effective intelligent learning technique is implemented using the DSmT in conjunction with OC-SVM classifiers.

In the fourth chapter, we investigate the effective use of a DSmT based supervised model for multi-class classification in conjunction with B-SVM classifiers using the One-Against-All (OAA) implementation. To prove the effective use of the proposed scheme, a case study is conducted on handwritten digit recognition.

In the last chapter, we propose a general belief function framework based combination scheme for reducing the complexity of the DSmT, in particular in the multi-class classification framework, where an effective learning technique is incorporated using OC-SVM classifiers. The evaluation is performed on a case study of handwritten digit recognition.

Finally, we conclude this thesis with the main contributions of our work as well as future prospects.
Chapter 1
Classifier Combination Approaches for Handwriting Recognition
Abstract: In this chapter we present an overview of the mathematical tools most used in setting up a handwriting recognition system, notably the methods investigated in this thesis. Firstly, we recall the meaning of a classifier and we highlight the classification approaches most used in the literature. We particularly review the discriminative approach called support vector machines, which is studied in depth for its performance, judged significantly higher than that of other traditional classifiers. After that, we browse the combination schemes used for improving recognition system robustness, which differ mainly by the combination level of the classifiers and the way in which data are treated. In particular, the parallel classifier combination is studied in order to present the necessary steps included in this scheme, namely estimation of masses, belief combination model and decision making, where an overview of the main combination rules based on belief function theories is given. This overview allows identifying the strengths and weaknesses of the combination rules used in the belief function theory framework.
1.1 Introduction

Generally, classification methods based on different theories and methodologies have been considered as possible solutions for a given problem. However, there is no dominant classifier that is suitable for all applications. In addition, the study of these techniques has revealed differences in behavior and therefore a potential complementarity, which can be exploited to obtain higher performance than the best single classifier. Indeed, an additional classifier incorporated within a combination scheme can lead to better decisions, since each classifier in a multiple classifier system has its own domain of expertise. Therefore, the idea of classifier combination has been seriously considered.

In order to benefit the most from the complementary information issued by different classifiers, robust combination schemes and methods must be developed. Hence, great efforts have been made to propose various combination methods, which become more and more
used as a way explored in real-world applications [Srihari, 1982], [Hull, 1983], [Hull, 1988], [Mandler, 1988], [Lam, 1988] to enhance the robustness of recognition systems, particularly handwriting recognition: recognition of handwritten numerals [Duin, 2000], [Huang, 1995], [Jain, 2000], [Kang, 1997], [Kittler, 1998] and signature verification [Sabourin, 1994], [Zois, 1999].

Part of the current research in many pattern recognition application areas is focused on the parallel combination of classifiers. This approach has been proposed as a promising way of improving the performance of a recognition system. It can be defined as a particular technique which assumes that, with a suitable choice of classification methods, while retaining for each classifier the best suited kind of features, it is possible to integrate into the same system the opinions of several classifiers by exploiting their complementarity. Hence, it seems interesting to benefit from the responses provided by individual classifiers that approach the same problem in different ways [Kurzweil, 1990].

This chapter is organized as follows. Section 1.2 is devoted to the well-known classification approaches in the literature, particularly the learning technique called support vector machines (SVM). In Section 1.3, we present the combination levels, as well as the different combination schemes used for combining multiple classifiers. We give in Section 1.4 the different steps included in a general parallel combination scheme for handwriting recognition, where an overview of the main combination rules based on belief function theories, i.e. probabilistic theory, Dempster-Shafer evidence theory (DST) and Dezert-Smarandache theory (DSmT), is presented.

1.2 Classification approaches

Before addressing a classification problem, it is necessary to define the patterns to be recognized and the set of classes. Classification then amounts to assigning each pattern in the image to one or more predefined classes, based on a similarity of features. In other words, a classifier can be seen as an estimator or a data processing system that receives a pattern and provides information about its corresponding class [Moobed, 1996]. Generally, the pattern to be recognized may be associated with membership degrees, in which case it may belong to several classes. However, in the majority of classification problems, we are dealing with an exclusive classification in which a pattern must belong to only one class. For this purpose, it is necessary to choose a representation for describing the data (features), to establish a decision-making model that allows deciding on the membership of an object or pattern to a given class, and to use a validation dataset for setting the parameters of the classifier (Figure 1.1).
Moreover, classification approaches can be divided into three main categories involving different strategies according to the way data are processed. In the first category, the number of possible classes is known from prior information about the data to classify. A learning dataset is then constructed by selecting a number of samples of each class. The problem consists in assigning any new object to the most suitable class according to an appropriate decision-making model, which is designed from the samples of the learning dataset. Such a classification approach is called supervised classification. In the second category, there is no need to use learning samples. The number of classes and the membership functions of these classes should be established only from observations, without reference to a learning dataset. The grouping of samples is carried out on the basis of similarity and is generally conditioned by the choice of the number of classes. The user intervenes only once the classification is performed, to interpret the content of the classes, without requiring other assumptions about the data to be processed. Such a classification approach is called automatic or unsupervised classification. Finally, in the third category, only a part of the samples is labeled. Such a classification approach is called semi-supervised classification.
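The contrast between the first two categories can be made concrete with the following minimal sketch. It assumes scikit-learn and synthetic stand-in feature vectors; the classifiers chosen (a K-NN classifier and K-means clustering) are only illustrative and are not the methods developed in this thesis.

import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 16))        # stand-in feature vectors
y = rng.integers(0, 10, size=200)     # stand-in labels for ten classes

# Supervised: a labelled learning set designs the decision model.
supervised = KNeighborsClassifier(n_neighbors=3).fit(X, y)
print(supervised.predict(X[:5]))

# Unsupervised: only the number of classes is fixed; samples are grouped
# by similarity and the clusters are interpreted afterwards by the user.
unsupervised = KMeans(n_clusters=10, n_init=10, random_state=0).fit(X)
print(unsupervised.labels_[:5])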
Figure 1.1: Character recognition system.
Regarding classifiers, there are four major families [Jain, 2000]: template matching techniques, structural approaches, neural networks and statistical classification.

1.2.1 Template matching techniques

Template matching is a digital image processing technique which compares a pattern with prototypes representing each class through a distance measure (correlation or similarity). This technique is poorly suited to handwriting recognition because of the extreme variability of handwritten characters, which requires a huge number of prototypes for each class [Brunelli, 2009].

1.2.2 Structural approaches

These approaches are based on the structure of the character itself, emphasizing certain aspects of the tracing of letters. The structure is expressed in terms of primitive components corresponding to elementary patterns of the tracing and to events produced when plotting, such as retrogression, change of orientation, crossing, increase or decrease of the slope, etc. These components are called primitives [Schalkoff, 1991]. Within this family, several methods can be distinguished, such as graph structures, syntactic structures, methods of tests, comparison of strings, etc. [Schalkoff, 1991].

1.2.3 Connectionist approaches

The pattern recognition problem can be solved by finding a function that associates an input set of patterns to a set of output classes. The connectionist network plays the role of a transfer function, so that applying it to an input value provides the expected output response. Neural networks allow modeling the surfaces of separation between the classes, which allows reducing the classification error, and they take into account the inter-class structure of the data. Connectionist techniques are effective in complex pattern recognition cases. However, neural networks are among the most time-consuming approaches during the learning phase and their architecture is hard to design. Neural networks (NNs) have known great success since the 1990s, particularly through the development of an effective and easy-to-implement learning algorithm: the backpropagation algorithm [Smagt, 1996], [Zhang, 2000]. There are many types of neural networks, but the two types most used in character recognition are the multi-layer perceptrons (MLP) [Tay, 2003] and Radial Basis Function (RBF) networks [Augustin, 2001], [Gilloux, 1995b].
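As a point of reference for the connectionist family, the following hedged sketch trains a small multi-layer perceptron with backpropagation on the scikit-learn digits dataset. The network size, the dataset and the train/test split are illustrative assumptions; they do not correspond to the experiments reported later in this thesis.

from sklearn.neural_network import MLPClassifier
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

digits = load_digits()                       # 8x8 grey-level digit images
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.3, random_state=0)

# A single hidden layer trained with backpropagation, as in a classical MLP.
mlp = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=0)
mlp.fit(X_train, y_train)
print("MLP test accuracy:", mlp.score(X_test, y_test))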
1.2.4 Statistical approaches

In statistical approaches, recognition depends on a statistical analysis of measures performed on the patterns to be recognized. Consequently, it requires a huge number of samples for achieving an effective learning and uses the decision rule of the highest probability of belonging to a class. The supervised learning phase, carried out on the samples, consists in determining the parameters to be introduced into the decision rule. For a given learning set, the construction of the decision surfaces (also called decision boundaries) can be performed in two different ways. The first one, called modeling approaches, consists in implicitly generating the boundaries from the probability distributions of each class; examples are the Gaussian Mixture Model (GMM), the Hidden Markov Model (HMM), the K-nearest neighbors (K-NN) and so on. The second one relates to the so-called discriminative approaches, which explicitly estimate the decision boundaries between classes, for example the SVMs.

1.2.4.1 Modeling approaches

These approaches construct the decision boundaries from the probability distributions of each class. They benefit from automatic learning methods which rely on well-known theoretical bases such as:

a. Nonparametric methods: used when there is no prior knowledge about the probability distribution of the classes. The best known nonparametric approach is the K-NN [Poisson, 2005].

b. Parametric methods: used when the form of the probability distributions is known, Gaussian in the classical case. For each class, the unknown parameters of the Gaussian (mean, variance, covariance matrices, etc.) are estimated during the learning phase. Once these parameters are estimated, the decision is made naturally by using the Bayes rule. The best known parametric approach is the HMM [Rabiner, 1989].

1.2.4.2 Discriminative approaches

These approaches aim to construct the decision boundaries directly by minimizing an error criterion between the actual and predicted outputs of the classifier. The misclassification error or the mean squared error (MSE) is often chosen as the error criterion. There are several discriminative approaches for data classification, and the most recent classifiers are the SVMs [Vapnik, 1998].
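To make the distinction concrete, the hedged sketch below fits one modeling approach (class-conditional Gaussians combined with the Bayes rule) and one discriminative approach (a linear SVM) on the same data. The dataset and parameter values are illustrative assumptions only, not the configuration used in this thesis.

from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB   # modeling: per-class Gaussian distributions
from sklearn.svm import LinearSVC            # discriminative: direct decision boundaries

digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.3, random_state=0)

gnb = GaussianNB().fit(X_train, y_train)                      # Bayes rule on estimated Gaussians
svm = LinearSVC(C=1.0, max_iter=5000).fit(X_train, y_train)   # margin-based boundaries
print("Gaussian Bayes accuracy:", gnb.score(X_test, y_test))
print("Linear SVM accuracy:", svm.score(X_test, y_test))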
1.2.5 Review of the SVM classifier

The SVM is a supervised learning method introduced by Vapnik et al. [Vapnik, 1995], initially proposed to construct binary classifiers. For a two-class problem, the SVM searches for an optimal decision surface, determined by some points of the learning set (called support vectors), by projecting the non-linearly separable input data into a potentially much higher dimensional representation space (also called feature space) $\mathcal{F}$ via a nonlinear mapping $\Psi$ (Figure 1.2). This surface, which is constructed in the feature space, can be considered as an optimal decision hyperplane. It is obtained by solving a quadratic programming problem depending on regularization parameters.
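The following minimal sketch, assuming scikit-learn and its small digits dataset, shows this principle in practice: the RBF kernel plays the role of the nonlinear mapping $\Psi$, and the regularization parameter C governs the quadratic programming problem. It is an illustrative example only, not the configuration used in the experiments of this thesis.

from sklearn.svm import SVC
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.3, random_state=0)

# The RBF kernel implicitly maps the data into a higher dimensional feature
# space, where the optimal separating hyperplane is sought.
svm = SVC(kernel="rbf", C=10.0, gamma="scale")
svm.fit(X_train, y_train)
print("Support vectors per class:", svm.n_support_)
print("Test accuracy:", svm.score(X_test, y_test))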
Figure 1.2: Principle of separation through a kernel function in the SVM framework.

Given a set $D$ of $N$ learning examples, $D = \{(x_i, y_i) \in \mathbb{R}^n \times \{\mp 1\};\ i = 1, 2, \dots, N\}$, each marked as belonging to one of two separable classes $C_1 = \{x_i : y_i = +1\}$ or $C_2 = \{x_i : y_i = -1\}$, where each example $x_i$ is characterized by a set of $n$ features $x_i = (x_{i1}, x_{i2}, \dots, x_{in})$ and $y_i$ defines the class (also called instance-label) of a given example $x_i$. The objective of the SVM is to find a hyperplane which separates the learning data so that all points with the same label are on the same side of the hyperplane. Two cases then arise: the case of linearly separable data and the case of non-linearly separable data.

1.2.5.1 Case of linearly separable data

For a classification problem with linearly separable data, the objective is to separate the two classes of data by a function determined from the available samples. Let us consider the examples in Figure 1.3 (a), where there are several possible linear classifiers that can
separate the data, but only one (see Figure 1.3 (b)) maximizes the margin, defined as the smallest distance between the data of the two classes and the hyperplane. This linear classifier is known in the literature as the Optimal Hyperplane (OH) [Vapnik, 1998]. Intuitively, this boundary is expected to generalize better than the other possible boundaries shown in Figure 1.3 (a).
Figure 1.3: Classification between two classes using hyperplanes: (a) arbitrary hyperplanes l, m and n; (b) optimal separating hyperplane with a large margin identified by the canonical hyperplanes through the support vectors.

Let us consider a canonical separating hyperplane H: \langle w, x \rangle + b = 0, as in Figure 1.3 (a), satisfying the following constraints:
y_i \, [\langle w, x_i \rangle + b] \geq 1, \quad i = 1, \ldots, N \qquad (1.1)
The optimal hyperplane is obtained by maximizing the margin M(w, b), where w denotes the normal vector to the hyperplane and b the scalar bias computed using any support vector, under the constraints of Equation (1.1). The margin is expressed as M(w, b) = 2 / \|w\|. The larger the margin, the smaller the expected error. Maximizing the margin amounts to minimizing the squared norm of the vector w under the constraints of Equation (1.1). Thus, the optimal hyperplane satisfying the constraints of Equation (1.1) is the one that minimizes the function defined by:

\Phi(w) = \frac{1}{2} \|w\|^2 \qquad (1.2)
The solution of the optimization problem (1.2) subject to (1.1) (also called a quadratic programming problem) is given by the saddle point (w, b, \alpha) of the Lagrangian:

L(w, b, \alpha) = \frac{1}{2}\|w\|^2 - \sum_{i=1}^{N} \alpha_i \left( y_i [\langle w, x_i \rangle + b] - 1 \right) \qquad (1.3)
where \alpha = (\alpha_1, \ldots, \alpha_N) denotes the Lagrange multipliers. The Lagrangian must be minimized with respect to w and b and maximized with respect to \alpha \geq 0. Hence, the saddle point satisfies the conditions:

\alpha_i \left[ y_i (\langle w, x_i \rangle + b) - 1 \right] = 0, \quad i = 1, \ldots, N \qquad (1.4)
The support vectors SV are the samples x_i for which the equality y_i(\langle w, x_i \rangle + b) = 1 holds. Specifically, they are the vectors closest to the optimal hyperplane. All other samples satisfying Equation (1.4) have \alpha_i = 0, which is equivalent to SV = \{x_i \mid \alpha_i > 0,\ i = 1, \ldots, N\}. Finally, the decision function is given by:

f(x) = \sum_{x_i \in SV} \alpha_i y_i \langle x_i, x \rangle + b \qquad (1.5)
If the function f(x) is negative, then x belongs to class −1; otherwise x belongs to class +1.

1.2.5.2 Case of non-linearly separable data
In most real-world problems, the data are not linearly separable, as shown in Figure 1.4, and this difficulty must be circumvented. The SVM has been generalized through two tools: the soft margin and kernel functions.
Soft margin: The principle of the soft margin is to allow misclassification errors. The problem of finding the optimal separating hyperplane is reformulated as follows: the optimal hyperplane separating the two classes is the one that separates the data with minimal errors and satisfies the following conditions:
i. The distance between the well-classified vectors and the hyperplane must be maximal.
ii. The distance between the misclassified vectors and the hyperplane must be minimal.
Figure 1.4: Separating hyperplanes in the case of non-linearly separable data, where H is any hyperplane, HO is the optimal hyperplane and SV are the support vectors.

To generalize the optimal hyperplane to the case of non-linearly separable data, slack variables \xi_i are introduced in [Cortes, 1995]. Consequently, the constraints (1.1) are modified as follows:
y_i \, [\langle w, x_i \rangle + b] \geq 1 - \xi_i, \quad i = 1, \ldots, N \qquad (1.6)
Thus, instead of searching for the weight vector w that minimizes the squared norm \langle w, w \rangle as in the linearly separable case, we minimize, under the constraints (1.6), the function defined by:

\Phi(w, \xi) = \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{N} \xi_i \qquad (1.7)
where \xi = (\xi_1, \ldots, \xi_N) and C (C > 0) is a user-defined parameter. The parameter C is interpreted as a tolerance to classification noise. For large values of C, only very small values of \xi are permitted and therefore the number of misclassified points is very low. However, if C is small, \xi can become quite large, allowing more misclassification errors. Thus, we obtain the equation of the optimal hyperplane in the sense of the new constraints taking into account the values of \xi_i. The support vectors are always the sample vectors closest to this hyperplane. The calculation of the normal
vector w, the bias b and the classification function f(x) is exactly the same as in the linearly separable case.
Kernel functions: Intuitively, the fundamental idea behind the SVM is to construct a decision surface that linearly separates two classes. When the data x_1, \ldots, x_N \in \mathbb{R}^n are not linearly separable, they are mapped into a potentially much higher dimensional feature space ℱ via a nonlinear mapping:

\Psi: \mathbb{R}^n \rightarrow \mathcal{F}, \quad x \mapsto \Psi(x) \qquad (1.8)
For a given learning problem, one now considers the same algorithm in ℱ instead of ℝ^n, i.e. one works with the sample (\Psi(x_i), y_i) \in \mathcal{F} \times \{-1, +1\}, i = 1, \ldots, N. The dot product in the feature space is computed using a kernel function K such that K(x, x_i) = \langle \Psi(x), \Psi(x_i) \rangle. Thus, every mathematical function satisfying Mercer's conditions is eligible to be an SVM kernel [Vapnik, 1995]. Examples of such kernels are given in Table 1.1.

Kernel                        | K(x, x_i)                                              | Parameters
RBF (Radial Basis Function)   | \exp\left(-\frac{\|x - x_i\|^2}{2\sigma^2}\right)      | \sigma: standard deviation of the Gaussian function
Polynomial                    | (\langle x, x_i \rangle + 1)^P                         | P: order of the polynomial
Negative distance             | -\|x - x_i\|^{\gamma}                                  | \gamma: kernel parameter to adjust in the interval (0, 2]

Table 1.1: Examples of kernel functions.

Note that \|x - x_i\| in Table 1.1 is the Euclidean distance between two samples. The decision function f: \mathbb{R}^n \rightarrow \{-1, +1\} is then expressed in terms of a kernel expansion as:

f(x) = \sum_{k=1}^{S_v} \alpha_k y_k K(x, x_k) + b \qquad (1.9)
where 𝛼𝑘 are Lagrange multipliers, 𝑆𝑣 is the number of support vectors 𝑥𝑘 which are training data, such that 0 ≤ 𝛼𝑘 ≤ 𝐶, 𝐶 is a user-defined parameter that controls the tradeoff between the machine complexity and the number of nonseparable points
[Huang, 2002], and the bias b is a scalar computed using any support vector. Finally, a test datum x is classified according to:

x \in \begin{cases} \text{Class } +1 & \text{if } f(x) > 0 \\ \text{Class } -1 & \text{otherwise} \end{cases} \qquad (1.10)
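As an illustration, the following sketch shows how such a soft-margin SVM with an RBF kernel could be trained and used for prediction. It relies on scikit-learn rather than the implementation used in this thesis; the toy data, the value C = 10 and the standard deviation σ (passed through gamma = 1/(2σ²)) are arbitrary choices made only for the example.

```python
# Minimal sketch of a bi-class SVM with an RBF kernel (Equations (1.9)-(1.10)),
# using scikit-learn; the data and hyperparameters are illustrative only.
import numpy as np
from sklearn.svm import SVC

# Toy two-class data in R^2 (labels y in {-1, +1})
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(3, 1, (20, 2))])
y = np.hstack([-np.ones(20), np.ones(20)])

sigma = 1.0                       # standard deviation of the RBF kernel
clf = SVC(C=10.0,                 # trade-off parameter C of Equation (1.7)
          kernel="rbf",
          gamma=1.0 / (2 * sigma ** 2))
clf.fit(X, y)

# Decision function f(x) of Equation (1.9); its sign gives the class (1.10)
x_test = np.array([[1.5, 1.5]])
f_x = clf.decision_function(x_test)
print("f(x) =", f_x, "-> class", np.sign(f_x))
```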
1.3 Classifier combination
Many research works have shown the importance of having robust systems for dealing with recognition problems, in particular in the area of handwriting recognition. Since, for the same application and the same set of features at the input of the classifiers, the recognition rates can vary from one classifier to another, it is interesting to exploit the complementarity between them. Consequently, the interest of using several classifiers simultaneously has gradually become established.
1.3.1 Combination levels
There are many methods for combining classifiers, which are generally classified into three categories according to the level of information provided by the classifier [Xu, 1992]. This categorization is also adopted in the majority of research works [Jain, 2000], [Ruta, 2000], [Heutte, 1994]:
1.3.1.1 Combination at class level
In a class-level combination, the opinion of the classifier is binary. The response of the classifier can then be represented by a binary vector in which "1" indicates the class proposed by the classifier. A classifier can also produce a set of classes; it then considers that a pattern belongs to one class of this set without giving any other information that would allow discriminating between these classes.
1.3.1.2 Combination at rank level
A rank-level combination performs a ranking of the classes. The classifier indicates the ranking by providing as output a vector of ranks. The class placed at the first rank of the list is considered the most probable for a given pattern, and the class at the last rank the least probable one.
1.3.1.3 Combination at measure level
A measure-level combination indicates the confidence of the classifier in its proposal. The output of the classifier is a vector of measures (normalized or not), which may be a
distance, a posterior probability, a confidence value, a match score, a belief function, a possibility, a credibility or a fuzzy measure, etc.
1.3.2 Combination schemes of multiple classifiers
Various methods have been proposed in the literature for combining multiple classifiers, which has led to the development of several schemes treating the data in different ways [Heutte, 1994], [Moobed, 1996], [Rahman, 1999]. Generally, three approaches for combining classifiers can be considered: the parallel approach, the sequential approach and the hybrid approach [Zouari, 2004].
1.3.2.1 Sequential combination
A sequential combination (also called serial combination) is organized into successive decision levels, which gradually reduce the number of possible classes. At each level, a single classifier takes into account the response provided by the classifier placed upstream in order to deal with the rejected classes or to confirm the decision obtained on the pattern presented to it (Figure 1.5).
Figure 1.5: Sequential combination of L classifiers, where \theta_i is the set of samples of the i-th class, i = 1, \ldots, n.

1.3.2.2 Parallel combination
A parallel combination first lets the different classifiers operate independently of each other and then combines their respective responses. This combination is performed either automatically, without privileging one classifier over another, or in a directed way, in which case the response of each classifier is weighted according to its performance. The execution order of the classifiers is not involved in this approach. Figure 1.6 shows a representation of the parallel combination of classifiers.
Figure 1.6: Parallel combination of classifiers.

1.3.2.3 Hybrid combination
A hybrid approach combines both sequential and parallel architectures in order to take full advantage of each classifier used. Figure 1.7 shows an example of hybrid combination in which one classifier in series is combined with two classifiers in parallel. This kind of approach allows generating many cooperation schemes that can quickly become complex to optimize. It illustrates the two aspects of combination: on the one hand reducing the set of possible classes, on the other hand searching for a consensus between classifiers in order to reach a single decision.
Figure 1.7: Hybrid combination of classifiers.

In the literature, various theories have been introduced for combining information sources (in our case, multiple classifiers), such as the fuzzy set theory [Zadeh, 1968], evidence theory [Shafer, 1976], possibility theory [Dubois, 1988] and the theory of plausible and paradoxical reasoning [Dezert, 2002a], [Smarandache, 2004]. In the following, we study in depth the use of belief function theories for combining p information sources within a general parallel combination scheme.
1.4 General parallel combination scheme for handwriting recognition
In handwriting recognition, the DST and the Probabilistic theory (PT) have been used for various measure-level combinations [Arif, 2006], [Xu, 1992], [Wang, 2011], [Burger, 2011]. However, both theories have some limitations and cannot always provide good performance with imprecise, imperfect or uncertain data. In our case, we propose the use of PT and DST, including DSmT, through a general parallel combination scheme in order to improve the robustness of recognition systems, particularly for the recognition of handwritten numerals and for signature verification. In this scheme, the first step is to transform the outputs issued from the different classifiers into belief assignments using a mass estimation technique. In the next step, a combination rule based on belief function theories is used at the measure level; finally a decision measure is computed and a decision rule is applied for classification. Figure 1.8 shows the steps involved in the proposed scheme. This combination scheme includes three modules: (1) estimation of masses, (2) belief model combination and (3) decision module. In this chapter, the description of the proposed scheme uses an n-class, p-classifier approach, and the subscript c_i, i = 1, 2, \ldots, p, represents the i-th classifier.
Figure 1.8: Proposed parallel combination scheme within the general belief function theory framework.

1.4.1 Estimation of masses
The estimation of belief masses represents a crucial step in the combination process and remains a largely unsolved problem that has not yet found a general answer. In image processing, Bloch [Bloch, 2003] describes three different levels from which a mass function may be derived: at the highest level, where information representation is used in a way similar
to that used in artificial intelligence, masses are assigned to propositions; at an intermediate level, masses are computed from attributes and may involve simple geometrical models; at the pixel level, mass assignment is inspired by statistical pattern recognition [Bouakache, 2009]. Furthermore, the difficulty increases when we are interested in the compound hypotheses and their mass functions [Garvey, 1986], [Lowrance, 1991], [Abbas, 2009]. The most widely used approach is to assign to the simple hypotheses masses computed from conditional probabilities [Abbas, 2009]. A transfer model is then introduced to distribute the initial masses over all compound hypotheses (unions and intersections of classes), or only some compound hypotheses are involved through an appropriate model inducing simplifications, in terms of computation and memory size, during the combination process [Lee, 1987], [Rasoulian, 1990], [Abbas, 2009], [Abbas, 2012d], [Abbas, 2013b]. This transfer operation is done through a coarsening (discounting) factor and/or a conditioning factor applied to the conditional probabilities (initial masses) [Abbas, 2009]. Among the transfer models introduced in the literature for estimating belief assignments are the following: the upper and lower probabilities model proposed by Dempster [Dempster, 1967], the distance-based estimation model [Denoeux, 1995], the Transferable Belief Model (TBM) developed by Smets [Smets, 1994], the consonant model of Dubois and Prade [Dubois, 1994] and the dissonant model of Appriou [Appriou, 1991].
1.4.2 Overview of belief model based combination rules
In this section, we describe the formulation of the Probability theory, DS theory and DSm theory based combination rules using belief function models. We consider an n-class problem with the classes being \theta_1, \theta_2, \ldots, \theta_{n-1} and \theta_n. Consequently, the combination algorithms are formulated as an n-class problem. These formulations can be extended to any number of classes, including the compound hypotheses (i.e. unions and intersections of classes).
1.4.2.1 Notations
An n-class problem consists in generating masses, which are combined through an appropriate rule. Hence, we denote by S_1, S_2, \ldots, S_p the sources of information corresponding to the p classifiers. For each source, n classes \theta_1, \theta_2, \ldots, \theta_{n-1}, \theta_n are considered. To generate the combined masses from the p classifiers, a set of elements is defined, namely
G = \{A_1, \ldots, A_{Car(G)}\}, which attributes to each element A_j (j = 1, \ldots, Car(G)) a "belief" value m_i(A_j) associated to each source S_i, taking its value in the range [0, 1] and verifying
\sum_{j=1}^{Car(G)} m_i(A_j) = 1,

where Car(G) is the cardinal of the set G and i \in \{1, 2, \ldots, p\} is the index of the corresponding source. According to the finite set of hypotheses, a combination rule is then applied to the elementary masses to generate the masses of the p combined sources as follows:

m_c(A) = (m_1 \oplus m_2 \oplus \cdots \oplus m_p)(A) = \sum_{B_1, B_2, \ldots, B_p \in G^p} \prod_{i=1}^{p} m_i(B_i) \qquad (1.11)

where B_1, B_2, \ldots, B_p denote the elementary or compound hypotheses of the p sources S_1, S_2, \ldots, S_{p-1} and S_p, respectively, and m_c(A) is the mass associated to the elementary or compound hypothesis A of the combined sources, verifying \sum_{j=1}^{Car(G)} m_c(A_j) = 1.
Hence, the choice of an appropriate combination rule depends on the set of predefined hypotheses. Therefore, three theories can be considered:
Probabilistic theory (PT), Dempster-Shafer theory (DST), Dezert-Smarandache theory (DSmT).
1.4.2.2 Combination rule based on the PT
The PT is the first theory defined for combining sources of information. In the PT framework, G is defined as a finite set of exhaustive and mutually exclusive hypotheses called the discernment space, which corresponds, in multi-class classification, to the n elementary hypotheses G = \{\theta_1, \theta_2, \ldots, \theta_n\} such that \sum_{j=1}^{n} m_i(\theta_j) = 1. The combination rule is then defined as the sum of the probability assignments according to the following equation:

m_c(A) = m_{sum}(A) = \begin{cases} \dfrac{1}{p} \displaystyle\sum_{i=1}^{p} m_i(\theta_j) & \text{if } A = \theta_j, \\ 0 & \text{otherwise.} \end{cases} \qquad (1.12)
The combination based on the PT using the sum rule is effective for non-conflicting cases. However, when the sources provide imprecise, uncertain and conflicting information, the sum rule is not appropriate. Hence, a theoretical framework based on evidential reasoning was proposed by Dempster [Dempster, 1967] and then developed by Shafer [Shafer, 1976]. An example of such approaches is the DS combination rule.
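A minimal sketch of the sum rule is given below, assuming two classifiers whose outputs have already been converted into probability assignments over three classes; the numerical values are invented for the example.

```python
# Minimal sketch of the PT sum rule (Equation (1.12)): the combined mass of each
# elementary class is the average of the masses provided by the p sources.
import numpy as np

# Illustrative probability assignments of p = 2 classifiers over n = 3 classes
m1 = np.array([0.7, 0.2, 0.1])   # source S1
m2 = np.array([0.6, 0.3, 0.1])   # source S2

m_sum = (m1 + m2) / 2            # (1/p) * sum_i m_i(theta_j)
print("combined masses:", m_sum, "-> decision: class", np.argmax(m_sum) + 1)
```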
1.4.2.3 Combination rule based on the DST
The main concept of the DST is to distribute the elementary mass of certainty over all the disjunctions of the elements contained in \Theta instead of making this distribution over the elementary hypotheses only. In the DST framework, \Theta generates the power set 2^\Theta with the \cup (union) operator only. More precisely, the power set 2^\Theta is defined as the set of all composite propositions/subsets built from the elements of \Theta with the \cup operator such that:
1. \emptyset, \theta_1, \ldots, \theta_n \in 2^\Theta.
2. If A, B \in 2^\Theta, then A \cup B \in 2^\Theta.
3. No other elements belong to 2^\Theta, except those obtained by using rules 1 and 2.
Therefore, the belief functions, also known as basic belief assignments (bba), are computed on the power set defined as G = 2^\Theta = \{\emptyset, \theta_1, \theta_2, \ldots, \theta_n, \theta_1 \cup \theta_2, \ldots, \theta_1 \cup \theta_2 \cup \cdots \cup \theta_n\}, such that \sum_{A_j \in 2^\Theta} m_i(A_j) = 1, where m_i(\theta_k \cup \cdots \cup \theta_l), k, l \in \{1, \ldots, n\}, defines the mass of partial (or total) ignorance, which is attributed when the classes \theta_k, \theta_{k+1}, \ldots, \theta_{l-1} and \theta_l are not well distinguished by the p sources. In the evidential framework, the combined bba m_{DS} obtained from the p belief assignments m_1(.), \ldots, m_p(.) by means of the Dempster-Shafer combination rule [Shafer, 1976] is defined as:

m_c(A) = m_{DS}(A) = \begin{cases} 0 & \text{if } A = \emptyset, \\ \dfrac{1}{1 - K_c} \displaystyle\sum_{\substack{B_1, B_2, \ldots, B_p \in 2^\Theta \\ B_1 \cap B_2 \cap \cdots \cap B_p = A}} \prod_{k=1}^{p} m_k(B_k) & \text{otherwise,} \end{cases} \qquad (1.13)

K_c = \sum_{\substack{B_1, B_2, \ldots, B_p \in 2^\Theta \\ B_1 \cap B_2 \cap \cdots \cap B_p = \emptyset}} \prod_{k=1}^{p} m_k(B_k). \qquad (1.14)
K_c \in [0, 1] defines the mass assigned to the empty set, which is often interpreted as a measure of conflict between the different sources, and 1 - K_c is a normalization factor. The larger K_c is, the more the sources are conflicting. In the extreme case, the sources are considered totally contradictory and the combination based on the DS rule is not possible. These limitations have already been reported by Zadeh [Zad, 1979], Dubois and Prade [Dub, 1986], Voorbraak [Voorbraak, 1991] and Dezert [Dezert, 2002b].
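To make the normalization by 1 − K_c concrete, the following sketch applies Dempster's rule (1.13)-(1.14) to two sources on a small frame under Shafer's model; the frame, the focal elements and the mass values are invented for the illustration.

```python
# Minimal sketch of Dempster's rule for two sources (Equations (1.13)-(1.14)).
# Focal elements are encoded as frozensets of class labels; masses are
# illustrative only.
from itertools import product

theta1, theta2 = frozenset({"t1"}), frozenset({"t2"})
ignorance = theta1 | theta2

m1 = {theta1: 0.8, theta2: 0.1, ignorance: 0.1}   # source S1
m2 = {theta1: 0.2, theta2: 0.7, ignorance: 0.1}   # source S2

combined, K_c = {}, 0.0
for (A, mA), (B, mB) in product(m1.items(), m2.items()):
    inter = A & B
    if inter:                                     # conjunctive consensus
        combined[inter] = combined.get(inter, 0.0) + mA * mB
    else:                                         # conflicting mass
        K_c += mA * mB

m_DS = {A: v / (1.0 - K_c) for A, v in combined.items()}
print("conflict K_c =", round(K_c, 3))
print({tuple(sorted(A)): round(v, 3) for A, v in m_DS.items()})
```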
To cope with an unknown and unpredictable reality, the DST was extended into a new alternative approach developed by Dezert and Smarandache to deal with (highly) conflicting, imprecise and uncertain sources of information [Dez, 2004d], [Smarandache, 2006b], [Dezert, 2009]. An example of such approaches is the Proportional Conflict Redistribution (PCR6) rule.
1.4.2.4 Combination rule based on the DSmT
The main concept of the DSmT is to distribute the basic belief assignment of certainty over all the composite propositions built from the elements of \Theta with the \cup (union) and \cap (intersection) operators instead of making this distribution over the elementary or union hypotheses only. In the DSmT framework, \Theta generates the hyper-powerset D^\Theta with the \cup and \cap operators as follows:
1. \emptyset, \theta_1, \ldots, \theta_n \in D^\Theta.
2. If A, B \in D^\Theta, then A \cup B \in D^\Theta and A \cap B \in D^\Theta.
3. No other elements belong to D^\Theta, except those obtained by using rules 1 or 2.
For instance, the hyper-powerset for three hypotheses (classes) is defined as:

G = D^\Theta = \{\emptyset, \theta_1, \theta_2, \theta_3, \theta_1 \cap \theta_2, \theta_1 \cap \theta_3, \theta_2 \cap \theta_3, \theta_1 \cup \theta_2, \theta_1 \cup \theta_3, \theta_2 \cup \theta_3, \theta_1 \cup \theta_2 \cup \theta_3, \theta_1 \cap \theta_2 \cap \theta_3, (\theta_1 \cup \theta_2) \cap \theta_3, (\theta_1 \cup \theta_3) \cap \theta_2, (\theta_2 \cup \theta_3) \cap \theta_1, (\theta_1 \cap \theta_2) \cup \theta_3, (\theta_1 \cap \theta_3) \cup \theta_2, (\theta_2 \cap \theta_3) \cup \theta_1, (\theta_1 \cup \theta_2) \cap (\theta_1 \cup \theta_3) \cap (\theta_2 \cup \theta_3)\}
The DSmT uses a generalized basic belief mass, also known as the generalized basic belief assignment (gbba), computed on the hyper-powerset of \Theta and defined as: m_i(\emptyset) = 0 and \sum_{A_j \in D^\Theta} m_i(A_j) = 1, such that m_i(A_j = \theta_k \cup \cdots \cup \theta_l), k, l \in \{1, \ldots, Car(D^\Theta)\}, defines the mass of the ignorance, m_i(A_j = \theta_k \cap \cdots \cap \theta_l), k, l \in \{1, \ldots, Car(D^\Theta)\}, represents the mass of the conflict (or paradoxical information), Car(D^\Theta) is the cardinal of the hyper-powerset D^\Theta and i \in \{1, 2, \ldots, p\}. Many combination rules have been proposed in the last few years [Smarandache, 2004], [Smarandache, 2006a], [Smarandache, 2009]. The way the conflicting mass is redistributed yields several versions of the Proportional Conflict Redistribution (PCR) rules [Smarandache, 2006b], [Mar, 2006]. From PCR1 to PCR2, PCR3, PCR4 and PCR5, the complexity of the rules increases, along with the exactitude of the redistribution of the conflicting masses. The PCR rules redistribute the
conflicting mass, after applying the conjunctive rule, proportionally with some functions depending on the masses assigned to the corresponding columns of the mass matrix. The combination rule PCR5 proposed in [Smarandache, 2006b] for two sources is mathematically one of the best for the proportional redistribution of the conflict, applicable in the context of both the DST and the DSmT. Martin and Osswald [Martin, 2006] have proposed the following alternative to PCR5 for combining more than two sources together (i.e. p ≥ 3). This new rule, denoted PCR6, does not follow back on the track of the conjunctive rule as the general PCR5 formula does, but it gives better intuitive results. For p = 2, PCR5 and PCR6 coincide. The combined gbba m_{PCR6} obtained from the p generalized belief assignments m_1(.), \ldots, m_p(.) by means of the PCR6 rule [Martin, 2006] is defined as:
m_c(A) = m_{PCR6}(A) = \begin{cases} 0 & \text{if } A \in \Phi, \\ m(A) + \displaystyle\sum_{i=1}^{p} m_i(A)^2 \displaystyle\sum_{\substack{(Y_{\sigma_i(1)}, \ldots, Y_{\sigma_i(p-1)}) \in (D^{\Theta})^{p-1} \\ \left(\bigcap_{l=1}^{p-1} Y_{\sigma_i(l)}\right) \cap A \in \Phi}} L_i & \text{otherwise,} \end{cases} \qquad (1.15)

where

L_i = \frac{\prod_{j=1}^{p-1} m_{\sigma_i(j)}\!\left(Y_{\sigma_i(j)}\right)}{m_i(A) + \sum_{j=1}^{p-1} m_{\sigma_i(j)}\!\left(Y_{\sigma_i(j)}\right)}, \qquad (1.16)
\Phi = \{\Phi_{\mathcal{M}}, \emptyset\} is the set of all relatively and absolutely empty elements, \Phi_{\mathcal{M}} is the set of all elements of D^{\Theta} which have been forced to be empty in the hybrid model \mathcal{M} defined by the exhaustivity and exclusivity constraints, \emptyset is the empty set, the denominator m_i(A) + \sum_{j=1}^{p-1} m_{\sigma_i(j)}(Y_{\sigma_i(j)}) is different from zero, and \sigma_i counts from 1 to p while avoiding i, i.e.:

\sigma_i(j) = \begin{cases} j & \text{if } j < i, \\ j + 1 & \text{if } j \geq i. \end{cases} \qquad (1.17)
Here, m(A) corresponds to the classical DSm rule on the free DSm model [Dezert, 2002b], which is defined as:

m(A) = \begin{cases} 0 & \text{if } A = \emptyset, \\ \displaystyle\sum_{\substack{B_1, B_2, \ldots, B_p \in D^\Theta \\ B_1 \cap B_2 \cap \cdots \cap B_p = A}} \prod_{i=1}^{p} m_i(B_i) & \text{otherwise.} \end{cases} \qquad (1.18)
1.4.3 Decision making
Once the combined mass is obtained by applying one of the combination rules mentioned above, the decision making step follows. Indeed, in real applications a reliable decision is needed, and it is the final result that matters. In the Bayesian framework, the a priori probability is used to obtain the maximum a posteriori classification; in this case, the decision rule is characterized by the maximization of a single value. In both the DST and DSmT frameworks, two measures of evidence are available: credibility and plausibility. Hence, a belief interval is obtained rather than a single measure (i.e. the probability measure), and any measure estimated in the PT framework falls in this interval, which is delimited by the values of credibility and plausibility. The maximization criterion therefore becomes a crucial problem in the decision making step. In the DST framework, different functions such as credibility, plausibility and the pignistic probability [Dezert, 2004e], [Shafer, 1976], [Smets, 1990] are usually used for decision making purposes. Dezert and Smarandache [Dezert, 2004e] follow Smets' idea and his justifications to work at the pignistic level [Smets, 2000] rather than at the credal level when a final decision has to be taken from any combined belief mass m_c(.). A new generalized pignistic transformation, denoted DSmP, has been proposed in [Dezert, 2009] based on the DSmT. Indeed, like the classical transformation (i.e. the pignistic probability denoted BetP), the DSmP offers a good compromise between the maximum of credibility and the maximum of plausibility for decision support. Furthermore, it has the advantage of providing a high probabilistic information content (PIC), from which better performance is expected, contrary to the classical transformation which, in general, does not provide the highest PIC, as pointed out by Sudano [Sudano, 2002]. Considering a discrete frame \Theta with a given model (free DSm model, hybrid DSm model or Shafer's model), the DSmP mapping is defined as follows:

DSmP_\epsilon(A_i) = \begin{cases} 0 & \text{if } A_i = \emptyset, \\ \displaystyle\sum_{A_j \in G^\Theta} \frac{\displaystyle\sum_{\substack{A_k \subseteq A_i \cap A_j \\ C_{\mathcal{M}}(A_k) = 1}} m_c(A_k) + \epsilon \cdot C_{\mathcal{M}}(A_i \cap A_j)}{\displaystyle\sum_{\substack{A_k \subseteq A_j \\ C_{\mathcal{M}}(A_k) = 1}} m_c(A_k) + \epsilon \cdot C_{\mathcal{M}}(A_j)} \; m_c(A_j) & \text{otherwise,} \end{cases} \qquad (1.19)
where \epsilon \geq 0 is a tuning parameter and G^\Theta corresponds to the hyper-powerset including, eventually, all the integrity constraints (if any) of the model \mathcal{M}; C_{\mathcal{M}}(A_i \cap A_j) and C_{\mathcal{M}}(A_j) denote the DSm cardinals [Dezert, 2004b] of the sets A_i \cap A_j and A_j, respectively. The parameter \epsilon allows reaching the maximum PIC value of the approximation of m_c(.) into a subjective probability measure; the smaller \epsilon, the better/bigger the PIC value. In some particular degenerate cases, the DSmP_{\epsilon=0} values cannot be derived, but the DSmP_{\epsilon>0} values can always be derived by choosing \epsilon as a very small positive number, say \epsilon = 1/1000, in order to be as close as desired to the maximum of the PIC (see reference [Dezert, 2009] for details and examples). It is also interesting to note that when \epsilon = 1 and when the masses of all elements A_k having C_{\mathcal{M}}(A_k) = 1 are zero, (1.19) reduces to (1.20), i.e. DSmP_{\epsilon=1} = BetP:

BetP(A_i) = \sum_{A_j \in G^\Theta} \frac{C_{\mathcal{M}}(A_i \cap A_j)}{C_{\mathcal{M}}(A_j)} \; m_c(A_j) \qquad (1.20)
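As an elementary illustration of the pignistic transformation (1.20) under Shafer's model, where the DSm cardinal reduces to the ordinary cardinality, the following sketch computes BetP for an invented combined bba on a two-class frame.

```python
# Minimal sketch of the pignistic transformation BetP (Equation (1.20)) under
# Shafer's model on a two-class frame; the combined masses are illustrative.
m_c = {frozenset({"t1"}): 0.55,
       frozenset({"t2"}): 0.25,
       frozenset({"t1", "t2"}): 0.20}   # total ignorance

def betp(singleton, masses):
    # BetP(theta_i) = sum over focal sets A containing theta_i of m(A) / |A|
    return sum(m / len(A) for A, m in masses.items() if singleton <= A)

for s in (frozenset({"t1"}), frozenset({"t2"})):
    print(set(s), "BetP =", round(betp(s, m_c), 3))
```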
1.5 Summary
In this chapter, we have presented an overview of the mathematical tools most used in setting up a handwriting recognition system, particularly the methods that will be investigated in this thesis. We first recalled the meaning of a classifier and presented the classification approaches most used in the literature. In particular, the discriminative approach called support vector machines is studied in depth, since its performance is judged significantly higher than that of other traditional classifiers. The combination of several individual classifiers is a general problem that occurs in various handwriting recognition applications. According to the levels of output information, the problem of combining multiple classifiers can be addressed through three combination schemes, namely the sequential, parallel and hybrid combination schemes. These schemes differ mainly by the combination level of the classifiers and by the way in which the data are treated. To improve the robustness of handwriting recognition systems, we proposed and presented a general parallel combination scheme based on belief function theories, which is composed of three modules: (1) estimation of masses, (2) belief model combination and (3) decision module. In a first step, a brief review of the different estimation techniques allowing the transformation of the outputs issued from different classifiers into belief assignments was presented. In the next step, the various existing theories for combining multiple classifiers were mentioned, and
an overview of the main combination rules based on belief function theories was presented using the proposed parallel combination scheme at the measure level. This overview allowed us to identify the strengths and weaknesses of the combination rules used in the belief function theory framework, and the limits reached by each rule, particularly in the probabilistic theory and evidence theory frameworks. Finally, a decision measure, namely the credibility, the plausibility, the pignistic probability or the generalized pignistic transformation, is used for decision making purposes. The generalized pignistic transformation is chosen for making a reliable decision at the pignistic level: on the one hand it offers a good compromise between the maximum of credibility and the maximum of plausibility for decision support, and on the other hand it has the advantage of providing a high probabilistic information content, from which better performance is expected. The next chapters are dedicated to the use of the DSmT for various handwriting recognition applications.
Chapter 2
DSmT Based Systems for Writer-Dependent Handwritten Signature Verification
Abstract: Verification or authentication from the handwritten signature is the most accepted biometric modality for identifying a person. However, a single handwritten signature verification (HSV) system does not allow achieving the required performance. Therefore, rather than trying to optimize a single HSV system by choosing the best features or classifier for a given system, researchers found it more interesting to combine different systems. In that case, the DSmT is reported as a very useful and powerful theoretical tool for enhancing the performance of multimodal biometric systems. Hence, we propose in this chapter a study of the application of the DSmT for combining different HSV systems. Two cases are addressed for validating the effective use of the DSmT. The first one enhances the performance of off-line HSV systems by associating features based on the Radon and Ridgelet transforms for each individual system. The second one associates the off-line image and dynamic information in order to improve the performance of single-source biometric systems and ensure greater security. Experimental results conducted on standard datasets show the effective use of the proposed DSmT based combination for improving the verification accuracy compared to the individual systems.
2.1 Introduction
Biometrics is one of the most widely used approaches for the identification and authentication of persons [Jain, 2007]. Hence, several biometric modalities, based on physiological or behavioral characteristics, have been proposed in the last decades. Physiological characteristics are related to the anatomical properties of a person and include, for instance, fingerprint, face, iris and hand geometry. Behavioral characteristics refer to how a person performs an action and typically include voice, signature and gait [Jain, 2007], [Jain, 2004]. The choice of a biometric modality depends on several factors
such as nonuniversality, nonpermanence, intraclass variations, poor image quality, noisy data, and matcher limitations [Jain, 2007], [Ross, 2006]. Thus, recognition based on unimodal biometric systems is not always reliable. To address these limitations, various works have proposed combining two or more biometric modalities in order to enhance the recognition performance [Ross, 2006], [Ross, 2003], [Kittler, 1998]. This combination can be performed at the data, feature, match score, and decision levels [Ross, 2006], [Ross, 2003]. However, given the constraints related to the joint use of classifiers and feature generation methods, an appropriate operating method based on mathematical approaches is needed, which takes into account two notions: the uncertainty and the imprecision of the classifier responses. In general, most theoretical advances devoted to the theory of probabilities are able to represent uncertain knowledge but are unable to easily model information that is imprecise, incomplete, or not totally reliable. Moreover, they often lead to confusing both concepts of uncertainty and imprecision with the probability measure. Therefore, new theories dealing with uncertain and imprecise information have been introduced, such as the fuzzy set theory [Zadeh, 1968], evidence theory [Shafer, 1976], possibility theory [Dubois, 1988] and, more recently, the theory of plausible and paradoxical reasoning, also called Dezert-Smarandache theory (DSmT) [Smarandache, 2004], [Smarandache, 2006a], [Smarandache, 2009]. The DSmT has been elaborated by Jean Dezert and Florentin Smarandache for dealing with imprecise, uncertain and paradoxical sources of information. Thus, the main objective of the DSmT is to provide combination rules that correctly combine evidences issued from different information sources, even in the presence of conflicts between sources or of constraints corresponding to an appropriate model (i.e. the free or hybrid DSm models [Smarandache, 2004]). The use of the DSmT has proved its efficiency in many kinds of applications [Smarandache, 2004], [Smarandache, 2006a], [Smarandache, 2009]. Indeed, the DSmT is reported as a very useful and powerful theoretical tool for enhancing the performance of multimodal biometric systems. Hence, combination algorithms based on the DSmT have been used by Singh et al. [Singh, 2008] for robust face recognition by integrating multilevel image fusion and match score fusion of visible and infrared face images. Vatsa et al. proposed a DSmT based fusion algorithm [Vatsa, 2009a] to efficiently combine level-2 and level-3 fingerprint features by incorporating image quality, as well as a unification of evidence-theoretic fusion algorithms [Vatsa, 2009b] applied to fingerprint verification using level-2 and level-3 features. A DSmT based dynamic reconciliation scheme for fusion rule
selection [Vatsa, 2010] has been proposed in order to manage the diversity of scenarios encountered in the probe dataset.
Generally, the handwritten signature is considered the best known modality for biometric applications. Indeed, it is usually socially accepted for many government/legal/financial transactions such as the validation of checks, historical documents, etc. [Impedovo, 2008]. Hence, an intense research field has been devoted to developing various robust verification systems [Plamondon, 2000] according to the acquisition mode of the signature. Two modes are used for capturing the signature: the off-line mode and the on-line mode. The off-line mode generates a static handwriting image from the scanned document. In contrast, the on-line mode generates dynamic information such as velocity and pressure from pen tablets or digitizers. For both modes, many Handwritten Signature Verification (HSV) systems have been developed in the past decades [Plamondon, 2000], [Plamondon, 1989], [Leclerc, 1994]. Generally, off-line HSV systems remain less robust than on-line HSV systems [Impedovo, 2008] because of the absence of dynamic information about the signer. A HSV system is generally composed of three modules: preprocessing, feature generation and classification. In this context, various methods have been developed for improving the robustness of each individual HSV system. However, these works failed to establish the incontestable superiority of one method over another in both the feature generation and classification steps. Hence, rather than trying to optimize a single HSV system by choosing the best features for a given problem, researchers found it more interesting to combine several classifiers [Ruta, 1994]. Recently, approaches for combining classifiers have been proposed to improve signature verification performance, which has led to the development of several schemes treating the data in different ways [Cordella, 1999a]. Generally, three approaches for combining classifiers can be considered: the parallel approach [Qi, 1995], [Dimauro, 1997], the sequential approach [Sansone, 2000], [Zhang, 2002] and the hybrid approach [Cordella, 1999b], [Cordella, 2000]. However, the parallel approach is considered simpler and more suitable, since it allows exploiting the redundant and complementary nature of the responses issued from the different signature verification systems. Hence, sets of classifiers have been used which are based on global and local approaches [Fierrez-Aguilar, 2005], [Kumar, 2010] and feature sets [Huang, 1996], [Huang, 1997a], parameter features and function features [Plamondon, 1992], [Nakanishi, 2006], static and dynamic features [Liwicki, 2011], [Mottl, 2008]. Furthermore, several
decision combination schemes have been implemented, ranging from majority voting [Dimauro, 1997], [Ramesh, 1999] to the Borda count [Arif, 2004], and from simple and weighted averaging [Bovino, 2003] to the Dempster-Shafer evidence theory [Arif, 2004], [Arif, 2006] and neural networks [Cardot, 1994], [Bajaj, 1997]. The boosting algorithm has been used to train and integrate different classifiers for the verification of both on-line [Hongo, 2005], [Muramatsu, 2009] and off-line [Wan, 2003] signatures.
In this research, we follow the path of combined biometric systems by investigating the DSmT for combining different HSV systems. Therefore, we study the reliability of the DSmT for achieving a robust multiple HSV system. Two cases are considered for validating the effective use of the DSmT. The first one enhances the performance of off-line HSV systems by associating features based on the Radon and Ridgelet transforms for each individual system. The second one associates the off-line image and dynamic information in order to improve the performance of single-source biometric systems and ensure greater security. In both cases, the combination is performed through the generalized biometric decision combination framework using the Dezert-Smarandache theory (DSmT) [Smarandache, 2004], [Smarandache, 2006a], [Smarandache, 2009].
The chapter is organized as follows. Section 2.2 gives a review of the sophisticated Proportional Conflict Redistribution (PCR5) rule based on the DSmT. Section 2.3 describes the proposed verification system and Section 2.4 presents the description of the datasets and the performance criteria used for evaluating handwritten signature verification. Section 2.5 discusses the experimental results of the proposed verification system. The last section gives a summary of the proposed verification system and looks to future research directions.
2.2 Review of PCR5 combination rule
Generally, signature verification is formulated as a two-class problem where the classes are associated to genuine and impostor signatures, namely \theta_{gen} and \theta_{imp}, respectively. In the context of the probabilistic theory, the frame of discernment, namely \Theta, is composed of two elements, \Theta = \{\theta_{gen}, \theta_{imp}\}, and a mapping function m \in [0, 1] is associated to each class, which defines the corresponding mass verifying m(\emptyset) = 0 and m(\theta_{gen}) + m(\theta_{imp}) = 1. When combining two sources of information, and thus two individual systems, namely the information sources S_1 and S_2, the sum rule is effective for non-conflicting responses [Ross, 2006]. In the opposite case, an alternative approach has been developed by Dezert and Smarandache to deal with (highly) conflicting, imprecise and uncertain sources of
information [Smarandache, 2004], [Smarandache, 2006a], [Smarandache, 2009]. For a two-class problem, a reference domain, also called the frame of discernment, should be defined for performing the combination; it is composed of a finite set of exhaustive and mutually exclusive hypotheses. An example of such approaches is the PCR5 rule. The main concept of the DSmT is to distribute the unitary mass of certainty over all the composite propositions built from the elements of \Theta with the \cup (union) and \cap (intersection) operators instead of making this distribution over the elementary hypotheses only. Therefore, the hyper-powerset D^\Theta is defined as D^\Theta = \{\emptyset, \theta_{gen}, \theta_{imp}, \theta_{gen} \cup \theta_{imp}, \theta_{gen} \cap \theta_{imp}\}. The DSmT uses the generalized basic belief mass, also known as the generalized basic belief assignment (gbba), computed on the hyper-powerset of \Theta and defined by a map m(.) : D^\Theta \rightarrow [0, 1] associated to a given source of evidence, which can support paradoxical information, as follows: m(\emptyset) = 0 and m(\theta_{gen}) + m(\theta_{imp}) + m(\theta_{gen} \cup \theta_{imp}) + m(\theta_{gen} \cap \theta_{imp}) = 1. The combined masses m_{PCR5} obtained from m_1(.) and m_2(.) by means of the PCR5 rule [Smarandache, 2006a] are defined as:

m_{PCR5}(A) = \begin{cases} 0 & \text{if } A \in \Phi, \\ m_{DSmC}(A) + m_{A \cap X}(A) & \text{otherwise,} \end{cases} \qquad (2.1)

where

m_{A \cap X}(A) = \sum_{\substack{X \in D^\Theta \setminus \{A\} \\ c(A \cap X) = \emptyset}} \left[ \frac{m_1(A)^2\, m_2(X)}{m_1(A) + m_2(X)} + \frac{m_2(A)^2\, m_1(X)}{m_2(A) + m_1(X)} \right]

and \Phi = \{\Phi_{\mathcal{M}}, \emptyset\} is the set of all relatively and absolutely empty elements, \Phi_{\mathcal{M}} is the set of all elements of D^\Theta which have been forced to be empty in the Shafer's model \mathcal{M} defined by the exhaustivity and exclusivity constraints, \emptyset is the empty set, and c(A \cap X) is the canonical (conjunctive normal) form of A \cap X; all denominators are assumed to be different from zero, and if a denominator is zero, that fraction is discarded. The term m_{DSmC}(A) represents the conjunctive consensus, also called the DSm Classic (DSmC) combination rule [Smarandache, 2004], which is defined as:

m_{DSmC}(A) = \begin{cases} 0 & \text{if } A = \emptyset, \\ \displaystyle\sum_{\substack{X, Y \in D^\Theta \\ X \cap Y = A}} m_1(X)\, m_2(Y) & \text{otherwise.} \end{cases} \qquad (2.2)
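A small sketch of Equations (2.1)-(2.2) on the two-class frame is given below; Shafer's model is assumed (the intersection θ_gen ∩ θ_imp is forced to be empty and its conjunctive mass is redistributed proportionally), and the input masses are invented for the illustration.

```python
# Minimal sketch of the two-source PCR5 rule (Equations (2.1)-(2.2)) on the
# frame {theta_gen, theta_imp} under Shafer's model; masses are illustrative.
GEN, IMP, IGN = "gen", "imp", "gen_or_imp"

def intersect(a, b):
    """Intersection in D_Theta under Shafer's model; None means empty."""
    if a == b:
        return a
    if IGN in (a, b):
        return a if b == IGN else b
    return None                       # gen ∩ imp is empty under Shafer's model

m1 = {GEN: 0.7, IMP: 0.2, IGN: 0.1}   # bba of source S1
m2 = {GEN: 0.3, IMP: 0.6, IGN: 0.1}   # bba of source S2

# Conjunctive consensus (DSmC, Equation (2.2)) and collection of the conflicts
m_c = {GEN: 0.0, IMP: 0.0, IGN: 0.0}
conflicts = []                        # pairs (X from S1, Y from S2) with X ∩ Y empty
for X, mX in m1.items():
    for Y, mY in m2.items():
        inter = intersect(X, Y)
        if inter is None:
            conflicts.append((X, Y))
        else:
            m_c[inter] += mX * mY

# Proportional conflict redistribution (second term of Equation (2.1))
for X, Y in conflicts:
    m_c[X] += m1[X] ** 2 * m2[Y] / (m1[X] + m2[Y])
    m_c[Y] += m2[Y] ** 2 * m1[X] / (m2[Y] + m1[X])

print({k: round(v, 4) for k, v in m_c.items()}, "sum =", round(sum(m_c.values()), 4))
```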
2.3 System description
The structure of the combined individual HSV systems is depicted in Figure 2.1; it is composed of an off-line verification system, an on-line or off-line verification system and a combination module. s_1 and s_2 denote the off-line and on-line (or off-line) handwritten
signatures provided by the two sources of information S_1 and S_2, respectively. Each individual verification system is generally composed of three modules: pre-processing, feature generation and classification.
2.3.1 Pre-processing
According to the acquisition mode, each handwritten signature is pre-processed to facilitate the feature generation. Hence, the pre-processing of the off-line signature includes two steps: binarization using the local iterative method [Larkins, 2009] and elimination of the useless information around the signature image, without unifying its size. The pre-processing steps for a signature example are shown in Figure 2.2. The binarization method was chosen to separate the signature from the background. It takes advantage of locally adaptive binarization methods [Larkins, 2009] and adapts them to produce an algorithm that thresholds signatures in a more controlled manner. By doing so, the local iterative method limits the amount of noise generated and also attempts to reconstruct sections of the signature that are disjointed.
Figure 2.1: Structure of the combined individual HSV systems.
Figure 2.2: Preprocessing steps: (a) Scanning (b) Binarization (c) Elimination of the useless information.
For the on-line signature, no specific pre-processing is required. More details on the acquisition method and the pre-processing module of the on-line signatures are provided in references [Franke, 2003] and [InkML, 2006].
2.3.2 Feature generation
Features are generated according to the acquisition mode. In the combined individual HSV systems, we use the uniform grid, Radon and Ridgelet transforms for off-line signatures and dynamic characteristics for on-line signatures, respectively.
2.3.2.1 Features used for combining individual off-line HSV systems
The first case study for evaluating the performance of the proposed combination using DSmT is performed with two individual off-line HSV systems. Features are generated from the same off-line signature using the Radon and Ridgelet transforms. The Radon transform is well adapted for detecting linear features, whereas the Ridgelet transform allows representing linear singularities [Candès, 1998]. Therefore, Radon and Ridgelet coefficients provide complementary information about the signature.
Radon transform based features: The Radon transform of each off-line signature is calculated by setting the respective numbers of projection points N_r and orientations N_θ, which define the lengths of the radial and angular vectors, respectively. Hence, a Radon matrix of size N_r × N_θ is obtained, which provides at each point the cumulative intensity of the pixels forming the image of the off-line signature. Figure 2.3 shows an example of a binarized image of an off-line signature and the steps involved in
generating the features based on the Radon transform. Since the Radon transform is redundant, we take into account only the positive radial points (N_r/2 × N_θ). Then, for each angular direction, the energy of the Radon coefficients is computed to form the feature vector x_1 of dimension 1 × N_θ. This energy is defined as:

E_{\theta}^{rad} = \frac{2}{N_r} \sum_{r=1}^{N_r/2} T_{rad}^2(r, \theta), \quad \theta \in \{1, 2, \ldots, N_\theta\} \qquad (2.3)
where T_{rad} is the Radon transform operator.
Figure 2.3: Steps for generating the feature vector from the Radon transform.

Ridgelet transform based features: To generate information complementary to the Radon features, the wavelet transform (WT) is performed along the radial axis, allowing the generation of the Ridgelet coefficients [Mallat, 1989]. Figure 2.4 shows an example of generating the feature vector from the Ridgelet transform. For each angular direction, the energy of the Ridgelet coefficients is computed taking into account only the details issued from the decomposition level L of the WT. The different values of energy are finally stored in a vector x_2 of dimension 1 × N_θ. This energy is defined as:
E_{\theta}^{rid} = \frac{2}{N_r} \sum_{r=1}^{N_r/2} T_{rid}^2(a, b, \theta), \quad \theta \in \{1, 2, \ldots, N_\theta\} \qquad (2.4)
where 𝑇𝑟𝑖𝑑 is the Ridgelet transform operator whereas 𝑎 and 𝑏 are the scaling and translation factors of the WT, respectively.
Figure 2.4: Steps for generating the feature vector from the Ridgelet transform.
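The sketch below illustrates how the Radon energy vector of Equation (2.3), and a Ridgelet-like energy vector in the spirit of Equation (2.4), could be computed from a binarized signature image. It relies on scikit-image and PyWavelets rather than the thesis implementation; the random stand-in image and the simplified handling of the Radon redundancy are assumptions made only for the example, while N_θ = 32 and L = 3 follow the values retained later in this chapter.

```python
# Sketch of Radon/Ridgelet energy features (Equations (2.3)-(2.4)) using
# scikit-image and PyWavelets; the random "signature" image is a placeholder.
import numpy as np
import pywt
from skimage.transform import radon

N_theta, L = 32, 3                                # values used in this chapter
image = (np.random.rand(128, 128) > 0.9).astype(float)   # stand-in binary image

theta = np.linspace(0.0, 180.0, N_theta, endpoint=False)
sinogram = radon(image, theta=theta)              # shape: (N_r, N_theta)
half = sinogram[sinogram.shape[0] // 2:, :]       # keep half of the radial axis
                                                  # (simplified redundancy handling)

# Radon features: energy of the coefficients along each angular direction (2.3)
x1 = 2.0 * np.sum(half ** 2, axis=0) / sinogram.shape[0]

# Ridgelet-like features: 1D Haar wavelet along the radial axis, then energy of
# the level-L detail coefficients for each angular direction (2.4)
x2 = np.empty(N_theta)
for j in range(N_theta):
    coeffs = pywt.wavedec(half[:, j], "haar", level=L)
    details_L = coeffs[1]                         # detail coefficients at level L
    x2[j] = 2.0 * np.sum(details_L ** 2) / sinogram.shape[0]

print(x1.shape, x2.shape)                         # (32,) (32,)
```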
2.3.2.2 Features used for combining individual off-line and on-line HSV systems
The second case study considered is the performance evaluation of the proposed DSmT based combination for combining individual off-line and on-line HSV systems. Features are generated from both the off-line and on-line signatures of the same user using the Uniform Grid (UG) and dynamic characteristics, respectively. The UG allows extracting local features without normalization of the off-line signature image. On each cell of the grid, densities are computed, providing overall information about the signature appearance. In contrast, the dynamic characteristics computed from the on-line signature provide complementary dynamic information in the combination process.
Uniform grid based features: Features are generated using the UG [Abbas, 2011], [Abbas, 2012a], [Abbas, 2012b], which consists in creating n × m rectangular regions for sampling. Each region has the same size and shape. The parameters n and m define the number of lines (vertical regions) and columns (horizontal regions) of the grid, respectively. The feature associated to each region is defined as the ratio between the number of pixels belonging to the signature and the total number of pixels of the region. The different values are finally stored in a vector x_1 of dimension n × m, which characterizes the off-line signature image. Figure 2.5 shows that a 3 × 5 grid allows an important reduction of the representation vector but poorly preserves the visual information, whereas a 15 × 30 grid provides an accurate representation of the image but leads to a larger feature vector. A 5 × 9 grid seems to be an optimal compromise between the quality of representation and the dimensionality. The choice of an optimal grid size for all writers is obviously important for effectively solving our signature verification problem; in our case, for all experiments, the parameters n and m of the grid are fixed to 5 and 9, respectively.
Figure 2.5: Visualization of different grid sizes (3 × 5, 5 × 9 and 15 × 30).
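A small sketch of the uniform grid descriptor follows; it assumes a binarized signature image in which signature pixels are equal to 1, and the 5 × 9 grid matches the setting retained in this chapter (the stand-in image itself is a placeholder).

```python
# Minimal sketch of the uniform grid (UG) descriptor: the image is split into
# n x m regions and each feature is the density of signature pixels in a region.
import numpy as np

def uniform_grid_features(binary_image, n=5, m=9):
    """Return the n*m vector of pixel densities (signature pixels == 1)."""
    features = []
    for row_block in np.array_split(binary_image, n, axis=0):
        for cell in np.array_split(row_block, m, axis=1):
            features.append(cell.mean())          # ratio of signature pixels
    return np.array(features)

# Illustrative binarized signature image (placeholder)
img = (np.random.rand(100, 250) > 0.95).astype(np.uint8)
x1 = uniform_grid_features(img)                   # dimension 5 * 9 = 45
print(x1.shape)
```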
Dynamic information based features: For the individual on-line verification system, features are generated using only the dynamic characteristics. Each on-line signature is represented by a vector x_2 composed of 11 features: signature total duration, average velocity, vertical average velocity, horizontal average velocity, maximal velocity, average acceleration, maximal acceleration, variance of pressure, mean of azimuth angle, variance of azimuth angle and mean of elevation angle. A complete description of the feature set is given in Table 2.1.

Ranking | Feature description
1  | t_n - t_1  (signature total duration)
2  | \sum_{i=1}^{n-1} dist_{Eucl}(P_{t_i}, P_{t_{i+1}}) / (t_n - t_1)  (average velocity)
3  | \sum_{i=1}^{n-1} |y_{i+1} - y_i| / (t_n - t_1)  (vertical average velocity)
4  | \sum_{i=1}^{n-1} |x_{i+1} - x_i| / (t_n - t_1)  (horizontal average velocity)
5  | \max_{i=1,\ldots,n-1} dist_{Eucl}(P_{t_i}, P_{t_{i+1}}) / (t_{i+1} - t_i)  (maximal velocity)
6  | \sum_{i=1}^{n-1} dist_{Eucl}(P_{t_i}, P_{t_{i+1}}) / (t_n - t_1)^2  (average acceleration)
7  | \max_{i=1,\ldots,n-2} dist_{Eucl}(P_{t_i}, P_{t_{i+2}}) / (t_{i+1} - t_i)^2  (maximal acceleration)
8  | \sum_{i=1}^{n} (Pr_i - \frac{1}{n}\sum_{i=1}^{n} Pr_i)^2 / n  (variance of pressure)
9  | \sum_{i=1}^{n} Az_i / n  (mean of azimuth angle)
10 | \sum_{i=1}^{n} (Az_i - \frac{1}{n}\sum_{i=1}^{n} Az_i)^2 / n  (variance of azimuth angle)
11 | \sum_{i=1}^{n} Al_i / n  (mean of elevation angle)

Table 2.1: Set of dynamic characteristics. s = \{P_{t_1}, P_{t_2}, \ldots, P_{t_n}\} denotes an on-line signature composed of n events P_{t_i}(x_i, y_i, t_i); x_i, y_i, Pr_i, Az_i and Al_i denote the x-position, y-position, pen pressure, azimuth angle and elevation angle of the pen at the i-th time instant t_i, respectively.
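As an illustration of Table 2.1, the following sketch computes a few of these dynamic characteristics (total duration, average and vertical velocities, maximal velocity and pressure variance) from an on-line signature given as arrays of events; the signal values are invented for the example.

```python
# Minimal sketch of some dynamic characteristics of Table 2.1 computed from an
# on-line signature; the arrays below are illustrative stand-ins for real data.
import numpy as np

# One event per sample: positions, timestamps and pen pressure
x  = np.array([0.0, 1.0, 2.5, 3.0])
y  = np.array([0.0, 0.5, 0.4, 1.2])
t  = np.array([0.00, 0.01, 0.02, 0.03])
pr = np.array([0.2, 0.6, 0.8, 0.5])

total_duration = t[-1] - t[0]                                   # feature 1

step = np.hypot(np.diff(x), np.diff(y))                         # Euclidean distances
average_velocity = step.sum() / total_duration                  # feature 2
vertical_velocity = np.abs(np.diff(y)).sum() / total_duration   # feature 3
max_velocity = np.max(step / np.diff(t))                        # feature 5
pressure_variance = np.mean((pr - pr.mean()) ** 2)              # feature 8

print(total_duration, average_velocity, vertical_velocity, max_velocity, pressure_variance)
```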
2.3.3 Classification based on SVM
2.3.3.1 Review of SVMs
Classification based on Support Vector Machines (SVMs) has been widely used in many pattern recognition applications such as handwritten signature verification [Mottl, 2008], [Justino, 2005]. The SVM is a learning method introduced by Vapnik et al. [Vapnik, 1995], which tries to find an optimal hyperplane separating two classes. Its concept is based on the maximization of the distance between the closest points belonging to each of the two classes. Therefore, the misclassification error of the data in both the training set and the test set is minimized. Basically, SVMs have been defined for separating two classes linearly. When the data are not linearly separable, a kernel function is used. Every mathematical function satisfying Mercer's conditions is eligible to be an SVM kernel [Vapnik, 1995]. Examples of such kernels are the sigmoid kernel, the polynomial kernel and the Radial Basis Function (RBF) kernel. Generally, the RBF kernel is used for its better performance, and it is defined as:
K(x, x_k) = \exp\left(-\frac{\|x - x_k\|^2}{2\sigma^2}\right) \qquad (2.5)

where \sigma is the kernel parameter and \|x - x_k\| is the Euclidean distance between two samples. The decision function f: \mathbb{R}^p \rightarrow \{-1, +1\} is expressed in terms of a kernel expansion as:

f(x) = \sum_{k=1}^{S_v} \alpha_k y_k K(x, x_k) + b \qquad (2.6)
where \alpha_k are the Lagrange multipliers, S_v is the number of support vectors x_k, which are training data such that 0 \leq \alpha_k \leq C, C is a user-defined parameter that controls the trade-off between the machine complexity and the number of nonseparable points [Huang, 2002], and the bias b is a scalar computed using any support vector. Finally, the test data x_d (d = 1, 2) are classified according to:

x_d \in \begin{cases} \text{class } +1 & \text{if } f(x_d) > 0 \\ \text{class } -1 & \text{otherwise} \end{cases} \qquad (2.7)
2.3.3.2 Decision rule
The direct use of SVMs does not allow defining a decision threshold to assign a signature to the genuine or forgery class. Therefore, the SVM outputs are transformed into objective evidences, which express the membership degree (MD) of a signature to both classes (genuine or forgery). In practice, the MD has no standard form; the only constraint is that it must be limited to the range [0, 1], whereas an SVM produces a single output. In this chapter, we use a fuzzy model proposed in [Nemmour, 2006], [Abbas, 2011], [Abbas, 2012b], [Abbas, 2012c] to assign an MD to the SVM output for both the genuine and impostor classes. Let f(x_d) be the output of an SVM obtained for a given signature to be classified. The respective membership degrees h_d(\theta_i), i \in \{gen, imp\}, associated to the genuine and impostor classes are defined according to the membership models given in Algorithm 1 [Abbas, 2012b]. To compute the values of the membership degrees h_d, d = 1, 2, we consider the two case studies as follows:
- In the first case study, the main problem for generating features is the choice of the appropriate number of angular directions N_θ for the Radon transform and of the decomposition level L of the WT (Haar wavelet) in the Ridgelet domain. Many experiments are conducted to find the optimal values for which the error rate in the training phase is null. In this case, the feature vectors are generated from the Radon (d = 1) and Ridgelet (d = 2) transforms of the same off-line signature by setting N_θ and L to 32 and 3, respectively.
- In the second case study, we calculate the values h_d, d = 1, of the off-line signature by using the optimal 5 × 9 grid size, for which the error rate in the training phase is null. In the same way, we calculate the values h_d, d = 2, of the on-line signature by using the vector of 11 dynamic features, for which the error rate in the training phase is null.
Algorithm 1. h_d(\theta_i), i \in \{gen, imp\}: respective membership models for the two classes.
if f(x_d) > 1 then
    h_d(\theta_{gen}) ← 1;  h_d(\theta_{imp}) ← 0
else if f(x_d) < −1 then
    h_d(\theta_{gen}) ← 0;  h_d(\theta_{imp}) ← 1
else
    h_d(\theta_{gen}) ← (1 + f(x_d)) / 2
    h_d(\theta_{imp}) ← (1 − f(x_d)) / 2
end if
Hence, a decision rule determining whether the signature is genuine or forged is applied, as described in Algorithm 2 [Abbas, 2011], [Abbas, 2012b], [Abbas, 2012c].

Algorithm 2. Decision making in the SVM framework.
if h_d(\theta_{gen}) / h_d(\theta_{imp}) ≥ t then
    s_d ∈ \theta_{gen}
else
    s_d ∈ \theta_{imp}
end if

where t defines a decision threshold.
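A compact sketch of Algorithms 1 and 2 is given below; the SVM output values and the threshold t = 1 are placeholders chosen only for the illustration.

```python
# Minimal sketch of Algorithms 1 and 2: fuzzy membership degrees derived from a
# single SVM output f(x_d), followed by the threshold-based decision.
def membership_degrees(f_x):
    """Return (h(theta_gen), h(theta_imp)) for an SVM output f(x_d)."""
    if f_x > 1:
        return 1.0, 0.0
    if f_x < -1:
        return 0.0, 1.0
    return (1 + f_x) / 2, (1 - f_x) / 2

def decide(f_x, t=1.0):
    """Accept (genuine) if h(theta_gen)/h(theta_imp) >= t, otherwise reject."""
    h_gen, h_imp = membership_degrees(f_x)
    return "genuine" if h_imp == 0 or h_gen / h_imp >= t else "impostor"

print(decide(0.4))    # illustrative SVM output inside the margin -> genuine
print(decide(-1.3))   # illustrative SVM output -> impostor
```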
2.3.4 Classification based on DSmT
The proposed combination module consists of three steps: i) transform the membership degrees of the SVM outputs into belief assignments using an estimation technique based on the dissonant model of Appriou, ii) combine the masses through a DSmT based combination rule, and iii) make a decision to accept or reject the signature.
2.3.4.1 Estimation of masses
In this chapter, the mass functions are estimated using the dissonant model of Appriou, which is defined for two classes [Appriou, 1991]. The extended version of Appriou's model in the DSmT framework is given as:

m_i^d(\emptyset) = 0 \qquad (2.8)

m_i^d(\theta_i) = \frac{(1 - \beta_i^d)\, h_d(\theta_i)}{1 + h_d(\theta_i)} \qquad (2.9)

m_i^d(\bar{\theta}_i) = \frac{1 - \beta_i^d}{1 + h_d(\theta_i)} \qquad (2.10)

m_i^d(\theta_i \cup \bar{\theta}_i) = \beta_i^d \qquad (2.11)

m_i^d(\theta_i \cap \bar{\theta}_i) = 0 \qquad (2.12)
where i \in \{gen, imp\}, h_d(\theta_i) is the membership degree of a signature provided by the corresponding source S_d (d = 1, 2), 1 - \beta_i^d is a confidence factor of the i-th class, and \beta_i^d defines the error provided by each source (d = 1, 2) for each class \theta_i. In our approach, we consider \beta_i^d as the verification accuracy prior computed on the training database for each class [Vatsa, 2009b]. Since both SVM models have been validated on the basis that the errors during the training phase are zero, \beta_i^d is fixed to 0.001 in the estimation model. Note that the same information source cannot provide two responses simultaneously. Hence, in the DSmT framework, we consider that the paradoxical hypothesis \theta_i \cap \bar{\theta}_i has no physical sense with respect to the two hypotheses \theta_{gen} and \theta_{imp}. Therefore, the beliefs assigned to this hypothesis are null, as given in Equation (2.12).
2.3.4.2 Combination of masses
The combined masses are computed in two steps. First, the belief assignments m_i^d(.), i \in \{gen, imp\},
are combined for generating the belief assignments for each source as follows: 𝑚1 = 𝑚 𝑔𝑒𝑛 𝑚2 = 𝑚 𝑔𝑒𝑛
1
⊕ 𝑚 𝑖𝑚𝑝
2
⊕ 𝑚 𝑖𝑚𝑝
1 2
(2.13) (2.14)
where ⊕ represents the conjunctive consensus of the DSmC rule. Finally, the belief assignments for the combined sources 𝑚𝑑 . , 𝑑 = 1, 2 are then computed as: 𝑚𝑐 = 𝑚1 ⊕ 𝑚2 40
(2.15)
where $\oplus$ represents the combination operator, which is composed of both the conjunctive and the redistribution terms of the PCR5 rule.

2.3.4.3 Decision rule

A decision for accepting or rejecting a signature is made using a statistical classification technique. First, the combined beliefs are converted into a probability measure using a probabilistic transformation, called the Dezert-Smarandache probability (DSmP), that maps a belief measure to a subjective probability measure [Smarandache, 2009], defined as:

$$DSmP_\epsilon(\theta_i) = m_c(\theta_i) + \big(m_c(\theta_i) + \epsilon\big)\, w_{\mathcal{M}} \quad (2.16)$$

where $w_{\mathcal{M}}$ is a weighting factor defined as:

$$w_{\mathcal{M}} = \sum_{\substack{A_j \in 2^\Theta,\ A_j \supset \theta_i \\ C_{\mathcal{M}}(A_j) \geq 2}} \frac{m_c(A_j)}{\displaystyle\sum_{\substack{A_k \in 2^\Theta,\ A_k \subset A_j \\ C_{\mathcal{M}}(A_k) = 1}} m_c(A_k) + \epsilon\, C_{\mathcal{M}}(A_j)}$$

such that $i \in \{gen, imp\}$, $\epsilon \geq 0$ is a tuning parameter, $\mathcal{M}$ is Shafer's model for $\Theta$, and $C_{\mathcal{M}}(A_k)$ denotes the DSm cardinal [Smarandache, 2009] of the set $A_k$. Then, a likelihood ratio test is performed for decision making, as described in Algorithm 3.

Algorithm 3. Decision making in the DSmT framework.

if $\dfrac{DSmP_\epsilon(\theta_{gen})}{DSmP_\epsilon(\theta_{imp})} \geq t$ then
    $s \in \theta_{gen}$
else
    $s \in \theta_{imp}$
end if

where $t$ defines a decision threshold and $s = (s_1, s_2)$ is the signature represented by two sources, according to the case study: in the first case study, $s$ is an off-line signature characterized by both Radon and Ridgelet features; in the second case study, $s$ is a signature represented by both the off-line and on-line modalities.
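To make the whole chain concrete, the following is a minimal Python sketch of the combination for the two-class frame $\{\theta_{gen}, \theta_{imp}\}$ under the constraint $\theta_{gen} \cap \theta_{imp} = \emptyset$: an Appriou-style mass estimation, the PCR5 combination of the two sources, the DSmP transformation specialised to two singletons, and the threshold test of Algorithm 3. The function names, the dictionary representation of the belief assignments, and the construction of a single belief assignment per source (instead of the intermediate combination of Equations (2.13)-(2.14)) are simplifying assumptions of the example.

GEN, IMP, IGN = "gen", "imp", "gen_or_imp"   # theta_gen, theta_imp, theta_gen U theta_imp

def appriou_bba(h_gen, h_imp, beta=0.001):
    """Simplified per-source belief assignment built from the two membership
    degrees with the dissonant model of Eqs. (2.8)-(2.11); the remaining
    mass is placed on the ignorance theta_gen U theta_imp."""
    m = {GEN: (1 - beta) * h_gen / (1 + h_gen),
         IMP: (1 - beta) * h_imp / (1 + h_imp)}
    m[IGN] = 1.0 - m[GEN] - m[IMP]
    return m

def pcr5_combine(m1, m2):
    """PCR5 combination of two bbas on {gen, imp, gen U imp} under the
    Shafer constraint gen ^ imp = empty."""
    mc = {GEN: m1[GEN]*m2[GEN] + m1[GEN]*m2[IGN] + m1[IGN]*m2[GEN],
          IMP: m1[IMP]*m2[IMP] + m1[IMP]*m2[IGN] + m1[IGN]*m2[IMP],
          IGN: m1[IGN]*m2[IGN]}
    # Each partial conflict m1(X)m2(Y), X ^ Y = empty, is redistributed
    # back to X and Y proportionally to m1(X) and m2(Y).
    for x, y in ((GEN, IMP), (IMP, GEN)):
        k = m1[x] * m2[y]
        if k > 0:
            mc[x] += m1[x] * k / (m1[x] + m2[y])
            mc[y] += m2[y] * k / (m1[x] + m2[y])
    return mc

def dsmp(m, eps=1e-3):
    """DSmP transformation (Eq. 2.16) specialised to two singletons."""
    denom = m[GEN] + m[IMP] + 2 * eps
    return {GEN: m[GEN] + (m[GEN] + eps) * m[IGN] / denom,
            IMP: m[IMP] + (m[IMP] + eps) * m[IGN] / denom}

def verify(h1, h2, t=1.0):
    """h1, h2: (h_gen, h_imp) pairs produced by the two individual HSV
    systems for the same questioned signature (Algorithm 1)."""
    p = dsmp(pcr5_combine(appriou_bba(*h1), appriou_bba(*h2)))
    return "genuine" if p[GEN] >= t * p[IMP] else "forgery"   # Algorithm 3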
2.4 Description of datasets and performance criteria

In this section, we briefly describe the datasets and the performance criteria used for evaluating the proposed DSmT based combination of individual HSV systems.

2.4.1 Description of datasets

To evaluate the verification performance of the proposed DSmT based combination of individual HSV systems, we use two datasets of handwritten signatures: (1) the CEDAR signature dataset [Kalera, 2004], used for evaluating the combination of individual off-line HSV systems, and (2) the NISDCC signature dataset [Van, 2009], used for the experiments related to the simultaneous verification by individual off-line and on-line HSV systems.

2.4.1.1 CEDAR signature database

The Center of Excellence for Document Analysis and Recognition (CEDAR) signature dataset [Kalera, 2004] is commonly used for off-line signature verification. It consists of 55 signature sets, each corresponding to one writer. Each writer provided 24 samples of their signature, which constitute the genuine portion of the dataset. Forgeries were obtained by asking arbitrary people to skillfully forge the signatures of these writers; in this fashion, 24 forgery samples were collected per writer from about 20 skilled forgers. In total, the dataset contains 2640 signatures: 1320 genuine signatures and 1320 skilled forgeries. Figures 2.6(a) and 2.6(b) show examples of preprocessed genuine and forgery signatures for one writer, respectively.
Figure 2.6: Signature samples of the CEDAR database: (a) genuine signatures; (b) forgery signatures.
2.4.1.2 NISDCC signature database

The Norwegian Information Security laboratory and Donders Centre for Cognition (NISDCC) signature dataset was used in the ICDAR'09 signature verification competition [Van, 2009]. This collection contains simultaneously acquired on-line and off-line samples. The off-line dataset, called "NISDCC-offline", contains only static
information, while the on-line dataset, called "NISDCC-online", also contains dynamic information, which refers to the recorded temporal movement of the handwriting process. The acquired on-line signature is thus available as a sequence of sampled trajectory points. Each point is acquired at 200 Hz on a tablet and contains five recorded pen-tip measurements: x-position, y-position, pen pressure, and the azimuth and elevation angles of the pen [Arsforensica, 2009]. The NISDCC-offline dataset is composed of 1920 images from 12 authentic writers (5 authentic signatures per writer) and 31 forging writers (5 forgeries per authentic signature). Figures 2.7(a) and 2.7(b) show an example of a preprocessed off-line signature and the plot of the matching on-line signature for one writer, respectively.
Figure 2.7: Signature samples of the NISDCC signature collection: (a) off-line signature; (b) on-line signature.

2.4.2 Performance criteria

For evaluating the performances of the combined individual HSV systems, three kinds of error are considered: the False Acceptance Rate (FAR), which takes into account only skilled forgeries; the False Rejection Rate (FRR), which takes into account only genuine signatures; and the Half Total Error Rate (HTER = (FAR + FRR)/2), which takes both rates into account. The Equal Error Rate (EER) is the special case of the HTER obtained when FRR = FAR.
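As a small illustration, the three rates can be computed from lists of decision scores as in the following sketch; the score here stands for whatever quantity is compared to the threshold (e.g. the ratio of Algorithm 3), and the function name is an assumption of the example.

def error_rates(genuine_scores, forgery_scores, t):
    """FRR, FAR and HTER (in %) at threshold t; a signature is accepted
    when its score is >= t."""
    frr = 100.0 * sum(s < t for s in genuine_scores) / len(genuine_scores)
    far = 100.0 * sum(s >= t for s in forgery_scores) / len(forgery_scores)
    hter = (frr + far) / 2.0          # HTER is the mean of FRR and FAR
    return frr, far, hter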
2.4.3 SVM model

For both case studies, the signature data are split into training and testing sets for evaluating the performances of the proposed DSmT based combination of individual HSV systems. The training phase allows finding the optimal hyperparameters of each individual SVM model. In our system, the RBF kernel is selected for the experiments.

2.4.3.1 SVM models used for combined individual off-line HSV systems

In the first case study, an SVM model is produced for each individual off-line HSV system according to the Radon and Ridgelet features, respectively. For each writer, 2/3 and 1/3 of the samples
are used for training and testing, respectively. The optimal parameters $(C, \sigma)$ of each SVM are tuned experimentally and fixed as $(C = 19.1, \sigma = 4)$ and $(C = 15.1, \sigma = 4.6)$, respectively.

2.4.3.2 SVM models used for combined individual off-line and on-line HSV systems

In the second case study, an SVM model is produced for each of the individual off-line and on-line HSV systems according to the uniform grid features and the dynamic information, respectively. For each writer and both datasets, 2/3 and 1/3 of the samples are used for training and testing, respectively. The optimal parameters $(C, \sigma)$ of the two SVM classifiers (off-line and on-line) are tuned experimentally and fixed as $(C = 9.1, \sigma = 9.4)$ and $(C = 13.1, \sigma = 2.2)$, respectively.

2.5 Experimental results and discussion

For each case study, decision making is only done on the simple classes. Hence, we consider the masses associated to all classes belonging to the hyper-power set $D^\Theta = \{\emptyset, \theta_{gen}, \theta_{imp}, \theta_{gen} \cup \theta_{imp}, \theta_{gen} \cap \theta_{imp}\}$ in both the combination process and the decision making. In the context of signature verification, we take as a constraint the proposition $\theta_{gen} \cap \theta_{imp} = \emptyset$ in order to separate the genuine and impostor classes. Therefore, the hyper-power set $D^\Theta$ is simplified to the power set $2^\Theta = \{\emptyset, \theta_{gen}, \theta_{imp}, \theta_{gen} \cup \theta_{imp}\}$, which defines Shafer's model [Smarandache, 2004].

This section presents the experimental results and their discussion. To evaluate the performance of the proposed DSmT based combination, the first case study uses two individual off-line HSV systems on the CEDAR database. The task of the proposed combination module is to manage the conflicts generated between the two individual off-line HSV systems for each signature using the PCR5 combination rule. For that, we compute the verification errors of both individual off-line HSV systems and of their combination using the PCR5 rule. Figure 2.8 shows the FRR and FAR computed for different values of the decision threshold using the two individual off-line HSV systems of this first case study. Table 2.2 shows the verification error rates computed for the corresponding optimal values of the decision threshold. Here, HSV system 1 is the individual off-line verification system fed by Radon features, which yields an error rate of 7.72% at the optimal threshold value $t = 1.11$, while HSV system 2 is the individual off-line verification system fed by Ridgelet features, which provides the same
result with an optimal threshold value of $t = 0.991$. Consequently, both individual off-line HSV systems give the same verification performance, since the corresponding error rate HTER = 7.72% is identical. The proposed DSmT based combination of the individual off-line HSV systems yields an HTER of 5.45% at the optimal threshold value $t = 0.986$. Hence, combining the individual off-line HSV systems with the PCR5 rule improves the verification performance by 2.27%. This is due to the efficient redistribution of the partial conflicting mass only to the elements involved in the partial conflict.
Figure 2.8: Performance evaluation of the individual off-line HSV systems: (a) off-line HSV system 1; (b) off-line HSV system 2.
HSV Systems            Optimal Threshold    FAR     FRR     HTER
System 1 (Radon)       1.110                7.72    7.72    7.72
System 2 (Ridgelet)    0.991                7.72    7.72    7.72
Combined Systems       0.986                5.45    5.45    5.45
Table 2.2: Error rates (%) obtained for the individual and combined HSV systems.

In the second case study, two sources of information are combined through the PCR5 rule. Figure 2.9 shows three examples of the conflict measured between off-line and on-line signatures for writers 3, 7, and 10 of the NISDCC dataset, respectively. The values $K_c^3 \in [0.00, 0.35]$, $K_c^7 \in [0.00, 0.64]$, and $K_c^{10} \in [0.00, 1.00]$ represent the mass assigned to the empty set after combination. We can see that the two sources of information are highly conflicting. Hence, the task of the proposed combination module is to manage the conflicts $K_c^w$, $w = 1, 2, \ldots, 12$, generated from both sources for each signature using the PCR5 combination rule. For that, we compute the verification errors of both individual off-line and on-line HSV systems and of the proposed DSmT based combination. Figure 2.10 shows the FRR and FAR computed for
different values of the decision threshold using the two individual off-line and on-line HSV systems of this second case study. For comparison, Table 2.3 shows the HTER computed for the corresponding optimal values of the decision threshold. The proposed DSmT based combination of the individual off-line and on-line HSV systems yields an HTER of 0% at the optimal threshold value $t = 0.597$. Consequently, the proposed combination of the individual off-line and on-line HSV systems using the PCR5 rule yields the best verification accuracy compared to the individual off-line and on-line HSV systems, which provide conflicting and complementary outputs.
Figure 2.9: Conflict between off-line and on-line signatures for the writers 3, 7, and 10, respectively.
Figure 2.10: Performance evaluation of the individual off-line and on-line HSV systems: (a) off-line HSV system 1; (b) on-line HSV system 2.
HSV Systems             Optimal Threshold    FAR      FRR      HTER
System 1 (Off-line)     0.012                12.44    12.50    12.47
System 2 (On-line)      0.195                0.98     0.00     0.49
Combined Systems        0.597                0.00     0.00     0.00

Table 2.3: Error rates (%) obtained for the individual and combined HSV systems.
2.6 Conclusion

This chapter proposed a new DSmT based system for combining individual HSV systems that provide conflicting results. The individual HSV systems are combined through DSmT using an estimation technique based on the dissonant model of Appriou, the PCR5 rule, and a likelihood ratio test. Two cases have been addressed in order to ensure greater security: (1) combining two individual off-line HSV systems by associating Radon and Ridgelet features of the same off-line signature, and (2) combining individual off-line and on-line HSV systems by associating the static image and the dynamic information of the same signature characterized by the off-line and on-line modalities. Experimental results show, for both case studies, that the proposed system using the PCR5 rule reduces the verification errors compared to the individual HSV systems. Although DSmT improves the verification accuracy in both studied cases, it is clear that the achieved improvement also depends on the complementarity of the outputs provided by the individual HSV systems. Indeed, according to the second case study, a suitable performance of the individual on-line HSV system can be obtained when the dynamic features of the on-line signatures are carefully chosen; combining them with the grid features through DSmT then provides a more powerful system than that of the first case study in terms of success ratio. As a continuation of the present work, the next objective is to explore other alternative DSmT based combinations of HSV systems in order to improve the performance of writer-independent HSV, in terms of both false rejection and false acceptance, whether the questioned signature is genuine or forged.
Chapter 3
A DSmT Based System for Writer-Independent Handwritten Signature Verification
Abstract: We propose in this chapter a new writer-independent off-line handwritten signature verification (HSV) system using only genuine signatures. This system is based on the combination of two individual off-line HSV systems through the plausible and paradoxical reasoning theory of Dezert and Smarandache (DSmT). First, we evaluate the performances of both off-line HSV systems using one-class SVM classifiers (OC-SVM) that operate independently of each other and are associated to DCT and Curvelet transform based descriptors. To improve system performance, the outputs of both individual HSV systems are combined in the DSmT framework, where a new decision making criterion is proposed. Experimental results conducted on the well-known CEDAR database show the effective use of the proposed DSmT based combination for improving the verification accuracy compared to the individual systems.
3.1 Introduction

The handwritten signature is one of the oldest behavioral biometric modalities employed for the authentication of an individual or a document. Despite the technological advances of the modern digital era, the signature remains one of the popular means for the authentication of official documents such as bank checks, credit card transactions, certificates, contracts and bonds; hence, its use remains highly relevant for identity verification systems. The main objective of a handwritten signature verification (HSV) system is to verify the identity of an individual based on the analysis of his or her signature, employing the unique personal characteristics of the writing [Plamondon, 2000], [Impedovo, 2008]. Signatures are a special case of handwriting in which special characters and flourishes occur, and therefore they are often unreadable. Moreover, intrapersonal variations and interpersonal differences make it necessary to
analyze them as complete images (or as sequences of sampled trajectory points including the signature's shape and the dynamic information issued from the ballistic movements of the signer) and not as letters and words put together [Abbas, 2012a]. Furthermore, signatures are subject to three types of forgery: random, simple and skilled. Random forgeries are produced without any knowledge of the signer's name or of the signature's shape. Simple forgeries are produced knowing the name of the signer but without having an example of the signer's signature. Random and simple forgeries are therefore easily identified. Skilled forgeries are produced by people looking at an original instance of the signature and attempting to imitate it as closely as possible. A skilled forgery is thus very similar to the original, making it much more difficult to detect. Hence, two kinds of problems are faced in such an HSV system: on the one hand, the rejection of a genuine signature, which is considered a verification failure and is referred to as a type I error (false rejection); on the other hand, the system should cope with a more challenging problem, namely avoiding the acceptance of forgeries as authentic, which is referred to as a type II error (false acceptance). Depending on the mode of signature acquisition, the HSV problem can be categorized into on-line and off-line verification [Plamondon, 2000], [Abbas, 2012b]. In general, on-line HSV systems achieve better performance since they deal with dynamic features such as time, speed, pressure and the order of strokes, which can easily be generated from a signature acquired through on-line devices [Nalwa, 1997]. Off-line systems, on the other hand, rely only on static features generated from signature images [Abbas, 2012a], [Hanmandlua, 2005]. Although an efficient off-line HSV system is comparatively difficult to design, as it cannot extract many desirable characteristics such as the order of strokes, velocity, and other dynamic information, its wide application in the areas of forensics and biometrics has made it an active research area. The off-line HSV problem is investigated through two different approaches, namely writer-dependent and writer-independent [Srihari, 2004], [Bertolini, 2010]. The first is the most commonly used for HSV; a specific model is built for each writer. In this approach, some samples of a given writer are used to model the genuine class and some samples of other writers, chosen randomly, are used to model the forgery class. Hence, the system is trained either only with genuine or with both genuine and forged signature samples (depending on the model: one-class or two-class) of a particular writer [Guerbai, 2012]. In the testing phase, the questioned input signature of a writer is compared to one or more reference signatures, and the system has to make a decision on the (dis)similarity between the signatures of that particular writer using its own model. The most important disadvantage of a writer-dependent
approach is the need to learn a model each time a new writer is included in the system [Oliveira, 2007], [Bertolini, 2010]. In practice, usually a limited number of signatures per writer is available to construct a reliable model, which, for large numbers of users, degrades the performance of HSV systems. To overcome this problem, mainly related to the limited number of available signatures per writer, Huang and Yan [Huang, 1997b] generate more data through transformations of the genuine signatures. An alternative to the writer-dependent approach is the writer-independent one, which yields a generic and more economical HSV system that can be tested on any writer. In the writer-independent case, the system is trained only once with genuine and forged signature samples of a number of writers, and a general model is built. For testing, one or more reference signatures of any arbitrary writer can be used; by comparing against them, the system concludes whether a questioned signature belongs to this particular writer or not. In this research, we build on the DSmT based combination framework initially proposed in [Abbas, 2011], [Abbas, 2012c], [Abbas, 2012b]. The novelty of this work is to propose an effective combination scheme of one-class SVM (OC-SVM) classifiers in a general belief function framework by incorporating an intelligent learning technique for writer-independent HSV. Hence, the contribution of this research is twofold. First, we introduce a new intelligent learning technique which allows us to build a unique model, while reducing the pattern recognition problem to a two-class problem, by introducing the concept of (dis)similarity representation [Pekalska, 2002] using only genuine signatures. This makes it possible to build robust individual HSV systems even when few signatures per writer are available. In this vein, we first evaluate the performances of two writer-independent off-line HSV systems using OC-SVM classifiers that operate independently of each other and are associated to DCT and Curvelet transform based descriptors. Second, for a given test signature during verification, both OC-SVM classifiers are considered using a static selection strategy, where a single ensemble of OC-SVM classifiers is selected before operation and applied to all input samples; all the corresponding outputs of this ensemble then provide the degrees of imprecision for the verification task. We transform these outputs into generalized basic belief assignments (gbba) using an inspired version of Appriou's model. To improve the performance of the proposed system, the gbba issued from both OC-SVM classifiers are combined through an effective combination scheme within the DSmT framework, where a new decision making criterion has been
implemented, while significantly managing the conflict arising from the corresponding individual HSV systems.

The chapter is organized as follows. In Section 3.2, we briefly recall the first works dealing with the concept of similarity/dissimilarity representation, and we then review some recent research based on classifier combination for writer-independent off-line signature verification. Section 3.3 describes the proposed verification system, and Section 3.4 presents the experimental protocol, the validation approach of the OC-SVM models, the performance criteria used for evaluation, and a discussion of the experimental verification results of the proposed system. The last section gives a summary of the proposed verification system and looks to future research directions.

3.2 Related works

Pekalska and Duin [Pekalska, 2002] showed that there is an appropriate representation based on similarity or dissimilarity relations between objects which allows building good classifiers even when the training set is small. Such a classifier is constructed from a training set represented by the dissimilarities to a set of prototypes, called the representation set. If this set is small, only a small set of dissimilarities has to be computed for evaluation, while the classifier may still profit from the accuracy offered by a large training set. Indeed, the authors demonstrate in [Pekalska, 2002] that the tradeoff between recognition accuracy and computational effort is significantly improved by using a normal density-based classifier built on dissimilarities instead of the Nearest Neighbor (NN) method [Cover, 1967], which is traditionally applied to dissimilarity representations. In the same vein, seminal work using the concept of similarity/dissimilarity representation in the field of author identification was presented by Cha and Srihari [Cha, 2002]. Santos et al. [Santos, 2004] use the idea of dissimilarity representation [Pekalska, 2002] for robust off-line HSV through a global method based on the questioned-document expert's approach and a Multilayer Perceptron (MLP) classifier. Two main advantages emerge from this method. The first is its potential to reduce the number of genuine signature samples required for both the training and validation phases. The second is the model's ability to absorb new writers without generating new personal models. Nevertheless, few works have recently focused on classifier combination for dealing with writer-independent off-line HSV. Oliveira et al. [Oliveira, 2007] take into account the framework initially proposed by Santos et al. [Santos, 2004] for improving the
performance of a writer-independent off-line HSV system. Two contributions are proposed in this work for designing the system. First, the authors analyze the impact of choosing different fusion strategies to combine the partial decisions provided by the SVM classifiers; they find that the Max rule is more effective than the original voting proposed in [Santos, 2004]. Then, the Receiver Operating Characteristic (ROC) curves produced by the different classifiers are combined using maximum likelihood analysis, producing an ROC combined classifier. Indeed, in HSV the class distribution among examples is rarely constant and often unbalanced, so the use of the ROC curve instead of the recognition rate as an evaluation metric is strongly recommended; if the proportion of positive to negative instances changes in a test set, the ROC curves will not change [Fawcett, 2006]. Bertolini et al. [Bertolini, 2010] resume the in-depth investigation of the writer-independent off-line HSV problem, already studied in [Oliveira, 2007], by reducing forgeries through an ensemble of classifiers. An important aspect of this work is the use of an ensemble of SVM classifiers based on graphometric features, trained using just genuine samples and random forgeries, to improve the resistance of the HSV system against forgeries. The ensemble is built using a standard genetic algorithm, and different fitness functions are used to drive the search. It has been demonstrated in [Bertolini, 2010] that if simple and simulated forgeries are available for some writers who did not participate in the training, these samples can be used in the validation set to fine-tune the system and select the best ensemble of classifiers. In [Berkay, 2011], two different learning approaches, namely global and user-dependent SVMs, are investigated for performing the verification. The global SVM classifiers, which are user-independent, are combined at the score level with user-dependent SVM classifiers through a weighted sum rule in order to improve the overall verification accuracy. Later, a hybrid generative-discriminative ensembles of classifiers (EoCs) approach was proposed in [Batista, 2012] for addressing the challenge of designing off-line HSV systems from a limited amount of genuine signature samples, where the classifier selection process is performed dynamically. During verification, a dynamic selection strategy, based on the K-nearest-oracles (KNORA) algorithm [Ko, 2008] and on output profiles [Cavalin, 2010], selects the most accurate subset of classifiers to form an EoC in order to classify a given input signature. However, the design of a robust writer-independent off-line HSV system, through an effective DSmT based combination scheme of OC-SVM classifiers using only genuine signatures, is a research challenge that still needs to be addressed.
3.3 System description

The structure of the combined individual systems for writer-independent HSV is depicted in Figure 3.1. It is composed of two individual off-line HSV systems and a DSmT based module that combines the outputs of the corresponding individual systems. Each individual HSV system is composed of three modules: pre-processing, feature generation for constructing the descriptor, and classification.
Figure 3.1: Structure of the combined individual systems for writer-independent HSV.

3.3.1 Pre-processing

Any image-processing application suffers from noise such as touching line segments, isolated pixels and smeared images. Hence, pre-processing is one of the crucial stages in solving any document analysis problem. In our case, pre-processing is only performed on the signature images to which the Curvelet transform (CT) based feature generation method is applied; the signature images submitted to the DCT based feature generation method are not pre-processed. A normalization of size is performed on the scanned signature images, which are available in the form of grey-level images, as required by the CT-based descriptor. This normalization is
performed by adding zeros around these images so as to obtain a square matrix of dimensions $R \times R$, such that $R = 2^l$ with $l$ an integer, without distorting the signature image, as shown in Figure 3.2.
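A minimal numpy sketch of this size normalization is given below; centring the signature in the padded square and the function name are assumptions made for the example.

import numpy as np

def normalize_size(img):
    """Pad a grey-level signature image with zeros up to an R x R square,
    with R the next power of two, without rescaling the signature."""
    r = 1 << int(np.ceil(np.log2(max(img.shape))))
    out = np.zeros((r, r), dtype=img.dtype)
    top, left = (r - img.shape[0]) // 2, (r - img.shape[1]) // 2
    out[top:top + img.shape[0], left:left + img.shape[1]] = img
    return out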
Figure 3.2: Normalization of a scanned signature image of size 606 × 378 to 1024 × 1024.

3.3.2 Feature generation

In what follows, we describe how features are generated from a signature image using two suitable methods, each of which allows constructing a descriptor. As part of this work, we perform a comparative analysis of the two descriptors separately and then combine the individual off-line HSV systems, which exploit the best complementary features issued from the corresponding descriptors, in order to improve decision making.

3.3.2.1 Discrete cosine transform based descriptor

The discrete cosine transform (DCT) [Ahmed, 1974] is a well-known signal analysis method used in compression due to its compact representation power. Moreover, the DCT works better than other well-known techniques for dimensionality reduction, which makes it a very useful tool for signal representation both in terms of information packing ability and computational complexity, thanks to its two important properties, decorrelation and energy compaction. In the following, we use the DCT so that the static information of the original scanned signature image, available in the form of a grey-level image, is reflected in the transformed patterns. The descriptor constructed from the DCT based patterns is associated to a given individual writer-independent off-line HSV system to evaluate the effectiveness of this approach. The steps involved in generating features from a grey-level image of an off-line signature using the DCT are shown in Figure 3.3. Applying the DCT to the input signature image transforms it from the spatial domain to the frequency domain, while separating the signature image into parts (or spectral sub-bands) of differing importance (with
respect to the image's visual quality). This allows us, on the one hand, to efficiently sort the set of coefficients representing a given signature image in the frequency domain and, on the other hand, to remove those which are not visually significant. The removal particularly concerns the high-frequency coefficients, while the low frequencies, which represent the most significant coefficients, are kept. The general equation of the 2D DCT for an $R \times R$ image is:

$$C_k^{dct}(u, v) = \alpha(u)\,\alpha(v) \sum_{x=0}^{R-1} \sum_{y=0}^{R-1} I(x, y)\, \cos\!\left[\frac{\pi (2x+1) u}{2R}\right] \cos\!\left[\frac{\pi (2y+1) v}{2R}\right] \quad (3.1)$$

where

$$\alpha(k) = \begin{cases} \dfrac{1}{\sqrt{2}} & \text{if } k = 0 \\[2mm] 1 & \text{if } k > 0 \end{cases}$$
Here, $C_k^{dct}$ is the $k$-th coefficient at frequencies $u$ and $v$ in the DCT domain, $I$ is the square off-line signature image of dimensions $R \times R$, and $x$ and $y$ are the x-position and y-position of the pen in the spatial domain, respectively. Hence, we obtain a 2D-DCT matrix of size $R \times R$ whose coefficients with the largest amplitudes lie in the upper-left corner (low frequencies) and whose smallest amplitudes lie in the lower-right corner (high frequencies). Thus, the most significant information of the original signature image is concentrated in the upper-left part of the DCT matrix (energy compaction property).

Retrieving the DCT coefficients: Due to the energy compaction property of the DCT, the input data are reduced to a few significant coefficients. Implementing the DCT on a signature image generates DCT coefficients in bulk. To favor the reading of the low frequencies, a particular scanning is employed using the zig-zag algorithm [Belkasim, 2003].

Normalization of the DCT coefficients: To avoid both the saturation of the OC-SVM classifier inputs and the crushing of the DCT coefficients with relatively small amplitudes compared to the first coefficients, a normalization is performed through a nonlinear function (called softmax) as follows:

$$\tilde{C}_k^{dct} = \frac{1}{1 + \exp\!\left(-\dfrac{C_k^{dct} - \mu_p}{s\, \sigma_p}\right)} \quad (3.2)$$
where $\tilde{C}_k^{dct}$ is the $k$-th normalized coefficient obtained by the softmax function, $s$ is a constant fixed to 1, and $\mu_p$ and $\sigma_p$ are the mean and standard deviation of the $p$ selected coefficients, respectively. Moreover, this normalization limits the DCT coefficients to the range $[0, 1]$ by distributing them around the mean. Because of the large magnitude of the first DCT coefficient, the feature vector is built starting from the second coefficient. The selected DCT coefficients representing the signature image are finally stored in a vector $x_1$ of dimension $1 \times p$. The length $p$ of the feature vector is tuned experimentally and fixed to 24, as shown in Figure 3.3.
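A compact Python sketch of this descriptor is given below; it assumes scipy and numpy, and the zig-zag helper as well as the exact handling of the discarded DC coefficient are simplifications made for the example.

import numpy as np
from scipy.fft import dctn

def zigzag(mat):
    """Read a square matrix in zig-zag order (low frequencies first)."""
    n = mat.shape[0]
    idx = sorted(((i, j) for i in range(n) for j in range(n)),
                 key=lambda ij: (ij[0] + ij[1],
                                 ij[0] if (ij[0] + ij[1]) % 2 else -ij[0]))
    return np.array([mat[i, j] for i, j in idx])

def dct_descriptor(img, p=24, s=1.0):
    """DCT-based feature vector x1: 2D DCT (Eq. 3.1), zig-zag reading,
    DC coefficient dropped, p coefficients kept and softmax-normalized
    according to Eq. (3.2)."""
    coeffs = zigzag(dctn(img.astype(float), norm="ortho"))[1:p + 1]
    mu, sigma = coeffs.mean(), coeffs.std()
    return 1.0 / (1.0 + np.exp(-(coeffs - mu) / (s * sigma)))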
Figure 3.3: Steps for generating the feature vector from the DCT.

In what follows, we review and present in detail the steps involved in constructing the Curvelet transform based descriptor.

3.3.2.2 Curvelet transform based descriptor

The Ridgelet transform is known as the optimal approach for characterizing straight-line singularities. Unfortunately, global straight-line singularities are rarely observed in real applications. Hence, an attractive approach, named the Curvelet transform, was introduced by Candès and Donoho in [Candès, 2000] for analyzing local line or curve singularities. The first idea behind this approach is to consider a partition of the image and then to apply the Ridgelet transform to the obtained sub-images, whose local line or curve singularities are assumed to be well characterized. This is carried out through the Curvelet coefficients, which allow an almost optimal non-adaptive sparse representation of objects with edges in different positions in order to adequately recognize them. Figure 3.4 shows examples of the characterization of an object with edges through a curvelet.
Figure 3.4: Examples of the characterization of an object with edges through a curvelet: (a) curvelet; (b) null coefficient; (c) small coefficient; (d) high coefficient.
The Curvelet transform based descriptor is issued from a multiscale directional transform according to elementary features, which are characterized by scale and orientation parameters [Candès, 2006]. Each elementary feature is called a wedge and is defined as the Fourier transform of the curvelet, as illustrated in Figure 3.5(a).
Figure 3.5: (a) Curvelet-to-wedge transformation using the Fourier transform and (b) spectral partitioning of the Curvelet transform.

To describe the Curvelet transform in the spatial domain, a radial window $W(r)$ and an angular window $V(t)$ are considered, which satisfy the following admissibility conditions [Candès, 2006]:

$$\forall r \in \left(\tfrac{3}{4}, \tfrac{3}{2}\right), \quad \sum_{j=-\infty}^{+\infty} W^2(2^j r) = 1 \quad (3.3)$$

$$\forall t \in \left(-\tfrac{1}{2}, \tfrac{1}{2}\right), \quad \sum_{l=-\infty}^{+\infty} V^2(t - l) = 1 \quad (3.4)$$
where $j$ represents the scale parameter. In the Fourier domain, the curvelet is defined by the wedge $U_j$, for $j > j_0$, as follows:

$$U_j(r, \theta) = 2^{-3j/4}\, W(2^{-j} r)\, V\!\left(\frac{2^{\lfloor j/2 \rfloor}\, \theta}{2\pi}\right) \quad (3.5)$$
where $r$ is the radius, $\theta$ is the projection angle, and $\lfloor j/2 \rfloor$ denotes the integer part of $j/2$. To pass to the discrete domain, it is necessary to change the shape of the previous windows in order to adapt them to a Cartesian array. Thus, the dyadic coronae based on concentric circular rings are transformed into coronae based on Cartesian concentric squares, as illustrated in Figure 3.5(b). This is expressed as [Candès, 2006]:
$$W_j(\omega) = \sqrt{\Phi_{j+1}^2(\omega) - \Phi_j^2(\omega)}, \quad j \geq 0 \quad (3.6)$$

where $\Phi_j$ is defined as the product of two one-dimensional low-pass windows:

$$\Phi_j(\omega_1, \omega_2) = \phi(2^{-j}\omega_1)\, \phi(2^{-j}\omega_2) \quad (3.7)$$

In order to separate the scales and angles in the Cartesian domain, the function $\Phi_j$ must be limited to the range $[0, 1]$, and the Cartesian window takes the form:

$$U_j(\omega) = W_j(\omega)\, V_j(\omega) \quad (3.8)$$

such that:

$$V_j(\omega) = V\!\left(\frac{2^{\lfloor j/2 \rfloor}\, \omega_2}{\omega_1}\right) \quad (3.9)$$
Two main methods have been developed for computing the Curvelet coefficients [Candès, 2006]: either the nonequispaced fast Fourier transform (NFFT) or the circular warp (wrapping) in the frequency domain, as explained below. These methods depend mainly on the way in which the curvelets are transposed to a given scale and angle. For the numerical implementation of the Curvelet transform, the maximum decomposition level is equal to $\log_2(R) - 3$ for a normalized image of size $R \times R$, such that $R = 2^l$ with $l$ an integer. In this work, the Curvelet transform based descriptor representing an off-line signature image is elaborated according to the following steps (a sketch of steps 4 and 5 is given after this list):

1. Apply the Curvelet transform (wrapping method) to the off-line signature images, fixing the decomposition level to $j_{opt}$, which defines the optimal level.
2. Select only the wedges of the first and second decomposition levels, which are issued from the set of details.
3. Keep one half of the wedges, owing to the symmetry property of the Curvelet transform.
4. Compute the energy of each wedge, i.e. $E_w^{cur} = \sum_{r=r_1}^{r_2} U_{j_{opt}}^2(r, \theta)$, where $w \in \{1, 2, \ldots, N_w\}$ indexes the wedges, $N_w$ is the number of wedges, and the sum runs over the translations $r \in [r_1, r_2]$ of the first and second orientation levels $\theta$.
5. Normalize the obtained Curvelet coefficients through the logarithmic function, to enlarge the representation scale (in particular for small values), and form, with the normalized coefficients $E_w^{cur}$, the feature vector $x_2$ of dimension $1 \times N_w$.
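A minimal numpy sketch of steps 4 and 5 follows; the nested structure `coeffs[j][w]` holding the wedge coefficients is an assumed convention (it mirrors the usual output layout of discrete curvelet implementations), and the small constant added before the logarithm is a numerical guard introduced for the example.

import numpy as np

def curvelet_descriptor(coeffs):
    """Feature vector x2 from curvelet coefficients: wedge energies of the
    first two detail levels, one half of the wedges kept (symmetry), then
    log-normalized."""
    energies = []
    for level in (1, 2):                         # first and second detail levels
        wedges = coeffs[level]
        for w in wedges[:len(wedges) // 2]:      # keep one half of the wedges
            energies.append(np.sum(np.abs(np.asarray(w)) ** 2))
    return np.log(np.array(energies) + 1e-12)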
Figure 3.6 shows an example of a normalized off-line signature image of size 𝑅 × 𝑅 and the steps involved for constructing the Curvelet transform based descriptor.
Figure 3.6: Steps for generating the feature vector from the Curvelet transform.

3.3.3 Classification based on OC-SVM

In this work, we adopt the OC-SVM classifier, which enables us to incorporate an intelligent learning technique through the concept of (dis)similarity representation, while benefiting from two advantages for the design of writer-independent HSV systems: first, it allows building a unique model, while reducing the pattern recognition problem to a one-class problem; second, it allows using only genuine signatures when a new writer is included in the system. In the following, we briefly review the concept of learning with the OC-SVM classifier.

3.3.3.1 Review of the OC-SVM classifier

Schölkopf et al. [Schölkopf, 2001] proposed the OC-SVM classifier by modifying the standard support vector machines initially introduced by Vapnik [Vapnik, 1995]. Classification based on the OC-SVM has been successfully used in many pattern recognition applications, such as handwritten signature verification [Guerbai, 2012] and multibiometric score fusion based on face and fingerprint [Bergamini, 2009], [Abbas, 2013a]. This classifier is an unsupervised learning algorithm which only requires the learning of the target class samples. In fact, pattern classification through the OC-SVM consists of defining a boundary around the target class, such that it accepts as many of the target samples as possible while minimizing the chance of accepting outliers. For instance, in the context of biometric verification, the OC-SVM allows correctly classifying the patterns of one class (either genuine or impostor match scores) while patterns from the other class are rejected.
The OC-SVM seeks a hypersphere of minimum volume that encloses most of the learning data. More specifically, the objective of the OC-SVM is to estimate a function $f_{OC}(x)$ that encloses most of the learning data in a hypersphere $R_x = \{x \in \mathbb{R}^d,\ f_{OC}(x) > 0\}$ of minimum volume, where $d$ is the size of the feature vector [Rabaoui, 2007]. The decision function $f_{OC}(x)$ is given as [Schölkopf, 2001]:

$$f_{OC}(x) = \sum_{k=1}^{S_v} \alpha_k\, K(x, x_k) - \rho \quad (3.10)$$

where $S_v$ is the number of support vectors $x_k$ from the training dataset, $\alpha_k$ are the Lagrange multipliers, such that $0 \leq \alpha_k \leq \frac{1}{v\,m}$ and $\sum_k \alpha_k = 1$, $m$ is the cardinality of the training dataset, $\rho$ defines the distance of the hypersphere from the origin, $v$ is the percentage of data considered as outliers, and $K(.,.)$ defines the OC-SVM kernel, which projects the data from the original space to the feature space [Tran, 2005].

3.3.3.2 Writer-independent verification scheme

As part of this work, the writer-independent verification scheme of each OC-SVM classifier incorporates an intelligent learning technique (see Figure 3.7) according to the following steps:
Figure 3.7: Flowchart of writer-independent verification using an OC-SVM classifier.

1) Learning phase: In this step, the classifier is trained only with samples belonging to the genuine class of signatures in order to generate the corresponding OC-SVM model. This
model also serves for computing an optimal decision threshold, which is determined using the equal error rate (EER) criterion during an intermediate step, called the validation phase.

2) Verification phase: This step consists in assessing the robustness of the classifier, using the generated model and the optimal threshold selected during the validation phase, for decision making.

3.3.3.3 Generating vectors of (dis)similarity measures

The main idea behind the proposed verification scheme employed for designing the individual HSV systems is based on the dissimilarity representation presented in [Pekalska, 2002]: a set of prototype genuine signatures (called the representation set $\mathcal{R}$) is used for generating a unique OC-SVM model. Hence, a distance metric $d(.,.)$ is used for generating
the vectors of (dis)similarity measures $\mathcal{H}(x, \mathcal{R}) = \big(d(x, p_1), d(x, p_2), \ldots, d(x, p_n)\big)$ between the feature vector $x$ representing a given signature and the elements $p_i \in \mathcal{R}$. The vectors obtained through this operation are then used as the inputs of the OC-SVM classifiers. It should be noted that the key point of this work is to propose an intelligent learning technique, where the training data of each OC-SVM classifier are established from the vectors of similarity measures generated between the feature vectors associated to the genuine signatures selected for learning. Let $Nb_{wr}$ be the number of writers for the learning phase and $Nb_{sig}$ be the number of genuine signatures per writer selected during this step. The number of vectors of similarity measures generated during learning, denoted $Nb_{vsm}$, is computed according to the following formula:

$$Nb_{vsm} = \frac{Nb_{sig} \times (Nb_{sig} - 1)}{2} \times Nb_{wr} \quad (3.11)$$

For instance, with $Nb_{sig} = 5$ genuine signatures per writer and $Nb_{wr} = 25$ writers, as in the experimental protocol of Section 3.4.1, $Nb_{vsm} = (5 \times 4 / 2) \times 25 = 250$ learning vectors are generated per descriptor.
Moreover, the testing and validation data are represented by the vectors of (dis)similarity measures generated between the feature vector representing the input signature and those associated to the reference signatures. Thus, for each signature image belonging to the testing or validation dataset, the vectors of (dis)similarity measures sent to the input of the OC-SVM classifier are as numerous as the reference signatures.
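The following sketch illustrates the learning step described above using scikit-learn's OneClassSVM; the component-wise absolute difference used as the distance measure, the function names, and the mapping of the percentage of outliers to the `nu` parameter are assumptions made for the example.

import numpy as np
from itertools import combinations
from sklearn.svm import OneClassSVM

def similarity_vectors(feature_vectors):
    """One (dis)similarity vector per pair of genuine signatures of the same
    writer; Eq. (3.11) gives their total number over all writers."""
    return [np.abs(a - b) for a, b in combinations(feature_vectors, 2)]

def train_oc_svm(genuine_by_writer, nu=0.095, gamma=8.026):
    """Train a single writer-independent OC-SVM model from genuine
    signatures only.  genuine_by_writer[w] is the list of feature vectors
    (x1 or x2) of writer w; nu and gamma follow Table 3.1 for the DCT
    descriptor, with the percentage of outliers expressed as a fraction."""
    X = [v for vectors in genuine_by_writer for v in similarity_vectors(vectors)]
    return OneClassSVM(kernel="rbf", nu=nu, gamma=gamma).fit(np.array(X))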
3.3.3.4 Decision rule in the OC-SVM framework

Generally, the decision making in the OC-SVM classifier framework is performed through a function, denoted here $f_{OC}$, which takes positive values in some region of the representation space and negative values elsewhere; its value for a given vector of (dis)similarity measures is defined by Equation (3.10). In other words, if we denote by $\theta_{gen}$ and $\theta_{imp}$ the classes associated respectively to genuine and impostor signatures, the decision rule is given as follows:

$$x \in \begin{cases} \theta_{gen} & \text{if } f_{OC}(x) > 0 \\ \theta_{imp} & \text{otherwise} \end{cases} \quad (3.12)$$

In this work, the decision on learning data is performed according to (3.12). In contrast, the majority voting rule is applied to validation and testing data as follows:

$$x \in \begin{cases} \theta_{gen} & \text{if } Nb_{gen} \geq Nb_{imp} \\ \theta_{imp} & \text{otherwise} \end{cases} \quad (3.13)$$

where $Nb_{gen}$ and $Nb_{imp}$ are the numbers of responses $f_{OC_j}(x)$, $0 \leq j \leq Nb_{scores}$, generated with respect to the reference signatures associated to the sample $x$ and provided by the $i$-th $OC\text{-}SVM_i$ classifier ($i = 1, 2$), under the constraints $f_{OC_j}(x) \geq t_{opt}$ and $f_{OC_j}(x) < t_{opt}$, respectively. The index $i$ stands for the information source corresponding to the used descriptor, $Nb_{scores}$ is the number of vectors of (dis)similarity measures generated for each testing or validation signature, and $t_{opt}$ is the optimal threshold associated with the $i$-th $OC\text{-}SVM_i$ classifier, determined during the validation phase.
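A small sketch of this voting rule (Eq. 3.13) follows; the function name and the string labels are assumptions of the example.

def majority_vote(responses, t_opt):
    """`responses` are the outputs f_OC(x) obtained for one questioned
    signature against each of its reference signatures; the signature is
    assigned to the genuine class when at least as many responses reach
    the optimal threshold as fall below it."""
    nb_gen = sum(r >= t_opt for r in responses)
    nb_imp = len(responses) - nb_gen
    return "genuine" if nb_gen >= nb_imp else "impostor"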
3.3.4 Classification based on DSmT

In order to handle the possible conflict between the two classes of genuine and impostor signatures, we propose to combine the degrees of imprecision provided by the individual writer-independent HSV systems using an effective combination scheme within the general belief function framework. As shown in Figure 3.8, the proposed combination scheme consists of three steps: i) transformation of the OC-SVM outputs into belief assignments using an estimation technique based on a calibration method and a modified version of Appriou's model, ii) combination of the masses through a DSmT based combination rule, and iii) implementation of a new decision criterion for an optimal signature authentication.
Figure 3.8: Belief function based combination scheme for writer-independent signature verification.

3.3.4.1 Estimation of masses

We propose in this chapter an inspired version of Appriou's model, initially defined for two classes [Appriou, 1991], for estimating the mass functions within the DSmT framework. The estimation of the masses is performed in two steps: i) mapping of the uncalibrated outputs provided by each OC-SVM classifier to posterior probabilities, ii) estimation of the masses of the two simple classes and of the classes representing the ignorance and the paradox, respectively.

1) Calibration of the OC-SVM outputs: Each OC-SVM classifier provides an uncalibrated output representing the distance between the test data and the separating hyperplane; this output can be converted to a posterior probability measure. We first exploit the logarithmic function in order to redistribute the decision outputs over a large range. The reassigned OC-SVM output using the logarithmic function is given as follows [Rabaoui, 2007]:

$$g_i(x) = -\log\!\left(\sum_{j=1}^{S_{v_i}} \alpha_j\, K(x, x_j)\right) + \log(\rho_i) \quad (3.14)$$

where $S_{v_i}$ and $\rho_i$ are, respectively, the number of support vectors and the distance of the hypersphere from the origin for each $OC\text{-}SVM_i$, trained with samples of the genuine class $\theta_{gen}$ provided by the information source $S_i$, $i = 1, 2$ (i.e. the $i$-th descriptor). However, this logarithmic function only concerns the responses chosen by a selection rule whose purpose is to find a single response among the $Nb_{scores}$ responses for
each tested signature. The selection rule is defined according to the following criterion:

$$g_i^*(x) = \max\big(f_{OC_j}(x)\big), \quad 0 \leq j \leq q \quad (3.15)$$

where $g_i^*(x)$ is the output of the $i$-th $OC\text{-}SVM_i$ classifier selected from the $Nb_{scores}$ responses and $q$ is the number of majority responses, representing the scores of similarity measures issued from the same classifier, with respect to the optimal decision threshold. Then, we use a sigmoid transformation to map the reassigned OC-SVM outputs, obtained by applying Equation (3.14), to probabilities in the range $[0, 1]$ as follows [Abbas, 2013b]:

$$P_i(\theta_i / x) = \frac{1}{1 + \exp(-g_i(x))} \quad (3.16)$$

where $\theta_i$ defines the class associated to descriptor 1 ($i = 1$) or descriptor 2 ($i = 2$), i.e.:

$$\theta_i = \begin{cases} \theta_{descriptor\,1} & \text{if } i = 1 \\ \theta_{descriptor\,2} & \text{otherwise} \end{cases}$$
2) Assignment of the masses within the DSmT framework: In this study, the frame of discernment $\Theta$ is composed of two distinct elements: $\Theta = \{\theta_1, \theta_2\}$. We consider the target class $\theta_1 \in \Theta$ as the simple class associated to the first information source $S_1$ (i.e. descriptor 1) and its complementary class $\theta_2$ as the simple class associated to the second information source $S_2$ (i.e. descriptor 2). Hence, the set of focal elements $F$ generated within the DSmT framework for each source $S_i \equiv OC\text{-}SVM_i$, $i \in \{1, 2\}$, is given as: $F = \{\theta_1, \theta_2, \theta_1 \cup \theta_2, \theta_1 \cap \theta_2\}$.
We assign a mass to each element of $F$ using an inspired version of Appriou's model, defined as follows [Abbas, 2013b]:

$$m_i(\theta_i) = \frac{(1-\beta_i)\, P_i(\theta_i / x)}{1 + \varepsilon} \quad (3.17)$$

$$m_i(\bar{\theta}_i) = \frac{(1-\beta_i)\, P_i(\bar{\theta}_i / x)}{1 + \varepsilon} \quad (3.18)$$

$$m_i(\theta_i \cup \bar{\theta}_i) = \frac{\varepsilon}{1 + \varepsilon} \quad (3.19)$$

$$m_i(\theta_i \cap \bar{\theta}_i) = \frac{\beta_i}{1 + \varepsilon} \quad (3.20)$$

where $\varepsilon \geq 0$ is a tuning parameter and $\beta_i$ is the sum of the false acceptance rates (FAR) made by the $OC\text{-}SVM_i$ classifiers, $i = 1, 2$, trained with the two sources of information, respectively. Furthermore, $\beta_i / (1+\varepsilon)$ is used to quantify the belief for the conflicting region, and $\varepsilon / (1+\varepsilon)$ is used to quantify the belief that the pattern $x$ belongs to the subset of ignorance $\theta_i \cup \bar{\theta}_i$, $i = 1, 2$. The value of $\varepsilon$ is fixed here to 0.001.
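The two estimation steps can be sketched as follows in Python (numpy assumed); the support-vector weights, kernel values and offset are taken as given, and the complementary probability $P_i(\bar{\theta}_i/x)$ is taken as $1 - P_i(\theta_i/x)$, which is an assumption of the example.

import numpy as np

def calibrated_probability(kernel_values, alphas, rho):
    """Eqs. (3.14)-(3.16): log-rescaled OC-SVM output mapped to a posterior
    probability by a sigmoid.  kernel_values holds K(x, x_j) for the support
    vectors, alphas the dual coefficients, rho the offset."""
    g = -np.log(np.dot(alphas, kernel_values)) + np.log(rho)
    return 1.0 / (1.0 + np.exp(-g))

def inspired_appriou_masses(p, beta, eps=0.001):
    """Eqs. (3.17)-(3.20): masses on theta_i, its complement, the ignorance
    and the paradox, from the calibrated probability p = P_i(theta_i | x)."""
    return {"theta":     (1 - beta) * p / (1 + eps),
            "not_theta": (1 - beta) * (1 - p) / (1 + eps),
            "ignorance": eps / (1 + eps),
            "paradox":   beta / (1 + eps)}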
3.3.4.2 Combination of masses

In order to manage the conflict generated by the two information sources $S_1$ and $S_2$ (i.e. the $OC\text{-}SVM_1$ and $OC\text{-}SVM_2$ classifiers, respectively), the belief assignments $m_i(.)$, $i = 1, 2$, are combined as follows:

$$m_c = m_1 \oplus m_2 \quad (3.21)$$

where $m_c$ is the combined mass calculated for any element of $F$ and $\oplus$ defines the combination operator, composed of both the conjunctive and the redistribution terms of the basic sum rule, the DS rule, or the PCR6 rule, when working in the PT, DST, or DSmT framework, respectively.

In the PT framework, $F = \{\theta_1, \theta_2\}$ and the combined mass $m_c$ is given as follows [Xu, 1992]:

$$m_c(A) = m_{sum}(A) = \begin{cases} \dfrac{m_1(\theta_1) + m_2(\theta_1)}{2} & \text{if } A = \theta_1 \\[2mm] \dfrac{m_1(\theta_2) + m_2(\theta_2)}{2} & \text{otherwise} \end{cases} \quad (3.22)$$

In the DST framework, $F = \{\theta_1, \theta_2, \theta_1 \cup \theta_2\}$ and the combined mass $m_c$ is given as follows [Shafer, 1976]:

$$m_c(A) = m_{DS}(A) = \begin{cases} 0 & \text{if } A = \emptyset \\[1mm] \dfrac{1}{1 - K_c} \displaystyle\sum_{\substack{X, Y \in 2^\Theta \\ X \cap Y = A}} m_1(X)\, m_2(Y) & \text{otherwise} \end{cases} \quad (3.23)$$
Note that in the context of handwritten signature verification, we take as a constraint the proposition $\theta_1 \cap \theta_2 = \emptyset$, which separates the two classes $\theta_{descriptor\,1}$ and $\theta_{descriptor\,2}$. Therefore, the hyper-power set $D^\Theta = \{\emptyset, \theta_1, \theta_2, \theta_1 \cup \theta_2, \theta_1 \cap \theta_2\}$ defined within the DSmT framework is simplified to the set $F = \{\theta_1, \theta_2, \theta_1 \cup \theta_2\}$, which defines a particular case of Shafer's model. The conflict $K_c \in [0, 1]$ measured between the two sources is defined in the DST framework as:

$$K_c = \sum_{\substack{X, Y \in F \\ X \cap Y \in \Phi}} m_1(X) \times m_2(Y) \quad (3.24)$$

where $\Phi = D^\Theta \setminus F$ is the set of all relatively and absolutely empty elements, i.e. $\Phi = \{\emptyset, \theta_1 \cap \theta_2\}$.

In the DSmT framework, $F = \{\theta_1, \theta_2, \theta_1 \cup \theta_2, \theta_1 \cap \theta_2\}$ and the combined mass $m_c$ is given as follows [Martin, 2006]:

$$m_c(A) = m_{PCR6}(A) = \begin{cases} 0 & \text{if } A \in \Phi \\[1mm] m_\wedge(A) + \displaystyle\sum_{k=1}^{2} m_k(A)^2 \sum_{\substack{Y_{\sigma_k(1)} \in F \\ Y_{\sigma_k(1)} \cap A \in \Phi}} \frac{m_{\sigma_k(1)}\big(Y_{\sigma_k(1)}\big)}{m_k(A) + m_{\sigma_k(1)}\big(Y_{\sigma_k(1)}\big)} & \text{otherwise} \end{cases} \quad (3.25)$$

Here, $\sigma_k(1)$ counts from 1 to 2 avoiding $k$, i.e.:

$$\sigma_k(1) = \begin{cases} 2 & \text{if } k = 1 \\ 1 & \text{if } k = 2 \end{cases} \quad (3.26)$$

$Y_k$ is a focal element of the $k$-th information source $S_k$, $k = 1, 2$, and $m_k(A) + m_{\sigma_k(1)}(Y_{\sigma_k(1)}) \neq 0$; the combined belief function $m_\wedge$ is the conjunctive consensus rule (also called the classical DSm rule), defined as follows [Dezert, 2002b]:

$$m_\wedge(A) = \begin{cases} 0 & \text{if } A = \emptyset \\[1mm] \displaystyle\sum_{\substack{X, Y \in F \\ X \cap Y = A}} m_1(X) \times m_2(Y) & \text{otherwise} \end{cases} \quad (3.27)$$
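The three combination rules can be sketched for this two-class frame as follows, with belief assignments stored as dictionaries over the keys `t1`, `t2` and `u` for $\theta_1$, $\theta_2$ and $\theta_1 \cup \theta_2$; the Shafer constraint is assumed, so the small paradox mass of Eq. (3.20) is not represented. This is an illustrative sketch under those assumptions, not the exact implementation used in the thesis.

def conjunctive(m1, m2):
    """Conjunctive consensus (Eq. 3.27) on {t1, t2, u} with t1 ^ t2 = empty,
    together with the conflict Kc of Eq. (3.24)."""
    m = {"t1": m1["t1"]*m2["t1"] + m1["t1"]*m2["u"] + m1["u"]*m2["t1"],
         "t2": m1["t2"]*m2["t2"] + m1["t2"]*m2["u"] + m1["u"]*m2["t2"],
         "u":  m1["u"]*m2["u"]}
    kc = m1["t1"]*m2["t2"] + m1["t2"]*m2["t1"]
    return m, kc

def sum_rule(m1, m2):
    """Eq. (3.22): average of the two bbas (PT framework)."""
    return {a: (m1[a] + m2[a]) / 2.0 for a in ("t1", "t2")}

def dempster(m1, m2):
    """Eq. (3.23): Dempster's rule, the conflict is normalised out."""
    m, kc = conjunctive(m1, m2)
    return {a: v / (1.0 - kc) for a, v in m.items()}

def pcr6(m1, m2):
    """Eq. (3.25): conjunctive consensus plus proportional redistribution
    of each partial conflict (PCR5 = PCR6 for two sources)."""
    m, _ = conjunctive(m1, m2)
    for x, y in (("t1", "t2"), ("t2", "t1")):
        k = m1[x] * m2[y]
        if k > 0:
            m[x] += m1[x] * k / (m1[x] + m2[y])
            m[y] += m2[y] * k / (m1[x] + m2[y])
    return m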
3.3.4.3 Decision criterion

To decide whether the signature is accepted or rejected, we propose a new decision criterion which consists in determining an optimal decision threshold, expressed in terms of mass, according to the following steps:
i. Combine the two belief assignments $m_1(.)$ and $m_2(.)$ computed according to Equations (3.17), (3.18) and (3.19) in the DST framework, or according to Equations (3.17), (3.18), (3.19) and (3.20) in the DSmT framework, associated to the posterior probabilities of the two decision thresholds determined for both information sources (i.e. descriptor 1 and descriptor 2) through the EER criterion during the validation phase.

ii. Compute the threshold $t_{mass_1}$ according to the following formula: $t_{mass_1} = \min\big(m_c(\theta_1), m_c(\theta_2)\big)$, where $m_c(\theta_1)$ and $m_c(\theta_2)$ are the combined masses of $\theta_1$ and $\theta_2$ using the DS or PCR6 rules, respectively.

iii. Perform a second combination of the two belief assignments $m_1(.)$ and $m_2(.)$ computed according to Equations (3.17), (3.18) and (3.19) in the DST framework, or according to Equations (3.17), (3.18), (3.19) and (3.20) in the DSmT framework, associated to the posterior probabilities of both the learning and validation responses resulting from the corresponding OC-SVM classifiers.

iv. Compute the threshold $t_{mass_2}$ according to the following formula: $t_{mass_2} = \min\big(\min(m_{learn}(\theta_1)), \min(m_{learn}(\theta_2))\big)$, where $m_{learn}(\theta_1)$ and $m_{learn}(\theta_2)$ are the combined masses of $\theta_1$ and $\theta_2$ using the DS or PCR6 rules for a given learning sample, respectively.

v. Determine the optimal decision threshold $t_{mass_{opt}}$, expressed in terms of mass, by computing the mean of $t_{mass_1}$ and $t_{mass_2}$, i.e.:
$$t_{mass_{opt}} = \frac{t_{mass_1} + t_{mass_2}}{2}$$

Once the threshold has been determined, a decision rule is applied to the combined masses generated from the belief assignments associated to the posterior probabilities of the test data. Each test sample is assigned to one of the signature classes according to the following rule:
$$\text{Decision} = \begin{cases} \text{Accepted} & \text{if } \min\big(m_{test}(\theta_1), m_{test}(\theta_2)\big) \geq t_{mass_{opt}} \\ \text{Rejected} & \text{otherwise} \end{cases} \quad (3.28)$$
where $m_{test}(\theta_1)$ and $m_{test}(\theta_2)$ are the combined masses of $\theta_1$ and $\theta_2$ using the DS or PCR6 rules for a given test sample, respectively.

3.4 Experimental results

3.4.1 Experimental protocol

In order to evaluate the effective use of DSmT for writer-independent off-line handwritten signature verification, we applied our tests to handwritten signature samples of the Center of Excellence for Document Analysis and Recognition (CEDAR) database [Kalera, 2004]. It consists of 55 signature users, each of whom provided 24 genuine and 24 forgery samples, yielding 1320 genuine and 1320 skilled forgery signatures in total. The off-line signatures were originally scanned at 300 dpi in grey-level and saved as Portable Network Graphics (PNG) images. We took the 2640 signature images spread over the 55 writers (48 images for each one) and assigned them to two datasets: the first contains only the 600 genuine signatures of the first 25 writers (24 images for each one) and is used for both the learning and the validation of the OC-SVM models, while the second contains the 1440 signatures of the remaining 30 writers (48 images for each one) and is used for the testing phase, in which 5 genuine signatures serve as references for each writer. The 24 genuine signature images per writer selected for the first dataset are partitioned into three subsets: the first contains 5 signatures used for the learning phase, the second includes 5 other signatures that are considered as reference signatures and used for generating test scores, and the last contains the remaining 14 signatures, used for the validation phase and for computing the optimal thresholds. For each descriptor, an optimal decision threshold is established during the validation phase according to the EER criterion, which corresponds to the operating point at the intersection of the FAR and FRR curves. Because of the adopted protocol, in which the signature images associated to the validation phase are genuine, the forged
signatures for each writer are represented by the genuine signatures of the other writers, known as fictitious signatures. In order to determine the optimal value of each parameter associated to the corresponding descriptor, the following steps are performed:
Assign a value to the parameter related to the concerned descriptor.
Compute the optimal threshold corresponding to this parameter.
Calculate the Average Error Rate (AER) obtained at this optimal threshold.
Select the minimum AER resulting from these computations, which indicates the optimal value of the parameter.
3.4.2 Validation of OC-SVM models
In order to train and validate the OC-SVM models, the optimal parameters of each OC-SVM model are chosen according to the criterion of maximizing the number of support vectors $S_v$ representing the learning data: the higher the number of support vectors, the more representative the information is for each class. Hence, for each OC-SVM model we optimize the following hyperparameters $(v, \gamma)$:
The percentage of outliers $v$: this parameter controls the performance of the OC-SVM classifier; it defines the percentage of training data allowed to be considered as outliers during the learning phase for the corresponding target class.
The Radial Basis Function (RBF) kernel parameter $\gamma$: this parameter is tuned experimentally using the RBF kernel, which projects data from the original space to the feature space; its adjustment allows obtaining better performance for the corresponding OC-SVM classifier.
3.4.3 Performance criteria
For evaluating the performance of the proposed writer-independent HSV system, three popular errors are considered: the False Rejection Rate (FRR), the False Acceptance Rate (FAR) and the Average Error Rate (AER), which takes into account the mean of the errors obtained for both the genuine and forgery classes:

$$AER(\%) = \frac{\text{Number of rejected genuine signatures} + \text{Number of accepted forgeries}}{\text{Total number of testing signatures}} \times 100$$
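A minimal sketch of these three criteria, assuming the accept/reject decisions for the genuine and forged test signatures are available as boolean arrays (the function and variable names are introduced only for illustration):

```python
import numpy as np

def verification_error_rates(genuine_accepted, forgery_accepted):
    """FRR, FAR and AER (in %) from boolean acceptance decisions."""
    g = np.asarray(genuine_accepted, dtype=bool)   # decisions on genuine signatures
    f = np.asarray(forgery_accepted, dtype=bool)   # decisions on forged signatures
    frr = 100.0 * np.count_nonzero(~g) / g.size    # rejected genuine signatures
    far = 100.0 * np.count_nonzero(f) / f.size     # accepted forgeries
    aer = 100.0 * (np.count_nonzero(~g) + np.count_nonzero(f)) / (g.size + f.size)
    return frr, far, aer
```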
The following sections present the details of the experiments, followed by a discussion of the obtained results. Furthermore, we choose to evaluate the performance of the OC-SVM classifier using only five signatures per writer during the learning phase.
3.4.4 OC-SVM models used for combined individual writer-independent HSV systems
This section aims to tune experimentally the optimal parameters $(v, \gamma)$ of each OC-SVM model using the proposed validation approach. Thus, an OC-SVM model is produced for each individual writer-independent off-line HSV system according to the DCT and CT based descriptors, respectively. Table 3.1 shows the optimal parameters of the two OC-SVM models associated to the DCT and CT based descriptors using the testing data, respectively. We notice not only a wide range of variation of $v$ and $\gamma$, but also a number of support vectors that allows a better representation of the genuine signature class.
Parameters of the OC-SVM Model

Descriptor    v      γ       S_v
DCT           9.5    8.026   238
CT            0.2    65.1    220
Table 3.1: Optimal parameters of the OC-SVM models obtained according to the proposed validation approach.
3.4.5 Determining the parameters related to both descriptors during the validation phase
In what follows, we describe how the optimal number of DCT coefficients, the optimal decomposition level of the CT and the corresponding decision thresholds of each OC-SVM classifier are determined during the validation phase.
3.4.5.1 Selecting the optimal number of DCT coefficients and the corresponding decision threshold
In order to set the optimal number of significant DCT coefficients, we study the influence of the number of DCT coefficients on the different error rates computed from the validation samples. The obtained results are shown in Table 3.2. From these results, we set the number of significant coefficients in the DCT based feature vector to 24, in accordance with the best global error rate AER (23.2857%) obtained for this value. This optimal number of coefficients is retained for the next experiments. Figure 3.9 shows the FRR and FAR computed for different values of the decision threshold,
which allows determining the optimal threshold (≅ −0.06071) for the OC-SVM classifier associated to the DCT based descriptor during the validation phase, for an optimal number of DCT coefficients equal to 24.

Number of DCT Coefficients   Optimal Threshold   FRR (%)   FAR (%)   AER (%)
20                           -0.054184           25.0000   25.0000   25.0000
22                           -0.056147           23.7143   23.4286   23.5714
24                           -0.060712           23.1429   23.4286   23.2857
26                           -0.073691           24.0000   23.7143   23.8571
28                           -0.074203           23.4286   24.0000   23.7143
30                           -0.081034           24.5714   24.8571   24.7143
32                           -0.084211           24.5714   25.1429   24.8571
34                           -0.085002           24.8571   25.1429   25.0000
36                           -0.085021           25.1429   25.1429   25.1429
38                           -0.087733           25.7143   26.0000   25.8571
40                           -0.089344           26.2857   26.2857   26.2857
42                           -0.100700           26.2857   26.0000   26.1429
44                           -0.100150           26.0000   26.0000   26.0000
46                           -0.111900           26.0000   25.7143   25.8571
48                           -0.119230           25.7143   25.7143   25.7143
50                           -0.120510           26.0000   26.0000   26.0000
Table 3.2: Influence of the number of DCT coefficients on the different error rates during the validation phase.
We notice that the optimal decision threshold of the OC-SVM classifier associated to the DCT based descriptor during the validation phase corresponds to -0.0607, for which the AER is minimal with a value of 23.2857%. Hence, the same optimal threshold value will be used for evaluating the performance of the OC-SVM classifier associated to the DCT based descriptor during the testing phase.
Figure 3.9: Error rates of the OC-SVM classifier associated to the DCT based descriptor using different values of the decision threshold during the validation phase.
3.4.5.2 Selecting the optimal decomposition level of the CT and the corresponding decision threshold
In order to reduce the error rate obtained previously with the OC-SVM classifier associated to the DCT based descriptor, we investigate the use of the CT based descriptor. The optimal decomposition level $j_{opt}$ is determined by varying the decomposition level between 4 and 7, where 7 is the maximal decomposition level allowed by the normalized size of the signature images used for the CT, which has been fixed to 1024 × 1024 for the CEDAR database. The obtained results are shown in Table 3.3.
Decomposition Level j   Optimal Threshold   FRR (%)   FAR (%)   AER (%)
4                       -0.41988            8.0000    7.4286    7.7143
5                       -0.29781            12.8571   4.0000    8.4286
6                       -0.31508            11.1429   4.8571    8.0000
7                       -0.32461            19.7143   16.0000   17.8571
Table 3.3: Influence of the decomposition level $j$ on the different error rates during the validation phase.
Figure 3.10 shows the FRR and FAR computed for different values of the decision threshold, which allows determining the optimal threshold for the OC-SVM classifier associated to the CT based descriptor during the validation phase, for an optimal decomposition level $j_{opt}$ equal to 4.
Figure 3.10: Error rates of the OC-SVM classifier associated to the CT based descriptor using different values of the decision threshold during the validation phase.
According to the above figure, we notice that the optimal decision threshold of the OC-SVM classifier associated to the CT based descriptor during the validation phase corresponds to −0.4199, for which the AER is minimal with a value of 7.7143%. Hence, the same optimal threshold value will be used for evaluating the performance of the OC-SVM classifier associated to the CT based descriptor during the testing phase.
3.4.6 Verification results and discussion
The effectiveness of the proposed belief function theories based combination scheme is demonstrated experimentally by computing the verification performance of the two individual writer-independent off-line HSV systems, which are tested on the testing signatures of the CEDAR database. In these experiments, we compare the performance of the proposed DSm theory-based combination algorithm with learning-based individual OC-SVM classifiers, statistical match score combination algorithms, and the DS theory-based combination algorithm. Table 3.4 shows the FRR, FAR and AER based verification error rates computed for the corresponding optimal decision thresholds of both individual OC-SVM classifiers and of the proposed combination frameworks with the Max, Sum, Min, DS and PCR6 rules. Here the OC-SVM1 classifier represents the individual writer-independent off-line HSV system using the OC-SVM classifier associated to the DCT based descriptor, which yields an AER of 37.2868% for the optimal threshold value $t = -0.060712$, while the OC-SVM2 classifier represents the individual writer-independent off-line HSV system using the OC-SVM classifier associated to the CT based descriptor, which yields an AER of 4.2636% for the optimal threshold value $t = -0.41988$. The Max and Sum based combination algorithms decrease the AER of the OC-SVM1 classifier to 32.3256% and 27.5969% for the corresponding optimal threshold values $t = -0.06071$ and $t = -0.48059$, respectively, while the Min based combination algorithm provides the same result as the OC-SVM2 classifier (i.e. an AER of 4.2636%) with the same corresponding optimal threshold value $t = -0.41988$. Indeed, the Max, Sum and Min based combination algorithms fail to improve the verification performance of the proposed combination scheme, since they cannot correctly manage the conflict generated by the two individual writer-independent off-line HSV systems. Hence, these statistical match score combination algorithms are not appropriate for solving our writer-independent off-line HSV problem.
Algorithm                   Optimal Threshold   FRR (%)   FAR (%)   AER (%)
OC-SVM1 classifier (DCT)    -0.060712           28.7719   44.0278   37.2868
OC-SVM2 classifier (CT)     -0.419880           9.6491    0.0000    4.2636
Max combination rule        -0.060710           17.5439   44.0278   32.3256
Sum combination rule        -0.480590           6.8421    44.0278   27.5969
Min combination rule        -0.419880           9.6491    0.0000    4.2636
DS combination rule         0.334200            0.0000    6.3158    2.7907
PCR6 combination rule       0.267100            0.0000    6.1404    2.7132
Table 3.4: Experimental results of the proposed algorithms.
In the following, DST and DSmT, which rely on different approaches for modelling respectively the notions of ignorance and paradox, appear to be an excellent choice for managing the conflicting outputs provided by the two individual writer-independent off-line HSV systems, where the statistical match score combination algorithms fail to improve the performance attained by the OC-SVM learning algorithm associated to the CT based descriptor. In this vein, we consider only the DS and PCR6 combination algorithms, which are the most appropriate combination rules developed within the DST and DSmT frameworks, respectively. For each combination rule, a decision is made about whether the signature is genuine or forged by using a decision threshold expressed in terms of mass according to (3.23), applied to the combined masses. In order to appreciate the advantage of combining two sources of information through the DS and PCR6 combination rules, we present in Figure 3.11 the conflict measured during the testing phase between the two OC-SVM classifiers associated to the DCT and CT based descriptors. By analyzing the different values of conflict, we first notice that the minimal value of conflict over all testing genuine and forged signatures is the same and equals 0.4999. Moreover, this representation is very attractive because of the constant value of the conflict ($K_c = 0.4999$) for all testing forged signatures, due to the posterior probabilities related to the DCT based descriptor, which are negligible compared to those provided by the CT based descriptor. Furthermore, the proposed combination module (see Figure 3.11) is even more interesting in terms of the discriminating values of conflict between forged and genuine signatures, which allows defining an optimal threshold for decision making. We can see that the two sources of information are highly conflicting, since the value of conflict for any testing signature is greater than or equal to 0.4999. Hence, the task of the
proposed combination module is to manage the conflicts generated by both individual writer-independent off-line HSV systems for each testing signature. The proposed combination scheme using the DS combination algorithm yields an AER of 2.7907% for the optimal threshold value $t = 0.3342$, while the PCR6 combination algorithm yields the best AER of 2.7132% for the optimal threshold value $t = 0.2671$. Indeed, the use of the DS rule in the combination module allows efficiently redistributing the beliefs through a simple normalization by $1 - K_c$ in the mass combination process, combining the normalized outputs of both individual writer-independent off-line HSV systems when they are not highly conflicting. However, when the outputs are highly conflicting, the DS rule does not provide a reliable decision. Further, an improvement of 0.0775% in the verification performance is obtained with the PCR6 combination algorithm, owing to the efficient redistribution of the partial conflicting mass only to the elements involved in the partial conflict.
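For illustration only, the following sketch shows the DS normalization by $1 - K_c$ on the two-class signature frame, under the simplifying assumption that each source assigns mass only to the singletons $\theta_1$ (genuine) and $\theta_2$ (forgery); the function name is hypothetical.

```python
def ds_combine(m1, m2):
    """Dempster-Shafer combination on {theta1, theta2} with singleton-only masses:
    conjunctive consensus normalized by (1 - Kc), where Kc is the conflict."""
    kc = m1["theta1"] * m2["theta2"] + m1["theta2"] * m2["theta1"]
    if kc >= 1.0:
        raise ValueError("Total conflict: the DS rule is undefined.")
    return {
        "theta1": m1["theta1"] * m2["theta1"] / (1.0 - kc),
        "theta2": m1["theta2"] * m2["theta2"] / (1.0 - kc),
    }

# Example: two mildly conflicting sources.
print(ds_combine({"theta1": 0.7, "theta2": 0.3}, {"theta1": 0.4, "theta2": 0.6}))
```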
Figure 3.11: Conflict between the two OC-SVM classifiers using DCT and CT based descriptors for the testing signatures.
3.5 Conclusion
This chapter proposed and presented an effective combination scheme of two writer-independent off-line HSV systems in a general belief function framework. The OC-SVM classifiers associated respectively to DCT and CT features are incorporated as an intelligent learning technique using only genuine signatures. The combination framework is performed through belief function theories using an estimation technique based on an inspired version of Appriou's model, together with DST and DSmT based combination algorithms. A new
decision criterion has been implemented in the DST and DSmT frameworks for deciding whether a signature is accepted or rejected. Experimental results show that the proposed combination scheme with the PCR6 rule yields the best verification accuracy compared to the statistical match score combination algorithms and the DS theory-based combination algorithm, even when the individual writer-independent off-line HSV systems provide conflicting outputs. In continuation of the present work, the next objective is to adapt the use of the evidence supporting measure of similarity (ESMS) criterion to select complementary outputs of the OC-SVM classifiers associated to the corresponding descriptors within the same proposed combination scheme, in an attempt to further improve the AER.
Chapter 4
The Effective Use of the DSmT for Multiclass Classification
Abstract: The extension of the Dezert-Smarandache theory (DSmT) to the multi-class framework has a feasible computational complexity for various applications when the number of classes is limited, typically two classes. In contrast, when the number of classes is large, the DSmT generates a high computational complexity. This chapter proposes to investigate the effective use of the DSmT for multi-class classification in conjunction with Support Vector Machines using the One-Against-All (OAA) implementation, which offers two advantages: firstly, it allows modeling the partial ignorance by including the complementary classes in the set of focal elements during the combination process and, secondly, it allows drastically reducing the number of focal elements using a supervised model by introducing exclusivity constraints when classes are naturally and mutually exclusive. To illustrate the effective use of the DSmT for multi-class classification, two SVM-OAA implementations are combined according to three steps: transformation of the SVM classifier outputs into posterior probabilities using Platt's sigmoid technique, estimation of masses directly through the proposed model, and combination of masses through the Proportional Conflict Redistribution rule (PCR6). To prove the effective use of the proposed framework, a case study is conducted on handwritten digit recognition. Experimental results show that it is possible to efficiently reduce both the number of focal elements and the classification error rate.
4.1 Introduction
The initial systems that emerged in optical character recognition (OCR) were systems for reading postal addresses used for mail sorting, automatic reading of handwriting on forms, etc. Despite the research in this area, handwriting recognition remains an open and important problem. In many applications, various constraints do not allow an efficient joint use of classifiers and feature generation methods, leading to inaccurate performance. The main
reasons come from two aspects. First, for a specific application problem, each of these classifiers could attain a different degree of success, but maybe none of them is totally perfect, or even as good as expected for practical applications. The second aspect is that, for a specific recognition problem, numerous types of features could usually be used to represent and recognize patterns [Xu, 1992], [Cheriet, 2007]. For instance, in character recognition, the basic task of such a system is the recognition of isolated handwritten characters; the idea is to focus only on elementary units of isolated characters or numerals at a time. Hence, this method leads to several constraints due to the intrinsic nature of the acquired data, such as variability in the size of characters that can occur even among characters of the same class, the difference in writing between individuals, the complexity of the separation between the character and the background, the thickness of the writing, and the inclination angle [Cheriet, 2007]. All these parameters are variables, which makes this task complex and difficult. However, in view of the constraints corresponding to both aspects mentioned above, the concept of classifier combination has been proposed as a new direction for enhancing the robustness of recognition systems [Kittler, 1998], and the most important theories developed in the literature for managing uncertainties, namely Probability Theory [Kolmogorov, 1960], [Papoulis, 2002] (and more recently Imprecise Probability Theory [Walley, 1991]), Possibility Theory [Dubois, 2001] (based on Fuzzy Sets Theory [Zadeh, 1978]), Neutrosophic Set Theory [Kandasamy, 2004] and belief function theories (such as the Dempster-Shafer Theory (DST) [Shafer, 1976] and, more recently, the Dezert-Smarandache Theory (DSmT) [Smarandache, 2004], [Smarandache, 2006a], [Smarandache, 2009]), have indicated that the combination of several complementary classifiers can improve the performance of individual classifiers. In this research, we investigate in depth the effective use of the DSmT for multi-class classification because of its ability to deal efficiently with imprecise, uncertain and conflicting sources of information. In DSmT, the discernment space $\Theta$ can be a set of possibly non-exclusive elements and the definition of the belief mass $m(\cdot)$ is extended to the lattice structure of the hyper-power set $D^{\Theta}$ [Dilworth, 1961], [Grätzer, 1978], [Smarandache, 2009]. In general, $m(\cdot)$ is not a probability measure, except when its focal elements (that is, the elements which have a strictly positive mass of belief) are singletons; in such a case, $m(\cdot)$ is called a basic probability assignment (bpa) [Shafer, 1976], which can be considered as a subjective probability measure. In the DSmT framework, the main classifier combination problem consists in finding an
efficient way of combining several sources of evidence $S_1, S_2, \ldots, S_p$ (in our case multiple classifiers) characterized by their belief masses $m_1(\cdot), m_2(\cdot), \ldots, m_p(\cdot)$. These masses are defined on the same combination space, either $2^{\Theta}$ (power set), $D_M^{\Theta}$ (reduced hyper-power set), or $D^{\Theta}$ (hyper-power set), depending on the underlying model (i.e. Shafer's model, hybrid model or free DSm model [Smarandache, 2004]) associated with the nature of the frame $\Theta$. The DSmT has proved its efficiency in many kinds of applications [Smarandache, 2004], [Smarandache, 2006], [Smarandache, 2009]. Indeed, the difficulty in classifier combination arises from the fact that the individual classifiers can be conflicting, and one needs a solution for dealing with conflicting information in the combination process. The use of the DSmT for multi-class classification has a feasible computational complexity for various applications when the number of classes is limited, typically two classes [Abbas, 2012b]. In contrast, when the number of classes is large, the DSmT generates a high computational complexity, closely related to the number of elements to be processed, which follows the sequence of Dedekind's numbers (i.e. Dedekind(n), where n is the number of classes belonging to $\Theta$) [Dedekind, 1897]. Martin and Osswald proposed the PCR6 combination rule as the combination operator for combining sources of evidence in the DSmT framework. PCR6 is considered as an alternative to PCR5 for combining more than two sources altogether (i.e. $p \geq 3$). The PCR6 rule does not follow back on the track of the conjunctive rule as the PCR5 general formula does, but it gives better intuitive results. However, the use of the free DSm model, considering the set of all subsets of the original classes (under the union and intersection operators), is not easy, and both rules become intractable because their complexity increases drastically with the number $p$ of combined sources or with the size of the discernment space (i.e. the cardinal of $\Theta$, denoted $\mathrm{Card}(\Theta) = n$), especially in the worst case (that is, when a strictly positive mass of belief is assigned to all elements of the combination space) [Dezert, 2004a]. To avoid this problem, one can: (1) reduce the number of combined sources, or (2) reduce the size of the combination space $D^{\Theta}$ [Djiknavorian, 2006], [Martin, 2009], [Li, 2011], [Abbas, 2012d], [Abbas, 2013b]. In this chapter, we propose a solution only for reducing the number of focal elements in $D^{\Theta}$. We are not concerned with the first aspect in our handwritten digit recognition application, since only two complementary sources of information are available, each one having its own multi-class implementation of the support vector machine (SVM). To reduce the computational complexity and
expect good performance from the use of the DSmT within the multi-class classification framework, it seems natural to introduce integrity constraints on $D^{\Theta}$ taking into account only the complementary elements (i.e. the simple classes and their partial ignorance) according to a supervised model based combination scheme. In many pattern recognition applications, the classes belonging to the discernment space are naturally, and hence mutually, exclusive, such as in biometrics [Singh, 2008], [Vatsa, 2010] and handwritten recognition applications [Abbas, 2012d], [Abbas, 2012b], [Abbas, 2012c]. Hence, several classification methods have been proposed, such as template matching techniques [Deng, 1999], [Guo, 2001], minimum distance classifiers [Fang, 2001], [Sabourin, 1997], support vector machines (SVM) [Jusitno, 2005], hidden Markov Models (HMMs) [Jusitno, 2005], [Coetzer, 2004], and neural networks [Kaewkongka, 1999], [Quek, 2002]. In various pattern recognition applications, SVMs have proved their performance since the mid-1990s compared to other classifiers [Cheriet, 2007]. The SVM is based on an optimization approach in order to separate two classes by a hyperplane. In the context of multi-class classification, this optimization approach is possible [Weston, 1998a] but computationally very costly. Hence, two preferred methods of multi-class implementation of SVMs have been proposed for combining several binary SVMs, which are One-Against-All (OAA) and One-Against-One (OAO), respectively [Weston, 1998b], [Hsu, 2001]. The former is the most commonly used implementation in the context of multi-class classification using binary SVMs; it constructs $n$ SVMs to solve an $n$-class problem [Bottou, 1994]. Each SVM is designed to separate a simple class $\theta_i$ from all the others, i.e., from the corresponding complementary class $\bar{\theta}_i = \bigcup_{j=0,\, j \neq i}^{n-1} \theta_j$. In
contrast, the OAO implementation is designed to separate two simple classes $\theta_i$ and $\theta_j$ ($i \neq j$), which requires $n(n-1)/2$ SVMs. Hence, various decision functions can be used, such as the Decision Directed Acyclic Graph (DDAG) [Huang, 2002], which has the advantage of eliminating all possible unclassifiable data. Generally, the combination of binary classifiers is performed through very simple approaches such as the voting rule or the maximization of the decision functions coming from the classifiers. In this context, many combination operators can be used, especially in the DST framework [Martin, 2007]. Still in the same vein, some works have explored the combination of binary classifiers originating from SVMs in the DST framework [Aregui, 2007], [Quost, 2007a]. For instance, the
pairwise approach has been revisited by Quost et al. [Quost, 2007a], [Quost, 2007b] in the framework of the DST of belief functions for solving a multi-class problem. In [Hu, 2005], a combination method based on DST has been used by Hu et al. for combining multiple multi-class probability SVM classifiers in order to deal with the distributed multi-source multi-class problem. Martin and Quidu proposed an original approach based on DST [Martin, 2008] for combining binary SVM classifiers using the OAO or OAA strategies, which provides a decision support tool helping experts with seabed characterization from sonar images. Burger et al. [Burger, 2006] proposed to apply a belief-based method for SVM fusion to hand shape recognition. Optimizing the fusion of the sub-classifications and dealing with undetermined cases due to uncertainty and doubt have been investigated in other works [Burger, 2008], through a simple method which combines the fusion methods of belief theories with SVMs. Recently, a regression based approach [Laanaya, 2010] has been proposed to predict membership or belief functions, which are able to correctly model the uncertainty and imprecision of data. In this work, we propose to investigate the effective use of the DSmT for multi-class classification in conjunction with the SVM-OAA implementation, which offers two advantages: firstly, it allows modeling the partial ignorance by including the complementary classes in the set of focal elements, and hence in the combination process, contrary to the OAO implementation which takes into account only the singletons; secondly, it allows drastically reducing the number of focal elements from Dedekind(n) to 2n. The reduction is performed through a supervised model using exclusivity constraints. Combining the outputs of SVMs within the DSmT framework requires that the outputs of the SVMs be transformed into membership degrees. Hence, several methods for estimating mass functions have been proposed in both the DST and DSmT frameworks; these can be directly explicit through special functions or indirectly explicit through transfer models [Dempster, 1967], [Denoeux, 1997], [Smets, 1994], [Dubois, 1994], [Appriou, 1999]. In our case, we propose a direct estimation method based on the sigmoid transformation of Platt [Platt, 1999]. This allows us to satisfy the OAA implementation constraint. The chapter is organized as follows. Section 2 describes the combination methodology for multi-class classification using the SVM-OAA implementation. Experiments conducted on the dataset of isolated handwritten digits are presented in Section 3. The last section gives a summary of the proposed combination framework and looks to future research directions.
4.2 Methodology
The proposed combination methodology shown in Figure 4.1 is composed of two individual systems using SVM classifiers. Each one is trained using its own source of information, providing two kinds of complementary features, which are combined through the PCR6 rule. In the following, we give a description of each module composing our system.
Figure 4.1: Structure of the combination scheme using SVM and DSmT.
4.2.1 Classification based on SVM
Classification based on SVMs has been widely used in many pattern recognition applications such as handwritten digit recognition [Cheriet, 2007]. The SVM is a learning method introduced by Vapnik et al. [Vapnik, 1995], which tries to find an optimal hyperplane separating two classes. Its concept is based on the maximization of the margin between the closest points of the two classes. Therefore, the misclassification error of data in both the training set and the test set is minimized. Basically, SVMs have been defined for linearly separating two classes. When data are not linearly separable, a kernel function K is used. Thus, all mathematical functions which satisfy Mercer's conditions are eligible to be SVM kernels [Vapnik, 1995]. Examples of such kernels are the sigmoid kernel, the polynomial kernel, and the Radial Basis Function (RBF) kernel. Then, the decision function $f: \mathbb{R}^p \rightarrow \{-1, +1\}$ is expressed in terms of a kernel expansion as:

$$f(x) = \sum_{k=1}^{S_v} \alpha_k\, y_k\, K(x, x_k) + b \quad (4.1)$$
where $\alpha_k$ are the Lagrange multipliers, $S_v$ is the number of support vectors $x_k$, which are training data such that $0 \leq \alpha_k \leq C$, $C$ is a user-defined parameter that controls the tradeoff between the machine complexity and the number of nonseparable points [Huang, 2002], and the bias $b$ is a scalar computed using any support vector. Finally, for a two-class problem, test data are classified according to:

$$x \in \begin{cases} \text{class } +1 & \text{if } f(x) \geq 0, \\ \text{class } -1 & \text{otherwise.} \end{cases} \quad (4.2)$$
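For illustration, a minimal scikit-learn sketch of this two-class decision; the toy data and the chosen C and γ values are assumptions made for the example, not the thesis settings.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 256))            # toy feature vectors (e.g. 16x16 images)
y = np.where(X[:, 0] > 0, 1, -1)           # toy labels in {-1, +1}

clf = SVC(kernel="rbf", C=3.5, gamma=0.05).fit(X, y)
scores = clf.decision_function(X[:5])      # f(x) of Eq. (4.1)
labels = np.where(scores >= 0, 1, -1)      # decision rule of Eq. (4.2)
```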
The extension of the SVM to multi-class classification is performed according to the One-Against-All (OAA) strategy [Cortes, 1995]. Let a set of $N$ training samples be separable into $n$ classes $\theta_0, \theta_1, \ldots, \theta_{n-1}$, such that $\{(x_k^i, y_k^i) \in \mathbb{R}^p \times \{-1, +1\};\ k = 1, \ldots, N;\ i = 1, \ldots, n\}$. The principle consists in separating one class from the other classes. Consequently, $n$ SVMs are required for solving an $n$-class problem.
4.2.2 Classification based on DSmT
The proposed classification based on DSmT is presented in Figure 4.2 and is conducted in three steps: i) estimation of masses, ii) combination of masses through the PCR6 combination rule, and iii) decision rule.
Figure 4.2: DSmT-based parallel combination for multi-class classification.
4.2.2.1 Estimation of masses
The difficulty of estimating masses increases if one assigns weights to the compound classes [Lowrance, 1991]. Therefore, transfer models of the mass function have been proposed, whose aim is to distribute the initial masses over the simple and compound classes associated to each source. Thus, the estimation of masses is performed in two steps: i) assignment of membership degrees to each simple class through the sigmoid transformation proposed by Platt [Platt, 1999], and ii) estimation of the masses of the simple classes and their complementary classes using a supervised model.
Calibration of the SVM outputs: Although the standard SVM is a very discriminative classifier, its output values are not calibrated for appropriately combining two sources of information. Hence, an interesting alternative is proposed in [Platt, 1999] to transform the SVM outputs into posterior probabilities. Thus, given a training set of instance-label pairs $(x_k, y_k)$, $k = 1, \ldots, N$, where $x_k \in \mathbb{R}^p$ and $y_k \in \{-1, +1\}$, the unthresholded output of an SVM is a distance measure between a test pattern and the decision boundary, as given in Eq. (4.1). Furthermore, there is no clear relationship with the posterior class probability $P(y = 1 \mid x)$ that the pattern $x$ belongs to the class $y = 1$. A possible estimation of this probability can be obtained [Platt, 1999] by modeling the distributions $P(f \mid y = 1)$ and $P(f \mid y = -1)$ of the SVM output $f(x)$ using Gaussian distributions of equal variance and then computing the probability of the class given the output by using Bayes' rule. This yields a sigmoid allowing to estimate the probabilities:
$$P(y = 1 \mid x) = \frac{1}{1 + \exp\left(A\, f(x) + B\right)} \quad (4.3)$$
Parameters A and B are tuned by minimizing the negative log-likelihood of the training data:

$$-\sum_{k=1}^{N} \left[\, t_k \log Q_k + (1 - t_k) \log (1 - Q_k)\, \right] \quad (4.4)$$

where $Q_k = P(y_k = 1 \mid x)$ and $t_k = \frac{y_k + 1}{2}$ denotes the probability target.
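A minimal sketch of this calibration, assuming the uncalibrated SVM scores f and the labels y in {-1, +1} are already available; the optimizer and starting point are choices made for the example, not Platt's exact procedure.

```python
import numpy as np
from scipy.optimize import minimize

def fit_platt_sigmoid(f, y):
    """Fit A, B of P(y=1|x) = 1 / (1 + exp(A*f(x) + B)) by minimizing Eq. (4.4)."""
    f = np.asarray(f, dtype=float)
    t = (np.asarray(y, dtype=float) + 1.0) / 2.0          # targets t_k = (y_k + 1)/2
    def nll(params):
        A, B = params
        Q = 1.0 / (1.0 + np.exp(A * f + B))
        Q = np.clip(Q, 1e-12, 1.0 - 1e-12)                # numerical safety
        return -np.sum(t * np.log(Q) + (1.0 - t) * np.log(1.0 - Q))
    return minimize(nll, x0=np.array([-1.0, 0.0]), method="Nelder-Mead").x
```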
Supervised Model: Denoting by $m_1(\cdot)$ and $m_2(\cdot)$ the gbba provided by two distinct information sources $S_1$ (first descriptor) and $S_2$ (second descriptor), $F$ is the set of focal elements for
each source, such that $F = \{\theta_0, \theta_1, \ldots, \theta_{n-1}, \bar{\theta}_0, \bar{\theta}_1, \ldots, \bar{\theta}_{n-1}\}$; the classes $\theta_i$ are separable (each one relative to its complementary class $\bar{\theta}_i$) using the SVM-OAA multi-class implementation corresponding to the different singletons of the patterns assumed to be known. Therefore, each compound element $A_i \notin F$ has a mass $m_1(A_i)$ equal to zero; on the other hand, the mass of the complementary element $\bar{\theta}_i = \bigcup_{j=0,\, j \neq i}^{n-1} \theta_j$ is different from zero and represents the mass of the partial ignorance. The same reasoning is applied to the classes issued from the second source $S_2$ and $m_2(\cdot)$. Hence, both gbba $m_1(\cdot)$ and $m_2(\cdot)$ are given as follows:
$$m_b(\theta_i) = \frac{P_b(\theta_i \mid x)}{Z_b}, \quad \forall\, \theta_i \in F \quad (4.5)$$

$$m_b(\bar{\theta}_i) = \frac{\sum_{j=0,\, j \neq i}^{n-1} P_b(\theta_j \mid x)}{Z_b}, \quad \forall\, \bar{\theta}_i \in F \quad (4.6)$$

$$m_b(A_i) = 0, \quad \forall\, A_i \in D^{\Theta} \setminus F \quad (4.7)$$
where $Z_b = \sum_{j=0}^{n-1} P_b(\theta_j \mid x)$ represents the normalization factor introduced in the axiomatic approach in order to respect the mass definition, and $P_b$ are the posterior probabilities issued from the first source ($b = 1$) and the second source ($b = 2$), respectively. They are given for a test pattern $x$ as follows:
$$P_b(\theta_i \mid x) = \frac{1}{1 + \exp\left(A_i^b\, f_i^b(x) + B_i^b\right)} \quad (4.8)$$
where $A_i^b$ and $B_i^b$ are the parameters of the sigmoid function tuned by minimizing the negative log-likelihood during training for each class of patterns $\theta_i$, and $f_i^b(x)$ is the $i$-th output of the binary SVM classifier $SVM_i^b$ issued from the source $S_b$, such that $i \in \{0, 1, \ldots, n-1\}$ and $b \in \{1, 2\}$. In summary, the masses of all elements $A_j \in D^{\Theta}$ allocated by each information source $S_b$ ($b = 1, 2$) are obtained according to the following steps:
1. Define a frame of discernment $\Theta = \{\theta_1, \theta_2, \ldots, \theta_n\}$.
2. Classify a pattern $x$ through the SVM-OAA implementation.
3. Transform each SVM output into a posterior probability using Eq. (4.8).
4. Compute the masses associated to each class and its complementary class using Eq. (4.5) and Eq. (4.6), respectively.
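The following sketch transcribes these steps for one source, assuming the calibrated posteriors of Eq. (4.8) have already been computed for a test pattern; the dictionary keys used to name the focal elements are an illustrative convention.

```python
import numpy as np

def estimate_masses(posteriors):
    """Masses of the simple classes and their complements (Eqs. 4.5-4.6)
    from the n calibrated OAA posteriors P_b(theta_i | x) of one source."""
    p = np.asarray(posteriors, dtype=float)
    z = p.sum()                                   # normalization factor Z_b
    masses = {}
    for i, p_i in enumerate(p):
        masses[("theta", i)] = p_i / z            # m_b(theta_i),     Eq. (4.5)
        masses[("theta_bar", i)] = (z - p_i) / z  # m_b(theta_i_bar), Eq. (4.6)
    return masses                                 # every other element gets mass 0, Eq. (4.7)
```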
4.2.2.2 Combination of masses
In order to manage the conflict generated by the two information sources $S_1$ and $S_2$ (i.e. both SVM classifications), the combined masses are computed as follows:

$$m_c(\cdot) = \left[ m_1 \oplus m_2 \right](\cdot) \quad (4.9)$$
where $\oplus$ denotes the combination operator; the combined mass is obtained from $m_1(\cdot)$ and $m_2(\cdot)$ by means of the PCR6 rule [Smarandache, 2006a], [Smarandache, 2009] as follows:

$$m_c(A_i) = m_{PCR6}(A_i) = \begin{cases} 0 & \text{if } A_i \in \boldsymbol{\emptyset}, \\[4pt] m_{\cap}(A_i) + \displaystyle\sum_{k=1}^{2} m_k^2(A_i)\, L_k & \text{otherwise,} \end{cases} \quad (4.10)$$

where

$$L_k = \sum_{\substack{Y_{\sigma_k(1)} \in D^{\Theta} \\ Y_{\sigma_k(1)} \cap A_i \in \boldsymbol{\emptyset}}} \frac{m_{\sigma_k(1)}\big(Y_{\sigma_k(1)}\big)}{m_k(A_i) + m_{\sigma_k(1)}\big(Y_{\sigma_k(1)}\big)} \quad (4.11)$$

$\boldsymbol{\emptyset} = \{\emptyset_M, \emptyset\}$ is the set of all relatively and absolutely empty elements, $\emptyset_M$ is the set of all elements of $D^{\Theta}$ which have been forced to be empty in the hybrid model $M$ defined by the exhaustivity and exclusivity constraints, $\emptyset$ is the empty set, the denominator $m_k(A_i) + m_{\sigma_k(1)}(Y_{\sigma_k(1)})$ is different from zero, and $\sigma_k(1)$ counts from 1 to 2 avoiding $k$, i.e.:

$$\sigma_k(1) = \begin{cases} 2 & \text{if } k = 1, \\ 1 & \text{if } k = 2. \end{cases} \quad (4.12)$$
Thus, the term $m_{\cap}(A_i)$ represents a conjunctive consensus, also called the DSm Classic (DSmC) combination rule [Smarandache, 2006a], [Smarandache, 2009], which is defined as:
$$m_{\cap}(A_i) = \begin{cases} 0 & \text{if } A_i \in \boldsymbol{\emptyset}, \\[4pt] \displaystyle\sum_{\substack{X, Y \in D^{\Theta} \\ X \cap Y = A_i}} m_1(X)\, m_2(Y) & \text{otherwise.} \end{cases} \quad (4.13)$$
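For two sources, Eqs. (4.10)-(4.13) can be sketched as follows, under the assumption that focal elements are represented as Python frozensets of class labels, so that an empty intersection corresponds to a partial conflict:

```python
import itertools
from collections import defaultdict

def combine_pcr6(m1, m2):
    """Two-source conjunctive consensus (Eq. 4.13) followed by PCR6 proportional
    redistribution of each partial conflict to the two elements that produced it."""
    mc = defaultdict(float)
    for (X, mx), (Y, my) in itertools.product(m1.items(), m2.items()):
        inter, prod = X & Y, mx * my
        if inter:                                   # non-empty intersection
            mc[inter] += prod
        elif mx + my > 0:                           # partial conflict on (X, Y)
            mc[X] += mx * prod / (mx + my)          # share proportional to m1(X)
            mc[Y] += my * prod / (mx + my)          # share proportional to m2(Y)
    return dict(mc)

# Toy example on a 3-class frame with focal elements {theta_i} and their complements.
theta = [frozenset({i}) for i in range(3)]
bar = [frozenset({j for j in range(3) if j != i}) for i in range(3)]
m1 = {theta[0]: 0.6, bar[0]: 0.4}
m2 = {theta[1]: 0.3, bar[1]: 0.7}
print(combine_pcr6(m1, m2))
```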
Hence, in the context of some pattern recognition applications, such as handwritten digit recognition, we take as constraints the propositions $\theta_i \cap \theta_j = \emptyset$, $\forall\, i, j$ such that $i \neq j$, which allow separating any two classes belonging to $\Theta$. Therefore, the hyper-power set $D^{\Theta}$ is reduced to the set $F = \{\theta_0, \theta_1, \ldots, \theta_{n-1}, \bar{\theta}_0, \bar{\theta}_1, \ldots, \bar{\theta}_{n-1}\}$, which defines a particular case of Shafer's model. Thus, the conflict $K_c \in [0, 1]$ measured between the two sources is defined as:

$$K_c = \sum_{\substack{A_k, A_l \in F \\ A_k \cap A_l \in \boldsymbol{\emptyset}}} m_1(A_k)\, m_2(A_l) \quad (4.14)$$
where $D^{\Theta} \setminus F$ is the set of all integrity constraints introduced through the supervised model $M$, and $m_1(\cdot)$ and $m_2(\cdot)$ represent the corresponding generalized basic belief assignments provided by the two information sources $S_1$ and $S_2$, respectively.
4.2.2.3 Decision rule
A membership decision of a pattern to one of the simple classes of $\Theta$ is performed using a statistical classification technique. First, the combined beliefs are converted into a probability measure using a new probabilistic transformation, called the Dezert-Smarandache probability (DSmP), that maps a belief measure to a subjective probability measure [Smarandache, 2009], defined as:
$$DSmP_{\epsilon}(\theta_i) = m_c(\theta_i) + \left( m_c(\theta_i) + \epsilon \right) \sum_{\substack{A_j \in 2^{\Theta} \\ A_j \supset \theta_i \\ C_M(A_j) \geq 2}} \frac{m_c(A_j)}{\displaystyle\sum_{\substack{A_k \in 2^{\Theta} \\ A_k \subseteq A_j \\ C_M(A_k) = 1}} m_c(A_k) + \epsilon\, C_M(A_j)} \quad (4.15)$$
where $\theta_i$, $i \in \{0, 1, \ldots, 9\}$, $\epsilon \geq 0$ is a tuning parameter, $M$ is Shafer's model for $\Theta$, and $C_M(A_k)$ denotes the DSm cardinal of $A_k$ [Smarandache, 2004]. Therefore, the maximum likelihood (ML) test is used for decision making as follows:

$$x \in \theta_i \quad \text{if} \quad DSmP_{\epsilon}(\theta_i) = \max_{0 \leq j \leq 9} DSmP_{\epsilon}(\theta_j) \quad (4.16)$$
where $x$ is the test pattern characterized by both descriptors used during the feature generation step, and $\epsilon$ is fixed to 0.001 in the decision measure given by Eq. (4.15).
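A minimal sketch of Eqs. (4.15)-(4.16), reusing the frozenset representation of focal elements introduced above; the function names are illustrative.

```python
def dsmp(mc, classes, eps=0.001):
    """DSmP transformation (Eq. 4.15) under Shafer's model: the mass of each
    compound focal element is redistributed to the singletons it contains,
    proportionally to their masses (plus eps times the cardinality)."""
    probs = {}
    for theta in classes:
        s = frozenset({theta})
        value = mc.get(s, 0.0)
        for A, mA in mc.items():
            if len(A) >= 2 and theta in A:
                denom = sum(mc.get(frozenset({t}), 0.0) for t in A) + eps * len(A)
                value += (mc.get(s, 0.0) + eps) * mA / denom
        probs[theta] = value
    return probs

def ml_decision(mc, classes, eps=0.001):
    """Maximum-likelihood test of Eq. (4.16): keep the class with the largest DSmP."""
    probs = dsmp(mc, classes, eps)
    return max(probs, key=probs.get)
```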
4.3 Experimental results
4.3.1 Database description and performance evaluation
For evaluating the effective use of the DSmT for multi-class classification, we consider a case study conducted on the handwritten digit recognition application. For this, we select the well-known US Postal Service (USPS) database, which contains normalized grey-level handwritten digit images of 10 numeral classes, extracted from US postal envelopes. All images are segmented and normalized to a size of 16 × 16 pixels. There are 7291 training data and 2007 test data, some of which are corrupted and difficult to classify correctly (Figure 4.3). The partition of the database for each class according to training and testing is reported in Table 4.1.
Figure 4.3: Some samples with their alleged classes from USPS database.
Classes Training Testing
0 1194 359
1 1005 264
2 731 198
3 658 166
4 652 200
5 556 160
6 664 170
7 645 147
8 542 166
9 644 177
Table 4.1: Partitioning of the USPS dataset.
For evaluating the performance of the handwritten digit classification, two popular errors are considered: the Error Rate per Class (ERC) and the Mean Error Rate (MER) over all classes. Both errors are expressed in %.
4.3.2 Pre-processing
The acquired image of an isolated digit should be processed to facilitate the feature generation. In our case, the pre-processing module includes a binarization step using the method of Otsu [Otsu,
1979], which eliminates the homogeneous background of the isolated digit and keeps the foreground information.
4.3.3 Feature Generation
The objective of the feature generation step is to underline the relevant information that initially exists in the raw data. Thus, an appropriate choice of descriptor significantly improves the accuracy of the recognition system. In this study, we use a collection of popular feature generation methods, which can be categorized into background features [Britto, 2004], [Cavalin, 2006], foreground features [Britto, 2004], [Cavalin, 2006], geometric features [Cheriet, 2007], and uniform grid features [Fayata, 1996], [Abbas, 2011].
4.3.4 Validation of SVM Models
An SVM model is produced for each class according to the used descriptor. Hence, the training dataset is partitioned into two equal subsets of samples, which are used for training and validating each binary SVM, respectively. The validation phase allows finding the optimal hyperparameters of the ten SVM models. In our case, the RBF kernel is selected for the experiments. Furthermore, both the regularization and RBF kernel parameters $(C, \gamma)$ of each SVM are tuned experimentally during the training phase in such a way that the misclassification error on the training subset is zero and the validation test gives a minimal error during the validation phase for each SVM separating a simple class from its complementary class. Table 4.2 shows an example of the optimal parameters obtained during both the training and validation phases using the UG-SVMs classifier. The parameters $n$ and $m$ define the number of lines (vertical regions) and columns (horizontal regions) of the grid, respectively, which have been optimized during the validation phase for each SVM model. All these parameters are then used during the testing phase. ERCs and ERCc are the Error Rates per Class for the simple and complementary classes, respectively. As we can see, the optimal size of the uniform grid and the hyperparameters of each SVM should be tuned carefully in order to produce a reduced error.
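A sketch of this per-SVM validation procedure; the toy data, candidate grids and random seed are assumptions made for the example (in the thesis, each binary SVM separates one digit class from its complementary class):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X_train, X_val = rng.normal(size=(200, 64)), rng.normal(size=(100, 64))  # toy features
y_train = np.where(X_train[:, 0] > 0, 1, -1)   # +1: simple class, -1: complementary class
y_val = np.where(X_val[:, 0] > 0, 1, -1)

best_params, best_err = None, np.inf
for C in (1.0, 2.0, 3.5, 5.0):
    for gamma in (2.0, 3.0, 4.0, 5.0):
        clf = SVC(kernel="rbf", C=C, gamma=gamma).fit(X_train, y_train)
        if clf.score(X_train, y_train) < 1.0:      # require zero training error
            continue
        err = 1.0 - clf.score(X_val, y_val)        # keep the minimal validation error
        if err < best_err:
            best_params, best_err = (C, gamma), err
```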
SVM Classifier   0     1     2     3     4     5     6     7     8     9
n                7     2     8     5     4     7     7     8     8     7
m                5     3     3     6     12    5     8     6     6     10
C                3.5   1     3.5   4     3     3.5   4     3.5   5     4.5
γ                5     3     4     5     4     4     2     4     3     5
ERCs (%)         2.0   1.0   4.6   5.7   15.6  10.0  2.7   5.5   11.8  4.0
ERCc (%)         0.6   1.1   0.4   0.3   0.1   0.3   0.1   0.1   0.3   0.4
Table 4.2: Optimal parameters of the UG-SVMs classifier.
4.3.5 Quantitative results and discussion
The testing phase is performed using all samples of the test dataset. Hence, the performance of the handwritten digit recognition is evaluated for an appropriate choice of descriptors using the SVM classifiers, and we then evaluate the combination of the SVM classifiers within the DSmT framework.
4.3.5.1 Comparative analysis of features
The choice of complementary features is an important step to ensure an efficient combination. Indeed, the DSmT-based combination offers accurate performance when the selected features are complementary. Hence, we evaluate in this section the performance of the features in order to select the best ones for combination through the DSmT. For this, we evaluate each SVM-OAA implementation using Foreground Features (FF), Background Features (BF), Geometric Features (GF), Uniform Grid Features (UGF), and the descriptors deduced from a concatenation of at least two simple descriptors, such as (BF,FF), (BF,FF,GF) and (UGF,BF,FF,GF). Indeed, the experiments have shown that an appropriate choice of both the descriptors and the concatenation order used to represent each digit class in the feature generation step provides an interesting error reduction. In Table 4.3, the FF and UGF based descriptors using SVM classifiers are evaluated. When concatenating background and foreground (BF,FF) features, we observe a significant reduction of the MER. Indeed, an error rate reduction of 6.71% is obtained when concatenating BF and FF. Furthermore, an error rate reduction of 1.5% is obtained when concatenating BF, FF and GF. This proves that BF, FF and GF are complementary and more suitable for concatenation. In contrast, when concatenating UGF with BF, FF and GF, the MER increases by 2.73% compared to UGF alone. This proves that the
concatenation does not always improve the classification performance. Thus, we expect the UGF and (BF,FF,GF) descriptors to be more suitable for combination through the DSmT.

Descriptor            MER (%)
(a) FF                18.87
(b) (BF,FF)           12.16
(c) (BF,FF,GF)        10.66
(d) UGF               6.98
(e) (UGF,BF,FF,GF)    9.71
Table 4.3: Mean error rates of the SVM classifiers using different feature generation methods.
4.3.5.2 Performance evaluation of the proposed combination framework
In these experiments, we evaluate a handwritten digit recognition system based on a combination of SVM classifiers through the DSmT. The proposed combination framework exploits the redundant and complementary nature of the (BF,FF,GF) and UGF based descriptors and manages the conflict arising from the outputs of the SVM classifiers. Decision making is only performed on the simple classes belonging to the frame of discernment. Hence, we consider, in both the combination process and the calculation of the decision measures, the masses associated to all classes representing the partial ignorance $\bar{\theta}_i = \bigcup_{j=0,\, j \neq i}^{n-1} \theta_j$, as well as the compound elements involving $\theta_i$ and $\theta_j$ such that $i \neq j$. Thus, in order to appreciate the advantage of combining two sources of information through the DSmT-based algorithm, Figures 4.4 to 4.13 show the distribution of the conflict measured for each test sample between both SVM-OAA implementations using the (BF,FF,GF) and UGF based descriptors for the 10 digit classes $\theta_i$, $i = 0, 1, \ldots, 9$, respectively. Table 4.4 reports the minimal and maximal values of the conflict $K_c^i$, $i = 0, 1, \ldots, 9$, generated through the supervised model, which represent the mass assigned to the empty set after the combination process. As we can see, the conflict is maximal for digit 4 while it is minimal for digit 9.
Figure 4.4: Measured conflict between both SVM classifiers using (BF,FF,GF) and UGF based descriptors for the digits belonging to $\theta_0$.
Figure 4.5: Measured conflict between both SVM classifiers using (BF,FF,GF) and UGF based descriptors for the digits belonging to $\theta_1$.
Figure 4.6: Measured conflict between both SVM classifiers using (BF,FF,GF) and UGF based descriptors for the digits belonging to $\theta_2$.
Figure 4.7: Measured conflict between both SVM classifiers using (BF,FF,GF) and UGF based descriptors for the digits belonging to $\theta_3$.
Figure 4.8: Measured conflict between both SVM classifiers using (BF,FF,GF) and UGF based descriptors for the digits belonging to $\theta_4$.
Figure 4.9: Measured conflict between both SVM classifiers using (BF,FF,GF) and UGF based descriptors for the digits belonging to $\theta_5$.
Figure 4.10: Measured conflict between both SVM classifiers using (BF,FF,GF) and UGF based descriptors for the digits belonging to $\theta_6$.
Figure 4.11: Measured conflict between both SVM classifiers using (BF,FF,GF) and UGF based descriptors for the digits belonging to $\theta_7$.
Figure 4.12: Measured conflict between both SVM classifiers using (BF,FF,GF) and UGF based descriptors for the digits belonging to $\theta_8$.
Figure 4.13: Measured conflict between both SVM classifiers using (BF,FF,GF) and UGF based descriptors for the digits belonging to $\theta_9$.
Class   Minimal conflict (×10⁻⁵)   Maximal conflict (×10⁻²)
0       2.149309                   2.9933
1       6.999035                   2.9964
2       2.747717                   2.9992
3       2.936855                   2.9994
4       0.494599                   3.0000
5       1.868961                   2.9970
6       2.537015                   2.9887
7       2.826402                   2.9983
8       1.485899                   2.9910
9       0.276778                   2.9999
Table 4.4: Ranges of conflict variations measured between both SVM-OAA implementations using (BF,FF,GF) and UGF-based descriptors. For an objective evaluation, Table 4.5 shows ERC and MER produced from three SVM-OAA implementations using UGF, (BF,FF,GF), the descriptor resulting from a concatenation of both UGF and (BF,FF,GF) (i.e. combination at features level) and finally the PCR6 combination rule (i.e. combination at measure level) performed on (BF,FF,GF) and UGF based descriptors, respectively.
ERC (%) per class:

Class     (BF,FF,GF)   UGF     Concatenation (UGF,BF,FF,GF)   PCR6 combination rule
0         6.69         1.95    9.75                           1.95
1         4.55         3.79    3.79                           3.03
2         12.63        8.08    3.54                           6.06
3         17.47        10.84   18.67                          10.84
4         20.00        11.50   19.50                          9.00
5         16.87        10.00   10.62                          7.50
6         2.94         5.29    4.71                           3.53
7         8.84         8.16    8.84                           4.76
8         12.05        10.84   10.24                          6.63
9         10.73        6.21    10.17                          5.65
MER (%)   10.66        6.98    9.71                           5.43
Table 4.5: Error rates of the proposed framework with the PCR6 combination rule using the (BF,FF,GF) and UGF descriptors.
Overall, the proposed framework using the PCR6 combination rule is more suitable than the individual SVM-OAA implementations, since it provides a MER of 5.43% compared to the concatenation, which provides a MER of 9.71%. However, when inspecting each class carefully, we note that the PCR6 combination rule keeps or reduces the ERC in most cases, except for the samples belonging to classes $\theta_2$ and $\theta_6$. This poorer performance is due to the wrong characterization by both the UGF and (BF,FF,GF) based descriptors. In other words, the PCR6 combination is not reliable when the complementary information provided by both descriptors is wrongly preserved. Thus, the PCR6 combination rule correctly manages the conflict generated by the SVM-OAA implementations, even when they provide very small values of conflict (see Table 4.4), specifically in the case of samples belonging to $\theta_8$. Thus, the DSmT is more appropriate for solving the handwritten digit recognition problem. Indeed, the PCR6 combination rule allows an efficient redistribution of the partial conflicting mass only to the elements involved in the partial conflict. After redistribution, the combined mass is transformed into the DSm probability and the maximum likelihood (ML) test is used for decision making. Finally, the proposed algorithm in the DSmT framework is the most stable across all experiments, whereas the recognition accuracies pertaining to both individual SVM classifiers vary significantly.
4.4 Conclusion
In this chapter, we proposed an effective use of the DSmT for multi-class classification, jointly using the SVM-OAA implementation and a supervised model. Exclusivity constraints are introduced through a direct estimation technique to compute the belief assignments and reduce the number of focal elements. Therefore, the proposed framework drastically reduces the computational complexity of the combination process for multi-class classification. A case study conducted on handwritten digit recognition shows that the proposed supervised model with the PCR6 rule yields the best performance compared to the SVM multi-class classifications, even when they provide uncalibrated outputs. In continuation of the present work, the next objective is to adapt the use of one-class classifiers instead of the OAA implementation of the SVM in order to obtain a fixed number of focal elements within the DSmT combination process. This will allow us to have a feasible computational complexity independently of the number of combined sources and the size of the discernment space.
Chapter 5
A DSmT Based Combination Scheme for Multi-Class Classification
Abstract: This chapter presents a new combination scheme for reducing the number of focal elements to manipulate, in order to reduce the complexity of the combination process in the multi-class framework. The basic idea consists in using p sources of information involved in the global scheme, providing p kinds of complementary information to feed each set of p one-class support vector machine classifiers independently of each other, which are designed for detecting the outliers of the same target class; the outputs issued from this set of classifiers are then combined through the plausible and paradoxical reasoning theory for each target class. The main objective of this approach is to produce calibrated outputs even when less complementary responses are encountered. An inspired version of Appriou's model for estimating the generalized basic belief assignments is presented in this chapter. The proposed methodology allows decomposing an n-class problem into a series of n combinations, while providing n calibrated outputs in the multi-class framework. The effectiveness of the proposed combination scheme with the proportional conflict redistribution algorithm is validated on a digit recognition application and is compared with existing statistical, learning, and evidence theory based combination algorithms.
5.1 Introduction
Nowadays, a large number of classifiers and feature generation methods have been developed in various application areas of pattern recognition [Jain, 2000], [Duda, 2001], [Cheriet, 2007]. Nevertheless, no method has shown an incontestable superiority over the others, either in the feature generation step or in the classification step. Rather than trying to optimize a single classifier by choosing the best features for a given problem, researchers have found it more interesting to combine recognition methods [Cheriet, 2007], [Ruta, 2000], [Rahman, 2003]. Indeed, the combination of classifiers allows exploiting the redundant and complementary nature of the responses issued from different classifiers.
Researchers have proposed increasingly numerous and varied approaches for combining classifiers, which has led to the development of several schemes that treat data in different ways [Cheriet, 2007], [Rahman, 2003]. Generally, three approaches for combining classifiers can be considered: the parallel approach, the sequential approach and the hybrid approach [Cheriet, 2007], [Rahman, 2003]. Furthermore, these can be performed at the class level, at the rank level, or at the measure level [Jain, 2000], [Ruta, 2000], [Abbas, 2012d]. However, with the existence of the constraints corresponding to the joint use of classifiers and feature generation methods, an appropriate operating method using mathematical approaches is needed, which takes into account two notions: the uncertainty and the imprecision of the classifier responses. Uncertainty is an unrealistic measure induced by the outputs of a classifier, which leads to interpreting the response of the classifier as the result of a random phenomenon. In contrast, imprecision is a measure representing the uncertainty linked to incomplete knowledge. In general, most of the theoretical advances devoted to probability theory are able to represent uncertain knowledge but are unable to easily model information which is imprecise, incomplete, or not totally reliable. Moreover, they often lead to confusing both concepts of uncertainty and imprecision with the probability measure. Indeed, modelling through these approaches allows reasoning only on singletons, which represent the different hypotheses (classes), under the exhaustivity (closed world) assumption. Therefore, new original theories dealing with uncertain and imprecise information have been introduced, such as fuzzy set theory [Zadeh, 1968], evidence theory [Dempster, 1967], [Shafer, 1976], possibility theory [Dubois, 1988] and, more recently, the theory of plausible and paradoxical reasoning [Smarandache, 2004], [Smarandache, 2006a], [Smarandache, 2009]. The evidence theory initiated by Dempster and Shafer, termed the Dempster-Shafer theory (DST) [Dempster, 1967], [Shafer, 1976], is generally recognized as a convenient and flexible alternative to the Bayesian theory of subjective probability [Shafer, 1990]. The DST is a powerful theoretical tool which has been applied in many kinds of applications [Smets, 1999] for the representation of incomplete knowledge, belief updating, and the combination of evidence [Provan, 1992], [Dubois, 1992] through the Dempster-Shafer combination rule. Indeed, it offers a simple and direct representation of ignorance and has a low computational complexity [Ruspini, 1992] for most practical applications.
Nevertheless, this theory presents some weaknesses and limitations, mainly when the combined evidence sources become highly conflicting. Furthermore, Shafer's model itself does not necessarily hold in some fusion problems involving paradoxical information. To overcome these limitations, a recent theory of plausible and paradoxical reasoning, known in the literature as the Dezert-Smarandache theory (DSmT), was elaborated by Jean Dezert and Florentin Smarandache for dealing with imprecise, uncertain and paradoxical sources of information. The main objective of the DSmT was to introduce combination rules that correctly combine evidence issued from different information sources, even in the presence of conflicts between sources or of constraints corresponding to an appropriate model (free or hybrid DSm models [Smarandache, 2004]). The DSmT has proved its efficiency in many current pattern recognition application areas such as remote sensing [Corgne, 2003], [Liu, 2012], [Maupin, 2004], [Elhassouny, 2011], [Zhun-ga, 2012], identification and tracking [Pannetier, 2008], [Pannetier, 2009], [Kechichian, 2009], [Sun, 2010], [Dezert, 2010], [Pannetier, 2011], biometrics [Singh, 2008], [Vatsa, 2009a], [Vatsa, 2009b], [Vatsa, 2010], computer vision [Garcia, 2008], [Khodabandeh, 2010], [Dezert, 2011], robotics [Huang, 2006], [Li, 2006a], [Li, 2006b], [Li, 2007], [Li, 2008], [Huang, 2009] and, more recently, handwritten recognition applications [Abbas, 2012b], [Abbas, 2012c], [Abbas, 2012d], [Abbas, 2013b], as well as many others [Smarandache, 2004], [Smarandache, 2006a], [Smarandache, 2009]. The DSmT has a feasible computational complexity for industrial uses, which are considered as problems of small dimension [Abbas, 2012b], [Vatsa, 2010]. In contrast, extending this theory to the multi-class framework raises an applicability problem due to its high computational complexity. This is closely related to the number of elements to be processed in this framework, which follows the sequence of Dedekind's numbers [Dedekink, 1897], [Comtet, 1974]: 1, 2, 5, 19, 167, 7580, 7828353, ... An analytical expression of Dedekind's numbers obtained by Tombak et al. can be found in [Tombak, 2001]. For instance, if the number of classes belonging to the discernment space is 6, then the number of elements to be dealt with in the DSmT framework is 7828353. Hence, it is not easy to consider the set of all subsets of the original classes (closed under the union and intersection operators) since it becomes intractable for more than 6 elements in the discernment space [Dezert, 2004a]. In this research, we propose an effective combination scheme of one-class classifiers in a general belief function framework by incorporating an intelligent learning technique for reducing
the number of focal elements. This allows us to reduce drastically the computational complexity of the combination process and, especially, to extend the applicability of DSmT to the multi-class classification framework. The objective of this work is not to choose a particular kind of one-class classifier, but only to illustrate, through a practical application, the advantage of this new combination scheme for real-time implementation purposes. The chapter is organized as follows. In Section 5.2, we briefly recall the related works dealing with the computational complexity of the combination algorithms formulated in the DSmT framework. Section 5.3 presents an effective combination scheme of one-class classifiers proposed for solving the multi-class classification problem. We give in Section 5.4 a multi-class classification scheme based on belief function theories. The database of isolated handwritten digits, the methods used for generating features and the algorithm used for OC-SVM model validation are described in Section 5.5. The experimental and statistical results are summarized in Section 5.6.
5.2 Related works
Dezert and Smarandache [Dezert, 2004b] proposed a first work for ordering all the elements generated using the free DSm model for matrix calculus, as is done in the DST framework [Kennes, 1992], [Smets, 2002]. However, this proposition has limitations, since in practical applications it is more appropriate to manipulate only the focal elements [Den, 2001], [Djiknavorian, 2006], [Martin, 2006], [Abbas, 2012d]. Hence, few works have focused on the computational complexity of the combination algorithms formulated in the DSmT framework. Djiknavorian and Grenier [Djiknavorian, 2006] showed that the high complexity of the DSm hybrid (DSmH) combination algorithm can be avoided by designing code that performs a complete DSmH combination in a very short period of time. However, even if they obtained an efficient process for evaluating the DSmH algorithm, some parts of their code are not optimized and the code was developed only for dynamic fusion. Martin [Martin, 2009] further proposed a practical codification of the focal elements, which assigns a single integer to each part of the Venn diagram representing the discernment space. Contrary to Smarandache's codification [Dezert, 2004a] used in [Dezert, 2004c] and the codes proposed in [Djiknavorian, 2006], the author argues that the constraints given by the application must be integrated directly in
the codification of the focal elements in order to obtain a reduced discernment space. This codification can therefore drastically reduce the number of possible focal elements, and thus the complexity of both the DST and the DSmT frameworks. A disadvantage of this codification is that the complexity still increases drastically with the number of combined sources, especially when dealing with a multi-class problem. To address this issue, Li et al. [Li, 2011] proposed a criterion called evidence supporting measure of similarity (ESMS), which consists in selecting, among all available sources, only a subset of sources of evidence in order to reduce the complexity of the combination process. However, this criterion has been justified only for a two-class problem. Nowadays, reducing both the number of combined sources and the size of the discernment space remain research challenges that still need to be addressed.
5.3 Effective combination scheme of one-class classifiers
Let $\Theta = \{\theta_1, \theta_2, \ldots, \theta_n\}$ be the discernment space of the multi-class classification problem under consideration, having $n$ exhaustive elementary hypotheses $\theta_i$, which are not necessarily mutually exclusive in DSmT. For the computation of the global combined mass $m_c(\cdot)$, the direct use of the DS and PCR6 combination rules yields a computation cost that increases exponentially with $n$ when dealing with basic belief assignments (bba's) within the DST framework (i.e. $\mathrm{Card}(2^{\Theta}) = 2^n$). It may become computationally prohibitive, especially because of the huge number of elements belonging to $D^{\Theta}$ when dealing with generalized basic belief assignments (gbba's) within the DSmT framework (i.e. $\mathrm{Card}(D^{\Theta}) = d(n)$, where $d(n)$ is the Dedekind number of $n$). However, for our multi-class problem, the separation of data according to the One Against All (OAA) approach [Guermeur, 1999] makes the data highly unbalanced for each two-class problem. Hence the need for a one-class classifier able to distinguish the samples of the target class $\theta_i$ from the outliers belonging to its complementary class $\bar{\theta}_i$, $i = 1, \ldots, n$. From this principle, we propose a combination scheme which decomposes an $n$-class problem into a series of $n$ combinations, where the reasoning for each combination is performed in the subspace of discernment $\Theta_i = \{\theta_i, \bar{\theta}_i\}$, $i = 1, \ldots, n$, instead of the reference space $\Theta$. As shown in Figure 5.1, the proposed combination scheme uses the complementary features captured by the different sources of information $S_k$, $k = 1, \ldots, p$, from the input probe data to feed the one-class classifiers
$OC\text{-}C_i^k$, $k = 1, \ldots, p$, that operate independently of each other for each target class $\theta_i$, $i = 1, \ldots, n$. The partial opinions (i.e. transformed measures) provided by these classifiers are then combined all together through an appropriate rule in the subset $G^{\Theta_i}$, $i = 1, \ldots, n$. Finally, all the $n$ combined masses are incorporated into a unique module for the task of decision making. Table 5.1 compares $\mathrm{Card}(2^{\Theta})$ and $\mathrm{Card}(D^{\Theta})$ with $\mathrm{Card}(F) = \mathrm{Card}\big(\bigcup_{i=1}^{n} G^{\Theta_i}\big)$, the cardinality of the set of all focal elements $F$ obtained according to our computing method within the DSmT framework, as follows:

Card(Θ) = n      2      3      4      5        6
Card(2^Θ)        4      8      16     32       64
Card(D^Θ)        5      19     167    7580     7828353
Card(F)          5      11     14     17       20
Table 5.1: Cardinality of the combination space.

In this way, we can drastically reduce, within the DSmT framework, the number of focal elements from $\mathrm{Card}(D^{\Theta}) = d(n)$ to $\mathrm{Card}(F) = 3n + 2$, $n \geq 3$.
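To make this reduction concrete, the short Python sketch below (illustrative only; the Dedekind numbers $d(n)$ are hard-coded from the values of Table 5.1) prints the three cardinalities for $n = 2, \ldots, 6$.

```python
# Illustrative comparison of the three cardinalities reported in Table 5.1:
# the power set 2^Theta (DST), the hyper-power set D^Theta (DSmT, Dedekind
# numbers d(n), hard-coded below), and the reduced set of focal elements F,
# whose size is 3n + 2 for n >= 3 (and 5 for n = 2) with the proposed scheme.

DEDEKIND = {2: 5, 3: 19, 4: 167, 5: 7580, 6: 7828353}   # d(n), values of Table 5.1

def cardinalities(n):
    card_power_set = 2 ** n                      # Card(2^Theta)
    card_hyper_power_set = DEDEKIND[n]           # Card(D^Theta) = d(n)
    card_reduced = 3 * n + 2 if n >= 3 else 5    # Card(F) as in Table 5.1
    return card_power_set, card_hyper_power_set, card_reduced

for n in range(2, 7):
    p, d, f = cardinalities(n)
    print(f"n = {n}: Card(2^Theta) = {p}, Card(D^Theta) = {d}, Card(F) = {f}")
```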
[Figure 5.1 depicts the scheme: the input data feed p information sources (Source 1, ..., Source p); for each class i, the one-class classifiers OC-Ci1, ..., OC-Cip process the p sources independently; their outputs are merged in a per-class combination module (Combination 1, ..., Combination n); the n combined masses feed a single decision module.]
Figure 5.1: General concept of the proposed combination scheme.
5.4 Multi-class classification scheme based on belief function theories
The proposed multi-class classification scheme is presented in Figure 5.2. For the sake of clarity, the scheme is shown here for only two sources of information. It incorporates four main modules: i) one-class support vector machine (OC-SVM) classification, ii) transformation of the normalized OC-SVM outputs into belief assignments using an estimation technique based on the dissonant model of Appriou, iii) combination of masses through an algorithm based on belief function theories, and iv) decision making.
5.4.1 Classification based on OC-SVM
Since the late 1990s, support vector machines (SVMs) [Vapnik, 1995], [Burges, 1998], [Cristianini, 2000] have emerged as a new direction in pattern recognition, since they provide optimal generalization performance via structural risk minimization (SRM), as opposed to the empirical risk minimization used for neural networks [Bishop, 1995], [Haykin, 1999]. Initially, SVMs were formulated to construct binary classifiers [Cortes, 1995], namely bi-class SVMs (B-SVM). Hence, their extension to the multi-class classification problem is still a research topic. Multi-class (including two-class) classification is the standard approach used in machine learning, whereby a hypothesis is constructed that discriminates between a fixed set of classes.
[Figure 5.2 shows the two-source instance of the scheme: the input data feed the two information sources S1 and S2; for each class i, the classifiers OC-SVMi1 and OC-SVMi2 produce outputs from which the masses mi1(.) and mi2(.) are estimated; the n pairs of masses are combined and passed to the decision making module.]
Figure 5.2: Belief function theories-based multi-classification scheme.
Currently, there are two methods of multi-class implementation of SVMs, both based on a combination of several binary SVMs [Weston, 1998], [Guermeur, 1999], [Hsu, 2001]: one against all (OAA) and one against one (OAO). Various decision functions can be used with the second implementation, but the one that eliminates all possible unclassifiable data is the decision directed acyclic graph (DDAG) [Huang, 2002]. However, multi-class approaches make two assumptions: 1) Closed set: all possible cases fall into one of the classes. 2) Good distribution: the training set is composed of cases that are statistically representative of each of the classes. Since for some classes there may be no data, insufficient data or ill-distributed data available, intense research in the machine learning field has been devoted to problems where these assumptions are not valid, and techniques for one-class classification have begun to receive attention. Initially, various algorithms for one-class classification were based on neural networks, such as those of Moya et al. [Moya, 1993], [Moya, 1996], Bishop [Bishop, 1994], Tarassenko et al. [Tarassenko, 1995], Japowicz et al. [Japowicz, 1995], Hawkins et al. [Hawkins, 2002] and Kontorovich et al. [Kontorovich, 2011]. More recently, one-class versions of the SVM, namely the support vector domain description (SVDD) and the one-class support vector machine (OC-SVM), have been proposed, notably by Tax [Tax, 1999], [Tax, 2001] and Schölkopf et al. [Schölkopf, 2001], respectively. In the proposed combination scheme, we follow the OC-SVM path, which enables us to incorporate an intelligent learning technique that efficiently avoids both the closed set and good distribution assumptions in the multi-class classification framework. In the following, we briefly review concept learning with the one-class SVM.
5.4.1.1 Review of OC-SVM
Schölkopf [Schölkopf, 2001] proposed the OC-SVM classifier by modifying the standard support vector machine initially introduced by Vapnik [Vapnik, 1995]. Pattern classification using OC-SVM has been successfully applied in many domains, such as biometric verification [Bergamini, 2009], [Guerbai, 2012], [Abbas, 2013a], image retrieval [Seo, 2007], speaker diarization [Fergani, 2008] and document classification [Larry, 2001]. This classifier is an unsupervised learning algorithm developed by Schölkopf et al. [Schölkopf, 2001], which only
requires the learning of the target class samples. In fact, pattern classification through OC-SVM consists of defining a boundary around the target class, such that it accepts as many of the target samples as possible while minimizing the chance of accepting outliers. For instance, in the context of biometric verification, OC-SVM allows classifying correctly the patterns from one class (either genuine or impostor) while patterns from the other class are rejected. The OC-SVM seeks a hyper sphere that encloses most of the learning data within a minimum volume. More specifically, the objective of the OC-SVM is to estimate a function $f_{OC}(x)$ that encloses most of the learning data into a hyper sphere of minimum volume $R_x = \{x \in \mathbb{R}^d, f_{OC}(x) \geq 0\}$, where $d$ is the size of the feature vector [Schölkopf, 2001], [Rabaoui, 2007]. Hence, the decision function $f_{OC}(x)$ is given as [Schölkopf, 2001], [Rabaoui, 2007]:

$$f_{OC}(x) = \sum_{j=1}^{Sv} \alpha_j K(x, x_j) - \rho. \qquad (5.1)$$

$Sv$ is the number of support vectors $x_j$ from the training dataset and $\alpha_j$ are the Lagrange multipliers computed by optimizing the following expression:

$$\min_{\alpha} \ \frac{1}{2} \sum_{j,l} \alpha_j \alpha_l K(x_j, x_l), \qquad (5.2)$$

subject to $0 \leq \alpha_j \leq \dfrac{1}{\nu m}$ and $\displaystyle\sum_{j=1}^{m} \alpha_j = 1$.

$m$ is the cardinality of the training dataset, $\rho$ defines the distance of the hyper sphere from the origin, $\nu$ is the percentage of data considered as outliers, and $K(\cdot,\cdot)$ defines the OC-SVM kernel that projects the data from the original space into the feature space [Tran, 2005]. Figure 5.3 illustrates an example of using OC-SVM for separating data from the outliers [Abbas, 2013a]. A pattern $x$ is then accepted when $f_{OC}(x) \geq 0$; otherwise, it is rejected. Various kernel functions can be used, such as the polynomial, Radial Basis Function (RBF) or multilayer perceptron kernels [Vapnik, 1995]. The RBF is the most commonly used kernel; it determines the radius of the hyper sphere according to the parameter $\gamma$ and is defined by:
$$K(x, x_j) = \exp\left(-\gamma \left\| x - x_j \right\|^2\right). \qquad (5.3)$$
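As a minimal, hedged sketch of the per-class learning defined by Equations (5.1)-(5.3), the following Python snippet uses scikit-learn's OneClassSVM as a stand-in for the classifier of the thesis; the synthetic data, number of classes and parameter values are placeholders, not the actual experimental setup.

```python
# Minimal sketch of per-class OC-SVM learning (one hyper sphere per target class),
# using scikit-learn's OneClassSVM as a stand-in for the classifier of Eq. (5.1)-(5.3).
# The data, descriptors and parameter values below are placeholders.
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
n_classes, d = 10, 16 * 16            # e.g. 16x16 digit images flattened

# Synthetic training data: one sample matrix per target class.
train_sets = [rng.normal(loc=i, scale=1.0, size=(100, d)) for i in range(n_classes)]

# One OC-SVM per class: nu is the fraction of training data allowed as outliers (v),
# gamma is the RBF parameter of Eq. (5.3).
models = [OneClassSVM(kernel="rbf", nu=0.05, gamma=1e-3).fit(X) for X in train_sets]

# A test pattern x is accepted by class i when f_OC(x) >= 0 (decision_function >= 0).
x = rng.normal(loc=3, scale=1.0, size=(1, d))
scores = np.array([m.decision_function(x)[0] for m in models])
print("accepted by classes:", np.where(scores >= 0)[0])
```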
[Figure 5.3 depicts the projection of the training data from the original space into the feature space through the kernel $K(\cdot,\cdot)$: the support vectors delimit the boundary $f_{OC}(x) = 0$, target samples satisfy $f_{OC}(x) > 0$, and outliers lie on the side of the origin where $f_{OC}(x) < 0$.]
Figure 5.3: Pattern classification based on OC-SVM.
In the following, we show how OC-SVM based concept learning can be extended to construct a multi-class OC-SVM with multiple hyper spheres.
5.4.1.2 Extension of OC-SVM for constructing multi-class OC-SVM
Basically, the OC-SVM classifier has been conceived to deal with a one-class classification problem [Schölkopf, 2001]. Its extension to the multi-class scenario may provide uncalibrated outputs for some classifiers, and few approaches have been proposed in the literature to address this problem. Yang et al. [Yang, 2007b] propose to estimate the radius and center of each hyper sphere involved in the decision making step, while Rabaoui et al. [Rabaoui, 2007] use a logarithmic function for computing the decision measure. This function has the advantage of reassigning the outputs of the different OC-SVM classifiers: on one hand it reduces the large values, and on the other hand it increases the small values. In this chapter, we use a sigmoid transformation, proposed in [Wahba, 1993], for mapping the reassigned outputs of the different OC-SVM classifiers (obtained using the logarithmic function) to probabilities as follows:

$$P_k(\theta_i \mid x) = \frac{1}{1 + \exp\left(-g_i^k(x)\right)} \, Z_k, \qquad (5.4)$$

where $Z_k = \left[\sum_{i} \left(1 + \exp\left(-g_i^k(x)\right)\right)^{-1}\right]^{-1}$ are normalization factors introduced in the probabilistic framework in order to respect the normality condition (i.e. $\sum_{i} P_k(\theta_i \mid x) = 1$), and $P_k$ are the posterior probabilities issued from the source of information $S_k$, $k = 1, \ldots, p$. The term $g_i^k(x)$ represents the reassigned output of the $i$-th OC-SVM classifier, $i = 1, \ldots, n$, which is defined for a given pattern $x$ as:

$$g_i^k(x) = \log\left(\sum_{j=1}^{Sv_i^k} \alpha_j K(x, x_j)\right) - \log\left(\rho_i^k\right), \qquad (5.5)$$

where $Sv_i^k$ and $\rho_i^k$ are respectively the number of support vectors and the distance of the hyper sphere from the origin for each $OC\text{-}SVM_i^k$, which is trained with samples of the $i$-th class $\theta_i$, $i = 1, \ldots, n$, provided by the source of information $S_k$, $k = 1, \ldots, p$. In the multi-class classification framework, the OC-SVM classifier is thus extended, and the posterior probability $P_k(\theta_i \mid x)$ of any target class $\theta_i$, $i = 1, \ldots, n$, of the frame $\Theta = \{\theta_1, \theta_2, \ldots, \theta_n\}$ can be directly obtained from Equation (5.4). Finally, the maximum likelihood (ML) test is used for decision making as follows:
$$x \in \theta_i \ \text{ if } \ P_k(\theta_i \mid x) = \max_j P_k(\theta_j \mid x), \ 1 \leq j \leq n, \qquad (5.6)$$

where $x$ is the test pattern characterized by the source of information $S_k$, $k = 1, \ldots, p$.
5.4.2 Estimation of masses
In this chapter, the mass functions are estimated using three estimation techniques, according to the PT, DST and DSmT based combination frameworks, respectively.
5.4.2.1 Estimation technique using PT framework
The difficulty of estimating masses is generally avoided here because no weights are assigned to the compound classes [Lowrance, 1991]. Therefore, the mass functions are estimated through a direct estimation technique, which attributes the posterior probabilities only to the original classes $\theta_i$, $i = 1, \ldots, n$, of the global discernment space $\Theta$ for each source of information $S_k$, $k = 1, \ldots, p$, as given in Equation (5.4), i.e.:

$$m_k(\theta_i) = P_k(\theta_i \mid x). \qquad (5.7)$$
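To make the mapping of Equations (5.4)-(5.7) concrete, the short sketch below (with arbitrary placeholder outputs, not values from the thesis) squashes the reassigned OC-SVM outputs through the sigmoid, normalises them so the posteriors sum to one, and takes the ML decision of (5.6); in the PT framework the masses of (5.7) are exactly these posteriors.

```python
# Sketch of the probabilistic mapping of Eq. (5.4)-(5.7): reassigned OC-SVM outputs
# g_i^k(x) of one source are squashed by a sigmoid, normalised so that the posteriors
# sum to one (factor Z_k), and the class with maximal posterior is selected (Eq. 5.6).
# In the PT framework, m_k(theta_i) = P_k(theta_i | x) as in Eq. (5.7).
import numpy as np

def posteriors_from_outputs(g):
    """g: array of reassigned outputs g_i^k(x), one per class (Eq. 5.5)."""
    s = 1.0 / (1.0 + np.exp(-np.asarray(g)))   # sigmoid of each output
    return s / s.sum()                         # normalisation factor Z_k (Eq. 5.4)

g = [-1.2, 0.3, 2.1, -0.7, 0.0, -2.5, 1.4, -0.9, 0.2, -1.8]   # placeholder outputs
P = posteriors_from_outputs(g)
print("posteriors:", np.round(P, 3))
print("ML decision (Eq. 5.6): class", int(np.argmax(P)))
```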
5.4.2.2 Estimation technique using DST framework
In the DST framework, the mass functions (bba) of evidence $m_k(\cdot)$ issued from the $k$-th source of information $S_k$, $k = 1, \ldots, p$, are estimated using a direct transfer model, which distributes the initial posterior probabilities on the simple and compound classes over $G^{\Theta_i} = \{\emptyset, \theta_i, \bar{\theta}_i, \theta_i \cup \bar{\theta}_i\}$, $i = 1, \ldots, n$, as:

$$m_k(\emptyset) = 0, \qquad (5.8a)$$
$$m_k(\theta_i) = \frac{P_k(\theta_i \mid x)}{1 + \beta_i}, \qquad (5.8b)$$
$$m_k(\bar{\theta}_i) = \frac{1 - P_k(\theta_i \mid x)}{1 + \beta_i}, \qquad (5.8c)$$
$$m_k(\theta_i \cup \bar{\theta}_i) = \frac{\beta_i}{1 + \beta_i}, \qquad (5.8d)$$

where $\beta_i$ is the sum of the false acceptance rates (FAR) made by the $OC\text{-}SVM_i^k$, $k = 1, \ldots, p$, classifiers, which are trained with the $p$ sources of information, respectively. Here, $\beta_i / (1 + \beta_i)$ is used to quantify the belief that the pattern $x$ belongs to the subset $\Theta_i = \{\theta_i, \bar{\theta}_i\}$, $i = 1, \ldots, n$.
5.4.2.3 Estimation technique using DSmT framework
Similarly to the DST framework, the mass functions (gbba) of evidence $m_k(\cdot)$ issued from the $k$-th source of information $S_k$, $k = 1, \ldots, p$, are estimated using the dissonant model of Appriou, originally defined for two classes [Appriou, 1991]. The extended version of Appriou's model in the DSmT framework over $G^{\Theta_i} = \{\emptyset, \theta_i, \bar{\theta}_i, \theta_i \cap \bar{\theta}_i, \theta_i \cup \bar{\theta}_i\}$, $i = 1, \ldots, n$, is given as:

$$m_k(\emptyset) = 0, \qquad (5.9a)$$
$$m_k(\theta_i) = \frac{(1 - \beta_i)\, P_k(\theta_i \mid x)}{(1 + \varepsilon)\left(1 + P_k(\theta_i \mid x)\right)}, \qquad (5.9b)$$
$$m_k(\bar{\theta}_i) = \frac{1 - \beta_i}{(1 + \varepsilon)\left(1 + P_k(\theta_i \mid x)\right)}, \qquad (5.9c)$$
$$m_k(\theta_i \cap \bar{\theta}_i) = \frac{\beta_i}{1 + \varepsilon}, \qquad (5.9d)$$
$$m_k(\theta_i \cup \bar{\theta}_i) = \frac{\varepsilon}{1 + \varepsilon}, \qquad (5.9e)$$

where $\varepsilon \geq 0$ is a tuning parameter and $\beta_i$ is the sum of the false acceptance rates (FAR) made by the $OC\text{-}SVM_i^k$, $k = 1, \ldots, p$, classifiers, which are trained with the $p$ sources of information, respectively. Furthermore, $\beta_i / (1 + \varepsilon)$ is used to quantify the belief in the conflicting region, and $\varepsilon / (1 + \varepsilon)$ is used to quantify the belief that the pattern $x$ belongs to the subset $\Theta_i = \{\theta_i, \bar{\theta}_i\}$, $i = 1, \ldots, n$. The value of $\varepsilon$ is fixed here to 0.001.
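A minimal sketch of the two estimation models, assuming the forms of Equations (5.8) and (5.9) as given above, is shown below; the posterior P and the FAR sum beta are placeholder values, and both assignments sum to one by construction.

```python
# Sketch of the two estimation models above: the DST transfer model of Eq. (5.8)
# over {theta_i, not_theta_i, theta_i U not_theta_i}, and the Appriou-inspired
# DSmT model of Eq. (5.9), which adds the paradoxical element theta_i ^ not_theta_i.
# P is the posterior of Eq. (5.4), beta the summed FAR; values here are placeholders.

def bba_dst(P, beta):
    """Eq. (5.8): direct transfer model over G^Theta_i (DST)."""
    return {"theta": P / (1 + beta),
            "not_theta": (1 - P) / (1 + beta),
            "theta_or_not": beta / (1 + beta)}

def gbba_dsmt(P, beta, eps=0.001):
    """Eq. (5.9): Appriou-inspired model over G^Theta_i (DSmT); eps = tuning parameter."""
    return {"theta": (1 - beta) * P / ((1 + eps) * (1 + P)),
            "not_theta": (1 - beta) / ((1 + eps) * (1 + P)),
            "theta_and_not": beta / (1 + eps),      # conflicting (paradoxical) region
            "theta_or_not": eps / (1 + eps)}        # total ignorance

m = gbba_dsmt(P=0.8, beta=0.02)
print(m, "sum =", round(sum(m.values()), 6))        # masses sum to 1
```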
5.4.3 Combination of masses
In order to manage the conflict generated by the $p$ sources of information $S_k$, $k = 1, \ldots, p$, the global combined mass $m_c(\cdot)$ is obtained by computing the partial combined masses $m_c^i(\cdot)$ over the combination subspaces $G^{\Theta_i}$, $i = 1, \ldots, n$, and then gathering them in the same vector as follows:

$$m_c(\cdot) = \frac{1}{\lambda_n}\left[ m_c^1(\cdot), m_c^2(\cdot), \ldots, m_c^n(\cdot) \right] \quad \text{with} \quad m_c^i(A) = \left[ m_1 \oplus \cdots \oplus m_p \right](A), \ A \in G^{\Theta_i}, \ i = 1, \ldots, n, \qquad (5.10)$$

where $1/\lambda_n$ represents a normalization factor introduced only in the DST and DSmT frameworks in order to respect the normality condition of the masses over the set of all focal elements $F = \bigcup_{i=1}^{n} G^{\Theta_i}$ (i.e. $\sum_{A \in F} m_c(A) = 1$), and $\oplus$ represents the combination operator, composed of the conjunctive and redistribution terms of the basic sum rule, the DS rule, or the PCR6 rule, when dealing with the PT, DST, or DSmT framework, respectively.
In the PT framework, $F = \{\theta_1, \theta_2, \ldots, \theta_n\}$ and the global combined mass $m_{sum}$ obtained from the $p$ basic probability assignments $m_1(\cdot), \ldots, m_p(\cdot)$ by means of the sum rule [Xu, 1992] is defined as:

$$m_c(A) = m_{sum}(A) = \begin{cases} \dfrac{1}{p} \displaystyle\sum_{k=1}^{p} m_k(\theta_i) & \text{if } A = \theta_i, \\[1ex] 0 & \text{otherwise.} \end{cases} \qquad (5.11)$$
In the DST framework, $G^{\Theta_i} = \{\emptyset, \theta_i, \bar{\theta}_i, \theta_i \cup \bar{\theta}_i\}$, $i = 1, \ldots, n$, and the combined partial bba $m_{DS}^i$ obtained from the $p$ basic belief assignments $m_1(\cdot), \ldots, m_p(\cdot)$ by means of the DS rule [Shafer, 1976] is defined as:

$$m_c^i(A) = m_{DS}^i(A) = \begin{cases} 0 & \text{if } A = \emptyset, \\[1ex] \dfrac{1}{1 - K_c^i} \displaystyle\sum_{\substack{B_1, B_2, \ldots, B_p \in G^{\Theta_i} \\ B_1 \cap B_2 \cap \cdots \cap B_p = A}} \ \prod_{k=1}^{p} m_k(B_k) & \text{otherwise,} \end{cases} \qquad (5.12a)$$

$$K_c^i = \sum_{\substack{B_1, B_2, \ldots, B_p \in G^{\Theta_i} \\ B_1 \cap B_2 \cap \cdots \cap B_p = \emptyset}} \ \prod_{k=1}^{p} m_k(B_k), \qquad (5.12b)$$

where $K_c^i \in [0, 1)$ defines the mass assigned to the empty set under the integrity constraint $\theta_i \cap \bar{\theta}_i = \emptyset$, which is often interpreted as the partial conflict measured between the different sources.
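The following minimal sketch illustrates the DS combination of Equation (5.12) for p sources on the per-class frame $\{\theta_i, \bar{\theta}_i, \theta_i \cup \bar{\theta}_i\}$ under the exclusivity constraint; the example masses are placeholders, not outputs of the thesis classifiers.

```python
# Sketch of Dempster's rule (Eq. 5.12) on the per-class frame used in the scheme:
# focal elements are frozensets over {"t", "n"} standing for theta_i, not_theta_i
# and their union; theta_i and not_theta_i are exclusive, so an empty intersection
# contributes to the conflict K_c^i.
from itertools import product

T, N = frozenset("t"), frozenset("n")
THETA = T | N                                      # theta_i U not_theta_i

def dempster(masses):
    """Combine a list of bba's (dicts focal-element -> mass) with the DS rule."""
    combined, conflict = {}, 0.0
    for combo in product(*[m.items() for m in masses]):
        inter, weight = THETA, 1.0
        for focal, mass in combo:
            inter, weight = inter & focal, weight * mass
        if inter:                                  # non-empty intersection
            combined[inter] = combined.get(inter, 0.0) + weight
        else:
            conflict += weight                     # mass sent to the empty set
    return {A: v / (1.0 - conflict) for A, v in combined.items()}, conflict

m1 = {T: 0.70, N: 0.20, THETA: 0.10}               # source S1 (placeholder values)
m2 = {T: 0.60, N: 0.30, THETA: 0.10}               # source S2 (placeholder values)
mc, K = dempster([m1, m2])
print("K_c^i =", round(K, 3), {tuple(sorted(A)): round(v, 3) for A, v in mc.items()})
```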
In the DSmT framework, $G^{\Theta_i} = \{\emptyset, \theta_i, \bar{\theta}_i, \theta_i \cap \bar{\theta}_i, \theta_i \cup \bar{\theta}_i\}$, $i = 1, \ldots, n$, and the combined partial gbba $m_{PCR6}^i$ obtained from the $p$ generalized belief assignments $m_1(\cdot), \ldots, m_p(\cdot)$ by means of the PCR6 rule [Martin, 2006] is defined as:

$$m_c^i(A) = m_{PCR6}^i(A) = \begin{cases} 0 & \text{if } A \in \boldsymbol{\emptyset}_{\mathcal{M}}, \\[1ex] m_{\cap}^i(A) + \displaystyle\sum_{k=1}^{p} m_k(A)^2 \, L_k & \text{otherwise,} \end{cases} \qquad (5.13a)$$

where

$$L_k = \sum_{\substack{Y_{\sigma_k(1)}, \ldots, Y_{\sigma_k(p-1)} \in G^{\Theta_i} \\ \left(\bigcap_{l=1}^{p-1} Y_{\sigma_k(l)}\right) \cap A \, \in \, \boldsymbol{\emptyset}_{\mathcal{M}}}} \ \frac{\displaystyle\prod_{j=1}^{p-1} m_{\sigma_k(j)}\!\left(Y_{\sigma_k(j)}\right)}{m_k(A) + \displaystyle\sum_{j=1}^{p-1} m_{\sigma_k(j)}\!\left(Y_{\sigma_k(j)}\right)}, \qquad (5.13b)$$

$\boldsymbol{\emptyset}_{\mathcal{M}} = \{\emptyset_{\mathcal{M}}, \emptyset\}$ is the set of all relatively and absolutely empty elements, $\emptyset_{\mathcal{M}}$ is the element of $G^{\Theta_i}$ which has been forced to be empty in the hybrid model $\mathcal{M}$ defined by the exclusivity constraint $\theta_i \cap \bar{\theta}_i = \emptyset$, $\emptyset$ is the empty set, the denominator $m_k(A) + \sum_{j=1}^{p-1} m_{\sigma_k(j)}(Y_{\sigma_k(j)})$ is different from zero, and $\sigma_k(j)$ counts from 1 to $p$ avoiding $k$, i.e.:

$$\sigma_k(j) = \begin{cases} j & \text{if } j < k, \\ j + 1 & \text{if } j \geq k. \end{cases} \qquad (5.13c)$$

Here, $m_{\cap}^i(A)$ corresponds to the classical DSm rule on the free DSm model [Dezert, 2002b], which is defined as:

$$m_{\cap}^i(A) = \begin{cases} 0 & \text{if } A = \emptyset, \\[1ex] \displaystyle\sum_{\substack{B_1, B_2, \ldots, B_p \in G^{\Theta_i} \setminus \boldsymbol{\emptyset}_{\mathcal{M}} \\ B_1 \cap B_2 \cap \cdots \cap B_p = A}} \ \prod_{k=1}^{p} m_k(B_k) & \text{otherwise.} \end{cases} \qquad (5.13d)$$
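The sketch below illustrates only the redistribution principle of Equation (5.13) in the special case of p = 2 sources, where PCR6 coincides with PCR5, on the per-class frame with the exclusivity constraint; it does not reproduce the full hybrid-DSm machinery of the thesis, and the input masses are placeholders.

```python
# Simplified sketch of the PCR6 redistribution of Eq. (5.13) for p = 2 sources
# (where PCR6 coincides with PCR5), on the per-class frame
# {theta_i, not_theta_i, theta_i U not_theta_i} with theta_i ^ not_theta_i = empty.
from itertools import product

T, N = frozenset("t"), frozenset("n")
THETA = T | N

def pcr5(m1, m2):
    combined = {}
    # conjunctive (classical DSm) part on non-empty intersections
    for (A, a), (B, b) in product(m1.items(), m2.items()):
        inter = A & B
        if inter:
            combined[inter] = combined.get(inter, 0.0) + a * b
    # proportional redistribution of each partial conflict back to A and B
    for (A, a), (B, b) in product(m1.items(), m2.items()):
        if not (A & B) and a + b > 0:
            combined[A] = combined.get(A, 0.0) + (a ** 2) * b / (a + b)
            combined[B] = combined.get(B, 0.0) + (b ** 2) * a / (a + b)
    return combined

m1 = {T: 0.70, N: 0.20, THETA: 0.10}               # placeholder source S1
m2 = {T: 0.60, N: 0.30, THETA: 0.10}               # placeholder source S2
mc = pcr5(m1, m2)
print({tuple(sorted(A)): round(v, 3) for A, v in mc.items()},
      "sum =", round(sum(mc.values()), 3))
```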
5.4.4 Decision rule
Combining evidence using the proposed combination scheme yields the combined belief, and decision making is then performed using a statistical classification technique. First, the combined beliefs are converted into a probability measure using a probabilistic transformation, called DSmP, that maps a belief measure to a subjective probability measure [Dezert, 2009], defined as:

$$DSmP_{\varepsilon}(A_i) = \begin{cases} 0 & \text{if } A_i = \emptyset, \\[1ex] \displaystyle\sum_{A_j \in F} \frac{W_j}{T_j} \, m_c(A_j) & \text{otherwise,} \end{cases} \qquad (5.14a)$$

where

$$W_j = \sum_{\substack{A_k \subseteq A_i \cap A_j \\ \mathcal{C}_{\mathcal{M}}(A_k) = 1}} m_c(A_k) + \varepsilon \, \mathcal{C}_{\mathcal{M}}(A_i \cap A_j), \qquad (5.14b)$$

$$T_j = \sum_{\substack{A_k \subseteq A_j \\ \mathcal{C}_{\mathcal{M}}(A_k) = 1}} m_c(A_k) + \varepsilon \, \mathcal{C}_{\mathcal{M}}(A_j). \qquad (5.14c)$$

$\varepsilon \geq 0$ is a tuning parameter and $F$ corresponds to the set of all focal elements, eventually including all the integrity constraints of the model $\mathcal{M}$ (if any, i.e. $\theta_i \cap \bar{\theta}_i = \emptyset$, $i = 1, \ldots, n$, for Shafer's model, while $\theta_i \cap \bar{\theta}_i \neq \emptyset$ when all the paradoxical hypotheses are kept); $\mathcal{C}_{\mathcal{M}}(A_k)$ denotes the DSm cardinal of the set $A_k$ [Dezert, 2004b].
In the context of some particular multi-class classification problems, the simple classes $\theta_i$ are truly exclusive and Shafer's model is adopted. Therefore, the probability $DSmP_{\varepsilon}(\theta_i)$ of any element $\theta_i$, $i = 1, \ldots, n$, of the frame $\Theta = \{\theta_1, \theta_2, \ldots, \theta_n\}$ can be directly obtained according to the following equation:

$$DSmP_{\varepsilon}(\theta_i) = m_c(\theta_i) + \left( m_c(\theta_i) + \varepsilon \right) \sum_{\substack{A_j \in F \\ A_j \supseteq \theta_i \\ \mathcal{C}_{\mathcal{M}}(A_j) \geq 2}} \frac{m_c(A_j)}{\displaystyle\sum_{\substack{A_k \subseteq A_j \\ \mathcal{C}_{\mathcal{M}}(A_k) = 1}} m_c(A_k) + \varepsilon \, \mathcal{C}_{\mathcal{M}}(A_j)}. \qquad (5.15)$$
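A minimal sketch of the singleton transformation (5.15) for the per-class frame used here, assuming Shafer's model where the only compound focal element is $\theta_i \cup \bar{\theta}_i$ (DSm cardinal 2), is given below; the input masses are placeholders.

```python
# Sketch of the singleton DSmP transformation of Eq. (5.15) under Shafer's model,
# on the per-class frame where the only non-singleton focal element is
# theta_i U not_theta_i (DSm cardinal 2). Input mass values are placeholders.
def dsmp_singletons(mc, eps=0.001):
    """mc: dict with keys 't', 'n', 'tn' (theta, not_theta, theta U not_theta)."""
    singles = {"t": mc.get("t", 0.0), "n": mc.get("n", 0.0)}
    ignorance = mc.get("tn", 0.0)
    denom = sum(singles.values()) + eps * 2     # singleton masses + eps * C_M(A_j)
    return {s: v + (v + eps) * ignorance / denom for s, v in singles.items()}

mc = {"t": 0.787, "n": 0.203, "tn": 0.010}      # e.g. a combined PCR output
p = dsmp_singletons(mc)
print({k: round(v, 4) for k, v in p.items()}, "sum =", round(sum(p.values()), 4))
```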
In this manner, the combined belief assignment is transformed into a probability measure so that a statistical classification approach can be applied for computing the final decision. Finally, the DSmP-based maximum likelihood (ML) test is used for decision making as follows:

$$x \in \theta_i \ \text{ if } \ DSmP_{\varepsilon}(\theta_i) = \max_j DSmP_{\varepsilon}(\theta_j), \ 1 \leq j \leq n, \qquad (5.16)$$

where $x$ is the test pattern characterized by the $p$ sources of information $S_k$, $k = 1, \ldots, p$, and $\varepsilon$ is fixed to 0.001 in the decision measure given by (5.15).
5.5 Database and algorithms used for validation
The proposed OC-SVM classifiers are trained using different methods of generating features on a database of isolated handwritten digits. In this section, we briefly describe the database, the methods used for generating features, and the algorithm used for the validation of the OC-SVM models.
5.5.1 Database description and performance criteria
To validate the proposed combination scheme, the well-known US Postal Service (USPS) database is used for the handwriting recognition task. This database contains normalized grey-level handwritten digit images of 10 numeral classes, extracted from US postal envelopes. All images are segmented and normalized to a size of 16 x 16 pixels. There are 7291 training data and 2007 test data, some of which are corrupted and difficult to classify correctly. For evaluating the performance of the combination scheme, two popular rates are considered: the Recognition Rate (RR) for each class and the Mean Recognition Rate (MRR) over all classes. Both rates are expressed in %.
5.5.2 Methods used for generating features
The objective of the feature generation step is to bring out the relevant information that exists in the raw data. Thus, an appropriate choice of descriptor significantly improves the accuracy of the combination scheme. In this study, we use a collection of popular feature generation methods, which can be categorized into background features [Britto, 2004], [Cavalin, 2006], foreground features [Britto, 2004], [Cavalin, 2006], geometric features [Cheriet, 2007], and uniform grid features [Favata, 1996], [Abbas, 2011], [Abbas, 2012a].
5.5.3 Algorithm used for validation of OC-SVM models
An OC-SVM model is produced for each class according to the used descriptor. Hence, the training dataset is partitioned into ten subsets of samples, as shown in Table 4.1: each one is used as a learning subset to train the corresponding OC-SVM classifier, which operates independently of the other classifiers. Figure 5.4 depicts the flow chart for training and validation of an OC-SVM model, where $N$ is the number of values of the RBF parameter $\gamma_j$, sorted in increasing order, such that $j = 0, 1, \ldots, N-1$.
[Figure 5.4 shows the flow chart: after initialization ($j = 0$), the OC-SVM parameters $(\nu, \gamma_j)$ are chosen, the model is trained on the training dataset and validated; if the error rate satisfies $ER_j \leq 100\nu$, the $j$-th OC-SVM model is generated; $j$ is incremented and the loop stops when $j > N-1$.]
Figure 5.4: Training and validation of the OC-SVM models.
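The following sketch mirrors the selection loop of Figure 5.4 for one target class, using scikit-learn's OneClassSVM as a stand-in; the data, the gamma grid and the 10% outlier budget are placeholders rather than the thesis configuration.

```python
# Sketch of the validation loop of Figure 5.4: RBF parameters gamma_j are tried in
# increasing order, models whose training error rate ER_j exceeds 100*nu percent are
# discarded, and among the remaining models the one with the largest number of
# support vectors is kept. Data and the gamma grid are placeholders.
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 64))                    # training samples of one target class
nu = 0.10                                         # up to 10 % of training data as outliers

best_model, best_sv = None, -1
for gamma in np.logspace(-4, 0, 9):               # gamma_j sorted in increasing order
    model = OneClassSVM(kernel="rbf", nu=nu, gamma=gamma).fit(X)
    err_rate = 100.0 * np.mean(model.predict(X) == -1)   # ER_j in percent
    if err_rate <= 100.0 * nu and len(model.support_) > best_sv:
        best_model, best_sv = model, len(model.support_)

print("selected gamma:", None if best_model is None else best_model.gamma,
      "| support vectors:", best_sv)
```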
The hyper-parameters of each OC-SVM model are tuned during the validation phase using the corresponding training subset of samples. In this work, we allow up to 10% of error on the training dataset (i.e. the percentage of training data considered as outliers is $\nu = 0.1$) for each target class. Hence, the $j$-th OC-SVM model is generated under the constraint $ER_j \leq 100\nu$, where $ER_j$ is the corresponding error rate computed during the validation phase. The selection of the optimal value $\gamma_{opt}$ of the RBF parameter during the validation phase is then performed, for each target class $\theta_i$, $i = 1, \ldots, n$, over the set of all models fulfilling this condition, using the maximum number of support vectors $Sv$ as criterion. Indeed, the higher the number of support vectors, the more representative the retained information is for each class. Table 5.2 shows an example of the optimal parameters obtained during the validation phase by training the ten OC-SVM classifiers with uniform grid (UG) features. All these parameters, which characterize the global recognition system, are then used in the test phase. As shown in this table, when the UG features are used as descriptor, the parameters n and m, which define respectively the number of lines (vertical regions) and columns (horizontal regions) of the grid, have also been optimized for each OC-SVM model during the validation phase.
OC-SVM        0      1      2      3      4      5      6      7      8      9
n             3      4      3      3      2      2      3      3      3      4
m             3      4      3      3      4      4      4      4      3      3
ν             0.05   0.09   0.01   0.05   0.07   0.05   0.05   0.05   0.05   0.07
γopt          14.6   14.3   9.4    11.9   11.6   12.4   8.5    9.4    13.7   12.3
Sv            245    130    161    136    137    124    145    136    131    141
Table 5.2: Optimal parameters of the OC-SVM classifiers using UG features.
5.6 Experimental results
The effectiveness of the proposed combination scheme is demonstrated experimentally by evaluating the recognition performance on all isolated handwritten digits from the test dataset (see Table 4.1). We first perform experiments to select a subset of globally complementary sources of information (i.e. descriptors) using the proposed extension of OC-SVM to the multi-class (MC) classification framework, namely MC-OC-SVM, and then the proposed combination scheme is evaluated in the belief function theory framework.
5.6.1 Performance evaluation of the proposed descriptors
In this experiment, we compute during the test phase the recognition rates RR and MRR of the MC-OC-SVM classifier using Uniform Grid Features (UGF), Geometric Features (GF), Foreground Features (FF), Background Features (BF), and the descriptors resulting from the concatenation of at least two simple descriptors, namely the (BF,FF), (BF,GF), (FF,GF) and (BF,FF,GF)-based descriptors. The experiment shows that an appropriate choice of both descriptors and concatenation for representing each digit class in the feature generation step provides an interesting recognition performance. In Table 5.3, the recognition rate for each class (i.e. RR) varies from one descriptor to another, and the mean recognition rates (i.e. MRR) of the concatenated descriptors are relatively high compared to those of the simple descriptors. When the (BF,GF)-based descriptor, obtained by concatenating the BF and GF-based descriptors in experiment (f), is used to feed the MC-OC-SVM classifier, we observe an improvement of the recognition performance from 89.50% to 91.90%. On the other hand, a reduction of 0.19% in the recognition performance, with MRR = 91.71%, is obtained in experiment (h) using a new (BF,FF,GF)-based descriptor, constructed by concatenating the (BF,GF) and FF-based descriptors in the same vector.

Class     (a) UGF  (b) GF   (c) FF   (d) BF   (e) (BF,FF)  (f) (BF,GF)  (g) (FF,GF)  (h) (BF,FF,GF)
0         67.40    86.90    76.04    81.05    86.35        89.97        80.22        86.07
1         79.08    96.19    94.26    85.55    95.81        95.43        94.67        96.57
2         69.07    67.52    87.11    92.78    94.84        93.81        92.78        96.39
3         60.97    60.36    71.34    81.09    83.53        79.26        75.00        85.97
4         78.28    69.69    86.86    94.44    91.91        92.92        87.87        91.91
5         45.00    69.37    70.00    91.87    86.87        95.00        84.37        89.37
6         73.37    84.02    89.94    92.30    94.67        93.49        91.12        94.67
7         43.83    83.56    91.09    91.09    91.09        92.46        91.78        91.78
8         46.34    87.19    87.80    93.29    92.68        95.12        94.51        95.73
9         48.58    84.18    83.05    91.52    89.26        91.52        83.61        88.70
MRR (%)   61.19    78.90    83.75    89.50    90.70        91.90        87.59        91.71
Table 5.3: Recognition rates of the MC-OC-SVM classifier using different methods of generating features.
As we can see, it is difficult to improve the recognition performance by a simple concatenation of features, since most of the time the combined descriptor does not take into account the complementary nature of the features that may exist between the descriptors. Hence, we choose among all the available descriptors (see Table 5.3) only those for which the corresponding MC-OC-SVM classifiers achieve an improvement of the recognition performance. Indeed, the BF, FF and GF-based descriptors yield respectively in experiments (d), (c) and (b) an MRR of
89.50%, 83.75% and 78.90%. When using the (FF,GF)-based descriptor in experiment (g), we obtain a significant improvement of the recognition performance of the MC-OC-SVM classifier, from 83.75% to 87.59%. Further, an important gain of 4.31% in the recognition performance, with MRR = 91.71%, is obtained in experiment (h) when the BF-based descriptor is concatenated to the (FF,GF)-based descriptor to obtain the new (BF,FF,GF)-based descriptor. Hence, in the following section we use the three descriptors BF, (FF,GF) and (BF,FF,GF) as global sources of information to feed the OC-SVM classifier of each target class. This allows us to evaluate the recognition performance of the proposed combination scheme and to better exploit the complementary nature of these descriptors. In this way, it is possible to improve the recognition performance even when the concatenation of descriptors fails to provide the correct solution for some specific handwritten digit recognition problems.
5.6.2 Performance evaluation of the proposed combination scheme
In this experiment, we evaluate the recognition performance of the proposed combination scheme, using the Sum, DS and PCR6 rules, for handwritten digit recognition. This combination scheme exploits the complementary nature of the three sources of information $S_1 = \mathrm{BF}$, $S_2 = (\mathrm{FF},\mathrm{GF})$ and $S_3 = (\mathrm{BF},\mathrm{FF},\mathrm{GF})$, and manages the conflict arising from the outputs of the OC-SVM classifiers of each target class. The proposed combination scheme, which uses ten combinations (i.e. one combination per target class), consists in measuring ten values of conflict $K_c^i$, $i = 0, 1, \ldots, 9$, for each isolated handwritten digit of the test dataset. These ten measures of conflict between the OC-SVM classifiers trained with the two sources of information $S_1$ and $S_2$, for the handwritten digits of each class, are shown in Figures 5.5 to 5.14. For each class of digits $\theta_i$, $i = 0, \ldots, 9$, the minimal conflict corresponds to the conflict measured between the OC-SVM classifiers trained with samples of the target class $\theta_i$, unlike the nine other measures of conflict, which take high values. For instance, the minimal conflict for handwritten digits of the class $\theta_0$ corresponds to the conflict measured by the first combination. In the context of isolated handwritten digit recognition, the conflicting regions of each test digit are modeled by the ten paradoxical classes $\theta_i \cap \bar{\theta}_i$, $i = 0, 1, \ldots, 9$. Consequently, we introduce an integrity constraint $\theta_i \cap \bar{\theta}_i = \emptyset$, $i = 0, 1, \ldots, 9$, for each corresponding combination. In this manner, the ten
values of conflict are managed through the proposed combination scheme in order to select the optimal response. In effect, this step penalizes the classes for which the measured conflict between the OC-SVM classifiers is high and brings out the class for which the conflict measure is small. In Table 5.4, each line presents the FAR error computed for each class of handwritten digits using the three sources of information $S_1$, $S_2$ and $S_3$. The proposed OC-SVM models are optimized and trained on the training dataset to compute, during the validation phase, the error parameter $\beta_i$, $i = 0, \ldots, 9$, shown in the last column of this table as the sum of the FAR errors, which is subsequently used for each combination. As we can see, the 8-th OC-SVM classifier yields the highest FAR error for each source of information, with $\beta_8 = 6.69\%$.
               FAR (%)
Class     Source 1   Source 2   Source 3   βi (%)
0         0.07       0.00       0.00       0.07
1         0.10       0.06       0.06       0.22
2         0.67       1.14       0.18       1.99
3         0.24       0.24       0.11       0.59
4         0.02       0.00       0.00       0.02
5         0.60       0.64       0.21       1.45
6         0.02       0.33       0.14       0.49
7         0.03       0.00       0.00       0.03
8         2.48       2.88       1.33       6.69
9         0.08       0.14       0.06       0.28
Table 5.4: FAR errors provided by the three sources of information for each class.
Figure 5.5: Measures of conflict between the OC-SVM classifiers for the handwritten digits belonging to $\theta_0$.
Figure 5.6: Measures of conflict between the OC-SVM classifiers for the handwritten digits belonging to $\theta_1$.
Figure 5.7: Measures of conflict between the OC-SVM classifiers for the handwritten digits belonging to $\theta_2$.
Figure 5.8: Measures of conflict between the OC-SVM classifiers for the handwritten digits belonging to $\theta_3$.
Figure 5.9: Measures of conflict between the OC-SVM classifiers for the handwritten digits belonging to $\theta_4$.
Figure 5.10: Measures of conflict between the OC-SVM classifiers for the handwritten digits belonging to $\theta_5$.
Figure 5.11: Measures of conflict between the OC-SVM classifiers for the handwritten digits belonging to $\theta_6$.
Figure 5.12: Measures of conflict between the OC-SVM classifiers for the handwritten digits belonging to $\theta_7$.
Figure 5.13: Measures of conflict between the OC-SVM classifiers for the handwritten digits belonging to $\theta_8$.
Figure 5.14: Measures of conflict between the OC-SVM classifiers for the handwritten digits belonging to $\theta_9$.
In the following, we study the influence of the number of sources of information on the recognition performance using the three combination rules Sum, DS and PCR6. In Tables 5.5 and
5.6, we give respectively the recognition results using the pairs of sources (S1, S2) and (S1, S3) within the combination scheme.

               Sources            Combination rules
Class      S1       S2        Sum      DS       PCR6
0          81.05    80.22     84.95    84.67    92.47
1          85.55    94.67     94.67    95.05    95.81
2          92.78    92.78     96.39    96.39    94.84
3          81.09    75.00     86.58    88.41    91.46
4          94.44    87.87     93.93    92.92    94.44
5          91.81    84.37     93.12    93.12    94.37
6          92.30    91.12     93.49    93.49    95.85
7          91.09    91.78     93.83    93.15    93.83
8          93.29    94.51     98.78    98.17    96.95
9          91.52    83.61     89.26    89.26    93.78
MRR (%)    89.50    87.59     92.50    92.46    94.38
Table 5.5: Comparison of the recognition performance of the Sum, DS and PCR6 combination rules using the pair of sources (S1, S2).

               Sources            Combination rules
Class      S1       S3        Sum      DS       PCR6
0          81.05    86.07     87.18    87.18    93.03
1          85.55    96.57     94.67    95.05    95.43
2          92.78    96.39     94.32    94.84    94.84
3          81.09    85.97     90.85    90.24    91.46
4          94.44    91.91     94.94    94.94    95.95
5          91.81    89.37     91.87    91.87    95.00
6          92.30    94.67     95.26    95.26    95.26
7          91.09    91.78     91.78    91.78    91.78
8          93.29    95.73     98.17    98.17    96.34
9          91.52    88.70     89.26    89.83    94.35
MRR (%)    89.50    91.71     92.83    92.91    94.33
Table 5.6: Comparison of the recognition performance of the Sum, DS and PCR6 combination rules using the pair of sources (S1, S3).
As shown in Tables 5.5 and 5.6, the RR for each class varies from one combination rule to another. The MRR obtained with the Sum and DS combination rules is approximately the same, with differences of only 0.04% and 0.08% when the pairs of sources (S1, S2) and (S1, S3) are respectively combined, whereas it is improved to 94.38% and 94.33% when the pairs of sources (S1, S2) and (S1, S3) are respectively combined through the PCR6 combination rule. Further, a significant gain in recognition performance when using the PCR6 rule compared to the Sum and DS rules is obtained for all classes of handwritten digits except those of $\theta_2$ and $\theta_8$.
For better comparison, the recognition results corresponding to the combination of the three sources S1, S2 and S3 by the Sum, DS and PCR6 rules are given in Table 5.7.

                 Sources                    Combination rules
Class      S1       S2       S3         Sum      DS       PCR6
0          81.05    80.22    86.07      86.35    86.35    94.15
1          85.55    94.67    96.57      95.06    95.05    96.20
2          92.78    92.78    96.39      95.36    95.87    94.84
3          81.09    75.00    85.97      89.02    89.63    92.07
4          94.44    87.87    91.91      93.94    93.43    95.95
5          91.81    84.37    89.37      91.87    91.87    95.00
6          92.30    91.12    94.67      94.08    93.49    97.63
7          91.09    91.78    91.78      93.15    93.84    93.83
8          93.29    94.51    95.73      98.17    98.17    95.12
9          91.52    83.61    88.70      89.83    89.83    95.48
MRR (%)    89.50    87.59    91.71      92.68    92.76    95.03
Table 5.7: Comparison of the recognition performance of the Sum, DS and PCR6 combination rules using the three sources (S1, S2, S3).
The proposed combination scheme using the DS rule yields an MRR of 92.76%, corresponding to an improvement of 1.05%, while the Sum rule decreases the MRR to 92.68%. This is due to the direct estimation technique of masses, which assigns the confidences only to the simple classes in the PT framework. Hence, the Sum rule cannot correctly manage the conflict generated by the three sources for a given target class. Furthermore, the experimental results show that the DS rule is not able to handle most of the conflicting cases between two or three sources. Hence, the DST is not appropriate for solving our handwritten digit recognition problem in the multi-class classification framework. Indeed, the use of the DS rule in the combination scheme redistributes the beliefs through a simple normalization by $1 - K_c^i$, $i = 0, 1, \ldots, 9$, in the combination process of masses. However, when the responses of the OC-SVM classifiers are weakly complementary (see the MRR values obtained by the Sum and DS rules in Tables 5.5 and 5.6, respectively), they do not provide a reliable decision. In Table 5.7, an improvement of the RR when using the PCR6 rule compared to the Sum and DS rules is obtained for all classes of handwritten digits except those of $\theta_1$, $\theta_2$ and $\theta_8$. This is because some digits belonging to these classes are wrongly characterized by the combined sources S1, S2 and S3. In other words, the PCR6 based combination rule is not reliable when the complementary information provided by the sources of information is not well preserved. In order to get a higher MRR, the combined sources
of information should provide complementary information. The proposed combination scheme with the PCR6 rule yields the best MRR of 95.03% when combining the three sources S1, S2 and S3 all together. Indeed, the PCR6 rule allows an efficient redistribution of the partial conflicting mass only to the elements involved in the partial conflict (i.e. the target class $\theta_i$ and its complementary class $\bar{\theta}_i$, $i = 0, 1, \ldots, 9$). After redistribution, the combined mass is transformed into the DSm probability and the DSmP-based ML test is used for decision making. Finally, the proposed combination scheme using the PCR6 rule in the DSmT framework is the most stable across all experiments, whereas the recognition rates obtained with the DS combination rule vary significantly.
5.7 Conclusion
In this chapter, an effective combination scheme of one-class classifiers in a general belief function framework has been proposed. The OC-SVM classifiers are incorporated as an intelligent learning technique for reducing the number of focal elements. This scheme consists in using a subset of globally complementary sources of information to feed the OC-SVM classifiers corresponding to each target class, which allows decomposing an n-class problem into a series of n combinations while providing n calibrated outputs in the multi-class framework. Therefore, this allows us to drastically reduce the computational complexity of the combination process and, especially, to extend the applicability of DSmT to the multi-class classification framework. Experimental results show that the proposed combination scheme with the PCR6 rule yields the best performance on the handwritten digit recognition application compared to the Sum and DS rules, even when the individual MC-OC-SVM multi-classifications provide uncalibrated outputs. As a continuation of the present work, the next objective consists in adapting the evidence supporting measure of similarity (ESMS) criterion to select complementary sources of information for each target class within the same combination scheme, in order to further improve the RR and MRR.
Conclusion The Support Vector Machine (SVM) classifiers are considered to be the most efficient in different areas of pattern recognition, particularly the handwritten recognition, for their performances judged significantly higher than those of other traditional classifiers. The SVM is based on the principle of structural risk minimization (SRM), which addresses two central problems of the statistical learning theory: controlling the efficiency of the classifier and the phenomenon of overfitting. However, the main limitation of using SVMs is related to the choice of suitable descriptor. Indeed, for the same application, the SVM may respond differently depending on the used descriptor. Hence, various methods have been used for combining multiple sources of information in order to improve the accuracy of the handwritten recognition. The objective of the proposed works has focused on the development and implementation of various schemes to combine SVM classifiers using the DempsterShafer theory (DST) of evidence and Dezert-Smarandache theory (DSmT) of plausible and paradoxical reasoning. When using the DSmT in conjunction with the SVMs (DSmT-SVM), two main problems are occurred. The first problem is related to the choice of the estimation model. The second problem concerns the difficulty of extending the DSmT-SVM for multiclass classification. Indeed, an important number of focal elements is produced leading to the impossible use of the DSmT. Both DST and DSmT allow dealing with conflicts between the responses of classifiers attempting to select the best responses. Two main applications are considered for evaluating the effective use of both theories which are the handwritten signature verification and the handwritten digit recognition. In this context, different methods of generating complementary features have been evaluated to train the individual SVM classifiers operating independently of each other. Moreover, in order to deal with the case of conflict between the responses generated from SVM classifiers, and reducing the computational complexity of the DSmT in the multi-class classification framework, we have proposed four solutions for improving the handwritten recognition: The first solution is based on the implementation of a combination scheme using bi-class SVM classifiers through the DSmT. This scheme is applied to the writer dependent handwritten signature verification. Hence, two cases have been addressed in order to ensure a greater security: (1) combining two individual off-line HSV systems by associating Radon and Ridgelet features of the same off-line signature (2) and combining both individual off-line and on-line HSV systems by associating static image and dynamic information of the same 125
signature characterized by off-line and on-line modalities. Experimental results show in both case studies that the proposed combination scheme using the sophisticated PCR5 rule allows improving the verification errors compared to the individual HSV systems. As remark, although the DSmT allows improving the verification accuracy in both studied cases, it is clearly that the achieved improvement depends also to the complementary outputs provided by the individual HSV systems. Indeed, according to the second case study, a suitable performance quality on the individual on-line HSV system can be obtained when the dynamic features of on-line signatures are carefully chosen. Combined to the grid features using DSmT allows providing more powerful system comparatively to the system of the first case study in term of success ratio. The second solution is based on the implementation of an effective combination scheme using two one-class classifiers, which are associated respectively to DCT and CT features, through a general belief function framework. This scheme is applied to the writerindependent off-line handwritten signature verification. The main contribution behind the proposed verification scheme employed for designing the individual off-line HSV systems is based on the use of dissimilarity representation concept using only genuine signatures. A new decision criterion has been implemented in DST and DSmT frameworks for a decision making whether the signature is accepted or rejected. Experimental results show that the proposed combination scheme with PCR6 rule yields the best verification accuracy compared to the statistical match score combination algorithms and DS theory-based combination algorithm even when the individual writer-independent off-line HSV systems provide conflicting outputs. The third solution is based on the implementation of a supervised combination model using the DSmT for multi-class classification. This model allows us to effectively use the DSmT in conjunction with the multi-class SVM classifier implementation based on bi-class SVM for handwritten digit recognition. Exclusive constraints are introduced through a direct estimation technique to compute the belief assignments and reduce the number of focal elements. Therefore, the proposed framework allows reducing the computational complexity of the combination process for the multi-class classification. Experimental results show that the proposed supervised model with PCR6 rule yields the best performance comparatively to SVM multi-classifications even when they provide uncalibrated outputs. The fourth solution consists to incorporate one-class SVM classifiers into the DSmT based combination framework in order to reduce the huge number of focal elements occurring when 126
using the bi-class SVM classifiers. Hence, we have proposed to use a learning technique based on one-class SVM classifier for each target class. Therefore, a combination of these classifiers is performed by the DSmT for each class independently to other classes and then allows extending the applicability of DSmT in the context of the multi-class classification. Experimental results show that the proposed combination scheme with PCR6 rule yields the best performance on the handwritten digit recognition application compared to the sum rule and DS rule even when the extension of OC-SVM into multi-class classification framework provide uncalibrated outputs. The proposed combination schemes are experimented on the well known standard databases of off-line and on-line signature images (NISDCC), off-line signature images (CEDAR) and isolated digits images (USPS). The obtained results show that the proposed combination schemes using DSm theory-based combination algorithm yields the best performance compared to the statistical learning algorithms and DS theory-based combination algorithm even when SVM based individual classifications provide conflicting results. The future prospects of this research consist to evaluate the proposed combination schemes on various applications of handwritten recognition, such as: text-dependent writer identification,
text-independent writer identification, gender verification, gender identification, handwritten word recognition, word spotting for historical documents, etc.
Bibliography
[Abbas, 2009]
N. Abbas, Développement de modèles de fusion et de classification contextuelle d‘images satellitaires par la théorie de l‘évidence et la théorie du raisonnement plausible et paradoxal, Mémoire de magister en électronique, spécialité Traitement du signal et d'images, Université des Sciences et de la Technologie Houari Boumediène, Bab Ezzouar, Alger, Algérie, 69 pages, 2009. http://www.gallup.unm.edu/~smarandache/DSmT-ThesisAbbas.pdf
[Abbas, 2011]
N. Abbas and Y. Chibani, ―Combination of Off-Line and On-Line Signature Verification Systems Based on SVM and DST,‖ in Proc. 11th International Conference on Intelligent Systems Design and Applications (ISDA'11), Córdoba, Spain, pp. 855 – 860, November 22 - 24, 2011.
[Abbas, 2012a]
N. Abbas and Y. Chibani, ―An Off-Line Signature Verification System Based on Uniform Grid Features and SVM,‖ International Congress on Telecommunication and Application (ICTA'12), Béjaia, Algeria, April 11 - 12, 2012.
[Abbas, 2012b]
N. Abbas and Y. Chibani, ―SVM-DSmT combination for the simultaneous verification of off-line and on-line handwritten signatures,‖ International Journal of Computational Intelligence and Applications (IJCIA), vol. 11, no. 3, 2012.
[Abbas, 2012c]
N. Abbas and Y. Chibani, ―SVM-DSmT Combination for Off-Line Signature Verification,‖ IEEE International Conference on Computer, Information and Telecommunication Systems (CITS), Amman, Jordan, pp. 1-5, May 14-16, 2012.
[Abbas, 2012d]
N. Abbas, Y. Chibani, and H. Nemmour, ―Handwritten Digit Recognition Based On a DSmT-SVM Parallel Combination,‖ in Proc. 13th International Conference on Frontiers in Handwriting Recognition (ICFHR), Bari, Italy, pp. 241-246, September 18-20, 2012.
[Abbas, 2013a]
N. Abbas, M. Bengherabi, and E. Boutellaa, ―Experimental Investigation of OC-SVM for Multibiometric Score Fusion,‖ in Proc. 8th International Workshop on Systems, Signal Processing and their Applications (WoSSPA), Algiers, Algeria, pp. 250-255, May 12-15, 2013.
[Abbas, 2013b]
N. Abbas, Y. Chibani, Z. Belhadi, and M. Hedir, ―A DSmT Based Combination Scheme for Multi-Class Classification,‖ in Proc. 16th International Conference on Information FUSION (ICIF), Istanbul, Turkey, pp. 1950-1957, July 9-12, 2013.
[Ahmed, 1974]
N. Ahmed, T. Natarajan, and K.R. Rao, ―On image processing and a discrete cosine transform,‖ IEEE Transactions on Computers C, vol. 23, no. 1, pp. 90–93, 1974.
[Anisimovich, 1997]
K. Anisimovich, V. Rybkin, A. Shamis, and V. Tereshchenko, ―Using combination of structural, feature and raster classifiers for recognition of handprinted characters,‖ in Proc. 4th International Conference Document Analysis and Recognition (ICDAR), UIm, Germany, vol. 2, pp. 881–885, August 18-20, 1997.
[Appriou, 1991]
Appriou, ―Probabilités et incertitude en fusion de données multisenseurs,‖ Revue Scientifique et Technique de la Défense, vol. 11, pp. 27-40, 1991.
[Appriou, 1999]
Appriou, Multisensor signal processing in the framework of the theory of evidence, NATO/RTO, Application of Mathematical Signal Processing Techniques to Mission Systems, 1999.
[Aregui, 2007]
Aregui and T. Denoeux, ―Fusion of one-class classifier in the belief function framework,‖ in Proc. 10th International Conference on Information Fusion (ICIF), Québec, Canada, pp. 1-8, July 9-12, 2007.
[Arif, 2004]
M. Arif, T. Brouard, and N. Vincent, ―A fusion methodology for recognition of offline signatures,‖ in Proc. 4th International Workshop Pattern Recognition and Information System (PRIS), Porto, Portudal, pp. 35–44, April 2004.
[Arif, 2006]
M. Arif, T. Brouard, and N. Vincent, ―A fusion methodology based on Dempster-Shafer evidence theory for two biometric applications,‖ in Proc. 18th International Conference on Pattern Recognition (ICPR), vol. 4, pp. 590-593, 2006.
[Arsforensica, 2009]
http://www.sigcomp09.arsforensica.org, April 2009.
[Augustin, 2001]
E. Augustin, Reconnaissance de mots manuscrits par systèmes hybrides Réseaux de Neurones et Modèles de Markov Cachés, Thèse de Doctorat, Paris V, 188 pages, 2001.
[Bajaj, 1997]
R. Bajaj and S. Chaudhury, ‗‗Signature verification using multiple neural classifiers,‘‘ Pattern Recognition, vol. 30, no. 1, pp. 1–7, 1997.
[Batista, 2012]
L. Batista, E. Granger, and R. Sabourin, ‗‗Dynamic selection of generative–discriminative ensembles for off-line signature verification,‘‘ Pattern Recognition, vol. 45, no.4, pp.13261340, 2012.
[Battati, 1994]
R. Battati and A. M. Colla, ―Democracy in neural nets: voting schemes for classification,‖ Neural Networks, vol. 7, no. 4, pp. 691-707, 1994.
[Belkasim, 2003]
S. Belkasim and G. Derado, ‗‗Zigzag line discrete cosine transform for blocking artifact removal,‘‘ IEEE 46th International Midwest Symposium on Circuits & Systems, vol. 2, pp. 540-543, 2003.
[Bergamini, 2009]
C. M. Bergamini, L. S. Oliveira, A. L. Koerich, and R. Sabourin, ―Combining different biometric traits with one-class classification,‖ Signal Processing, vol. 89, no. 11, pp. 2117-2127, 2009.
[Berg-Kirkpatrick, 2013]
T. Berg-Kirkpatrick, G. Durrett, and D. Klein, ―Unsupervised Transcription of Historical Documents,‖ in Proc. 51st Annual Meeting of the Association for Computational Linguistics, Sofia, Bulgaria, vol 1, pp. 207-217, August 2013.
[Bertolini, 2010]
D. Bertolini, L. S. Oliveira, E. Justino, and R. Sabourin, ―Reducing forgeries in writer-independent off-line signature verification through ensemble of classifiers,‖ Pattern Recognition, vol. 43, no. 1, pp. 387-396, 2010.
[Bishop, 1995]
C.M. Bishop, Neural Networks for Pattern Recognition. Oxford University Press, 1995.
[Bishop, 1994]
C. Bishop, ―Novelty detection and neural network validation,‖ in Proc. Vision, Image and Signal Processing. Special Issue on Applications of Neural Networks, vol. 141, no. 4, pp. 217–222, 1994.
[Bloch, 2003]
Bloch. Fusion d‘informations en traitement du signal et des images. IC2. Hermès Science, Paris, France, 2003.
[Bottou, 1994]
L. Bottou, C. Cortes, H. Drucker, I. Guyon, Y. LeCun, U. Muller, E. Sackinger, P. Simard, and V. Vapnik, ‗‗Comparison of classifier methods: a case study in handwritten digit recognition,‘‘ in Proc. International Conference on Pattern Recognition (ICPR), vol. 2, pp. 77-87, 1994.
[Bouakache, 2009]
Bouakache, A. Belhadj-Aissa, and G. Mercier, ―Satellite image fusion using Dezert-Smarandache theory,‖ Chap. 22, pp. 549-564, in Advances and Application of DSmT for Information Fusion. Rehoboth, NM: Amer. Res. Press, 2009.
[Bovino, 2003]
L. Bovino, S. Impedovo, G. Pirlo, and L. Sarcinella, ‗‗Multi-expert verification of handwritten signatures,‘‘ in Proc. 7th International Conference Document Analysis and Recognition (ICDAR), Edinburgh, U.K., pp. 932–936, August 3-6, 2003.
[Britto, 2004]
A. Britto, R. Sabourin, F. Bortolozzi, and C. Suen, ―Foreground and Background Information in an HMM-based Method for Recognition of Isolated Characters and Numeral Strings,‖ in Proc. Int. Workshop on Frontiers in Handwriting Recognition (IWFHR), pp. 371-376, October 2004.
[Brunelli, 2009]
R. Brunelli. Template Matching Techniques in Computer Vision: Theory and Practice. Wiley, 348 p, 2009.
[Burger, 2011]
T. Burger, Y. Kessentini, and T. Paquet, ―Dempster-Shafer based rejection strategy for handwritten word recognition,‖ in Proc. 11th International Conference on Document Analysis and Recognition (ICDAR), pp. 528-532, September 18-21, 2011.
[Burger, 2006]
T. Burger and O. Aran, ―Modeling hesitation and conflict: a belief-based approach for multi-class problems,‖ in Proc. 5th International Conference on Machine Learning and Applications (ICMLA), pp. 95-100, 2006.
[Burger, 2008]
T. Burger, A. Urankar, O. Aran, L. Akarun, and A. Caplier, ―A Dempster-Shafer theory based combination of classifiers for hand gesture recognition,‖ Computer Vision and Computer Graphics: Theory and Applications, vol. 21, pp. 137-150, 2008.
[Burges, 1998]
C. J. C. Burges, ―A tutorial on support vector machines for pattern recognition,‖ Knowledge Discovery and Data Mining, vol. 2, no. 2, pp. 1-43, 1998.
[Candès, 1998]
E. J. Candès, Ridgelets: Theory and Applications, Ph.D. thesis, Department of Statistics, Stanford University, 1998.
[Candès, 2000]
E. J. Candès and D. Donoho, Curvelets—A surprisingly effective nonadaptive representation for objects with edges, in Curves and Surface Fitting: Saint-Malo 1999, A. Cohen, C. Rabut and L. Schumaker, Eds. Nashville: Vanderbilt Univ. Press, pp. 105–120, 2000.
[Candès, 2006]
E. Candès, L. Demanet, D. Donoho, and L. Ying, ‗‗Fast discrete curvelet transforms,‘‘ Multiscale Modeling & Simulation, vol. 5, no. 3, pp. 861–899, 2006.
[Cavalin, 2010]
P. Cavalin, R. Sabourin, and C. Suen, ‗‗Dynamic selection of ensembles of classifiers using contextual information,‘‘ in Proc. 9th International Workshop on Multiple Classifier Systems, Lecture Notes in Computer Science, Eds. Berlin, Germany: Springer-Verlag, vol. 5997, pp. 145–154, 2010.
[Cavalin, 2006]
P. R. Cavalin, A. Britto, F. Bortolozzi, R. Sabourin, and L. Oliveira, ―An implicit segmentation-based method for recognition of handwritten strings of characters,‖ in Proc. ACM symposium on Applied computing, pp. 836-840, 2006.
[Cardot, 1994]
H. Cardot, M. Revenu, B. Victorri, and M. J. Revillet, ‗‗A static signature verification system based on a cooperating neural networks architecture,‘‘ International Journal of Pattern Recognition and Artificial Intelligence, vol. 8, no. 3, pp. 679–692, 1994.
[Carney, 1999]
J. Carney and P. Cunningham, Tuning diversity in bagged neural network ensembles, Technical report, University of Dublin, Department of Computer Science, 1999.
[Cha, 2002]
S. H. Cha and S. N. Srihari, ‗‗On measuring the distance between histograms,‘‘ Pattern Recognition, vol. 35, no. 6, pp. 1355-1370, 2002.
[Chen, 1995]
M. Y. Chen, A. Kundu, and J. Zhou, ―Variable duration hidden Markov model and morphological segmentation for handwritten word,‖ IEEE Trans. Image Processing, vol. 4, no.12, pp. 1675–1687, 1995.
[Cheriet, 2007]
M. Cheriet, N. Kharma, C. L. Liu, and C. Y. Suen. Character Recognition Systems: A Guide for Students and Practitioner. John Wiley & Sons, 2007.
[Clavier, 2000]
E. Clavier, É. Trupin, M. Laurent, S. Diana, and J. Labiche, ―Classifiers Combination for Forms Sorting,‖ in Proc. 15th International Conference on Pattern Recognition (ICPR), vol. 1, no. 1, pp. 932-935, September 3-7, 2000.
[Clemen, 1989]
R. Clemen, ―Combining forecasts: A review and annotated bibliography,‖ Journal of Forecasting, vol. 5, no. 4, pp. 559-583, 1989.
[Coetzer, 2004]
J. Coetzer, B. M. Herbst, and J. A. de Preez, ―Offline Signature Verification Using the Discrete Radon Transform and a Hidden Markov Model,‖ EURASIP Journal on Applied Signal Processing, vol. 4, pp. 559-571, 2004.
[Cohen, 1994]
E. Cohen, J. J. Hull, and S. N. Srihari, ―Control structure for interpreting handwritten addresses,‖ IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 16, no. 10, pp. 1049–1055, 1994.
[Comtet, 1974]
L. Comtet, ―Sperner Systems,‖ sec. 7.2 in Advanced Combinatorics: The Art of Finite and Infinite Expansion, D. Reidel Publ. Co., pp. 271-273, 1974.
[Cooke, 1988]
R. Cooke, ―Uncertainty in Risk Assessment: A Probabilist‘s Manifesto,‖ Reliability in Engineering and System Safety, vol. 23, pp. 277-283, 1988.
[Cooke, 1991]
R. Cooke. Experts in Uncertainty. Oxford University Press, 1991.
[Corgne, 2003]
S. Corgne, L. Hubert-Moy, J. Dezert, and G. Mercier, ―Land cover change prediction with a new theory of plausible and paradoxical reasoning,‖ in Proc. 6th International Conference on Information Fusion (ICIF), Cairns, Australia, pp. 1141-1148, July 8-11, 2003.
[Cordella, 1999a]
L. P. Cordella, P. Foggia, C. Sansone, F. Tortorella, and M. Vento, ‗‗Reliability parameters to improve combination strategies in multi-expert systems,‘‘ Pattern Analysis and Application, vol. 3, no. 2, pp. 205–214, 1999.
[Cordella, 1999b]
L. P. Cordella, P. Foggia, C. Sansone, and M. Vento, ‗‗Document validation by signature: A serial multi-expert approach,‘‘ in Proc. 5th International Conference on Document Analysis and Recognition (ICDAR), pp. 601–604, September 20-22, 1999.
[Cordella, 2000]
L. P. Cordella, P. Foggia, C. Sansone, F. Tortorella, and M. Vento, ‗‗A cascaded multiple expert system for verification,‘‘ in Proc. 1st International Workshop on Multiple Classifier Systems, Lecture Notes in Computer Science, J. Kittler and F. Roli, Eds. Berlin, Germany: Springer-Verlag, vol. 1857, pp. 330–339, 2000.
[Cortes, 1995]
C. Cortes and V. Vapnik, ―Support-vector networks,‖ Machine Learning, vol. 20, pp. 273-297, 1995.
[Cover, 1967]
T. M. Cover and P. E. Hart, ―Nearest neighbor pattern classification,‖ IEEE Trans. Inf. Theory, vol. 13, no. 1, pp. 21–27, 1967.
[Cristianini, 2000]
N. Cristianini and J. Shawe-Taylor, An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods. Cambridge University Press, 2000.
[Dedekink, 1897]
R. Dedekind, ―Über Zerlegungen von Zahlen durch ihre grössten gemeinsamen Teiler,‖ in Gesammelte Werke, Bd. 1, pp. 103-148, 1897.
[Dempster, 1967]
A. P. Dempster, ―Upper and lower probabilities induced by a multivalued mapping,‖ Annals of Mathematical Statistics, vol. 38, no. 2, pp. 325-339, 1967.
[Deng, 1999]
P. S. Deng, H. Y. M. Liao, C. W. Ho, and H. R. Tyan, ―Wavelet based off-line handwritten signature verification,‖ Computer Vision and Image Understanding, vol. 76, no. 3, pp. 173– 190, 1999.
[Denoeux, 1995]
T. Denoeux, ―A k-nearest neighbour classification rule based on Dempster-Shafer theory,‖ IEEE Transactions on Systems, Man and Cybernetics, vol. 25, no. 5, pp. 805-813, May 1995.
[Denoeux, 2001]
T. Denoeux, ―Inner and outer approximation of belief structures using a hierarchical clustering approach,‖ International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, vol. 9, no. 4, pp. 437-460, 2001.
[Denoeux, 1997]
T. Denoeux, ―Analysis of evidence theoretic decision rules for pattern classification,‖ Pattern Recognition, vol. 30, no. 7, pp. 1095-1107, 1997.
[Dilworth, 1961]
R. P. Dilworth, Lattice theory. American Mathematical Society, Providence, Rhode Island, 1961.
[Dimauro, 2004]
G. Dimauro, S. Impedovo, M. G. Lucchese, R. Modugno, and G. Pirlo, ―Recent advancements in automatic signature verification,‖ in Proc. Int. Workshop on Frontiers in Handwriting Recognition (IWFHR), pp. 179–184, 2004.
[Dimauro, 1998]
G. Dimauro, S. Impedovo, G. Pirlo, and A. Salzo, ―An advanced segmentation technique for cursive word recognition,‖ in Proc. 6th International Workshop on Frontiers in Handwriting Recognition (IWFHR), Taejon, Korea, pp. 99–111, 1998.
[Dimauro, 1997]
G. Dimauro, S. Impedovo, G. Pirlo, and A. Salzo, ‗‗A multi-expert signature verification system for bankcheck processing,‘‘ International Journal of Pattern Recognition and Artificial Intelligence, vol. 11, no. 5, pp. 827–844, 1997, Automatic Bankcheck Processing, Series in Machine Perception and Artificial Intelligence, S. Impedovo, P. S. P. Wang and H. Bunke, Eds. Singapore: World Scientific, vol. 28, pp. 365–382.
[Dezert, 2002a]
J. Dezert, ‗‗An introduction to the theory of plausible and paradoxical reasoning,‘‘ in Proc. NM&A 02 Conference, Borovetz, Bulgaria, August 20-24, 2002.
[Dezert, 2002b]
J. Dezert, ―Foundations for a new theory of plausible and paradoxical reasoning,‖ Information & Security Journal, vol. 9, pp. 13-57, 2002.
[Dezert, 2010]
J. Dezert and B. Pannetier, ―A PCR-BIMM filter for maneuvering target tracking,‖ in Proc. 13th International Conference on Information Fusion (ICIF), Edinburgh, Scotland, July 26-29, 2010.
[Dezert, 2011]
J. Dezert, Zhun-ga Liu, and G. Mercier, ―Edge Detection in Color Images Based on DSmT,‖ in Proc. 14th International Conference on Information Fusion (ICIF), Chicago, Illinois, USA, pp. 1-8, July 5-8, 2011.
[Dezert, 2004a]
J. Dezert and F. Smarandache, ―The generation of the hyper-power sets,‖ Chap. 2, pp. 3748, in Advances and Application of DSmT for Information Fusion. Rehoboth, NM: American Research Press, 2004.
[Dezert, 2004b]
J. Dezert and F. Smarandache, ―Partial ordering on hyper-power sets,‖ Chap. 3, pp. 49-60, in Advances and Application of DSmT for Information Fusion. Rehoboth, NM: American Research Press, 2004.
[Dezert, 2004c]
J. Dezert and F. Smarandache, ―Combination of beliefs on hybrid DSm models,‖ Chap. 4, pp. 61-103, in Advances and Application of DSmT for Information Fusion. Rehoboth, NM: Amer. Res. Press, 2004.
[Dezert, 2004d]
J. Dezert and F. Smarandache, ―Presentation of DSmT,‖ Chap. 1, pp. 3-36, in Advances and Application of DSmT for Information Fusion. Rehoboth, NM: Amer. Res. Press, 2004.
[Dezert, 2004e]
J. Dezert, F. Smarandache, and M. Daniel, ―The Generalized Pignistic Transformation,‖ in Proc. 7th International Conference on Information Fusion (ICIF), Stockholm, Sweden, pp. 384-391, July 2004.
[Dezert, 2009]
J. Dezert and F. Smarandache, ―Transformation of belief masses into subjective probabilities,‖ Chap. 3, pp. 85-136, in Advances and Application of DSmT for Information Fusion. Rehoboth, NM: Amer. Res. Press, 2009.
[Djiknavorian, 2006]
P. Djiknavorian and D. Grenier, ‗‗Reducing DSmT hybrid rule complexity through optimization of the calculation algorithm,‘‘ Chap. 15, pp. 345-440, in: Advances and Application of DSmT for Information Fusion. Rehoboth, NM: Amer. Res. Press, 2006.
[Do, 2005]
M. N. Do and M. Vetterli, ‗‗The Contourlet Transform: An Efficient Directional Multiresolution Image Representation,‘‘ IEEE Transactions on Image Processing, vol. 14, pp. 2091-2106, December 2005.
[Dubois, 1988]
D. Dubois and H. Prade, ―Representation and combination of uncertainty with belief functions and possibility measures,‖ Computational Intelligence, vol. 4, pp. 244-264, 1988.
[Dubois, 1986]
D. Dubois and H. Prade, ―On the unicity of Dempster rule of combination,‖ International Journal of Intelligent Systems, vol. 1, no. 2, pp. 133–142, 1986.
[Dubois, 1992]
D. Dubois and H. Prade, ‗‗Evidence, Knowledge and Belief Functions,‘‘ International Journal of Approximate Reasoning, vol. 6, pp. 295-319, 1992.
[Dubois, 1994]
D. Dubois and H. Prade, ‗‗Fuzzy Sets-A Convenient Fiction for Modeling Vagueness and Possibility,‘‘ IEEE Transactions on Fuzzy Systems, vol. 2, no. 1, 1994.
[Dubois, 2001]
D. Dubois and H. Prade, ‗‗Possibility theory, probability theory and multiple-valued logics: a clarification,‘‘ Ann. Math. Artif. Intell., vol. 32, no. 1-4, pp. 35–66, 2001.
[Duda, 2001]
R. O. Duda, P. E. Hart, and D. G. Stork. Pattern Classification. Second edition, Wiley Interscience, New York, 2001.
[Duin, 2000]
R. P. W. Duin and D. M. J. Tax, ―Experiments with classifier combining rules,‖ in Proc. 1st International Workshop on Multiple Classifier Systems, Lecture Notes in Computer Science, Eds. Berlin, Germany: Springer-Verlag, vol. 1857, pp. 16-29, 2000.
[Dzub, 1998]
G. Dzuba, A. Filatov, D. Gershuny, and I. Kil, ―Handwritten word recognition - the approach proved by practice,‖ in Proc. 6th International Workshop on Frontiers in Handwriting Recognition (IWFHR), pp. 99–111, 1998.
[El-Yacoubi, 1996]
El-Yacoubi, Modélisation Markovienne de l‘écriture manuscrite - Application à la reconnaissance des adresses postales, Thèse de Doctorat, Université de Rennes I, 307 pages, 1996.
[Elhassouny, 2011]
Elhassouny, S. Idbraim, A. Bekkari, D. Mammass, and D. Ducrot, ―Change Detection by Fusion/Contextual Classification based on a Hybrid DSmT Model and ICM with Constraints,‖ International Journal of Computer Applications, vol. 35, no. 8, pp. 28-40, 2011.
[Fierrez-Aguilar, 2005]
J. Fierrez-Aguilar, L. Nanni, J. Lopez-Penalba, J. Ortega-Garcia, and D. Maltoni, ―An online signature verification system based on fusion of local and global information,‖ in Audio- and Video-Based Biometric Person Authentication, Lecture Notes in Computer Science, Eds. New York: Springer-Verlag, vol. 3546, pp. 523–532, 2005.
[Franke, 2003]
K. Franke, L. R. B. Schomaker, C. Veenhuis, C. Taubenheim, I. Guyon, L. G. Vuurpijl, M. van Erp, and G. Zwarts, ―WANDA: A generic framework applied in forensic handwriting analysis and writer identification, Design and Application of Hybrid Intelligent Systems,‖ in Proc. 3rd International Conference on Hybrid Intelligent Systems, A. Abraham, M. Koeppen, and K. Franke, Eds. Amsterdam: IOS Press, pp. 927-938, 2003.
[Fang, 2001]
B. Fang, Y. Y. Wang, C. H. Leung, K. W. Tse, Y. Y. Tang, P. C. K. Kwok, and Y. K. Wong, ―Offline signature verification by the analysis of cursive strokes,‖ International Journal of Pattern Recognition and Artificial Intelligence, vol. 15, no. 4, pp. 659–673, 2001.
[Fang, 2003]
B. Fang, C. H. Leung, Y. Y. Tang, K. W. Tse, P. C. K. Kwok, and Y. K. Wong, ―Off-line signature verification by the tracking of feature and stroke positions,‖ Pattern Recognition, vol. 36, pp. 91–101, 2003.
[Fawcett, 2006]
T. Fawcett, ―An introduction to ROC analysis,‖ Pattern Recognition Letters, vol. 27, no. 8, pp. 861-874, 2006.
[French, 1985]
S. French, ―Group Consensus Probability Distributions: A Critical Survey,‖ in J. Bernardo et al., editor, Bayesian Statistics, Elsevier, pp. 183-201, 1985.
[Filatov, 1998]
Filatov, N. Nikitin, A. Volgunin, and P. Zelinsky, ―The Address Script TM recognition system for handwritten envelopes,‖ in International Association for Pattern Recognition Workshop on Document Analysis Systems (DAS‘98), Nagano, Japan, November 4-6, pp. 157–171, 1998.
[Fergani, 2008]
B. Fergani, M. Davy, and A. Houacine, ―Speaker diarization using one-class support vector machines,‖ Speech Communication, vol. 50, pp. 355-365, 2008.
[Favata, 1996]
J. Favata and G. Srikantan, ―A Multiple Feature/Resolution Approach To Handprinted Digit and Character Recognition,‖ International journal of imaging systems and technology, vol. 7, no. 4, pp. 304–311, 1996.
[Gader, 1997]
P. D. Gader, M. Mohamed and J. H. Chiang, ―Handwritten word recognition with character and inter-character neural networks,‖ IEEE Trans. on Systems, Man and Cybernetics - Part B, vol. 27, no.1, pp.158–164, 1997.
[Garcia, 2008]
E. Garcia and L. Altamirano, ―Multiple Cameras Fusion Based on DSmT for Tracking Objects on Ground Plane,‖ in Proc. 11th International Conference on Information Fusion (ICIF), Cologne, Germany, June 30-July 3, 2008.
[Garvey, 1986]
T. D. Garvey, ―Evidential Reasoning for Land-Use Classification,‖ in Analytical Methods in remote Sensing for Geographic Information Systems, International Association of Pattern Recognition, Technical Committee 7 Workshop, Paris, October 1986.
[Gilloux, 1993]
M. Gilloux, ―Research into the new generation of character and mailing address recognition systems at the French post office research center,‖ Pattern Recognition Letters, vol. 14, no. 4, pp. 267–276, 1993.
[Gilloux, 1995a]
M. Gilloux, M. Leroux, and J. M. Bertille, ―Strategies for cursive script recognition using hidden Markov models,‖ Machine Vision and Applications, vol. 8, pp. 197–205, 1995.
[Gilloux, 1995b]
M. Gilloux, B. Lemarié, and M. Leroux, ―Hybrid radial basis functions network/hidden Markov model handwritten word recognition system,‖ International Conference on Document Analysis and Recognition (ICDAR), vol. 1, pp. 394-397, 1995.
[Gilloux, 1992]
M. Gilloux and M. Leroux, ―Recognition of cursive script amounts on postal cheques,‖ in Proc. 5th USPS Advance Technology Conference, pp. 545–556, 1992.
[Grätzer, 1978]
G. Grätzer, General Lattice Theory. Academic Press, New York, 1978.
[Guerbai, 2012]
Y. Guerbai, Y. Chibani, and N. Abbas, ―One-Class versus Bi-Class SVM Classifier for Off-line Signature Verification,‖ in Proc. 3rd Intl. Conf. on Multimedia Computing and Systems, Tangier, Morocco, May 10-12, pp. 206-210, 2012.
[Guermeur, 1999]
Y. Guermeur, A. Elisseeff, and H. Paugam-Moisy, ―Estimating the sample complexity of a multi-class discriminant model,‖ in Proc. Industrial Conference on Artificial Neural Networks, pp. 310-315, 1999.
[Guillevic, 1998]
D. Guillevic and C. Y. Suen, ―HMM-KNN word recognition engine for bank cheque processing,‖ in Proc. 13th International Conference on Pattern Recognition (ICPR), Brisbane, Australia, pp. 1526–1529, August 16-20, 1998.
[Guo, 2001]
J. K. Guo, D. Doermann, and A. Rosenfeld, ―Forgery detection by local correspondence,‖ International Journal of Pattern Recognition and Artificial Intelligence, vol. 15, no. 4, pp. 579–641, 2001.
[Hamadene, 2012]
Hamadene, Y. Chibani, and H. Nemmour, ―Off-line Handwritten Signature Verification Using Contourlet Transform and Co-occurrence Matrix,‖ in Proc. 13th International Conference on Frontiers in Handwriting Recognition (ICFHR), Bari, Italy, pp. 343-347, September 18-20, 2012.
[Han, 1997]
K. Han and I. K. Sethi, ―An off-line cursive handwritten word recognition system and its application to legal amount interpretation,‖ International Journal of Pattern Recognition and Artificial Intelligence, vol. 11, no. 5, pp. 757–770, 1997.
[Hanmandlua, 2005]
M. Hanmandlua, M. Yusofb, and V.K. Madasuc, ―Off-line signature verification and forgery detection using fuzzy modeling,‖ Pattern Recognition, vol. 38, no. 3, pp. 341–356, 2005.
[Hansen, 1990]
L. K. Hansen and P. Salamon, ―Neural network ensembles,‖ IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 12, no. 10, pp. 993-1001, October 1990.
[Haykin, 1999]
S. Haykin, Neural Networks: A Comprehensive Foundation. Prentice-Hall, Inc., 1999.
[Hawkins, 2002]
S. Hawkins, H. He, G. Williams, and R. Baxter, ―Outlier detection using replicator neural networks,‖ in Proc. 5th Int. Conf. and Data Warehousing and Knowledge Discovery, Lecture Notes in Computer Science, Eds. Berlin, Germany: Springer-Verlag, vol. 2454, pp. 170-180, 2002.
[Heutte, 1994]
L. Heutte, Reconnaissance de caractères manuscrits: application à la lecture automatique des chèques et des enveloppes postales, Thèse de Doctorat, Université de Rouen, France, 1994.
[Ho, 1994]
T. K. Ho, J. J. Hull, and S. N. Srihari, ―Decision combination in multiple classifier systems,‖ IEEE Transactions on Pattern Analysis and Machine Intelligence, vol 16, no 1, pp. 66-75, January 1994.
[Hongo, 2005]
Y. Hongo, D. Muramatsu, and T. Matsumoto, ‗‗AdaBoost-based on-line signature verifier,‘‘ Biometric Technology for Human Identification II, A.K. Jain and N.K. Ratha, Eds. Proc. SPIE, vol. 5779, pp. 373–380, 2005.
[Hsu, 2001]
C. Hsu and C. Lin, A comparison of methods for multi-class support vector machines, Technical report, National Taiwan University, Department of Computer Science and Information Engineering, 2001.
[Hu, 2005]
Z. Hu, Y. Li, Y. Cai, and X. Xu, ‗‗Method of combining multi-class SVMs using Dempster-Shafer theory and its application,‘‘ in Proc. American Control Conference, vol. 3, pp. 1946-1950, June 8-10, 2005.
[Huang, 1995]
Y. S. Huang and C. Y. Suen, ―A method of combining multiple experts for the recognition of unconstrained handwritten numerals,‖ IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 17, no. 1, pp. 90-94, 1995.
[Huang, 2002]
H. P. Huang and Y. H. Liu, ―Fuzzy support vector machines for pattern recognition and data mining,‖ International Journal of Fuzzy Systems, vol. 4, no. 3, pp. 826-835, 2002.
[Huang, 1996]
K. Huang and H. Yan, ―Identifying and verifying handwritten signature images utilizing neural networks,‖ in Proc. International Conference on Neural Information Processing (ICONIP), pp. 1400–1404, 1996.
[Huang, 1997a]
K. Huang, J. Wu, and H. Yan, ‗‗Offline writer verification utilizing multiple neural networks,‘‘ Optical Engineering, vol. 36, no. 11, pp. 3127–3133, 1997.
[Huang, 1997b]
K. Huang and H. Yan, ‗‗Off-line signature verification based on geometric feature extraction and neural network classification,‘‘ Pattern Recognition, vol. 30, no. 1, pp. 9-17, 1997.
[Huang, 2006]
X. Huang, X. Li, M. Wang, and J. Dezert, ―A fusion machine based on DSmT and PCR5 for robot's map reconstruction,‖ International Journal of Information Acquisition (IJIA), vol. 3, no. 3, pp. 201-211, 2006.
[Huang, 2009]
X. Huang, P. Li, and M. Wang, ―Evidence Reasoning Machine based on DSmT for mobile robot mapping in unknown dynamic environment,‖ in Proc. IEEE International Conference on Robotics and Biomimetics (ROBIO), Guilin, China, pp. 753-758, December 18-22, 2009.
[Hull, 1983]
J. Hull, S. Srihari, and R. Choudhuri, ―An integrated algorithm for text recognition: comparison with a cascaded algorithm,‖ IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), vol. 5, no. 4, pp.384-395, 1983.
[Hull, 1988]
J. Hull, A computational theory of visual word recognition, Doctoral Dissertation, Technical Report 88-07, Department of Computer Science, SUNY at Buffalo, February 1988.
[Impedovo, 1997]
S. Impedovo, P. S. P. Wang, and H. Bunke, ―Automatic bankcheck processing,‖ Series in Machine Perception and Artificial Intelligence, World Scientific publishing company, vol. 28, 1997.
[Impedovo, 2008]
D. Impedovo and G. Pirlo, ―Automatic Signature Verification: The State of the Art,‖ IEEE Transactions on Systems, Man, and Cybernetics - Part C, vol. 38, no. 5, pp. 609–635, 2008.
[InkML, 2006]
Ink Markup Language (InkML), W3C Working Draft 23 October 2006, http://www.w3.org/TR/InkML/#orientation.
[Jain, 2007]
A. K. Jain, P. Flynn, and A. Ross. Handbook of Biometrics. Springer-Verlag, New York, 2007.
[Jain, 2004]
A. K. Jain, A. Ross, and S. Prabhakar, ‗‗An introduction to biometric recognition,‘‘ IEEE Transactions on Circuits and Systems for Video Technology, Special Issue on Image- and Video-Based Biometrics, vol. 14, no. 1, pp. 4–20, 2004.
[Jain, 2000]
A. K. Jain, R. P. W. Duin, and J. Mao, ―Statistical pattern recognition: A review,‖ IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 1, pp. 4-37, 2000.
[Japkowicz, 1995]
N. Japkowicz, C. Myers, and M. Gluck, ―A novelty detection approach to classification,‖ in Proc. 14th International Joint Conference on Artificial Intelligence (IJCAI), Montreal, Canada, vol. 1, pp. 518-523, 1995.
[Ji, 1997]
C. Ji and S. Ma, ―Combination of weak classifiers,‖ IEEE Trans. Neural Networks, vol. 8, no. 1, pp. 32-42, 1997.
[Justino, 2001]
E. J. R. Justino, F. Bortolozzi, and R. Sabourin, ―Off-line Signature Verification Using HMM for Random, Simple and Skilled Forgeries‖, in Proc. 6th International Conference on Document Analysis and Recognition (ICDAR), vol. 1, pp. 105-110, 2001.
[Justino, 2005]
E. J. R. Justino, F. Bortolozzi, and R. Sabourin, ‗‗A comparison of SVM and HMM classifiers in the off-line signature verification,‘‘ Pattern Recognition Letters, vol. 26, pp. 1377-1385, 2005.
[Kaewkongka, 1999]
T. Kaewkongka, K. Chamnongthai, and B. Thipakorn, ―Offline signature recognition using parameterized hough transform,‖ in Proc. 5th International Symposium on Signal Processing and Its Applications, Brisbane, Australia, pp. 451–454, 1999.
[Kalera, 2004]
M. Kalera, B. Zhang, and S. Srihari, ‗‗Offline Signature Verification and Identification Using Distance Statistics,‘‘ International Journal of Pattern Recognition and Artificial Intelligence, vol. 18, no. 7, pp. 1339–1360, 2004.
[Kandasamy, 2004]
W. B. S. Kandasamy and F. Smarandache. Basic Neutrosophic Algebraic Structures and Their Application to Fuzzy and Neutrosophic Models. Hexis, Church Rock, 2004.
[Kang, 1997]
H. J. Kang and J. H. Kim, ―A probabilistic framework for combining multiple classifiers at abstract level,‖ Proc. 4th International Conference on Document Analysis and Recognition (ICDAR), Germany, vol 2, pp. 870-874, August 1997.
[Kechichian, 2009]
P. Kechichian and B. Champagne, ―An improved partial Haar dual adaptive filter for rapid identification of a sparse echo channel,‖ Signal Processing, vol. 89, no. 5, pp. 710-723, 2009.
[Kennes, 1992]
R. Kennes, ―Computational Aspect of the Möbius Transformation of Graphs,‖ IEEE Transaction on Systems, Man, and Cybernetics – Part A: Systems and Humans, vol. 22, no. 2, pp. 201-223, 1992.
[Khodabandeh, 2010]
M. Khodabandeh and A. Mohammad-Shahri, ―Data Fusion of Cameras‘ Images for Localization of an Object: DSmT-based Approach,‖ in Proc. 1st International Symposium on Computing in Science and Engineering (ISCSE), Kusadasi, Turkey, June 3-5, 2010.
[Kim, 1997]
G. Kim and V. Govindaraju, ―Lexicon driven approach to handwritten word recognition for real-time applications,‖ IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 19, no. 4, pp. 366–379, 1997.
[Kim, 1998]
G. Kim and V. Govindaraju, ―Handwritten phrase recognition as applied to street name images,‖ Pattern Recognition, vol. 31, no. 1, pp. 41–51, 1998.
[Kittler, 1998]
J. Kittler, M. Hatef, R. P. W. Duin, and J. Matas, ―On combining classifiers,‖ IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 20, no. 3, pp. 226–239, March 1998.
[Knerr, 1996]
S. Knerr, O. Baret, D. Price, J. C. Simon, V. Anissimov, and N. Gorski, ―The A2iA recognition system for handwritten checks,‖ in Proc. 2nd Workshop on Document Analysis Systems, Philadelphia, pp. 431–494, 1996.
[Knerr, 1998]
S. Knerr and E. Augustin, ―A neural network-hidden Markov model hybrid for cursive word recognition,‖ in Proc. 13th International Conference on Pattern Recognition (ICPR), Brisbane, Australia, pp. 1518–1520, August 16-20, 1998.
[Ko, 2008]
Ko, R. Sabourin, and A. Britto, ‗‗From dynamic classifier selection to dynamic ensemble selection,‘‘ Pattern Recognition, vol. 41, no. 5, pp. 1718–1731, 2008.
[Kolmogorov, 1960]
A. N. Kolmogorov. Foundations of the Theory of Probability. Chelsea Publishing Company, New York, 1960.
[Kontorovich, 2011]
Kontorovich, D. Hendler, and E. Menahem, ―Metric Anomaly Detection via Asymmetric Risk Minimization,‖ in Proc. 1st Int. Workshop, Similarity-Based Pattern Recognition, Lecture Notes in Computer Science, M. Pelillo and E. R. Hancock, Eds. Venice, Italy: Springer-Verlag, vol. 7005, pp. 17-30, 2011.
[Kumar, 2010]
S. Kumar, K. B. Raja, R. K. Chhotaray, and S. Pattanaik, ‗‗Off-line Signature Verification Based on Fusion of Grid and Global Features Using Neural Networks,‘‘ International Journal of Engineering Science and Technology, vol. 2, no. 12, pp. 7035-7044, 2010.
[Kumar, 2012]
R. Kumar, J. D. Sharma, and B. Chanda, ‗‗Writer-independent off-line signature verification using surroundedness feature,‘‘ Pattern Recognition Letters, vol. 33, no. 3, pp. 301-308, 2012.
[Kundu, 1998]
A. Kundu, H. He, and M. Y. Chen, ―Alternative to variable duration HMM in handwriting recognition,‖ IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 20, no. 11, pp. 1275–1281, 1998.
[Kurzweil, 1990]
R. Kurzweil. The age of intelligent machines. Cambridge, MA: MIT Press, 1990.
[Laanaya, 2010]
H. Laanaya, A. Martin, D. Aboutajdine, and A. Khenchaf, ‗‗Support vector regression of membership functions and belief functions-Application for pattern recognition,‘‘ Information Fusion, vol. 11, no. 4, pp. 338–350, 2010.
[Lam, 1988]
L. Lam and C. Suen, ―Structural classification and relaxation matching of totally unconstrained handwritten zip-code numbers,‖ Pattern Recognition, vol. 21, no. 1, pp. 19-31, 1988.
[Laplace, 1847]
P. Laplace, ―Deuxième supplément à la théorie analytique des probabilités,‖ Œuvres Complètes de Laplace, vol. 7, pp. 531-580, 1847.
[Larkins, 2009]
R. L. Larkins, Off-line Signature Verification, Master Thesis, University of Waikato, 2009.
[Larry, 2001]
M. Larry and Y. Malik, ―One-Class SVMs for Document Classification,‖ Journal of Machine Learning Research, vol. 2, pp. 139-154, 2001.
[Leclerc, 1994]
F. Leclerc and R. Plamondon, ―Automatic signature verification: The state of the art 1989–1993,‖ International Journal of Pattern Recognition and Artificial Intelligence, vol. 8, no. 3, pp. 643-660, 1994.
[Lecolinet, 1990]
E. Lecolinet, Segmentation d‘images de mots manuscrits: Application à la lecture de chaîne de caractères majuscules alphanumériques et à la lecture de l‘écriture manuscrites, Thèse de Doctorat, Université Pierre et Marie Curie (Paris VI), 283 pages, Mars 1990.
[Lee, 1987]
T. Lee, J. A. Richards, and P. H. Swain, ―Probabilistic and Evidential Approaches for Multisource Data Analysis,‖ IEEE Transactions on Geoscience and Remote Sensing, vol. GE-25, no. 3, pp. 283-293, 1987.
[Leroux, 1997]
M. Leroux, E. Lethelier, M. Gilloux, and B. Lemarié, ―Automatic reading of handwritten amounts on French checks,‖ International Journal of Pattern Recognition and Artificial Intelligence, vol. 11, no. 4, pp.619–638, 1997.
[Li, 2006a]
X. Li, X. Huang, and M. Wang, ―Robot Map Building from Sonar Sensors and DSmT,‖ International Journal of Information & Security, vol. 20, pp. 104-121, 2006.
[Li, 2006b]
X. Li, X. Huang, and M. Wang, ―Sonar Grid Map Building of Mobile Robots Based on DSmT,‖ Information Technology Journal, vol. 5, no. 2, pp. 267-272, 2006.
[Li, 2007]
X. Li, X. Huang, J. Dezert, L. Duan, and M. Wang, ―A successful application of DSmT in sonar grid map building and comparison with DST-based approach,‖ International Journal of Innovative Computing, Information and Control (ICIC), vol. 3, no. 3, pp. 539-549, 2007.
[Li, 2008]
P. Li, X. Huang, M. Wang, and X. Zeng, ―Multiple Mobile Robots Map Building Based on DSmT,‖ IEEE International Conference on Robotics, Automation and Mechatronics (RAM), Chengdu, China, pp. 509-514, September 21-24, 2008.
[Li, 2011]
X. Li, J. Dezert, F. Smarandache, and X. Huang, ―Evidence Supporting Measure of Similarity for Reducing the Complexity in Information Fusion,‖ Information sciences, vol. 181, no. 10, pp. 1818–1835, 2011.
[Liu, 2012]
Z. Liu, J. Dezert, G. Mercier, and Q. Pan, ‗‗Dynamical Evidential Reasoning For Changes Detections In Remote Sensing Images,‘‘ IEEE Transactions on Geoscience and Remote Sensing, vol. 50, no. 5, pp. 1955-1967, 2012.
[Liwicki, 2011]
M. Liwicki, Y. Akira, S. Uchida, M. Iwamura, S. Omachi, and K. Kise, ‗‗Reliable Online Stroke Recovery from Offline Data with the Data-Embedding Pen,‘‘ in Proc. 11th International Conference Document Analysis and Recognition, pp. 1384-1388, 2011.
[Lowrance, 1991]
J. D. Lowrance, T. M. Strat, L. P. Wesley, T. D. Garvey, E. H. Ruspini, and D. E. Wilkins, The Theory, Implementation and Practice of Evidential Reasoning, SRI project 5701 final report, SRI, Palo Alto, 1991.
[Mallat, 1989]
S. G. Mallat, ‗‗A theory for multiresolution signal decomposition: The wavelet representation,‘‘ IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 11, no. 7, pp. 674-693, 1989.
[Mandler, 1988]
E. Mandler and J. Schuermann, ―Combining the classification results of independent classifiers based on the Dempster-Shafer theory of evidences,‖ International Journal of Pattern Recognition and Artificial Intelligence, North-Holland, pp. 381-393, 1988.
[Mandal, 2005]
S. Mandal, S. P. Chowdhury, A. K. Das, and B. Chanda, ―A hierarchical method for automated identification and segmentation of forms,‖ in Proc. 8th International Conference on Document Analysis and Recognition (ICDAR), Seoul, Korea, August 31 - September 1, pp. 705-709, 2005.
[Martin, 2006]
A. Martin and C. Osswald, ―A new generalization of the proportional conflict redistribution rule stable in terms of decision,‖ Chap. 2, pp. 69-88, in Advances and Application of DSmT for Information Fusion. Rehoboth, NM: Amer. Res. Press, 2006.
[Martin, 2009]
A. Martin, ―Implementing general belief function framework with a practical codification for low complexity,‖ Chap. 7, pp. 217-273, in Advances and Application of DSmT for Information Fusion. Rehoboth, NM: Amer. Res. Press, 2009.
[Martin, 2007]
A. Martin and C. Osswald, ‗‗Toward a combination rule to deal with partial conflict and specificity in belief functions theory,‘‘ in Proc. 10th International Conference on Information Fusion (ICIF), pp. 1-8, July 9-12, 2007.
[Martin, 2008]
A. Martin and I. Quidu, ‗‗Decision support with belief functions theory for seabed characterization,‘‘ in Proc. 11th International Conference on Information Fusion (ICIF), pp. 1-8, June 30-July 3, 2008.
[Maupin, 2004]
P. Maupin and A. L. Jousselme, ―Vagueness, a multifacet concept - a case study on Ambrosia artemisiifolia predictive cartography,‖ in Proc. IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Alaska, USA, vol. 1, September 20-24, 2004.
[Milewski, 2006]
R. Milewski and V. Govindaraju, ―Extraction of Handwritten Text from Carbon Copy Medical Form Images,‖ Document Analysis Systems (DAS‘06), vol. 3872, pp. 106-116, 2006.
[Moobed, 1996]
B. Moobed, Combinaison de classifieurs, une nouvelle approche, Thèse de Doctorat, Informatique, Université Paris sud, UFR Scientifique d'Orsay, 1996.
[Mottl, 2008]
V. Mottl, M. Lange, V. Sulimova, and A. Yermakov, ‗‗Signature verification based on fusion of on-line and off-line kernels,‘‘ in Proc.19th International Conference on Pattern Recognition (ICPR), Florida, USA, pp. 1-4, December 08-11, 2008.
[Moya, 1993]
M. Moya, M. Koch, and L. Hostetler, ―One-class classifier networks for target recognition applications,‖ in Proc. World Congress on Neural Networks, Portland, OR, International Neural Network Society, INNS, pp. 797–801, 1993.
[Moya, 1996]
M. R. Moya and D. R. Hush, ―Network Constraints and Multi-Objective Optimization for One-Class Classification,‖ Neural Networks, vol. 9, no. 3, 1996.
[Muramatsu, 2009]
D. Muramatsu, K. Yasuda, and T. Matsumoto, ‗‗Biometric Person Authentication Method Using Camera-Based Online Signature,‘‘ in Proc. 10th International Conference on Document Analysis and Recognition (ICDAR), Barcelona, Spain, pp. 46-50, July 2009.
[Nakanishi, 2006]
Nakanishi, H. Hara, H. Sakamoto, Y. Itoh, and Y. Fukui, ‗‗Parameter Fusion in DWT Domain: On-Line Signature Verification,‘‘ International Symposium in Intelligent Signal Processing and Communication Systems (ISPACS‘06), Yonago Convention Center, Tottori, Japan, pp. 395-398, December 12-15, 2006.
[Nalwa, 1997]
V. S. Nalwa, ‗‗Automatic on-line signature verification,‘‘ in Proc. IEEE, vol. 85, issue. 2, pp. 215–239, 1997.
[Nemmour, 2006]
H. Nemmour and Y. Chibani, ―Multiple support vector machine for land cover change detection: An application for mapping urban extensions,‖ ISPRS Journal of Photogrammetry & Remote Sensing, vol. 61, pp. 125-133, 2006.
[Nilsson, 1965]
N. Nilsson. Learning Machines: Foundations of Trainable Pattern-Classifying Systems. New York: McGraw-Hill, 1965. (Reprinted as: N. Nilsson. The Mathematical Foundations of Learning Machines. San Francisco: Morgan Kaufmann, 1990.)
[Oliveira, 2007]
L. S. Oliveira, E. Justino, and R. Sabourin, ‗‗Off-line signature verification using writer-independent approach,‘‘ in Proc. International Joint Conference on Neural Networks (IJCNN), Orlando, Florida, USA, pp. 2539-2544, August 12-17, 2007.
[Otsu, 1979]
N. Otsu, ‗‗A threshold selection method from gray-level histogram,‘‘ IEEE Trans. Syst. Man Cybernet, vol. 9, no. 1, pp. 62–66, 1979.
[Pannetier, 2008]
B. Pannetier, J. Dezert, and E. Pollard, ―Improvement of Multiple Ground Targets Tracking with GMTI Sensor and Fusion of Identification Attributes,‖ Aerospace Conference, Big Sky, MT, USA, pp. 1-13, March 1-8, 2008.
[Pannetier, 2009]
B. Pannetier and J. Dezert, ―GMTI and IMINT data fusion for multiple target tracking and classification,‖ in Proc. 12th International Conference on Information Fusion (ICIF), Seattle, USA, pp. 203-210, July 6-9, 2009.
[Pannetier, 2011]
B. Pannetier and J. Dezert, ―Extended and multiple target tracking: Evaluation of an hybridization solution,‖ in Proc. 14th International Conference on Information Fusion (ICIF), Chicago, USA, pp. 1-8, July 5-8, 2011.
[Papoulis, 2002]
A. Papoulis. Probability, Random Variables and Stochastic Processes. 4th edition, McGraw-Hill, 2002.
[Paquet, 1993]
T. Paquet and Y. Lecourtier, ―Automatic reading of the literal amount of bank checks,‖ Machine Vision and Applications, vol. 6, pp. 151–162, 1993.
[Pekalska, 2002]
E. Pekalska and R. P. W. Duin, ‗‗Dissimilarity representations allow for building good classifiers,‘‘ Pattern Recognition Letters, vol. 23, pp. 943-956, 2002.
[Plamondon, 1989]
R. Plamondon and G. Lorette, ―Automatic signature verification and writer identification: The state of the art,‖ Pattern Recognition, vol. 22, no. 2, pp. 107-131, 1989.
[Plamondon, 2000]
R. Plamondon and S. N. Srihari, ‗‗On-line and off-line handwriting recognition: A comprehensive survey,‘‘ IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 1, pp. 63–84, 2000.
[Plamondon, 1992]
R. Plamondon, P. Yergeau, and J. J. Brault, ‗‗A multi-level signature verification system,‘‘ in From Pixels to Features III—Frontiers in Handwriting Recognition, S. Impedovo and J. C. Simon, Eds. Amsterdam, The Netherlands: Elsevier, pp. 363–370, 1992.
[Platt, 1999]
J. C. Platt, ‗‗Probabilities for SV Machines,‘‘ in Proc. Advances in Large Margin Classifiers, MIT Press, pp. 61-74, 1999.
[Poisson, 2005]
E. Poisson, Architecture et Apprentissage d‘un Système Hybride Neuro-Markovien pour la Reconnaissance de l‘Ecriture Manuscrite En-Ligne, Thèse de Doctorat, Université de Nantes, 213 pages, 2005.
[Provan, 1992]
G. M. Provan, ‗‗The validity of Dempster-Shafer Belief Functions,‘‘ International Journal of Approximate Reasoning, vol. 6, pp. 389-399, 1992.
[Qi, 1995]
Y. Qi and B. R. Hunt, ‗‗A multiresolution approach to computer verification of handwritten signatures,‘‘ IEEE Transactions on Image Processing, vol. 4, no. 6, pp. 870–874, 1995.
[Quek, 2002]
C. Quek and R. W. Zhou, ―Antiforgery: a novel pseudo-outer product based fuzzy neural network driven signature verification system,‖ Pattern Recognition Letters, vol. 23, no. 14, pp. 1795–1816, 2002.
[Quost, 2007a]
B. Quost, T. Denoeux, and M. Masson, ‗‗Pairwise classifier combination using belief functions,‘‘ Pattern Recognition Letters, vol. 28, no. 5, pp. 644–653, 2007.
[Quost, 2007b]
B. Quost, T. Denoeux, and M. Masson, ‗‗Combinaison crédibiliste de classifieurs binaires,‘‘ Traitement du Signal, vol. 24, no. 2, pp. 83-101, 2007.
[Quost, 2005]
B. Quost, T. Denoeux, and M. Masson, ‗‗Pairwise classifier combination in the transferable belief model,‘‘ in Proc. 8th International Conference on Information Fusion (ICIF), Philadelphia, PA, USA, vol. 1, July 25-28, 2005.
[Quost, 2006]
B. Quost, T. Denoeux, and M. Masson, ‗‗One-against-all combination in the framework of belief functions,‘‘ in Proc. IPMU, Paris, France, vol. 1, pp. 356-363, 2006.
[Rabaoui, 2007]
A. Rabaoui, M. Davy, Z. Lachiri, and N. Ellouze, ‗‗Improved One-Class SVM Classifier for Sounds Classification,‘‘ in Proc. IEEE Conference on Advanced Video and Signal Based Surveillance (AVSS), London, United Kingdom, pp. 117-122, September 5-7, 2007.
[Rabiner, 1989]
L. R. Rabiner, ―A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition,‖ in Proc. IEEE, vol. 77, no. 2, pp. 257-286, 1989.
[Rahman, 2000]
F. R. Rahman and M. C. Fairhurst, ―Multiple expert classification: a new methodology for parallel decision fusion,‖ International Journal on Document Analysis and Recognition (IJDAR), vol. 3, pp. 40-55, 2000.
[Rahman, 1999]
Rahman and M. Fairhurst, ―A study of some multi-expert recognition strategies for industrial applications: Issues of processing speed and implementability,‖ in Vision Interface, Trois-Rivières, Canada, May 19-21, 1999.
[Rahman, 2003]
Rahman and M. C. Fairhurst, ―Multiple classifier decision combination strategies for character recognition: A review,‖ International Journal on Document Analysis and Recognition (IJDAR), vol. 5, no. 4, pp. 166-194, 2003.
[Ramdane, 2003]
S. Ramdane, B. Taconet, and A. Zahour, ―Classification of forms with handwritten fields by planar hidden Markov models,‖ Pattern Recognition, vol. 36, no. 4, pp. 1045-1060, 2003.
[Ramesh, 1999]
V. E. Ramesh and M. N. Murty, ‗‗Offline signature verification using genetically optimized weighted features,‘‘ Pattern Recognition, vol. 32, no. 2, pp. 217–233, 1999.
[Rasoulian, 1990]
H. Rasoulian, W. E. Thompson, L. F. Kazda, and R. Parra-Loera, ―Application of the Mathematical Theory of Evidence to the Image Cueing and Image Segmentation Problem,‖ SPIE Signal and Image Processing Systems Performance Evaluation, vol. 1310, pp. 199-206, 1990.
[Romero, 2007]
V. Romero, A. H. Toselli, L. Rodriguez, and E. Vidal, ―Computer Assisted Transcription for Ancient Text Images,‖ in Proc. Image Analysis and Recognition, Lecture Notes in Computer Science, Eds. Berlin, Germany: Springer-Verlag, Montreal, Canada, vol. 4633, pp. 1182–1193, August 2007.
[Ross, 2006]
A. Ross, K. Nandakumar, and A. K. Jain. Handbook of Multibiometrics. Springer-Verlag, New York, 2006.
[Ross, 2003]
A. Ross and A. K. Jain, ‗‗Information fusion in biometrics,‘‘ Pattern Recognition Letters, vol. 24, no. 13, pp. 2115–2125, 2003.
[Ruspini, 1992]
E. H. Ruspini, D. J. Lowrance, and T. M. Start, ―Understanding Evidential Reasoning,‖ International Journal of Approximate Reasoning, vol. 6, pp. 401-424, 1992.
[Ruta, 2000]
D. Ruta and B. Gabrys, ―An overview of classifier fusion methods,‖ Computing and Information Systems, vol. 7, no. 1, pp. 1-10, 2000.
[Sabourin, 1994]
M. Sabourin and G. Genest, ―Coopération de classificateurs pour la vérification automatique des signatures,‖ in Proc. 3ème Colloque National sur l'Ecrit et le document, pp. 89-98, Rouen, 1994.
[Sabourin, 1997]
R. Sabourin, G. Genest, and F. Prêteux, ―Off-line signature verification by local granulometric size distributions,‖ IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 19, no. 9, pp. 976–988, 1997.
[Sansone, 2000]
C. Sansone and M. Vento, ‗‗Signature verification: Increasing performance by a multistage system,‘‘ Pattern Analysis and Application, vol. 3, pp. 169-181, 2000.
[Santos, 2004]
C. Santos, E. J. R. Justino, F. Bortolozzi, and R. Sabourin, ‗‗An offline signature verification method based on document questioned expert's approach and a neural network classifier,‘‘ in Proc. 9th International Workshop on Frontiers in Handwriting Recognition (IWFHR), pp. 498-502, 2004.
[Schalkoff, 1991]
R. Schalkoff. Pattern Recognition: Statistical, Structural and Neural Approaches. Wiley, 384 p, 1991.
[Schölkopf, 2001]
B. Schölkopf, J. Platt, J. Shawe-Taylor, A. Smola, and R. Williamson, ―Estimating the support of a high-dimensional distribution,‖ Neural Computation, vol. 13, no. 7, pp. 1443-1472, 2001.
[Seo, 2007]
K. K. Seo, ―An application of one-class support vector machine in content-based image retrieval,‖ Expert System with Applications, vol. 33, pp. 491-498, 2007.
[Shafer, 1976]
G. Shafer. A Mathematical Theory of Evidence. Princeton University Press, Princeton (NJ), 312 p, 1976.
[Shafer, 1990]
G. Shafer, ‗‗Perspectives on the Theory and Practice of Belief Functions,‘‘ International Journal of Approximate Reasoning, vol. 4, pp. 323-362, 1990.
[Srihari, 1993]
S. N. Srihari, ―Recognition of handwritten and machine-printed text for postal address interpretation,‖ Pattern Recognition Letters, vol. 14, no. 4, pp. 291–302, 1993.
[Srihari, 1982]
S. Srihari, ―Reliability analysis of majority vote systems,‖ Information Sciences, vol. 26, pp. 243-256, 1982.
[Singh, 2008]
R. Singh, M. Vatsa, and A. Noore, ―Integrated Multilevel Image Fusion and Match Score Fusion of Visible and Infrared Face Images for Robust Face Recognition,‖ Pattern Recognition - Special Issue on Multimodal Biometrics, vol. 41, no. 3, pp. 880-893, 2008.
[Smagt, 1996]
P. Smagt, ―Back-Propagation,‖ Chap. 4, pp. 33-46, in An Introduction to Neural Networks. Eighth edition, November, 1996.
[Smarandache, 2004]
F. Smarandache and J. Dezert. Advances and Application of DSmT for Information Fusion. Rehoboth, NM: American Research Press, vol. 1, 418 p, 2004.
[Smarandache, 2006a]
F. Smarandache and J. Dezert. Advances and Application of DSmT for Information Fusion. Rehoboth, NM: American Research Press, vol. 2, 442 p, 2006.
[Smarandache, 2006b]
F. Smarandache and J. Dezert, ―Proportional Conflict Redistribution Rules for Information Fusion,‖ Chap. 1, pp. 3-68, in Advances and Application of DSmT for Information Fusion. Rehoboth, NM: Amer. Res. Press, 2006.
[Smarandache, 2009]
F. Smarandache and J. Dezert. Advances and Applications of DSmT for Information Fusion. Rehoboth, NM: American Research Press, vol. 3, 734 p, 2009.
[Smets, 1990]
Ph. Smets, ―The Combination of Evidence in the Transferable Belief Model,‖ IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 12, no. 5, pp. 447–458, 1990.
[Smets, 2000]
Ph. Smets, ―Data Fusion in the Transferable Belief Model,‖ in Proc. 3rd International Conference on Information Fusion (ICIF), Paris, France, vol. 1, pp. PS21-PS33, July 10-13, 2000.
[Smets, 1999]
Ph. Smets, ‗‗Practical uses of belief functions,‘‘ in K. B. Laskey and H. Prade, editors, Fifteenth Conference on Uncertainty in Artificial Intelligence, Stockholm, Sweden, vol. 99, pp. 612-621, July 1999.
[Smets, 2002]
Ph. Smets, ―The application of matrix calculus for belief functions,‖ International Journal of Approximate Reasoning, vol. 31, pp. 1-30, 2002.
[Smets, 1994]
Ph. Smets and R. Kennes, ‗‗The transferable belief model,‘‘ Artificial Intelligence, vol. 66, pp. 191-234, 1994.
[Srihari, 2004]
S. Srihari, A. Xu, and M. Kalera, ‗‗Learning strategies and classification methods for offline signature verification,‘‘ in Proc. 9th International Workshop on Frontiers in Handwriting Recognition (IWFHR), pp. 161–166, 2004.
[Starner, 1994]
T. Starner, J. Makhoul, R. Schwartz, and G. Chou, ―On-line cursive handwriting recognition using speech recognition methods,‖ in Proc. International Conference on Acoustics, Speech and Signal Processing, vol. 5, pp. 125–128, 1994.
[Sudano, 2002]
J. Sudano, ―The system probability information content (PIC) relationship to contributing components, combining independent multi-source beliefs, hybrid and pedigree pignistic probabilities,‖ in Proc. International Conference on Information Fusion (ICIF), Annapolis, Maryland, U.S.A., vol. 2, pp. 1277–1283, July 2002.
[Sun, 2010]
Y. Sun and L. Bentabet, ―A particle filtering and DSmT Based Approach for Conflict Resolving in case of Target Tracking with multiple cues,‖ Journal of Mathematical Imaging and Vision, vol. 36, no. 2, pp. 159-167, 2010.
[Tappert, 1990]
C. C. Tappert, C. Y. Suen, and T. Wakahara, ―The state of the art in on-line handwriting recognition,‖ IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 12, no. 8, pp. 787–808, 1990.
[Tarassenko, 1995]
L. Tarassenko, P. Hayton, and M. Brady, ―Novelty detection for the identification of masses in mammograms,‖ in Proc. 4th International IEEE Conference on Artificial Neural Networks, vol. 409, pp. 442–447, 1995.
[Tax, 1999]
D. M. J. Tax and R. P. W. Duin, ―Support Vector Domain Description,‖ Pattern Recogn. Lett, vol. 20, Issues 11-13, pp. 1191–1199, 1999.
[Tax, 2001]
D. Tax and R. Duin, ―Uniform object generation for optimizing one-class classifiers,‖ J. Machine Learning Research, vol. 2, pp. 155-173, 2001.
[Tay, 2003]
Y. H. Tay, P. M. Lallican, M. Khalid, C. Viard-Gaudin, and S. Knerr, ―Offline handwritten word recognition using a hybrid Neural network and hidden Markov model,‖ IEEE International Symposium on Computational Intelligence in Robotics and Automation, vol. 3, pp. 1190-1195, 2003.
[Tombak, 2001]
M. Tombak, A. Isotamm, and T. Tamme, ‗‗On logical method for counting Dedekind numbers,‘‘ Lecture Notes in Computer Science, Springer-Verlag, vol. 2138, pp. 424-427, 2001.
[Toselli, 2010]
A. H. Toselli, V. Romero, M. Pastor, and E. Vidal, ―Multimodal interactive transcription of text images,‖ Pattern Recognition, vol. 43, no. 5, pp. 1814–1825, 2010.
[Tran, 2005]
Q. A. Tran, X. Li, and H. Duan, ―Efficient performance estimate for one-class support vector machine,‖ Pattern Recognition Letters, vol. 26, pp. 1174–1182, 2005.
[Tumer, 1996]
K. Tumer and J. D. Ghosh, ―Estimating the bayes error rate through classifier combining,‖ in Proc. 13th International Conference on Pattern Recognition (ICPR), Austin, Texas, USA, pp. 695-699, 1996.
[Van, 2009]
C. E. van den Heuvel, K. Y. Franke, and L. G. Vuurpijl, et al., ―The ICDAR 2009 signature verification competition,‖ in Proc. 10th International Conference on Document Analysis and Recognition (ICDAR), Barcelona, Spain, pp. 1403-1407, July 26-29, 2009.
[Vapnik, 1995]
V. Vapnik. The nature of statistical learning theory. Springer Verlag, New York, USA, 1995.
[Vapnik, 1998]
V. Vapnik. Statistical Learning Theory. Wiley-Interscience Publication, 740 pages, 1998.
[Vatsa, 2009a]
M. Vatsa, R. Singh, A. Noore, and M. Houck, ‗‗Quality-Augmented Fusion of Level-2 and Level-3 Fingerprint Information using DSm Theory,‘‘ International Journal of Approximate Reasoning, vol. 50, no. 1, 2009.
[Vatsa, 2009b]
M. Vatsa, R. Singh, and A. Noore, ‗‗Unification of Evidence Theoretic Fusion Algorithms: A Case Study in Level-2 and Level-3 Fingerprint Features,‘‘ IEEE Transaction on Systems, Man, and Cybernetics - A, vol. 29, no. 1, 2009.
[Vatsa, 2010]
M. Vatsa, R. Singh, A. Ross, and A. Noore, ‗‗Dynamic Selection in Biometric Fusion Algorithms,‘‘ IEEE Transactions on Information Forensics and Security, vol. 5, no. 3, pp. 470-479, 2010.
[Voorbraak, 1991]
F. Voorbraak, ―Justification of Dempster‘s rule of combination,‖ Artificial Intelligence, vol. 48, no. 2, pp. 171-197, 1991.
[Wahba, 1993]
G. Wahba, Y. Wang, C. Gu, R. Klein, and B. Klein, ―Structured Machine Learning for 'Soft Classification' with Smoothing Spline ANOVA and Stacked Tuning, Testing and Evaluation,‖ in Proc. Advances in Neural Information Processing, J. D. Cowan, G. Tesauro, and J. Alspector, Eds. San Francisco, CA: Morgan Kaufmann Publishers, vol. 6, pp. 415-422, 1993.
[Walley, 1991]
P. Walley. Statistical Reasoning with Imprecise Probabilities. Chapman and Hall, London, 1991.
[Wan, 2003]
L. Wan, Z. Lin, and R. C. Zhao, ‗‗Signature verification using integrated classifiers,‘‘ in Proc. 4th Chinese Conference on Biometric Recognition, Beijing, China, pp. 7–8, 2003.
[Wang, 2011]
Q. F. Wang, F. Yin, and C. L. Liu, ―Improving Handwritten Chinese Text Recognition by Confidence Transformation,‖ in Proc. 11th International Conference on Document Analysis and Recognition (ICDAR), Beijing, China, pp. 518-522, September 18-21, 2011.
[Weston, 1998]
J. Weston and C. Watkins, Support Vector Machines for Multi-Class Pattern Recognition Machines, Tech. Rep, CSD-TR-98-04, Department of Computer Science, Royal Holloway, University of London, Egham, TW20 0EX, UK, 1998.
[Xu, 1992]
L. Xu, A. Krzyzak, and C. Y. Suen, ―Methods of combining multiple classifiers and their applications to handwriting recognition,‖ IEEE Transaction on Systems, Man, And Cybernetics, vol. 22, no. 3, pp. 418-435, 1992.
[Yang, 2007a]
M. Yang, Z. Yin, Z. Zhong, S. Wang, P. Chen, and Y. Xu, ―A Contourlet-based Method for Handwritten Signature Verification,‖ in Proc. IEEE International Conference on Automation and Logistics, Jinan, China, pp. 1561-1566, August 18-21, 2007.
[Yang, 2007b]
X. Y. Yang, J. Liu, M. Q. Zhang, and K. Niu, ―A New Multi-class SVM Algorithm Based on One-Class SVM,‖ in Proc. 7th Int. Conference on Computational Science, Part III: ICCS, Y. Shi et al., Eds. Berlin, Heidelberg: Springer-Verlag, vol. 4489, pp. 677-684, 2007.
[Zadeh, 1968]
L. A. Zadeh, ‗‗Fuzzy algorithm,‘‘ Information and Control, vol. 12, pp. 94-102, 1968.
[Zadeh, 1978]
L. A. Zadeh, ‗‗Fuzzy sets as the basis for a theory of possibility,‘‘ Fuzzy Sets Syst. vol. 1, pp. 3–28, 1978.
[Zadeh, 1979]
L. A. Zadeh, The Validity of Dempster‘s rule of Combination of Evidence, Memo M 79/24: University of California, Berkeley, 1979.
[Zhang, 2000]
G. P. Zhang, ―Neural Networks for Classification: A Survey,‖ IEEE Transactions on Systems, Man, and Cybernetics - Part C, vol. 30, no. 4, pp. 641-662, 2000.
[Zhang, 2002]
K. Zhang, E. Nyssen, and H. Sahli, ‗‗A multi-stage online signature verification system,‘‘ Pattern Analysis and Application, vol. 5, pp. 288-295, 2002.
[Zhun-ga, 2012]
Zhun-ga Liu, J. Dezert, G. Mercier, and Q. Pan, ―Dynamical Evidential Reasoning For Changes Detections In Remote Sensing Images,‖ IEEE Trans. on Geoscience and Remote Sensing, vol. 50, no. 5, pp. 1955-1967, 2012.
[Zois, 1999]
E. N. Zois and V. Anastassopoulos, ―Fusion of correlated decisions for writer verification,‖ Pattern Recognition, vol. 32, pp. 1821-1823, 1999.
[Zouari, 2004]
H. K. Zouari, Contribution à l'évaluation des méthodes de combinaison parallèle de classifieurs par simulation, Thèse de doctorat en Sciences appliquées, spécialité Informatique, Université de Rouen, U.F.R. des sciences et techniques, France, 270 pages, 2004.