ARTICLE IN PRESS
Neurocomputing 69 (2006) 869–873 www.elsevier.com/locate/neucom
Letters
Experimental comparison of one-class classifiers for online signature verification

Loris Nanni

DEIS, IEIIT—CNR, Università di Bologna, Viale Risorgimento 2, 40136 Bologna, Italy

Received 13 May 2005; received in revised form 9 June 2005; accepted 12 June 2005
Available online 3 October 2005
Communicated by R.W. Newcomb
Abstract

Online signature verification systems based on one-class classifiers are presented. Global information is extracted with a feature-based representation and recognized using an ensemble of one-class classifiers. Experimental results obtained on the SUBCORPUS-100 MCYT signature database (100 signers, 5000 signatures) show that the machine experts proposed here outperform state-of-the-art systems for both random and skilled forgeries.
© 2005 Elsevier B.V. All rights reserved.

Keywords: Online signature; Parzen window classifiers; One-class classifiers
1. Introduction

In online signature verification, the time functions of the dynamic signing process (e.g., position trajectories or pressure versus time) are available for recognition. The main approaches proposed in the literature for extracting relevant information from online signature data [1,2,4,5,8,9] are: (i) feature-based approaches, in which a holistic vector representation consisting of global features is derived from the acquired signature trajectories, and (ii) function-based approaches, in which time sequences describing local properties of the signature are used for recognition (e.g., position trajectory, velocity, acceleration, force or pressure). In [7], an online signature verification system based on fusion of local and global information is presented. In this paper, we propose a new approach that extends that work with respect to the global feature-based machine expert. The results obtained by the subsystem exploiting global information in [7] were characterized by high variability of performance as the system parameter (number of features retained) changed; in [13], the authors show that a random subspace-based ensemble of Parzen window classifiers [12] makes the performance more stable with respect to the system parameters.

The problem in one-class classification is to build a description of a target set of objects. The difference from conventional classification is that in one-class classification only examples of one class are available. The objects from this class are called target objects; all other objects are, by definition, outlier objects. In the online signature verification problem, an object is a signature, while individuals are the classes. Using one-class classifiers, we can build a classifier for each individual without any knowledge of the other individuals.

Results using all 5000 signatures from the 100 subjects of the SUBCORPUS-100 MCYT bimodal biometric database [6] are presented, yielding remarkable performance improvements with both random and skilled forgeries.

The machine expert based on global information is described in Section 2. The experimental procedure and results are given in Section 3. Finally, conclusions are drawn in Section 4.

Corresponding author at: DEIS, IEIIT—CNR, Alma Mater Studiorum, Università di Bologna, Viale Risorgimento 2, 40136 Bologna, Italy. Tel.: +39 349 3511673; fax: +39 0547 338890. E-mail address: [email protected].

0925-2312/$ - see front matter © 2005 Elsevier B.V. All rights reserved.
doi:10.1016/j.neucom.2005.06.007
2. Machine expert based on global information

The system proposed here is based on previous work [7,13], where the complete set of 100 global features is detailed (a d-dimensional feature vector, d = 100). The random subspace method is the combining technique proposed by Ho [3]: it modifies the training data set (generating K new training sets), builds classifiers on these modified training sets, and then combines them into a final decision rule. Each new training set contains only a subset of all the features; the percentage of features retained in each training set is fixed at 60%. In our experiments we set K = 100, and we combine the classifiers using the "max rule" [10]: considering the similarity matching score estimated by each of the K classifiers between an input signature s and its claimed identity c, the max rule selects as final score the maximum over the pool of K classifiers.

In this paper we make an experimental comparison of random subspace ensembles of one-class classifiers. In the following, a description of these one-class classifiers is given (for a detailed treatment of these methods see [14]).
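The random subspace combination just described can be sketched as follows. This is a toy illustration, not the authors' implementation: the Parzen one-class scorer, h = 0.125, the 60% feature fraction, K = 100 and the max rule follow the text, while the data and the feature dimensionality are placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

def parzen_score(x, targets, h=0.125):
    # Mean Gaussian-window response of x over the target (genuine) set;
    # a one-class Parzen density estimate with identity covariance.
    d = targets.shape[1]
    sq = np.sum((targets - x) ** 2, axis=1)
    norm = (2 * np.pi) ** (d / 2) * h ** d
    return np.mean(np.exp(-sq / (2 * h ** 2)) / norm)

def train_subspaces(n_features, k=100, keep=0.60, seed=0):
    # Random subspace method: draw K random feature subsets,
    # each retaining 60% of the features.
    r = np.random.default_rng(seed)
    m = int(round(keep * n_features))
    return [r.choice(n_features, size=m, replace=False) for _ in range(k)]

def max_rule_score(x, genuine, subspaces, h=0.125):
    # Max rule: the final score is the maximum over the K subspace classifiers.
    return max(parzen_score(x[idx], genuine[:, idx], h) for idx in subspaces)

# Toy usage: 5 genuine training signatures with d = 10 placeholder features.
genuine = rng.normal(size=(5, 10))
subspaces = train_subspaces(n_features=10, k=100)
query = genuine[0] + 0.01 * rng.normal(size=10)
score = max_rule_score(query, genuine, subspaces)
```

A query close to a genuine training signature receives a high max-rule score, while a distant forgery-like vector scores near zero.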
2.1. Gaussian model description (GAUSSD)

The probability distribution for a d-dimensional object x is given by

f(x) = \frac{1}{(2\pi)^{d/2}\det(\Sigma)^{1/2}} \exp\left(-\frac{1}{2}(x-\mu)^{T}\Sigma^{-1}(x-\mu)\right),   (1)

where det(·) denotes the determinant of the matrix, μ is the mean and Σ is the covariance matrix of the training set. The classifier is very simple, and it imposes a strict unimodal and convex density model on the data.

2.2. Mixture of Gaussians description (MOGD)

The Gaussian distribution assumes a very strong model of the data: it should be unimodal and convex. To obtain a more flexible density method, the normal distribution can be extended to a mixture of Gaussians (MoG), a linear combination of normal distributions. An MoG has a smaller bias than a single Gaussian distribution, but it requires far more data for training (and thus shows more variance when a limited amount of data is available). In this paper, the number of Gaussians is fixed to 3 (MOGD_3) or 2 (MOGD_2); the means and covariances of the individual Gaussian components can be estimated efficiently by an expectation-maximization routine.

2.3. Nearest-neighbour method description (NND)

Here, a new object x is evaluated by computing the distance to its nearest neighbour NN(x). This distance is normalized by the nearest-neighbour (Euclidean) distance of NN(x) itself, i.e., the distance between NN(x) and NN(NN(x)).

2.4. Principal component analysis description (PCAD)

This method describes the target data by a linear subspace, defined by the eigenvectors of the covariance matrix of the training set. Only k eigenvectors are used; assume they are stored in a d × k matrix W (where d is the dimensionality of the original feature space). To check whether a new object fits the target subspace, the reconstruction error is computed: the difference between the original object and its projection onto the subspace (expressed in the original data space). This projection is computed by

x_{proj} = W(W^{T}W)^{-1}W^{T}x.   (2)

The reconstruction error is

f(x) = \|x - x_{proj}\|^{2}.   (3)

2.5. Support vector data description (SVD)

To describe the domain of a data set, we enclose the data in a hypersphere of minimum volume. By minimizing the volume of the captured feature space, we hope to minimize the chance of accepting outlier objects. Analogously to "standard" support vector machines, we can replace the inner products (x · y) by kernel functions K(x, y), which yields a much more flexible method. In particular, the Gaussian kernel appears to provide a good data transformation.

2.6. Linear programming description (LPD)

This data descriptor is specifically constructed to describe target objects represented in terms of distances to other objects; in some cases it may be much easier to define distances between objects than informative features. The classifier has the form

f(x) = \sum_{i} w_{i}\, d(x, x_{i}),   (4)

where d(x, y) is the Euclidean distance between x and y. The weights w are optimized such that only a few weights remain non-zero and the boundary is as tight as possible around the data.

2.7. Parzen window classifier (PWC)

The Parzen window density estimate can be used to approximate the probability density p(x) of a vector of continuous random variables X. It involves the superposition of a normalized window function centred on a set of samples. Given a set of n d-dimensional samples D = {x_1, x_2, …, x_n}, the Parzen window estimate of the pdf
is given by

\hat{p}(x) = \frac{1}{n} \sum_{i=1}^{n} \varphi(x - x_i; h),   (5)

where φ(·; ·) is the window function and h is the window width parameter (h = 0.125 in our tests). We use Gaussian kernels with covariance matrix Σ = I (the identity) as window function (det(·) denotes the determinant of the matrix):

\varphi(y; h) = \frac{1}{(2\pi)^{d/2} h^{d} \det(\Sigma)^{1/2}} \exp\left(-\frac{y^{T}\Sigma^{-1}y}{2h^{2}}\right).   (6)

We obtain the estimate of the conditional pdf \hat{p}(s|c) of each class c using the Parzen window method as

\hat{p}(s|c) = \frac{1}{\#D_c} \sum_{s_i \in D_c} \varphi(s - s_i; h),   (7)
where #D_c denotes the cardinality of the set D_c. Given the feature vector s of an input signature and a claimed identity c, the verification procedure of the PWC is performed by evaluating Eq. (7).

3. Experiments

The 100 signers of the SUBCORPUS-100 MCYT database are used for the experiments (100 signers with 25 genuine signatures and 25 skilled forgeries per signer; forgers are provided the signature images of the clients to be forged and, after practising with them several times, are asked to imitate the shape with natural dynamics, i.e., without breaks or slowdowns). The signature corpus is divided

Table 1
Comparison of the one-class classifiers (EER, %)
          Skilled_5  Skilled_20  Random_5  Random_20
PWC       9.7        5.2         3.4       1.4
SVD       8.9        5.4         3.8       1.6
NND       12.2       6.3         6.9       2.1
MOGD_3    8.9        7.3         5.4       4.3
MOGD_2    8.1        7.0         5.2       3.3
GAUSSD    7.7        4.4         5.1       1.5
LPD       9.4        5.6         3.6       2.5
PCAD      7.9        4.2         3.8       1.4
into the training and the test sets. When considering skilled forgeries, the training set comprises the first 5 or the first 20 genuine signatures, and the test set consists of the remaining samples (i.e., 100 × 20 or 100 × 5 client similarity scores, respectively, and 100 × 25 impostor similarity test scores). When considering random forgeries (i.e., impostors claiming others' identities using their own signatures), client similarity scores are as above, and we use one signature of every other user as impostor data, so the number of impostor similarity scores is 100 × 99.

We compare our methods with the approach based on global information (GI) proposed in [13], which to our knowledge is the state-of-the-art method for signature verification. Note that GI outperforms [7,13] the local system based on a hidden Markov model that ranked first and second for random and skilled forgeries, respectively, in the Signature Verification Competition 2004 [11].

For performance evaluation we adopt the equal error rate (EER) [4], i.e., the error rate at which the frequency of fraudulent accesses (false acceptance rate, FAR) and the frequency of rejections of people who should be correctly verified (false rejection rate, FRR) are equal; it can be adopted as a single measure characterizing the security level of a biometric system.

In Tables 1–6 we report the EER obtained by our systems; these tables show that our systems improve on the state-of-the-art signature matcher [7]. In these tables, Skilled_5 denotes skilled forgeries with a training set comprising the first 5 genuine signatures; Skilled_20, skilled forgeries with the first 20 genuine signatures; Random_5, random forgeries with the first 5 genuine signatures; and Random_20, random forgeries with the first 20 genuine signatures.
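The EER adopted above can be computed from the client and impostor score sets by sweeping a decision threshold and reading off the point where FAR and FRR (almost) cross. A minimal sketch, with made-up scores in place of the real similarity scores:

```python
import numpy as np

def equal_error_rate(client_scores, impostor_scores):
    # Sweep a threshold over all observed scores; FAR is the fraction of
    # impostor scores accepted, FRR the fraction of client scores rejected.
    # The EER is taken where |FAR - FRR| is smallest.
    thresholds = np.sort(np.concatenate([client_scores, impostor_scores]))
    gap, eer = 1.0, 1.0
    for t in thresholds:
        far = np.mean(impostor_scores >= t)  # accepted impostors
        frr = np.mean(client_scores < t)     # rejected clients
        if abs(far - frr) < gap:
            gap, eer = abs(far - frr), (far + frr) / 2
    return eer

# Made-up, well-separated score sets for illustration.
rng = np.random.default_rng(1)
client = rng.normal(1.0, 0.2, 500)
impostor = rng.normal(0.0, 0.2, 500)
eer = equal_error_rate(client, impostor)
```

With perfectly separated score sets the EER is 0; as the two distributions overlap more, it rises towards 0.5.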
The experimental results show that:
(1) The approach based on GI proposed in [13] (PWC) is not always the best method;
(2) Fusion between classifiers (e.g., PCAD+PWC, NND+PWC, LPD+PCAD) reduces the EER;
Table 2
Fusion using the "sum rule" between the classifiers, Skilled_5 (EER, %; diagonal entries are the single classifiers)

          SVD   NND   MOGD_3  MOGD_2  GAUSSD  LPD   PCAD  PWC
SVD       8.9   9.0   8.4     8.4     8.4     6.9   8.3   6.6
NND       —     12.2  10.3    10.3    10.3    7.3   8.4   7.4
MOGD_3    —     —     8.9     8.5     8.1     9.4   7.3   8.6
MOGD_2    —     —     —       8.1     8.0     9.4   7.2   8.6
GAUSSD    —     —     —       —       7.7     9.4   7.2   8.6
LPD       —     —     —       —       —       9.4   6.5   9.0
PCAD      —     —     —       —       —       —     7.9   6.5
PWC       —     —     —       —       —       —     —     9.7
Table 3
Fusion using the "sum rule" between the classifiers, Skilled_20 (EER, %; diagonal entries are the single classifiers)

          SVD   NND   MOGD_3  MOGD_2  GAUSSD  LPD   PCAD  PWC
SVD       5.4   4.4   5.0     5.0     5.0     3.5   4.2   3.0
NND       —     6.3   5.5     5.4     5.4     3.6   3.8   3.1
MOGD_3    —     —     7.3     6.0     4.8     5.6   4.2   5.2
MOGD_2    —     —     —       7.0     4.6     5.6   4.2   5.2
GAUSSD    —     —     —       —       4.4     5.6   4.2   5.2
LPD       —     —     —       —       —       5.6   3.0   4.8
PCAD      —     —     —       —       —       —     4.2   2.8
PWC       —     —     —       —       —       —     —     5.2
Table 4
Fusion using the "sum rule" between the classifiers, Random_5 (EER, %; diagonal entries are the single classifiers)

          SVD   NND   MOGD_3  MOGD_2  GAUSSD  LPD   PCAD  PWC
SVD       3.8   3.7   3.7     3.7     3.7     2.2   3.7   2.2
NND       —     6.9   5.8     5.8     5.8     2.5   4.2   2.8
MOGD_3    —     —     5.4     5.2     5.0     3.6   3.5   3.3
MOGD_2    —     —     —       5.2     5.0     3.6   3.6   3.3
GAUSSD    —     —     —       —       5.1     3.6   3.6   3.3
LPD       —     —     —       —       —       3.6   2.1   3.2
PCAD      —     —     —       —       —       —     3.8   2.6
PWC       —     —     —       —       —       —     —     3.4
Table 5
Fusion using the "sum rule" between the classifiers, Random_20 (EER, %; diagonal entries are the single classifiers)

          SVD   NND   MOGD_3  MOGD_2  GAUSSD  LPD   PCAD  PWC
SVD       1.6   1.3   1.5     1.5     1.5     1.6   1.3   1.1
NND       —     2.1   1.9     1.8     1.8     0.7   0.9   0.7
MOGD_3    —     —     4.3     3.0     1.7     2.5   1.3   1.3
MOGD_2    —     —     —       3.3     1.6     2.5   1.3   1.3
GAUSSD    —     —     —       —       1.5     2.5   1.3   1.3
LPD       —     —     —       —       —       2.5   1.3   1.2
PCAD      —     —     —       —       —       —     1.4   1.1
PWC       —     —     —       —       —       —     —     1.4
We thus propose a fusion that combines a discriminant function-based classifier (LPD) and a subspace-based classifier (PCAD).
Table 6
Comparison among the best methods tested in this paper (EER, %)

            PWC   PCAD+PWC  NND+PWC  LPD+PCAD
Skilled_5   9.7   6.5       7.4      6.5
Skilled_20  5.2   2.8       3.1      3.0
Random_5    3.4   2.6       2.8      2.1
Random_20   1.4   1.1       0.7      1.3
(3) The best method is the fusion of LPD and PCAD. It is well known in the literature [15] that classifier ensembles that enforce diversity fare better than those that do not; it is believed [15] that classifiers based on different methodologies offer complementary information about the patterns to be classified.
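The "sum rule" fusion reported in Tables 2–6 can be sketched as below, here pairing a PCA reconstruction-error descriptor (PCAD) with a Parzen window scorer (PWC) on toy data. The min-max score normalization before summing is our assumption (the text does not specify one), and the reconstruction error is negated so that, for both experts, larger means more genuine.

```python
import numpy as np

def pcad_score(x, W, mean):
    # Negated PCA reconstruction error, cf. eqs. (2)-(3); higher = better fit.
    y = x - mean
    proj = W @ np.linalg.solve(W.T @ W, W.T @ y)
    return -np.sum((y - proj) ** 2)

def pwc_score(x, targets, h=0.5):
    # Unnormalized Parzen window score, cf. eq. (7); higher = more genuine.
    sq = np.sum((targets - x) ** 2, axis=1)
    return np.mean(np.exp(-sq / (2 * h ** 2)))

def minmax(v):
    v = np.asarray(v, dtype=float)
    return (v - v.min()) / (v.max() - v.min() + 1e-12)

def sum_rule(a, b):
    # Sum rule after per-expert min-max normalization (our assumption;
    # raw reconstruction errors and densities live on different scales).
    return minmax(a) + minmax(b)

# Toy target class: points near a 2-D plane embedded in 5-D.
rng = np.random.default_rng(2)
basis = rng.normal(size=(5, 2))
targets = (rng.normal(size=(20, 2)) @ basis.T) * 0.3
mean = targets.mean(axis=0)
W = np.linalg.svd(targets - mean, full_matrices=False)[2][:2].T  # top-2 PCs

# Three near-target queries followed by three gross outliers.
queries = np.vstack([targets[:3] + 0.01, rng.normal(size=(3, 5)) * 3.0])
fused = sum_rule([pcad_score(q, W, mean) for q in queries],
                 [pwc_score(q, targets) for q in queries])
```

The fused score separates the two groups even when one expert alone is ambiguous, which is the complementarity argument made in point (3).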
4. Conclusions

Online signature verification systems based on global analysis of input signatures and one-class classifiers have been described. Our fusion of PCAD and PWC drastically outperforms the state-of-the-art approach (PWC). Moreover, we show that LPD+PCAD is the best method when the training set comprises only the first 5 genuine signatures.

Acknowledgments

The author would like to thank J. Fierrez-Aguilar and J. Ortega-Garcia for sharing the SUBCORPUS-100 MCYT data set.
References

[1] J.C. Bezdek, Pattern Recognition with Fuzzy Objective Function Algorithms, Plenum, New York, 1981.
[2] R.O. Duda, P.E. Hart, D.G. Stork, Pattern Classification, second ed., Wiley, New York, 2000.
[3] T.K. Ho, The random subspace method for constructing decision forests, IEEE Trans. Pattern Anal. Mach. Intell. 20 (8) (1998) 832–844.
[4] A.K. Jain, F. Griess, S. Connell, On-line signature verification, Pattern Recog. 35 (12) (2002) 2963–2972.
[5] J. Ortega-Garcia, J. Fierrez-Aguilar, J. Martin-Rello, J. Gonzalez-Rodriguez, Complete signal modelling and score normalization for function-based dynamic signature verification, Audio- and Video-Based Biometric Person Authentication 2003, Guildford, UK, 9–11 June 2003, pp. 658–667.
[6] J. Ortega-Garcia, J. Fierrez-Aguilar, D. Simon, et al., MCYT baseline corpus: a bimodal biometric database, IEE Proc. Vision Image Signal Process. 150 (6) (2003) 395–401.
[7] J. Fierrez-Aguilar, L. Nanni, J. Lopez-Penalba, J. Ortega-Garcia, D. Maltoni, An on-line signature verification system based on fusion of local and global information, Audio- and Video-Based Biometric Person Authentication 2005, New York, USA, 20–22 July 2005, to appear.
[8] R. Plamondon, G. Lorette, Automatic signature verification and writer identification—the state of the art, Pattern Recog. 22 (2) (1989) 107–131.
[9] D. Sakamoto, et al., On-line signature verification incorporating pen position, pen pressure and pen inclination trajectories, IEEE International Conference on Acoustics, Speech, and Signal Processing 2001, Salt Lake City, USA, May 2001, pp. 993–996.
[10] J. Kittler, M. Hatef, R. Duin, J. Matas, On combining classifiers, IEEE Trans. Pattern Anal. Mach. Intell. 20 (3) (1998) 226–239.
[11] D.Y. Yeung, et al., SVC2004: first international signature verification competition, International Conference on Biometric Authentication 2004, 15–17 July 2004, pp. 16–22.
[12] E. Parzen, On the estimation of a probability density function and mode, Ann. Math. Statist. 33 (3) (1962) 1065–1076.
[13] L. Nanni, A. Lumini, Ensemble of Parzen window classifiers for online signature verification, Neurocomputing 68 (2005) 217–224.
[14] D.M.J. Tax, One-class classification: concept-learning in the absence of counter-examples, Ph.D. thesis, Delft University of Technology, June 2001, ISBN 90-75691-05-x.
[15] L.I. Kuncheva, Diversity in multiple classifier systems, Inform. Fusion 6 (1) (2005) 3–4.
Loris Nanni is a Ph.D. candidate in Computer Engineering at the University of Bologna, Italy. He received his Master's degree cum laude in 2002 from the University of Bologna and, in the same year, started his Ph.D. in Computer Engineering at DEIS, University of Bologna. His research interests include pattern recognition and biometric systems (fingerprint classification and recognition, signature verification, face recognition).