Expert Systems with Applications 36 (2009) 5291–5296
Ensemble of on-line signature matchers based on OverComplete feature generation

Alessandra Lumini, Loris Nanni *

DEIS, IEIIT – CNR, Università di Bologna, Viale Risorgimento 2, 40136 Bologna, Italy
Keywords: On-line signature; Ensemble of classifiers; One-class classifiers; OverComplete feature generation
Abstract: A novel method for building an ensemble of on-line signature verification systems based on one-class classifiers is presented. The ensemble is built by concatenating the classifiers obtained by applying the Random Subspace method to the ‘‘original features” with a set of classifiers, each trained on a different set of ‘‘artificial features” selected for a different subset of users belonging to the validation set. The ‘‘artificial features” are extracted using an OverComplete global feature combination: starting from a set of global features, a set of artificial features is created by applying mathematical operators to a randomly extracted subset of the original ones; a small subset is then selected for verification by running sequential forward floating selection (SFFS). Finally, a set of one-class classifiers is used to classify each match between two signatures as genuine or impostor. The MCYT signature database is used as dataset; our results show that the proposed ensemble outperforms the ensembles based only on the original features. Using only 5 genuine signatures per user, our best system obtains an equal error rate of 4.5 on the skilled forgeries and 1.4 on the random forgeries; when 20 genuine signatures are used to train the classifiers, equal error rates of 2.2 on the skilled forgeries and 0.5 on the random forgeries are obtained. © 2008 Elsevier Ltd. All rights reserved.
1. Introduction

The difference between on-line and off-line signature verification is that in the on-line case the features describing the signature are extracted from the dynamic signing process (Kholmatov & Yanikoglu, 2005). See Fig. 1 for the information that can be acquired by an acquisition device for a given on-line signature. Several methods have been proposed in the literature (Jain, Griess, & Connell, 2002; Ortega-Garcia, Fierrez-Aguilar, Martin-Rello, & Gonzalez-Rodriguez, 2003; Plamondon & Lorette, 1989; Sakamoto et al., 2001); it is possible to distinguish three approaches for extracting information from on-line signatures: (a) Feature-based approaches: a set of global features is used to train a classifier (Fierrez-Aguilar, Nanni, Lopez-Penalba, Ortega-Garcia, & Maltoni, 2005). In Nanni (2006) and Nanni and Lumini (2005) it is shown that a Random Subspace of one-class classifiers obtains better results than a stand-alone one-class classifier.
* Corresponding author. E-mail address: [email protected] (L. Nanni).
0957-4174/$ - see front matter © 2008 Elsevier Ltd. All rights reserved. doi:10.1016/j.eswa.2008.06.069
(b) Local function-based approaches: the time functions of the different signatures are directly matched using elastic distance measures such as dynamic time warping (DTW) (Martin et al., 1997; Piyush Shanker & Rajagopalan, 2007). For example, the authors of Martin et al. (1997) propose a modified DTW algorithm, based on the stability of the components of the signature, that outperforms the standard DTW. (c) Regional function-based approaches: each signature is described by a sequence of vectors of regional properties, and these vectors are used to train a given classifier (e.g. Hidden Markov Models (HMM); Fierrez-Aguilar, Ortega-Garcia, et al. (2005); Fierrez-Aguilar, Krawczyk, et al. (2005)). In Nanni and Lumini (2008) the Discrete Cosine Transform (DCT) is used to reduce the approximation coefficients vector obtained by a given wavelet transform to a feature vector of a given dimension; the Linear Programming Descriptor classifier is then trained using the DCT coefficients. Finally, we want to stress that one of the major research trends in on-line signature verification is to combine different systems to build an ensemble of classifiers (e.g. Van et al., xxxx; Fierrez-Aguilar et al., 2005; Nanni & Lumini, 2005; Fierrez-Aguilar, Krawczyk, et al., 2005; Nanni, 2006).
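To make the elastic matching in (b) concrete, the classic DTW recurrence can be sketched as follows. This is our own minimal illustration on toy data (function and variable names are ours), not the modified algorithm of Martin et al. (1997).

```python
# Illustrative sketch: matching two on-line signatures, represented as
# sequences of (x, y) samples, with classic dynamic time warping.
import math

def dtw_distance(sig_a, sig_b):
    """Elastic distance between two signatures (lists of (x, y) tuples)."""
    n, m = len(sig_a), len(sig_b)
    INF = float("inf")
    # cost[i][j] = minimal accumulated cost aligning sig_a[:i] with sig_b[:j]
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(sig_a[i - 1], sig_b[j - 1])  # local point distance
            # the three classic warping moves
            cost[i][j] = d + min(cost[i - 1][j - 1],   # match
                                 cost[i - 1][j],       # insertion
                                 cost[i][j - 1])       # deletion
    return cost[n][m]

genuine = [(0, 0), (1, 1), (2, 2), (3, 3)]
slow_copy = [(0, 0), (0.5, 0.5), (1, 1), (2, 2), (3, 3)]  # same shape, slower pen
print(dtw_distance(genuine, slow_copy))  # ≈ 0.707: warping absorbs the speed change
```

Because the warping path may stretch or compress time, two signatures traced at different speeds but with the same shape obtain a small distance, which is exactly why DTW suits on-line signatures.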
Fig. 1. Information acquired for a given signature (from Fierrez-Aguilar et al. (2005)). The signature is parameterized by 7 discrete-time functions (a) and the first-order time derivatives of all of them (b). Ns, p, θ, v, ρ and a stand, respectively, for signature time duration in time samples, pressure, path tangent angle, path velocity magnitude, log curvature radius and total acceleration magnitude.
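The first-order time derivatives in (b) are typically obtained from the sampled functions in (a) by finite differences; a minimal sketch (our own illustration with a toy sample sequence, not the authors' code):

```python
# Sketch: first-order time derivative of a sampled signature function via
# central finite differences (one-sided differences at the endpoints).
def time_derivative(samples, dt=1.0):
    """samples: equally spaced values of one time function; returns d/dt."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    deriv = []
    for i in range(n):
        if i == 0:                        # forward difference at the start
            d = (samples[1] - samples[0]) / dt
        elif i == n - 1:                  # backward difference at the end
            d = (samples[-1] - samples[-2]) / dt
        else:                             # central difference inside
            d = (samples[i + 1] - samples[i - 1]) / (2 * dt)
        deriv.append(d)
    return deriv

path = [0, 1, 3, 4, 4]                    # toy samples of one time function
print(time_derivative(path))              # [1.0, 1.5, 1.5, 0.5, 0.0]
```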
In this work, a novel method for building an ensemble of one-class classifiers (Duda, Hart, & Stork, 2000; Parzen, 1962) is presented. The ensemble is built by concatenating the classifiers obtained by applying the Random Subspace method to the ‘‘original features” with a set of classifiers, each trained using the ‘‘artificial features” selected for a random set of validation users. The idea of creating a set of ‘‘artificial” features starting from a set of ‘‘original” features was first proposed in Xiao, Zhang, and Zhang (2004). Those authors propose a three-step feature selection and combination for finding an appropriate representation of the patterns: in the first step they remove the irrelevant original features, then a polynomial composition set is built, and finally the most discriminative features are selected by a feature selection algorithm. In this work the artificial features are created using the method that we have proposed in Nanni and Lumini (in press b): from a first set of features, named ‘‘original”, a second set, named ‘‘artificial”, is generated by applying mathematical operators to a randomly extracted subset of the original ones; a small subset is then selected for classification by running Sequential Forward Floating Selection (SFFS) (Pudil et al., 1994). The results reported in Section 3, obtained using the 100 subjects of the SUBCORPUS-100 MCYT Bimodal Biometric Database (Ortega-Garcia, Fierrez-Aguilar, Simon, et al., 2003), confirm the effectiveness of the proposed ensemble of classifiers. The proposed system is described in Section 2. Experimental results are given in Section 3. Finally, the conclusions are drawn in Section 4.

2. System proposed

In this work we use the same set of 100 global features (the ‘‘original” features) detailed in Fierrez-Aguilar et al. (2005), Nanni
and Lumini (2005). Several one-class classifiers1,2 are tested (for a detailed description of these methods see Tax (2007)): Gaussian model description (GAUSSD); mixture of Gaussians description (MOGD), where in this paper the number of Gaussians is fixed to 3; nearest neighbour method description (NND); principal component analysis description (PCAD); support vector data description (SVDD); linear programming description (LPD); Parzen window classifier (PWC). We have used the same one-class classifiers with the same parameters proposed in Nanni (2006); in this way an easy comparison is possible between the proposed ensemble and the standard Random Subspace on the ‘‘original” features. The Random Subspace method (Ho, 1998) builds the ensemble by perturbing the feature set: each classifier is trained on a modified training dataset that contains a subset (60% in this paper) of all the features. In our experiments, we create 100 different training sets and combine the classifiers using the ‘‘max rule” (Kittler, Hatef, Duin, & Matas, 1998), as in Nanni (2006). Considering the similarity matching scores estimated by each of the K classifiers between an input signature s and its claimed identity c, the max rule selects as final score the maximum score of the pool of K classifiers.

To generate the ‘‘artificial” features we use the method that we have proposed in Nanni and Lumini (in press b): a large number (N = 1000) of ‘‘artificial” features is obtained by applying a random number r (between 1 and 5) of mathematical operators to one or more ‘‘original” features. The input of the first operator is always one or two ‘‘original” features, while the ith operator takes as input the output of the (i−1)th operator and, only for binary operators, a randomly extracted ‘‘original” feature. The mathematical operators3 are: (a) sum; (b) subtraction; (c) product; (d) division; (e) square; (f) square root; (g) sine; (h) cosine; (i) arc sine; (j) arc cosine; (k) tangent; (l) hyperbolic tangent; (m) reciprocal; (n) logarithm; (o) absolute value; (p) negative value; (q) nothing.

To extract different sets of artificial features we randomly extract a validation set from the dataset, and then extract different subsets of the validation set. On each subset we run the sequential forward floating selection (SFFS)4 to select a subset of features. The objective function of the SFFS is, as in Nanni and Lumini (in press b), the minimization of ‘‘the Equal Error Rate (see Section 3) obtained on the skilled forgeries + 3 × (the Equal Error Rate obtained on the random forgeries)”, using PWC as one-class classifier, trained with 5 genuine signatures per user. We then train a different classifier (combined by the sum rule) for each set of ‘‘artificial” features selected from the random subsets of the validation set; in this work we extract 25 different subsets of the validation set, hence we train 25 different one-class classifiers. Finally, we combine, by the sum rule, the ensemble of classifiers trained in the original space with the ensemble trained using the ‘‘artificial” features. We call this ensemble OR + AR; notice that before the fusion the scores of the two ensembles are normalized to mean 0 and standard deviation 1 (as suggested in Jain, Nandakumar, and Ross (2005)). The rationale of the method is that the feature subsets are user-dependent: in this way it is possible to build an ensemble by selecting different sets of features from different sets of validation users.

1 All the classifiers of this work are implemented as in the free DD_Tool 0.95 or PRTools 3.1.7 Matlab toolboxes.
2 The difference with conventional classification is that in one-class classification only examples of one class (i.e. one user) are available.

Fig. 2. Block-diagram of the proposed biometric authentication system (on-line signature training set → validation set extraction → Random Subspace on the original features and artificial generation + SFFS → one-class classifiers → fusion).
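The OverComplete generation step just described can be sketched as follows. This is our own illustration, assuming each signature is already a vector of ‘‘original” global features: function names and numerical guards are ours, and only a representative subset of the 17 operators is implemented.

```python
# Sketch of OverComplete artificial feature generation: a random chain of
# r in [1, 5] operators applied to randomly chosen original features.
import math
import random

UNARY = {
    "square": lambda a: a * a,
    "sqrt": lambda a: math.sqrt(abs(a)),            # guarded for negatives
    "cos": lambda a: math.cos(a),
    "tanh": lambda a: math.tanh(a),
    "reciprocal": lambda a: a / (a * a + 1e-12),    # guarded 1/a
    "log": lambda a: math.log(abs(a) + 1e-12),      # guarded logarithm
    "abs": abs,
    "neg": lambda a: -a,
    "nothing": lambda a: a,
}
BINARY = {
    "sum": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "prod": lambda a, b: a * b,
    "div": lambda a, b: a / (b if abs(b) > 1e-12 else 1e-12),
}

def random_recipe(n_original, rng):
    """A chain of r in [1, 5] operators; the first step starts from an
    original feature, and each binary step consumes one extra original."""
    r = rng.randint(1, 5)
    steps = []
    for _ in range(r):
        if rng.random() < 0.5:
            steps.append((rng.choice(sorted(BINARY)), rng.randrange(n_original)))
        else:
            steps.append((rng.choice(sorted(UNARY)), None))
    return rng.randrange(n_original), steps

def apply_recipe(original, start, steps):
    value = original[start]
    for op, idx in steps:
        value = BINARY[op](value, original[idx]) if idx is not None else UNARY[op](value)
    return value

rng = random.Random(0)
original = [0.3, 1.7, -2.1, 0.9]                                     # toy originals
recipes = [random_recipe(len(original), rng) for _ in range(1000)]   # N = 1000
artificial = [apply_recipe(original, s, ops) for s, ops in recipes]
```

The same 1000 recipes, drawn once, would then be evaluated on every signature, and SFFS would keep only the few recipes that minimize the EER-based objective on the validation users.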
Fig. 2 shows the block-diagram of the proposed biometric authentication system.

3. Experiments

The 100 signers of the SUBCORPUS-100 MCYT database (Ortega-Garcia et al., 2003) are used for the experiments (100 signers with 25 genuine signatures and 25 skilled forgeries each). We have used the same performance indicator, the Equal Error Rate (EER)5 (Jain et al., 2002), and the same protocol used in several other papers (e.g. Fierrez-Aguilar et al., 2005; Nanni, 2006):
3 The binary operators are: sum; subtraction; product; division.
4 Implemented as in the PRTools 3.1.7 Matlab toolbox.
5 EER is the error rate at which the frequency of fraudulent accesses (False Acceptance Rate, FAR) and the frequency of rejections of people who should be correctly verified (False Rejection Rate, FRR) assume the same value.
Skilled Forgeries: the training set comprises the first 5 genuine signatures or the first 20 genuine signatures, and the test set consists of the remaining samples. Random Forgeries: client similarity scores are computed as above, and one signature of every other user is used as impostor data.

In this work 50 users are extracted to build the validation set, and several subsets of 25 validation users are used to select the different subsets of ‘‘artificial” features. We run the experiments five times and report the average results. Notice that the validation users are used neither in the training stage nor in the testing stage. We compare our methods with the approaches based on global information proposed in Nanni (2006) and Nanni and Lumini (2005), which to our knowledge are the state-of-the-art methods for on-line signature verification based on global features. We want to stress that the approaches proposed in Nanni (2006) also outperform the local system based on Hidden Markov Models that was ranked in first and second place, for random and skilled forgeries, respectively, in the Signature Verification Competition 2004 (Yeung, 2004).

In Figs. 3–6 we report the equal error rate (EER) obtained by changing the number NE of ‘‘artificial” features selected in each subset of the validation set. In each figure there are two images: the left image contains the EER obtained by the ensemble where only the ‘‘artificial” features are considered, while the right image contains the EER obtained by the ensemble named OR + AR (see Section 2). Our experiments show that: the obtained EER is very similar when varying NE; the ensemble obtained by combining both ‘‘artificial” features and ‘‘original” features obtains the best performance (mainly in the random forgeries protocol). In Table 1 we compare the performance of each stand-alone classifier on both skilled forgeries and random forgeries.
In each cell of this table there are two values: the left value is the EER obtained in Nanni (2006), where the one-class classifiers are trained using a Random Subspace of original features; the right value is the EER obtained by the new ensemble (with NE = 20) based on both ‘‘artificial” and ‘‘original” features. It is clear that the novel method outperforms Nanni (2006). In Tables 2–5 we report the EER obtained by the fusion, by the sum rule, of each couple of one-class classifiers (each trained using the new ensemble, with NE = 20, based on both ‘‘artificial” and ‘‘original” features). In Table 6 we report the performance of PWC and of the two best fusions. The experimental results show that: the OverComplete feature generation permits the creation of a good ensemble for different one-class classifiers even though only the PWC classifier is used in the objective function; the fusion between NND and PWC outperforms the other tested fusions. The OverComplete method that we have proposed in Nanni and Lumini (in press b) obtained, using PWC as classifier, an EER of 6.8 on the skilled forgeries and 1.9 on the random forgeries (training set comprising the first 5 genuine signatures), but those results were obtained by running SFFS on the whole dataset; in this paper we show that it is possible to use the OverComplete feature generation to build an ensemble of classifiers using only a relatively small validation set. Finally, in order to confirm the benefit of our method, the DET curve has also been considered. The DET curve (Martin et al.,
Fig. 3. SKILLED Forgeries, the training set comprises the first 5 genuine signatures. (Two panels: left, the ensembles using only ‘‘artificial” features; right, the corresponding OR + AR ensembles; EER reported for NE = 8, 12, 16, 20 for SVD, MOG, LPD, PWC, NND, GAUSS and PCAD.)
Fig. 4. RANDOM Forgeries, the training set comprises the first 5 genuine signatures. (Same layout and legend as Fig. 3.)
Fig. 5. SKILLED Forgeries, the training set comprises the first 20 genuine signatures. (Same layout and legend as Fig. 3.)
1997) is a two-dimensional measure of classification performance that plots the probability of false acceptance against the probability of false rejection. Figs. 7 and 8 show the DET curves obtained in a single run of our experiments by: PWC trained using the novel ensemble (red line); PWC trained using the Random Subspace of ‘‘original” features (black line); and the proposed fusion between PCAD and PWC (green line).
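As an illustration of how such curves are produced, the (FAR, FRR) pairs that make up a DET curve, and the EER of footnote 5, can be computed by sweeping a decision threshold over the genuine and impostor score distributions. This is our own sketch with invented scores, not the paper's data:

```python
# Sketch: DET points (FAR, FRR) and EER from genuine/impostor similarity
# scores, where a higher score means "more likely genuine".
def det_points(genuine, impostor):
    points = []
    for t in sorted(set(genuine) | set(impostor)):
        far = sum(s >= t for s in impostor) / len(impostor)  # false acceptance
        frr = sum(s < t for s in genuine) / len(genuine)     # false rejection
        points.append((far, frr))
    return points

def equal_error_rate(genuine, impostor):
    # operating point where FAR and FRR are closest
    far, frr = min(det_points(genuine, impostor), key=lambda p: abs(p[0] - p[1]))
    return (far + frr) / 2

genuine = [0.9, 0.8, 0.75, 0.6, 0.55]   # toy matches against the true user
impostor = [0.7, 0.5, 0.4, 0.3, 0.2]    # toy matches against other users
print(equal_error_rate(genuine, impostor))  # 0.2: FAR = FRR = 0.2 at t = 0.6
```

A DET plot simply draws these (FAR, FRR) pairs (conventionally on normal-deviate scales), so a curve closer to the origin means a better matcher at every operating point.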
4. Conclusions A method for building an ensemble of matchers for On-line signature recognition based on global analysis of input signatures and one-class classifiers has been described. The ensemble is built concatenating the classifiers obtained by the Random Subspace on the ‘‘original features” and a set of classifiers each trained using a dif-
Fig. 6. Random Forgeries, the training set comprises the first 20 genuine signatures. (Same layout and legend as Fig. 3.)
Table 1
Comparison of the one-class classifiers (SKILLED_X / RANDOM_X means that the training set comprises the first X genuine signatures); in each cell, left value: Random Subspace on original features (Nanni, 2006); right value: proposed ensemble

         SKILLED_5    SKILLED_20   RANDOM_5    RANDOM_20
PWC      9.7 / 6.4    5.2 / 3.3    3.4 / 2.2   1.4 / 1.2
SVD      8.9 / 7.1    5.4 / 4.0    3.8 / 3.9   1.6 / 1.2
NND      12.2 / 9.4   6.3 / 4.4    6.9 / 6.1   2.1 / 1.6
MOGD     8.9 / 5.7    7.3 / 5.2    5.4 / 2.9   4.3 / 2.3
GAUSSD   7.7 / 5.4    4.4 / 4.0    5.1 / 2.8   1.5 / 1.9
LPD      9.4 / 6.9    5.6 / 4.8    3.6 / 2.4   2.5 / 1.5
PCAD     7.9 / 6.6    4.2 / 3.2    3.8 / 3.4   1.4 / 0.9
Table 2
Fusion between the classifiers, Skilled Forgeries (training set comprises the first 5 genuine signatures)

         SVD    NND    MOGD   GAUSSD  LPD    PCAD   PWC
SVD      7.1    7.4    5.2    5.2     5.3    6.7    4.9
NND      –      9.4    5.7    6.0     5.6    8.0    4.9
MOGD     –      –      5.7    5.5     5.3    4.5    5.6
GAUSSD   –      –      –      5.4     4.9    4.4    5.4
LPD      –      –      –      –       6.9    4.4    6.3
PCAD     –      –      –      –       –      6.6    4.5

Table 5
Fusion between the classifiers, Random Forgeries (training set comprises the first 20 genuine signatures)

         SVD    NND    MOGD   GAUSSD  LPD    PCAD   PWC
SVD      1.2    1.1    1.1    0.9     1.3    0.9    1.1
NND      –      1.6    0.9    1.1     0.8    1.1    0.5
MOGD     –      –      2.3    2.0     1.9    1.1    1.5
GAUSSD   –      –      –      1.9     2.0    1.1    1.5
LPD      –      –      –      –       1.5    1.5    1.5
PCAD     –      –      –      –       –      0.9    1.5

Table 6
Comparison among the best methods tested in this paper

            PWC    PCAD+PWC   NND+PWC
SKILLED_5   6.4    4.9        4.5
SKILLED_20  3.3    2.2        2.9
RANDOM_5    2.2    1.7        1.4
RANDOM_20   1.2    1.5        0.5
Table 3
Fusion between the classifiers, Skilled Forgeries (training set comprises the first 20 genuine signatures)

         SVD    NND    MOGD   GAUSSD  LPD    PCAD   PWC
SVD      4.0    3.3    3.2    3.2     3.3    3.6    2.9
NND      –      4.4    3.0    3.2     3.0    3.0    2.9
MOGD     –      –      5.2    4.6     3.9    2.8    3.1
GAUSSD   –      –      –      4.0     3.1    2.8    2.9
LPD      –      –      –      –       4.8    2.5    3.2
PCAD     –      –      –      –       –      3.2    2.2

Table 4
Fusion between the classifiers, Random Forgeries (training set comprises the first 5 genuine signatures)

         SVD    NND    MOGD   GAUSSD  LPD    PCAD   PWC
SVD      3.9    4.1    2.5    2.5     2.4    3.6    1.8
NND      –      6.1    2.6    2.7     2.1    4.8    1.4
MOGD     –      –      2.9    2.8     2.3    2.4    2.0
GAUSSD   –      –      –      2.8     2.2    2.4    2.0
LPD      –      –      –      –       2.4    2.3    2.3
PCAD     –      –      –      –       –      3.4    1.7
ferent set of ‘‘artificial features”. The ‘‘artificial features” are extracted by applying mathematical operators to a randomly extracted set of the original ones.
Fig. 7. DET-curve, Skilled Forgeries, the training set comprises the first 5 genuine signatures.
Each set of ‘‘artificial features” is selected by SFFS, whose objective function is the minimization of ‘‘the EER obtained on the skilled forgeries + 3 × the EER obtained on the random forgeries”. The feature selection is performed on different random sets of validation users, and for each set of selected features a different classifier is trained.

Fig. 8. DET-curve, Random Forgeries, the training set comprises the first 5 genuine signatures.

Acknowledgments

The authors would like to thank J. Fierrez-Aguilar and J. Ortega-Garcia for sharing the SUBCORPUS-100 MCYT data set. Moreover, the authors want to thank J. Fierrez-Aguilar for the helpful discussions on on-line signatures when he was a visiting researcher at the University of Bologna, Italy.

References

Duda, R. O., Hart, P. E., & Stork, D. G. (2000). Pattern classification (2nd ed.). New York: Wiley.
Fierrez-Aguilar, J., Krawczyk, S., Ortega-Garcia, J., & Jain, A. K. (2005). Fusion of local and regional approaches for on-line signature verification. In IWBRS 2005 (pp. 188–196).
Fierrez-Aguilar, J., Nanni, L., Lopez-Penalba, J., Ortega-Garcia, J., & Maltoni, D. (2005). An on-line signature verification system based on fusion of local and global information. In Audio- and video-based biometric person authentication 2005, July 20–22 (pp. 523–532).
Fierrez-Aguilar, J., Ortega-Garcia, J., & Gonzalez-Rodriguez, J. (2005). Target dependent score normalization techniques and their application to signature verification. IEEE Transactions on Systems, Man and Cybernetics, 35, 418–425.
Ho, T. K. (1998). The random subspace method for constructing decision forests. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(8), 832–844.
Jain, A. K., Griess, F., & Connell, S. (2002). On-line signature verification. Pattern Recognition, 35(12), 2963–2972.
Jain, A., Nandakumar, K., & Ross, A. (2005). Score normalization in multimodal biometric systems. Pattern Recognition, 38(12), 2270–2285.
Kholmatov, A., & Yanikoglu, B. (2005). Identity authentication using improved online signature verification method. Pattern Recognition Letters, 26(15), 2400–2408.
Kittler, J., Hatef, M., Duin, R., & Matas, J. (1998). On combining classifiers. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(3), 226–239.
Martin, A., et al. (1997). The DET curve in assessment of decision task performance. In Proceedings of EuroSpeech (pp. 1895–1898).
Nanni, L. (2006). Experimental comparison of one-class classifiers for on-line signature verification. Neurocomputing, 69(7–9), 869–873.
Nanni, L., & Lumini, A. (2005). Ensemble of Parzen Window Classifiers for on-line signature verification. Neurocomputing, 68, 217–224.
Nanni, L., & Lumini, A. (in press b). Over-complete feature generation and feature selection for biometry. Expert Systems with Applications. doi:10.1016/j.eswa.2007.08.097.
Nanni, L., & Lumini, A. (2008). A novel local on-line signature verification system. Pattern Recognition Letters, 29(5), 559–568.
Ortega-Garcia, J., Fierrez-Aguilar, J., Martin-Rello, J., & Gonzalez-Rodriguez, J. (2003). Complete signal modelling and score normalization for function-based dynamic signature verification. In Audio- and video-based biometric person authentication 2003, June 9–11 (pp. 658–667).
Ortega-Garcia, J., Fierrez-Aguilar, J., Simon, D., et al. (2003). MCYT baseline corpus: A bimodal biometric database. IEE Proceedings - Vision, Image and Signal Processing, 150(6), 395–401.
Parzen, E. (1962). On the estimation of a probability density function and mode. Annals of Mathematical Statistics, 33, 1064–1076.
Piyush Shanker, A., & Rajagopalan, A. N. (2007). Off-line signature verification using DTW. Pattern Recognition Letters, 28(12), 1407–1414.
Plamondon, R., & Lorette, G. (1989). Automatic signature verification and writer identification – the state of the art. Pattern Recognition, 22(2), 107–131.
Pudil, P., et al. (1994). Floating search methods in feature selection. Pattern Recognition Letters, 15, 1119–1125.
Sakamoto, D., et al. (2001). On-line signature verification incorporating pen position, pen pressure and pen inclination trajectories. In IEEE international conference on acoustics, speech, and signal processing 2001, May 9 (pp. 993–996).
Tax, D. M. J. (2007). One-class classification: Concept-learning in the absence of counter-examples. PhD thesis, Delft University of Technology, June 2001. ISBN: 90-75691-05-x.
Van, B. L., Garcia-Salicetti, S., & Dorizzi, B. On using the Viterbi path along with HMM likelihood information for online signature verification. IEEE Transactions on Systems, Man and Cybernetics, Part B. doi:10.1109/TSMCB.2007.895323.
Xiao, R., Zhang, L., & Zhang, H.-J. (2004). Feature selection on combinations for efficient learning from images. In Proceedings of the Asian conference on computer vision, January (Vol. 2, pp. 1074–1079).
Yeung, D. Y., et al. (2004). SVC2004: First international signature verification competition. In International conference on biometric authentication 2004, July 15–17 (pp. 16–22).