Y_Rekik_87_ICMCS11 - F - IEEE Xplore

A Comparison of Feature Extraction Approaches for Offline Signature Verification

Y. Rekik, N. Houmani, M.A. El Yacoubi, S. Garcia-Salicetti, and B. Dorizzi
Intermedia, Dept. EPH, Institut Telecom; Telecom SudParis, Evry, France

Abstract: This paper proposes two systems for offline signature verification, based on a global and on a local approach respectively. The features used consist of different kinds of geometrical, statistical and structural features. For comparison purposes, we use two baseline systems (global and local), both based on a larger number of features encoding the orientations of the strokes using mathematical morphology. Experiments are performed on two offline signature databases, namely DS2-50 and GPDS-104. The results show that similar performance can be obtained with a much smaller but more discriminant set of features, and that stability of the performance across different databases can be a real challenge.

I. INTRODUCTION

Handwritten signature verification is one of the most important modalities in biometrics. This can be explained by the use of signatures as an official means to verify the identity of the authors of social and legal documents such as checks, credit cards, contracts and certificates. A handwritten signature depends on the physical and psychological state of the signer as well as on the acquisition device and conditions. Thus, the signature acquired from a person is susceptible to changes, leading to high intra-class variability. This variability makes signature verification a difficult discrimination problem. Depending on the acquisition process, automatic signature verification systems can be classified into two categories: 1) online signature verification [1, 2], where the signature is captured during the writing process, which makes available dynamic information such as writing speed and pressure as well as static information, and 2) offline signature verification [2, 3, 4, 5, 6, 7, 8, 9, 10], where the static image of a signature is captured once the writing process is over, so only the signature geometry is available. Robust offline systems are, therefore, more difficult to design. A recent review of offline and online signature verification approaches is proposed in [2]. Although offline signature verification systems are less accurate than online verification systems, they are still important owing to the reasons mentioned above. This paper deals with offline signature verification, and our aim is to study different approaches for a better discrimination between genuine signatures and skilled forgeries. Offline verification systems can be classified into global [2, 3, 4, 5, 6, 7] and local [2, 3, 7, 8] systems. The first are based

on global feature extraction, which describes the signature as a whole. The latter are based on local feature extraction, which represents the signature as a sequence of feature vectors or observations obtained by an appropriate segmentation or scanning (windowing) along a specific direction. Global systems are generally fast but have lower performance than local systems, since the order information (order of features) is not taken into account [2]. Moreover, the choice of features is very important for the system to correctly discriminate between authentic and forged signatures. It is worth noting that a good level of performance for a verification system does not depend only on the number of features but also on the discriminating power of these features and on the signature image quality.

The aim of this work is, on the one hand, to study the influence of the number and nature of features on performance, and, on the other hand, to assess the stability of performance across different signature databases. To this end, we propose two systems for offline signature verification: the first one is based on a global approach while the second one is based on a local approach. The features used by the global approach consist of a set of geometrical, statistical and structural features, while the local approach employs directional and curvature features after an explicit segmentation of the hand-drawn signature into strokes, each with a roughly uniform direction. For comparison purposes, we design two baseline systems (global and local) inspired by [3], which serve as a benchmark. Both baseline systems are based on a larger number of features encoding the orientations of the strokes using mathematical morphology. Experiments are performed on two offline signature databases, namely the offline BioSecure DS2 database containing data of 50 persons, and the GPDS database containing data of 104 persons.

The results show that similar performance can be obtained with a much smaller but more discriminant set of features, and that stability of the performance across different databases can be a real challenge. This paper is organized as follows: in Section II, we describe the two databases that we use (DS2-50 and GPDS-104). Section III describes the pre-processing phase. In Section IV, we present the two baseline systems based on morphological features, and the two proposed systems based on geometrical, statistical and structural features. The experimental framework, including results and analysis, is

978-1-61284-732-0/11/$26.00 ©2010 IEEE

detailed in Section V. Conclusions and perspectives are finally drawn in Section VI.

II. DATABASES DESCRIPTION

We use two different offline databases: the offline BioSecure Data Set 2 (DS2) [11] and the GPDS database [12]. The offline BioSecure Data Set 2 (DS2) is used here for the first time. It contains data of 104 people, and was acquired at TELECOM SudParis, in the framework of the BioSecure Network of Excellence. Signatures were scanned with a resolution of 600 dpi. The DS2 database contains two sessions separated by two weeks. Each session consists of 15 authentic signatures and 10 skilled forgeries per person. For our work, we considered only the first 50 people and only the first session. Although the skilled forgeries in this database are not as good imitations as those of the GPDS database, recognition is still a difficult task for the following reasons: first, there is a strong intra-class variability; second, as people were asked to sign in a zone delimited by a rectangle, some signatures touch or exceed the rectangle, and thus it is difficult to distinguish between the hand-drawn signature and the rectangle. Some examples of problematic signatures are shown in Figure 1.

For the BioSecure DS2 database, input images are first binarized using an adaptive threshold. Then, we remove the signature’s enclosing rectangle by means of the horizontal and vertical profile histograms. Owing to the discontinuities generated by this procedure, an additional module is applied to fill gaps. Finally, a median filter is applied for noise reduction. Figure 3 shows an example of these steps.

Fig.3: (a) Input signature, (b) Binarization, (c) Elimination of the enclosing rectangle, and (d) Filling gaps and noise reduction.
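The final noise-reduction step can be illustrated with a minimal sketch. The paper does not state the filter window size or how image borders are handled, so the 3×3 window (`radius=1`) and the clipped windows at the borders are assumptions made only for this example; the function name `median_filter` is likewise illustrative.

```python
def median_filter(img, radius=1):
    """Median-filter a binary image given as a list of rows of 0/1 values.

    Assumption: windows are clipped at the image border rather than padded.
    """
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Collect the pixel values inside the (clipped) square window.
            window = [
                img[j][i]
                for j in range(max(0, y - radius), min(h, y + radius + 1))
                for i in range(max(0, x - radius), min(w, x + radius + 1))
            ]
            window.sort()
            out[y][x] = window[len(window) // 2]
    return out
```

On a binary image this amounts to a majority vote inside each window, so isolated ink specks are removed while solid strokes are preserved.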

For the GPDS-104 dataset, only binarization is carried out on the signature image.

IV. OFFLINE SIGNATURE VERIFICATION SYSTEMS

A. Baseline systems: morphological feature extraction

Two baseline systems with different approaches for feature extraction are used: one is based on a global approach, the other on a local approach. These two systems are inspired by the work developed in [3], and both exploit mathematical morphology operators (erosion and dilation). In the following sections, we describe the two systems separately; both are based on the extraction of slant directions of the signature.

Fig.1: Example of signatures of BioSecure DS2 database

The GPDS database was acquired by the “Grupo de Procesado Digital de Senales” (GPDS) and contains the signatures of 300 people [12]. Each person provides 24 authentic signatures and 30 skilled forgeries. In this work, we considered only the data of 104 people. All signature images of the GPDS database are cropped binary images. By looking at genuine signatures and their associated skilled forgeries, we observed that in the GPDS database, forgeries are very similar to authentic signatures for several people.

III. PRE-PROCESSING

Pre-processing is the first step of any verification system. The aim of this process is to improve the image quality by removing undesirable information.

A.1. Description of the global baseline system

The pre-processed signature image is first eroded 16 times independently, using 16 linear structuring elements of the same size (20 pixels) whose orientations are regularly distributed between 0° and 180°. The structuring element size was optimized empirically. This yields 16 eroded images, from which a feature vector with 16 components is extracted, where each component is the number of pixels remaining in the corresponding eroded signature. The second angular range (180°-360°) is not considered, since the sign of the stroke orientation is not available for offline data. A second set of features is computed from the signature envelopes obtained by dilation. The directions of the signature envelopes are extracted as follows: 6 structuring elements of the same size (10 pixels), with orientations regularly distributed between 0° and 180°, are used. Then, for each element, the pre-processed signature is successively dilated 5 times. Thus, a feature vector of 30 components is extracted (6 elements × 5 dilations), where each component is the number of pixels in the signature’s envelope. Combining the two operations, erosion and dilation, we obtain a feature vector of 46 components, which are normalized between 0 and 1. For performance assessment, the Euclidean distance between the feature vectors of the reference and test signatures is used as the score.

A.2. Description of the local baseline system

For this system, we consider only the area enclosing the signature instead of the entire image as in the global approach. To this end, the white margins to the left and right of the signature, as well as above and below it, are discarded by means of the horizontal and vertical profile histograms, as shown in Figure 4.

Fig.4 : Removal of white spaces surrounding the signature.
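The margin removal can be sketched as follows, assuming the binarized signature is stored as a list of rows in which 1 denotes ink and 0 denotes background (this representation and the function name `crop_to_signature` are assumptions for illustration, not the authors' implementation).

```python
def crop_to_signature(img):
    """Crop a binary image (list of rows, 1 = ink) to the signature's bounding box."""
    # Horizontal and vertical profile histograms: ink count per row and per column.
    row_profile = [sum(row) for row in img]
    col_profile = [sum(col) for col in zip(*img)]
    rows = [i for i, count in enumerate(row_profile) if count > 0]
    cols = [i for i, count in enumerate(col_profile) if count > 0]
    if not rows:
        return []  # blank image: nothing to keep
    top, bottom = rows[0], rows[-1]
    left, right = cols[0], cols[-1]
    return [row[left:right + 1] for row in img[top:bottom + 1]]
```

The first and last non-empty entries of each profile give the bounding box, so no pixel-level search beyond the two projections is needed.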

Then, the signature image is split into N vertical blocks, each having a width of 160 pixels, with an overlap of 25% in the horizontal direction. These parameters were obtained empirically. Each block is further divided vertically into 3 equal parts, as shown in Figure 5.

Fig.5: Segmentation of the signature image into blocks vertically and horizontally.
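As a sketch of this block layout, the column range of each block and the row range of each part can be computed as below. The 160-pixel width and 25% overlap come from the text; truncating the last block at the image border and the function names are assumptions, since the paper does not say how a trailing partial block is treated.

```python
def vertical_blocks(width, block_width=160, overlap=0.25):
    """Return (start, end) column ranges of overlapping blocks covering the image."""
    step = int(block_width * (1 - overlap))  # 120 pixels for the default values
    ranges = []
    start = 0
    while True:
        end = min(start + block_width, width)  # assumption: last block is truncated
        ranges.append((start, end))
        if end >= width:
            break
        start += step
    return ranges


def block_parts(height, parts=3):
    """Return (start, end) row ranges splitting a block into equal parts."""
    return [(i * height // parts, (i + 1) * height // parts) for i in range(parts)]
```

For example, a 400-pixel-wide signature yields three blocks, which is why the number of blocks N, and hence the sequence length, varies from one signature to another.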

After that, the local directions of the signature in each block are extracted using mathematical morphology operations, following the same methodology described in Section A.1. Each block is then represented by a feature vector of 138 components (46 components × 3 horizontal parts). Hence, the signature is parameterized with a sequence of N feature vectors. Note that, as signature images are not size-normalized, the number of vertical blocks differs from one signature to another, and thus signatures are described by variable-length sequences. Therefore, for performance assessment, a Dynamic Time Warping (DTW) [13] classifier is used, as it is suitable for measuring the similarity between two sequences of different lengths.
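A minimal sketch of a DTW distance between two variable-length sequences of feature vectors is given below. It uses the textbook recurrence with a Euclidean local cost and no path constraints or length normalization; whether the authors applied such refinements is not stated, so this is only an illustration.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def dtw_distance(seq_a, seq_b):
    """Accumulated DTW cost between two sequences of feature vectors."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # cost[i][j] = best accumulated cost aligning seq_a[:i] with seq_b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = euclidean(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]
```

Because one vector may be matched to several vectors of the other sequence, a signature with N blocks can be compared directly to one with a different number of blocks.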

A.3. Fusion of the global and local baseline approaches

In order to take advantage of both approaches, a simple fusion strategy based on the average of the global score and the local one is used.

B. The proposed systems

In this section, we propose two different systems: one based on a global approach and the other on a local approach. These systems differ from the two baseline systems described in Section IV.A. Indeed, the two baseline systems are based on morphological features, which are widely used in general image processing and thus are not specifically related to the hand-drawn signature, while the two proposed systems, which we introduce in this section, exploit the spatial representation of the hand-drawn signature in a different manner.

B.1. Description of the global proposed system

This system is based on features extracted globally on the whole hand-drawn signature. The features that we used are geometrical, statistical, and structural, and consist of:
- Width and height of the signature,
- Top and left spaces with respect to the signature location,
- x and y coordinates of the gravity center of the signature,
- The angle between the horizontal line and the line connecting the two gravity centers corresponding to the left and right parts of the signature,
- The highest value of the vertical projection histogram,
- The highest value of the horizontal projection histogram,
- Number of peaks in the vertical projection and in the horizontal projection,
- Number of loops in the signature,
- Number of pixels in the skeleton of the signature,
- Pressure of writing, which is represented by the average thickness of the strokes.

For performance assessment, the Euclidean distance is used. Note that for this system, a feature selection study was carried out in order to keep only the most pertinent features. To this end, we used the Sequential Forward Selection method [14] to incrementally select features that jointly improve the performance.
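The greedy selection loop can be sketched generically as follows. The callable `evaluate` is a hypothetical placeholder that returns an error measure (for instance, an EER on development data) for a candidate feature subset; the stopping rule, which halts as soon as no single added feature lowers the error, is one common variant and is an assumption here.

```python
def sequential_forward_selection(features, evaluate):
    """Greedy forward selection.

    features: list of candidate feature names.
    evaluate: hypothetical callable mapping a list of names to an error (lower is better).
    Returns the selected names (in selection order) and the final error.
    """
    selected = []
    remaining = list(features)
    best_error = float("inf")
    while remaining:
        # Try adding each remaining feature to the current subset.
        trials = [(evaluate(selected + [name]), name) for name in remaining]
        error, name = min(trials, key=lambda pair: pair[0])
        if error >= best_error:
            break  # no candidate improves the current subset
        selected.append(name)
        remaining.remove(name)
        best_error = error
    return selected, best_error
```

Since the procedure is greedy, the subset it returns depends on the evaluation data, which is consistent with different databases leading to different selected features.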
Thus, only features which improve the performance are kept; the remaining ones are eliminated.

B.2. Description of the local proposed system

This system is based on a local description of the signature that exploits the “outer” and “inner” contours. The outer contours are the outer boundaries of the hand-drawn signature, while the inner contours are the boundaries of the holes existing in the hand-drawn signature. Figure 6 illustrates these two concepts.

Fig. 6: Example of “outer” and “inner” contours in a given signature.

After extracting both “outer” and “inner” contours, we first order them, starting with the “inner” contours and then the “outer” contours. We set the starting point of each contour to its leftmost pixel, and we trace the contour from this point in the clockwise direction. Then, all “inner” (respectively “outer”) contours are ordered according to the position of their starting points. Each contour is then segmented into strokes, each delimited by two relevant points, which correspond to changes of direction along the x-axis or y-axis or to large changes of slant direction. Figure 7 shows an example of a signature where both “outer” and “inner” contours are segmented.

Fig.8: Computation of the slope of different strokes.

Concerning the second feature, the curvature is computed as the angle between 2 consecutive strokes, as shown in Figure 9. In the same manner as for the first feature extraction, we establish the histogram of the curvature angles corresponding to 8 intervals regularly distributed between 0° and 180°.

Fig.9: Angle between two consecutive strokes.
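The curvature histogram can be sketched as below. Each stroke is represented by its start and end points, and the angle between two consecutive strokes is taken as the angle between their direction vectors, which lies in [0°, 180°]; this reading of "angle between 2 consecutive strokes" and the stroke representation are assumptions made for the example.

```python
import math


def curvature_histogram(strokes, bins=8):
    """Histogram of angles between consecutive strokes over [0, 180] degrees.

    strokes: list of ((x0, y0), (x1, y1)) start/end point pairs, in contour order.
    """
    hist = [0] * bins
    bin_width = 180.0 / bins
    for (a0, a1), (b0, b1) in zip(strokes, strokes[1:]):
        ux, uy = a1[0] - a0[0], a1[1] - a0[1]
        vx, vy = b1[0] - b0[0], b1[1] - b0[1]
        norm = math.hypot(ux, uy) * math.hypot(vx, vy)
        if norm == 0:
            continue  # skip degenerate strokes of zero length
        # Clamp to [-1, 1] to guard against floating-point drift before acos.
        cos_angle = max(-1.0, min(1.0, (ux * vx + uy * vy) / norm))
        angle = math.degrees(math.acos(cos_angle))
        hist[min(int(angle / bin_width), bins - 1)] += 1
    return hist
```

With 8 bins, each interval spans 22.5°, and a signature segmented into K strokes contributes K - 1 angles to the histogram.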

At the end, for the local proposed system, a vector of 12 components is obtained (4 slope bins and 8 curvature bins), and the Euclidean distance classifier is used for performance assessment.

V. EXPERIMENTAL RESULTS

Fig. 7: Segmentation of outer and inner contours.

After the segmentation step, 2 “spatial” features are extracted: the first one is based on the slope of the strokes; the second one is based on the curvature between two consecutive strokes, thus encoding the structural relation between such strokes. To encode slopes more accurately, we take the size of each stroke into account, so that slopes are weighted according to the corresponding stroke lengths: the slope of a long stroke is much more meaningful than that of a short one. To this end, if a given stroke S contains more than 40 pixels, it is subdivided into 2 segments S1 and S2 at the gravity center of S. Then, if S1 (respectively S2) contains more than 40 pixels, it is in turn subdivided into two segments. This process is repeated recursively. Then, to measure the slope of the resulting strokes, we compute the angle between the horizontal line (x-axis) and the line connecting the start and the end of each stroke, as shown in Figure 8. Finally, we build the histogram of the angle values using 4 bins regularly distributed between -90° and 90°.
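The recursive subdivision and the slope histogram can be sketched as follows. Strokes are given as ordered lists of (x, y) contour points; splitting at the contour point closest to the gravity center is an assumption about how "according to the gravity center" is realized, and the sign of the angle depends on whether the y-axis points up or down in the image.

```python
import math


def split_stroke(points, max_len=40):
    """Recursively split a stroke until every piece has at most max_len points."""
    n = len(points)
    if n <= max_len or n < 3:
        return [points]
    cx = sum(p[0] for p in points) / n
    cy = sum(p[1] for p in points) / n
    # Assumption: split at the interior point nearest to the gravity center.
    k = min(range(1, n - 1),
            key=lambda i: (points[i][0] - cx) ** 2 + (points[i][1] - cy) ** 2)
    return split_stroke(points[:k + 1], max_len) + split_stroke(points[k:], max_len)


def slope_angle(points):
    """Angle in degrees, folded into (-90, 90], of the start-to-end line vs. the x-axis."""
    (x0, y0), (x1, y1) = points[0], points[-1]
    angle = math.degrees(math.atan2(y1 - y0, x1 - x0))
    if angle > 90.0:
        angle -= 180.0
    elif angle <= -90.0:
        angle += 180.0
    return angle


def slope_histogram(strokes, bins=4, max_len=40):
    """Histogram of stroke slopes; long strokes count once per sub-segment."""
    hist = [0] * bins
    bin_width = 180.0 / bins
    for stroke in strokes:
        for piece in split_stroke(stroke, max_len):
            index = int((slope_angle(piece) + 90.0) / bin_width)
            hist[min(index, bins - 1)] += 1
    return hist
```

Subdividing long strokes means that a long stroke contributes several counts to its slope bin, which is how stroke length acts as a weight in the histogram.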

A. Protocol description

For each person, 5 authentic signatures are used as reference signatures. For test purposes, the 10 remaining authentic signatures, 10 skilled forgeries and 10 random forgeries are used for the DS2-50 dataset. For the GPDS-104 dataset, we tested on the remaining 10 authentic signatures and 10 skilled forgeries. To evaluate the performance of the 4 systems described in Section IV, the Equal Error Rate (EER) operating point is reported.

B. Performance assessment on BioSecure DS2-50 dataset

B.1. Results of the two baseline systems

Table 1 gives the results obtained with the two baseline systems on the DS2-50 dataset, considering both skilled and random forgeries. The first one is based on the global approach and the second one on the local approach. The EER obtained for the fusion of these two systems is also reported.

Table 1: Equal Error Rates on DS2-50 with the two baseline systems.

EER (%)             Global approach   Local approach   Fusion
Skilled forgeries   19%               16.6%            14.2%
Random forgeries    6.6%              6%               4.8%
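For reference, the EER operating point reported in these tables can be estimated from two lists of dissimilarity scores as in the sketch below. It assumes that lower scores mean "more likely genuine" (as with Euclidean or DTW distances) and takes the threshold at which the false acceptance and false rejection rates are closest; the thresholding and interpolation actually used by the authors are not described, so this is an illustrative convention only.

```python
def equal_error_rate(genuine, forgeries):
    """Estimate the EER from distance scores (lower = closer to the reference).

    genuine: distances of authentic test signatures.
    forgeries: distances of forged test signatures.
    """
    thresholds = sorted(set(genuine) | set(forgeries))
    best_gap, eer = float("inf"), None
    for t in thresholds:
        # A signature is accepted when its distance is at most t.
        frr = sum(d > t for d in genuine) / len(genuine)
        far = sum(d <= t for d in forgeries) / len(forgeries)
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer
```

Usage: `equal_error_rate(genuine_distances, skilled_forgery_distances)` returns a fraction; multiplying by 100 gives a percentage comparable to the table entries.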

We first notice that the baseline system based on the local approach outperforms that based on the global approach for both skilled and random forgeries. Also, by combining these two systems, we reach the best performance on both types of forgeries. As the DS2 dataset is used for the first time, there is no previous work in the literature to compare with. In order to evaluate the performance obtained on this dataset, we report in Table 2 the results obtained on the MCYT-75 database with the two systems developed in [3], which are similar to our baseline systems. We observe that the results obtained with our baseline systems are close to those reported in [3], although our database is degraded.

Table 2: Equal Error Rates of the two systems of [3] on MCYT-75.

EER (%)             Global approach   Local approach   Fusion
Skilled forgeries   21.84%            14.51%           11%
Random forgeries    8.64%             4.35%            2.69%

B.2. Results of the two proposed systems

We report in Table 3 the EER obtained with the two proposed systems, the one based on the global approach and the one based on the local approach. Note that for the global approach, the reported results are obtained with the combination of features that gives the best performance. Using the Sequential Forward Selection methodology [14], the best performance is obtained when the statistical features (based on vertical and horizontal projections) are removed.

Table 3: Equal Error Rates on DS2-50 with the two proposed systems.

EER (%)             Global approach   Local approach
Skilled forgeries   16.6%             24.9%
Random forgeries    9%                12.1%

For the global approach, when we compare the results of our proposed system (see Table 3) with those of the baseline system (see Table 1), we observe that, on average, a similar performance is obtained with a much lower number of features for our proposed system: the feature vector for the baseline system has a dimension of 46, while that for the proposed system has a dimension of 10. This is not surprising, as the performance depends not only on the number of features but also on the discriminating power of these features. Regarding the local approach, although the results of the proposed system look worse at first glance, the performance obtained is actually promising if we keep in mind the following observations:
- Our local approach uses feature vectors of dimension 12, while the approach based on morphological features uses feature vectors of dimension 3 × 46 = 138. Thus, there is considerable room for improvement in our local system.
- Only features based on direction are used. Considering other, non-direction-based features is necessary to reduce the error rates.

C. Performance assessment on GPDS-104 dataset

C.1. Results of the two baseline systems

Table 4 gives the results obtained with the two baseline systems on the GPDS-104 dataset, considering only skilled forgeries. The first one is based on the global approach and the second one on the local approach.

Table 4: Equal Error Rates on GPDS-104 with the two baseline systems.

EER (%)             Global approach   Local approach
Skilled forgeries   20%               23.75%

We first observe that the results of the baseline systems obtained on the GPDS-104 dataset differ from those obtained on the DS2-50 dataset. Indeed, for the GPDS dataset the global approach outperforms the local approach, while for the DS2 dataset the opposite occurs. For evaluation purposes, we compare the performance of our local baseline system with that of the DTW-based system developed in [10], which is based on a local approach and was evaluated on a GPDS subset containing only 39 persons. In [10], considering skilled forgeries, an EER of 22% is reached, while with our local baseline system, an EER of 23.75% is obtained on GPDS with 104 persons.

C.2. Results of the two proposed systems

Table 5 shows the EER obtained with the global and local proposed systems on the GPDS-104 dataset. For the global approach, a feature selection study was performed on GPDS-104 in order to keep only the most pertinent features. We have found that the best combination of features is that containing:
- Width and height of the signature,
- x and y coordinates of the gravity center of the signature,
- The angle between the horizontal line and the line connecting the two gravity centers corresponding to the left and right parts of the signature,
- The highest value of the vertical projection histogram,
- The highest value of the horizontal projection histogram,
- Number of pixels in the skeleton of the signature,
- Pressure of writing, which is represented by the average thickness of the strokes.

The best combination of features that we obtained differs from that found for DS2-50: indeed, for DS2-50, we have taken into account the location of the signature in the enclosing rectangle. However, for GPDS this information does not exist and, as a consequence, the Sequential Forward Selection method [14] leads to a different combination of features.

Table 5: Equal Error Rates on GPDS-104 with the two proposed systems.

EER (%)             Global approach   Local approach
Skilled forgeries   25.19%            32.83%

In Table 5, we first observe that the global baseline system gives better results than our proposed global system on the GPDS dataset, while on the DS2 dataset, a better performance is obtained with our global proposed system. As a consequence, we can conclude that the features describing the location of the signature in the enclosing rectangle are very important for performance improvement. Moreover, when we compare the results of the local approach in Table 5 with those reported in Table 4, we notice that the proposed local system also gives worse performance than the local baseline on the GPDS dataset, as observed on DS2.

VI. CONCLUSION AND PERSPECTIVES

In this paper, we have proposed two offline signature verification systems. The first one is based on a global approach and the second one on a local approach. For both, geometrical, statistical and structural features are extracted. In order to evaluate the performance of such systems, we designed two baseline systems inspired by the literature [3], both based on mathematical morphology features, extracted globally for the first system and locally for the second one. The experimental results are promising and show that:
- For global approaches, a satisfactory performance can be obtained with a much lower feature vector dimension.
- For the local approaches, two types of segmentation are used: a coarse segmentation that divides the signature image into blocks, and a segmentation of the hand-drawn signature into strokes at specific points.

The results obtained with these two types of segmentation show that the first segmentation, where a feature vector of 138 components is extracted, improves system performance, due to the consideration of local information that encodes local shape and variation. However, for the second segmentation, where a feature vector of 12 components associated with slope and curvature is extracted, performance is degraded. This is not surprising, as the sizes of the two feature vectors are very different; we should therefore extract additional features, including structural relations between the current and the previous stroke. It is worth noting that performance overall is better on the DS2 database than on the GPDS database. This may be explained by the fact that some parameters were optimized

on the DS2 set and, above all, by the fact that the GPDS set contains forgeries that are much more skillful than those of the DS2 dataset. Our future work will consist of seeking more robust features, designing suitable techniques for combining the local and global approaches, and studying the factors of instability across different signature datasets.

REFERENCES

[1] S. Garcia-Salicetti, N. Houmani, B. Ly-Van, B. Dorizzi, F. Alonso-Fernandez, J. Fierrez, J. Ortega-Garcia, C. Vielhauer and T. Scheidat, "Online Handwritten Signature Verification", in D. Petrovska-Delacretaz, G. Chollet and B. Dorizzi (Eds.), Guide to Biometric Reference Systems and Performance Evaluation, Springer-Verlag, London, 2008.
[2] D. Impedovo and G. Pirlo, "Automatic signature verification: the state of the art", IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, vol. 38, no. 5, September 2008.
[3] J. Fierrez-Aguilar, N. Alonso-Hermira, G. Moreno-Marquez and J. Ortega-Garcia, "An off-line signature verification system based on fusion of local and global information", in Proc. BIOAW, LNCS-3087, pp. 295-306, 2004.
[4] H. Baltzakis and N. Papamarkos, "A new signature verification technique based on a two-stage neural network classifier", Engineering Applications of Artificial Intelligence, no. 14, pp. 95-103, 2001.
[5] V. Nguyen, M. Blumenstein and G. Leedham, "Global Features for the Off-Line Signature Verification Problem", 10th International Conference on Document Analysis and Recognition, 2009.
[6] A.C. Ramachandra, J.S. Rao, K.B. Raja, K.R. Venugopal and L.M. Patnaik, "Robust Offline Signature Verification Based On Global Features", 2009 IEEE International Advance Computing Conference (IACC 2009), Patiala, India, 6-7 March 2009.
[7] K. Huang and H. Yan, "Off-line signature verification Based On Geometric Feature Extraction and Neural Network classification", Pattern Recognition, Elsevier Science, vol. 30, no. 1, pp. 9-17, 1997.
[8] A. Gilperez, F. Alonso-Fernandez, S. Pecharroman, J. Fierrez and J. Ortega-Garcia, "Off-line Signature Verification Using Contour Features".
[9] S. Chen and S. Srihari, "Use of Exterior Contour and Shape Features in Off-line Signature Verification", 8th International Conference on Document Analysis and Recognition (ICDAR ’05), 2005, pp. 1280-1284.
[10] Jayadevan R., Satish R. Kolhe and Pradeep M. Patil, "Dynamic Time Warping Based Static Hand Printed Signature Verification", Journal of Pattern Recognition Research, vol. 1, pp. 52-65, 2009.
[11] http://biosecure.it-sudparis.eu/
[12] F. Vargas, M. Ferrer, C. Travieso and J. Alonso, "Off-line Handwritten Signature GPDS-960 Corpus", Ninth ICDAR 2007, vol. 2, pp. 764-768, Sept. 2007.
[13] L. Rabiner and B.H. Juang, "Fundamentals of Speech Recognition", Prentice Hall Signal Processing Series, 1993.
[14] Anil K. Jain, Robert P.W. Duin and Jianchang Mao, "Statistical Pattern Recognition: A Review", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 1, pp. 4-37, Jan. 2000.