Online Signature Verification Based on DCT and Sparse Representation

Yishu Liu, Zhihua Yang, and Lihua Yang
Abstract—In this paper, a novel online signature verification technique based on the discrete cosine transform (DCT) and sparse representation is proposed. We find a new property of DCT, which can be used to obtain a compact representation of an online signature using a fixed number of coefficients, leading to simple matching procedures and providing an effective alternative for dealing with time series of different lengths. This property is also used to extract energy features. Furthermore, a new attempt to apply sparse representation to online signature verification is made, and a novel task-specific method for building overcomplete dictionaries is proposed; sparsity features are then extracted. Finally, energy features and sparsity features are concatenated to form a feature vector. Experiments are conducted on the Sabancı University Signature Database (SUSIG)-Visual and SVC2004 databases, and the results show that our proposed method authenticates persons very reliably, with a verification performance better than those of state-of-the-art methods on the same databases.

Index Terms—Discrete cosine transform (DCT), energy features, online signature verification, overcomplete dictionary, sparse representation, sparsity features.
I. INTRODUCTION

Automatic identity authentication based on behavioral biometrics, which are measurements and data derived from an action performed by a person, has become more and more important in many social and commercial interactions. A signature is one of the most commonly used and most widely accepted behavioral biometrics [1]. Compared with off-line signatures [2], online signatures are more robust and ensure a higher level of security, as they store dynamic features like pressure, altitude, and azimuth in addition to time series of position trajectories, and it is difficult for
even skilled forgers to imitate online signatures along with their dynamic properties [3]. In this paper, we focus only on online signature verification, which can basically be seen as a two-class classification problem: deciding whether a given online signature belongs to the claimed identity or not.

A. State-of-the-Art and Motivation

Online signature verification approaches can broadly be divided into two groups.

1) Parametric Approaches: A set of global features derived from the signal, e.g., average speed, pressure, the number of pen-ups and pen-downs [4], the number of strokes [5], etc., represents a signature pattern and constitutes the feature vector of a signature. The feature vectors of the test and reference signatures, which have a common length and hence can be compared in a straightforward way, are used to decide whether the test signature is authentic. Parametric approaches can be found in many works. Lejtman and George [6] used the Daubechies-6 wavelet transform (WT) to extract global features of an online signature, which are then input to a back-propagation neural network. Kamel and Sayeed [7] used new input devices, data gloves, to capture the dynamic information during the signing process; singular value decomposition is then used to find r singular vectors of the glove data matrix, called the principal subspace, and the similarity between two signatures is obtained by calculating the angle between their principal subspaces. Moreover, Ibrahim et al. [8] proposed a method based on 35 global features integrated with Fisher linear discriminant analysis.

2) Functional Approaches: The time sequences describing local properties of an online signature are analyzed. Since the signatures of the same user usually have different lengths, techniques appropriate for evaluating the similarity/dissimilarity between two signals of unequal lengths must be used, among which dynamic time warping (DTW) [9] and hidden Markov models [10] are the most common. Some approaches can be seen as simplifications of DTW-based systems, such as [11], in which time/space normalization is proposed and all signatures are resampled to a common length. Faundez-Zanuy and Pascual-Gaspar [12] introduced a multisection vector quantization approach in which all signatures are represented with vectors of the same length. Furthermore, Jain et al. [5] use a set of local features describing both spatial and temporal information. In the
verification process, the test signature is compared to all reference signatures to obtain dissimilarity values. The best results are achieved by considering the minimum of all dissimilarity values and user-dependent thresholds. Recently, a local feature approach combined with elastic distance metrics has also been proposed by Barkoula et al. [13].

In general, function-based systems show better performance than parameter-based systems, but they require more time-consuming and more space-consuming matching procedures. Nowadays, it has become a promising trend to perform fusion to improve verification accuracy, and the combination of parametric and functional approaches is often conducted [14]–[17]. Comprehensive surveys of the state of the art in online signature verification, including a recent one, can be found in [18]–[21].

It should be stressed that, in contrast to many other biometric authentication techniques, the signatures of a person may vary considerably. Although some researchers [22] state that it is important to take local variability into account, Fierrez-Aguilar et al. [14] show that parametric approaches are equally competitive with functional approaches, which, we think, is due to the fact that parametric approaches are usually more stable against the variations in local regions that are common in signatures. In addition, it is important to reduce the enrollment data size in commercial applications, especially for large-scale systems and for resource-limited systems like smart cards or personal digital assistant terminals. It is therefore desirable to find an approach in which the signatures of a person can be compactly represented with the same number of points, to decrease the space requirement and to avoid the trouble of matching signals of different lengths, while the verification performance remains comparable with functional approaches. This paper aims to find such a method, based on the discrete cosine transform (DCT) and sparse representation.

B. Contributions of This Paper

Mathematical tools such as the Fourier transform [23], [24], the discrete WT [8], [25], [26], and the DCT [16] can be used to extract global features of an online signature. This paper is most closely related to [16], in which the discrete 1-D WT is performed on time series, the DCT is used to reduce the approximation coefficients obtained by the WT to a feature vector of a given dimension, and a linear programming descriptor classifier is then trained on the DCT coefficients. Finally, the fusion of the proposed approach with the state of the art in regional, local, and global approaches is studied. However, Nanni and Lumini [16] do not justify the use of the first few DCT coefficients to form a feature vector: time series from two genuine signatures of the same user usually have different lengths, and so do their DCT coefficient vectors, so why should the leading small fractions of the two vectors be similar? In this paper, we give some theoretical explanations on this aspect. First, a new property of DCT, that the first several "standardized" DCT coefficients of a time series are almost invariant to temporal dilation, is given with estimated error
bounds. Based on this property, a novel and simple approach for signature representation using only the first few standardized DCT coefficients of the time sequences is proposed. In this way, the signature signals from the same person have a common length, and the total signature data size is reduced to about 20% (while these data preserve most of the energy of the original signature signals). This representation approach is simple and easy to realize, and it can be considered a semi-parametric approach for the following reasons.
1) The DCT coefficient sequence used to represent the signature signal is much shorter than the original signal, which is directly analyzed in functional approaches.
2) The DCT coefficient sequence can be inversely transformed to yield an approximate representation of the original signal, which generally cannot be done in parametric approaches.

We find that energy features extracted from the first several standardized DCT coefficients are very discriminative and, to some extent, are not sensitive to the significant variations among sample signatures of the same person. Then, in order to obtain a significant reduction in equal error rate (EER), we use sparse representation to extract additional features. Sparse representation aims to find the most compact representation of a signal in terms of a linear combination of atoms in an overcomplete dictionary [27]; it has been extensively studied and applied in face recognition [28], image super-resolution [29], denoising [30], [31], audio inpainting [32], etc. In almost all of these applications, sparse representation has been shown to lead to state-of-the-art results. To the best of our knowledge, no work has been reported in the literature that applies sparse representation to signature verification. A fundamental problem in sparse representation is the choice of the dictionary; in this paper, we propose a novel task-specific method to build dictionaries.

In other words, this paper differs from previous works using DCT in: 1) justifying the use of a fixed number of DCT coefficients to obtain a feature vector or represent a signature; 2) the feature extraction methods (energy features and sparsity features are extracted in this paper); and 3) the signature representation methods (the fixed-length representation is proposed and used for subsequent sparse coding). Moreover, to some extent the proposals in [11] and [12] are similar to ours, since they also use a fixed-length representation; however, our approach is different in that we truncate DCT coefficients to obtain a common size.

Overall, the contributions of this paper are listed below.
1) A new property of DCT is found and used for signature verification; it strikes a balance between functional and parametric approaches and provides an alternative way to deal with time series of different lengths.
2) An initial attempt to apply sparse representation to signature verification is made.
3) A novel method for building dictionaries, which is efficient, low-cost, and easy to apply, is provided.
4) Extensive experiments on signature verification are conducted on two databases, and EERs smaller than state-of-the-art results on the same databases are achieved.
II. DATA REDUCTION AND ENERGY FEATURES BASED ON DCT

In this section, some notations and symbols are described first, then a new property of DCT is given and used to reduce signal sizes. Furthermore, the method for extracting energy features is presented.

A. Notations and Symbols

The signals (time sequences) used in this paper are numbered as follows [10] (the last two signals are unavailable in the SUSIG database, see Section IV-A).
1) X: X-coordinate relative to the centroid of the signature trajectory.
2) Y: Y-coordinate relative to the centroid of the signature trajectory.
3) Pressure: pressure applied by the pen.
4) $\Delta X$: X-coordinate difference between two consecutive points.
5) $\Delta Y$: Y-coordinate difference between two consecutive points.
6) Azimuth: azimuth angle of the pen with respect to the tablet.
7) Altitude: altitude angle of the pen with respect to the tablet.

$\Delta X$ and $\Delta Y$ are derived from X and Y, respectively. Suppose that $X = (X_1, X_2, \ldots, X_L)^T$ and $Y = (Y_1, Y_2, \ldots, Y_L)^T$ (T means "transpose" throughout this paper, and L is the number of points sampled along the signature's trajectory); then $\Delta X = (\Delta X_1, \ldots, \Delta X_{L-1})^T$ and $\Delta Y = (\Delta Y_1, \ldots, \Delta Y_{L-1})^T$ are defined as follows:

$$\Delta X_q = X_{q+1} - X_q, \quad q = 1, 2, \ldots, L-1 \quad (1)$$

$$\Delta Y_q = Y_{q+1} - Y_q, \quad q = 1, 2, \ldots, L-1. \quad (2)$$

A signature can be seen as a set of time sequences

$$\mathrm{Sig} = \left\{ f^{[r]} \mid r = 1, 2, \ldots, 7 \right\} = \left\{ X, Y, \text{pressure}, \Delta X, \Delta Y, \text{azimuth}, \text{altitude} \right\} \quad (3)$$

where the superscript "[r]" (r = 1, 2, ..., 7) denotes the signal number, e.g., "[1]" means X, "[4]" means $\Delta X$, and so on. Some other notations are as follows: the subscript "i" means the ith user; the subscript "j" means the jth reference signature (so $f_{ij}^{[r]}$ means the rth time sequence of the jth reference signature of the ith user); when neither a user nor a signal number is involved, we drop the subscript "ij" and the superscript "[r]" and write f instead; $\tilde{f}$ denotes f's DCT; $\tilde{\tilde{f}}$ denotes f's standardized DCT coefficients; $\check{f}$ denotes the truncated $\tilde{\tilde{f}}$, i.e., the first several entries of $\tilde{\tilde{f}}$.

B. DCT

Throughout this paper, "DCT" means DCT-II, which is the most commonly used of the four established types of DCT [33]. For a signal $f \in \mathbb{R}^L$, its DCT is defined as

$$\tilde{f} = A_L f \quad (4)$$

where $A_L$ is a square matrix of order L whose (q, p) entry is

$$a_{qp} = \alpha_q \cos \frac{(q-1)(2p-1)\pi}{2L}, \quad q, p = 1, 2, \ldots, L \quad (5)$$

with

$$\alpha_q = \begin{cases} \dfrac{1}{\sqrt{L}}, & q = 1 \\[2pt] \sqrt{\dfrac{2}{L}}, & 2 \le q \le L. \end{cases}$$

DCT is an orthogonal transform, and f can be reconstructed from $\tilde{f}$ by performing the inverse DCT

$$f = A_L^T \tilde{f}. \quad (6)$$

In the following sections, $\tilde{f}/\sqrt{L}$ is frequently used. Let

$$\tilde{\tilde{f}} = \frac{\tilde{f}}{\sqrt{L}}; \quad (7)$$

we call the qth entry of $\tilde{\tilde{f}}$, denoted $\tilde{\tilde{f}}_q$ (q = 1, 2, ..., L), the qth standardized DCT coefficient of f.

C. New Property of DCT

Definition 1 (Temporal Dilation): Suppose $f \in \mathbb{R}^{sL}$ and $g \in \mathbb{R}^{tL}$ ($s, t, L \in \mathbb{N}$), and denote by $f_q$ and $g_q$ the qth entries of f and g, respectively. If for any $k \in \{1, 2, \ldots, L\}$, $m \in \{0, 1, \ldots, s-1\}$, and $n \in \{0, 1, \ldots, t-1\}$ we have

$$f_{ks-m} = g_{kt-n} \quad (8)$$

then we call f a temporal dilation of g with the dilation factor s/t.

Equation (8) means that every s entries of f correspond to every t entries of g: $f_1, \ldots, f_s$ correspond to $g_1, \ldots, g_t$; $f_{s+1}, \ldots, f_{2s}$ correspond to $g_{t+1}, \ldots, g_{2t}$; and so forth.

Theorem 1: Suppose that $f \in \mathbb{R}^{sL}$ is a temporal dilation of $g \in \mathbb{R}^{tL}$ with the dilation factor s/t ($s, t, L \in \mathbb{N}$). Then the qth standardized DCT coefficients of f and g satisfy

$$\frac{\tilde{\tilde{f}}_q}{\tilde{\tilde{g}}_q} = 1 + O\!\left(\left(\frac{q}{L}\right)^2\right), \quad q \in \mathbb{N} \text{ and } q \ll L. \quad (9)$$

The theorem can be proved using the Taylor expansion of the cosine function [34]. Theorem 1 shows that $\tilde{\tilde{f}}_q \approx \tilde{\tilde{g}}_q$ when q is much smaller than L; in other words, the first few standardized DCT coefficients are almost temporal-dilation-invariant.

The first few standardized DCT coefficients of several time sequences are shown in Fig. 1. Fig. 1(g) and (i) are temporal dilations of Fig. 1(e) with dilation factors 0.75 and 1.60, respectively, and their first several standardized DCT coefficients are very similar, as can be seen from Fig. 1(f), (h), and (j). The lengths of the signals in Fig. 1(e), (g), and (i) are 341, 0.75 × 341 ≈ 255, and 1.60 × 341 ≈ 545, respectively; apparently they do not share a common factor L. Nevertheless, their first several standardized DCT coefficients remain essentially unchanged under temporal dilation. It can therefore be inferred that $\tilde{\tilde{f}}_q \approx \tilde{\tilde{g}}_q$ as long as q is small and f is a temporal dilation of g (whether the dilation factor is an integer or a fraction).
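To make Theorem 1 concrete, the following minimal sketch (our own illustration, not code from the paper; it assumes NumPy and SciPy, where `scipy.fft.dct` with `norm="ortho"` computes exactly $A_L f$ from (4) and (5)) builds an integer temporal dilation per (8) and compares the leading standardized DCT coefficients from (7):

```python
import numpy as np
from scipy.fft import dct

def standardized_dct(f):
    # Orthonormal DCT-II equals A_L f from (4)-(5); dividing by sqrt(L) gives (7).
    f = np.asarray(f, dtype=float)
    return dct(f, type=2, norm="ortho") / np.sqrt(len(f))

g = np.sin(np.linspace(0.0, 4.0 * np.pi, 300)) + np.linspace(0.0, 1.0, 300)
f = np.repeat(g, 2)   # temporal dilation of g with dilation factor s/t = 2/1, per (8)

fq, gq = standardized_dct(f)[:10], standardized_dct(g)[:10]
# The leading coefficients agree to within the O((q/L)^2) term of (9).
print(np.max(np.abs(fq - gq)) / np.max(np.abs(gq)))
```

For this integer dilation the agreement is nearly exact; fractional factors such as 0.75 behave similarly, as Fig. 1 illustrates.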
Fig. 1. First few standardized DCT coefficients of some signals. (a) Genuine signature 1 from SUSIG-Visual. (b) X sequence of (a). (c) First 30 standardized DCT coefficients of (b). (d) Genuine signature 2 from SUSIG-visual. (e) X sequence of (d). (f) First 30 standardized DCT coefficients of (e). (g) Temporal dilation of (e) with the dilation factor of 0.75. (h) First 30 standardized DCT coefficients of (g). (i) Temporal dilation of (e) with the dilation factor of 1.60. (j) First 30 standardized DCT coefficients of (i).
The duration of two signatures of the same person can differ (and thus the lengths of the signatures differ), which is very common in practice, whereas the dynamic characteristics tend to remain similar [11]. For example, suppose $f = (f_1, f_2, \ldots, f_L)^T$ and $g = (g_1, g_2, \ldots, g_{2L})^T$ are the X time sequences from two signatures of a certain person (translation, size, and rotation normalizations have been conducted); then $g_{2i-1}$, $g_{2i}$, and $f_i$ are close to one another for $1 \le i \le L$. Thus we may consider that (8) essentially holds, i.e., f may be regarded as a temporal dilation of g, and therefore $\tilde{\tilde{f}}_q \approx \tilde{\tilde{g}}_q$ for small q. For other time sequences and other dilation factors, which may be integers or fractions, the same conclusion can be reached. An empirical demonstration can be found in Fig. 1(c) and (f), which show the first 30 standardized DCT coefficients of the X sequences from two genuine samples of the same person.
Nanni and Lumini [16] used the first few DCT coefficients to form a feature vector (the feature values are normalized using user-specific normalization techniques), but did not adequately explain why this is feasible. Here, Theorem 1 gives the theoretical support for it.
D. Data Reduction by DCT

It is known that DCT has a strong energy compaction property, i.e., the energy of the original signal may be concentrated in only a few low-frequency coefficients. Therefore, a signal can be approximately represented and reconstructed, without significant loss of quality and intelligibility, from a small fraction of its DCT coefficients, and the high-frequency coefficients carrying little energy can be discarded to compress the data.
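The reduction this enables can be sketched in a few lines (an illustration of the general mechanism, not the paper's code; the 20% ratio here is only an example):

```python
import numpy as np
from scipy.fft import dct, idct

def compress(f, ratio=0.2):
    # Keep only the first M low-frequency orthonormal DCT-II coefficients of f.
    c = dct(np.asarray(f, dtype=float), type=2, norm="ortho")
    return c[: max(1, int(ratio * len(f)))]

def reconstruct(c, L):
    # Zero-pad the kept coefficients and apply the inverse DCT, i.e., (6).
    full = np.zeros(L)
    full[: len(c)] = c
    return idct(full, type=2, norm="ortho")

x = np.cumsum(np.sin(np.linspace(0.0, 6.0, 400)))   # a smooth stand-in for a coordinate sequence
c = compress(x)
print(np.sum(c ** 2) / np.sum(x ** 2))               # energy retained (close to 1 for smooth signals)
print(np.linalg.norm(x - reconstruct(c, len(x))) / np.linalg.norm(x))  # small relative error
```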
Moreover, it is desirable that the time sequences to be matched preserve the same number of low-frequency DCT coefficients ($M_i^{[r]}$ denotes this common number; see below). In this way they have the same length, which simplifies the subsequent matching procedure.

E. Energy Features

For a signature Sig whose (claimed) identity is user i, preserve the first $M_i^{[r]}$ standardized DCT coefficients of $f^{[r]}$ to form a vector $\check{f}^{[r]} \in \mathbb{R}^{M_i^{[r]}}$ (how to evaluate $M_i^{[r]}$ is discussed in Section III-E). Sig's energy features are defined as

$$E(\mathrm{Sig}, i) = \left( \big\|\check{f}^{[1]}\big\|_2, \ldots, \big\|\check{f}^{[7]}\big\|_2 \right) \in \mathbb{R}^7. \quad (10)$$

When azimuth and altitude are unavailable, i.e., $r \in \{1, 2, 3, 4, 5\}$, Sig's energy features are accordingly

$$E(\mathrm{Sig}, i) = \left( \big\|\check{f}^{[1]}\big\|_2, \ldots, \big\|\check{f}^{[5]}\big\|_2 \right) \in \mathbb{R}^5. \quad (11)$$

For simplicity, hereinafter we only consider the case where $r \in \{1, 2, \ldots, 7\}$.

When common thresholds, i.e., user-independent thresholds, are used, E(Sig, i) should be normalized in order to take into account the variability within the user's signatures. We use five reference signatures for each user, and the normalization factors, which are calculated separately for each user i, are the averages of the energy features of the five reference signatures

$$C_1(i) = \frac{1}{5} \sum_{j=1}^{5} E(\mathrm{Sig}_{ij}, i) \in \mathbb{R}^7 \quad (12)$$

where $\mathrm{Sig}_{ij}$ (j = 1, 2, ..., 5) is the jth reference signature of the ith user. Then Sig's normalized energy features are defined as

$$\bar{E}(\mathrm{Sig}, i) = E(\mathrm{Sig}, i) \,./\, C_1(i) \in \mathbb{R}^7 \quad (13)$$

where "./" means elementwise division ($\alpha ./ \beta$ is the vector with entries $\alpha_q / \beta_q$; $\alpha$ and $\beta$ must have the same dimension).

For comparison purposes, we also consider the features generated from unstandardized DCT coefficients: replacing $\check{f}^{[r]}$ in (10) with $\sqrt{L}\,\check{f}^{[r]}$ (L is the length of $f^{[r]}$, r = 1, 2, ..., 7), we get the unstandardized energy features

$$E_u(\mathrm{Sig}, i) = \sqrt{L}\, E(\mathrm{Sig}, i). \quad (14)$$

After replacing the symbol E in (12) and (13) with $E_u$, we get the features $\bar{E}_u(\mathrm{Sig}, i)$, the counterpart of $\bar{E}(\mathrm{Sig}, i)$.
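A minimal sketch of (10), (12), and (13) follows (the helper names and the data layout, a list of seven truncated coefficient vectors per signature, are our own, not the paper's):

```python
import numpy as np

def energy_features(sig_checked):
    # (10): the ell-2 norm of each truncated standardized DCT vector, one entry per signal r.
    return np.array([np.linalg.norm(f_check) for f_check in sig_checked])

def normalized_energy_features(sig_checked, reference_sigs):
    # (12)-(13): divide elementwise by the averaged energy features of the five references.
    C1 = np.mean([energy_features(ref) for ref in reference_sigs], axis=0)
    return energy_features(sig_checked) / C1
```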
III. SPARSITY FEATURES BASED ON SPARSE REPRESENTATION

The aim of sparse representation is often to reveal certain structures of a signal and to represent these structures in a compact way [30], [35], which makes it suitable for coding intrinsic features. In this section, we extract sparsity features by seeking the sparse solution of $\check{f}^{[r]}$.

A fundamental consideration in employing sparse representation is the choice of the dictionary. Methods for creating an overcomplete dictionary fall into two basic categories: the analytic approach and the learning-based approach; more details can be found in [36] and the references therein. Barthelemy et al. [37] presented algorithms for learning a multivariate dictionary and applied them to 2-D handwritten characters to provide rotation-invariant sparse decomposition. Like other learning-based approaches, the dictionaries in that work are much finer-tuned than analytic dictionaries and thus can lead to significantly better performance in applications; however, they are unstructured and hence more costly to apply. In Section III-B, we propose a novel dictionary structure that belongs to neither of the approaches mentioned above, and the resulting dictionaries are adaptive and efficient. Compared to [37], our method is simple and flexible, and it has low time and space complexity.

In what follows, we first introduce the problem of sparse representation, then present the methods for building overcomplete dictionaries and extracting sparsity features.

A. Sparse Representation

Given a signal $f \in \mathbb{R}^M$ and an overcomplete dictionary $D \in \mathbb{R}^{M \times N}$ composed of N elements referred to as atoms, with M < N and usually $M \ll N$, the problem of sparse representation is to find $\hat{x} \in \mathbb{R}^N$ such that

$$\hat{x} = \arg\min_{x \in \mathbb{R}^N} \|x\|_0 \quad \text{subject to} \quad f = Dx \quad (15)$$

where $\|x\|_0$ is the $\ell_0$ norm of the vector x, i.e., the number of nonzero entries of x. Finding the solution to (15) is NP-hard, and an approximate solution is obtained by replacing the $\ell_0$ norm with the $\ell_1$ norm, as follows:

$$\hat{x} = \arg\min_{x \in \mathbb{R}^N} \|x\|_1 \quad \text{subject to} \quad f = Dx. \quad (16)$$

In [27], it is proved that if certain conditions on the sparsity are satisfied, i.e., the solution is sparse enough, the solution of (15) is equivalent to that of (16), which can be solved in polynomial time by standard linear programming methods [38]. A generalized version of (16), which allows for a certain degree of noise, is the following stable $\ell_1$-minimization problem:

$$\hat{x} = \arg\min_{x \in \mathbb{R}^N} \|Dx - f\|_2 + \lambda \|x\|_1 \quad (17)$$

where $\lambda > 0$ is a scalar regularization parameter that balances the tradeoff between reconstruction error and sparsity. In this paper, the convex optimization problem (17) is solved efficiently via the l1_ls method [39].
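l1_ls [39] addresses the closely related Lagrangian form with a squared data term. As an illustration only (not the paper's solver), the following sketch uses ISTA, a standard iterative soft-thresholding scheme, on $\min_x \tfrac{1}{2}\|Dx - f\|_2^2 + \lambda \|x\|_1$:

```python
import numpy as np

def ista(D, f, lam=0.1, n_iter=500):
    # ISTA for min_x 0.5*||D x - f||_2^2 + lam*||x||_1 (a stand-in for the l1_ls solver).
    step = 1.0 / np.linalg.norm(D, 2) ** 2       # 1 / Lipschitz constant of the gradient
    x = np.zeros(D.shape[1])
    for _ in range(n_iter):
        z = x - step * (D.T @ (D @ x - f))       # gradient step on the smooth term
        x = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # soft thresholding
    return x
```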
B. Building Dictionaries

For user i's rth kind of signal, the dictionary has the structure

$$D_i^{[r]} = \Big( \underbrace{\check{f}_{ij}^{[r]},\ j \ne J_i}_{4\ \text{reference atoms}} \;\Big|\; A_{M_i^{[r]}}^{T} \;\Big|\; \text{random vectors} \Big) \in \mathbb{R}^{M_i^{[r]} \times N_i^{[r]}}, \quad r = 1, 2, \ldots, 7. \quad (18)$$
Four signatures $\mathrm{Sig}_{ij}$ (j = 1, 2, ..., 5; $j \ne J_i$) selected from the five reference signatures (the details are described in Section III-D) are used to generate the first four columns, which are composed of the first $M_i^{[r]}$ DCT coefficients of $f_{ij}^{[r]}$. The columns of $A_{M_i^{[r]}}^{T}$ are $M_i^{[r]}$-dimensional DCT basis vectors (see Section II-B), and the last $(N_i^{[r]} - M_i^{[r]} - 4)$ columns are random vectors (the redundancy factor is 4 in our experiments, i.e., $N_i^{[r]} = 4 M_i^{[r]}$).

The dictionary structure in [28] inspired our creation of the first four atoms. A genuine sample is much more likely than a forgery to be representable as a linear combination of the first four atoms, and such a representation is naturally sparse, whereas the sparsest representation of a forged sample tends to involve many dictionary elements, not just the first four. Seeking the sparsest representation therefore automatically discriminates between a genuine signature and a forgery. As to why five reference signatures are used (and hence four atoms arise from the reference set): most systems use five reference signatures, as suggested by many experimental protocols, so we also use five for the sake of a fair comparison with previous work. The DCT basis vectors guarantee that any $M_i^{[r]}$-dimensional signal can be exactly represented over this dictionary, and random vectors are used as atoms because the residual vector can itself be considered a random vector. With the first four atoms fixed, we tried other ways of generating the remaining atoms (for example, redundant wavelet atoms, a wavelet basis plus random vectors, a wavelet basis plus a DCT basis plus random vectors, and so on) and found that the dictionary structure in (18) leads to the best performance. Dictionaries of the form (18) are easy to generate, and only the first four atoms of each dictionary need to be stored.
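Under our reading of (18), a dictionary can be assembled as follows (a sketch: the i.i.d. Gaussian draw and unit normalization of the random atoms are our assumptions, as the paper does not specify their distribution):

```python
import numpy as np
from scipy.fft import dct

def dct_matrix(M):
    # A_M from (5): the orthonormal DCT-II applied to the identity yields the matrix itself.
    return dct(np.eye(M), type=2, norm="ortho", axis=0)

def build_dictionary(ref_atoms, M, redundancy=4, rng=None):
    # (18): [four reference atoms | A_M^T | random vectors], N = redundancy * M columns.
    if rng is None:
        rng = np.random.default_rng(0)
    refs = np.column_stack(ref_atoms)              # truncated DCT vectors of the 4 kept references
    R = rng.standard_normal((M, redundancy * M - M - refs.shape[1]))
    R /= np.linalg.norm(R, axis=0)                 # unit-norm random atoms (our assumption)
    return np.hstack([refs, dct_matrix(M).T, R])   # shape (M, redundancy * M)
```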
C. Sparsity Features

Given a signature Sig that is claimed to belong to user i, solve the following problem:

$$\hat{x}^{[r]} = \arg\min_{x^{[r]} \in \mathbb{R}^{N_i^{[r]}}} \left\| D_i^{[r]} x^{[r]} - \check{f}^{[r]} \right\|_2 + \lambda \left\| x^{[r]} \right\|_1, \quad r = 1, 2, \ldots, 7. \quad (19)$$
For a genuine signature, $\check{f}^{[r]}$ will approximately lie in the linear span of the first four atoms, so the first four entries of the estimate $\hat{x}^{[r]}$ will dominate the other entries. Some examples of sparse solutions are shown in Fig. 2, from which it can be seen that the distribution of the estimated sparse coefficients $\hat{x}^{[r]}$ contains important information about the authenticity of the claimed identity: a genuine signature has sparse representations whose nonzero entries concentrate mostly on the first four atoms, whereas a forgery has sparse coefficients spread widely over all atoms; that is, $\hat{x}^{[r]}$ will be sparser for a genuine signature than for a forgery.
To quantify these observations, we define Sig's sparsity features as follows:

$$S(\mathrm{Sig}, i) = \left( \frac{\sum_{q=1}^{4} \big|\hat{x}_q^{[1]}\big|}{\big\|\hat{x}^{[1]}\big\|_1}, \ldots, \frac{\sum_{q=1}^{4} \big|\hat{x}_q^{[7]}\big|}{\big\|\hat{x}^{[7]}\big\|_1} \right) \in \mathbb{R}^7 \quad (20)$$

where $\hat{x}_q^{[r]}$ is the qth entry of $\hat{x}^{[r]}$.
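Computationally, (20) is one line per signal once the sparse codes from (19) are available (a sketch; the absolute values follow our reconstruction of (20)):

```python
import numpy as np

def sparsity_features(x_hats):
    # (20): fraction of the ell-1 mass carried by the first four (reference) atoms, per signal r.
    return np.array([np.abs(x[:4]).sum() / np.abs(x).sum() for x in x_hats])
```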
Normalized energy features were defined in Section II-E; analogously, normalized sparsity features are given here. For each $\mathrm{Sig}_{ij}$ (j = 1, ..., 5), we use the other four reference signatures to generate the overcomplete dictionaries, and then the sparse solutions $\hat{x}_{ij}^{[r]}$ associated with $\mathrm{Sig}_{ij}$ are computed. The averages of the sparsity features of the five reference signatures act as normalization factors

$$C_2(i) = \frac{1}{5} \sum_{j=1}^{5} S(\mathrm{Sig}_{ij}, i) \in \mathbb{R}^7. \quad (21)$$

Then Sig's normalized sparsity features are defined as

$$\bar{S}(\mathrm{Sig}, i) = S(\mathrm{Sig}, i) \,./\, C_2(i) \in \mathbb{R}^7. \quad (22)$$
Finally, (normalized) energy features and (normalized) sparsity features are concatenated to form the feature vector that will be input to the classifiers

$$F(\mathrm{Sig}, i) = \left( E(\mathrm{Sig}, i), S(\mathrm{Sig}, i) \right) \in \mathbb{R}^{14} \quad (23)$$

or

$$\bar{F}(\mathrm{Sig}, i) = \left( \bar{E}(\mathrm{Sig}, i), \bar{S}(\mathrm{Sig}, i) \right) \in \mathbb{R}^{14}. \quad (24)$$

Substituting the symbols $E_u$ and $\bar{E}_u$ for E and $\bar{E}$ in (23) and (24), respectively, we get $F_u(\mathrm{Sig}, i)$ and $\bar{F}_u(\mathrm{Sig}, i)$. $F(\mathrm{Sig}, i)$ and $F_u(\mathrm{Sig}, i)$ are used for user-dependent thresholds, while $\bar{F}(\mathrm{Sig}, i)$ and $\bar{F}_u(\mathrm{Sig}, i)$ are used for common thresholds.

In order to investigate the efficacy of our proposed dictionaries and compare the effectiveness of different features, we define another set of features: the residuals when only the first four atoms are used to reconstruct the signals. Let

$$\mathrm{Res}_i^{[r]} = \left\| D_i^{[r]}(:, 1{:}4)\, \hat{x}^{[r]}(1{:}4) - \check{f}^{[r]} \right\|_2, \quad r = 1, 2, \ldots, 7 \quad (25)$$

where $D_i^{[r]}(:, 1{:}4)$ is the first four columns of $D_i^{[r]}$, and $\hat{x}^{[r]}(1{:}4)$ is the first four entries of $\hat{x}^{[r]}$. Sig's residual features are defined as follows:

$$R(\mathrm{Sig}, i) = \left( \mathrm{Res}_i^{[1]}, \ldots, \mathrm{Res}_i^{[7]} \right) \in \mathbb{R}^7. \quad (26)$$

Normalization factors are given in a similar way to (21)

$$C_3(i) = \frac{1}{5} \sum_{j=1}^{5} R(\mathrm{Sig}_{ij}, i) \in \mathbb{R}^7. \quad (27)$$

Then Sig's normalized residual features are defined as

$$\bar{R}(\mathrm{Sig}, i) = R(\mathrm{Sig}, i) \,./\, C_3(i) \in \mathbb{R}^7. \quad (28)$$
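A sketch of the residual features (25) and (26) and of the concatenation (23) follows (helper names are ours):

```python
import numpy as np

def residual_features(dictionaries, x_hats, sig_checked):
    # (25)-(26): reconstruction error when only the four reference atoms are used.
    return np.array([
        np.linalg.norm(D[:, :4] @ x[:4] - f_check)
        for D, x, f_check in zip(dictionaries, x_hats, sig_checked)
    ])

def feature_vector(E, S):
    # (23): concatenate the 7 energy features and 7 sparsity features into a 14-D vector.
    return np.concatenate([E, S])
```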
Fig. 2. Sparse solutions. (a) Genuine signature from SVC2004. (b) Skilled forgery from SVC2004. (c) Random forgery from SVC2004. (d) Sparse solution of X of (a). (e) Sparse solution of X of (b). (f) Sparse solution of X of (c). (g) Sparse solution of Y of (a). (h) Sparse solution of Y of (b). (i) Sparse solution of Y of (c). (j) Sparse solution of X of (a). (k) Sparse solution of X of (b). (l) Sparse solution of X of (c). (m) Sparse solution of Y of (a). (n) Sparse solution of Y of (b). (o) Sparse solution of Y of (c).
D. Selecting Four Signatures From the Reference Set

Let

$$J_i = \arg\min_{j \in \{1, 2, \ldots, 5\}} \left\| \bar{S}(\mathrm{Sig}_{ij}, i) \right\|_2. \quad (29)$$

With the $J_i$th reference signature excluded, the remaining four are used to generate the first four atoms of $D_i^{[r]}$ in (18).
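In code, (29) is a single argmin over the five references' normalized sparsity feature vectors (a sketch):

```python
import numpy as np

def select_excluded_reference(S_bar):
    # (29): S_bar is a 5 x 7 array whose row j holds the normalized sparsity
    # features of reference j; the reference with the smallest norm is excluded.
    return int(np.argmin(np.linalg.norm(S_bar, axis=1)))
```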
It should be noted that when the sparse solutions $\hat{x}_{ij}^{[r]}$ associated with $\mathrm{Sig}_{ij}$ (j = 1, ..., 5) are computed in order to obtain the normalization factors $C_2(i)$ in (21) and $C_3(i)$ in (27), none of the dictionaries used is $D_i^{[r]}$: when a reference signature is put aside as the query, the remaining four are used to create the dictionaries.
Fig. 3. Candidate values of θ [r] versus average EER. (a) For X and Y (θ [1] = θ [2] = 0.075). (b) For pressure (θ [3] = 0.15). (c) For X and Y (θ [4] = θ [5] = 0.20). (d) For azimuth and altitude (θ [6] = θ [7] = 0.075).
E. Evaluating $M_i^{[r]}$

Each user i provides five signatures $\mathrm{Sig}_{ij}$ (j = 1, ..., 5) as reference signatures. We denote by $M_i^{[r]}$ the number of preserved DCT coefficients and let

$$M_i^{[r]} = \theta^{[r]} L_i, \quad r = 1, 2, \ldots, 7 \quad (30)$$

where $\theta^{[r]}$ is a parameter common to all users, and $L_i$ is the average length of $\mathrm{Sig}_{ij}$ (j = 1, ..., 5).

To determine the optimal value of $\theta^{[r]}$, candidate values $\{\theta^{[r]} = k/40 \mid k = 1, \ldots, 16\}$ are tried. For each candidate value, ten verification experiments, in which only the rth entries of $\bar{E}$ and $\bar{S}$ are evaluated and used to form a 2-D feature vector, are conducted over DEV_SUSIG and DEV_SVC2004 (subgroups used to build the models; see Section IV-B for more details) by changing the reference/training/test sets. The average EER of these ten experiments is then computed (each candidate value has two average EERs, one evaluated over DEV_SUSIG and the other over DEV_SVC2004). Finally, $\theta^{[r]}$ takes the value corresponding to the minimum EER(s). Since X and Y are similar in nature, $\theta^{[1]}$ and $\theta^{[2]}$ are analyzed together and we let $\theta^{[1]} = \theta^{[2]}$; likewise, $\theta^{[4]} = \theta^{[5]}$ and $\theta^{[6]} = \theta^{[7]}$. In these cases, the feature vectors are 4-D instead. Based on our empirical studies, the results of which are shown in Fig. 3, $\theta^{[r]}$ takes the following values: $\theta^{[1]} = \theta^{[2]} = 0.075$, $\theta^{[3]} = 0.15$, $\theta^{[4]} = \theta^{[5]} = 0.20$, and $\theta^{[6]} = \theta^{[7]} = 0.075$. About 17% of the original storage requirement is needed for a signature from SVC2004 ($\sum_{r=1}^{7} \theta^{[r]} / 5 \approx 17\%$), and about 23% for one from SUSIG-Visual ($\sum_{r=1}^{5} \theta^{[r]} / 3 \approx 23\%$).
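With the tuned values of $\theta^{[r]}$, (30) and the storage ratios reduce to simple arithmetic (a sketch; rounding $\theta^{[r]} L_i$ to an integer is our assumption, as the paper does not state a rounding rule):

```python
import numpy as np

theta = np.array([0.075, 0.075, 0.15, 0.20, 0.20, 0.075, 0.075])  # tuned theta^[r]

def num_coeffs(L_avg):
    # (30): number of standardized DCT coefficients preserved for each signal r.
    return np.maximum(1, np.round(theta * L_avg)).astype(int)

print(theta.sum() / 5)      # ~0.17: SVC2004 (7 coefficient sequences vs. 5 stored raw signals)
print(theta[:5].sum() / 3)  # ~0.23: SUSIG-Visual (5 coefficient sequences vs. 3 stored raw signals)
```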
IV. NUMERICAL EXPERIMENTS

In this section, the details of the databases used are described; then the experimental results are presented, discussed, and analyzed.

A. Databases

Two databases are used: 1) SUSIG [40] and 2) SVC2004 [41], [42]. SUSIG is a public database consisting of real-life signatures of the users and including "highly skilled" forgeries signed by people attempting to break the system. SUSIG consists of two parts: 1) the visual subcorpus (SUSIG-Visual), obtained using a tablet with a built-in LCD display providing visual feedback, and 2) the blind subcorpus (SUSIG-Blind), collected using a tablet without visual feedback. SUSIG-Visual, used in this paper, contains a total of 1880 genuine signatures by 94 persons and 940 skilled (half of them highly skilled) forgeries collected in two sessions from 94 persons. The data in SUSIG consist of X, Y, pressure, and time stamp, collected at 100 Hz. X, Y, pressure, and the derivative time sequences $\Delta X$ and $\Delta Y$ are used, whereas azimuth and altitude are unavailable; therefore r = 1, ..., 5 for SUSIG-Visual. For convenience, SUSIG-Visual is sometimes called SUSIG for short in this paper.

The SVC2004 database contains signature data collected from a graphics tablet (WACOM Intuos), with 20 genuine signatures and 20 skilled forgeries per person. We use task 2 of the publicly available subset of 40 users; the data consist of X, Y, pressure, azimuth, altitude, time stamp, and button status. Time stamp and button status are not used in the experiments, so r = 1, ..., 7 for the SVC2004 database.
TABLE I D ETAILS OF DATABASES
TABLE II V ERIFICATION P ERFORMANCE (EER IN %)
The data in this database are also collected in two sessions: 1) ten genuine signatures per user are from session 1 and 2) another ten are from session 2. The details of these databases are summarized in Table I.

B. Verification Results

Each database is split into three disjoint sets.
1) Reference Set: The reference set is used to generate atoms of the overcomplete dictionaries $D_i^{[r]}$ and to evaluate each user's parameters, such as $L_i$ and the normalization factors $C_1(i)$, $C_2(i)$, and $C_3(i)$. For each user, five randomly selected genuine samples from session 1 are used as reference signatures.
2) Training Set: The training set is used to train the classifiers. The remaining five session-one genuine signatures and 20 random forgeries (if imposters claim others' identities using their own signatures, these signatures are called random forgeries) per user serve as training examples. More specifically, when we use $F(\mathrm{Sig}, i)$ in (23) as feature vectors, the classifiers are built for every user, taking five genuine signatures as positive and 20 random forgeries as negative examples; when we use $\bar{F}(\mathrm{Sig}, i)$ in (24) as feature vectors, the classifiers are built for all users, taking (number of users) × 5 genuine signatures as positive and (number of users) × 20 random forgeries as negative examples.
3) Test Set: The test set is used to demonstrate and measure the effectiveness of our proposed method. All session-two genuine signatures and all skilled forgeries are used for testing. When random forgeries are verified, the genuine test set remains the same as above, and the imposter test set comprises 20 random forgeries per user taken from other users' reference signatures.

For the sake of convenience, we refer to "reference set + training set" of the SUSIG-Visual database as DEV_SUSIG, and "reference set + training set" of the SVC2004 database as DEV_SVC2004. These two subgroups are used to evaluate $\theta^{[r]}$ in Section III-E.
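The per-user split just described can be sketched as follows (names and data layout are ours; `session1` and `session2` hold a user's genuine samples, and `forgery_pool` holds random forgeries drawn from other users):

```python
import numpy as np

def split_user_data(session1, session2, forgery_pool, rng):
    # 5 reference + 5 training genuines from session 1, 20 random forgeries for
    # training, and all session-2 genuines (plus skilled forgeries) for testing.
    idx = rng.permutation(len(session1))
    reference = [session1[k] for k in idx[:5]]
    train_genuine = [session1[k] for k in idx[5:10]]
    train_forgery = [forgery_pool[k] for k in rng.permutation(len(forgery_pool))[:20]]
    return reference, train_genuine, train_forgery, session2
```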
In this paper, a support vector machine (SVM) is used as the classifier; more specifically, we used the C-SVM implementation of the LIBSVM library [43]. We adopt the EER, i.e., the error rate at which the false acceptance rate (FAR) and false rejection rate (FRR) are equal, as the measure of verification performance.

In order to obtain reliable results for independent test data, we conduct and evaluate ten repetitions of each experiment with disjoint reference, training, and test data sets. In every trial, we use a different set of randomly selected reference signatures (all from session 1). The maximum EER, the minimum EER, the average EER, and the standard deviation of the EERs over all ten trials are presented in Table II for the experiments with common thresholds and user-dependent thresholds on both databases. Based on these data, our observations are listed below.
1) The performance of E ($\bar{E}$) is close to that of $E_u$ ($\bar{E}_u$), and the performance of F ($\bar{F}$) is close to that of $F_u$ ($\bar{F}_u$), which is somewhat unexpected and not in accord with Theorem 1. We think this is a consequence of the fact that the entries of $E_u$ and $\bar{E}_u$ incorporate the total duration, which has the most discriminative power among all the global features studied in [15]. Generally, the total duration of a forgery is much longer than that of a genuine signature, which can be corroborated on the SUSIG-Visual and SVC2004 databases. Accordingly, our explanation of this issue via an example is as follows. Let g and f be a genuine signature and a forgery, respectively, and denote by $L_g$ and $L_f$ their lengths; we may as well suppose $L_g < L_f$, so in all probability $\|f^{[1]}\|_2 > \|g^{[1]}\|_2$ (note that translation, size, and rotation normalizations have been carried out), and therefore in all probability

$$\frac{\sqrt{L_f}\, \big\|\check{f}^{[1]}\big\|_2}{\sqrt{L_g}\, \big\|\check{g}^{[1]}\big\|_2} \approx \frac{\big\|\tilde{f}^{[1]}\big\|_2}{\big\|\tilde{g}^{[1]}\big\|_2} = \frac{\big\|f^{[1]}\big\|_2}{\big\|g^{[1]}\big\|_2} > 1 \quad (31)$$
Fig. 4. Energy features versus unstandardized energy features (on SUSIG-Visual). (a) Average standardized energy features of X sequences. (b) Average unstandardized energy features of X sequences.
Fig. 5. ROC curves. (a) On SUSIG-Visual. (b) On SVC2004.
whereas

$$\frac{\big\|\check{f}^{[1]}\big\|_2}{\big\|\check{g}^{[1]}\big\|_2} = \sqrt{\frac{L_g}{L_f}} \cdot \frac{\sqrt{L_f}\, \big\|\check{f}^{[1]}\big\|_2}{\sqrt{L_g}\, \big\|\check{g}^{[1]}\big\|_2}. \quad (32)$$

Since $L_g < L_f$, $\|\check{f}^{[1]}\|_2 / \|\check{g}^{[1]}\|_2$ is closer to 1 than $\sqrt{L_f}\,\|\check{f}^{[1]}\|_2 / (\sqrt{L_g}\,\|\check{g}^{[1]}\|_2)$; that is, the difference between $\sqrt{L_f}\,\|\check{f}^{[1]}\|_2$ and $\sqrt{L_g}\,\|\check{g}^{[1]}\|_2$ is greater than that between $\|\check{f}^{[1]}\|_2$ and $\|\check{g}^{[1]}\|_2$. In other words, compared with standardized energy features, unstandardized energy features lead to better separability between genuine signatures and forgeries (at the same time, they reduce the difference between two genuine signatures). These discussions are empirically validated in Fig. 4. Therefore, as Table II presents, $E_u$ ($\bar{E}_u$) is almost as effective as E ($\bar{E}$), and $F_u$ ($\bar{F}_u$) is almost as effective as F ($\bar{F}$).
2) Among the six feature vectors, F and $F_u$ ($\bar{F}$ and $\bar{F}_u$) are the most effective, leading to the best maximum, minimum, and average EER and $\sigma_{\mathrm{EER}}$. We have tried concatenating E and R ($\bar{E}$ and $\bar{R}$), $E_u$ and R ($\bar{E}_u$ and $\bar{R}$), and S and R ($\bar{S}$ and $\bar{R}$) to create three other feature vectors, and have found that they are less effective than F ($\bar{F}$) and $F_u$ ($\bar{F}_u$) in terms of verification performance, so the experimental results for them are omitted here due to limited space.

In addition to the EER results, Fig. 5 shows the receiver operating characteristic (ROC) curves depicting how FAR and FRR change with the acceptance threshold for the two databases. We only present the cases where the feature vector $\bar{F}$ and
common thresholds are used. Each ROC curve in Fig. 5 corresponds to the trial that is the median of the ten trials in terms of EER.

C. Comparative Study

When systems tested under different (and sometimes unknown) conditions are compared, it is difficult to reach objective conclusions. We therefore give the results of previous work tested on SUSIG-Visual and/or SVC2004 task 2 in Table III, including recently reported ones. Besides, for the sake of a fair comparison, we repeat the experiments of [11] and [16] using SUSIG-Visual and SVC2004 task 2 (both [11] and [16] conducted their experiments on the Ministerio de Ciencia y Tecnología database instead); our configurations are the same as the best configurations in [11] and [16], respectively, i.e., "Daubechies order four wavelet, D = 10, FE2" in [16] and "MeanNor, p = 0.3, multiple templates and normalized signature size equal to 50" in [11]. These two systems are both stand-alone, and their results are presented in Table III. The entry "−" in this table denotes that the corresponding result is not reported in the respective work, and the results for our system are obtained from F ($\bar{F}$).

It can be seen that, on the whole, our system has smaller EERs than the other stand-alone systems, among which are those claimed to achieve the state-of-the-art EER on SUSIG-Visual and/or SVC2004.
TABLE III EER OF VARIOUS S IGNATURE V ERIFICATION A PPROACHES
Only in the case of "user-dependent threshold + random forgeries" are our EERs (0.27% and 0.10%) slightly higher than their counterparts in [13] (0.00% and 0.00%); but our system is better at verifying skilled forgeries than [13], especially on SVC2004.

It is also worthwhile to note that our proposed model is comparable to models using fusion of classifiers: our system on SUSIG-Visual (for "common threshold + skilled forgeries") shows a result very close to those of [24] and [44] (the EERs are 2.98%, 3.03%, and 2.46%, respectively). Since in many pattern recognition problems fusion of classifiers usually leads to much higher performance than the original systems, we think that our proposal's results
would also improve if combined with other systems, but this is beyond the scope of this paper.

Furthermore, it should be pointed out that the performance of a signature verification system is related to the number of samples used to build the model; generally, the higher the number, the better one would expect the results to be. In our system, this number is 10, as necessitated by our algorithm. In the two "repeated" systems, ten reference/training genuine signatures are also used, along with the same classifier and database division as in our system.
Fig. 6. Classification performances on our extended synthetic control dataset.
In this way, the variations in experimental setups are made as small as possible, and the comparison is thus fair in a sense. Table III shows that our proposed model outperforms the two repeated systems. However, all the other systems in Table III use five reference/training samples instead. Additionally, these systems differ in reference/training/test sets, classification methods, threshold settings, etc. Therefore, a straightforward comparison between our proposed framework and these systems in terms of verification performance is not entirely objective. In fact, researchers in the field of online signature verification are always faced with this difficulty when making comparative studies. Strictly speaking, in order to make a comprehensive comparison and reach statistically valid conclusions, we should conduct statistical tests [45], [46]. This will be our future work.

V. CONCLUSION

There are mainly two approaches to online signature verification: parametric approaches and functional approaches. Generally speaking, functional approaches lead to better performance than parametric approaches, but they require more time and more space. This paper proposes a novel method that is equally competitive with functional approaches while leading to much simpler and easier matching procedures.

Our proposal is based on DCT and sparse representation. We find a new property of DCT, namely, that a time series' first several standardized DCT coefficients are almost invariant to temporal dilation. Based on this property, the signatures from the same user are transformed to have a common length, and energy features are extracted. In addition, we make an initial attempt to use sparse representation to authenticate online signatures and propose a novel task-specific method for generating overcomplete dictionaries. Experiments are conducted on the SUSIG-Visual and SVC2004 databases, and the results reveal that our proposed scheme outperforms several other existing verification methods, including the state-of-the-art methods for signature verification.

However, the proposed method for building overcomplete dictionaries lacks sufficient theoretical support, and how to construct dictionaries based on the optimization of a certain target function (dictionaries constructed in this way may be more effective) will form part of our future research. In addition, compared to related work, our number of training samples is still quite high for a practical system; this needs to be addressed in the future.
Moreover, in Definition 1, if (8) is changed to $|f_{ks-m} - g_{kt-n}| < \varepsilon$ (the term "temporal dilation" should correspondingly be changed to "s-t-$\varepsilon$ approximation"), and if the error bound in (9) can correspondingly be estimated as a function of $\varepsilon$, the definition and the theorem will be more appropriate and more practical for online signatures. However, we find it difficult to give such a bound; we will explore this issue in the near future.

How to apply the new property of DCT to other pattern recognition problems is also future work. We have made a preliminary study on general time series classification using this property, with experiments on the synthetic control dataset, one of the datasets on the University of California, Riverside time series classification/clustering homepage [50]. This webpage has been created as a public service to the data mining/machine learning community, to encourage reproducible research for time series classification and clustering. The synthetic control dataset includes a training set of size 300 and a test set of size 300; there are six classes, and the length of each time series is 60. Three temporal dilations of each time series, with dilation factors of 2, 4, and 8, respectively, are created and incorporated into the training/test sets; that is, both the new training set and the new test set have 1200 time series, whose lengths may be 60, 120, 240, or 480. The first few standardized DCT coefficients are then used to form feature vectors and fed to the 1-nearest neighbor (NN) Euclidean distance program available on the webpage. The results are shown in Fig. 6, in which the abscissa denotes the dimension of the feature vectors, i.e., the number of retained standardized DCT coefficients, and the ordinate denotes the error rate in percent. It can be seen that when the dimension of the feature vectors is between 12 and 26, the error rates are low (all smaller than 1.7%; in particular, when 22 standardized DCT coefficients are retained, the error rate is only 0.33%). For comparison purposes, we give the performances (available on the webpage [50]) of three other algorithms that use the original synthetic control dataset as input: the error rates of unconstrained warping, constrained warping, and 1-NN Euclidean distance are 0.7%, 1.7%, and 12.0%, respectively. These results validate Theorem 1 and show that standardized DCT coefficients are discriminative and can provide a compact representation of a signal. In the near future, a more systematic study of these problems will be conducted and statistically significant claims will be made.

ACKNOWLEDGMENT

The authors would like to thank Prof. C. Y. Suen at the Center for Pattern Recognition and Machine Intelligence, Concordia University, for his valuable help in revising this paper. They would also like to thank the anonymous reviewers for providing constructive comments.

REFERENCES

[1] R. Bolle and S. Pankanti, Biometrics, Personal Identification in Networked Society. Norwell, MA, USA: Kluwer, 1998.
[2] R. Sabourin, G. Genest, and F. J. Prteux, “Off-line signature verification by local granulometric size distributions,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 19, no. 9, pp. 976–988, Sep. 1997. [3] D. S. Guru and H. N. Prakash, “Online signature verification and recognition: An approach based on symbolic representation,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 31, no. 6, pp. 1059–1073, Jun. 2009. [4] L. L. Lee, T. Berger, and E. Aviczer, “Reliable on-line human signature verification systems,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 18, no. 6, pp. 643–647, Jun. 1996. [5] A. K. Jain, F. D. Griess, and S. D. Connell, “On-line signature verification,” Pattern Recognit., vol. 35, no. 12, pp. 2963–2972, 2002. [6] D. Z. Lejtman and S. E. George, “On-line handwritten signature verification using wavelets and back-propagation neural networks,” in Proc. Int. Conf. Doc. Anal. Recognit., Seattle, WA, USA, 2001, pp. 992–996. [7] N. S. Kamel and S. Sayeed, “SVD-based signature verification technique using data glove,” Int. J. Pattern Recognit., vol. 22, no. 3, pp. 431–443, 2008. [8] M. T. Ibrahim, M. Kyan, and G. Ling, “On-line signature verification using global features,” in Proc. Can. Conf. Electr. Comput. Eng., St. John’s, NL, Canada, 2009, pp. 682–685. [9] A. Kholmatov and B. Yanikoglu, “Identity authentication using improved online signature verification method,” Pattern Recognit. Lett., vol. 26, no. 15, pp. 2400–2408, Nov. 2005. [10] J. Fierrez, J. Ortega-Garcia, D. Ramos, and J. Gonzalez-Rodriguez, “HMM-based on-line signature verification: Feature extraction and signature modeling,” Pattern Recognit. Lett., vol. 28, no. 16, pp. 2325–2334, Dec. 2007. [11] C. Vivaracho-Pascual, M. Faundez-Zanuy, and J. M. Pascual, “An efficient low cost approach for on-line signature recognition based on length normalization and fractional distances,” Pattern Recognit., vol. 42, no. 1, pp. 183–193, 2009. [12] M. Faundez-Zanuy and J. M. Pascual-Gaspar, “Efficient on-line signature recognition based on multi-section vector quantization,” Pattern Anal. Appl., vol. 14, no. 1, pp. 37–45, 2011. [13] K. Barkoula, G. Economou, and S. Fotopoulos, “Online signature verification based on signatures turning angle representation using longest common subsequence matching,” Int. J. Doc. Anal. Recognit., vol. 16, no. 3, pp. 261–272, 2013. [14] J. Fierrez-Aguilar, S. Krawczyk, J. Ortega-Garcia, and A. K. Jain, “Fusion of local and regional approaches for on-line signature verification,” in Proc. Int. Conf. Adv. Biometric Recognit. Syst. (IWBRS), Beijing, China, 2005, pp. 188–196. [15] J. Fierrez-Aguilar, L. Nanni, J. Lopez-Penalba, J. Ortega-Garcia, and D. Maltoni, “An on-line signature verification system based on fusion of local and global information,” in Proc. 5th Int. Conf. Audio- Video-Based Biometric Person Authentication IAPR (LNCS 3546). Berlin, Germany: Springer, Jul. 2005, pp. 523–532. [16] L. Nanni and A. Lumini, “A novel local on-line signature verification system,” Pattern Recognit. Lett., vol. 29, no. 5, pp. 559–568, Apr. 2008. [17] M. J. Alhaddad, “Multiple classifiers to verify the online signature,” World Comput. Sci. Inf. Technol. J., vol. 2, no. 2, pp. 46–50, 2012. [18] D. Impedovo and G. Pirlo, “Automatic signature verification: The state of the art,” IEEE Trans. Syst., Man, Cybern. C, Appl. Rev., vol. 38, no. 5, pp. 609–635, Sep. 2008. [19] S. 
Garcia-Salicetti et al., “Online handwritten signature verification,” in Guide to Biometric Reference Systems and Performance Evaluation, 2nd ed., D. Petrovska-Delacretaz, G. Chollet, and B. Dorizzi, Eds. London, U.K.: Springer, 2008, pp. 125–164. [20] Z. Zhang, K. Wang, and Y. Wang, “A survey of on-line signature verification,” in Proc. 6th Chin. Conf. Biometric Recognit. (CCBR), Beijing, China, 2011, pp. 141–149. [21] I. El-Henawy, M. Rashad, O. Nomir, and K. Ahmed, “Online signature verification: State of the art,” Int. J. Comput. Technol., vol. 4, no. 2, pp. 664–678, 2013. [22] C. Gruber, T. Gruber, S. Krinninger, and B. Sick, “Online signature verification with support vector machines based on LCSS kernel functions,” IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 40, no. 4, pp. 1088–1100, Aug. 2010. [23] Z.-H. Quan, D.-S. Huang, X.-L. Xia, M. R. Lyu, and T.-M. Lok, “Spectrum analysis based on windows with variable widths for online signature verification,” in Proc. Int. Conf. Pattern Recognit., vol. 2. Hong Kong, 2006, pp. 1122–1125. [24] B. Yanikoglu and A. Kholmatov, “Online signature verification using Fourier descriptors,” EURASIP J. Adv. Signal Process., vol. 2009, pp. 12:1–12:1, Jan. 2009.
[25] A. V. da Silva and D. S. de Freitas, “Wavelet-based compared to function-based on-line signature verification,” in Proc. XV Brazil. Symp. Comput. Graph. Image Process., Fortaleza, Brazil, 2002, pp. 218–225. [26] I. Nakanishi, N. Nishiguchi, Y. Itoh, and Y. Fukui, “On-line signature verification based on discrete wavelet domain adaptive signal processing,” in Proc. Int. Conf. Biometrics Authentication, Hong Kong, 2004, pp. 584–591. [27] A. M. Bruckstein, D. L. Donoho, and M. Elad, “From sparse solutions of systems of equations to sparse modeling of signals and images,” SIAM Rev., vol. 51, no. 1, pp. 34–81, Feb. 2009. [28] J. Wright, A. Y. Yang, A. Ganesh, S. S. Sastry, and Y. Ma, “Robust face recognition via sparse representation,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 31, no. 2, pp. 210–227, Feb. 2009. [29] J. Yang, J. Wright, T. Huang, and Y. Ma, “Image super-resolution via sparse representation,” IEEE Trans. Image Process., vol. 19, no. 11, pp. 2861–2873, Nov. 2010. [30] M. Elad and M. Aharon, “Image denoising via sparse and redundant representations over learned dictionaries,” IEEE Trans. Image Process., vol. 15, no. 12, pp. 3736–3745, Dec. 2006. [31] N. Yu, T. Qiu, and F. Ren, “Denoising for multiple image copies through joint sparse representation,” J. Math. Imag. Vis., vol. 45, no. 1, pp. 46–54, Jan. 2013. [32] A. Adler et al., “Audio inpainting,” IEEE Audio, Speech, Language Process., vol. 20, no. 3, pp. 922–932, Mar. 2012. [33] W. Chen, M. J. Er, and S. Wu, “Illumination compensation and normalization for robust face recognition using discrete cosine transform in logarithm domain,” IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 36, no. 2, pp. 458–466, Apr. 2006. [34] T. M. Apostol, Mathematical Analysis. New York, NY, USA: Addison-Wesley, 1974. [35] M. Aharon, M. Elad, and A. Bruckstein, “K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation,” IEEE Trans. Signal Process., vol. 54, no. 11, pp. 4311–4322, Nov. 2006. [36] R. Rubinstein, M. Zibulevsky, and M. Elad, “Double sparsity: Learning sparse dictionaries for sparse signal approximation,” IEEE Trans. Signal Process., vol. 58, no. 3, pp. 1553–1564, Mar. 2010. [37] Q. Barthelemy, A. Larue, A. Mayoue, D. Mercier, and J. I. Mars, “Shift & 2D rotation invariant sparse coding for multivariate signals,” IEEE Trans. Signal Process., vol. 60, no. 4, pp. 1597–1611, Apr. 2012. [38] S. S. Chen, D. L. Donoho, and M. A. Saunders, “Atomic decomposition by basis pursuit,” SIAM Rev., vol. 43, no. 1, pp. 129–159, Jan. 2001. [39] S. J. Kim, K. Koh, M. Lustig, S. Boyd, and D. Gorinevsky, “An interiorpoint method for large-scale 1 -regularized least squares,” IEEE J. Sel. Topics Signal Process., vol. 1, no. 4, pp. 606–617, Dec. 2007. [40] A. Kholmatov and B. Yanikoglu, “SUSIG: An on-line signature database, associated protocols and benchmark results,” Pattern Anal. Appl., vol. 12, no. 3, pp. 227–236, Sep. 2009. [41] D. Yeung et al. (2004). SVC2004: First International Signature Verification Competition. [Online]. Available: http://www.cse.ust.hk/ svc2004/results.html [42] D. Yeung et al., “SVC2004: First international signature verification competition,” in Proc. Int. Conf. Biometric Authentication, Hong Kong, 2004, pp. 16–22. [43] C.-C. Chang and C.-J. Lin. (2011). LIBSVM: A Library for Support Vector Machines. [Online]. Available: http://www.csie.ntu.edu.tw/ ∼cjlin/libsvm [44] K. Wang, Y. Wang, and Z. Zhang, “On-line signature verification using graph representation,” in Proc. 6th Int. Conf. 
Image Graph., Hefei, China, 2011, pp. 943–948. [45] J. Demšar, “Statistical comparisons of classifiers over multiple data sets,” J. Mach. Learn. Res., vol. 7, pp. 1–30, Jan. 2006. [46] T. G. Dietterich, “Approximate statistical tests for comparing supervised classification learning algorithms,” Neural Comput., vol. 10, no. 7, pp. 1895–1923, 1998. [47] M. I. Khalil, M. Moustafa, and H. M. Abbas, “Enhanced DTW based online signature verification,” in Proc. 16th IEEE Int. Conf. Image Process., Cairo, Egypt, 2009, pp. 2685–2688. [48] K. Wang, Y. Wang, and Z. Zhang, “On-line signature verification using wavelet packet,” in Proc. Int. Joint Conf. Biometrics, Washington, DC, USA, 2011, pp. 1–6. [49] I. Kostorz and R. Doroz, “On-line signature recognition based on reduced set of points,” in Computer Recognition Systems 4. Berlin, Germany: Springer, 2011, pp. 3–11. [50] E. Keogh et al. (2011). The UCR Time Series Classification/Clustering Homepage. [Online]. Available: http://www.cs.ucr.edu/∼eamonn/time_ series_data/
Yishu Liu received the B.S. degree in computational mathematics from Nanjing University, Nanjing, China, and the M.S. and Ph.D. degrees in computational mathematics from Sun Yat-sen University, Guangzhou, China, in 1996, 2002, and 2013, respectively. From 1996 to 1999, she was a Research Assistant with the Institute of Mathematics, Shantou University, Shantou, China. Since 2004, she has been a Lecturer with the School of Geography, South China Normal University, Guangzhou. Her current research interests include image processing, pattern recognition, and machine learning. Zhihua Yang was born in Hunan province, China, in 1964. He received the M.S. degree in automation from Hunan University, Changsha, China, and the Ph.D. degree in computer science from Sun Yat-sen University, Guangzhou, China, in 1995 and 2005, respectively. Since 2006, he has been a Professor with the Information Science School, Guangdong Finance and Economics University, Guangzhou. His current research interests include signal analysis, pattern recognition, and image processing. He has authored over 40 articles.
Lihua Yang received the B.S. and M.S. degrees in mathematics from Hunan Normal University, Changsha, China, and Beijing Normal University, Beijing, China, and the Ph.D. degree in computational mathematics from Sun Yat-sen University, Guangzhou, China, in 1984, 1987, and 1995, respectively. He was a Post-Doctoral Fellow with the Institute of Mathematics, Academia Sinica, Taipei, Taiwan, from 1996 to 1998. He was a Visiting Scholar with the Department of Computer Science, Hong Kong Baptist University, Hong Kong, from 1997 to 1999, the Center of Pattern Recognition and Machine Intelligence, Concordia University, Montreal, QC, Canada, in 2001, the Department of Mathematics, Syracuse University, Syracuse, NY, USA, in 2006, Laboratoire Jacques-Louis Lions, Universite Pierre et Marie Curie, Paris, France, in 2008, the Department of Mathematics, Michigan State University, East Lansing, MI, USA, in 2010, and the Department of Mathematics, City University of Hong Kong, Hong Kong, from 2009 to 2010. He was the Supervisor of 16 Ph.D. candidates. He joined the School of Mathematics and Computational Science, Sun Yat-sen University, as an Associate Professor in 1998, where he is currently a Professor. He is an Associate Dean of the Guangdong Province Key Laboratory of Computational Science, Guangzhou. His current research interests include time-frequency analysis, signal processing, and pattern recognition. He has published over 90 papers, one book and three translations. Prof. Yang has been actively involved in several international conferences as an Organizing Committee and/or Technical Program Committee Member.