IEEE SIGNAL PROCESSING LETTERS, VOL. 25, NO. 7, JULY 2018
Kernel Deep Regression Network for Touch-Stroke Dynamics Authentication
Inho Chang, Cheng-Yaw Low, Seokmin Choi, and Andrew Beng-Jin Teoh, Senior Member, IEEE
Abstract—Touch-stroke dynamics is an emerging behavioral biometric shown to be feasible for mobile identity management. A touch-stroke dynamics authentication system is conventionally composed of a hand-engineered feature extractor and a separate classifier. In this letter, we propose a stacking-based deep learning network that performs feature extraction and classification collectively, dubbed the Kernel Deep Regression Network (KDRN). The KDRN is built hierarchically from multiple kernel ridge regressions (KRR), each of which is trained analytically and independently. In principle, KDRN is not meant to learn directly from the raw touch-stroke data like other deep learning models; rather, it relearns from the pre-extracted features to yield a richer and relatively more discriminative feature set. Subsequent to that, the authentication is carried out by KRR. Overall, KDRN achieves an equal error rate of 0.013% for intrasession authentication, 0.023% for intersession authentication, and 0.121% for interweek authentication on the Touchalytics dataset.
Index Terms—Authentication, biometrics, stacking-based deep neural network, touch-stroke dynamics.
I. INTRODUCTION
THE number of smartphone users has overtaken the number of desktop users worldwide since 2011 [1]. Users trust and store their private information in their smartphones, which imposes stringent concerns on identity management in smartphones. Mobile identity authentication can be done by means of passcodes, patterns, or biometrics, and it may take place either only once at the entry point or continuously after the entry point [2]. The latter authentication mode is called continuous authentication (CA) [3], [4]. CA monitors and authenticates the user's identity silently in the background. Hence, even after entry-point authentication, it can constantly secure the smartphone from being used by an imposter. CA is often realized by using biometrics. Although fingerprint, iris, and face are the major physiological biometrics adopted in smartphones for entry-point authentication,
Manuscript received March 16, 2018; revised May 22, 2018; accepted June 1, 2018. Date of publication June 11, 2018; date of current version June 21, 2018. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Roberto Caldelli. This work was supported by the National Research Foundation of Korea funded by the Korea government (Ministry of Science, ICT and Future Planning) under Grant 2016R1A2B4011656. (Corresponding author: Andrew Beng-Jin Teoh.)
I. Chang, S. Choi, and A. B.-J. Teoh are with the School of Electrical and Electronic Engineering, College of Engineering, Yonsei University, Seoul 120749, South Korea (e-mail: [email protected]; [email protected]; [email protected]).
C.-Y. Low is with the School of Electrical and Electronic Engineering, College of Engineering, Yonsei University, Seoul 120749, South Korea, and also with Multimedia University, 75450 Melaka, Malaysia (e-mail: [email protected]).
Color versions of one or more of the figures in this letter are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/LSP.2018.2846050
behavioral touch-stroke dynamics [5]–[8] stands out as a promising candidate for CA, as the data can be readily acquired by built-in sensors, such as the touch-sensitive screen, accelerometer, and gyroscope, and authentication can be done silently and seamlessly. Although touch-stroke dynamics is considered less discriminative than the physiological biometrics, CA does not have to be as accurate, since authentication occurs at frequent intervals and the user can be locked out after several successive rejections [9].
In the literature, touch-stroke features are taken from the raw touch data, such as the area covered by the finger and the pressure, or/and statistical derivatives, such as displacement, velocity, acceleration, etc., derived from the x, y-positions and the duration on the touch screen [6]. These are typical hand-engineered features expressed in feature-vector form. A separate classifier is then used to distinguish the genuine user from imposters. Some well-known classifiers for touch-stroke authentication are the support vector machine (SVM), k-nearest neighbor, hidden Markov model, single-hidden-layer perceptron, and random forest [5], [7], [9]–[11].
Deep learning (hierarchical representation learning) is prominent for its capability to learn effective feature representations from unstructured raw data, such as images and speech signals, in a layerwise manner [12]. However, how to elicit useful features on top of existing hand-engineered features in a hierarchical manner, which we name feature relearning, remains less explored, even though hand-engineered features are commonly available in various domains. In this letter, we demonstrate a stacking-based feature relearning model that performs feature extraction and classification collectively, dubbed the kernel deep regression network (KDRN). KDRN is formed by stacking multiple kernel ridge regression (KRR) modules in a feedforward manner. Every KRR in KDRN is trained independently with no iteration, and thus the gradient-vanishing problem plaguing deep-learning models is nonexistent, regardless of how deep KDRN is.
To our knowledge, there is only one work that applies deep learning to touch-stroke authentication [13], where the authors pretrain a two-hidden-layer perceptron network from the hand-engineered feature vectors. However, its performance is far from satisfactory. In fact, a multiple-hidden-layer perceptron (MLP) can be viewed as a feature relearning instance when the hand-engineered features are adopted as the network input. In this letter, we demonstrate that a touch-stroke dynamics authentication system can benefit from KDRN, even though the features are inherently handcrafted. Our implementation code is available at https://github.com/damonchang23/KDRN.

II. PRELIMINARIES
Ridge regression (RR) [15] is a statistical method that learns a linear function relating the dependencies between
predictor vector variables x_i ∈ R^d and response variables y_i ∈ R, where i = 1, . . . , N. The classical solution is the ordinary least-squares (OLS) method [16], which minimizes the squared loss Σ_i (y_i − w^T x_i)^2. With scarce training samples, the variance of the OLS estimate w can be large, and hence the estimation is unreliable. A plausible remedy is to penalize the norm of w, ‖w‖^2, as in RR. The squared loss function thus becomes Σ_i (y_i − w^T x_i)^2 + λ‖w‖^2, where λ is a regularization coefficient, typically 0 ≤ λ ≤ 1.
For classification, the response variable y_i ∈ R is one-hot encoded as a C-class label vector, i.e., t_i ∈ {0, 1}^C. Let X = [x_1, . . . , x_N]^T ∈ R^{N×d}, T = [t_1, . . . , t_N]^T ∈ R^{N×C}, and W = [w_1, . . . , w_C] ∈ R^{d×C}; the squared loss of RR for classification is minimized as follows:

min_W  tr[(T − XW)^T (T − XW)] + λ‖W‖_F^2    (1)

where ‖·‖_F is the Frobenius norm. W is analytically estimated as follows:

W = (X^T X + λI)^{−1} X^T T.    (2)

For an unseen data x_test ∈ R^d, the trained RR returns an output vector t_test ∈ R^C as follows:

t_test = W^T x_test.    (3)

Subsequent to that, x_test is labeled as class j* as in (4):

j* = arg max_{j ∈ {1, . . . , C}} (t_test)_j.    (4)
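The closed-form RR classifier in (1)–(4) translates directly into a few lines of NumPy. The following is a minimal sketch under our own (hypothetical) function names, fitting W via (2) and labeling test samples via (3)–(4); it is not the authors' released implementation.

```python
import numpy as np

def fit_rr(X, T, lam=0.1):
    """Closed-form ridge regression, Eq. (2): W = (X^T X + lam*I)^(-1) X^T T."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ T)

def predict_rr(W, X_test):
    """Eqs. (3)-(4): regress the one-hot targets, then take the arg-max class."""
    T_test = X_test @ W          # row-vector form of t_test = W^T x_test
    return np.argmax(T_test, axis=1)

# Toy usage: N = 6 samples, d = 4 features, C = 2 classes (one-hot targets).
X = np.random.randn(6, 4)
T = np.eye(2)[np.array([0, 1, 0, 1, 0, 1])]
W = fit_rr(X, T, lam=0.1)
labels = predict_rr(W, np.random.randn(3, 4))
```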
Nevertheless, RR remains a linear model and is therefore not fit for nonlinear classification problems. It is easy to generalize RR to KRR via the kernel trick [17]. We can choose a typical nonlinear kernel function k(x_i, x_j), such as a polynomial function or the Gaussian radial basis function (RBF). In this letter, we opt for the RBF kernel function shown in (5). Let K ∈ R^{N×N} be the Gram matrix with elements K_{i,j} = k(x_i, x_j); then (2) is revised as follows:

k(x_i, x_j) = exp(−γ ‖x_i − x_j‖^2)    (5)

W = X^T (K + λI)^{−1} T.    (6)
For an unseen data x_test ∈ R^d, t_test ∈ R^C is computed as follows:

t_test = [k(x_test, x_1), . . . , k(x_test, x_N)] (K + λI)^{−1} T.    (7)

Similar to RR, KRR infers the class label for x_test as in (4).
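Equations (5)–(7) likewise admit a compact dual-form implementation: cache the training matrix X and the coefficients (K + λI)^{−1} T, so that a test sample only needs its kernel vector against the training set. The sketch below is ours (hypothetical names), and the same γ must be used at fit and predict time.

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    """Eq. (5): K[i, j] = exp(-gamma * ||a_i - b_j||^2)."""
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * sq)

def fit_krr(X, T, gamma=0.1, lam=0.1):
    """Dual coefficients of Eqs. (6)-(7): Alpha = (K + lam*I)^(-1) T."""
    K = rbf_kernel(X, X, gamma)
    Alpha = np.linalg.solve(K + lam * np.eye(len(X)), T)
    return X, Alpha                      # keep X for test-time kernel vectors

def predict_krr(model, X_test, gamma=0.1):
    """Eq. (7): t_test = [k(x_test, x_1), ..., k(x_test, x_N)] Alpha, then Eq. (4)."""
    X, Alpha = model
    T_test = rbf_kernel(X_test, X, gamma) @ Alpha
    return np.argmax(T_test, axis=1)
```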
III. METHODOLOGY
A. Overview
In this letter, we focus on the two-class touch-stroke dynamics authentication problem, i.e., genuine versus imposter. For such a system, a single raw touch stroke is first captured via the Android API and a touch-stroke feature vector is derived [6]. Unfortunately, a single touch stroke barely captures expressive user behavior, and hence more feature vectors are derived as the user swipes continuously. Once more than one feature vector is available, a feature-level fusion mechanism [18] merges the multiple vectors into one before dispatching it to KDRN (see Section IV-A). Subsequent to that, KDRN relearns from the fused features to classify an unseen input vector.

B. Kernel Deep Regression Network
KDRN can be perceived as a stacking-based representation learning (S-RL) model, which resembles the classical deep-learning models in terms of its deep and feedforward architecture [19]. However, unlike the end-to-end trained deep-learning models, S-RL assembles atomic learnable modules into a deep stacked-up network, with the prediction output of the current layer feeding the input of the next. A typical S-RL instance is the deep convex network [14], which is built from multiple single-hidden-layer networks. Simply speaking, the key notion of S-RL is that it deciphers a large-scale problem via modularization. In this letter, KRR is the atomic module of KDRN, and every module is trained analytically (iteration free) and independently.
The KDRN composed of L consecutive KRR modules is portrayed in Fig. 1. Let ℓ index the KRR module, or equivalently the layer number of KDRN. p^(ℓ) ∈ R^C is realized as a new feature set relearned from h^(ℓ), where h^(ℓ) is an aggregation of x and the relearned feature sets of all preceding layers, i.e., h^(ℓ) = [x, p^(1), . . . , p^(ℓ−1)] ∈ R^{D_ℓ}, where D_ℓ = d + C(ℓ − 1) and ℓ = 1, . . . , L. To begin the layerwise training, we derive H^(ℓ) = [h_1^(ℓ), . . . , h_N^(ℓ)]^T ∈ R^{N×D_ℓ} and P^(ℓ) = [p_1^(ℓ), . . . , p_N^(ℓ)]^T ∈ R^{N×C}. Subsequent to that, (6) is reformulated as follows:

W^(ℓ) = H^(ℓ)T (K^(ℓ) + λ^(ℓ) I)^{−1} T ∈ R^{D_ℓ×C}    (8)

where K^(ℓ) ∈ R^{N×N} denotes the Gram matrix computed as K_{i,j}^(ℓ) = k(h_i^(ℓ), h_j^(ℓ)) = exp(−γ^(ℓ) ‖h_i^(ℓ) − h_j^(ℓ)‖^2). To trigger feature relearning, the ReLU activation is applied as a sparsification operator:

P^(ℓ) = max(0, H^(ℓ) W^(ℓ)) ∈ R^{N×C}.    (9)

This nonnegativity is essential to perturb the intra- and interclass variations as KDRN grows. For an unseen data x_test ∈ R^d, the pretrained KDRN returns p_test^(ℓ) as follows:

p_test^(ℓ) = max(0, k_test^(ℓ)T (K^(ℓ) + λ^(ℓ) I)^{−1} T)    (10)

where k_test^(ℓ) = [k(h_test^(ℓ), h_1^(ℓ)), . . . , k(h_test^(ℓ), h_N^(ℓ))]^T and h_test^(ℓ) = [x_test, p_test^(1), . . . , p_test^(ℓ−1)]^T ∈ R^{D_ℓ}. When ℓ = L, we assign t_test = p_test^(L) ∈ R^C, and the class label for t_test is inferred with respect to (4). We recapitulate the KDRN learning pseudocode in Table I.
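Since Table I is rendered as an image in the original letter, the following is our own minimal NumPy sketch of the layerwise progression: each layer fits a KRR module on the stacked features H^(ℓ), applies the ReLU of (9), and concatenates the relearned features for the next layer; prediction follows (10). We use the dual (kernel) form throughout, i.e., P^(ℓ) = max(0, K^(ℓ)(K^(ℓ) + λI)^{−1}T), which coincides with (10) evaluated on the training samples. Names are ours, not the released code.

```python
import numpy as np

def rbf(A, B, gamma):
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * sq)

def train_kdrn(X, T, L=3, gamma=0.1, lam=0.1):
    """Layerwise KDRN training, Eqs. (8)-(9); gamma and lam are shared across layers."""
    H, modules = X, []
    for _ in range(L):
        K = rbf(H, H, gamma)
        Alpha = np.linalg.solve(K + lam * np.eye(len(H)), T)   # dual form of Eq. (8)
        P = np.maximum(0.0, K @ Alpha)                         # Eq. (9), ReLU on training outputs
        modules.append((H, Alpha))                             # keep H^(l) for test-time kernels
        H = np.hstack([H, P])                                  # h^(l+1) = [x, p^(1), ..., p^(l)]
    return modules

def predict_kdrn(modules, x_test, gamma=0.1):
    """Feedforward prediction, Eq. (10); the last-layer output is t_test."""
    h = np.atleast_2d(x_test)
    for H_train, Alpha in modules:
        p = np.maximum(0.0, rbf(h, H_train, gamma) @ Alpha)
        h = np.hstack([h, p])
    return np.argmax(p, axis=1)          # class label via Eq. (4)
```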
Fig. 1. Generic KDRN pipeline, where the first layer is dispatched with the raw hand-engineered features h^(1) = x. A new feature set p^(ℓ) is relearned from the stacked features h^(ℓ) as KDRN deepens.
TABLE I SUMMARY FOR KDRN LEARNING PROGRESSION
In summary, there are several interesting facts of KDRN.
1) In contrast to conventional deep neural networks, which require empirical fine-tuning of the number of hidden nodes in each layer (denoted by D_ℓ in KDRN), D_ℓ grows proportionally and deterministically with respect to ℓ, i.e., D_ℓ = d + C(ℓ − 1) for ℓ = 1, 2, . . . , L.
2) Since every KDRN module learns exclusively and independently, the network depth L, i.e., the number of KDRN modules, can be adjusted flexibly during training.
3) KDRN relearns discriminative features from the input vector via each module, and the classification is carried out by the built-in classifier in (4). However, the accumulated learned features h^(ℓ) at the last layer, i.e., ℓ = L, can be singled out and fed to other classifiers for authentication.
4) KDRN has two sets of layerwise parameters to be fine-tuned, i.e., γ^(ℓ) and λ^(ℓ), which denote the spread factor of the RBF kernel function and the KRR regularization coefficient, respectively. In place of carefully tuning γ^(ℓ) and λ^(ℓ) for each individual layer ℓ, we fix these parameters across all layers. This diminishes the parameter sets to only γ and λ.
5) The computational complexity of each KDRN module, i.e., KRR, is estimated to be O(N^3). As a whole, KDRN expends O(LN^3).

IV. EXPERIMENTS
A. Setup and Protocol 1) Data Description: We adopt the publicly accessible Touchalytics database [6] in our experiments. It consists of
41 users with a total of 21 158 vertical and horizontal touch strokes collected using four different Android phones. The stroke data from the Android API comprise nine raw attributes, from which the 30 hand-engineered features are derived (d = 30). Our experiments are evaluated under three scenarios, namely intrasession, intersession, and interweek, with respect to the annotated document ID, where documents 1, 2, and 3 correspond to articles; 4 and 5 to an image game; and 6 and 7 to interweek samples acquired at least a week after the initial session. We normalize the hand-engineered features by means of min–max scaling (per feature), followed by z-score normalization (per sample). Note that we adopt both vertical and horizontal strokes without differentiating them explicitly.
2) Feature-Level Fusion Method: In accordance with the feature sliding-window mechanism in [11], we fuse k consecutive feature vectors into a single vector by averaging the k feature vectors element-wise (see the sketch at the end of this subsection).
3) Evaluation Metric: To evaluate the KDRN performance, the equal error rate (EER), the most common performance metric for biometric systems, is adopted. The EER is the operating point at which the false accept rate equals the false reject rate. Since the authentication mode is considered in this letter, for q subjects, q KDRNs are developed independently, where each KDRN returns either genuine or imposter scores (C = 2). For instance, if a KDRN is designated for subject 1 and its associated training samples are genuine (class 1), the training samples of the remaining q − 1 subjects are imposter (class 2). Therefore, this is an imbalanced-class problem in which the imposter samples greatly outnumber the genuine samples. The number of genuine and imposter training samples depends on the k fused samples under the varying scenarios discussed in Section IV-B. The Touchalytics dataset is randomly split into 80:10:10 training, validation, and testing subsets, and the average EER over five-fold cross validation is reported. We perform a grid search to fine-tune the KDRN parameters γ and λ in the range [0.01, 1] with a step size of 0.01.

B. Experiment Results for Three Scenarios
1) Intrasession Scenario: The intrasession scenario observes whether the user operating the phone right after login is the genuine user; in other words, training and authentication are carried out within the same session. The training and testing data, which are composed of the genuine and imposter classes, are divided equally without overlapping. Note that the number of training samples for each class decreases as k increases.
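The sliding-window fusion of Section IV-A2 and the EER of Section IV-A3 are easy to reproduce. Below is a minimal sketch under our own naming and assumptions: a stride-1 window (the letter does not state the stride) and scores for which larger values indicate the genuine class.

```python
import numpy as np

def fuse_strokes(F, k=12):
    """Fuse k consecutive stroke feature vectors by element-wise averaging (sliding window)."""
    return np.stack([F[i:i + k].mean(axis=0) for i in range(len(F) - k + 1)])

def equal_error_rate(genuine_scores, imposter_scores):
    """EER: operating point where the false accept rate equals the false reject rate."""
    thresholds = np.sort(np.concatenate([genuine_scores, imposter_scores]))
    far = np.array([(imposter_scores >= t).mean() for t in thresholds])  # accept if score >= t
    frr = np.array([(genuine_scores < t).mean() for t in thresholds])
    idx = np.argmin(np.abs(far - frr))
    return (far[idx] + frr[idx]) / 2.0
```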
Fig. 4. Interweek training, validation, and testing EERs for 5-layer KDRN with k = 12.
TABLE II COMPARISONS IN TERMS OF TRAINING TIME (IN SECONDS) AND AVERAGE TESTING EER (%) WITH STANDARD DEVIATION AT k = 12. THE TRAINING TIME IS ESTIMATED ON A SINGLE-SUBJECT BASIS FOR THE INTRASESSION SCENARIO.
Fig. 2. (a) Intrasession training, validation, and test EERs for 5-layer KDRN with k = 12. (b) The change of test EERs according to k at L = 3.
Fig. 3. Intersession training, validation, and testing EERs for 5-layer KDRN with k = 12.
Figure 2(a) summarizes the training, validation, and testing EER (%) as a function of L at k = 12. It is observed that the training EER decreases as KDRN is stacked with more KRR modules, suggesting that KDRN improves its representation as the network deepens. However, due to overfitting, the validation EER is lowest at the third layer. Based on the validation EER, the best testing EER is reported to be 0.013% at L = 3. At the same time, Fig. 2(b) illustrates the effect of k on the validation set: the EER decreases monotonically with respect to k, and the lowest EER is attained at k = 12. Although a lower EER may be expected for k > 12, this is not realistic, as it would drastically delay the authentication process.
2) Intersession Scenario: This scenario authenticates users across multiple sessions separated by approximately 10 min within the same day. The training and validation EER trajectories for KDRN are shown in Fig. 3. Both training and validation EERs exhibit a tendency similar to that of the intrasession scenario. However, the testing EER achieves its lowest value of 0.016% at layer 2 with k = 12.
3) Interweek Scenario: Interweek is a more challenging scenario, as the samples are acquired over a span of one week. In this scenario, only 14 subjects are available for evaluation. As expected, the same trend is observed in Fig. 4. The best testing EER is appraised to be 0.121% at layer 2 with k = 12.
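In each scenario, the operating depth is chosen by the validation EER and only then is the test EER reported. A minimal sketch of this selection step, with hypothetical names and assuming the per-layer validation and test EERs have already been computed, follows; the value at L = 3 mirrors the 0.013% reported for the intrasession scenario, while the other numbers are placeholders.

```python
import numpy as np

def select_depth(val_eers, test_eers):
    """Pick the layer with the lowest validation EER, then report its test EER."""
    best_layer = int(np.argmin(val_eers)) + 1      # layers are 1-indexed
    return best_layer, test_eers[best_layer - 1]

# Illustrative per-layer EERs in percent (placeholders except the L = 3 test value).
val_eers  = [0.052, 0.027, 0.019, 0.024, 0.031]
test_eers = [0.049, 0.025, 0.013, 0.022, 0.030]
L_best, eer_best = select_depth(val_eers, test_eers)   # -> L_best = 3
```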
C. Comparisons and Discussion
For a fair comparison, we compare the KDRN performance to SVM and KRR equipped with the RBF kernel. Aside from that, we also compare KDRN to a single-hidden-layer perceptron network (1-MLP) and a three-hidden-layer perceptron network (3-MLP). Note that KRR is not equivalent to the first layer of KDRN, as their respective best parameters differ. The 1-MLP serves purely as a classifier, whereas the 3-MLP is a competing network to KDRN for feature relearning. Both 1-MLP and 3-MLP employ the ReLU activation function, and each hidden layer has 50 nodes. The learning rate is set to 0.0001 and the mini-batch size is configured to 10. On top of that, 3-MLP is trained with batch normalization. To alleviate the imbalanced-class problem, we perform random undersampling of the imposter samples for MLP and SVM, but not for KDRN and KRR. Table II discloses that KDRN outperforms all the competing methods in all scenarios. We discern that SVM is incapable of classifying the interweek samples in particular. Interestingly, 1-MLP performs the worst among all methods, which tallies with the empirical observations in [11]. However, SVM consumes the least training time.

V. CONCLUSION
In this letter, we present a stacking-based feature relearning model for touch-stroke authentication, namely KDRN. KDRN is built from multiple KRR modules hierarchically, each of which is trained independently with a closed-form solution. The empirical results on the Touchalytics dataset show that KDRN achieves a lower EER (%) than methods without the feature relearning mechanism and than a classical deep-learning model. In the future, KDRN will be extended to the one-class setting, where only genuine data is accessible.
REFERENCES
[1] "Smart phones overtake client PCs in 2011," Canalys, 2012. [Online]. Available: https://www.canalys.com/newsroom/smart-phones-overtake-client-pcs-2011. Accessed on: Feb. 11, 2018.
[2] M. Harbach, E. von Zezschwitz, A. Fichtner, A. D. Luca, and M. Smith, "It's a hard lock life: A field study of smartphone (un)locking behavior and risk perception," in Proc. Symp. Usable Privacy Security, 2014, pp. 213–230.
[3] S. Mondal and P. Bours, "A computational approach to the continuous authentication biometric system," Inf. Sci., vol. 304, pp. 28–53, May 2015.
[4] X. Wang, T. Yu, O. Mengshoel, and P. Tague, "Towards continuous and passive authentication across mobile devices: An empirical study," in Proc. 10th ACM Conf. Security Privacy Wireless Mobile Netw., New York, NY, USA, 2017, pp. 35–45.
[5] T. Feng et al., "Continuous mobile authentication using touchscreen gestures," in Proc. IEEE Conf. Technol. Homeland Security, 2012, pp. 451–456.
[6] M. Frank, R. Biedert, E. Ma, I. Martinovic, and D. Song, "Touchalytics: On the applicability of touchscreen input as a behavioral biometric for continuous authentication," IEEE Trans. Inf. Forensics Security, vol. 8, no. 1, pp. 136–148, 2013.
[7] C. Shen, Y. Zhang, X. Guan, and R. Maxion, "Performance analysis of touch-interaction behavior for active smartphone authentication," IEEE Trans. Inf. Forensics Security, vol. 11, no. 3, pp. 498–513, Mar. 2016.
[8] Z. Sitová et al., "HMOG: New behavioral biometric features for continuous authentication of smart-phone users," IEEE Trans. Inf. Forensics Security, vol. 11, no. 5, pp. 877–892, May 2016.
[9] R. Kumar, P. P. Kundu, and V. V. Phoha, "Continuous authentication using one-class classifiers and their fusion," in Proc. IEEE 4th Int. Conf. Identity Security Behav. Anal., 2018, pp. 1–8.
[10] A. Roy, T. Halevi, and N. Memon, "An HMM-based behavior modeling approach for continuous mobile authentication," in Proc. IEEE Int. Conf. Acoust. Speech Signal Process., 2014, pp. 3789–3793.
[11] A. Serwadda, V. V. Phoha, and Z. Wang, "Which verifiers work?: A benchmark evaluation of touch-based authentication algorithms," in Proc. IEEE 6th Int. Conf. Biometrics Theory Appl. Syst., 2013, pp. 1–8.
[12] Y. LeCun, Y. Bengio, and G. Hinton, "Deep learning," Nature, vol. 521, no. 7553, pp. 436–444, May 2015.
[13] Y. S. Lee et al., "Touch based active user authentication using deep belief networks and random forests," in Proc. 6th Int. Conf. Inf. Commun. Manage., 2016, pp. 304–308.
[14] L. Deng, D. Yu, and J. Platt, "Scalable stacking and learning for building deep architectures," in Proc. IEEE Int. Conf. Acoust. Speech Signal Process., 2012, pp. 2133–2136.
[15] C. Saunders, A. Gammerman, and V. Vovk, "Ridge regression learning algorithm in dual variables," in Proc. 15th Int. Conf. Mach. Learn., San Francisco, CA, USA, 1998, pp. 515–521.
[16] S. M. Stigler, "Gauss and the invention of least squares," Ann. Statist., vol. 9, no. 3, pp. 465–474, 1981.
[17] S. An, W. Liu, and S. Venkatesh, "Face recognition using kernel ridge regression," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2007, pp. 1–7.
[18] A. Ross and A. Jain, "Information fusion in biometrics," Pattern Recognit. Lett., vol. 24, no. 13, pp. 2115–2125, 2003.
[19] C. Y. Low and A. B. J. Teoh, "Stacking-based deep neural network: Deep analytic network on convolutional spectral histogram features," in Proc. IEEE Int. Conf. Image Process., 2017, pp. 1592–1596.