IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY
Mobile User Authentication Using Statistical Touch Dynamics Images
Xi Zhao, Member, IEEE, Tao Feng, Student Member, IEEE, Weidong Shi, Senior Member, IEEE, and Ioannis A. Kakadiaris, Senior Member, IEEE
Abstract—Behavioral biometrics have recently begun to gain attention for mobile user authentication. The feasibility of touch gestures as a novel modality for behavioral biometrics has been investigated. In this paper, we propose applying a statistical touch dynamics image (aka statistical feature model) trained from graphic touch gesture features to retain discriminative power for user authentication while significantly reducing computational time during online authentication. Systematic evaluation and comparisons with state-of-the-art methods have been performed on touch gesture data sets. Implemented as an Android App, the usability and effectiveness of the proposed method have also been evaluated.

Index Terms—Touch gesture, user authentication, mobile security, statistical touch dynamics images, behavioral biometrics.
I. INTRODUCTION
THE increasing ubiquity of smartphones makes security an issue of paramount importance. The data stored on and accessed from such devices are rarely protected except by point-based authentication at the login stage. This is deficient since it cannot detect intruders after a user has passed the point of entry, and it is vulnerable to simple attacks (e.g., smudge attacks [1]) after loss or theft. As opposed to password- or image-pattern-based authentication, continuous authentication has received increased attention in recent years. It intelligently monitors and analyzes the user-device interaction to ensure correct identity, which either complements point-based authentication or even substitutes for it when continuous authentication satisfies particular accuracy requirements. While using a smartphone, user identity can be sensed via multiple integrated sensors (e.g., the touch screen, motion sensors, microphone, or camera). Users' faces [2] and voices [3] can be used for authentication, but it is not convenient or optimal to apply these approaches while the user is using most of the apps (e.g., reading news and email, surfing the Internet).

Manuscript received February 12, 2014; revised May 21, 2014 and August 7, 2014; accepted August 8, 2014. Date of publication August 22, 2014. This work was supported in part by the U.S. Department of Homeland Security (DHS) under Award N66001-13-C-3002 and in part by the National Science Foundation (NSF) under Award CNS 1205708. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the opinions or policies of DHS or NSF. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Jaihie Kim. X. Zhao is with the Department of Computer Science, University of Houston, Houston, TX 77004 USA, and also with the School of Electronic and Information Engineering, Xian Jiaotong University, Xi'an 710049, China (e-mail: [email protected]). T. Feng, W. Shi, and I. A. Kakadiaris are with the Department of Computer Science, University of Houston, Houston, TX 77004 USA (e-mail: [email protected]; [email protected]; [email protected]). Digital Object Identifier 10.1109/TIFS.2014.2350916
Fig. 1. Plot (a) depicts over 300 touch traces of zoom-in gesture from 30 users. Each subfigure contains around 10 traces from a single user. Plots (b) and (c) are the enlarged figures depicting the first and second users’ touch gestures.
Users must face the camera or keep talking to be continuously authenticated, which greatly reduces device usability. Authentication using motion data (i.e., accelerometer/gyroscope data) has also been studied. However, due to signal noise and the difficulty of decoupling identity from activity, this modality can currently only be applied to motion-specific activities (e.g., the phone pick-up motion when answering a call [4]). Conversely, the touch screen is quite suitable for authentication, not only because it is the most frequently used sensor but also because touch gestures contain identity-rich behavioral traits. The feasibility of using touch gestures as a biometric modality has been investigated [5]–[8]. It can be observed in Fig. 1 that most of the touch traces from one user tend to follow a similar pattern while the patterns vary among different users. Intrinsically, touch traces are affected by two biometric features (i.e., user hand geometry and finger muscle behavior).
In our previous work, we evaluated the Graphic Touch Gesture Feature (GTGF) for touch-based user authentication [8]. The GTGF can represent touch dynamics in an explicit manner where the discriminative power of touch trajectories and tactile pressure is combined. However, implemented as an Android App, this method suffers from computational overhead and incurs a noticeable run-time delay when the gallery size increases, because a probe trace must be compared with each trace in the gallery. To improve the effectiveness of GTGF and reduce the computational expense at the same time, we propose to apply a Statistical Feature Model (SFM) as Statistical Touch Dynamics Images (STDI), which builds a touch gesture "appearance" model from the GTGF. The STDI learns the intra-person variations of the GTGF in the image domain by statistical analysis and is capable of synthesizing new instances according to a new probe trace. Instead of directly computing the distance between the probe trace and the gallery traces, the distance is computed between the probe trace and its corresponding synthesized instance given a user-specific STDI. The synthesized instance from the STDI mirrors the probe trace. The instance tends to be more dissimilar to the trace if the trace originates from an identity other than the genuine user. We evaluated the proposed method on three sets of touch gestures (flick up/down, flick right/left, zoom in/out) because they are among the most used gestures during user-device interaction [9]. Flick up/down is generally used when reading text or scrolling menus. Flick right/left is generally used when browsing photos, translating screens, and unlocking phones, while zoom in/out is generally used whenever users switch between details and overviews of the contents. Our main contributions are: (i) applying the feature representation principle of the SFM to touch gestures via the STDI, which retains the effectiveness of the previous GTGF and is more efficient in terms of computation; (ii) developing algorithms to apply the STDI to the mobile user verification and recognition problems; (iii) systematically evaluating the proposed algorithm by varying different parameters, and providing a comparison between the proposed algorithm and state-of-the-art methods; and (iv) developing an Android app and testing the performance of the verification algorithm in a real-world scenario. The rest of the paper is organized as follows. First, we discuss the related work in Section II. Methods are presented in Section III: GTGF extraction is presented in Section III-A, the STDI is described in Section III-B, and the user authentication algorithms are presented in Section III-C. Section IV presents the experimental results. In Section V, we conclude our study.
II. RELATED WORKS

Mobile user authentication methods can be categorized into two groups: physiological biometrics, which rely on static physical attributes (e.g., fingerprints [10], facial features [11]), and behavioral biometrics, which adopt identity-invariant features of human behavior to identify or verify people during their daily activities [12]. Prior research on behavioral biometrics has shown that biometric traits can be extracted from physiological characteristics of arm gestures [4], [13], and from strokes and signatures using a pen or stylus [14], [15]. Yamazaki et al. [16] extracted personal features from stroke shape, writing pressure and pen inclination to identify an individual. Cao et al. [17] discussed a quantitative human performance model of making single-stroke pen gestures. Ferrer et al. [18] used local patterns (e.g., the Local Binary Pattern and the Local Directional Pattern) to segment and extract identity features from signatures. Impedovo et al. [19] presented the state-of-the-art in automatic signature verification. However, verifying users with pen strokes cannot be pervasively applied in mobile user authentication since only a few types of mobile devices provide a pen or stylus. Furthermore, most users opt to interact with their phone directly using their fingers for convenience.

Studies exploring the touch gesture as a biometric modality for mobile devices are quite recent [20]. Feng et al. [5] extracted finger motion speed and acceleration of touch gestures as features for identification. Luca et al. [21] directly computed the distance between traces using the dynamic time warping algorithm. Sae-Bae et al. [6] designed 22 touch gestures for authentication, involving different combinations of five fingers. They computed the dynamic time warping distance and Frechet distance between multi-touch traces. However, users must perform their pre-defined touch gestures for authentication, which thus cannot be transparent and nonobtrusive to users. Frank et al. [7] studied the correlation between 30 analytic features from touch traces and classified these features using K-nearest-neighbor and Support Vector Machine approaches. However, the analytic features did not capture the touch trace dynamics.

The proposed method is quite different from existing works. The pressure dynamics are adopted in our method and are represented as the profiles of the STDI instances. This is different from extracting a single mid-stroke pressure value [7] or simply ignoring pressure as in other studies [6], [21]. Rather than directly using the GTGF for authentication [8], the STDI applies the feature representation principle of the statistical feature model in [24] to learn the intra-person variations from many traces by a user, so that scores are computed between pairs of gallery models and probe traces, instead of between pairs of gallery traces and probe traces, which significantly reduces the computational time.

III. METHODS

The authentication scenarios considered in this paper include open set user recognition and user verification. In the open set user recognition scenario, mobile devices are shared among a few users (e.g., family members). The sensitive group data should be shared only among the authorized users, and users' identities should first be verified to ensure they belong to the authorized group. Meanwhile, the user should be further recognized to customize the user experience on the device (e.g., limiting the functions for a child, granting different access levels). In a more general scenario, phones are used only by their owner. Other individuals are considered intruders and should be excluded from using the phone. Thus, the method verifies whether the device is being used by its owner.
TABLE I
GESTURE DESCRIPTIONS
To perform user authentication, we capture the touch traces from the touch screen output and segment the traces with support from the Android API. We use the series of x-y coordinates of finger touch points, pressure values and time stamps to generate the GTGF feature representation. Each series varies in time duration, trace length, and the dynamics of finger movement and tactile pressure, which depend on the user's hand geometry and muscle behavior. Touch gesture types (e.g., slide up, slide down) introduce another type of variation besides identity variations. To exclude this variation, the captured traces are pre-filtered into one of six predefined gestures based on the screen regions where the traces start and end. The six gestures (Table I) ensure the generality and usability of the proposed method. In the case of multiple fingertip touch gestures, the Euclidean distances between the two fingers at the start and end of the traces are also adopted: an increasing distance between the two fingertips indicates that a spread gesture is performed, while a decreasing distance indicates that a pinch gesture is performed.
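As a rough illustration of this pre-filter, the Python sketch below assigns a captured trace to one of the six gesture types. It is only a sketch under stated assumptions: it uses the overall start-to-end displacement of a single-touch trace rather than the exact screen-region rules of our implementation, the trace format, threshold and up/down orientation are hypothetical, and the pinch/spread test follows the fingertip-distance description above.

# Illustrative gesture pre-filter; thresholds, trace format and axis orientation are assumptions.
# A trace is a list of (x, y, t, p) samples from one fingertip.
def classify_gesture(trace, second_trace=None, min_move=50.0):
    """Return one of 'UD', 'DU', 'LR', 'RL', 'ZI', 'ZO', or None if undecided."""
    if second_trace is not None:
        # Multiple fingertips: compare fingertip distance at the start and end of the traces.
        def dist(a, b):
            return ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5
        d_start = dist(trace[0], second_trace[0])
        d_end = dist(trace[-1], second_trace[-1])
        if d_end > d_start:
            return 'ZI'   # spread: increasing fingertip distance (zoom in)
        if d_end < d_start:
            return 'ZO'   # pinch: decreasing fingertip distance (zoom out)
        return None
    # Single fingertip: use the dominant start-to-end displacement.
    dx = trace[-1][0] - trace[0][0]
    dy = trace[-1][1] - trace[0][1]
    if max(abs(dx), abs(dy)) < min_move:
        return None       # too short to be a deliberate flick
    if abs(dx) >= abs(dy):
        return 'LR' if dx > 0 else 'RL'
    return 'DU' if dy > 0 else 'UD'   # screen y grows downward; this mapping is an assumption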
A. Graphic Touch Gesture Feature
The GTGF represents touch traces as images. To normalize and register traces with different numbers of points and time intervals, we use cubic interpolation to resample the traces in terms of their x-y coordinates, time, and pressure series. Cubic interpolation is chosen because it is the simplest method that offers true continuity between the samples. After normalization, a single touchtip trace Ŝ^0 (i.e., UD, DU, LR, RL) consists of c samples, and a multiple touchtip trace {Ŝ^0, Ŝ^1} (i.e., ZI, ZO) consists of 2c samples, c samples in Ŝ^0 and c samples in Ŝ^1, respectively:
$$\hat{s}_n^0 = (\hat{x}_n^0, \hat{y}_n^0, \hat{t}_n^0, \hat{p}_n^0), \quad \hat{s}_n^1 = (\hat{x}_n^1, \hat{y}_n^1, \hat{t}_n^1, \hat{p}_n^1), \quad n \in \{1, 2, \ldots, 50\}. \tag{1}$$
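A minimal sketch of this resampling step, assuming the raw trace is given as separate x, y, t, p arrays and that SciPy is available for the cubic interpolation; normalizing the time axis to [0, 1] and falling back to linear interpolation for very short traces are our own choices:

import numpy as np
from scipy.interpolate import interp1d

def resample_trace(xs, ys, ts, ps, c=50):
    """Resample a raw trace (x, y, t, p series) to c samples via cubic interpolation."""
    ts = np.asarray(ts, dtype=float)
    u = (ts - ts[0]) / (ts[-1] - ts[0])           # normalized time; assumes increasing timestamps
    u_new = np.linspace(0.0, 1.0, c)
    kind = 'cubic' if len(ts) >= 4 else 'linear'  # cubic interpolation needs at least 4 points
    resampled = []
    for series in (xs, ys, ts, ps):
        f = interp1d(u, np.asarray(series, dtype=float), kind=kind)
        resampled.append(f(u_new))
    x_hat, y_hat, t_hat, p_hat = resampled
    return np.stack([x_hat, y_hat, t_hat, p_hat], axis=1)   # shape (c, 4), as in Eq. 1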
We then translate a trace Ŝ into a GTGF T. A zero-valued image template T with resolution H × W is used. Its size is empirically set to 100 × 150 as a tradeoff between the feature "discriminability" and computational efficiency. The GTGF is extracted by filling the image template with different intensity values according to an input normalized trace. The normalized traces are projected onto two perpendicular X and Y axes (the horizontal and vertical directions of the touch screen). In general, the upper part of a GTGF image describes the feature along the X direction and the lower part describes the feature along the Y direction. A block in a GTGF image corresponds to a sample ŝ_n in a trace. Each block spans three columns. The Y expansion in a block is related
to the tactile pressure of a sample, while the intensity values in a block are related to the moving speed in the X and Y directions. The values in each block are assigned by translating the information of a normalized sample, following Eqs. 2, 3 and 4:
$$I_n = 128 \cdot \frac{U_x - \Delta\hat{x}_n}{U_x}, \quad \Delta\hat{x}_n = \hat{x}_n - \hat{x}_{n-1}, \tag{2}$$
$$I_n^b = 128 \cdot \frac{U_y - \Delta\hat{y}_n}{U_y}, \quad \Delta\hat{y}_n = \hat{y}_n - \hat{y}_{n-1}, \tag{3}$$
$$H_n^p = \left\lceil \frac{H}{2} \cdot \frac{\hat{p}_n}{L_p} \right\rceil, \tag{4}$$
where ⌈·⌉ is the ceiling operator and n is the index of the block. The template T has a fixed number of levels (H/2) for the Y expansion (profile) of the GTGF. The parameter L_p in Eq. 4 is used to translate the continuous pressure measure into one of the H/2 levels. The higher this level H_n^p is, the higher the finger pressure applied to the screen. For the same purpose, the parameters I_m, U_x and U_y are adopted in Eqs. 2 and 3. The value 128 evenly divides the range of pixel levels ([0, 256]). Levels within the range [1, 128] describe the negative values of Δx̂_n or Δŷ_n, representing that the finger moves in the left or bottom direction. Levels within the range [129, 256] describe the positive values of Δx̂_n or Δŷ_n, representing that the finger moves in the right or top direction. The closer the pixel levels are to 0 or 256, the faster the fingertip moves along the positive or negative direction, correspondingly. In this way, the direction, the pressure, and the dynamics of the traces are captured in the GTGF, as depicted in Fig. 2. The images within the same row are quite similar to each other, but major differences can be observed between the two rows. The differences are mainly caused by the user identities. For traces Ŝ^0 and Ŝ^1 in a multiple fingertip gesture (i.e., Pinch, Spread), we extract two GTGFs, T^0 and T^1, respectively.

The construction of the GTGF has multiple advantages. First, original traces have different spatial topology and temporal duration and thus are difficult to compare directly. The GTGF resolves this difficulty by registering the traces into a canonical 2D template T. Second, dynamics are considered an important factor in other pattern recognition problems (e.g., facial expression recognition [22] and speech recognition [23]). However, they have not been commonly considered for touch gestures in the mobile authentication literature. The GTGF is able to represent the gesture dynamics in terms of movement and pressure intuitively and explicitly. Third, due to the inhomogeneity between movement and pressure data, it is difficult to combine their discriminative power at the feature level. However, the GTGF takes both into consideration and combines their discriminative power.
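The sketch below shows how a resampled trace could be rendered into the GTGF template along the lines of Eqs. 2-4. The default parameters are the grid-search optimum reported in Section IV (L_p = 0.35, U_x = 20, U_y = 30); the clipping, the rounding, and the exact placement of the pressure profile within each block are assumptions, since those layout details are only illustrated in Fig. 2.

import numpy as np

H, W = 100, 150   # template size used in the paper

def build_gtgf(trace, Ux=20.0, Uy=30.0, Lp=0.35):
    """Fill an H x W GTGF template from a resampled trace of shape (c, 4) = (x, y, t, p).

    The upper half encodes X dynamics (Eq. 2), the lower half Y dynamics (Eq. 3), and the
    height of each block's profile reflects tactile pressure (Eq. 4). Whether the profile
    grows from the middle line or from the image border is a layout assumption.
    """
    T = np.zeros((H, W))
    c = trace.shape[0]
    cols = W // c                                     # 3 columns per block when c = 50
    for n in range(1, c):
        dx = trace[n, 0] - trace[n - 1, 0]            # x velocity between consecutive samples
        dy = trace[n, 1] - trace[n - 1, 1]            # y velocity
        p = trace[n, 3]
        ix = np.clip(128.0 * (Ux - dx) / Ux, 1, 256)  # Eq. 2
        iy = np.clip(128.0 * (Uy - dy) / Uy, 1, 256)  # Eq. 3
        h = int(np.ceil(H / 2 * min(p / Lp, 1.0)))    # Eq. 4: pressure -> profile height
        c0, c1 = n * cols, (n + 1) * cols
        T[H // 2 - h:H // 2, c0:c1] = ix              # upper part: X dynamics
        T[H // 2:H // 2 + h, c0:c1] = iy              # lower part: Y dynamics
    return T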
B. Statistical Touch Dynamics Images

The STDI learns the intra-class variation modes and synthesizes new instances using the learned base feature and affine combinations of these learned modes. In the testing phase, the STDI expresses the probe trace in the new basis as synthesized instances.
Fig. 2. Examples of the GTGF extraction. The first row includes GTGF extracted from five touch traces of a single subject while the second row includes GTGF extracted from five traces from another subject. The movement and pressure dynamics can be explicitly represented in GTGF. We can observe major variations on the image pattern between two rows but minor variations within a row.
The distance between the synthesized probe instance and the original probe instance is computed for verification. This distance tends to be greater when the probe is from a user different from the one on which the STDI was trained. Since the STDI is learned with specific knowledge of the user, its representation power decreases for others. Thus, less similarity exists between the synthesized and original instances extracted from others' traces.

1) STDI Training: The training of an STDI is presented in Alg. 1. It takes as input a set of touch traces per gesture type from the user and outputs the trained STDI. This training process is repeated for all six types of touch gestures (i.e., LR, RL, DU, UD, ZI, ZO). Thus, six STDIs are trained, one per gesture, for each user.
Algorithm 1 Training Algorithm (Per Gesture Type)
Input: The training set of touch gesture (type t) traces S_u^t from the user u.
Output: The user-specific Statistical Touch Dynamics Image D_u^t.
1: For each trace S_i ∈ S_u^t:
   1) Extract the GTGF T_i (Sec. III-A),
   2) Reshape the 2D matrix T_i into a vector t_i,
2: Learn the base vector v̄ and basis feature vectors v_i by applying Statistical Analysis on {t_i} (Eq. 5).
Since the gesture traces have been registered and scaled during feature extraction, the GTGF can be directly used for training without further normalization or alignment. For every T ∈ R^{H×W} in the training set, it is first reshaped into t ∈ R^{HW}. The 2D shape information lost due to reshaping can easily be restored by a reverse reshaping process before score computation. As in Eq. 5, a vectorized feature v can be expressed as a base v̄ plus a linear combination of n basis feature vectors v_i. The approach to compute the STDI is to apply Principal Component Analysis (PCA) to the training vectors as in statistical feature models [24]:
$$v = \bar{v}_0 + \sum_{i=1}^{n} b_i v_i = \bar{v}_0 + V b, \tag{5}$$
where the base v̄_0 is the mean of the training vectors. The vectors v_i are the n eigenvectors corresponding to the n largest eigenvalues. The number n is determined so that the sum of the n largest eigenvalues is greater than a portion (κ) of the sum of all eigenvalues. The coefficients b_i are the feature parameters (distinct from the PCA eigenvalues). They control the appearance of the learned STDI, as depicted in Fig. 3: by varying their values, the appearance of the STDI can be changed. We can assume that the vectors v_i are orthonormal after PCA, and the parameters b_i are assumed to follow Gaussian distributions N(0, σ_i^2). When b_i deviates too much from its mean, singular STDI instances may be generated. We further express the term Σ_{i=1}^{n} b_i v_i as Vb.
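A minimal numpy sketch of this training step (Alg. 1 together with Eq. 5), assuming the GTGFs of one gesture type from one user are already available; PCA is performed via an SVD of the centered data, and the κ-energy rule above selects the number of retained modes:

import numpy as np

def train_stdi(gtgf_list, kappa=0.90):
    """Train one STDI from a user's GTGFs of a single gesture type (Alg. 1, Eq. 5)."""
    X = np.stack([T.reshape(-1) for T in gtgf_list])   # each row is a vectorized GTGF t_i
    v_bar = X.mean(axis=0)                              # base vector (mean of training vectors)
    Xc = X - v_bar
    # PCA via SVD of the centered data; covariance eigenvalues are s**2 / (N - 1).
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    eigvals = s ** 2 / max(len(X) - 1, 1)
    energy = np.cumsum(eigvals) / eigvals.sum()
    n = int(np.searchsorted(energy, kappa) + 1)         # keep enough modes to reach kappa
    V = Vt[:n].T                                        # basis vectors v_i as columns
    sigma = np.sqrt(eigvals[:n])                        # std. dev. of each coefficient b_i
    return {'v_bar': v_bar, 'V': V, 'sigma': sigma}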
2) STDI Testing: The testing algorithm is presented in Alg. 2. It takes a probe trace S_p and a trained model D as input, where the model D and the trace S_p share the same gesture type. The output is a score d representing the distance between the trace S_p and the user u represented by the trained model D_u^t.

Algorithm 2 Testing Algorithm
Input: A probe trace S_p^t and the trained model D_u^t.
Output: The score d.
1: Extract the GTGF T_p^t from the probe trace S_p^t,
2: Reshape T_p^t into v_p^t,
3: Synthesize the instance ṽ_p^t using the parameters b̂^e (Eqs. 6-8),
4: Reshape the vector ṽ_p^t back to T̃_p^t,
5: Compute the score d between T_p^t and its corresponding instance T̃_p^t (Eq. 9).
The trained STDI has the capability to synthesize different instances by varying the parameters b. Given an input v_p^t for testing, its corresponding instance ṽ_p^t can be synthesized from Eq. 5 using the estimated parameters b^e.
Fig. 3. A demonstration of the STDI in Eq. 5 and its instance synthesis. (a) The appearance of the STDI with the coefficient b1 set to 3σ1 (top) and -3σ1 (bottom) with other coefficients fixed to 0; (b) the appearance of the STDI with the coefficient b2 set to 3σ2 (top) and −3σ2 (bottom) with other coefficients fixed to 0; (c) the appearance of the STDI with the coefficient b3 set to 3σ3 (top) and −3σ3 (bottom) with other coefficients fixed to 0; (d) the probe feature (top) and the synthesized instance (bottom), where the test instance and the STDI are from the same user; (e) the probe feature (top) and the synthesized instance (bottom), where they originate from different users. It can be observed that the synthesized instances tend to be more similar to their original features when the probe instances originate from the same user as the STDIs.
The estimation is performed using Eq. 6:
$$b^e = V^T (v_p^t - \bar{v}_0). \tag{6}$$
Instead of directly applying b^e to Eq. 5, we limit the range of these parameters by applying a function f to increase the separability between classes. The function f keeps the distribution of the synthesized instances close to the training class in the feature space:
$$\hat{b}^e = f(b^e), \quad \text{for } b_i \in b^e, \qquad f(b_i) = \begin{cases} b_i & \text{if } |b_i| \le \rho\sigma_i, \\ \rho\sigma_i \,\mathrm{sgn}(b_i) & \text{otherwise,} \end{cases} \tag{7}$$
where σ_i is the standard deviation of the aforementioned Gaussian distribution of b_i, ρ is a constant, and sgn is the signum function extracting the sign of a real number. Then the corresponding instance ṽ_p^t can be synthesized as follows:
$$\tilde{v}_p^t = \bar{v}_0 + V \hat{b}^e. \tag{8}$$
The L_1 distance is computed between the input instance T_p^t and its corresponding instance T̃_p^t given the STDI D_u^t as the score metric:
$$L_1(\tilde{T}_p^t, T_p^t) = |T_p^t - \tilde{T}_p^t|. \tag{9}$$
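Putting Eqs. 6-9 together, a compact sketch of the testing step (Alg. 2) could look as follows; it reuses the model dictionary from the training sketch above, and ρ defaults to the value selected in Section IV. Note that Eq. 7 is exactly an element-wise clamp of b^e to [-ρσ_i, ρσ_i].

import numpy as np

def stdi_score(T_probe, model, rho=1.0):
    """Score a probe GTGF against a trained STDI (Alg. 2); a lower score means more genuine."""
    v_p = T_probe.reshape(-1)
    V, v_bar, sigma = model['V'], model['v_bar'], model['sigma']
    b = V.T @ (v_p - v_bar)                        # Eq. 6: project onto the learned basis
    b_hat = np.clip(b, -rho * sigma, rho * sigma)  # Eq. 7: clamp each coefficient
    v_tilde = v_bar + V @ b_hat                    # Eq. 8: synthesize the corresponding instance
    return np.abs(v_p - v_tilde).sum()             # Eq. 9: L1 distance as the score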
C. User Authentication Algorithms
Directly computing the score metric between a pair of feature instances from the gallery and probe sets is commonly adopted in authentication systems [8], [25], [26]. However, the phone suffers from computational overhead when the gallery is large enough to cover the different touch habits of multiple users. The probe trace must be compared with each trace in the gallery stored on the phone, which leads to an obvious delay during user interaction when many traces per subject are stored in the gallery. However, reducing the number of traces per subject in the gallery inevitably decreases the generality of the user's identity feature. To capture the intra-person variations while keeping the method computationally efficient, subject-specific STDIs are learned from the GTGF feature space. The expense of the score computation is then proportional only to the number of subjects rather than the number of features in the gallery.

1) Single User: The user verification algorithm (Alg. 3) is designed for a single user. In this case, the six touch gestures from the genuine owner are collected and a set of user-specific STDIs is trained for these gestures, respectively. Note that the touch traces are automatically classified using the pre-filtering described at the beginning of Sec. III. During the online stage, the type of the probe trace is retrieved first and the corresponding STDI is selected. Then, a distance is computed between the probe trace and the selected STDI, and is further compared with the threshold for verification. The user verification algorithm takes a set of six trained user-specific STDIs {D}, the probe trace S_p and the threshold ξ as inputs, and outputs a boolean decision B. First, it extracts the GTGF T_p from S_p as in Sec. III-A and selects the STDI D^t from {D} that shares the same gesture type as the probe S_p. Then it computes the score d given D^t and T_p, as in Alg. 2. Finally, it computes and outputs B: if d < ξ, the user is genuine; otherwise, the user is an impostor.

2) Multiple Users: The open set user recognition algorithm (Alg. 4) is designed for the case of multiple users, where user identity management (UIM) is needed. Mobile devices with UIM can reject unauthorized users and change the personalized settings according to the authorized user's identity. Touch traces from all authorized users are collected and a set of six STDIs is trained for each user. During the online stage, the type of the probe trace is retrieved first and compared with all STDIs of the same type in the gallery. Distances are computed given the probe trace and the STDIs. Note that the distance computation only needs to be repeated for all subjects, instead of all traces in the gallery. The minimum distance is then selected and compared with the threshold for authentication. Only if it is lower than the threshold is its corresponding identity returned as the result.
Algorithm 3 The User Verification Algorithm
Input: Six user-specific STDIs D, the probe trace S_p, the threshold ξ.
Output: A boolean decision B.
1: Extract the GTGF T_p from S_p as in Sec. III-A,
2: Select the STDI D^t from D that shares the same gesture type as the probe S_p,
3: Compute the score d given D^t and T_p, as in Alg. 2,
4: Compute and output B: if d < ξ, B = true; otherwise, B = false.

TABLE II
STATISTICAL INFORMATION ABOUT THE DATASETS
Algorithm 4 The User Identification Algorithm
Input: Sets of STDIs D_u for the user group, the probe trace S_p, the threshold ξ.
Output: The user identity I.
1: Extract the GTGF T_p from S_p as in Sec. III-A,
2: For each subject u:
   1) Select the STDI D_u^t from D_u which shares the same gesture type as the probe S_p,
   2) Compute the score d_u given D_u^t and T_p, as in Alg. 2,
3: Find the smallest d_m among all d_u,
4: If d_m ≥ ξ_m, then I = Null; else I is the identity corresponding to d_m.
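Assuming the probe trace has already been pre-filtered to a gesture type and converted to a GTGF T_p (e.g., with the earlier sketches), the decision rules of Algs. 3 and 4 reduce to a few lines; stdi_score is the scoring sketch given above, and the data structures are hypothetical.

def verify(owner_models, gesture, T_p, xi):
    """Alg. 3 (sketch): accept if the probe scores below threshold xi for the owner's STDI."""
    return stdi_score(T_p, owner_models[gesture]) < xi

def identify(models_by_user, gesture, T_p, xi):
    """Alg. 4 (sketch): return the enrolled user with the smallest score, or None if it exceeds xi."""
    scores = {u: stdi_score(T_p, models[gesture]) for u, models in models_by_user.items()}
    best = min(scores, key=scores.get)
    return best if scores[best] < xi else None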
The user recognition algorithm (Alg. 4) takes sets of STDIs {D_u} for the user group, the probe trace S_p and the threshold set {ξ} as inputs, and outputs the user identity I. First, it extracts the GTGF T_p from S_p as in Sec. III-A. Then, for each subject u in the gallery, it selects the STDI D_u^t from {D_u} that shares the same gesture type as the probe S_p and computes the score d_u given D_u^t and T_p, as in Alg. 2. After finishing the loop, it finds the smallest d_m among all d_u. If d_m ≥ ξ, the user is not in the user group in the gallery; otherwise, it assigns the user identity as the one corresponding to d_m.

IV. EXPERIMENTAL RESULTS AND DISCUSSION

A. Data Acquisition
To systematically evaluate the performance of GTGF and STDI, we developed a data collection app that captures touch gestures using a standard API of the Android system. When fingers contact the touchscreen, it records the trace by recording raw touch samples from the API. We recruited 78 subjects, of which 64 were right-handed and 75 had touchscreen device experience. The data acquisition consisted of six sessions collected over several months. The first session served as a pilot session in which only 30 subjects were recruited. We explained the purpose of the study and the use of our data acquisition program. Then, the subjects practiced for 5 to 10 min. The subjects were instructed to perform the six most used types of touch gestures repetitively during the practice time so that they could get used to the data collection app and their gestures could converge to their daily touch habits. After a subject could use the program in a natural way, the subject provided 20 traces for each gesture (Table I). All 78 subjects participated in the following sessions, from the second to the sixth. The explanation and practice procedures were repeated for the newly recruited subjects in the second session. Sessions occurred with at least three days between two consecutive sessions. Subjects also provided at least 20 traces per gesture during each session. Subjects could vary the manner in which they held the phone (e.g., left-hand vs right-hand holding, whole-palm vs half-palm holding), the hand pose (palm down vs palm up), the finger used to perform the single touch gestures (thumb vs index finger), or the finger orientation between sessions. We named the dataset collected in the first session with the smaller number of subjects UH-TOUCH v1 and the dataset collected in the following sessions with all subjects UH-TOUCH v2.

To test the performance and usability of the STDI in a real-world scenario, we implemented another app, named STDI-UA, on the Android system to verify users. The app runs as a background service, collecting and verifying the touch traces in the context of the Android Launcher. The UH-TOUCH v3 dataset was collected by this app for evaluation. Twenty subjects whose data had been acquired in the UH-TOUCH v2 dataset were recruited. Each subject used the device (a Samsung Galaxy S3 phone with Android version 4.1.1) for three days as their daily phone, and their touch traces in the context of the Android Launcher were recorded. There were 1,327 touch traces collected on average from each user. Table II depicts the statistical information for these datasets.

The Recognition Rate (RR) and Equal Error Rate (EER) were used as the evaluation criteria for the recognition and verification experiments. RR is defined as the number of times that users have been correctly identified over the size of the probe set. EER is the common value at which the false acceptance rate (FAR) and the false rejection rate (FRR) are equal.
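For reference, a small helper that computes an approximate EER from genuine and impostor distance scores; the exact thresholding procedure used in our experiments is not spelled out here, so this sweep is an illustrative assumption:

import numpy as np

def equal_error_rate(genuine, impostor):
    """Approximate EER for distance scores (lower means more likely genuine)."""
    genuine, impostor = np.asarray(genuine, float), np.asarray(impostor, float)
    thresholds = np.unique(np.concatenate([genuine, impostor]))
    best_gap, eer = np.inf, 1.0
    for t in thresholds:
        frr = np.mean(genuine >= t)    # false rejections: genuine scores at or above the threshold
        far = np.mean(impostor < t)    # false acceptances: impostor scores below the threshold
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer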
B. Experimental Results
1) Evaluation of the STDI Configuration: Figure 4 depicts the results of a grid search over the 3D GTGF parameter space, including the relative movement upper bounds U_x, U_y in the x, y directions and the upper bound for the tactile pressure L_p used during GTGF extraction. They varied from 0.3 to 0.7, from 20 to 60, and from 30 to 70 on the L_p, U_x, U_y axes, respectively. These ranges were selected to cover most of the meaningful values in the parameter space.
Fig. 4. Results of grid search on the parameter space (L p , U x , U y ) on the evaluation dataset UH-TOUCH v1. Each subfigure is a slice on the Ux axis in the 3D parameter space, and the x, y axes in all subfigures represent L p and U y axes in the parameter space. The colors represent the average EERs for the six gestures under a set of parameters. Note that the same color in different subfigures may represent different EER values according to the accompanying colormaps.
For the experiment setup, we randomly selected half of the traces per subject per gesture from UH-TOUCH v1 as the gallery (target) set and used the other half as the probe (query) set. The experiments were conducted six times, once per gesture, and the reported EER was computed by averaging the EERs over all gestures. The L_1 distance was computed directly between GTGFs as the score metric. The EER varies from 11.28% to 12.14%, and the best EER was achieved for the values 0.35, 20, 30 in the parameter space. This demonstrates that the proposed GTGF feature is not sensitive to parameter variations in the tested range. The parameters at which the best EER was achieved were used in the experiments described below.

We evaluated the performance of the proposed method under different values of the parameter κ in the STDI. The κ determines the number of eigenvectors preserved in the STDIs. When κ is larger, more eigenvectors are preserved so that new instances can be synthesized in a higher-dimensional vector space. This allows more variation in instance synthesis, but also increases the computational complexity. The experiments were conducted on UH-TOUCH v2 and the setup was the same as in the previous experiments. Figure 5 depicts the results in terms of EER in Fig. 5(a) and RR in Fig. 5(b). In verification, the average EERs of the six gestures were 9.6%, 9.7% and 12.3% for κ = 0.95, κ = 0.90 and κ = 0.80. In recognition, the average RRs of the six gestures were 82.6%, 82.3% and 78.1% for κ = 0.95, κ = 0.90 and κ = 0.80. The performances are similar for most of the gestures when κ = 0.95 and κ = 0.90, but a clear decrease in performance can be observed when κ = 0.80. Thus, we opt to use 0.90 as the value of κ, as a tradeoff between performance and computational complexity.

We also evaluated the performance of the proposed method under different values of the parameter ρ in the STDI.
Fig. 5. A comparison of the different parameter values κ in terms of User Verification and Recognition on UH-TOUCH v2; (a) EER under three κ values of the six touch gestures; (b) RR under three κ values of the six touch gestures. The parameter κ determines the number of the bases in STDIs.
The parameter ρ sets the boundary for the coefficients b, which limits the variability of the synthesized instances. The greater this boundary, the more broadly the synthesized features can spread in the vector space; the smaller this boundary, the more tightly the synthesized features of the class are confined. The experiments were conducted on UH-TOUCH v2 and the setup was the same as in the previous experiments. Figure 6 depicts the results in terms of EER in Fig. 6(a) and RR in Fig. 6(b). In verification, the average EERs of the six gestures were 10.9%, 9.7%, 10.8% and 13.4% for ρ = 0.5, ρ = 1.0, ρ = 3.0 and ρ = ∞. In recognition, the average RRs of the six gestures were 80.7%, 82.3%, 80.9% and 78.2% for ρ = 0.5, ρ = 1.0, ρ = 3.0 and ρ = ∞. The best performance for most of the gestures was achieved in both the recognition and verification experiments when ρ = 1.0. Thus, we opted to use 1.0 as the value of ρ. A possible explanation is that when the synthesized features spread widely in the vector space, the inter-class distance may be reduced and thus false positive cases increase. When the distribution of the synthesized features is limited to a small region, the intra-class variations are limited and thus the true negative cases increase; a balance is achieved at the value ρ = 1.0.
Fig. 6. A comparison of the different parameter values ρ in terms of User Verification and Recognition on UH-TOUCH v2; (a) EER under four ρ values of the six touch gestures; (b) RR under four ρ values of the six touch gestures. The parameter ρ limits the variability during synthesizing the trace instances.
Fig. 7. Comparisons of different authentication schemes in terms of User Verification and Recognition on UH-TOUCH v2. (a) EER of the three different authentication schemes. (b) RR of the three different authentication schemes.
2) Demonstration of STDI Advantages: To demonstrate the advantage of converting touch traces into the GTGF and further constructing the STDI, we conducted a comparison between three methods. The first method is based on a Multivariate Gaussian Distribution model (MGD) learned directly from the resampled traces, where the number of samples on all traces is normalized to a constant (i.e., 50) as in Eq. 1 of Sec. III-A. An MGD model is built for each sample, containing variables for the movement and pressure dynamics x, y, and p. Thus, a set of 50 MGD models is trained for the complete traces of one gesture type from a user. These models output a set of probabilities representing the similarities between the testing trace and the trained model, and these probabilities are summed into a single score for authentication. The second, GTGF-based method extracts the GTGF feature and computes the L_1 distance between traces from the training set and the testing set [8]. The third method is the proposed STDI-based method. For the experiment setup, we randomly selected half of the traces in each session of UH-TOUCH v2 and combined them into the gallery (training) set. The remaining portion of UH-TOUCH v2 was used as the probe (testing) set.

Figure 7 depicts the comparisons among these three methods in terms of EER in Fig. 7(a) and RR in Fig. 7(b). In the verification experiments, the average EERs for MGD, GTGF and STDI on the six gestures were 36.8%, 12.8% and 9.7%, respectively. The improvement of the GTGF over the MGD is obvious (12.8% vs 36.8%), which indicates the necessity of adopting the GTGF feature. The decrease of EER from 12.8% to 9.7% demonstrates that the traces are better represented and more distinguishable by the STDI. In the recognition experiments, the average RRs for MGD, GTGF and STDI on the six gestures were 55.2%, 79.0% and 82.3%, respectively. The increase in RR demonstrates the same evidence as in the verification experiments. By adopting the GTGF and STDI, touch traces can be better represented and classified in user identification and verification problems.
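For completeness, one possible reading of this MGD baseline is sketched below: one 3-D Gaussian per resampled sample index over (x, y, p), scored by summing per-sample log-likelihoods. Whether the original baseline models raw coordinates or per-sample velocities, and whether it sums probabilities or log-probabilities, is not fully specified above, so those choices are assumptions.

import numpy as np
from scipy.stats import multivariate_normal

def train_mgd(traces):
    """Fit one 3-D Gaussian per sample index; `traces` has shape (num_traces, 50, 4)."""
    models = []
    for n in range(traces.shape[1]):
        obs = traces[:, n, [0, 1, 3]]                        # x, y, pressure at sample n
        mu = obs.mean(axis=0)
        cov = np.cov(obs, rowvar=False) + 1e-6 * np.eye(3)   # small ridge for numerical stability
        models.append(multivariate_normal(mean=mu, cov=cov))
    return models

def mgd_score(models, trace):
    """Sum of per-sample log-likelihoods; higher means more similar to the trained user."""
    return sum(m.logpdf(trace[n, [0, 1, 3]]) for n, m in enumerate(models))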
TABLE III
A COMPARISON OF EER AND RR FOR DIFFERENT FUSION SCHEMES AND METHODS
3) Evaluation of Score Fusion and Baseline Comparisons: To evaluate the performance of the proposed method using different combinations of touch gestures (combining single touchtip gestures, combining multiple touchtip gestures, and combining all six gestures) and to compare the method with methods available in the literature, we tested the method on UH-TOUCH v2. The same gallery and probe sets were used as in the previous experiment. After score computation, we obtained one score matrix for each gesture. Then, we fused the score matrices of multiple gestures using the sum rule, which was proved by Kittler et al. [27] to be superior to other rules (i.e., the product, min, max, and median rules). Table III depicts a comparison of the EER and RR for different fusion schemes and methods: (i) GTGF-S: fusion of the score matrices of the four single touchtip gestures computed from the aforementioned GTGF method [8]; (ii) GTGF-M: fusion of the score matrices of the two multiple touchtip gestures computed from the GTGF method [8]; (iii) STDI-S: fusion of the score matrices of the four single touchtip gestures computed from the proposed method; (iv) STDI-M: fusion of the score matrices of the two multiple touchtip gestures computed from the proposed method; (v) GTGF-A: fusion of the score matrices of the six gestures computed from the aforementioned GTGF method [8]; (vi) STDI-A: fusion of the score matrices of the six gestures computed from the proposed method; (vii) TA-A: fusion of the score matrices of the six gestures computed from the method in [7]; (viii) DM-A: fusion of the score matrices of the six gestures computed from the method in [6].

As in most score-level fusion schemes, combining more scores from different channels increases the overall performance. This trend can be observed from the decrease in EER and increase in RR from the STDI-M scheme (including only the two multi-touch gestures) to the STDI-S scheme (including the four single-touch gestures) and further to the STDI-A scheme (including all six gestures). Furthermore, the performance of the GTGF-based method demonstrated a clear improvement over the other methods in the scheme combining all six gestures. There are two reasons for this improvement. First, compared with the method in [6], we adopted tactile pressure in our method, which contains extra clues about the subject's muscle behavior, whereas their method did not include this information. Second, compared to the work in [7], which extracted 30 static analytic feature values, the proposed method includes the movement dynamics and pressure dynamics of the touch gesture during feature extraction.
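A minimal sketch of the sum-rule fusion used here, assuming one probe-by-gallery score matrix per gesture; the min-max normalization applied before summing is our own choice, since the text does not state how scores from different gestures are brought to a common scale:

import numpy as np

def fuse_scores(score_matrices):
    """Sum-rule fusion of per-gesture score matrices of identical shape (probe x gallery)."""
    fused = np.zeros_like(np.asarray(score_matrices[0], dtype=float))
    for S in score_matrices:
        S = np.asarray(S, dtype=float)
        S = (S - S.min()) / (S.max() - S.min() + 1e-12)   # per-gesture min-max normalization
        fused += S
    return fused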
TABLE IV
AVERAGE COMPUTATIONAL TIME (ms) AND EER (%) OF GTGF AND STDI TESTED WITH DIFFERENT GALLERY SIZES
Fig. 8. A comparison of the FRR and FAR results for the continuous authentication among the GTGF, the STDI methods and the Touchalytics method [7]. The Y axis is the percentage of the FAR and FRR; the X axis is the number of consecutive traces (γ ) used in continuous authentication. The solid lines represent the FRR results and the dash lines represent the FAR results.
Last, the best performance in terms of both recognition and verification was achieved by the STDI-A scheme. Its improvement over GTGF-A derives from the statistical analysis of the intra-class variations in the GTGF representation.

4) Evaluation of Continuous Mobile Authentication and Baseline Comparisons: To test the performance and usability of the STDI-based user authentication method in a real-world scenario, we used the implemented app STDI-UA to verify users. Open set user recognition is excluded from this test since the Android system (version 4.1.x) does not currently support multiple user account management. A gallery of traces (N_g per gesture) for each subject is formed by randomly selecting traces (N_g/5 per gesture) from each session in UH-TOUCH v2 to cover variations in touch manner. The traces in UH-TOUCH v3 are used as the probe set in the test. During testing, the app ran on a Samsung Galaxy S3 phone with Android version 4.1.1, which has a quad-core 1.4 GHz Cortex-A9 CPU and 1 GB of RAM. The computational complexity for GTGF is O(N_s * N_g), and the computational complexity for STDI is O(N_s), where N_s is the number of subjects in the gallery. Note that one STDI is built for each subject per gesture from the N_g traces in the gallery. The first two rows in Table IV depict the average computational time per probe trace with different gallery sizes at the online testing stage. As N_g increased from 30 traces per gesture to 100 traces, the computational time for GTGF also increased significantly
from 32.22 ms to 81.36 ms, whereas the computational time for STDI remained at around 12.90 ms. The third and fourth rows in Table IV depict the average EER in user authentication using five consecutive traces. These EERs were achieved by the GTGF and STDI methods with different gallery sizes, respectively. As N_g increased from 30 traces per gesture to 100 traces, the EERs achieved by GTGF and STDI decreased. When using more than 80 traces, the EER improvement for both methods diminished. Figure 8 depicts a comparison of the continuous authentication results among GTGF, TA [7] and STDI in terms of the average FRR (solid lines) and average FAR (dashed lines). The distance from the probe trace and the distances from the following γ−1 consecutive probe traces are averaged to obtain the score for authentication. If it is above the threshold, the app classifies the user as an unauthorized intruder and invokes explicit user authentication. The FARs of all methods decreased at a small but steady rate as γ increased. Meanwhile, the FRRs of all methods decreased significantly as the number of traces increased. For the GTGF and STDI methods, the rate of decrease slowed after γ > 5 and almost converged at γ = 7. When γ > 5, the delay caused by the increased computational burden became noticeable for GTGF. However, no obvious delay was observed for the STDI method, even at γ = 7.
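The decision rule of this continuous scheme, averaging the scores of the most recent γ traces and invoking explicit authentication when the average exceeds the threshold, can be sketched as a tiny stateful helper (γ = 5 reflects the setting above; everything else is illustrative):

from collections import deque

class ContinuousAuthenticator:
    """Keep a running average over the most recent gamma trace scores."""

    def __init__(self, threshold, gamma=5):
        self.threshold = threshold
        self.recent = deque(maxlen=gamma)

    def observe(self, score):
        """Feed the distance score of a new probe trace; return True while the user passes."""
        self.recent.append(score)
        avg = sum(self.recent) / len(self.recent)
        return avg <= self.threshold   # above the threshold -> flag intruder, ask for explicit login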
V. CONCLUSION

This paper presented a novel mobile authentication approach based on touch gestures. An STDI, which synthesizes trace feature instances using intra-person variations, has been proposed. Using a multi-session touch gesture dataset containing six commonly used gestures, we have systematically evaluated the performance of the proposed method. Comparisons between the STDI and other methods in the literature have also been conducted. An STDI-based user verification app has been implemented and tested on a dataset collected during the daily use of the phone. The results have demonstrated the effectiveness and usability of the STDI in real-world mobile authentication.

REFERENCES

[1] A. J. Aviv, K. Gibson, E. Mossop, M. Blaze, and J. M. Smith, "Smudge attacks on smartphone touch screens," in Proc. 4th USENIX Conf. Offensive Technol., Washington, DC, USA, 2010, pp. 1-7.
[2] C. Schneider, N. Esau, L. Kleinjohann, and B. Kleinjohann, "Feature based face localization and recognition on mobile devices," in Proc. 9th Int. Conf. Control, Autom., Robot. Vis., Dec. 2006, pp. 1-6.
[3] M. Baloul, E. Cherrier, and C. Rosenberger, "Challenge-based speaker recognition for mobile authentication," in Proc. Int. Conf. Biometrics Special Interest Group, Darmstadt, Germany, Sep. 2012, pp. 1-7.
[4] T. Feng, X. Zhao, and W. Shi, "Investigating mobile device picking-up motion as a novel biometric modality," in Proc. IEEE Int. Conf. Biometrics, Theory, Appl. Syst., Sep./Oct. 2013, pp. 1-6.
[5] T. Feng et al., "Continuous mobile authentication using touchscreen gestures," in Proc. IEEE Conf. Technol. Homeland Secur., Boston, MA, USA, Nov. 2012, pp. 451-456.
[6] N. Sae-Bae, N. Memon, and K. Isbister, "Investigating multi-touch gestures as a novel biometric modality," in Proc. IEEE 5th Int. Conf. Biometrics, Theory, Appl. Syst. (BTAS), Washington, DC, USA, Sep. 2012, pp. 156-161.
[7] M. Frank, R. Biedert, E. Ma, I. Martinovic, and D. Song, "Touchalytics: On the applicability of touchscreen input as a behavioral biometric for continuous authentication," IEEE Trans. Inf. Forensics Security, vol. 8, no. 1, pp. 136-148, Jan. 2013.
[8] X. Zhao, T. Feng, and W. Shi, "Continuous mobile authentication using a novel graphic touch gesture feature," in Proc. IEEE Int. Conf. Biometrics, Theory, Appl. Syst., Washington, DC, USA, Sep./Oct. 2013, pp. 1-6.
[9] A. Bragdon, E. Nelson, Y. Li, and K. Hinckley, "Experimental analysis of touch-screen gesture designs in mobile environments," in Proc. SIGCHI Conf. Human Factors Comput. Syst., Vancouver, BC, Canada, 2011, pp. 403-412.
[10] S. Li and A. C. Kot, "Fingerprint combination for privacy protection," IEEE Trans. Inf. Forensics Security, vol. 8, no. 2, pp. 350-360, Feb. 2013.
[11] X. Zhao, S. K. Shah, and I. A. Kakadiaris, "Illumination normalization using self-lighting ratios for 3D-2D face recognition," in Proc. Eur. Conf. Comput. Vis., Workshops Demonstrations, 2012, pp. 220-229.
[12] A. Messerman, T. Mustafic, S. A. Camtepe, and S. Albayrak, "A generic framework and runtime environment for development and evaluation of behavioral biometrics solutions," in Proc. 10th Int. Conf. Intell. Syst. Design Appl., Cairo, Egypt, Nov./Dec. 2010, pp. 136-141.
[13] D. Gafurov and E. Snekkenes, "Arm swing as a weak biometric for unobtrusive user authentication," in Proc. Int. Conf. Intell. Inf. Hiding Multimedia Signal Process., Harbin, China, Aug. 2008, pp. 1080-1087.
[14] M. Faundez-Zanuy, "On-line signature recognition based on VQ-DTW," Pattern Recognit., vol. 40, no. 3, pp. 981-992, Mar. 2007.
[15] A. Kholmatov and B. Yanikoglu, "Biometric authentication using online signatures," in Proc. Int. Symp. Comput. Inf. Sci., Antalya, Turkey, 2004, pp. 373-380.
[16] Y. Yamazaki, Y. Mizutani, and N. Komatsu, "Extraction of personal features from stroke shape, writing pressure and pen inclination in ordinary characters," in Proc. 5th Int. Conf. Document Anal. Recognit., Bangalore, India, Sep. 1999, pp. 426-429.
[17] X. Cao and S. Zhai, "Modeling human performance of pen stroke gestures," in Proc. SIGCHI Conf. Human Factors Comput. Syst., San Jose, CA, USA, 2007, pp. 1495-1504.
[18] M. A. Ferrer, A. Morales, and J. F. Vargas, "Off-line signature verification using local patterns," in Proc. 2nd Nat. Conf. Telecommun., Arequipa, Peru, May 2011, pp. 1-6.
[19] D. Impedovo and G. Pirlo, "Automatic signature verification: The state of the art," IEEE Trans. Syst., Man, Cybern. C, Appl. Rev., vol. 38, no. 5, pp. 609-635, Sep. 2008.
[20] W. Shi, J. Yang, Y. Jiang, F. Yang, and Y. Xiong, "SenGuard: Passive user identification on smartphones using multiple sensors," in Proc. 7th Int. Conf. Wireless Mobile Comput., Netw. Commun., Cagliari, Italy, Oct. 2011, pp. 141-148.
[21] A. D. Luca, A. Hang, F. Brudy, C. Lindner, and H. Hussmann, "Touch me once and I know it's you!: Implicit authentication based on touch screen patterns," in Proc. SIGCHI Conf. Human Factors Comput. Syst., Austin, TX, USA, 2012, pp. 987-996.
[22] Z. Zeng, M. Pantic, G. I. Roisman, and T. S. Huang, "A survey of affect recognition methods: Audio, visual, and spontaneous expressions," IEEE Trans. Pattern Anal. Mach. Intell., vol. 31, no. 1, pp. 39-58, Jan. 2009.
[23] T. Hasan and J. H. L. Hansen, "A study on universal background model training in speaker verification," IEEE Trans. Audio, Speech, Lang. Process., vol. 19, no. 7, pp. 1890-1899, Sep. 2011.
[24] X. Zhao, E. Dellandréa, J. Zou, and L. Chen, "A unified probabilistic framework for automatic 3D facial expression analysis based on a Bayesian belief inference and statistical feature models," Image Vis. Comput., vol. 31, no. 3, pp. 231-245, 2013.
[25] G. Toderici et al., "Bidirectional relighting for 3D-aided 2D face recognition," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., San Francisco, CA, USA, Jun. 2010, pp. 2721-2728.
[26] R. V. Yampolskiy and V. Govindaraju, "Behavioural biometrics: A survey and classification," Int. J. Biometrics, vol. 1, no. 1, pp. 81-113, Jun. 2008.
[27] J. Kittler, M. Hatef, R. P. W. Duin, and J. Matas, "On combining classifiers," IEEE Trans. Pattern Anal. Mach. Intell., vol. 20, no. 3, pp. 226-239, Mar. 1998.
optimal to apply these approaches while the user is using most apps (e.g., reading news and email, or surfing the internet). Users must face the camera or keep talking to be continuously authenticated, which greatly reduces the device usability. Authentication using motion data (i.e., accelerometer/gyroscope data) has also been studied. However, due to signal noise and the difficulty of decoupling identity from activity, this modality can currently be applied only to motion-specific activities (e.g., the pick-up motion when answering a call [4]). Conversely, the touch screen is well suited for authentication, not only because it is the most frequently used sensor but also because touch gestures contain identity-rich behavioral traits. The feasibility of using touch gestures as a biometric modality has been investigated [5]–[8]. It can be observed in Fig. 1 that most of the touch traces from one user tend to follow a similar pattern, while the patterns vary among different users. Intrinsically, touch traces are affected by two biometric features (i.e., user hand geometry and finger muscle behavior).
In our previous work, we evaluated the Graphic Touch Gesture Feature (GTGF) for touch-based user authentication [8]. The GTGF can represent touch dynamics in an explicit manner in which the discriminative power of touch trajectories and tactile pressure is combined. However, implemented as an Android App, this method suffers from computational overhead and incurs a noticeable run-time delay when the gallery size increases, because a probe trace must be compared with each trace in the gallery. To improve the effectiveness of GTGF and reduce the computational expense at the same time, we propose to apply a Statistical Feature Model (SFM) as Touch Dynamics Images (STDI), which builds a touch gesture "appearance" model from the GTGF. The STDI learns the intra-person variations of the GTGF in the image domain by statistical analysis and is capable of synthesizing new instances according to a new probe trace. Instead of directly computing the distance between the probe trace and the gallery traces, the distance is computed between the probe trace and its corresponding synthesized instance given a user-specific STDI. The synthesized instance from the STDI mirrors the probe trace. The instance tends to be more dissimilar to the trace if the trace originates from an identity other than the genuine user. We evaluated the proposed method on three sets of touch gestures (flick up/down, flick right/left, zoom in/out) because they are among the most used gestures during user-device interaction [9]. Flick up/down is generally used when reading text or scrolling menus. Flick right/left is generally used when browsing photos, translating screens, and unlocking phones, while zoom in/out is generally used whenever users switch between details and overviews of the contents. Our main contributions are: (i) applying the feature representation principle of the SFM to touch gestures via the STDI, which retains the effectiveness of the previous GTGF and is more efficient in terms of computation; (ii) developing algorithms to apply the STDI to the mobile user verification and recognition problems; (iii) systematically evaluating the proposed algorithm by varying different parameters, and providing a comparison between the proposed algorithm and state-of-the-art methods; and (iv) developing an Android app and testing the performance of the verification algorithm in a real-world scenario. The rest of the paper is organized as follows. First, we discuss the related work in Section II. Methods are presented in Section III: GTGF extraction is presented in Section III-A, the STDI is described in Section III-B, and the user authentication algorithms are presented in Section III-C. Section IV presents the experimental results. In Section V, we conclude our study.
II. RELATED WORKS

Mobile user authentication can be categorized into two groups: physiological biometrics, which rely on static physical attributes (e.g., fingerprints [10], facial features [11]), and behavioral biometrics, which adopt identity-invariant features of human behavior to identify or verify people during their daily activities [12]. Prior research on behavioral biometrics has shown that biometric traits can be extracted from physiological characteristics of arm gestures [4], [13], and from strokes and signatures made using a pen or stylus [14], [15]. Yamazaki et al. [16] extracted personal features from stroke shape, writing pressure and pen inclination to identify an individual. Cao et al. [17] discussed a quantitative human performance model of making single-stroke pen gestures. Ferrer et al. [18] used local patterns (e.g., the Local Binary Pattern and the Local Directional Pattern) to segment and extract identity features from signatures. Impedovo et al. [19] presented the state-of-the-art in automatic signature verification. However, verifying users with pen strokes cannot be pervasively applied to mobile user authentication, since only a few types of mobile devices provide a pen or stylus. Furthermore, most users opt to interact with their phone directly using their fingers for convenience.

Studies exploring the touch gesture as a biometric modality for mobile devices are quite recent [20]. Feng et al. [5] extracted finger motion speed and acceleration of touch gestures as features for identification. Luca et al. [21] directly computed the distance between traces using the dynamic time warping algorithm. Sae-Bae et al. [6] designed 22 touch gestures for authentication, involving different combinations of five fingers, and computed the dynamic time warping distance and the Frechet distance between multi-touch traces. However, users must perform these pre-defined touch gestures for authentication, which therefore cannot be transparent and non-obtrusive to users. Frank et al. [7] studied the correlation between 30 analytic features from touch traces and classified these features using K-nearest-neighbor and Support Vector Machine approaches. However, the analytic features did not capture the touch trace dynamics.

The proposed method is quite different from existing works. The pressure dynamics are adopted in our method and are represented as the profiles of the STDI instances. This is different from extracting a single mid-stroke pressure value [7] or ignoring pressure altogether, as in other studies [6], [21]. Rather than directly using the GTGF for authentication [8], the STDI applies a feature representation principle similar to that of the statistical feature model in [24] to learn the intra-person variations from many traces of a user, so that scores are computed between pairs of gallery models and probe traces instead of pairs of gallery traces and probe traces, which significantly reduces the computational time.
III. METHODS

The authentication scenarios considered in this paper include open set user recognition and user verification. In the open set user recognition scenario, mobile devices are shared among a few users (e.g., family members). The sensitive group data should be shared only among the authorized users, and users' identities should first be verified to ensure they belong to the authorized group. Meanwhile, the user should be further recognized to customize the user experience on the device (e.g., limiting the functions for a child, granting different access levels). In a more general scenario, phones are used only by their owner. Other individuals are considered intruders and should be excluded from using the phone. Thus, the method verifies whether the device is being used by its owner.
TABLE I: Gesture Descriptions
To perform user authentication, we capture the touch traces from the touch screen output and segment the traces with support from the Android API. We use a series of x-y coordinates of finger touch points, pressure values and time stamps to generate the GTGF feature representation. Each series varies in time duration, trace length, dynamics of finger movements and tactile pressure, which depend on the user's hand geometry and muscle behavior. Touch gesture types (e.g., slide up, slide down) introduce another type of variation other than identity variations. To exclude this variation, the captured traces are pre-filtered into one of six predefined gestures based on the screen regions where the traces start and end. The six gestures (Table I) ensure the generality and usability of the proposed method. In the case of multiple fingertip touch gestures, the Euclidean distances between the two fingers at the start and end of the traces are also adopted: an increasing distance between the two fingertips indicates that a spread gesture is performed, while a decreasing distance indicates a pinch gesture.

A. Graphic Touch Gesture Feature

GTGF represents touch traces as images. To normalize and register traces with different numbers of points and time intervals, we use cubic interpolation to resample the traces in terms of their x-y coordinates, time, and pressure series. Cubic interpolation is chosen because it is the simplest method that offers true continuity between the samples. After normalization, a single touchtip trace Ŝ^0 (i.e., UD, DU, LR, RL) is obtained consisting of c samples, while a multiple touchtip trace Ŝ^0, Ŝ^1 (i.e., ZI, ZO) consists of 2c samples, c samples in Ŝ^0 and c samples in Ŝ^1, respectively:

ŝ_n^0 = (x̂_n^0, ŷ_n^0, t̂_n^0, p̂_n^0), n ∈ {1, 2, ..., 50},
ŝ_n^1 = (x̂_n^1, ŷ_n^1, t̂_n^1, p̂_n^1), n ∈ {1, 2, ..., 50}.    (1)

We then translate a trace Ŝ into a GTGF T. A zero-valued image template T with resolution H × W is used. Its size is empirically set to 100 × 150 as a tradeoff between the feature "discriminability" and computational efficiency. The GTGF is extracted by filling the image template with different intensity values according to an input normalized trace. The normalized traces are projected onto two perpendicular X and Y axes (the horizontal and vertical directions of the touch screen). In general, the upper part of a GTGF image describes the feature along the X direction and the lower part describes the feature along the Y direction. A block in a GTGF image corresponds to a sample ŝ_n in a trace. Each block spans three columns. The Y expansion in a block is related to the tactile pressure of a sample, while the intensity values in a block are related to the moving speed in the X, Y directions. The values in each block are assigned by translating the information of a normalized sample, following Eq. 2, Eq. 3 and Eq. 4:

I_n = 128 · (U_x − Δx̂_n) / U_x,  Δx̂_n = x̂_n − x̂_{n−1},    (2)
I_n^b = 128 · (U_y − Δŷ_n) / U_y,  Δŷ_n = ŷ_n − ŷ_{n−1},    (3)
H_n^p = ⌈(H/2) · (p̂_n / L_p)⌉,    (4)

where ⌈·⌉ is the ceiling operator and n is the index of the block. The template T has a fixed number of levels (H/2) for the Y expansion (profile) of the GTGF. The parameter L_p in Eq. 4 is used to translate the continuous pressure measure into one of H/2 levels. The higher this level H_n^p is, the higher the finger pressure applied to the screen. For the same purpose, the parameters I_m, U_x and U_y are adopted in Eqs. 2 and 3. The value 128 evenly divides the range of pixel levels ([0, 256]). Levels within the range [1, 128] describe the negative values of Δx̂_n or Δŷ_n, representing that the finger moves in the left or bottom directions. Levels within the range [129, 256] describe the positive values of Δx̂_n or Δŷ_n, representing that the finger moves in the right or top directions. The closer the pixel levels are to 0 or 256, the faster the fingertip moves along the positive or negative directions correspondingly. In this way, the direction, the pressure, and the dynamics of the traces are captured in the GTGF, as depicted in Fig. 2. The images within the same row are quite similar to each other, but major differences can be observed between the two rows. The differences are mainly caused by the user identities. For traces Ŝ^0 and Ŝ^1 in a multiple fingertip gesture (i.e., Pinch, Spread), we extract two GTGFs, T^0 and T^1, respectively. The construction of the GTGF has multiple advantages. First, original traces have different spatial topology and temporal duration and thus are difficult to compare directly; the GTGF resolves this difficulty by registering these traces into a canonical 2D template T. Second, dynamics are considered to be an important factor in other pattern recognition problems (e.g., facial expression recognition [22] and speech recognition [23]); however, they have not been commonly considered for touch gestures in the mobile authentication literature. The GTGF is able to represent the gesture dynamics in terms of movement and pressure intuitively and explicitly. Third, due to the inhomogeneity between movement and pressure data, it is difficult to combine their discriminative power at the feature level; the GTGF takes both into consideration and combines their discriminative power.
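For illustration, the following Python sketch resamples a raw trace with cubic interpolation and fills a 100 × 150 template following the reconstructed Eqs. 2-4. It is a minimal sketch, not the authors' implementation: the function and variable names are ours, the intensity clipping and the way each block's profile grows from the template's horizontal midline are plausible layout choices rather than details confirmed by the text above, and the parameter values are the grid-searched bounds reported in Section IV.

```python
import numpy as np
from scipy.interpolate import CubicSpline

H, W, C = 100, 150, 50            # template size and samples per trace, as above
UX, UY, LP = 20.0, 30.0, 0.35     # upper bounds (best grid-searched values reported later)

def resample_trace(t, x, y, p, c=C):
    """Resample a raw trace (time, x, y, pressure) to c samples via cubic interpolation."""
    t = np.asarray(t, dtype=float)
    u = np.linspace(t[0], t[-1], c)
    return CubicSpline(t, x)(u), CubicSpline(t, y)(u), CubicSpline(t, p)(u)

def gtgf(x, y, p):
    """Fill the H x W template following Eqs. 2-4 (illustrative block layout)."""
    T = np.zeros((H, W))
    dx = np.diff(x, prepend=x[0])          # relative movements (Delta x_n); first block uses 0
    dy = np.diff(y, prepend=y[0])
    for n in range(len(x)):
        ix = np.clip(128.0 * (UX - dx[n]) / UX, 1, 256)   # Eq. 2: X-direction intensity
        iy = np.clip(128.0 * (UY - dy[n]) / UY, 1, 256)   # Eq. 3: Y-direction intensity
        h = int(np.ceil((H / 2) * min(p[n], LP) / LP))    # Eq. 4: pressure -> profile height
        cols = slice(3 * n, 3 * n + 3)                    # each sample occupies three columns
        T[H // 2 - h:H // 2, cols] = ix                   # upper half: X dynamics
        T[H // 2:H // 2 + h, cols] = iy                   # lower half: Y dynamics
    return T
```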
B. Statistical Touch Dynamics Images

The STDI learns the intra-class variation modes and synthesizes new instances using the learned base feature and affine combinations of the learned modes. In the testing phase, the STDI expresses the probe trace in the new basis as synthesized instances.
Fig. 2. Examples of the GTGF extraction. The first row includes GTGF extracted from five touch traces of a single subject while the second row includes GTGF extracted from five traces from another subject. The movement and pressure dynamics can be explicitly represented in GTGF. We can observe major variations on the image pattern between two rows but minor variations within a row.
The distance between the synthesized probe instance and the original probe instance is computed for verification. This distance tends to be greater when the probe comes from a different user than when it comes from the same user on whom the STDI was trained. Since the STDI is learned with specific knowledge of the user, its representation power decreases for others; thus, less similarity exists between the synthesized and original instances extracted from others' traces.

1) STDI Training: The training of an STDI is presented in Alg. 1. It takes as input a set of touch traces per gesture type from the user and outputs the trained STDI. This training process runs repeatedly for all types of touch gestures (i.e., LR, RL, DU, UD, ZI, ZO). Thus, six STDIs are trained for the six gestures, respectively, for each user.

Algorithm 1 Training Algorithm (Per Gesture Type)
Input: The training set of touch gesture (type t) traces S_u^t from the user u.
Output: The user-specific Statistical Touch Dynamics Images D_u^t.
1: For each trace S_i ∈ S_u^t: 1) extract the GTGF T_i (Sec. III-A), 2) reshape the 2D matrix T_i into a vector t_i.
2: Learn the base vector v̄ and basis feature vectors v_i by applying statistical analysis on {t_i} (Eq. 5).

Since the gesture traces have been registered and scaled in feature extraction, the GTGF can be directly used for training without further normalization or alignment. Every T ∈ R^{H×W} in the training set is first reshaped into t ∈ R^{HW}. The 2D shape information lost due to reshaping can easily be restored by a reverse reshaping process before score computation. As in Eq. 5, a vectorized feature v can be expressed as a base v̄ plus a linear combination of n basis feature vectors v_i. The approach to compute the STDI is to apply Principal Component Analysis (PCA) to the training vectors, as in statistical feature models [24]:

v = v̄_0 + Σ_{i=1}^{n} b_i v_i = v̄_0 + V b,    (5)

where the base v̄_0 is the mean of the training vectors. The vectors v_i are the n eigenvectors corresponding to the n largest eigenvalues. The number n is determined such that the sum of the n largest eigenvalues exceeds a portion (κ) of the sum of all the eigenvalues. The coefficients b_i are the feature parameters, distinct from the eigenvalues of the PCA; they control the appearance of the learned STDI, as depicted in Fig. 3, and by varying their values the appearance of the STDI can be changed. We can assume that the vectors v_i are orthonormal after PCA and that the parameters b_i follow Gaussian distributions N(0, σ_i²). When b_i deviates too much from its mean, singular STDI instances may be generated. We further express the term Σ_{i=1}^{n} b_i v_i as V b.
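As a concrete illustration of Alg. 1 and Eq. 5, the sketch below trains one user- and gesture-specific STDI with PCA on vectorized GTGFs. Function and variable names are ours; returning the per-mode standard deviations σ_i anticipates the clamping of Eq. 7.

```python
import numpy as np

def train_stdi(gtgf_list, kappa=0.90):
    """Learn (v0, V, sigma) from a list of H x W GTGF arrays of one gesture type."""
    X = np.stack([T.reshape(-1) for T in gtgf_list])      # rows are vectorized GTGFs t_i
    v0 = X.mean(axis=0)                                   # base vector: mean of training vectors
    Xc = X - v0
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)     # PCA via SVD of the centered data
    eigvals = s ** 2 / (len(X) - 1)                       # eigenvalues of the covariance matrix
    ratio = np.cumsum(eigvals) / eigvals.sum()
    n = int(np.searchsorted(ratio, kappa)) + 1            # keep enough modes to exceed kappa
    V = Vt[:n].T                                          # columns are the basis vectors v_i
    sigma = np.sqrt(eigvals[:n])                          # std. dev. of each coefficient b_i
    return v0, V, sigma
```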
2) STDI Testing: The testing algorithm is presented in Alg. 2. It takes a probe trace S_p and a trained model D as input, where the model D and the trace S_p share the same gesture type. The output is a score d representing the distance between the trace S_p and the user u represented by the trained model D_u^t.

Algorithm 2 Testing Algorithm
Input: A probe trace S_p^t and the trained model D_u^t.
Output: The score d.
1: Extract the GTGF T_p^t from the probe trace S_p^t,
2: Reshape T_p^t into v_p^t,
3: Synthesize the instance ṽ_p^t using the parameters b̂_e (Eqs. 6-8),
4: Reshape the vector ṽ_p^t back to T̃_p^t,
5: Compute the score d between T_p^t and its responding instance T̃_p^t (Eq. 9).

The trained STDI has the capability to synthesize different instances by varying the parameters b. Given an input v_p^t for testing, its responding instance ṽ_p^t can be synthesized from Eq. 5 using estimated parameters.
Fig. 3. A demonstration of the STDI in Eq. 5 and its instance synthesis. (a) The appearance of the STDI with the coefficient b1 set to 3σ1 (top) and -3σ1 (bottom) with other coefficients fixed to 0; (b) the appearance of the STDI with the coefficient b2 set to 3σ2 (top) and −3σ2 (bottom) with other coefficients fixed to 0; (c) the appearance of the STDI with the coefficient b3 set to 3σ3 (top) and −3σ3 (bottom) with other coefficients fixed to 0; (d) the probe feature (top) and the synthesized instance (bottom), where the test instance and the STDI are from the same user; (e) the probe feature (top) and the synthesized instance (bottom), where they originate from different users. It can be observed that the synthesized instances tend to be more similar to their original features when the probe instances originate from the same user as the STDIs.
The estimation is performed using Eq. 6:

b_e = V^T (v_p^t − v̄_0).    (6)

Instead of directly applying b_e to Eq. 5, we limit the range of these parameters by applying a function f to increase the separability between classes. The function f keeps the distribution of the synthesized instances close to the training class in the feature space:

b̂_e = f(b_e),  f(b_i) = b_i if |b_i| ≤ ρσ_i,  f(b_i) = ρσ_i · sgn(b_i) otherwise,  for each b_i ∈ b_e,    (7)

where σ_i is the standard deviation of the aforementioned Gaussian distribution of b_i, ρ is a constant, and sgn is the signum function extracting the sign of a real number. Then the responding instance ṽ_p^t can be synthesized as follows:

ṽ_p^t = v̄_0 + V b̂_e.    (8)

The L_1 distance is computed between the input instance T_p^t and its corresponding instance T̃_p^t given the STDI D_u^t as the score metric:

L_1(T̃_p^t, T_p^t) = |T_p^t − T̃_p^t|.    (9)
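The testing side (Alg. 2, Eqs. 6-9) then reduces to a projection, a clamp, and an L1 distance. A minimal sketch is given below, assuming the (v0, V, sigma) tuple produced by the training sketch above; names are illustrative.

```python
import numpy as np

def stdi_score(T_probe, v0, V, sigma, rho=1.0):
    """Distance between a probe GTGF and its instance synthesized by the STDI."""
    vp = T_probe.reshape(-1)
    b_e = V.T @ (vp - v0)                              # Eq. 6: express the probe in the basis
    b_hat = np.clip(b_e, -rho * sigma, rho * sigma)    # Eq. 7: clamp b_i to [-rho*sigma_i, rho*sigma_i]
    v_syn = v0 + V @ b_hat                             # Eq. 8: synthesize the responding instance
    return float(np.abs(vp - v_syn).sum())             # Eq. 9: L1 distance as the score
```

Note that with ρ = ∞ the clamp disappears and the score degenerates to the reconstruction error of an unconstrained PCA projection, which matches the ρ study reported in Section IV.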
C. User Authentication Algorithms
Directly computing the score metric between a pair of feature instances from the gallery and probe sets is commonly adopted by authentication systems [8], [25], [26]. However, the phone suffers from computational overhead when the gallery must be large enough to cover the different touch habits of multiple users. The probe trace must be compared with each trace in the gallery stored on the phone, which leads to an obvious delay during user interaction when many traces per subject are stored in the gallery. However, reducing the number of traces per subject in the gallery inevitably decreases the generality of the user's identity feature. To capture the intra-person variations while keeping the method computationally efficient, subject-specific STDIs are learned from the GTGF feature space. The expense of the score computation is then proportional only to the number of subjects rather than the number of features in the gallery.

1) Single User: The user verification algorithm is designed for a single user. In this case, the six touch gestures from the genuine owner are collected, and a set of user-specific STDIs is trained for these gestures, respectively. Note that the touch traces are automatically classified using the pre-filtering described at the beginning of Sec. III. During the online stage, the type of the probe trace is retrieved first and the corresponding STDI is selected. Then, a distance is computed between the probe trace and the selected STDI and is further compared with the threshold for verification. The user verification algorithm (Alg. 3) takes a set of six trained user-specific STDIs {D}, the probe trace S_p and the threshold ξ as inputs, and outputs a boolean decision B. First, it extracts the GTGF T_p from S_p as in Sec. III-A and selects the STDI D^t from D that shares the same gesture type as the probe S_p. Then, it computes the score d given D^t and T_p, as in Alg. 2. Finally, it computes and outputs B: if d < ξ, the user is genuine; otherwise, the user is an impostor.

2) Multiple Users: The open set user recognition algorithm is designed for the case of multiple users, where user identity management (UIM) is needed. Mobile devices with UIM can reject unauthorized users and change the personalized settings according to the authorized user identity. Touch traces from all authorized users are collected and a set of six STDIs is trained for each user. During the online stage, the type of the probe trace is retrieved first and compared with all STDIs of the same type in the gallery. Distances are computed given the probe trace and the STDIs. Note that the distance computation only needs to be repeated for all subjects, instead of all traces in the gallery. The minimum distance is then selected and compared with the threshold for authentication. Only if it is lower than the threshold is its corresponding identity returned as the result.
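The two decision rules of Sec. III-C (formalized as Algs. 3 and 4 below) can be phrased directly on top of such a scoring function. In this hedged sketch, user_stdis maps gesture types to trained STDIs, score is a callable like stdi_score above, and all names are our illustrative assumptions.

```python
def verify(user_stdis, gesture, T_probe, xi, score):
    """Single-user verification (cf. Alg. 3): accept iff the distance is below the threshold xi."""
    return score(T_probe, *user_stdis[gesture]) < xi

def identify(group_stdis, gesture, T_probe, xi, score):
    """Open-set identification (cf. Alg. 4): return the closest enrolled user, or None if even
    the best distance is not below xi (the probe is then rejected as an unauthorized user)."""
    dists = {user: score(T_probe, *stdis[gesture]) for user, stdis in group_stdis.items()}
    best = min(dists, key=dists.get)
    return best if dists[best] < xi else None
```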
Algorithm 3 The User Verification Algorithm
Input: Six user-specific STDIs D, the probe trace S_p, the threshold ξ.
Output: A boolean decision B.
1: Extract the GTGF T_p from S_p as in Sec. III-A,
2: Select the STDI D^t from D that shares the same gesture type as the probe S_p,
3: Compute the score d given D^t and T_p, as in Alg. 2,
4: Compute and output B: if d < ξ, B = true; otherwise, B = false.

Algorithm 4 The User Identification Algorithm
Input: Sets of STDIs D_u for the user group, the probe trace S_p, the threshold ξ.
Output: The user identity I.
1: Extract the GTGF T_p from S_p as in Sec. III-A,
2: For each subject u: 1) select the STDI D_u^t from D_u that shares the same gesture type as the probe S_p, 2) compute the score d_u given D_u^t and T_p, as in Alg. 2,
3: Find the smallest d_m among all d_u,
4: If d_m ≥ ξ, then I = Null; else I is the identity corresponding to d_m.

The user recognition algorithm (Alg. 4) takes sets of STDIs {D_u} for the user group, the probe trace S_p and the threshold set {ξ} as inputs, and outputs the user identity I. First, it extracts the GTGF T_p from S_p as in Sec. III-A. Then, for each subject u in the gallery, it selects the STDI D_u^t from {D_u} that shares the same gesture type as the probe S_p and computes the score d_u given D_u^t and T_p, as in Alg. 2. After finishing the loop, it finds the smallest d_m among all d_u. If d_m ≥ ξ, the user is not in the user group in the gallery; otherwise, the user identity is assigned as the one corresponding to d_m.

IV. EXPERIMENTAL RESULTS AND DISCUSSION

A. Data Acquisition
TABLE II: Statistical Information About the Datasets

To systematically evaluate the performance of GTGF and STDI, we developed a data collection app that captures touch gestures using a standard API of the Android system. When fingers contact the touchscreen, it records the raw touch samples of each trace from the API. We recruited 78 subjects, of whom 64 were right-handed and 75 had touchscreen device experience. The data acquisition consisted of six sessions collected over several months. The first session served as a pilot session in which only 30 subjects were recruited. We explained the purpose of the study and the use of our data acquisition program. Then, the subjects practiced for 5 to 10 min. The subjects were instructed to perform the six most used touch gesture types repetitively during the practice time so that they could get used to the data collection app and their gestures could converge to their daily touch habits. After a user could use the program in a natural way, the subject provided 20 traces for each gesture (Table I). All 78 subjects participated in all the following sessions, from the second to the sixth. The explanation and practice procedures were repeated for the newly recruited subjects in the second session. Consecutive sessions were separated by at least three days. Subjects also provided at least 20 traces per gesture during each session. Subjects could vary the manner in which they held the phone (e.g., left-hand vs right-hand holding, whole-palm vs half-palm holding), the hand pose (palm down vs palm up), the fingers used to perform the single touch gesture (thumb vs index finger), or the finger orientation between sessions. We named the dataset collected in the first session, with a smaller number of subjects, UH-TOUCH v1, and the dataset collected in the following sessions, with all subjects, UH-TOUCH v2.

To test the performance and usability of STDI in a real-world scenario, we implemented another app, named STDI-UA, on the Android system to verify users. The app runs as a background service, collecting and verifying the touch traces in the context of the Android Launcher. The UH-TOUCH v3 dataset was collected by this app for evaluation. Twenty subjects whose data were acquired in the UH-TOUCH v2 dataset were recruited. Each subject used the device (a Samsung Galaxy S3 phone with Android version 4.1.1) for three days as their daily phone, and their touch traces in the context of the Android Launcher were recorded. There were 1,327 touch traces collected on average from each user. Table II depicts the statistical information for these datasets.

The Recognition Rate (RR) and Equal Error Rate (EER) were used as the evaluation criteria for the recognition and verification experiments. RR is defined as the number of times that users have been correctly identified over the size of the probe set. EER is the common value at which the false acceptance rate (FAR) and the false rejection rate (FRR) are equal.
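For reference, these criteria can be computed from genuine and impostor distance scores as in the sketch below; the threshold sweep and the symmetric averaging at the crossing point are common conventions rather than details specified above.

```python
import numpy as np

def equal_error_rate(genuine, impostor, num_thresholds=1000):
    """EER from genuine/impostor distances (a probe is accepted when its distance < threshold)."""
    g, i = np.asarray(genuine, float), np.asarray(impostor, float)
    ts = np.linspace(min(g.min(), i.min()), max(g.max(), i.max()), num_thresholds)
    frr = np.array([(g >= t).mean() for t in ts])   # genuine probes rejected
    far = np.array([(i < t).mean() for t in ts])    # impostor probes accepted
    k = int(np.argmin(np.abs(far - frr)))           # where FAR and FRR (nearly) coincide
    return (far[k] + frr[k]) / 2.0

def recognition_rate(predicted_ids, true_ids):
    """RR: fraction of probes whose returned identity matches the true identity."""
    return float(np.mean(np.asarray(predicted_ids) == np.asarray(true_ids)))
```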
B. Experimental Results
1) Evaluation of the STDI Configuration: Figure 4 depicts the results of a grid search on the 3D GTGF parameter space, including the relative movement upper bounds U_x, U_y in the x, y directions and the upper bound for the tactile pressure L_p used during GTGF extraction. They varied from 0.3 to 0.7, from 20 to 60, and from 30 to 70 on the L_p, U_x, U_y axes, respectively. These ranges were selected to cover most of the meaningful values in the parameter space.
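The grid search itself is an exhaustive sweep, as in the sketch below. The callable eval_eer is an assumption: it is taken to re-extract the GTGFs with the given bounds and return the EER averaged over the six gestures, following the protocol described below.

```python
import itertools
import numpy as np

def grid_search(eval_eer, lp_values, ux_values, uy_values):
    """Exhaustive search over the GTGF parameter space (L_p, U_x, U_y)."""
    best = (np.inf, None)
    for lp, ux, uy in itertools.product(lp_values, ux_values, uy_values):
        eer = eval_eer(lp, ux, uy)            # average EER over the six gestures
        if eer < best[0]:
            best = (eer, (lp, ux, uy))
    return best

# Illustrative ranges matching the sweep described above:
# grid_search(eval_eer, np.arange(0.30, 0.71, 0.05), range(20, 61, 5), range(30, 71, 5))
```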
Fig. 4. Results of grid search on the parameter space (L_p, U_x, U_y) on the evaluation dataset UH-TOUCH v1. Each subfigure is a slice on the U_x axis in the 3D parameter space, and the x, y axes in all subfigures represent the L_p and U_y axes in the parameter space. The colors represent the average EERs for the six gestures under a set of parameters. Note that the same color in different subfigures may represent different EER values according to the accompanying colormaps.
For the experiment setup, we randomly selected half of the traces per subject per gesture from UH-TOUCH v1 as the gallery (target) set and used the other half per subject per gesture from UH-TOUCH v1 as the probe (query) set. The experiments were conducted six times for all gestures, and the EERs reported were computed by averaging the EERs over all the gestures. The L_1 distance was computed directly between GTGFs as the score metric. The variations in EER ranged from 11.28% to 12.14%, and the best EER was achieved for the values 0.35, 20, 30 in the parameter space. This demonstrates that the proposed GTGF feature is not sensitive to the parameter variations in the aforementioned tested range. The parameters at which the best EER was achieved were used in the experiments described below. We also evaluated the performance of the proposed method under different values of the parameter κ in the STDI. The parameter κ determines the number of eigenvectors to preserve in the STDIs. When κ is larger, more eigenvectors are preserved, so that new instances can be synthesized in a higher dimensional vector space. This allows more variation in the instance synthesis but meanwhile increases the computational complexity. The experiments were conducted on UH-TOUCH v2 and the setup was the same as in the previous experiments. Figure 5 depicts the results for the experiments in terms of EER in Fig. 5(a) and RR in Fig. 5(b). In verification, the average EERs of the six gestures were 9.6%, 9.7% and 12.3% for κ = 0.95, κ = 0.90 and κ = 0.80. In recognition, the average RRs of the six gestures were 82.6%, 82.3% and 78.1% for κ = 0.95, κ = 0.90 and κ = 0.80. The performances are similar for most of the gestures when κ = 0.95 and κ = 0.90, but a clear decrease in performance can be observed when κ = 0.80. Thus, we opted to use 0.90 as the value of κ, as a tradeoff between performance and computational complexity.
Fig. 5. A comparison of the different parameter values κ in terms of User Verification and Recognition on UH-TOUCH v2; (a) EER under three κ values of the six touch gestures; (b) RR under three κ values of the six touch gestures. The parameter κ determines the number of the bases in STDIs.
We also evaluated the performance of the proposed method under different values of the parameter ρ in the STDI. The parameter ρ sets the boundary for the coefficients b, which limits the variability of synthesizing new instances. The greater this boundary, the more broadly the synthesized features can spread in the vector space; the smaller this boundary, the more tightly the synthesized features of the class are confined in the space. The experiments were conducted on UH-TOUCH v2 and the setup was the same as in the previous experiments. Figure 6 depicts the results for the experiments in terms of EER in Fig. 6(a) and RR in Fig. 6(b). In verification, the average EERs of the six gestures were 10.9%, 9.7%, 10.8% and 13.4% for ρ = 0.5, ρ = 1.0, ρ = 3.0 and ρ = ∞. In recognition, the average RRs of the six gestures were 80.7%, 82.3%, 80.9% and 78.2% for ρ = 0.5, ρ = 1.0, ρ = 3.0 and ρ = ∞. The best performance for most of the gestures was achieved in both the recognition and verification experiments when ρ = 1.0. Thus, we opted to use 1.0 as the value of ρ. A possible explanation is that when the synthesized features spread widely in the vector space, the inter-class distance may be reduced and thus false positive cases increase, whereas when the distribution of the synthesized features is limited to a small region, the intra-class variations are under-represented and thus false rejection cases increase; a balance is achieved at ρ = 1.0.
Fig. 6. A comparison of the different parameter values ρ in terms of User Verification and Recognition on UH-TOUCH v2; (a) EER under four ρ values of the six touch gestures; (b) RR under four ρ values of the six touch gestures. The parameter ρ limits the variability during synthesizing the trace instances.
Fig. 7. Comparisons of different authentication schemes in terms of User Verification and Recognition on UH-TOUCH v2. (a) EER of the three different authentication schemes. (b) RR of the three different authentication schemes.
2) Demonstration of the STDI Advantages: To demonstrate the advantage of converting touch traces into the GTGF and further constructing the STDI, we conducted a comparison between three methods. The first method is based on a Multivariate Gaussian Distribution (MGD) model learned directly from the resampled traces, where the number of samples on all traces is normalized to a constant (i.e., 50), as in Eq. 1 in Sec. III-A. An MGD model is built for each sample, containing the movement and pressure dynamics variables x, y, and p. Thus, a set of 50 MGD models is trained for complete traces of a gesture type from a user. These models output a set of probabilities representing the similarities between the testing trace and the trained model, and these probabilities are summed into a single score for authentication. The second, GTGF-based method extracts the GTGF feature and computes the L_1 distance between traces from the training set and the testing set [8]. The third method is the proposed STDI-based method. For the experiment setup, we randomly selected half of the traces in each session of UH-TOUCH v2 and combined them into the gallery (training) set. The remaining portion of UH-TOUCH v2 was used as the probe (testing) set. Figure 7 depicts the comparisons among these three methods in terms of EER in Fig. 7(a) and RR in Fig. 7(b). In the verification experiments, the average EERs for MGD, GTGF and STDI on the six gestures were 36.8%, 12.8% and 9.7%, respectively. The improvement of the GTGF over the MGD is obvious (12.8% vs 36.8%), which indicates the necessity of adopting the GTGF feature. The decrease in EER from 12.8% to 9.7% demonstrates that the traces are better represented and more distinguishable by the STDI. In the recognition experiments, the average RRs for MGD, GTGF and STDI on the six gestures were 55.2%, 79.0% and 82.3%, respectively. The increase in RR provides the same evidence as the verification experiments. By adopting the GTGF and STDI, touch traces can be better represented and classified in the user identification and verification problems.
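For completeness, the MGD baseline above can be sketched as one Gaussian per resampled sample index over its movement and pressure values. Per-sample log-likelihoods are summed here instead of raw probabilities, a common numerically stable variant; the array shapes and names are our assumptions.

```python
import numpy as np
from scipy.stats import multivariate_normal

def train_mgd(traces):
    """traces: array (num_traces, 50, 3) holding (dx, dy, p) per resampled sample."""
    models = []
    for n in range(traces.shape[1]):
        obs = traces[:, n, :]
        mean = obs.mean(axis=0)
        cov = np.cov(obs, rowvar=False) + 1e-6 * np.eye(obs.shape[1])  # regularized covariance
        models.append((mean, cov))
    return models

def mgd_score(models, trace):
    """Summed per-sample log-likelihood of a probe trace of shape (50, 3)."""
    return float(sum(multivariate_normal.logpdf(trace[n], mean=m, cov=c)
                     for n, (m, c) in enumerate(models)))
```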
TABLE III: A Comparison of EER and RR for Different Fusion Schemes and Methods
3) Evaluation of Score Fusion and Baseline Comparisons: To evaluate the performance of the proposed method using different combinations of touch gestures (combining single touchtip gestures, combining multiple touchtip gestures, and combining all six gestures) and to compare the method with methods available in the literature, we tested it on UH-TOUCH v2. The same gallery and probe sets were used as in the previous experiment. After score computation, we obtained one score matrix for each gesture. Then, we fused the score matrices of multiple gestures using the sum rule, which was shown by Kittler et al. [27] to be superior to other rules (i.e., the product, min, max, and median rules). Table III depicts a comparison of the EER and RR for different fusion schemes and methods: (i) GTGF-S: fusion of the score matrices of the four single touchtip gestures computed from the aforementioned GTGF method [8]; (ii) GTGF-M: fusion of the score matrices of the two multiple touchtip gestures computed from the GTGF method [8]; (iii) STDI-S: fusion of the score matrices of the four single touchtip gestures computed from the proposed method; (iv) STDI-M: fusion of the score matrices of the two multiple touchtip gestures computed from the proposed method; (v) GTGF-A: fusion of the score matrices of the six gestures computed from the aforementioned GTGF method [8]; (vi) STDI-A: fusion of the score matrices of the six gestures computed from the proposed method; (vii) TA-A: fusion of the score matrices of the six gestures computed from the method in [7]; (viii) DM-A: fusion of the score matrices of the six gestures computed from the method in [6]. As in most score-level fusion schemes, combining more scores from different channels increases the overall performance. This trend can be observed from the decrease in EER and increase in RR from the STDI-M scheme (including only the two multi-touch gestures) to the STDI-S scheme (including the four single-touch gestures) and further to the STDI-A scheme (including all six gestures). Furthermore, the performance of the GTGF-based methods demonstrated a clear improvement over the other methods in the scheme combining all six gestures. There are two reasons for this improvement. First, compared with the method in [6], we adopted tactile pressure in our method, which contains extra clues about the subject's muscle behavior, whereas their method did not include this information. Second, compared to the work in [7], which extracted 30 static analytic feature values, the proposed method includes the movement dynamics and pressure dynamics of the touch gesture during feature extraction. Last, the best performance in terms of recognition and verification was achieved by the STDI-A scheme. Its improvement over GTGF-A is derived from the statistical analysis of the intra-class variations in the GTGF representation.
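A sum-rule fusion of per-gesture score matrices can be sketched as below. The min-max normalization applied before summation is our assumption; the text above does not state how score ranges are aligned across gestures.

```python
import numpy as np

def fuse_scores_sum_rule(score_matrices):
    """Sum-rule fusion of per-gesture score matrices of identical (probe x gallery) shape."""
    fused = np.zeros_like(np.asarray(score_matrices[0], dtype=float))
    for S in score_matrices:
        S = np.asarray(S, dtype=float)
        fused += (S - S.min()) / (S.max() - S.min() + 1e-12)  # min-max normalize each gesture
    return fused
```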
TABLE IV: Average Computational Time (ms) and EER (%) of GTGF and STDI Tested with Different Gallery Sizes
Fig. 8. A comparison of the FRR and FAR results for continuous authentication among the GTGF, STDI, and Touchalytics [7] methods. The Y axis is the percentage FAR and FRR; the X axis is the number of consecutive traces (γ) used in continuous authentication. The solid lines represent the FRR results and the dashed lines represent the FAR results.
4) Evaluation of Continuous Mobile Authentication and Baseline Comparisons: To test the performance and usability of the STDI-based user authentication method in a real-world scenario, we used the implemented app STDI-UA to verify users. Open set user recognition is excluded from this test since the Android system (version 4.1.x) does not currently support multiple user account management. A gallery of traces (N_g per gesture) for each subject was formed by randomly selecting traces (N_g/5 per gesture) from each session in UH-TOUCH v2 to cover variations in touch manner. The traces in UH-TOUCH v3 were used as the probe set in the test. During testing, the app ran on a Samsung Galaxy S3 phone with Android version 4.1.1, which has a quad-core 1.4 GHz Cortex-A9 CPU and 1 GB of RAM. The computational complexity for GTGF is O(N_s · N_g), and the computational complexity for STDI is O(N_s), where N_s is the number of subjects in the gallery. Note that one STDI is built for each subject per gesture from the N_g traces in the gallery. The first two rows in Table IV depict the average computational time per probe trace with different gallery sizes at the online testing stage. As N_g increased from 30 traces per gesture to 100 traces, the computational time for GTGF also increased significantly, from 32.22 ms to 81.36 ms, whereas the computational time for STDI remained at around 12.90 ms.
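The continuous decision evaluated below averages the scores of the most recent γ probe traces. A minimal sketch of such a sliding-window authenticator follows; the class and parameter names are ours, and the warm-up behavior before γ traces have been observed is an assumption.

```python
from collections import deque

class ContinuousAuthenticator:
    """Keep a sliding window of the last gamma probe-trace distances and flag intruders."""

    def __init__(self, gamma, threshold):
        self.window = deque(maxlen=gamma)
        self.threshold = threshold

    def update(self, distance):
        """Return True to keep the session open, False to invoke explicit re-authentication."""
        self.window.append(distance)
        if len(self.window) < self.window.maxlen:
            return True                    # not enough traces observed yet
        return sum(self.window) / len(self.window) < self.threshold
```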
The third and fourth rows in Table IV depict the average EER in user authentication using five continuous traces. These EERs were achieved by the GTGF and STDI methods with different gallery sizes, respectively. As N_g increased from 30 traces per gesture to 100 traces, the EERs achieved by GTGF and STDI decreased. When using more than 80 traces, the EER improvement for both methods was reduced. Figure 8 depicts a comparison of the continuous authentication results among GTGF, TA [7] and STDI in terms of the average FRR (solid lines) and average FAR (dashed lines). The distance from the probe trace and the distances from the following γ − 1 consecutive probe traces are averaged to obtain the score for authentication. If it is above the threshold, the app classifies the user as an unauthorized intruder and invokes explicit user authentication. The FARs of all methods decreased at a small but steady rate as γ increased. Meanwhile, the FRRs of all methods decreased significantly as the number of traces increased. For the GTGF and STDI methods, the rate of decrease slowed after γ > 5 and almost converged when γ = 7. When γ > 5, the delay caused by the increased computational burden became noticeable for GTGF; however, no obvious delay was observed for the STDI method, even when γ = 7.

V. CONCLUSION

This paper presented a novel mobile authentication approach based on touch gestures. An STDI, which synthesizes trace feature instances using intra-person variations, has been proposed. Using a multi-session touch gesture dataset containing six commonly used gestures, we have systematically evaluated the performance of the proposed method. Comparisons between the STDI and other methods in the literature have also been conducted. An STDI-based user verification app has been implemented and tested on a dataset collected during the daily use of the phone. The results have demonstrated the effectiveness and usability of the STDI in real-world mobile authentication.

REFERENCES

[1] A. J. Aviv, K. Gibson, E. Mossop, M. Blaze, and J. M. Smith, "Smudge attacks on smartphone touch screens," in Proc. 4th USENIX Conf. Offensive Technol., Washington, DC, USA, 2010, pp. 1–7.
[2] C. Schneider, N. Esau, L. Kleinjohann, and B. Kleinjohann, "Feature based face localization and recognition on mobile devices," in Proc. 9th Int. Conf. Control, Autom., Robot. Vis., Dec. 2006, pp. 1–6.
[3] M. Baloul, E. Cherrier, and C. Rosenberger, "Challenge-based speaker recognition for mobile authentication," in Proc. Int. Conf. Biometrics Special Interest Group, Darmstadt, Germany, Sep. 2012, pp. 1–7.
[4] T. Feng, X. Zhao, and W. Shi, "Investigating mobile device picking-up motion as a novel biometric modality," in Proc. IEEE Int. Conf. Biometrics, Theory, Appl. Syst., Sep./Oct. 2013, pp. 1–6.
[5] T. Feng et al., "Continuous mobile authentication using touchscreen gestures," in Proc. IEEE Conf. Technol. Homeland Secur., Boston, MA, USA, Nov. 2012, pp. 451–456.
[6] N. Sae-Bae, N. Memon, and K. Isbister, “Investigating multi-touch gestures as a novel biometric modality,” in Proc. IEEE 5th Int. Conf. Biometrics, Theory, Appl. Syst. (BTAS), Washington, DC, USA, Sep. 2012, pp. 156–161. [7] M. Frank, R. Biedert, E. Ma, I. Martinovic, and D. Song, “Touchalytics: On the applicability of touchscreen input as a behavioral biometric for continuous authentication,” IEEE Trans. Inf. Forensics Security, vol. 8, no. 1, pp. 136–148, Jan. 2013. [8] X. Zhao, T. Feng, and W. Shi, “Continuous mobile authentication using a novel graphic touch gesture feature,” in Proc. IEEE Int. Conf. Biometrics, Theory, Appl. Syst., Washington, DC, USA, Sep./Oct. 2013, pp. 1–6. [9] A. Bragdon, E. Nelson, Y. Li, and K. Hinckley, “Experimental analysis of touch-screen gesture designs in mobile environments,” in Proc. SIGCHI Conf. Human Factors Comput. Syst., Vancouver, BC, Canada, 2011, pp. 403–412. [10] S. Li and A. C. Kot, “Fingerprint combination for privacy protection,” IEEE Trans. Inf. Forensics Security, vol. 8, no. 2, pp. 350–360, Feb. 2013. [11] X. Zhao, S. K. Shah, and I. A. Kakadiaris, “Illumination normalization using self-lighting ratios for 3d2d face recognition,” in Proc. Eur. Conf. Comput. Vis., Workshops Demonstrations, 2012, pp. 220–229. [12] A. Messerman, T. Mustafic, S. A. Camtepe, and S. Albayrak, “A generic framework and runtime environment for development and evaluation of behavioral biometrics solutions,” in Proc. 10th Int. Conf. Intell. Syst. Design Appl., Cairo, Egypt, Nov./Dec. 2010, pp. 136–141. [13] D. Gafurov and E. Snekkkenes, “Arm swing as a weak biometric for unobtrusive user authentication,” in Proc. Int. Conf. Intell. Inf. Hiding Multimedia Signal Process., Harbin, China, Aug. 2008, pp. 1080–1087. [14] M. Faundez-Zanuy, “On-line signature recognition based on VQ-DTW,” Pattern Recognit., vol. 40, no. 3, pp. 981–992, Mar. 2007. [15] A. Kholmatov and B. Yanikoglu, “Biometric authentication using online signatures,” in Proc. Int. Symp. Comput. Inf. Sci., Antalya, Turkey, 2004, pp. 373–380. [16] Y. Yamazaki, Y. Mizutani, and N. Komatsu, “Extraction of personal features from stroke shape, writing pressure and pen inclination in ordinary characters,” in Proc. 5th Int. Conf. Document Anal. Recognit., Bangalore, India, Sep. 1999, pp. 426–429. [17] X. Cao and S. Zhai, “Modeling human performance of pen stroke gestures,” in Proc. SIGCHI Conf. Human Factors Comput. Syst., San Jose, CA, USA, 2007, pp. 1495–1504. [18] M. A. Ferrer, A. Morales, and J. F. Vargas, “Off-line signature verification using local patterns,” in Proc. 2nd Nat. Conf. Telecommun., Arequipa, Peru, May 2011, pp. 1–6. [19] D. Impedovo and G. Pirlo, “Automatic signature verification: The state of the art,” IEEE Trans. Syst., Man, Cybern. C, Appl. Rev., vol. 38, no. 5, pp. 609–635, Sep. 2008. [20] W. Shi, J. Yang, Y. Jiang, F. Yang, and Y. Xiong, “Senguard: Passive user identification on smartphones using multiple sensors,” in Proc. 7th Int. Conf. Wireless Mobile Comput., Netw. Commun., Cagliari, Italy, Oct. 2011, pp. 141–148. [21] A. D. Luca, A. Hang, F. Brudy, C. Lindner, and H. Hussmann, “Touch me once and i know it’s you!: Implicit authentication based on touch screen patterns,” in Proc. SIGCHI Conf. Human Factors Comput. Syst., Austin, TX, USA, 2012, pp. 987–996. [22] Z. Zeng, M. Pantic, G. I. Roisman, and T. S. Huang, “A survey of affect recognition methods: Audio, visual, and spontaneous expressions,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 31, no. 1, pp. 39–58, Jan. 2009. [23] T. 
Hasan and J. H. L. Hansen, “A study on universal background model training in speaker verification,” IEEE Trans. Audio, Speech, Lang. Process., vol. 19, no. 7, pp. 1890–1899, Sep. 2011. [24] X. Zhao, E. Dellandréa, J. Zou, and L. Chen, “A unified probabilistic framework for automatic 3D facial expression analysis based on a Bayesian belief inference and statistical feature models,” Image Vis. Comput., vol. 31, no. 3, pp. 231–245, 2013. [25] G. Toderici et al., “Bidirectional relighting for 3D-aided 2D face recognition,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., San Francisco, CA, USA, Jun. 2010, pp. 2721–2728. [26] R. V. Yampolskiy and V. Govindaraju, “Behavioural biometrics: A survey and classification,” Int. J. Biometrics, vol. 1, no. 1, pp. 81–113, Jun. 2008. [27] J. Kittler, M. Hatef, R. P. W. Duin, and J. Matas, “On combining classifiers,” IEEE Trans. Pattern Anal. Mach. Learn., vol. 20, no. 3, pp. 226–239, Mar. 1998.
Xi Zhao received the Ph.D. (Hons.) degree in computer science from the Ecole Centrale de Lyon, Ecully, France, in 2010. After graduation, he conducted research in the fields of biometrics, face analysis, and pattern recognition, as a Post-Doctoral Fellow with the Department of Computer Science, University of Houston, Houston, TX, USA. He is currently a Research Assistant Professor with the Department of Computer Science, University of Houston. His current research interests include biometrics, mobile computing, computer vision, and assistive technology. Dr. Zhao currently serves as a reviewer for the IEEE Transactions on Image Processing, the IEEE Transactions on Systems, Man, and Cybernetics, Part B, the IEEE Transactions on Information Forensics and Security, Image and Vision Computing, the IEEE International Conference on Automatic Face and Gesture Recognition, the IEEE International Conference on Advanced Video and Signal-Based Surveillance, and the International Conference on Pattern Recognition. He also served on the Program Committee of the Workshop on 3-D Face Biometrics in 2013 and as the Registration Chair of the International Conference on Biometrics: Theory, Applications and Systems in 2013.
Weidong Shi received the Ph.D. degree in computer science from the Georgia Institute of Technology, Atlanta, GA, USA, where he was involved in research on computer architecture and computer systems. He was a Senior Research Staff Engineer with the Motorola Research Laboratory and the Nokia Research Center, and a Co-Founder of a technology startup. He is currently an Assistant Professor with the University of Houston, Houston, TX, USA. He was involved in designing multiple Nvidia platform products and was credited on a published Electronic Arts console game. He has authored or coauthored publications covering research problems in computer architecture, computer systems, multimedia/graphics systems, mobile computing, and computer security. He holds multiple issued and pending U.S. Patent and Trademark Office patents.
Ioannis A. Kakadiaris is currently a Hugh Roy and Lillie Cranz Cullen University Professor of Computer Science, Electrical and Computer Engineering, and Biomedical Engineering with the University of Houston (UH), Houston, TX, USA. He joined UH, in 1997, after a post-doctoral fellowship with the University of Pennsylvania, Philadelphia, PA, USA. He received the B.Sc. degree in physics from the University of Athens, Athens, Greece, the M.Sc. degree in computer science from Northeastern University, Boston, MA, USA, and the Ph.D. degree from the University of Pennsylvania. He is the Founder of the Computational Biomedicine Laboratory at UH. His research interests include multimodal biometrics, face recognition, computer vision, pattern recognition, biomedical image analysis, and predictive analytics. He was a recipient of a number of awards, including the NSF Early CAREER Development Award, the Schlumberger Technical Foundation Award, the UH Computer Science Research Excellence Award, the UH Enron Teaching Excellence Award, and the James Muller Vulnerable Plaque Young Investigator Prize. His research has been featured on the Discovery Channel, National Public Radio, KPRC NBC News, KTRH ABC News, and KHOU CBS News. His selected professional service leadership positions include: the General Cochair of the 2013 Biometrics: Theory, Applications and Systems Conference, General Cochair of the 2014 SPIE Biometric and Surveillance Technology for Human and Activity Identification, Program Cochair of the 2015 International Conference on Automatic Face and Gesture Recognition Conference, and Vice President for Technical Activities of the IEEE Biometrics Council.
Tao Feng received the B.S. degree in software engineering from the Huazhong University of Science and Technology, Wuhan, China, and the M.Sc. degree in computer science from the Chinese Academy of Sciences, Institute of Automation, Beijing, China, with a focus on knowledge system and knowledge visualization. He is currently pursuing the Ph.D. degree with the Department of Computer Science, University of Houston, Houston, TX, USA. His current research interests include mobile computing, artificial intelligence, human–computer interaction, mobile security, and machine learning.