Enhanced User Authentication through Typing Biometrics with Artificial Neural Networks and K-Nearest Neighbor Algorithm
Fadhli Wong Mohd Hasan Wong, Ainil Sufreena Mohd Supian and Ahmad Faris Ismail
Faculty of Engineering, International Islamic University Malaysia, Jalan Gombak 53100 Kuala Lumpur, Malaysia.
Lai Weng Kin and Ong Cheng Soon
MIMOS Berhad Malaysia, Technology Park Malaysia, 57000 Kuala Lumpur, Malaysia.
Abstract The emergence of global network access has increased the chances of malicious attack and intrusion. Password authentication is the most commonly used safeguard against these intrusions. Common as it is, the security it provides has always been questionable. This gives rise to the need for a more secure and reliable authentication method for accessing computer systems. The aim of this paper is to propose the design and development of a real-time enhanced password security system through typing biometrics. Typing biometrics deals with the analysis of the unique habitual typing rhythms of individuals. This paper describes the use of time latency between keystrokes to create typing patterns for individuals. Time latencies are extracted and classified accordingly, and are then used to recognize authentic users and reject imposters. The performance of both Artificial Neural Networks and K-Nearest Neighbors as possible classifiers for this purpose was studied.
1) Introduction Technological achievements over the past decade have resulted in improved network services, particularly in the areas of performance, reliability and availability. The increasing use of automated information systems and computers has simplified our lives significantly, while making us overwhelmingly dependent on computers and digital networks. However, the overwhelming interest in global accessibility brought about by these advances in technology has unveiled new threats to computer system security. Advanced safeguards against fraud and impersonation, as well as more foolproof methods against unauthorized access to computer resources and data, are now being sought. Accurate automatic personal identification is critical to a wide range of application
0-7803-7147-X/01/$10.00 © 2001 IEEE
domains such as access control, electronic commerce and welfare benefits disbursement. Passwords have remained the de facto security standard for physical access, despite the fact that they have been shown to be a fairly weak mechanism for authenticating users [1]. Secure methods for authenticating users have been a topic of research since the introduction of multi-user computing systems, but the principles behind such methods have been with society much longer. Computer security usually involves several components, which include physical security, identification, authentication and verification [2]. Accurate and automatic identification and authentication of users is a fundamental problem in network environments. Shared secrets such as personal identification numbers (PINs) or passwords, and key devices like smart cards, are just not good enough in some cases. What is needed is something that can verify that the user is physically the person he or she claims to be: biometrics. We therefore present one such security measure, using biometrics technology based on users' unique habitual typing rhythms. The motivation stems from observations that the neuro-physiological factors that make a written signature unique are also exhibited in a user's typing pattern [3]. Furthermore, both the National Science Foundation, Washington DC, and the National Institute of Standards and Technology, Gaithersburg, have conducted studies establishing that typing patterns are unique to each individual [4]. This paper presents our results for an authentication system based on typing biometrics.
2) Biometric Systems The term 'biometrics' refers, strictly speaking, to a science involving the statistical analysis of biological characteristics [5]. Here biometrics is used in the context of analyzing human characteristics for security purposes. In general, biometric authentication procedures use features that are unique to an individual, e.g. fingerprint or iris, to identify him or her. Similarly, typing biometrics uses the individual's unique typing pattern
to separate an authentic user from an imposter. The action of typing the password can be analyzed with respect to its physiological characteristics. The latency time between keystrokes, keystroke pressure, key displacement and key displacement duration are some of the quantifiable components [6,7]. Association of an identity with an individual is called person identification. The problem of resolving the identity of a person can be categorized into two fundamentally distinct types of problems with different inherent complexities: (i) verification and (ii) recognition. Verification (authentication) refers to the problem of confirming or denying a person's claimed identity (Am I who I claim I am?). Recognition (identification) refers to the problem of establishing a subject's identity (Who am I?). Reliable personal identification is critical in many daily transactions. A significant difference between biometrics-based person identification and conventional methods of identification is that the conventional methods do not involve any complex pattern recognition and hence almost always perform accurately as intended by their system designers. A typical biometrics-based system, on the other hand, may not be perfectly accurate and may commit either of two types of errors. A false acceptance (false positive or false match) refers to identifying an impostor as a genuine user. A false reject (false negative or false non-match) refers to rejecting a genuine user as an impostor. It is desirable to maintain both low false match and low false non-match rates to achieve a high overall accuracy of the system.
2.1) Concepts of Typing Biometrics
The typing biometrics system that we investigated is based on the current password or PIN system with an extra dimension of keystroke dynamics. With this technology, an intruder must not only know the correct password but must also be able to replicate the rate of typing and the time intervals between key presses to gain access to the information. Even if an unauthorized person is able to guess the correct password, they will most likely be unable to type it with the proper rhythm unless they have been able to hear and memorize the correct user's keystrokes. Typing biometrics based on time latency between keystrokes is the most economical biometric that can be implemented without compromising its security features. Proposed as early as 1990, this biometric usually does not require any additional hardware [2]. Furthermore, recognition based on typing rhythm is not intrusive, making it quite applicable to computer access security, as users will be typing at the keyboard anyway [3]. Since users are already accustomed to typing in their username/password pair or an account/PIN pair to authenticate themselves to computer systems, adding
keystroke pattern authentication comes at no significant additional cost to the end user. Previous research in the area of biometrics has shown keystroke dynamics as a real possibility to authenticate users [2,3,5,6].
3) Timing System Capturing the real-time keystrokes of users accurately is vital to the typing biometrics system. The basic foundation for this system is an accurate and reliable source of typing patterns in time, and the core of accurate time records is an accurate time-measuring device, either software or hardware. We looked into the possibility of using the Clock/Counter Time Chip (CTC) or the Real-Time Clock (RTC) to build the timing system. Design issues, including interrupt handling, specific registers and addresses, and multitasking in the Windows environment, complicate such a design. Alternatively, we looked at using the processor as the source of the timing system, which led us to the Intel Time Stamp Counter function supported by the Intel Pentium processor. We therefore developed a (software) timing system using the Intel Time Stamp Counter that captures the time difference between each pair of keys pressed on a normal Windows 95/98 keyboard [8].
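As an illustration of how counter readings become keystroke latencies, the following is a minimal Python sketch, not the timing system of [8]. It assumes that a counter value has already been recorded at each key press and that the counter frequency is known; the function name, the sample readings and the 400 MHz frequency are hypothetical.

```python
def latencies_from_ticks(ticks, ticks_per_second):
    """Convert counter readings taken at each key press into
    inter-key latencies in seconds (one latency per adjacent key pair)."""
    if ticks_per_second <= 0:
        raise ValueError("ticks_per_second must be positive")
    return [
        (later - earlier) / ticks_per_second
        for earlier, later in zip(ticks, ticks[1:])
    ]


# Hypothetical readings for a four-character password on a 400 MHz counter.
example_ticks = [0, 80_000_000, 120_000_000, 220_000_000]
example_latencies = latencies_from_ticks(example_ticks, 400_000_000)
print(example_latencies)
```

A password of k characters yields k - 1 latencies, which together form one typing pattern.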
4) Experimental Setup There were four parts to the experiment:
1. Interface for Capturing Time
2. Authorized User Database Construction
3. Authorized User Authentication
4. Imposter Rejection
An interface for capturing keystroke timings was developed using Visual C++. The interface sits on top of the timing system described in the previous section. Ten users were selected to build their own databases of typing patterns, using passwords of their own choosing. Data was collected on only two dedicated PCs in a highly supervised and controlled environment. Users were given ample time to get used to the typing rhythm of their selected password before the database was built. Data was collected continuously over a period of one month. Once the database had been built, the system was opened to public users, who attempted unauthorized access. In total, 100 unauthorized users' signatures were collected.
4.1) Data Extraction The collected data were extracted according to the features needed for the experiment. A simple box plot algorithm was used as a graphical display that simultaneously describes several important features of a data set, such as center, spread, departure from symmetry, and observations ("outliers") that lie unusually far from the bulk of the data [9]. The extracted features were then normalized using the standard normal bell curve algorithm [9]. Data lying outside the curve were converted to the selected limit, or cut-off point, at the boundary of the curve. This simple preprocessing filters unwanted data out of the collected data.
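The cut-off step described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the preprocessing actually used in the experiment: the paper does not give its cut-off points, so the sketch uses the common box plot convention of fences at 1.5 times the interquartile range, and the function name and sample latencies are hypothetical.

```python
import statistics


def clip_to_boxplot_fences(values, whisker=1.5):
    """Replace values beyond the box plot fences with the fence value.

    Assumption: fences sit at Q1 - whisker*IQR and Q3 + whisker*IQR,
    the usual box plot convention; the paper does not state its limits.
    """
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - whisker * iqr, q3 + whisker * iqr
    return [min(max(v, low), high) for v in values]


# Hypothetical latencies (seconds) for one key pair; the last one is an outlier.
sample = [0.10, 0.12, 0.11, 0.13, 0.12, 0.90]
clipped = clip_to_boxplot_fences(sample)
print(clipped)
```

Clipping rather than discarding keeps every pattern the same length, which both classifiers require.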
5) Artificial Neural Networks (Multiple Layer Perceptron) The MLP is a subset of Artificial Neural Networks (ANN), which are modeled on the biological brain to imitate its processing power [10]. This is schematically represented in Figure 1.

Figure 1: Model of Neuron in an ANN

During training, the weights between the input and hidden layers and between the hidden and output layers were modified using the Hebbian Learning Rule. The network was trained until the weights that correspond to the real output were below a threshold of 0.01. Once the network was trained, new patterns were fed into the system and compared to decide whether they were similar to any of the patterns that had been used to train the network. The input data to the MLP system lie between 0 and 1. The conversion of the patterns from timings to the required format is shown schematically in Figure 2. Statistical analysis showed that 99% of the pattern timings lay between 0.05 and 0.4 sec. This range was therefore used for the conversion, and data that fell below 0.05 sec or above 0.4 sec were converted to 0s and 1s respectively.

6) K-Nearest Neighbor The nearest neighbor rule is another algorithm that can be used to classify objects [11]. The rule relies on a training set of objects with known class membership to make decisions on the membership of unknown objects. It classifies an unknown object to the class of its nearest neighbor in the measurement space using, most commonly, the Euclidean metric (Figure 3).

Figure 3: The nearest neighbor rule

Euclidean distance is given by

\[ D = \sqrt{(x_0 - x_1)^2 + (x_1 - x_2)^2 + (x_2 - x_3)^2 + \dots + (x_{n-1} - x_n)^2} \tag{1} \]

where n is the number of dimensions. In terms of efficiency, training is trivial: all patterns are simply stored, which may require a lot of storage. Classification may also be time consuming, since all stored patterns must be compared. Nearest neighbor classification is prone to errors due to rogue patterns [11]. A rogue pattern is a mislabeled pattern. Error here is defined as the difference between the calculated or observed value and the true value. The term error often refers to the absolute error. Relative error is the absolute error divided by the true value of the quantity.

Absolute error (e):

\[ e = X - x \tag{2} \]

Relative error (e_r):

\[ e_r = \frac{X - x}{X} \tag{3} \]

where x denotes the test data and X denotes the training data.

To eliminate the problem caused by rogue patterns, we may use not just the nearest neighbor but a group of them [12]. Using N neighbors, we take a majority vote of their classes to give a classification. As N gets larger, this method approaches the optimal decision. The method classifies an unknown object to the class most heavily represented among its N nearest neighbors.
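The majority vote among the N nearest neighbors can be sketched in Python as follows. This is an illustrative sketch, not the classifier used in the experiment. It assumes the standard coordinate-wise Euclidean distance between a test pattern and each stored pattern; the function names, labels and latency values are hypothetical.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Coordinate-wise Euclidean distance between two equal-length patterns."""
    if len(a) != len(b):
        raise ValueError("patterns must have the same number of dimensions")
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def knn_classify(pattern, training_set, n_neighbors=3):
    """Return the label most heavily represented among the N nearest patterns."""
    ranked = sorted(training_set, key=lambda item: euclidean(pattern, item[0]))
    votes = Counter(label for _features, label in ranked[:n_neighbors])
    return votes.most_common(1)[0][0]


# Hypothetical stored patterns; the last entry is a rogue (mislabeled) pattern.
training = [
    ([0.20, 0.30, 0.25], "user1"),
    ([0.22, 0.28, 0.27], "user1"),
    ([0.21, 0.31, 0.24], "user1"),
    ([0.60, 0.10, 0.70], "user2"),
    ([0.58, 0.12, 0.72], "user2"),
    ([0.21, 0.29, 0.26], "user2"),  # rogue: looks like user1, labeled user2
]
probe = [0.21, 0.29, 0.26]
print(knn_classify(probe, training, n_neighbors=1))
print(knn_classify(probe, training, n_neighbors=3))
```

With a single neighbor the rogue pattern decides the outcome, whereas with three neighbors the two correctly labeled patterns outvote it.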
Figure 2: A schematic representation converting individual timings into range (0 - 1) (horizontal axis in seconds, marked at 0.05 and 0.4)
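The conversion of timings into the range 0 to 1 described in Section 5 can be sketched as below. This is a minimal sketch under one assumption: the text fixes only the end points (timings below 0.05 sec become 0 and timings above 0.4 sec become 1), so the linear mapping in between, like the function name, is an assumption made for illustration.

```python
LOW_SECONDS = 0.05   # timings at or below this map to 0
HIGH_SECONDS = 0.4   # timings at or above this map to 1


def scale_latency(seconds, low=LOW_SECONDS, high=HIGH_SECONDS):
    """Map a keystroke latency in seconds to the range [0, 1].

    Assumption: values between the two limits are scaled linearly.
    """
    if seconds <= low:
        return 0.0
    if seconds >= high:
        return 1.0
    return (seconds - low) / (high - low)


# Hypothetical latencies: too fast, mid-range, too slow.
scaled = [scale_latency(t) for t in (0.01, 0.225, 0.9)]
print(scaled)
```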
7) Results
7.1) KNN
The results of the experiment are summarized in the following graphs. Figure 4 depicts the typing patterns of User 1 logging in to his own account over 10 attempts. The x-axis of the graph shows the interval between keystrokes. The graph shows that the typing patterns are similar and almost identical. Figure 5, in contrast, shows that when 10 intruders attempted to access User 1's account, the typing patterns generated were far from similar. As a result, by choosing a certain threshold, in this case 0.5, all 10 imposters are rejected, as shown in Figure 6.
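A threshold decision of this kind might be sketched as follows. The sketch is hypothetical: it assumes the threshold is applied to the Euclidean distance between the login attempt and the nearest stored pattern of the claimed user, which the text does not spell out, and the function names and pattern values are made up for illustration.

```python
import math


def nearest_distance(pattern, reference_patterns):
    """Distance from a login attempt to the closest stored pattern."""
    return min(math.dist(pattern, ref) for ref in reference_patterns)


def verify(pattern, reference_patterns, threshold=0.5):
    """Accept the attempt only if it lies within the threshold distance.

    Assumption: the 0.5 threshold is compared against Euclidean distance.
    """
    return nearest_distance(pattern, reference_patterns) <= threshold


# Hypothetical stored patterns for one user, plus two login attempts.
references = [[0.20, 0.30, 0.25], [0.22, 0.28, 0.27]]
genuine_attempt = [0.21, 0.29, 0.26]
intruder_attempt = [0.90, 0.80, 0.10]
print(verify(genuine_attempt, references), verify(intruder_attempt, references))
```

Under this assumption, raising the threshold admits more genuine attempts but also more intruders.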
Figure 4: The Typing Patterns of User 1 (panel title: User 1 Login Against User 1; x-axis: Interval Between Keys)

Figure 5: The typing patterns of 10 intruders (panel title: Intruder Login Against User 1; x-axis: Intervals Between Keys)

Figure 7: The result of legal user acceptance and false acceptance rate using KNN (panel title: Acceptance and False Acceptance Rate at Threshold Level of 0.5)
Figure 7 shows the acceptance rate of 10 authentic users logging in to their own accounts, and the false acceptance rate of 100 intruders logging in to the respective legal users' accounts. With the threshold level set to 0.5, the average user acceptance rate was 84.63% and the average imposter rejection rate was 98.97%.
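The relation between these averages and the error rates defined in Section 2 is simple arithmetic, sketched below in Python. A rejection rate of 98.97% for imposters corresponds to a false acceptance rate of 1.03%, and an acceptance rate of 84.63% for genuine users corresponds to a false reject rate of 15.37%. The function name is hypothetical.

```python
def complement_percent(rate_percent):
    """Return 100 minus a percentage, rounded to two decimal places."""
    if not 0.0 <= rate_percent <= 100.0:
        raise ValueError("rate must be a percentage between 0 and 100")
    return round(100.0 - rate_percent, 2)


# Averages reported for KNN at a threshold level of 0.5.
false_acceptance_rate = complement_percent(98.97)  # from imposter rejection rate
false_reject_rate = complement_percent(84.63)      # from genuine acceptance rate
print(false_acceptance_rate, false_reject_rate)
```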
Figure 8: The result of legal user acceptance and false acceptance rate using ANN (x-axis: participants)
7.2) ANN - MLP
For the ANN analysis, we found that the false acceptance rate is higher, as shown in Figure 8. The ANN system was trained with the authorized users' typing patterns, using the same data as the KNN analysis. The number of samples used to build both databases is the same. The next section compares the performance of the two classifiers.
8) Summary and Conclusion Research to date has suggested that typing biometrics, which seeks to analyze the rhythm or behavioral pattern of a user at a keyboard, could be used as a means of personal identification [6]. In this paper we have reported results using KNN and ANN as the classifiers. The average authorized acceptance rate and false acceptance rate are shown in Figure 9: 84.63% and 1.03% for KNN, and 99% and 29% for ANN.

Figure 9: The average results of both KNN and ANN

The results show that the threshold setting for KNN greatly affects the acceptance rate and the false acceptance rate. At the level of 0.5, security was set very high: the system rejected imposters 98.97% of the time but allowed the authentic user to access the system only 84.63% of the time. Increasing the threshold value will increase the legal acceptance rate but, at the same time, will allow more imposters to break through into the system, as shown in Figure 6. Furthermore, KNN is very sensitive to unwanted noise and other abnormalities present in the data set. A better prefiltering method could be used to overcome this problem. ANN, by contrast, is highly adaptive and better able to tolerate noise, and attained a legal user acceptance rate of 99%. However, its false acceptance rate of 29% needs to be reduced further. With a larger training set for the ANN, higher accuracy can be achieved. Typing patterns are highly behavioral and are subject to mental conditions, e.g. tiredness, fatigue and stress, and to physical injury, which may lead to deviation from the very beginning. The selection of the password was left to the user, since the aim of the experiment was to simulate the real environment as closely as possible. With this liberty, some passwords had fewer characters than the rest. We suspect that password length has a direct effect on the performance of both KNN and ANN, which we will study in our future work. In general, both methods, namely K-Nearest Neighbor and the Multiple Layer Perceptron for Artificial Neural Networks, exhibit great potential for implementation in the system. A more thorough study of both methods is needed to further improve their performance.

9) Acknowledgements The authors wish to thank all the users who participated in our experiment. We wish to thank the Center for Engineering Excellence, International Islamic University Malaysia, for sponsoring one of our student authors to the 35th Asilomar Conference on Signals, Systems and Computers, California, USA. We also wish to thank the organizing committee of the conference for granting our student author a partial travel fund.

10) References
1. Malik, W., "Does Your Password Policy Reduce Enterprise Security?", Research Note, Decision Framework, Gartner Group, 1 March 2000.
2. Joyce, R. and Gupta, G., "Identity Authentication Based on Keystroke Latencies", Communications of the ACM, 33, February 1990.
3. Monrose, F., Reiter, M.K. and Wetzel, S., "Password Hardening Based on Keystroke Dynamics", 6th ACM Conf. on Computer and Comm. Security, November 1999.
4. Abdullah, N.N., "User Authentication via Neural Networks", Faculty Computer Science & IT, UTM, Skudai, Johor.
5. Jain, A. and Pankanti, S., "Biometrics Systems: Anatomy of Performance", IEICE Trans. Fundamentals, E00-A, January 2001.
6. William, G. De Ru and Jan, H.P., "Enhanced Password Authentication through Fuzzy Logic", IEEE Expert Intelligent Systems, Nov/Dec 1997.
7. E.R. Tee and Selvanathan, N., "Pin Signature Verification Using Wavelet Transform", Malaysia Journal of Computer Science, 9-2, pp 71-78, December 1996.
8. Wong, Fadhli M.H., C.S. Ong and W.K. Lai, "A High Resolution and Accurate Pentium Based Timer", Proceedings 1st National Real Time Technology and Application Symposium, Malaysia, pp 29-34, October 2000.
9. Montgomery, D.C. and Runger, G.C., "Applied Statistics and Probability for Engineers", 2nd Edition, John Wiley & Sons Inc., 1999.
10. S. Haykin, "Neural Networks: A Comprehensive Foundation", Prentice Hall, 1994.
11. Louis E. Stein & Frank L. Worley Jr, "Engineering Practice", 1986.
12. http://ips9.main.eng.hokudai.ac.jp/research/pattern/