FUZZY KEYSTROKE BIOMETRICS ON WEB SECURITY
Marino Tapiador and Juan A. Sigüenza
Grupo de Neurocomputación Biológica
Escuela Técnica Superior de Informática
Universidad Autónoma de Madrid
1. Introduction
Biometrics is an important means by which computers can measure user characteristics in order to achieve secure identification or verification [1, 2]. Biometrics can be static or dynamic, depending on what is measured: the user’s body or the user’s behavior. We have developed BioWeb, a system based on behavioral biometrics that provides authentication in a web application. The system uses keystroke dynamics [3, 4] to learn how a user behaves when typing a sequence of characters on the keyboard. Previous work in biometrics has shown keystroke dynamics to be a viable way to authenticate a user to a computer [3, 4]. In this paper we try to extend these results to computer networks, particularly the Internet, with all the problems this setting raises: the user’s typing rhythm must be measured on many different kinds of computers, platforms and clock frequencies.
2. Methodology
A. Technical implementation. BioWeb has two parts: the client side and the server side. The client side consists of DHTML pages and the browser, a standard web browser (Netscape). The system simulates a website whose access is controlled by a UserId/Password pair. BioWeb can register new users, simulate a login, and train users with their UserId/Password in order to create their behavioral templates. Keystroke capture is performed by a Java Applet that receives the user’s keystrokes and records the time interval between each pair of consecutive keystrokes, in order to obtain the user’s typing pattern: in other words, the user’s own keystroke dynamics. On the server side, we have developed the part of the application that acts as the pattern matching system. It is implemented as a standard CGI program written in C++. The pattern matching algorithm uses a simple classification system based on fuzzy sets, which associates keystrokes (time intervals) with user classes by means of a small set of fuzzy rules tuned after several experiments; for an extended explanation see de Ru and Eloff (1997) [3]. A fuzzy template is created for each user and is used to decide whether the person typing into the system is the original user, by comparing the sampled keystrokes with the fuzzy template.
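The two steps described above, turning key presses into inter-key intervals and matching a new sample against a per-user fuzzy template, can be illustrated with the Python sketch below. It is not BioWeb’s code (the original applet and CGI program were written in Java and C++), and it does not reproduce the rule base of de Ru and Eloff [3]. The triangular membership functions, the function names and the `spread`, `min_width` and `threshold` parameters are hypothetical choices made only for illustration.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


def inter_key_intervals(timestamps_ms):
    """Time (ms) between each pair of consecutive key presses."""
    return [later - earlier for earlier, later in zip(timestamps_ms, timestamps_ms[1:])]


@dataclass
class TriangularSet:
    """Hypothetical fuzzy set: membership rises linearly from `left` to 1 at
    `peak`, then falls back to 0 at `right`."""
    left: float
    peak: float
    right: float

    def membership(self, x: float) -> float:
        if x <= self.left or x >= self.right:
            return 0.0
        if x <= self.peak:
            return (x - self.left) / (self.peak - self.left)
        return (self.right - x) / (self.right - self.peak)


def build_template(training_samples, spread=2.0, min_width=20.0):
    """One fuzzy set per interval position, centred on the training mean.

    `spread` (multiples of the standard deviation) and `min_width` (ms)
    are hypothetical tuning parameters.
    """
    template = []
    for position in zip(*training_samples):
        centre = mean(position)
        width = max(spread * pstdev(position), min_width)
        template.append(TriangularSet(centre - width, centre, centre + width))
    return template


def match_score(template, sample):
    """Average membership of the sample's intervals in the template's sets."""
    return mean(fuzzy_set.membership(x) for fuzzy_set, x in zip(template, sample))


def is_genuine(template, sample, threshold=0.5):
    """Accept the typist when the average membership reaches `threshold`."""
    return match_score(template, sample) >= threshold


# Illustrative data: 7 intervals, as for an 8-character UserId such as "autonoma".
training = [
    [120, 100, 140, 110, 130, 90, 150],
    [130, 110, 150, 100, 120, 100, 140],
    [110, 105, 145, 115, 125, 95, 145],
]
genuine = [122, 104, 146, 108, 126, 96, 146]
impostor = [250, 60, 300, 40, 220, 180, 80]

template = build_template(training)
print(round(match_score(template, genuine), 2), is_genuine(template, genuine))
print(match_score(template, impostor), is_genuine(template, impostor))
```

Widening `min_width` or `spread` makes such a template more tolerant of a genuine user’s variability, at the cost of accepting more impostors.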
B. Experimental procedure. Nine volunteers (including the two authors), aged 20 to 52 years, took part in the experiment. Although their levels of expertise differed, all of them were used to typing on a computer. For a period of two months a Web server was set up specifically to collect data from the volunteers, allowing them to enter the required information in one or several sessions. All the volunteers connected to the web server through a LAN, using different versions of Netscape Communicator as their browser: eight used the Windows version (4.0 or later) and one used the OS/2 version (2.02). The machines were Pentium, Pentium MMX and Pentium II processors running at different clock frequencies. The volunteers were asked to enter each of six UserId/Password pairs at least fifteen times. The pairs are listed in Table I. Three of the pairs were words very common for the volunteers, some in Spanish and some in English. The other three pairs were chosen arbitrarily, with no semantic meaning in either Spanish or English. The time required to complete all the pairs in a single session ranged from 45 to 60 minutes.

Table I
UserId      Password
autonoma    internet
despacho    telefono
hardware    software
car&mil5    jim3eza$
qzr$tmp9    ojm&xdw2
alcn&ei3    ñzurb$1y

3. Results
The data collected from the volunteers were analyzed in several ways; due to lack of space, only partial results are presented in this paper. The first results are shown in Figures 1 and 2. In both cases the lines represent the seven normalized values corresponding to the measured intervals between two consecutively pressed keys. The cases shown correspond to the UserIds alcn&ei3 (Fig. 1) and autonoma (Fig. 2) for one of the volunteers (M.A.M.). These two UserIds were chosen because, a priori, they represented the most difficult (alcn&ei3) and the easiest (autonoma) to type. Both figures represent learning curves, and the data show the classical temporal evolution of a learning task for each interval from the 1st to the 16th/17th trial.
[Fig. 1. Learning curves of the seven normalized inter-key intervals (Y-axis) versus trial number 1-16 (X-axis). Volunteer: M.A.M.; UserId: alcn&ei3.]
All the curves show two different parts. The first part spans trials 1 to 8 and is fitted by a decreasing power function (darker continuous lines). The second part spans trials 9 to 17 and is fitted by a straight line parallel to the X-axis. The only remarkable difference between the two figures is the range of the time intervals (Y-axis): the curves are flatter for the easiest UserId, autonoma (range 0.5-1.0), than for the complex one, alcn&ei3 (range 0.2-1.0).
[Fig. 2. Learning curves of the seven normalized inter-key intervals (Y-axis) versus trial number 1-17 (X-axis). Volunteer: M.A.M.; UserId: autonoma.]
From these first results we can conclude the following:
1. Only a small number of trials, no more than ten in any case, is necessary before the learning procedure stabilizes, for any kind of UserId.
2. In spite of the a priori differences in difficulty between the two UserIds, the learning procedure stabilizes within the same range (the first 10 trials).
3. The main difference between the two UserIds is the wider range of time intervals for the complex UserId alcn&ei3. This reflects greater irregularity in the user’s behavior, as can be observed in the oscillations of the learning curves.
In a second step, the samples collected from each volunteer were divided into two portions. The first portion consisted of the first ten trials (training samples), which were averaged to obtain the user template. The second portion consisted of the last five trials (operation samples), which were used individually to test the previously obtained user templates. By this procedure we obtained, for each UserId/Password pair, nine different user templates (one for each volunteer). Each user template was tested against the data obtained from all the volunteers, eight of whom acted as impostors in each case. Figure 3 shows the general testing for the pair autonoma/internet. The graph plots the average distances between each user template (X-axis) and the individual operation samples of each user (Y-axis). A darker surface represents a shorter distance to the template, and a lighter surface a longer one.
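The evaluation protocol just described (first ten trials averaged into a template, last five trials used as operation samples, every template compared with every volunteer) can be sketched in Python as follows. The paper does not state which distance produced Figs. 3 and 4, so the mean absolute difference between interval vectors used here is an assumption, as are all the function names and the illustrative data.

```python
from statistics import mean


def split_trials(trials, n_train=10, n_operation=5):
    """First `n_train` trials are training samples; last `n_operation` are operation samples."""
    return trials[:n_train], trials[-n_operation:]


def average_template(training_samples):
    """Average each inter-key interval position over the training samples."""
    return [mean(position) for position in zip(*training_samples)]


def distance(template, sample):
    """Hypothetical metric: mean absolute difference per interval."""
    return mean(abs(t - s) for t, s in zip(template, sample))


def distance_matrix(trials_by_user):
    """Map (template_user, sample_user) to the average distance between that
    template and each operation sample of sample_user."""
    templates, operation = {}, {}
    for user, trials in trials_by_user.items():
        training, operation[user] = split_trials(trials)
        templates[user] = average_template(training)
    return {
        (t_user, s_user): mean(distance(templates[t_user], s) for s in operation[s_user])
        for t_user in templates
        for s_user in operation
    }


def best_template(matrix, sample_user):
    """Template closest to sample_user's operation samples (ideally their own)."""
    candidates = {t: d for (t, s), d in matrix.items() if s == sample_user}
    return min(candidates, key=candidates.get)


# Illustrative data: two volunteers, fifteen trials of three intervals each.
trials_by_user = {
    "V1": [[100 + (i % 3), 150, 120] for i in range(15)],
    "V2": [[180, 90 + (i % 2), 200] for i in range(15)],
}
matrix = distance_matrix(trials_by_user)
for user in trials_by_user:
    print(user, "->", best_template(matrix, user))
```

In this representation the genuine distances are the entries whose two keys coincide, which correspond to the main diagonal of Figs. 3 and 4.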
This representation gives an idea of the False Match Rate and False Non-Match Rate behavior of our system.
[Fig. 3. Average distance between each user template (X-axis, 1-9) and the operation samples of each user (Y-axis, S1-S9); darker means closer. UserId: autonoma; Password: internet.]
Fig. 3 shows that the best (shortest) distances are grouped along the main diagonal of the graph (genuine distances). This means that each user is correctly recognized by his or her template, while the remaining volunteers, acting as impostors in each case, are rejected (impostor distances). The same result was obtained with the pair hardware/software. However, the results for the more difficult pairs were quite different.
[Fig. 4. Average distance between each user template (X-axis, 1-9) and the operation samples of each user (Y-axis, S1-S9). UserId: alcn&ei3; Password: ñzurb$1y.]
Fig. 4 shows that, in spite of some grouping along the main diagonal, some impostors were also recognized as the correct user, which is a failure of the system. From these second results we can conclude that the easiest UserId/Password pairs give better access control, owing to the narrower dispersion observed in the corresponding two-dimensional graphs.
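The False Match Rate and False Non-Match Rate suggested by Figs. 3 and 4 can be estimated from genuine and impostor distances once an acceptance threshold is fixed. The Python sketch below assumes that a sample is accepted when its distance to a template is at or below the threshold; the function name `error_rates`, the thresholds and the example distances are hypothetical, not values measured in the experiment.

```python
def error_rates(distances, threshold):
    """Estimate (FMR, FNMR) from distances keyed by (template_user, sample_user).

    Pairs with template_user == sample_user are genuine comparisons; all
    others are impostor comparisons. A comparison is accepted when its
    distance is at or below `threshold`.
    """
    genuine = [d for (t, s), d in distances.items() if t == s]
    impostor = [d for (t, s), d in distances.items() if t != s]
    fnmr = sum(d > threshold for d in genuine) / len(genuine)    # genuine users rejected
    fmr = sum(d <= threshold for d in impostor) / len(impostor)  # impostors accepted
    return fmr, fnmr


# Hypothetical distances for three volunteers: V3's samples lie close to V1's template.
distances = {
    ("V1", "V1"): 10, ("V1", "V2"): 55, ("V1", "V3"): 18,
    ("V2", "V1"): 60, ("V2", "V2"): 12, ("V2", "V3"): 48,
    ("V3", "V1"): 50, ("V3", "V2"): 45, ("V3", "V3"): 14,
}
for threshold in (5, 15, 20):
    fmr, fnmr = error_rates(distances, threshold)
    print(f"threshold={threshold}: FMR={fmr:.3f} FNMR={fnmr:.3f}")
```

Moving the threshold trades one error for the other: when impostor distances overlap the genuine ones, as for the difficult pair in Fig. 4, no threshold makes both rates zero.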
4. Conclusions
Our work makes the following additions to the field of keystroke dynamics:
1. Keystroke dynamics can be used over the Internet.
2. Standard technologies (CGI, Applets) can be used.
3. Java multi-threading techniques can be used to measure typing patterns non-intrusively in an open network.
4. Only a few trials are necessary for the learning period.
5. Simple UserIds and Passwords work better than complex ones.
6. The technique has a low cost in terms of hardware and software.
7. Our system also protects against brute-force (trial-and-error) attacks.
5. References
[1] A. Jain, R. Bolle and S. Pankanti. “Introduction to Biometrics”. In “Biometrics: Personal Identification in Networked Society”, A. Jain, R. Bolle, S. Pankanti (Eds.), pp. 1-41. Kluwer Academic Publishers.
[2] J.D. Woodward. “Biometrics: Privacy’s foe or privacy’s friend?”. Proceedings of the IEEE (Special Issue on Automated Biometrics), 85: 1480-1492, 1997.
[3] W.G. de Ru and J. Eloff. “Enhanced Password Authentication through Fuzzy Logic”. IEEE Expert, Vol. 12, No. 6, Nov/Dec 1997, pp. 38-45.
[4] M.S. Obaidat and B. Sadoun. “Keystroke Dynamics Based Authentication”. In “Biometrics: Personal Identification in Networked Society”, A. Jain, R. Bolle, S. Pankanti (Eds.), pp. 213-229. Kluwer Academic Publishers.
6. Acknowledgements
This work has been partly supported by CICYT TEL97-0306.