Proceedings of 2nd National Conference on Challenges & Opportunities in Information Technology (COIT-2008). RIMT-IET, Mandi Gobindgarh. March 29, 2008.
Masquerade Detection using Typing Pattern
Anand Gupta, Anupriya Asthana, Nidhi Gupta
Department of Information Technology, Netaji Subhas Institute of Technology
Abstract: We present an algorithm that detects a masquerader in a pre-event scenario, i.e. before access is granted, on the basis of the user's typing pattern. Three fundamental parameters of the typing pattern are defined: typing speed, accuracy and inter-character delays. These behavioral traits are used to distinguish an authentic user from a masquerader. Samples of the authentic user's typing are pre-stored in a database on the server side. When the user enters his login information, the details of his typing pattern are sent to the server, which compares them with the pre-stored entries in the database. If a match occurs, the server validates the login and sends the validation to the client in the form of a cookie. Further access is provided only if the validation cookie exists. The two-way communication between the client side and the server side is protected by encryption. The algorithm has been applied to real systems and an accuracy of up to 98% has been achieved.
Key Words: System Security, Pattern Recognition, Masquerade Detection, Typing Pattern
I. INTRODUCTION
A masquerader is a person who gains access to an individual's account and uses it for malicious purposes. A masquerader is not always an outsider who gains access over a public network; he may also be an unscrupulous employee seeking greater access. After logging in, the masquerader generally exploits any or all of the information and privileges granted to the victim. Masquerade detection is of paramount importance wherever unauthorized access to information can have drastic consequences, as in defense-related scenarios, or in business situations where the integrity and confidentiality of information must be maintained. Detecting masqueraders is therefore essential for securing personal information and confidential data, and for keeping restricted privileges out of the wrong hands. With increased exposure to computers, the average level of computing know-how has risen. Hackers have become more professional and more proficient at putting their finger on the technical loopholes that let them capture authorized passcodes. As masqueraders have grown more sophisticated, system security measures need to be intensified accordingly. Because the masquerader already has the passcode, and such access must be detected in a pre-event scenario, maintaining security is a challenging task. This calls for parameters in addition to the passcode to authenticate a login. One option is to incorporate the user's behavior pattern as a dependable parameter that provides a higher level of security. In this paper, we propose the incorporation of the keyboard-typing pattern of the user as a parameter for filtering
out masqueraders. Most authentication processes today take place online or over a local/private network, so any new algorithm must be implemented with both the server and the client in mind. HTTP cookies, sometimes known as web cookies or just cookies, are parcels of text sent by a server to a web browser and then sent back unchanged by the browser each time it accesses that server. HTTP cookies are used for authenticating, tracking and maintaining specific information about users, and for giving the user a customized experience. A cookie is valid only for a limited duration, known as its expiry time. If the validation cookie does not exist, the user must log in again. If a masquerade attack has occurred, the validation cookie is not generated and hence no access is granted. Keeping in mind the interaction between the server and the client, our aim is to detect masqueraders with the least possible overhead. We propose an algorithm that aims to reduce computation time and improve the masquerader detection rate with respect to previous techniques.
A. Related Work
In 2006, Garg et al. [1] collected user behavior data and constructed feature vectors from user information such as mouse speed, distance, angles and number of clicks during a user session. They modeled user identification and masquerade detection as a binary classification problem and used a Support Vector Machine (SVM) to learn and classify these feature vectors. They covered only characteristics based on users' mouse movements, but user characteristics also depend on keyboard behavior; masquerade detection is therefore not complete without studying keyboard patterns. In 2006, Bhukya et al. [2] presented a formulation to compute the effectiveness of masquerade detection and also presented an approach to masquerade detection using Hidden Markov Models (HMM).
Their experiments used the well-known Schonlau dataset (SEA). In 2007, Seo and Cha [3] performed an empirical study investigating the effectiveness of SVM and sequence-based kernel methods. The sequence-based kernels performed slightly better than the generic RBF kernel at the same false-alarm rate, and a composition of the two kernel methods showed that the false-alarm rate could be reduced further. This previous work required constant monitoring of the user to generate a profile, because those algorithms needed to refer to the user's entire profile to detect a masquerade attempt. Maintaining a user profile and monitoring the user constantly consume considerable resources, both CPU time and database storage. Moreover, these approaches do not detect a masquerader in a pre-event scenario or at other key access points.
B. Our Work
In this paper, we concentrate on the user's keyboard characteristics, which involves keeping track of the typing pattern of a few keystrokes: those of the user's passcode. This algorithm improves on previous designs because the filtering takes place in a pre-event scenario, i.e. before the masquerader gains access to the system. In contrast to prior algorithms, the process may start and end within a fraction of a second (this may vary marginally with the processing speed of the system and the network connection), and masqueraders are filtered out at the first access level itself. The parameters used by this method are recorded only at login and do not require constant monitoring, so resource consumption is lower for both computation and storage. These factors greatly increase the efficiency of the algorithm, which requires less computation time than previous work in this field. Its main advantage is the high accuracy with which masquerade attacks are detected and filtered out. Since the algorithm has been devised for logging in over any kind of network where server and client interaction is required, it can be applied to many website logins.
In this algorithm, we assume that the user types in the passcode every time he/she requires access, i.e. the passcode is not stored in the browser. The user also may not copy and paste the passcode into the passcode field. We further assume that no other program (such as a key logger) is running in the background and capturing the user's typing pattern.
Before going further, we define certain terms that occur frequently in the text that follows:
• False positive: Occurs when the system classifies an action as anomalous (a possible intrusion) when it is a legitimate action.
• False negative: Occurs when an actual intrusive action has occurred but the system allows it to pass as non-intrusive behavior.
• Hit Rate: Ratio of the number of successfully detected masqueraders to the total number of masquerade attacks.
• Total time: The time from the pressing of the first key to the release of the last key of the passcode. The time between typing the last character of the password and pressing the Enter key is not counted.
• Threshold number: Minimum number of previously stored samples required for the server to start the validation process.
• Ratio of Correctly Timed Characters (RCTC): The ratio of the number of characters whose ICDs (Inter-Character Delays, explained below) are within range to the total number of characters in the passcode.
• Criterion Factor (CF): Minimum value that the RCTC must reach for the authentication process to complete.
The following are the parameters that we record and work upon for pattern detection:
• Typing speed: The number of characters (not words) typed per unit of time, hereafter referred to as 'speed'. It is an important parameter for preliminary filtering of masqueraders.
• Accuracy: The number of correct characters typed per unit of time. It is calculated by subtracting the number of backspaces pressed by the user from the total number of characters typed (giving the number of correct characters typed) and dividing the result by the total time taken. Each user's accuracy level is also characteristic: a user who makes mistakes generally makes them at the same places and the same number of times.
• Inter-character delay (ICD): The time that elapses between the release of one key and the pressing of the next key, hereafter referred to as ICD. While typing, every person takes a certain, almost fixed, amount of time between two characters. This depends on the person's proficiency on the keyboard and on the distance between the two keys. For example, a person used to typing a certain string, say "abc", takes less time to type it than a novice. The same holds for the authentic user and the masquerader respectively, so in most cases the masquerader cannot match the user's ICDs while entering the passcode.
The rest of the paper is organized as follows. Section 2 describes how the system works and how it operates on the parameters. Section 3 presents further analysis, with graphs depicting the results obtained and what can be done to improve them. Section 4 describes the working of the system. Section 5 concludes the paper, and Section 6 discusses future scope.
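The typing parameters and the RCTC test defined above can be made concrete with a short sketch. The Python code below is illustrative only and is not the authors' implementation. The KeyEvent record and all function names are hypothetical; timestamps are assumed to be in seconds; the backspace label "Backspace" is an assumption (it mirrors the DOM KeyboardEvent.key value); and an ICD is assumed to be "within range" when it lies between the minimum and maximum of the corresponding ICDs in the stored samples, as the Authenticator's range is described in Section 2. Only the ICD criterion is checked; the speed and accuracy range checks are omitted.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class KeyEvent:
    """One keystroke (hypothetical record): key label plus press/release times in seconds."""
    key: str
    press: float
    release: float


def typing_parameters(events: Sequence[KeyEvent]) -> Tuple[float, float, List[float]]:
    """Return (speed, accuracy, icds) for one passcode entry.

    `events` holds the passcode keystrokes in order, including backspaces
    but excluding the final Enter key, which is not counted in total time.
    """
    if not events:
        raise ValueError("no keystrokes recorded")
    # Total time: press of the first key to release of the last key.
    total_time = events[-1].release - events[0].press
    if total_time <= 0:
        raise ValueError("total time must be positive")
    total_typed = len(events)  # includes backspaces
    backspaces = sum(1 for e in events if e.key == "Backspace")
    speed = total_typed / total_time
    # Correct characters = total typed minus backspaces, as the text defines it.
    accuracy = (total_typed - backspaces) / total_time
    # ICD: press of the next key minus release of the previous key.
    icds = [nxt.press - prev.release for prev, nxt in zip(events, events[1:])]
    return speed, accuracy, icds


def icd_ranges(previous: Sequence[Sequence[float]]) -> List[Tuple[float, float]]:
    """Per-position (min, max) ICD over the stored successful samples."""
    return [(min(column), max(column)) for column in zip(*previous)]


def rctc(icds: Sequence[float], ranges: Sequence[Tuple[float, float]],
         passcode_length: int) -> float:
    """Ratio of Correctly Timed Characters: in-range ICDs / passcode length."""
    if passcode_length <= 0:
        raise ValueError("passcode length must be positive")
    in_range = sum(1 for d, (lo, hi) in zip(icds, ranges) if lo <= d <= hi)
    return in_range / passcode_length


def meets_criterion(rctc_value: float, criterion_factor: float) -> bool:
    """CF is in percentage points, so the RCTC is scaled by 100 first."""
    return rctc_value * 100 >= criterion_factor


def authenticate(icds: Sequence[float], previous: Sequence[Sequence[float]],
                 passcode_length: int, threshold: int,
                 criterion_factor: float) -> bool:
    """ICD-only check; accepts unconditionally while fewer than `threshold`
    samples are stored (the learning phase described in the text)."""
    if len(previous) < threshold:
        return True
    return meets_criterion(rctc(icds, icd_ranges(previous), passcode_length),
                           criterion_factor)
```

With a 10-character passcode and 7 in-range ICDs, rctc returns 0.7, and meets_criterion(0.7, 80) is False, which matches the Criterion Factor example given in Section 2. The accuracy formula is applied literally (total keystrokes minus backspaces); a different notion of "correct characters" would change only that one line.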
II. SYSTEM DESIGN
Figure 1. Framework for masquerade detection by the algorithm in a pre-event scenario.
A. Web Page
The web page is the user's portal for logging in to the desired website. The user logs in to his account by requesting a web page from the server. Once the username and passcode are entered, the passcode and the typing parameters are stored in the temporary storage and sent to the server for matching.
Passcode Comparator. The passcode comparator lies on the server side and compares the entered passcode with the stored one.
Sample Analyzer. After the passcode comparator finds a correct match, the sample analyzer collects the key timings and computes the three parameters as follows:
$$\text{Speed} = \frac{\text{Total number of characters typed (including backspaces)}}{\text{Total time}}$$
$$\text{Accuracy} = \frac{\text{Number of correct characters typed (Total} - \text{backspaces)}}{\text{Total time}}$$
$$\text{ICD} = (\text{system time when the second key is pressed}) - (\text{system time when the first key is released})$$
Temporary Storage. Once the Sample Analyzer has calculated the values, they are stored in the temporary storage under the heads Speed, Accuracy and the ICD of every pair of consecutive characters at the time of login. The values remain in temporary storage until the authentication process completes. The temporary storage is written only by the Sample Analyzer and read by the Authenticator. It carries the values from the client to the server over a communication channel; to make this interaction secure, encryption techniques such as SSL, RSA or SSH need to be employed.
Previous Samples. The parameters of the previous successful
logins are stored on the server side for the comparison needed to allow access. The values are stored under the same heads as in the Temporary Storage, and all the values in this table are accessed by the Authenticator for each login attempt. If the number of previous samples is less than the threshold number, the system stores the parameters unconditionally, on the assumption that these login attempts are by the authentic user. The system learns by storing the parameters of successful attempts only when their speed and accuracy lie within range.
Criterion Generator. The Criterion Generator lies on the server side and generates the Criterion Factor, which gives the minimum RCTC required for authorization. Since the Criterion Factor is expressed in percentage points, the RCTC is multiplied by one hundred before it is compared with the Criterion Factor. For example, suppose the criterion is 80% and the correct passcode has 10 characters. If the RCTC is 0.7 (i.e. 70%), only 7 of the 10 characters had ICDs within their respective ranges. Since the Criterion Factor is 80%, this falls short of the required value and access is denied. Had the RCTC been 0.8 or more, access would have been granted.
Authenticator. The authenticator resides on the server side. We now define the range of each of the three parameters used by the authenticator. The range of speed, accuracy and ICD (of every pair of consecutive characters) extends from the minimum to the maximum of that parameter over all the previous samples. Therefore, Min-all