Int. J. Security and Networks, Vol. x, No. x, xxxx
Masquerade detection on GUI-based Windows systems

Arshi Agrawal and Mark Stamp*
Department of Computer Science, San Jose State University, San Jose, CA 95192, USA
Email: [email protected]
Email: [email protected]
*Corresponding author

Abstract: A masquerader is an attacker who attempts to mimic the behaviour of a legitimate user so as to evade detection. Much previous research on masquerade detection has focused on analysis of command-line input in UNIX systems. However, these techniques may fail to detect attacks on modern graphical user interface (GUI)-based systems, where typical user activities include mouse movements in addition to keystrokes. We have developed an event logging tool for Windows systems, which has been used to collect a large, publicly available dataset suitable for testing masquerade detection strategies. Using this dataset, we employ hidden Markov model (HMM) analysis to compare the effectiveness of various detection strategies. Our results show that a linear combination of keyboard activity and mouse movements yields stronger results than relying on keyboard activity alone or mouse movements alone. These preliminary results can serve as a baseline for future masquerade detection research.

Keywords: masquerade detection; Windows; GUI; graphical user interface; HMM; hidden Markov models.

Reference to this paper should be made as follows: Agrawal, A. and Stamp, M. (xxxx) 'Masquerade detection on GUI-based Windows systems', Int. J. Security and Networks, Vol. x, No. x, pp.xxx–xxx.

Biographical notes: Arshi Agrawal is currently working as a Software Quality Engineer at EMC Corporation, where she is a key contributor on the security team of EMC's Data Domain division. She is involved in several projects, including Secure Multi-Tenancy, which enables EMC to lead in cloud security by helping to deal with critical security vulnerabilities such as Heartbleed. Her previous work experience was in the fields of web development, mobile application development and primary storage. She completed her Masters in Computer Science from San Jose State University, California, in 2013. She also holds an MBA in Marketing and Finance along with a Bachelors in Electronics and Communication Engineering, both from India.

Mark Stamp can neither confirm nor deny that he spent more than seven years as a National Security Agency cryptanalyst. However, he can confirm that he spent two years at a small Silicon Valley startup, developing a novel security-related product. For the past several years, he has been a Professor of Computer Science at San Jose State University, where he teaches courses in information security, publishes research articles, writes textbooks, and supervises a large number of Master's student projects.
1 Introduction

A masquerader is an attacker who impersonates a legitimate user in an effort to remain undetected. Masquerade detection is a difficult special case of the more general intrusion detection problem. In this paper, we consider masquerade detection in the context of graphical user interface (GUI)-based systems; specifically, Windows systems. We have collected a large dataset that includes keyboard and mouse activity. A considerable body of previous work has focused on masquerade detection based on Unix command-line activity;
Copyright © 20xx Inderscience Enterprises Ltd.
the survey (Bertacchini and Fierens, 2009) cites nearly 40 such masquerade detection papers published prior to 2009. However, Unix command-line based techniques may not be useful for modern GUI-based systems, where typical user activities include mouse movements as well as keystrokes. Command-line data alone cannot efficiently detect intrusion attacks on such systems (Bhukya et al., 2007). In this paper, we discuss an event logging tool that we have developed. This tool has been used to capture a substantial dataset of GUI-based user data on Windows systems. As evidence of the viability of this dataset, and to set a baseline for future research, we apply hidden Markov model
(HMM) masquerade detection techniques to our collected data. This paper is organised as follows. Section 2 covers relevant background information. Section 3 considers issues related to data capture in a GUI environment and provides details about the data capturing tool that we have developed. In Section 4 we give results for HMM-based masquerade detectors when applied to our GUI-based dataset. Finally, in Section 5 we provide our conclusions.
2 Background

In this section we briefly survey previous work in masquerade detection. Then we discuss HMMs, which form the basis of the scoring method used in this paper. Finally, we discuss ROC curves, which we use to quantify the effectiveness of our approach.
2.1 Masquerade detection

A considerable body of research in masquerade detection has been based on user-issued Unix commands (Bertacchini and Fierens, 2007, 2009; Bhukya et al., 2007; Erbracher et al., 2006; Huang and Stamp, 2007; Garg et al., 2006; Kothari, 2013; Schonlau et al., 2001). The Schonlau dataset (Schonlau, 1998) is the standard for comparing results in this field. This dataset contains Unix command-line data collected from 50 users, consisting of 5000 training commands and 10,000 test commands for each user. Both the training and test data are based on sequences of 100 consecutive commands. The 10,000 test commands per user contain both attack data and user data, with a key provided to distinguish between the two. The survey paper (Bertacchini and Fierens, 2009) discusses more than 40 research papers that use the Schonlau dataset to test various masquerade detection strategies. These detection techniques are categorised as follows.

• Information-theoretic: This approach is based on entropy analysis or compression of user-issued commands. To date, these results have not been impressive.

• Text mining: This is a data mining approach where repetitive command sequences are extracted from the training data and used for scoring.

• Hidden Markov model (HMM): This is the technique used in this paper and we discuss it in detail below. As with many other fields, HMMs have proven effective for masquerade detection and are a standard against which other techniques are often compared.

• Naïve Bayes: This technique is based on the frequency of commands, and does not take sequencing into consideration. Given its simplicity, naïve Bayes has proven reasonably effective and also serves as a standard against which proposed techniques can be compared.

• Sequence and bioinformatics: This approach relies on extracting sequence-related information.

• Support vector machine (SVM): This is a machine learning algorithm that maps data points to a high-dimensional space, for the purpose of making it easier to distinguish attacks from normal data.

• Other approaches: A variety of other approaches have been considered, including a hybrid Bayes, a one-step Markov approach, and a hybrid multi-step Markov technique (Bertacchini and Fierens, 2009). Also, several researchers have applied various combinations of techniques (Beauquier and Hu, 2007).
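To make the naïve Bayes approach concrete, the following sketch scores a command sequence by per-command frequencies estimated from training data. The commands and the smoothing constant here are illustrative assumptions, not values taken from the Schonlau experiments:

```python
import math
from collections import Counter

def train_frequencies(commands):
    """Estimate per-command counts from a user's training commands."""
    return Counter(commands), len(commands)

def score_sequence(counts, total, sequence, alpha=0.01):
    """Log likelihood per command under a frequency (naive Bayes) model.

    Additive smoothing (alpha) keeps unseen commands from scoring -inf.
    """
    vocab = len(counts) + 1  # +1 for the unseen-command bucket
    log_like = 0.0
    for cmd in sequence:
        p = (counts.get(cmd, 0) + alpha) / (total + alpha * vocab)
        log_like += math.log(p)
    return log_like / len(sequence)

# Toy example: a user who mostly runs ls and cd
counts, total = train_frequencies(["ls", "cd", "ls", "vi", "ls", "cd"])
self_score = score_sequence(counts, total, ["ls", "cd", "ls"])
attack_score = score_sequence(counts, total, ["gcc", "make", "gdb"])
# The user's own commands score higher (less negative) than the attack sequence
```

Because the model ignores command ordering entirely, any sequencing information is lost, which is the weakness the HMM approach below addresses.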
While the Schonlau dataset has certainly proven valuable, it does have some significant weaknesses. The dataset is very old, and it only includes command-line information, both of which limit its value with respect to modern GUI-based systems, such as Microsoft Windows. On modern systems, command-line activity is often virtually non-existent; mouse movements and keyboard activities would seem to be crucial for user profiling on such systems.

Next, we briefly consider a few examples of recent research in masquerade detection that is relevant to GUI-based systems. Algorithms that rely on a user's typing patterns have been proposed. For example, in Peacock et al. (2004), Monrose and Rubin (1997) and Shavlik et al. (2001), the authors consider characteristics such as typing speed, accuracy and inter-character delays. Analysis based on mouse activity, typing speed, and background processes is considered in Garg et al. (2006); this work uses SVMs for classification. There have also been efforts to model user behaviour by monitoring system calls and analysing audit logs and program execution traces (Feng et al., 2003); such information is available from call stack traces. A similar line of research was conducted in Imsand et al. (2009), using neural networks as the basis for classification. Additional related research can be found in Garg (2006).

This previous work has contributed techniques and results that are relevant to the problem of masquerade detection on GUI-based systems. However, such research is hampered by the lack of a standard, publicly available dataset, analogous to the Schonlau dataset for Unix command-line based research. The lack of such a dataset makes it difficult to directly compare research results. Our goals for this paper are twofold. First, we provide details on a dataset that we have collected, which is available from the authors. We believe this data could serve as the analogue of the Schonlau dataset for GUI-based systems.
Second, we analyse a hidden Markov model masquerade detection strategy using our dataset. These HMM-based results can serve as a baseline for future research involving our dataset.
2.2 Hidden Markov models

A hidden Markov model (HMM) is a machine learning technique based on a Markov chain. The underlying Markov chain is not directly observable, but it is probabilistically related to a sequence of observations. An HMM can be trained on a given sequence of observations. Once the parameters of the model have thus been
determined, the model can be used to score a sequence and thereby determine its similarity to the training sequence. The following notation is commonly used for HMMs (Stamp, 2012).

T = length of the observation sequence
N = number of states in the model
M = number of observation symbols
Q = distinct states of the Markov process
V = {0, 1, . . . , M − 1} = set of possible observations
X = (X_0, X_1, . . . , X_{T−1}) = hidden state sequence
A = state transition probability matrix
B = observation probability matrix
π = initial state distribution
O = (O_0, O_1, . . . , O_{T−1}) = observation sequence

An HMM is completely determined by A, B, and π, which are all row-stochastic matrices, that is, each row forms a discrete probability distribution. We denote an HMM by λ = (A, B, π). A generic HMM is illustrated in Figure 1; the part above the dashed line includes the 'hidden' Markov process. Note that the probability distributions in B relate the (hidden) states of the underlying Markov process to the observations.

Figure 1 Generic hidden Markov model (the hidden Markov process X_0 → X_1 → · · · → X_{T−1} is driven by A; each hidden state X_t emits the observation O_t according to B)
Source: Stamp (2012)

The usefulness of HMMs derives from the fact that there are efficient algorithms to solve each of the following three problems (Stamp, 2012).

Problem 1: Given the model λ = (A, B, π) and an observation sequence O, determine P(O | λ). That is, we can score a sequence to see how well it fits a given model.

Problem 2: Given the model λ = (A, B, π) and an observation sequence O, find an optimal hidden state sequence X. That is, we can 'uncover' the hidden part of the model.

Problem 3: Given an observation sequence O and N, determine a model λ = (A, B, π) that maximises the probability of O. That is, we can train a model to fit a given observation sequence. Note that the only assumption we make here is N, the number of states in the underlying (hidden) Markov process. This is the sense in which an HMM is an unsupervised machine learning technique.

In this research, we use GUI-based data to train HMMs for a user (Problem 3). Then we score user data and attack data (Problem 1) to determine the effectiveness of our models. As is common in masquerade detection research, the 'attack' data consists of other users' data. Our measure of the effectiveness of our detection technique is the area under the ROC curve.

2.3 ROC curves

A receiver operating characteristic (ROC) curve is a graphical plot that represents the trade-off between the true positive rate and the false positive rate as the threshold is varied through the range of scores. ROC curves are useful for analysing the performance of a binary classifier. For a binary classifier, there are four possible outcomes:

• True positive (TP): a positive instance is classified as such.
• False positive (FP): a negative instance is classified as positive.
• True negative (TN): a negative instance is classified as such.
• False negative (FN): a positive instance is classified as negative.

These outcomes are illustrated in the form of a 'confusion matrix' in Figure 2 (Heagerty and Zheng, 2005). Figure 3 provides another illustration in terms of the score distributions and a given threshold value.

Figure 2 Confusion matrix

                          Predicted Class
                          P        N
    Actual Class    P     TP       FN
                    N     FP       TN

Given a threshold, the true positive rate (TPR) is the fraction of true positives to the total number of actual positives. In our application, the TPR is the rate at which an intruder is correctly identified as such. In general, the false positive rate (FPR) is the fraction of false positives to the total number of actual negatives. In our application, the FPR gives the rate at which the actual user is misclassified as an intruder. In terms of the confusion matrix in Figure 2, TPR, FPR, and accuracy are calculated as

    TPR = TP / (TP + FN)
    FPR = FP / (FP + TN)                                  (1)
    Accuracy = (TP + TN) / (TP + FN + FP + TN)
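The score P(O | λ) of Problem 1, which we report below as a log likelihood per observation, can be computed with the forward algorithm. The following is a minimal sketch using a small hand-made model; all parameter values are illustrative, not trained on our data:

```python
import numpy as np

def forward_log_likelihood(A, B, pi, O):
    """Compute log P(O | lambda) for an HMM lambda = (A, B, pi) via the
    forward algorithm, with per-step scaling to avoid underflow."""
    T = len(O)
    alpha = pi * B[:, O[0]]          # alpha_0(i) = pi_i * b_i(O_0)
    log_prob = 0.0
    for t in range(T):
        if t > 0:
            # Propagate through the Markov chain, then weight by emissions
            alpha = (alpha @ A) * B[:, O[t]]
        c = alpha.sum()              # scaling factor c_t
        log_prob += np.log(c)        # log P(O) = sum of log c_t
        alpha = alpha / c
    return log_prob

# A toy 2-state, 3-symbol model (illustrative values only)
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
pi = np.array([0.6, 0.4])
O = [0, 1, 0, 2]
score = forward_log_likelihood(A, B, pi, O) / len(O)  # log likelihood per observation
```

Scaling each step by c_t avoids the numerical underflow that results from multiplying many small probabilities over a long observation sequence.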
An ROC curve is obtained by plotting the FPR versus the TPR as the threshold varies through the range of the data. An example ROC curve is given in Figure 4.
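A threshold sweep of this kind can be sketched as follows. The scores here are made-up stand-ins, and the convention that lower scores indicate an attack matches our use of log likelihood per observation:

```python
def roc_auc(attack_scores, user_scores):
    """Compute (FPR, TPR) points and AUC by sweeping a threshold.

    Attack samples are the positives; a score *below* the threshold is
    classified as an attack."""
    thresholds = sorted(set(attack_scores) | set(user_scores))
    points = []
    for thr in thresholds + [float("inf")]:
        tp = sum(s < thr for s in attack_scores)
        fp = sum(s < thr for s in user_scores)
        points.append((fp / len(user_scores), tp / len(attack_scores)))
    points = sorted(set(points))
    # Trapezoidal area under the (FPR, TPR) curve
    auc = sum((x2 - x1) * (y1 + y2) / 2
              for (x1, y1), (x2, y2) in zip(points, points[1:]))
    return points, auc

# Toy scores: attacks score lower than the genuine user
attacks = [-5.0, -4.0, -3.5]
user = [-2.0, -1.5, -1.0]
points, auc = roc_auc(attacks, user)  # perfect separation gives AUC = 1.0
```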
Figure 3 Classification results (see online version for colours)

Figure 4 ROC curve (see online version for colours)

The area under the ROC curve (AUC) is a measure of the effectiveness of a given binary classifier (Ataman and Zhang, 2006). In the case of ideal separation, there is no overlap between the distributions in Figure 3, and in such a case the AUC is 1.0. On the other hand, a binary classifier that is no better than flipping a coin will yield an AUC of 0.5.

3 Event logging for GUI-based systems

In this section, we discuss our event logging tool. But first, we note in passing that event logging for a GUI system is much more involved than in a command-line Unix system. For example, Table 1 gives a comparison of the relevant data for a command-line and a GUI-based system for a user trying to list the contents of a directory. This added complexity makes the logging and processing challenging, but it also greatly increases the information available for profiling users.

Table 1 Command line vs. GUI

  User  Command line                              GUI-based system
  1     dir or ls [with or without parameters]    1 Mouse coordinates (movement)
                                                  2 Left click (on start menu)
                                                  3 Left click (on computer folder)
                                                  4 Click to open the folder
  2     dir or ls [with or without parameters]    1 Press Windows key from keyboard
                                                  2 Press arrow keys to reach computer folder
                                                  3 Press Enter
                                                  4 Press arrow keys to reach the desired folder
                                                  5 Press Enter

Our event logging tool is designed to collect relevant masquerade detection data on Windows systems, while respecting user privacy. Specifically, the features collected consist of mouse interactions, keyboard interactions, and active applications. Next, we provide more details on each of these datasets.

3.1 Mouse interaction

Mouse activity includes a variety of user interactions. We log mouse clicks, distinguishing between left and right clicks. At each mouse click we also record the mouse coordinates, a timestamp, and the application on which the event occurred. This level of detail provides useful information and is easily collected, without being too invasive of user privacy. Table 2 shows a sample of a mouse log file generated by our event logger. Note that we can identify a double-click in the last two entries of the table, since they have the same coordinates and the same timestamp for the given application.

3.2 Keyboard interaction

Keyboard interaction can provide a variety of behavioural information. For example, information such as typing speed and the use of keyboard shortcuts is likely useful for distinguishing users. Table 3 shows a sample keyboard dataset collected by our event logger. Note that we record the number of keys pressed, and various special keys (e.g., shift, control) are distinguished. However, to protect user-confidential data such as passwords, ordinary keystrokes are indicated by '*'. As with the mouse dataset, a timestamp and the active application are also logged. While additional keyboard data would certainly be useful, we believe our approach offers a reasonable level of detail while respecting user privacy and minimising the data collection burden on the system.

3.3 Active applications

As noted above, when logging mouse and keyboard activity, we log the relevant application. We also log active applications, independent of mouse or keyboard activities. An example of such a log appears in Table 4.

3.4 Event logging tool

To capture the information discussed above, we created several low-level system hooks. Specifically, we created a mouse hook and a keyboard hook, which enable us to intercept mouse and keyboard event messages before they reach an application. The logging application was developed in C#, using the Microsoft .NET framework. After capturing sufficient user data, we exfiltrate the data in the form of a compressed zip file, which is sent to a common mail server. For additional details on the logging application, see Agrawal (2013). An analogous logging tool could be used for real-time monitoring of user behaviour.
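As a rough illustration of the keystroke masking described in Section 3.2, the following sketch keeps special keys and replaces ordinary keystrokes with '*'. The set of special keys and the log format are simplified assumptions, not the exact behaviour of our C# logger:

```python
# Illustrative masking of a raw keystroke stream: special keys are kept,
# ordinary (potentially sensitive) keystrokes are replaced by '*'.
SPECIAL_KEYS = {"SHIFT", "CTRL", "ALT", "TAB", "ENTER", "CAPSLOCK", "F5"}

def mask_keystrokes(keys):
    """Reduce a list of key names to the anonymised form used in the logs."""
    out = []
    for key in keys:
        if key.upper() in SPECIAL_KEYS:
            out.append(f"[{key.upper()}]")   # special keys remain identifiable
        else:
            out.append("*")                  # ordinary keystrokes are masked
    return "".join(out)

masked = mask_keystrokes(["h", "i", "SHIFT", "TAB", "o", "k"])
# masked == "**[SHIFT][TAB]**"
```

This masking is applied before anything is written to disk, so passwords and other typed content never appear in the logs.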
Table 2 Sample dataset of mouse activity

  Click   Coordinates   Timestamp              Application
  Left    590,349       May 29 10:14:19 2012   file:///C:/Users/Documents/Visual Studio 2010/Projects/bin/Debug/LogActivePrograms.EXE
  Right   1268,8        May 29 10:14:24 2012   LogActivePrograms (Running) – Microsoft Visual C# 2010 Express
  Left    1026,87       May 29 10:14:56 2012   mouseLogs.txt – Notepad
  Left    1026,87       May 29 10:14:56 2012   mouseLogs.txt – Notepad

Table 3 Sample dataset of keyboard activity

  Timestamp              Keys pressed                 Application
  May 29 10:09:29 2012   *****[SHIFT][TAB]***         Logs.txt – Notepad
  May 29 10:09:34 2012   *[CTRL]**********            Google – Google Chrome
  May 29 10:09:39 2012   [SHIFT]*****[ENTER][ENTER]   Google Maps – Google Chrome
  May 29 10:09:44 2012   *****[SHIFT][SHIFT]***       TextPad – C:/Users/Documents/chap3.tex
  May 29 10:09:49 2012   ***[CAPSLOCK]***[F5]***      LogActivePrograms (Running) – Microsoft Visual C# 2010 Express

Table 4 Sample dataset of active applications

  Timestamp              Current application                     Active applications
  May 29 10:09:29 2012   TextPad – C:/Users/Documents/chap.tex   Logs.txt – Notepad
                                                                 LogActivePrograms (Running) – Microsoft Visual C# 2010 Express
                                                                 LogActivePrograms.EXE
                                                                 TextPad – C:/Users/Documents/chap.tex
                                                                 mouseLogger – Visual C++ 2008 Express Edition
  May 29 10:09:49 2012   Google – Google Chrome                  Google – Google Chrome
                                                                 LogActivePrograms.EXE
                                                                 TextPad – C:/Users/Documents/chap.tex
                                                                 LogActivePrograms (Running) – Microsoft Visual C# 2010 Express
  May 29 10:10:09 2012   Google Maps – Google Chrome             LogActivePrograms.EXE
                                                                 Google Maps – Google Chrome

4 Results

For each user, we trained an HMM on a subset of the mouse data and another HMM on a subset of the keyboard data. The models were then tested on the remaining user data and on masquerade attacks. As in previous studies, we use other users' data for the masquerade attacks. All HMM scores in this paper are given in the form of log likelihood per observation. Also, we provide results here for experiments using N = 2 and N = 3 hidden states; increasing the number of hidden states beyond N = 3 provided no significant increase in scores.

4.1 Mouse results

As mentioned in Section 3.1, for the mouse data, our observations consist of the type of click (left or right), the coordinates of the click, the time at which it occurred, and the active application. For the purposes of our HMM, the observation sequence consists of the type of click and the mouse position, where the position has been discretised to a 16 × 16 grid. The score results for user 1 are given in Figures 5 and 6, for N = 2 and N = 3 hidden states, respectively. The corresponding ROC curves are given in Figures 7 and 8, respectively. The AUC values for these curves appear in Table 5. As another example, we consider user 18. Figures 9 and 10 show scatter plots for HMMs with N = 2 and N = 3 hidden states, respectively. The corresponding ROC curves appear in Figures 11 and 12, with the AUC results in Table 6.

Table 5 User 1 mouse

  N   AUC
  2   0.9300
  3   0.9567

Table 6 User 18 mouse

  N   AUC
  2   0.9000
  3   0.9400
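The discretisation of mouse coordinates to a 16 × 16 grid, as used in Section 4.1, might be sketched as follows. The screen resolution and the way the click type is folded into the observation symbol are our assumptions for illustration, not necessarily the exact encoding used in our experiments:

```python
def mouse_observation(x, y, click, width=1366, height=768, grid=16):
    """Map a mouse click to a discrete HMM observation symbol.

    The screen is divided into a grid x grid lattice; the symbol encodes
    the cell index together with the click type (left/right)."""
    col = min(x * grid // width, grid - 1)   # 0 .. grid-1
    row = min(y * grid // height, grid - 1)  # 0 .. grid-1
    cell = row * grid + col                  # 0 .. grid*grid - 1
    click_bit = 0 if click == "Left" else 1
    return cell * 2 + click_bit              # 512 possible symbols

# First entry of Table 2: a left click at (590, 349)
sym = mouse_observation(590, 349, "Left")
```

Coarse discretisation of this kind keeps the observation alphabet small enough that an HMM can be trained from a modest amount of per-user data.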
Figure 5 User 1 mouse HMM results (N = 2) (see online version for colours)
Figure 6 User 1 mouse HMM results (N = 3) (see online version for colours)
Figure 7 ROC curve for user 1 mouse HMM (N = 2) (see online version for colours)
4.2 Keyboard results

Recall that for the keyboard data, special keys are distinguished, but ordinary keystrokes are not. As with the mouse data, the time and active application are also recorded.
For the keyboard HMMs, we used the keystrokes as our observations. Since ordinary keystrokes are not distinguished, there are relatively few distinct observations, as compared to the mouse data.
Figure 8 ROC curve for user 1 mouse HMM (N = 3) (see online version for colours)
Figure 9 User 18 mouse HMM results (N = 2) (see online version for colours)
Figure 10 User 18 mouse HMM results (N = 3) (see online version for colours)
The scatterplots for user 1, with N = 2 and N = 3 hidden states, are given in Figures 13 and 14. The corresponding ROC curves appear in Figures 15 and 16. From the AUC values in Table 7, we see that these models classify attacks only slightly
better than flipping a coin. Similar results were obtained for other users; for the sake of brevity, we do not include additional results here.
Figure 11 ROC curve for user 18 mouse HMM (N = 2) (see online version for colours)
Figure 12 ROC curve for user 18 mouse HMM (N = 3)
Figure 13 User 1 keyboard HMM results (N = 2) (see online version for colours)
4.3 Combined model

Finally, we tested linear combinations of the mouse and keyboard scores. Specifically, we used
    score = a · score_m + b · score_k        (2)
where score_m is the mouse HMM score (i.e., log likelihood per observation) and score_k is the keyboard HMM score, with 0 ≤ a, b ≤ 1.

Figure 14 User 1 keyboard HMM results (N = 3) (see online version for colours)

Figure 15 ROC curve for user 1 keyboard HMM (N = 2) (see online version for colours)

Figure 16 ROC curve for user 1 keyboard HMM (N = 3) (see online version for colours)

From the AUC numbers in Table 8, we see that the best results were obtained for a = 0.8 and b = 0.2. These results indicate that there is some useful information contained in the keyboard data.

Table 7 User 1 keyboard

  N   AUC
  2   0.5975
  3   0.5550
Table 8 Combined scores using equation (2) for user 11, N = 3

  a     b     AUC
  1.0   0.0   0.9385
  0.9   0.1   0.9487
  0.8   0.2   0.9538
  0.7   0.3   0.9487
  0.6   0.4   0.9487
  0.5   0.5   0.9436
  0.4   0.6   0.9410
  0.3   0.7   0.9410
  0.2   0.8   0.9410
  0.1   0.9   0.9410
  0.0   1.0   0.7487
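A weight sweep of the kind reported in Table 8 can be reproduced in outline as follows. The scores are stand-in values, not our experimental data; the rank-based AUC computation is one standard formulation, and we impose b = 1 − a, as in the rows of Table 8:

```python
def auc(positive, negative):
    """Probability that a random positive outscores a random negative,
    which equals the area under the ROC curve. Here an attack is
    'positive' and a *lower* score indicates an attack."""
    wins = sum((p < n) + 0.5 * (p == n)
               for p in positive for n in negative)
    return wins / (len(positive) * len(negative))

def combined_auc(a, mouse_scores, keyboard_scores):
    """AUC of score = a*score_m + (1-a)*score_k over labelled samples."""
    b = 1.0 - a
    pos = [a * m + b * k
           for m, k in zip(mouse_scores["attack"], keyboard_scores["attack"])]
    neg = [a * m + b * k
           for m, k in zip(mouse_scores["user"], keyboard_scores["user"])]
    return auc(pos, neg)

# Stand-in scores (log likelihood per observation); real values come from the HMMs
mouse = {"user": [-2.0, -2.2, -1.9], "attack": [-4.0, -3.1, -3.8]}
keys = {"user": [-1.0, -1.3, -1.1], "attack": [-1.2, -0.9, -1.4]}
best = max((combined_auc(a / 10, mouse, keys), a / 10) for a in range(11))
```

Note that a = 1.0 recovers the mouse-only model and a = 0.0 the keyboard-only model, corresponding to the first and last rows of Table 8.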
5 Conclusion

Masquerade detection on GUI-based systems is a significant and challenging problem. To date, research in this field has been hampered by an inability to directly compare proposed techniques. In contrast, command-line masquerade detection research has benefitted from the publicly available Schonlau dataset (Schonlau, 1998). In this paper, we have described a GUI-based dataset that we collected from actual users. We believe that this dataset is realistic, in the sense that we can reasonably expect to collect such data without impinging on user privacy expectations, and without adversely affecting system performance. We believe that this dataset can serve a useful purpose for GUI-based masquerade detection research, allowing various proposed techniques to be directly compared. Our dataset is available from the authors.

We trained HMMs on user data and tested the resulting models by scoring them on user data and masquerade data. We showed that mouse activity provides good detection results, while keyboard activity alone does not, and that a combined score involving both mouse and keyboard activity can yield an improvement over models based on mouse activity alone. These results serve to validate our data, as well as providing a baseline for future research.
References

Agrawal, A. (2013) User Profiling in GUI based Windows Systems for Intrusion Detection, Master's Projects 303, http://scholarworks.sjsu.edu/etd_projects/303/

Ataman, K. and Zhang, Y. (2006) 'Learning to rank by maximizing AUC with linear programming', International Joint Conference on Neural Networks, Vancouver, BC, Canada, pp.123–129.

Beauquier, J. and Hu, Y.J. (2007) 'Intrusion detection based on distance combination', World Academy of Science, Engineering and Technology, Vol. 31, pp.172–180.

Bertacchini, M. and Fierens, P.L. (2007) 'Preliminary results on masquerader detection using compression based similarity metrics', Electronic Journal of SADIO, Vol. 7, No. 1, pp.31–42.

Bertacchini, M. and Fierens, P.L. (2009) A Survey on Masquerader Detection Approaches, CIBSI 2009, http://www.criptored.upm.es/cibsi/cibsi2009/docs/Papers/CIBSI-Dia2Sesion5(2).pdf

Bhukya, W.S., Kommuru, S.K. and Negi, A. (2007) 'Masquerade detection based upon GUI user profiling in Linux systems', 12th Asian Computing Science Conference, Lecture Notes in Computer Science, 9–11 December, Doha, Qatar, Springer, Vol. 4846, pp.228–239.

Erbracher, R.F., Prakash, S., Claar, C.L. and Couraud, J. (2006) Intrusion Detection: Detecting Masquerade Attacks using UNIX Command Lines, http://digital.cs.usu.edu/~erbacher/publications/MasqueradeDetectionConference.pdf

Feng, H.P., Kolesnikov, O.M., Fogla, P., Lee, W. and Gong, W. (2003) 'Anomaly detection using call stack information', Proceedings of the IEEE Symposium on Security and Privacy, 11–14 May, Berkeley, CA, pp.62–75.

Garg, A., Rahalkar, R., Upadhyaya, S. and Kwiaty, K. (2006) 'Profiling users in GUI based systems for masquerade detection', Information Assurance Workshop, IEEE, 21–23 June, West Point, NY, pp.48–54.

Garg, A. (2006) 'USim: a user behavior simulation framework for training and testing IDSes in GUI based systems', Simulation Symposium, April.

Heagerty, P.J. and Zheng, Y. (2005) 'Survival model predictive accuracy and ROC curves', Biometrics, Vol. 61, pp.92–105.

Huang, L. and Stamp, M. (2007) 'Masquerade detection using profile hidden Markov models', Computers and Security, Vol. 30, No. 8, pp.732–747.

Imsand, E.S., Garrett, D. and Hamilton, J.A. (2009) 'User identification using GUI manipulation patterns and artificial neural networks', Computational Intelligence in Cyber Security, 2009 (CICS '09), 30 March–2 April, Nashville, TN, pp.130–135.

Kothari, A. (2013) Defeating Masquerade Detection, Master's Projects 239, http://scholarworks.sjsu.edu/etd_projects/239

Monrose, F. and Rubin, A. (1997) 'Authentication via keystroke dynamics', ACM Conference on Computer and Communications Security, 1–4 April, Zurich, Switzerland, pp.48–56.

Peacock, A., Ke, X. and Wilkerson, M. (2004) 'Typing patterns: a key to user identification', IEEE Security and Privacy, Vol. 2, No. 5, September–October, pp.40–47.

Schonlau, M., DuMouchel, W., Ju, W-H., Karr, A.F., Theus, M. and Vardi, Y. (2001) 'Computer intrusion: detecting masquerades', Statistical Science, Vol. 15, No. 1, pp.1–17.

Schonlau, M. (1998) Masquerading User Data, http://www.schonlau.net/intrusion.html

Shavlik, J., Shavlik, M. and Fahland, M. (2001) 'Evaluating software sensors for actively profiling Windows 2000 computer users', Fourth International Symposium on Recent Advances in Intrusion Detection (RAID '01), 10–12 October, Davis, CA, USA.

Stamp, M. (2012) A Revealing Introduction to Hidden Markov Models, http://www.cs.sjsu.edu/~stamp/RUA/HMM.pdf