2010 10th Annual International Symposium on Applications and the Internet

Masquerade Detection in Network Environments

Chris Strasburg∗, Sandeep Krishnan†, Karin Dorman‡, Samik Basu§, Johnny S. Wong¶
∗Ames Laboratory, †§¶Dept. of Computer Science, ‡Dept. of Statistics
Iowa State University – Ames, IA, USA
Email: ∗[email protected], {†sandeepk, §sbasu, ¶wong}@cs.iastate.edu, ‡[email protected]

Abstract—As reliance on Internet-connected systems expands, the threat of damage from malicious actors, especially undetected actors, rises. Masquerade attacks, where one individual or system poses as another, are among the most harmful and difficult to detect types of intrusion. Previous efforts to detect masquerade attacks have focused on host-based approaches, including command line, system call, and GUI interaction profiling, but when host data is not accessible or legal/ethical restrictions apply, these methods are infeasible. In this work, we present an approach to masquerade detection using only basic network statistics. We use server log analysis to tag network events with the associated user and build user network profiles. By utilizing only anonymized summary data, we limit the privacy impact of masquerade detection while avoiding the data accessibility issues associated with host-based approaches. We compile 90 days of NetFlow data from over 50 users and show the user profiles are unique, and likely useful for detecting masqueraders. Finally, we apply Support Vector Machine (SVM) classification to demonstrate the feasibility of masquerade detection using network data.

Keywords—D.4.6. Security and Privacy Protection; H.1.2. User/Machine Systems; K.6.5. Security and Protection

I. INTRODUCTION

With the increase of Internet-connected technology, there is also an increase in the potential for damage caused by malicious actors. In the field of intrusion detection, masquerade attacks are one of the most harmful and difficult to detect intrusion types. Maxion and Townsend [1] describe the following example masquerade attack: when a legitimate user takes a coffee break, leaving his/her terminal open and logged in, an interloper assumes control of the keyboard (taking advantage of the legitimate user's privileges) and enters commands. More generally, a masquerade attack is an unauthorized attempt to impersonate a legitimate user. Examples include identity theft, stolen credentials, or the use of another's computer or account. The attacker is then free to access confidential/personal information, protected resources, or monitor future activity. As masquerade techniques advance, it is imperative to develop robust and accurate techniques to detect these attacks.

As a research problem, masquerade detection is a subset of the anomaly detection paradigm for general intrusion detection systems (IDS). Anomaly-based IDS relies on tightly associating normal behavior with a user, computer, or network. The goal of these systems is to detect malicious activity by monitoring aspects of user, system, or network behavior. The underlying assumption is that malicious behavior is anomalous, and that anomalous behavior is likely malicious.

Driving problem: Most efforts to develop detection mechanisms for masquerade attacks have focused on host-based approaches, including command line, system call, and GUI interaction profiling. While these approaches have shown steady improvement [2], their applicability is limited to scenarios in which hosts can be directly monitored and infrastructure is in place to gather and process host data. With the recent growth in massively distributed and interconnected systems, including cloud computing, distributed grid computing, and smart homes, a network-centric approach to masquerade detection is needed. Network-based detection does not rely on direct host monitoring and raises fewer privacy and security concerns. While organizations may not routinely collect command line or system call data on all users, a large majority have the capability to collect statistical information on network traffic for diagnostic and general IDS purposes. In addition, in US government agencies, organizational and legal restrictions limit the collection of user-specific information. Since our data can be easily de-identified by remapping IP addresses while retaining pertinent statistical features, we avoid these and other potential roadblocks to masquerade detection.

Approach: In this work, we apply Support Vector Machine (SVM) classification to masquerade detection, using only basic network statistics and server log data to generate user profiles and detect potential masqueraders. The primary contributions of this work include:

1) Adapting masquerade detection to a network setting. The proposed model is applied in networked environments where host-based data may be difficult or impossible to obtain. We demonstrate that even simple features of network traffic show promising performance compared to that obtained on host-based data.

2) An analysis of the data characteristics underpinning successful masquerade detection. While previous works use classifier performance to validate the predictive success of the utilized data, we confirm our data have characteristics key to successful classification as well as features that may degrade classifier performance.

3) The application of SVM classification to identify masqueraders from NetFlow data. Our work establishes a methodology for applying SVM-based classification techniques and demonstrates the impact of simple feature reduction strategies on classifier performance.

Organization: The remainder of the paper is organized as follows. A brief overview of related work is given in Section II. Section III provides an overview of the proposed masquerade detection framework and presents the details of the dataset and classification techniques used in this paper. Experimental results are given in Section IV, and Section V contains our conclusions and suggestions for future work.

II. RELATED WORK

A number of datasets and classification techniques have been used to explore masquerade detection over the past decade. A comprehensive review of this work is given by Bertacchini and Fierens [2].

Data sets: To enable meaningful comparison between approaches and encourage peer review, publicly available datasets are used extensively to perform analysis on masquerade detection. One of the most widely used datasets was published by Schonlau et al. in 2001 [3], and is commonly referred to as the SEA dataset. SEA contains the first 15,000 commands executed by 70 users over a period of several months. While the number of commands per user is fixed, some users generated the commands in a few days and others took months. For training and testing, data was grouped into blocks of 100 commands. The initial experimental design attempted to model real-world attack scenarios by injecting masquerade data into the testing data blocks for each user. From the 15,000 commands collected for each user, the first 5,000 commands were used as the training data and the remaining 10,000 commands for each user were injected with data belonging to other users. A variation of this dataset, SEA-1v49, was produced by Maxion and Townsend [1] to address repeatability and analysis problems with the SEA experiment. Their data modified the masquerade detection experiment such that every user was tested against all other 49 users as masqueraders. The comparison between SEA and SEA-1v49 demonstrates the trade-off between accurately modeling real-world attacks (SEA) and conducting reproducible experiments (SEA-1v49). Addressing the lack of command line parameters in the SEA dataset, the Greenberg dataset [4] contains full command line history data for 168 users. By providing additional command features, parameters, and aliases, this dataset allowed direct comparisons between approaches using truncated and enriched command lines.

Classification techniques: Schonlau et al. [3] initiated recent interest in command-line based masquerade detection through the comparison of six different classification techniques on the SEA dataset. The authors included results from information theoretic, Markovian, and biologically inspired approaches. While there was wide variation in performance, no method was successful in achieving the desired false positive rate of 1%. Probabilistic models, such as the Hidden Markov Model (HMM), have also been applied to perform masquerade detection. Posadas et al. [5] applied an HMM to a modified SEA dataset called SEA-I. They extracted a grammar using the Sequitur algorithm for every session and then used the generated rules as the training data for the HMM. Other HMM-based approaches, such as [6] and [7], use multiple agents trained to identify self vs. non-self users. Sequence alignment is a technique widely used in bioinformatics to identify the similarity between two biological sequences. These approaches were adapted to sequences of UNIX commands in the sequence-match work [8]; however, the results were not promising [3]. Other approaches ([9], [10]) have achieved better detection rates but also generated high false positives. Naive Bayes and SVM have been the most widely applied approaches to masquerade detection. The advantages of Naive Bayes include simplicity and surprising accuracy. Maxion and Townsend [1] were the first to apply Naive Bayes classification to the SEA dataset, with results suggesting that the technique was effective (detection rate in the mid 60% range with a very low false positive rate of 1.3%). Thereafter, many additional works were based on the Naive Bayes approach ([11], [12]). Approaches using SVM-based classifiers include work by Kim and Cha [13], who applied SVM classifiers in combination with a voting scheme to the SEA, 1v49, and Greenberg datasets. They demonstrated both higher detection and false positive rates as compared to Naive Bayes. Wang et al. [14] compared one-class and two-class SVMs using both command presence/absence (binary feature) and command frequencies with Naive Bayes on the SEA dataset. The one-class SVM on binary data and two-class Naive Bayes using command frequencies were the most successful, with results similar to those obtained in [3]. Other approaches to masquerade detection have been compared in [3]; however, none were competitive compared to SVM and Naive Bayes.

III. EXPERIMENT DESIGN

Drawing from the successful approaches to host-based masquerade detection, we focus our initial efforts on Support Vector Machine (SVM) two-class classifiers. A two-class classifier is trained on both positive and negative instances, i.e., self and other instances. For each user, we generate one two-class model (self and other). We then ask all classifiers to vote on unlabeled instances; the highest confidence vote classifies the instance. Our data consist of IP addresses visited by users. Here, we compare results using binary (visited/not visited) training data. On account of the very large feature space, we investigate domain-specific IP address grouping according to netmask, and general feature reduction strategies.

Data: We received permission to construct a dataset collected by the network monitoring tools in use at Ames Laboratory, a US Department of Energy laboratory at Iowa State University. The data consist of mail server logs (POP and IMAP logins when a user checks for mail and SMTP logins when a user sends mail) and NetFlow records over the course of 90 days. To preserve privacy, user identifiers are replaced with generic IDs, and source and destination IP addresses are re-mapped to non-public IPs. Timestamps and other statistical information are left intact.

The mail server log was used to associate user accounts with NetFlow traffic and to determine when the user was active. We arranged the POP, IMAP, and SMTP logins in tuples of the form (T, U, S), where T is the time-stamp, U is the username, and S is the source IP from which the user connects. A user's primary source IP address is the most common source IP associated with that user. Hours when the user is considered active on the primary source IP are determined by comparing the e-mail server activity for that hour with the user average. Hours with above-average activity are tagged for training, while hours with below-average activity are ignored. We selected 50 users, for whom the number of "active" hours during the 90-day collection period ranges from 39 to 2,104, reflective of the varied network usage. Note that the active hours are selected using the mean as a threshold, which is sensitive to extreme outliers.

NetFlow statistics on all incoming and outbound connections are generated on switches and routers at Ames Laboratory and aggregated at a central collection point. In general, it may be necessary to correlate and deduplicate information from distributed collection points for an overall picture of network traffic behavior. The NetFlow data are tuples of the form (T, Dt, S, D, Sp, Dp, P), where T is the time-stamp, Dt is the amount of data transmitted in bytes, S is the source IP address, D is the destination IP address, Sp is the source port, Dp is the destination port, and P is the protocol number. Our efforts focus on using the destination IP address alone, analogous to truncated command line data in host-based masquerade detection. We will explore the use of other features in future work.

We split the 50 users into 30 victims and 20 "masqueraders." The three months of data amounted to 40,514,129 NetFlow events for the victims and 25,126,566 events for the "masqueraders." To train the classifiers, we compress events into records of the form (S, D, I_SD(H)), where I_SD(H) indicates if source S visits destination D in hour H. There are multiple entries for each (S, D) pair corresponding to the active hours for each source S (user). For some analyses, we also considered data (S, D, I_SD(Y)), where I_SD(Y) indicates a visit from S to D in day Y. We randomly select 90% of the data from the 30 victims for training the classifiers. For testing, the remaining 10% of victim data is merged with the "masquerader" data.
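To make the record-compression step concrete, the following is a minimal sketch, under assumed input layouts, of how raw NetFlow tuples and mail-log logins could be reduced to hourly binary visit records (S, D, I_SD(H)). The field names and helper functions are illustrative assumptions, not the tooling used in this work.

```python
from collections import defaultdict

# Assumed inputs:
#   mail_logins: iterable of (timestamp, username, src_ip) from POP/IMAP/SMTP logs
#   netflow:     iterable of (timestamp, bytes, src_ip, dst_ip, src_port, dst_port, proto)
# Timestamps are assumed to be datetime objects.

def active_hours(mail_logins, user, primary_ip):
    """Hours with above-average e-mail activity for `user` on `primary_ip`."""
    counts = defaultdict(int)
    for ts, u, s in mail_logins:
        if u == user and s == primary_ip:
            counts[ts.replace(minute=0, second=0, microsecond=0)] += 1
    if not counts:
        return set()
    mean = sum(counts.values()) / len(counts)   # mean threshold (sensitive to outliers)
    return {hour for hour, c in counts.items() if c > mean}

def hourly_visit_records(netflow, src_ip, hours):
    """Binary records (S, D, I_SD(H)) restricted to the user's active hours."""
    visited = set()
    for ts, _dt, s, d, _sp, _dp, _p in netflow:
        hour = ts.replace(minute=0, second=0, microsecond=0)
        if s == src_ip and hour in hours:
            visited.add((hour, d))
    # One record per (active hour, destination) pair; absent pairs are implicitly 0.
    return [(src_ip, d, hour, 1) for hour, d in sorted(visited)]
```

Only the destination IP is retained as a feature in this study; byte counts, ports, and protocol are dropped at this stage.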

Unique data properties: Significant differences between the data collected in this work and the command-line data used previously include data volume (number of instances), diversity (number of features), and sparseness (instantiated features per instance).

Data volume: As noted earlier, the command line datasets consisted of tens of thousands of instances per user. By contrast, our users average 1.3 million records. While this volume of data supports estimation of the increased number of parameters (IP address characteristics for each user), it also presents data management challenges using standard tools. For instance, the parameter selection tool of LibSVM, grid.py, takes over a day to find optimal cross-validation parameters. These challenges prompted our departure from some of the standard SVM practices, such as iterative training and parameter optimization when performing feature selection. Instead, we estimate parameters using subsets of users to determine whether parameters change significantly after pruning. We then apply those parameters to the entire data set. While there is a risk of sub-optimal model construction, in practice we noticed very little performance variation for small changes in model parameters.

Diversity: Compared with a few thousand possible commands on a standard Linux system, the number of distinct destination IP addresses for the 50 chosen users was over 140,000. This complexity hinders classification, so we explore feature reduction.

Sparseness: A third feature of NetFlow data is sparseness. Most destinations are not visited in most hours (see Table I). Sparseness provides opportunities for optimization. For instance, filter-based feature selection techniques can significantly improve performance.
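Sparse instances of this kind are naturally stored in LibSVM's sparse text format, where only non-zero features are written. The small sketch below, using a hypothetical destination-to-index mapping, shows how short each hourly instance remains even though the full feature space exceeds 140,000 destinations.

```python
def to_libsvm_line(label, visited_destinations, feature_index):
    """One hourly instance in LibSVM sparse format: '<label> idx:1 idx:1 ...'.

    `feature_index` maps each destination IP (or network block) to a 1-based
    feature index; only visited destinations are written.
    """
    indices = sorted(feature_index[d] for d in visited_destinations)
    return " ".join([str(label)] + [f"{i}:1" for i in indices])

# Example: user 5 visited two destinations during one active hour.
feature_index = {"10.1.2.3": 1, "10.9.8.7": 2, "172.16.0.5": 3}  # hypothetical mapping
print(to_libsvm_line(5, ["10.9.8.7", "10.1.2.3"], feature_index))
# -> "5 1:1 2:1"
```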

Methodology: SVMs address the challenges of massive datasets with nonlinearity and have been broadly and successfully applied to a variety of classification problems. They train and predict quickly on large datasets. At a high level, an SVM applies efficient transformations (kernel functions) to map training data instances into high-dimensional spaces. In spaces of high enough dimension, instances become linearly separable, and an optimal separating hyperplane can be constructed. An optimal classifier is constructed by maximizing the margin between labeled training instances and this hyperplane. Depending on the underlying kernel function, SVMs can represent a very rich set of non-linear functions in lower dimensions.

Implementation: We use the LibSVM implementation [15], which supports several classification modes and provides parameter selection, scaling, and analysis tools. Based on the non-linearity and interdependencies of our data, the Radial Basis Function (RBF) kernel is selected for data transformation. This kernel maps instances to Gaussian distributions in the high-dimensional space. RBF kernels have performed well in a broad array of classification problems. As demonstrated by Chang and Lin [15], data scaling and parameter selection can significantly improve SVM classifier performance. Thus, our evaluation process is:

• Convert the data to LibSVM input format.
• Scale the data using the svm-scale utility.
• Run a grid-based parameter search on 4-5 users, using the grid.py utility.
• Train models for each user using svm-train.
• Generate false positive and true positive rates for each classifier using the 10% validation data for self, and the 20 masquerader users for non-self.
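As an illustration of this pipeline, the sketch below uses scikit-learn's SVC (a wrapper around LibSVM) in place of the command-line utilities. The parameter values follow the grid-search results reported in Section IV, but the variable names and the confidence-voting implementation are our own assumptions, not the authors' code.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics import roc_curve, auc  # used in the commented evaluation step

def train_per_user_models(X_train, y_train, C=0.05, gamma=0.003):
    """Train one two-class RBF SVM per victim user (self vs. all others)."""
    models = {}
    for user in np.unique(y_train):
        is_self = (y_train == user).astype(int)   # 1 = self, 0 = other
        models[user] = SVC(kernel="rbf", C=C, gamma=gamma).fit(X_train, is_self)
    return models

def vote(models, X):
    """Label each instance with the user whose classifier is most confident."""
    users = list(models)
    scores = np.column_stack([models[u].decision_function(X) for u in users])
    best = scores.argmax(axis=1)
    return np.array(users)[best], scores[np.arange(len(X)), best]

# Hypothetical usage with pre-scaled binary feature matrices X_train/X_test and
# user-ID vectors y_train/y_test (scaling done beforehand, e.g., with svm-scale):
# models = train_per_user_models(X_train, y_train)
# predicted_user, confidence = vote(models, X_test)
# fpr, tpr, _ = roc_curve((y_test == u).astype(int), models[u].decision_function(X_test))
# print("AUC for user u:", auc(fpr, tpr))
```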

TABLE I
SPARSENESS OF DATA

Grouping           Mean Sources per Hour (a)   Total Sources (b)   Proportion (c)
None               39.98                       148554              0.00026
Network C-block    25.99                       109764              0.00024
Network B-block    17.94                       17705               0.001
Network A-block    10.69                       168                 0.063

(a) Mean number of sources visited per hour; (b) total sources visited; (c) mean fraction of sources visited per hour.

Feature reduction strategies: Feature reduction is the technique of selecting a subset of features to build models. Due to the large number of destination addresses (features), we examine several approaches to feature reduction: pruning low information gain features, pruning low incidence features, and grouping features by category.

Information gain (IG) is a measure of the reduction in uncertainty caused by including a feature. If a destination is similarly used by all users, it yields little information about user identity and has low IG. Thus, one way to prune features is to remove those with low IG.

Not all high IG features are useful. In order to identify users over time, we need to identify features that persist in time. A feature used just once by some users may be very discriminative in training data, but not very predictive. We propose pruning non-persistent features that occur no more than five times for any user.

Because our features are IP addresses, a third feature reduction approach is to group the destinations by network class. Each IP address is comprised of four parts, separated by a '.'; we refer to groupings on these parts as network classes. For example, address aaa.bbb.ccc.ddd is in class A network aaa, which contains class B bbb, which in turn contains class C ccc, and so on. Grouping by class C combines all IP addresses with the same aaa.bbb.ccc component and reduces meaningless feature diversity (e.g., contacting one of Google's several IP addresses) without losing significant organizational discrimination (e.g., using Yahoo instead of AltaVista as a search engine).
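For concreteness, the following sketch shows how the three reduction strategies might be computed from (user, destination) records and per-instance binary features. The data layouts and thresholds are assumptions matching the description above, not the authors' code.

```python
import math
from collections import Counter, defaultdict

def information_gain(labels, feature_on):
    """IG (in bits) of a single binary feature with respect to the user label."""
    def entropy(counts):
        total = sum(counts)
        return -sum(c / total * math.log2(c / total) for c in counts if c)

    ig = entropy(list(Counter(labels).values()))
    for value in (True, False):
        subset = [l for l, f in zip(labels, feature_on) if f == value]
        if subset:
            ig -= len(subset) / len(labels) * entropy(list(Counter(subset).values()))
    return ig

def persistent_destinations(records, min_count=5):
    """Keep destinations contacted at least `min_count` times by some user."""
    per_user = defaultdict(Counter)
    for user, dest in records:               # records: (user, destination) pairs
        per_user[user][dest] += 1
    return {d for counts in per_user.values() for d, c in counts.items() if c >= min_count}

def class_b_group(ip):
    """Group a dotted-quad address by its class B (first two) components."""
    return ".".join(ip.split(".")[:2])
```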

IV. RESULTS

Data Analysis: Masquerader detection is most successful if user NetFlow data is unique and persistent. Uniqueness ensures that each user (or "masquerader") produces distinguishing NetFlow data. Persistence ensures that training and test data from the same user are not distinguishable. To test these properties in the NetFlow data, we randomly selected 100 destinations receiving at least 10 total visits. A binary temporal usage profile (Fig. 1) is a sequence of 90 numbers indicating the relative frequency of particular NetFlow events in each sampled day. We compute profiles for users (probability that a particular user visits any of the 100 destinations), destinations (proportion of 50 users visiting a particular destination), and user/destination combinations (0/1 indicator of whether a particular user visited a particular destination).

TABLE II
LOGISTIC REGRESSION PREDICTION OF NETWORK USAGE (a)

Model                                       Deviance   df (b)    AIC
user                                        83,855     459,990   83,875
destination                                 83,742     459,990   83,762
user + destination                          81,583     459,981   81,621
user + destination + user/destination (c)   44,910     459,972   44,991

(a) The null deviance is 85,991 with 459,999 df; (b) degrees of freedom; (c) the p-value for this model vs. the preceding model rounds to 0.

We use the profiles to predict the daily binary data I_SD(Y). To summarize the profiles, we cluster each profile type into 10 groups using k-means [16]. Cluster identity is then used to predict I_SD(Y) through logistic regression with R [17]. Table II shows the fit of four models. Adding the user/destination cluster factor significantly (p-value < 0.00001) improved the fit. The importance of the user/destination term demonstrates user uniqueness; users clearly differ in their destination usage. On the other hand, plots of the mean profiles for the 10 user/destination clusters (Fig. 1) show profiles do not necessarily persist. In particular, users may stop using a destination (plot 2), ephemerally interact with a destination (plot 3), or increasingly visit a destination (plot 8). In addition, several profiles demonstrate significant cyclic trends (plots 1, 5, and 8-10). Since for SVM classification we randomly sampled training data from the total data, the lack of persistence will likely not affect our results. However, classification on live data without temporal permutation could lead to false positives as user network profiles shift in time.
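A rough Python analogue of this profile analysis (the original analysis used R [17]) might look as follows. The profile matrix and variable names are assumptions, and statsmodels' GLM is used only to obtain deviance and AIC values comparable to those in Table II.

```python
import numpy as np
from sklearn.cluster import KMeans
import statsmodels.api as sm

# Assumed inputs:
#   profile_matrix: one 90-day binary/frequency profile per user-destination pair (rows)
#   daily_visits:   the daily visit indicator I_SD(Y) being predicted (0/1 vector)

def cluster_profiles(profiles, k=10, seed=0):
    """Summarize temporal usage profiles by k-means cluster identity."""
    return KMeans(n_clusters=k, random_state=seed, n_init=10).fit_predict(profiles)

def fit_usage_model(y, cluster_ids, k=10):
    """Logistic regression of daily usage on profile-cluster membership."""
    dummies = np.eye(k)[cluster_ids][:, 1:]          # drop one level to avoid collinearity
    X = sm.add_constant(dummies)
    model = sm.GLM(y, X, family=sm.families.Binomial()).fit()
    return model.deviance, model.aic

# Hypothetical usage:
# user_dest_clusters = cluster_profiles(profile_matrix)
# deviance, aic = fit_usage_model(daily_visits, user_dest_clusters)
```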

TABLE III
CLASSIFICATION ACCURACY WITH GROUPING

Grouping   Total Features   Training Features (a)   % Feature Reduction   % AUC Change
None       148554           38297                   N/A                   N/A
Class C    109764           25744                   26%                   -15%
Class B    17705            8554                    88%                   2.9%
Class A    168              157                     99.9%                 2.1%

(a) Training features are those seen in the selected training data.

TABLE IV
CLASSIFICATION ACCURACY WITH FEATURE REDUCTION (PARAMETER VALUES: C = 0.05125, g = 0.003125)

Grouping   Feature Reduction (*)         Total Features   % Feature Reduction   % AUC Increase
Class B    Per-user < 5                  16446            88.9%                 1.0%
Class B    Per-user < 5, Info Gain < 2   16381            89.0%                 2.3%
Class A    Per-user < 5, Info Gain < 2   149              99.9%                 0.1%

(*) "Per-user < 5" refers to pruning destinations not contacted at least five times by any user; "Info Gain < 2" refers to pruning destinations with an information gain of less than two bits.

SVM: To evaluate the performance of our approach, we generate Receiver Operating Characteristic (ROC) curves, which show the trade-off between false positive rate and detection rate. The typical metric used to compare ROC curves is the Area Under the Curve (AUC), a value between 0 and 1. The greater the area, the better the trade-off between accuracy and false positive rate.

Parameter selection: The RBF γ parameter (also written g in command-line tools, tables, and graphs) controls model flexibility. The C parameter controls the penalty assigned to misclassified points in the training data. Using the grid-based search for optimal parameters, we obtained C = 0.05 and g = 0.003. The g value indicates each point in the training set is represented by a Gaussian distribution with small variance. However, the low C value allows the classifier to make more training errors, improving the predictiveness of the model. The results below are evaluated using these parameter values.

Data Grouping: As shown in Table III, grouping features by network class can improve classification accuracy. Grouping by class C network showed a large negative impact, although the change in feature space was also small (only a 26% reduction). Grouping by class B showed the greatest improvement, with class A grouping giving somewhat less of an improvement. Much of the network traffic falls within two class B networks. We speculate that the performance loss after class C grouping is random fluctuation caused by the substantial noise left in the feature set. Class B and A grouping sufficiently reduce the feature set to allow better classification, although class A may remove too much user-distinguishing information.

Fig. 1. Mean profiles of user/destination clusters 1 through 10, ordered by row, inferred from binary data over the data collection period.


Fig. 2. ROC curves for users (a) 5, (b) 13, and (c) 214. Image (d) is the ROC curve for the overall classifier using low-IG and non-persistent feature pruning, grouped by network class B.

Low Occurrence / Low Information Gain (IG) Pruning: Table IV shows the effect of filtering out low IG features and non-persistent features. Applying both techniques to the Class B grouping further improved classification performance by 2.3%. Further feature pruning on Class A grouped data has little effect (0.1% improved AUC), again suggesting that Class A grouping has already removed important feature information.

Overall Results: While the ROC curves reveal a difficulty in controlling false positive rates, our most effective configuration achieved a detection rate of 60% with 5% false positives. A comparison with host-based approaches (on truncated command lines) is given in Table V.

TABLE V
COMPARISON OF NETFLOW AND HOST-BASED APPROACHES

Method            Detect %   FAR % (a)   Dataset
Our Approach      60         5           Ames Laboratory
SVM [13]          94.8       0           1v49
SVM [13]          80.1       9.7         SEA
SVM [13]          71.1       6.0         Greenberg
Naive Bayes [1]   66.2       4.6         SEA

(a) False alarm (positive) rate.

Some users exhibit very strong individual ROC curves. In particular, the ROC curves for users 5, 13, and 214 are shown in Figure 2. For many users there is an initial jump in detection rate, but further improvement yields steep costs in false alarms. Further investigation into the characteristics of the users who were most easily classified could motivate more effective use of the data and improve feature selection.

V. CONCLUSION AND FUTURE WORK

A. Conclusion

Vast research effort has gone into masquerade detection using host-based data, most of it focusing on command lines. While these methods continue to improve, we feel there is an imminent need to explore masquerade detection using more readily available network data.

We collect a NetFlow dataset and show there is a strong association between users and their NetFlow data. Applying SVM classification to binary features confirmed that this data is classifiable, with a detection rate of 60% and 5% false alarms. While this is comparable to the host-based results using truncated command-line data, we note that it is not yet sufficient for field applications. Many of the masquerade detection improvements successful with host-based data could be adapted to improve detection with network data. Pending approval, we plan to publish our experimental NetFlow data, permitting others to validate our work and test new methods and approaches.

B. Future Work

A more thorough analysis of the underlying data characteristics will enable new and improved techniques. We plan to perform an examination of user characteristics which influence detectability, and to investigate the use of false alarm reduction strategies. We anticipate performance can be improved with more advanced feature selection (see [18], [19]) and the inclusion of new feature types available in NetFlow (e.g., amount of data exchanged or protocol used). Alternative classification techniques, such as Naive Bayes and HMMs, may also yield good results. Also, we have performed classification using binary features; however, classification on destination visit frequencies may be superior.

In addition to classification accuracy, we also plan to evaluate the proposed approach for novel uses and practical considerations. We focused here on masquerader detection, but as detection and false alarm rates improve, our technique could enable automated real-time intrusion response in distributed environments. Although we utilize two sources of information in this work (e-mail server logs and NetFlow data), we are pursuing approaches that use NetFlow data only, to further protect user privacy and reduce implementation complexity. Our anticipation is that NetFlow data will also provide a more accurate basis from which to determine when a user is active. Finally, we intend to evaluate any methods for scalability and economic cost, especially compared to host-based methods.

VI. ACKNOWLEDGMENTS

We would like to thank the Ames Laboratory of the Department of Energy for providing access to data and facilities in support of this research, and in particular, we appreciate their willingness to pursue the publication of this data.

REFERENCES

[1] R. A. Maxion and T. N. Townsend, "Masquerade detection using truncated command lines," 2002, p. 219.
[2] M. Bertacchini and P. Fierens, "A survey on masquerader detection approaches," 2008.
[3] M. Schonlau, W. DuMouchel, W.-H. Ju, A. F. Karr, M. Theus, and Y. Vardi, "Computer intrusion: Detecting masquerades," Statistical Science, vol. 16, pp. 58–74, 2001.
[4] S. Greenberg, "Using Unix: Collected traces of 168 users," Tech. Rep. 88/333/45, Department of Computer Science, University of Calgary, Alberta, Canada, 1988.
[5] R. Posadas, J. C. Mex-Perera, R. Monroy, and J. A. Nolazco-Flores, "Hybrid method for detecting masqueraders using session folding and hidden Markov models," in MICAI, 2006, pp. 622–631.
[6] T. Okamoto, T. Watanabe, and Y. Ishida, "Towards an immunity-based system for detecting masqueraders," in KES, 2003, pp. 488–495.
[7] T. Okamoto and Y. Ishida, "Framework of an immunity-based anomaly detection system for user behavior," in KES (3), 2007, pp. 821–829.
[8] T. Lane and C. E. Brodley, "Approaches to online learning and concept drift for user identification in computer security," in KDD. AAAI Press, 1998, pp. 259–263.
[9] S. Coull, J. Branch, B. Szymanski, and E. Breimer, "Intrusion detection: A bioinformatics approach," 2003.
[10] S. Coull and B. Szymanski, "Sequence alignment for masquerade detection," Comput. Stat. Data Anal., vol. 52, no. 8, pp. 4116–4131, 2008.
[11] R. A. Maxion and T. N. Townsend, "Masquerade detection augmented with error analysis," vol. 53, Mar. 2004.
[12] K. S. Killourhy and R. A. Maxion, "Investigating a possible flaw in a masquerade detection system," Tech. Rep. 869, Newcastle University, School of Computing Science, 2004.
[13] H.-S. Kim and S. D. Cha, "Empirical evaluation of SVM-based masquerade detection using UNIX commands," Computers & Security, vol. 24, no. 2, pp. 160–168, 2005.
[14] K. Wang and S. Stolfo, "One-class training for masquerade detection," in 3rd IEEE Conference Data Mining Workshop on Data Mining for Computer Security, 2003.
[15] C.-C. Chang and C.-J. Lin, LIBSVM: a library for support vector machines, 2001. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.
[16] J. B. MacQueen, "Some methods for classification and analysis of multivariate observations," in Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability. University of California Press, 1967, pp. 281–297.
[17] R Development Core Team, R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria, 2009, ISBN 3-900051-07-0. Available: http://www.R-project.org
[18] K.-M. Chung, W.-C. Kao, C.-L. Sun, L.-L. Wang, and C.-J. Lin, "Radius margin bounds for support vector machines with the RBF kernel," Neural Comput., vol. 15, no. 11, pp. 2643–2681, 2003.
[19] Y.-W. Chen and C.-J. Lin, "Combining SVMs with various feature selection strategies," Springer-Verlag, 2005.
