Performance Comparison of Four Rule Sets: An Example for Encrypted Traffic Classification

Riyad Alshammari, A. Nur Zincir-Heywood and Abdel Aziz Farrag
Faculty of Computer Science, Dalhousie University
6050 University Avenue, Halifax, NS, Canada
{riyad, zincir, farrag}@cs.dal.ca

Abstract—The objective of this work is the classification of encrypted traffic, where SSH is taken as an example application. To this end, four learning algorithms (AdaBoost, RIPPER, C4.5 and Rough Set) are evaluated using flow-based features to extract the minimum feature/rule set required to classify SSH traffic. Results indicate that the C4.5 based classifier performs better than the other three. Moreover, we have identified 15 features that are important for classifying encrypted traffic, namely SSH.

Keywords: encrypted traffic classification; problem decomposition; security; performance measures

I. INTRODUCTION

As more and more applications on the Internet encrypt their traffic, it becomes increasingly difficult for network administrators to accurately classify traffic for the purposes of network management and Quality of Service. The traditional approach of classifying network traffic based on well-known port numbers is no longer effective, since applications such as Peer-to-Peer file sharing use non-standard ports to bypass firewalls or circumvent operating system restrictions. Another approach is to inspect the payload of every packet. This technique can be extremely accurate when the payload is not encrypted. However, for encrypted applications such as SSH the payload is opaque. One possibility is to identify specific features of the network traffic and use them to guide traffic classification. Recent research in this area focuses on machine learning based approaches such as Hidden Markov models, Naïve Bayesian models, AdaBoost, or Maximum Entropy to classify network traffic [1], [2], [3], [4]. Any machine learning approach requires a feature set and a training phase. During the training phase, a set of pre-classified feature vectors (the training set) is processed by the machine learning algorithm. At the end of this phase a classifier is returned, which can then be used to classify unknown traffic. These techniques rely on the observation that different applications have distinct behavior patterns on the network. In general, however, these efforts show that even though it is relatively easy to apply such techniques to well-known application traffic such as Web and Mail, more work is required to classify encrypted applications, such

as SSH, accurately. SSH is typically used to log in to a remote computer, but it also supports tunneling, file transfers and the forwarding of arbitrary TCP ports over a secure channel between a local and a remote computer. What makes the detection of this application interesting is that its payload is encrypted and multiple applications can be run within an SSH session. In this work, we have chosen to work with SSH as an example of an encrypted application. The reason is that SSH allows us to know the ground truth about the labels of the training data set, given that it is a public domain application; moreover, the SSH handshake is not encrypted. Neither of these properties holds for other encrypted applications such as Skype. The objective of this work is to investigate the discovery of rule sets in "IF-THEN" format, using statistical flow features, to classify encrypted traffic, where SSH is taken as an example. However, the price paid to generate the rules is the size of the rule base, so reducing the rule base is very important. Finding the minimal rule set has been proven to be an NP-hard problem [22]. Therefore, we investigate the performance of four machine learning algorithms (AdaBoost, RIPPER, C4.5 and Rough Set with Genetic Algorithms) in finding relevant attributes and building minimal rule sets for classifying SSH traffic without using features such as IP addresses, source/destination ports and payload information. The rest of this paper is organized as follows. Related work is discussed in Section II. Section III details the data sets, features and machine learning algorithms employed. Section IV presents our experimental results. Conclusions and future work are discussed in Section V.
II. RELATED WORK

As discussed earlier, the encrypted payload and the ability to run different applications over SSH make it challenging to classify SSH traffic without using IP addresses, port numbers or payload from a given traffic log file. When the existing literature is explored, however, most papers are seen to focus on the classification of well-known applications such as HTTP, SMTP and FTP; not much attention has been paid to the classification of encrypted traffic.

In the literature, Zhang and Paxson present one of the earliest studies of techniques based on matching patterns in packet payloads [8]. Dreger et al. [9] and Moore et al. [4] applied more sophisticated analyses, which still require payload inspection. Early et al. employed a decision tree classifier on n-grams of packets for distinguishing flows [10]. Moore et al. used Bayesian analysis to classify flows into broad categories such as bulk transfer, P2P or interactive [3], [4]. Haffner et al. employed AdaBoost, Hidden Markov, Naïve Bayesian and Maximum Entropy models to classify network traffic into different applications [2]. Their results showed that AdaBoost performed best on their data sets, with an SSH detection rate of 86% and a false positive rate of 0%; however, they employed the first 64 bytes of the payload. Karagiannis et al. proposed an approach that does not use port numbers or payload information, applied to traffic that is not encrypted [5]. However, their approach relies on information about the behavior of the hosts on the network; thus, they cannot identify distinct applications and cannot classify individual flows or connections. More recently, Wright et al. investigated the extent to which common application protocols can be identified using only the packet size, timing and direction information of a connection [1], [11]. They employed k-Nearest Neighbor (kNN) and Hidden Markov Model (HMM) learning systems to compare performance. Even though their approach can classify distinct encrypted applications, its performance on SSH classification dropped to a 76% detection rate and an 8% false positive rate. Bernaille et al. employed clustering followed by classification on the first four packets of each connection to identify SSL connections [21]. Another recent work, by Williams et al.
[12], compared five different classifiers, namely Bayesian Network, C4.5, Naïve Bayes (with discretisation and with kernel density estimation) and Naïve Bayes Tree, on the task of traffic flow classification. They found that C4.5 performed better than the others. However, rather than giving classification results per application, they give overall accuracy results per machine learning algorithm. On the other hand, Alshammari et al. investigated the performance of the C4.5 and RIPPER algorithms on three different feature sets (flow, heuristic and packet header features) for the classification of encrypted traffic such as SSH, without using features such as IP addresses, source/destination ports and payload information [25], [26]. Results indicated that the C4.5 learning model based rule set provides the highest performance using flow-based feature sets.

III. METHODOLOGY

As discussed earlier, in this work machine learning algorithms are employed in order to identify the minimum rule set for the problem of SSH traffic classification.

Table I
SUMMARY OF TRACES

                      Dalhousie      NIMS
Total Packets         337,041,778    34,808,433
MBytes                213,562        35,640
% of TCP packets      86.51%         96.01%
% of TCP bytes        91.03%         98.77%
% of UDP packets      13.33%         3.76%
% of UDP bytes        8.95%          1.22%
% of Other packets    0.16%          0.23%
% of Other bytes      0.02%          0.01%

A. Data Collection

In our experiments, the performance of the different machine learning algorithms is established on two different network data sources: the Dalhousie and NIMS traces. The Dalhousie data set was captured on our university's campus network, whereas the NIMS data set was generated in our lab.
• Dalhousie data sets were captured by the University Computing and Information Services Centre (UCIS) in January 2007 on the Dalhousie campus network, between the university and the commercial Internet. Given the privacy related issues the university may face, the data is filtered to scramble the IP addresses, and each packet is further truncated at the end of the IP header so that all payload is excluded. Moreover, the checksums are set to zero since they could conceivably leak information from short packets. However, any length information in the packet is left intact. Thus these data sets are anonymized and do not contain any payload information. Brief statistics on the traffic data collected are given in Table I.
• NIMS data sets consist of packets collected internally on our research test-bed network. Our data collection approach is to simulate possible network scenarios using one or more computers and capture the resulting traffic. We simulate an SSH connection by connecting a client computer to four SSH servers outside our test-bed via the Internet (Figure 1). We ran the following six SSH services: (i) Shell login; (ii) X11; (iii) Local tunneling; (iv) Remote tunneling; (v) SCP; and (vi) SFTP. We also simulated background traffic from the following applications: DNS, HTTP, FTP, P2P (limewire) and telnet. All generated files are stored in PCAP format. These files include all information about a packet, including all headers and the application payload, as well as a timestamp value indicating the packet's arrival time on the network card. Brief statistics on the traffic data collected are given in Table I.
Both of the traffic traces are very large data sets.
We performed subset sampling to limit the memory and CPU time required for training and testing. The training data set, Dal Training Sample, is generated by sampling randomly selected (uniform probability) flows from five applications

Figure 1. Simulating network traffic on the local network and capturing this traffic

FTP, SSH, DNS, HTTP and MSN. In total, the Dal Sample consists of 12,246 flows: 6,123 SSH and 6,123 non-SSH. For the NIMS training data, the NIMS data set is labeled into multiple classes according to the SSH services (SHELL, SCP, SFTP, X11, Local and Remote tunneling) and the background traffic (FTP, TELNET, DNS, HTTP and P2P limewire). The applications in each data set are separated into two classes: in-class (SSH applications) and out-class (background). The in-class contains 6,000 flows (1,000 flows for each application) while the out-class contains 5,000 flows (1,200 flows for each application), for a total of 11,000 flows.

B. Ground Truth

The Dalhousie traces (UCIS) are labeled by a commercial classification tool called PacketShaper, a deep packet analyzer [19]. PacketShaper uses Layer 7 (L7) filters to classify the applications [20]. By deep packet inspection, the handshake part of the SSH protocol can easily be identified, since that part is not encrypted. In other words, we can confidently assume that the labeling of the data set is 100% correct, and this provides the ground truth for the Dalhousie traces. PacketShaper labeled all the traffic as either SSH or non-SSH. Finally, the NIMS traces are classified accurately since we know exactly which applications were running in every experiment; thus, we labeled the NIMS traces as SSH and non-SSH, too. The reasons we did not depend on the handshake part of the protocol to classify SSH traffic are twofold: (i) payload inspection is not part of our feature set, because we aim to generalize this work to other encrypted applications such as Skype or virtual private network tunnels; and (ii) handshake-based classification requires identifying the beginning of a connection in order to work, which is a drawback in the case of lossy links (packet losses due to link errors).

C. Feature Selection

In this work, network traffic is represented using flow-based features: each network flow is described by a set of statistical features. A flow is defined as the traffic between two hosts at the network layer of the protocol stack, where both hosts use the same 5-tuple (source/destination IP addresses, IP protocol and source/destination port numbers) to exchange the traffic. The first packet seen determines the beginning of the flow, while the termination of the flow depends on either a timeout or a protocol-based termination mechanism. The traffic in a flow can be viewed in two directions: (i) traffic from source to destination; and (ii) traffic from destination to source. A feature is a descriptive statistic that can be calculated from one or more packets. To this end, NetMate [15] is employed to process the data sets, generate flows and compute feature values. Flows are bidirectional, and the first packet seen by the tool determines the forward direction. Moreover, flows are of limited duration: UDP flows are terminated by a flow timeout, while TCP flows are terminated upon proper connection teardown or by a flow timeout, whichever occurs first. The TCP flow timeout value employed in this work is 600 seconds [16]. We extract a set of features as in [12], [18], and then form the input vector for which the machine learning model provides a label {SSH, non-SSH} for each flow. As discussed earlier, features such as IP addresses, source/destination port numbers and payload are excluded from the feature set to ensure that the results are not dependent on such biased features.

D. Classifiers Employed

In order to identify SSH traffic, four different machine learning algorithms are employed: AdaBoost, RIPPER, C4.5 and Rough Set with Genetic Algorithm.
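The flow construction described above can be sketched in a few lines of Python. NetMate performs this computation internally; the `Packet` record, the function names and the simplified timeout handling below are illustrative assumptions for this sketch, not NetMate's actual implementation.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean, pstdev

# Hypothetical packet record; a real tool would parse these fields from PCAP.
@dataclass
class Packet:
    ts: float      # arrival timestamp (seconds)
    src: str
    dst: str
    sport: int
    dport: int
    proto: int     # 6 = TCP, 17 = UDP
    length: int    # IP packet length in bytes

def flow_key(p):
    """Direction-insensitive 5-tuple: both directions map to one flow."""
    a = (p.src, p.sport)
    b = (p.dst, p.dport)
    return (min(a, b), max(a, b), p.proto)

def build_flows(packets, timeout=600.0):
    """Group packets into bidirectional flows, splitting on a 600 s timeout."""
    flows = defaultdict(list)
    for p in sorted(packets, key=lambda p: p.ts):
        k = flow_key(p)
        if flows[k] and p.ts - flows[k][-1].ts > timeout:
            # flow timed out: a full implementation would emit the old flow
            # and start a new one; the sketch simply starts over
            flows[k] = []
        flows[k].append(p)
    return flows

def forward_features(flow):
    """Per-direction statistics in the spirit of Table II (forward side only).
    The first packet of the flow determines the forward direction."""
    first = flow[0]
    fwd = [p for p in flow if (p.src, p.sport) == (first.src, first.sport)]
    lens = [p.length for p in fwd]
    iats = [b.ts - a.ts for a, b in zip(fwd, fwd[1:])]
    return {
        "total_fpackets": len(fwd),
        "total_fvolume": sum(lens),
        "min_fpktl": min(lens), "max_fpktl": max(lens),
        "mean_fpktl": mean(lens), "std_fpktl": pstdev(lens),
        "min_fiat": min(iats) if iats else 0.0,
        "max_fiat": max(iats) if iats else 0.0,
    }
```

The backward-direction features of Table II follow the same pattern with the packet filter inverted; crucially, the IP addresses and port numbers are used only to group packets into flows and never appear in the feature vector itself.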

Table II
FLOW BASED FEATURES EMPLOYED

Protocol (proto)
Number of packets in forward direction (total fpackets)
Number of packets in backward direction (total bpackets)
Min forward inter-arrival time (min fiat)
Std. deviation of forward inter-arrival times (std fiat)
Mean forward inter-arrival time (mean fiat)
Max forward inter-arrival time (max fiat)
Min forward packet length (min fpktl)
Max forward packet length (max fpktl)
Std. deviation of forward packet length (std fpktl)
Mean forward packet length (mean fpktl)

E. AdaBoost

AdaBoost (Adaptive Boosting) is a meta-learning algorithm in which a strong classifier is built from a linear combination of weak (simple) classifiers. It incrementally constructs a complex classifier by combining the outputs of possibly hundreds of simple classifiers using a voting scheme. These simple classifiers are called decision stumps: each examines the feature set and returns a decision tree with two leaves, where the root node evaluates the value of only one feature and the leaves are used for binary classification. Thus, each decision stump returns either +1 if the object is in class or -1 if it is out of class. AdaBoost is simple to implement, is known to work well on very large feature sets by selecting the features required for good classification, and has good generalization properties. However, it can be sensitive to the stopping criterion and may result in a complex architecture that is opaque. A more detailed explanation of the algorithm can be found in [14].

F. RIPPER

RIPPER (Repeated Incremental Pruning to Produce Error Reduction) is a rule based machine learning algorithm [13]. Rules are added to explain positive examples, such that if an instance is not covered by any rule it is classified as negative. In RIPPER, conditions are added to a rule to maximize an information gain measure [14]. The change in gain is defined as shown in Eq. 1:

Gain(R', R) = s \cdot \left( \log_2 \frac{N'_+}{N'} - \log_2 \frac{N_+}{N} \right)   (1)

where N is the number of instances covered by R and N_+ is the number of true positives among them; N' and N'_+ are defined similarly for R'; and s is the number of true positives covered by both R and R' after adding the condition. In short, the change in gain measures the reduction in the number of bits needed to encode a positive instance. Conditions are added to a rule until it covers no negative examples. Once a rule is grown, it is pruned back by deleting conditions in reverse order to find the rule that maximizes the rule value metric, Eq. 2:

rvm(R) = \frac{p - n}{p + n}   (2)

Table II (continued)

Duration of the flow (duration)
Number of bytes in forward direction (total fvolume)
Number of bytes in backward direction (total bvolume)
Min backward inter-arrival time (min biat)
Std. deviation of backward inter-arrival times (std biat)
Mean backward inter-arrival time (mean biat)
Max backward inter-arrival time (max biat)
Min backward packet length (min bpktl)
Max backward packet length (max bpktl)
Std. deviation of backward packet length (std bpktl)
Mean backward packet length (mean bpktl)

where p and n are the numbers of true and false positives, respectively. To measure the quality of a rule, minimum description length is used [14]. RIPPER stops adding rules when the description length of the rule base is 64 (or more) bits larger than the best description length found so far. A more detailed explanation of the algorithm can be found in [14].

G. C4.5

C4.5 is a decision tree based classification algorithm. A decision tree is a hierarchical data structure implementing the divide-and-conquer strategy of attribute based model building. It is an efficient non-parametric method that can be used both for classification and for regression. In non-parametric models, the input space is divided into local regions defined by a distance metric. In a decision tree, a local region is identified through a sequence of recursive splits in a small number of steps. A decision tree is composed of internal decision nodes and terminal leaves. Each internal node m implements a test function f_m(x) with discrete outcomes labeling the branches. This process starts at the root and is repeated until a leaf node is encountered; the value of the leaf constitutes the output. In the case of a decision tree for classification, the goodness of a split is quantified by an impurity measure. A split is pure if, for all branches, all instances choosing a branch belong to the same class after the split. One possible function to measure impurity is entropy, Eq. 3 [14]:

I_m = - \sum_{i=1}^{K} p_m^i \log_2 p_m^i   (3)

where p_m^i is the proportion of instances reaching node m that belong to class i.

If the split is not pure, the instances should be split further to decrease impurity, and there are multiple possible attributes on which a split can be performed. The total impurity after a split can be measured by Eq. 4 [14]:

I'_m = - \sum_{j=1}^{n} \frac{N_{mj}}{N_m} \sum_{i=1}^{K} p_{mj}^i \log_2 p_{mj}^i   (4)

where N_m is the number of instances reaching node m, N_{mj} is the number taking branch j, and p_{mj}^i is the proportion of those belonging to class i. When a tree is constructed, at each step the split chosen is the one that results in the largest decrease in impurity, i.e. the difference between the impurity of the data reaching node m, Eq. 3, and the total entropy of the data reaching its branches after the split, Eq. 4. This procedure is locally optimal, and hence carries no guarantee of finding the smallest decision tree. A more detailed explanation of the algorithm can be found in [14].
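The impurity computations of Eqs. 3 and 4 can be illustrated with a short sketch. The function names and the example class counts below are invented for illustration; they are not from the paper's implementation.

```python
from math import log2

def entropy(counts):
    """Eq. 3: impurity of a node from its per-class instance counts."""
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    return -sum(p * log2(p) for p in probs)

def split_entropy(branches):
    """Eq. 4: total impurity after a split, i.e. the weighted average
    entropy of the branches (each branch is a list of class counts)."""
    total = sum(sum(b) for b in branches)
    return sum(sum(b) / total * entropy(b) for b in branches)

# Illustrative example: a node holding 8 SSH and 8 non-SSH flows.
parent = entropy([8, 8])                 # 1.0 bit: maximally impure
after = split_entropy([[7, 1], [1, 7]])  # a fairly pure candidate split
gain = parent - after                    # the decrease C4.5 maximizes
```

A perfectly separating split (e.g. `[[8, 0], [0, 8]]`) drives the total impurity of Eq. 4 to zero, which is why C4.5 greedily prefers the attribute whose split yields the largest such decrease at each node.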

H. Rough Set

Rough set theory is a branch of set theory, a major area of research in mathematics with many interrelated subfields such as combinatorial set theory. Rough set theory was developed by Pawlak in 1982 [23]. Rough sets can be used to obtain preliminary knowledge from data, such as decision rules. Rough set theory is used for inducing minimal decision rules from labeled data and for reducing the number of attributes; it can also be used for classifying data or discovering structural relationships within data. The main contribution of rough sets to learning theory is the concept of reducts. A reduct is a minimal subset of attributes that leaves the classification relation unchanged when the computation is restricted to that subset. Rough set theory can thus classify data into sets of rules induced by selecting a minimum set of attributes. After reduct generation, the rules are computed automatically. In this work, a Genetic Algorithm is employed for the attribute reduction: genetic algorithms, a well-known and effective search and optimization technique, work very well on combinatorial problems such as reduct finding in rough set theory, and are believed to be effective for large decision system reduction [22]. The rules generated have an "IF-THEN" format. A more detailed explanation of the algorithm can be found in [23], [22].

IV. EXPERIMENTS AND RESULTS

In traffic classification, two metrics are typically used to quantify the performance of a classifier: Detection Rate (DR) and False Positive Rate (FPR). In this case, DR reflects the number of SSH flows correctly classified, whereas FPR reflects the number of non-SSH flows incorrectly classified as SSH. Naturally, a high DR and a low FPR are the desired outcomes. They are calculated as follows:

DR = 1 - (#FN classifications / total number of SSH classifications)

FPR = #FP classifications / total number of non-SSH classifications

where FN (False Negative) means SSH traffic classified as non-SSH traffic. Once the aforementioned feature vectors are prepared for the data sets, the RIPPER, AdaBoost, C4.5 and Rough Set based classifiers are trained on the Dal Training Samples. To this end, we have used Weka [17], an open source tool for data mining tasks, for the AdaBoost, RIPPER and C4.5 algorithms. We employed Weka with its default parameters to run all three algorithms on our data sets.
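The DR and FPR definitions above can be sketched directly; the function name and the toy labels are illustrative assumptions, not part of the paper's tooling.

```python
def dr_fpr(y_true, y_pred, positive="SSH"):
    """Detection Rate and False Positive Rate as defined above:
    DR = 1 - FN / (total SSH instances); FPR = FP / (total non-SSH instances).
    Any binary labeling works; "SSH" is the positive class here."""
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    n_pos = sum(1 for t in y_true if t == positive)
    n_neg = len(y_true) - n_pos
    return 1.0 - fn / n_pos, fp / n_neg

# Illustrative example: 4 SSH flows (one missed) and 4 non-SSH (one false alarm)
truth = ["SSH", "SSH", "SSH", "SSH", "non", "non", "non", "non"]
pred  = ["SSH", "SSH", "SSH", "non", "SSH", "non", "non", "non"]
dr, fpr = dr_fpr(truth, pred)  # dr = 0.75, fpr = 0.25
```

Note that DR is computed over the SSH population only and FPR over the non-SSH population only, so the two metrics are insensitive to the class balance of the test trace.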

For the Rough Set technique, we used Rosetta [24], a toolkit for analyzing tabular data within the framework of rough set theory. Figures 2 and 3 list the results for the four machine learning algorithms on the Dalhousie/NIMS training and test (validation) traces. All models tend to return an excellent detection rate and a very low false positive rate on the NIMS traces. On the Dalhousie traces, C4.5 and RIPPER appear to provide the stronger performance, with consistently better in- and out-class FPR and DR under both the Dalhousie training and validation partitions. Conversely, the Rough Set method was the weaker classifier on the Dalhousie data. In all cases, the approach adopted for attribute selection was to include as wide a set as possible and let the 'embedded' properties of the various learning algorithms establish which subset of attributes to actually employ. Given this capability, we are now in a position to review the attributes selected by each model; this is readily achieved class-wise in the case of the C4.5 and Rough Set methods, while RIPPER builds the model for one class. The summary for AdaBoost is not as straightforward and will therefore be limited to the total set of attributes utilized, independent of class. Table III summarizes these findings for the AdaBoost, RIPPER, Rough Set and C4.5 algorithms. AdaBoost uses a lower count of attributes for SSH detection relative to the other classifiers; conversely, Rough Set uses the largest set of attributes as a whole. Each classifier also identifies attributes unique to its solution. Also of interest is the high level of overlap in shared attributes: Table IV lists the attributes shared by three or more classifiers. These attributes give more insight into the behavior (blueprint) of the encrypted traffic. Intuitively, what these algorithms learned from the data makes sense, given that SSH is an interactive (user-machine) protocol.
In order to correctly identify SSH traffic, the classifiers naturally need to explore both directions: each direction has its own signature, given that a client and a server operate differently. Thus, we believe that the attributes listed in Table III are what the machine learning algorithms discovered to represent the behavior of the client and the server sides of an SSH session. They are separated in two: (i) attributes of the forward direction represent the traffic from client to server; and (ii) attributes of the backward direction represent the traffic from server to client. Moreover, for SSH traffic on the Dalhousie training data set, the C4.5 classifier model generates 13 rules, the AdaBoost classifier model generates 9 rules, the RIPPER classifier model generates 11 rules, and the Rough Set based method generates 372,195 rules. In short, the models/rules employed by AdaBoost, RIPPER and C4.5 appear to be much simpler than those of the Rough Set method, without trading off classification accuracy.

Figure 2. Results on the training and testing data (Detection Rate)

Figure 3. Results on the training and testing data (False Positive Rate)

Table IV
COMMON FEATURES USED BY 3 OR MORE CLASSIFIERS ON DALHOUSIE TRAINING DATA SETS

min fpktl, max fpktl, mean fpktl, std fpktl, min fiat, max fiat, min bpktl, max bpktl, mean bpktl, std bpktl, min biat, max biat, total fpackets, total fvolume, duration

V. CONCLUSION AND FUTURE WORK

In this work, we investigate the performance of the models/rules generated by the AdaBoost, C4.5, RIPPER and Rough Set based learning algorithms for distinguishing SSH traffic from non-SSH traffic in a given traffic trace. To do so, we employ traffic traces captured on our Dalhousie campus network and traces generated in our lab, and we evaluate the aforementioned learning algorithms using traffic flow based features. Results show that the rules generated by the C4.5 based classifier perform better than those of the other three classifiers on the above data sets; even in the worst case, the rules generated by the C4.5 based classifier provide the strongest class-wise performance. Moreover, this work identified 15 major features, shared by at least 3 of the 4 learning algorithms and based on duration, direction, inter-arrival time and packet length, that distinguish an encrypted application from other applications. For future work, we are interested in investigating our approach on other encrypted applications, such as Skype traffic, and in testing its robustness on more data sets.

Table III
FEATURES USED BY EACH CLASSIFIER FOR IN-CLASS ON TRAINING DATA SETS

C4.5: max bpktl, std bpktl, total fvolume, total fpackets, max biat, min fpktl, max fiat, total bvolume, mean fpktl, max fpktl, mean bpktl, proto, std fiat, std fpktl, min bpktl, min fiat, min biat, total bpackets

AdaBoost: max bpktl, max biat, std fpktl, mean bpktl, duration, min fpktl, total fpackets, max fpktl, min biat, min fiat

RIPPER: max bpktl, std bpktl, max biat, mean bpktl, std fpktl, total fpackets, max fpktl, min fpktl, min bpktl, mean fpktl, total fvolume, max fiat, duration, min fiat, min biat

Rough Set: min bpktl, mean bpktl, max bpktl, std bpktl, min fpktl, mean fpktl, max fpktl, std fpktl, min biat, mean biat, max biat, std biat, min fiat, mean fiat, max fiat, std fiat, duration, proto, total fpackets, total bpackets, total bvolume, total fvolume

ACKNOWLEDGMENT

This work was in part supported by MITACS, NSERC and the CFI New Opportunities program. Our thanks to the Dalhousie UCIS team for providing us the anonymized Dalhousie traffic traces. All research was conducted at the Dalhousie Faculty of Computer Science NIMS Laboratory, http://www.cs.dal.ca/projectx.

REFERENCES

[1] Wright C., Monrose F., Masson G. M., "HMM Profiles for Network Traffic Classification", Proceedings of the ACM DMSEC, pp. 9-15, 2004.
[2] Haffner P., Sen S., Spatscheck O., Wang D., "ACAS: Automated Construction of Application Signatures", Proceedings of the ACM SIGCOMM, pp. 197-202, 2005.
[3] Moore A. W., Zuev D., "Internet Traffic Classification Using Bayesian Analysis Techniques", Proceedings of the ACM SIGMETRICS, pp. 50-60, 2005.
[4] Moore A., Papagiannaki K., "Toward the Accurate Identification of Network Applications", Proceedings of the Passive & Active Measurement Workshop, 2005.
[5] Karagiannis T., Papagiannaki K., Faloutsos M., "BLINC: Multilevel Traffic Classification in the Dark", Proceedings of Applications, Technologies, Architectures, and Protocols for Computer Communications, pp. 229-240, 2005.
[6] Bernaille L., Teixeira R., Akodkenou I., "Traffic Classification on the Fly", ACM SIGCOMM Computer Communication Review, 2006.
[7] Erman J., Arlitt M., Mahanti A., "Traffic Classification using Clustering Algorithms", Proceedings of the ACM SIGCOMM, pp. 281-286, 2006.

[8] Zhang Y., Paxson V., "Detecting Backdoors", Proceedings of the 9th USENIX Security Symposium, pp. 157-170, 2000.
[9] Dreger H., Feldmann A., Mai M., Paxson V., Sommer R., "Dynamic Application-Layer Protocol Analysis for Network Intrusion Detection", Proceedings of the 15th USENIX Security Symposium, pp. 257-272, 2006.
[10] Early J., Brodley C., Rosenberg C., "Behavioral Authentication of Server Flows", Proceedings of the 19th Annual Computer Security Applications Conference, pp. 46-55, 2003.
[11] Wright C. V., Monrose F., Masson G. M., "On Inferring Application Protocol Behaviors in Encrypted Network Traffic", Journal of Machine Learning Research, (7), pp. 2745-2769, 2006.
[12] Williams N., Zander S., Armitage G., "A Preliminary Performance Comparison of Five Machine Learning Algorithms for Practical IP Traffic Flow Classification", ACM SIGCOMM Computer Communication Review, Vol. 36, No. 5, pp. 5-16, 2006.
[13] Cohen W. W., "Fast Effective Rule Induction", Proceedings of the 12th International Conference on Machine Learning, pp. 115-123, 1995.
[14] Alpaydin E., "Introduction to Machine Learning", MIT Press, ISBN: 0-262-01211-1.
[15] NetMate, http://www.ip-measurement.org/tools/netmate/.
[16] IETF, http://www3.ietf.org/proceedings/97apr/97aprfinal/xrtftr70.htm.
[17] WEKA Software, http://www.cs.waikato.ac.nz/ml/weka/.

[18] Alshammari R., Zincir-Heywood A. N., "A Flow Based Approach for SSH Traffic Detection", Proceedings of the IEEE International Conference on Systems, Man and Cybernetics (ISIC), pp. 296-301, 7-10 Oct. 2007.
[19] PacketShaper, http://www.packeteer.com/products/packetshaper/.
[20] l7-filter, http://l7-filter.sourceforge.net/.
[21] Bernaille L., Teixeira R., "Early Recognition of Encrypted Applications", Passive and Active Measurement Conference (PAM), Louvain-la-Neuve, Belgium, April 2007.
[22] Wroblewski J., "Finding Minimal Reducts Using Genetic Algorithm", Proceedings of the Second Annual Joint Conference on Information Sciences, Wrightsville Beach, NC, pp. 186-189.
[23] Pawlak Z., "Rough Sets", International Journal of Computer and Information Sciences, 11:341-356, 1982.
[24] Ohrn A., "ROSETTA Technical Reference Manual", Department of Computer and Information Science, Norwegian University of Science and Technology, Trondheim, Norway.
[25] Alshammari R., Zincir-Heywood A. N., "A Preliminary Performance Comparison of Two Feature Sets for Encrypted Traffic Classification", IEEE Computational Intelligence in Security for Information Systems (CISIS 2008), pp. 203-210, October 2008.
[26] Alshammari R., Zincir-Heywood A. N., "Investigating Two Different Approaches for Encrypted Traffic Classification", Sixth Annual Conference on Privacy, Security and Trust (PST '08), pp. 156-166, 1-3 Oct. 2008.
