JOURNAL OF COMMUNICATIONS AND NETWORKS, VOL. 18, NO. 5, OCTOBER 2016
A Probabilistic Sampling Method for Efficient Flow-based Analysis
Zahra Jadidi, Vallipuram Muthukkumarasamy, Elankayer Sithirasenan, and Kalvinder Singh
Abstract: Network management and anomaly detection are challenging in high-speed networks due to the high volume of packets that have to be analysed. Flow-based analysis is a scalable method which reduces the high volume of network traffic by dividing it into flows. As sampling methods are extensively used in flow generators such as NetFlow, the impact of sampling on the performance of flow-based analysis needs to be investigated. Monitoring using sampled traffic is a well-studied research area; however, the impact of sampling on flow-based anomaly detection is poorly researched. This paper investigates flow sampling methods and shows that these methods have a negative impact on flow-based anomaly detection. Therefore, we propose an efficient probabilistic flow sampling method that can preserve the flow traffic distribution. The proposed sampling method takes into account two flow features: destination IP address and octet. The destination IP addresses are sampled based on the number of received bytes. Our method provides efficient sampled traffic which has the traffic features required for both flow-based anomaly detection and monitoring. The proposed sampling method is evaluated using a number of generated flow-based datasets. The results show an improvement in the number of preserved malicious flows.
Index Terms: Anomaly detection, destination IP address, flow-based analysis, monitoring, octet, probabilistic sampling.
Manuscript received May 15, 2015; approved for publication by Editor Vangelis Angelakis, Division III Editor, June 9, 2016.
Z. Jadidi, V. Muthukkumarasamy, E. Sithirasenan, and K. Singh are with the School of Information and Communication Technology, Griffith University, QLD, Australia, emails: {zahra.jadidi, v.muthu, e.sithirasenan}@griffithuni.edu.au, [email protected].
Digital object identifier 10.1109/JCN.2016.000110
I. INTRODUCTION
Due to the exponential growth of network throughput, scalable methods are required to analyse the high volume of network traffic. Flow analysis based on packet header information is an efficient method for traffic management in high-speed networks. However, with the increasing number of flows in modern high-speed networks, it is nearly impossible to analyse the network traffic using conventional methods. Sampling is an important method for reducing the number of flows to be investigated. Traditional sampling methods are mostly useful for monitoring purposes; however, these sampling methods severely decrease the effectiveness of anomaly detection [1]. This impact of sampling on the performance of anomaly detection is the gap that is addressed in this study.
A flow is a set of packets with a number of common features such as source IP address, source port, destination IP address, destination port, and protocol [2]. Recently, researchers have considered flow-based anomaly detection as an efficient method for detecting volume attacks which cause a considerable change in flow traffic volume, for example, denial of service (DoS) attacks, distributed DoS (DDoS) attacks, worms, scans, and botnets [3], [4]. However, flow-based anomaly detection methods have to operate on sampled traffic. Sampling methods are widely used in flow-based analysers to adapt flow traffic to the memory budget and to decrease CPU usage. However, the distortion of traffic features and the changes in the traffic characteristics caused by sampling may increase the error rate of anomaly detection algorithms [5], [6].
Two categories of sampling methods, packet sampling and flow sampling, are used in computer networks. Flow sampling is more efficient than packet sampling in terms of preserving the characteristics of traffic [5]. In this study, we propose a probabilistic flow sampling method which is evaluated with a number of generated flow-based datasets. This method uses the destination IP addresses of the flows to form flow groups. A specific threshold is used to determine the desired groups, which are sampled with a probability based on their byte numbers. The proposed method decreases the information loss. It is useful for flow-based analysis, including monitoring and anomaly detection. Fig. 1 shows the architectural components of this study.
Fig. 1. Architectural components.
A key advantage of the proposed method is that it can sample all victim IPs which are the destination of volume anomalies. Several flow-based datasets are used in this study to evaluate the performance. CAIDA DDoS datasets, CAIDA traces 2013, and CAIDA traces 2012 are packet-based datasets which are used to generate flow-based datasets for the evaluation [7]–[9].
The remainder of this paper is organized as follows. Section II critically examines the motivation and related works and describes the architectural components of this study. Section III describes the intrinsic characteristics of flow-based datasets.
Section IV discusses the proposed sampling method. Section V provides the experimental results and discusses the strengths and limitations of the proposed solution. Section VI examines the complexity of the sampling methods. Section VII concludes the paper.
© 2016 KICS 1229-2370/16/$10.00
JADIDI et al.: A PROBABILISTIC SAMPLING METHOD FOR EFFICIENT FLOW-BASED...
II. MOTIVATION
The volume of flow traffic in high-speed networks is too large to be analysed using traditional flow-based analysers. An overview of flow-based intrusion detection, including the limitations of this method, is provided in [10]. Various flow-based anomaly detection systems [11]–[14] have been proposed, and they mostly assume that the available resources are sufficient to process all flows. However, flow generators such as NetFlow, which is proprietary to Cisco, widely use sampling methods to decrease the amount of resources required. This study investigates the impact of sampling on flow-based analysis.
A. Existing Sampling Methods
Sampling is a popular way to reduce data. NetFlow uses sampling techniques in order to collect and analyse the huge amount of flow traffic. Packet sampling and its application to network traffic measurement are studied in [15], [16]. An adaptive packet sampling approach [17] is used to achieve more accurate measurements of network traffic. However, packet sampling distorts the distribution of flow features. Flow sampling is more accurate than packet sampling, but it requires more memory and CPU power [5]. In this regard, a number of methods such as smart sampling [18] and sample-and-hold [19] have been proposed to decrease the memory requirements.
The flow size is defined as the number of bytes in a flow. Although flow sampling methods have merit in preserving the flow distribution, traditional sampling methods favour large flows. Therefore, these traditional methods severely degrade flow-based anomaly detection, as malicious flows are most often small [20]. Selective sampling [21] attempts to improve anomaly detection, but it is suitable only for small flows and hence, significant information is lost.
The performance of a wavelet-based volume anomaly detection method and two port scan detection algorithms is investigated using four sampling methods in [1]. All of the sampling methods have a destructive impact on the detection rate and the false-positive rate of the anomaly detection method. Sampling methods are also compared in [5]. According to the comparison, sampling methods often focus on traffic monitoring and do not preserve the traffic features required for anomaly detection. An adaptive and feature-aware sampling technique designed for anomaly detection methods is proposed in [5]. The impact of two opportunistic sampling methods, selective sampling and smart sampling, on anomaly detection is studied in [20]. The entropy changes during a number of attacks are considered in these opportunistic methods. According to the results, selective sampling is suitable for small-sized anomalies, while large-sized anomalies are detectable by smart sampling. The impact of sampling methods on traffic classification using machine learning methods is studied in [22]–[24], in which a solution is proposed to reduce the negative impact. Flow-based anomaly detection using sampled traffic is an open issue, and this study addresses it by proposing a probabilistic flow sampling method.
B. Architectural Components
The two main modules of this study are flow extraction and sampling (Fig. 1).
B.1 Flow Extraction Module
The flow extraction module uses packet-based traffic to generate flow traffic. This module simulates the NetFlow components, the NetFlow exporter and the NetFlow collector, using Softflowd and Flowd [25]. Softflowd [26] reads a captured packet-based file and generates flow records. Then, it sends these records to the NetFlow collector, Flowd [27]. Flowd-reader reads the following NetFlow fields: source/destination IP, source/destination port, packets, octets, flags, and protocol. Softflowd and Flowd are used in this study to generate the required flow-based datasets. Then, the sampling module applies the proposed sampling method to these datasets.
B.2 Sampling Module
The accuracy of flow-based anomaly detection methods is negatively affected by sampling methods because of changes in the distribution of flows. Smart sampling is a probabilistic method proposed for monitoring. This method gives a low probability to small flows, which are often the source of volume attacks [1], [20]. For anomaly detection, selective sampling, which is biased toward small flows, has been proposed. Both of these methods lose many informative flows, as each serves a single purpose. This study proposes a flow sampling method to address the weaknesses of these methods.
Our proposed method is a probabilistic sampling method in which stratified sampling is used to calculate the probability. Stratified sampling divides the population into a number of groups, each of which corresponds to a stratum. In our probabilistic sampling method, two main flow features, destination IP address and octet, are used. Our method divides the flow traffic into groups of flows with the same destination IP, and then samples are taken from each group. In this method, IPs which receive more bytes than a predefined threshold are sampled with greater priority. The sampled traffic is suitable for use by flow-based analysers for monitoring and anomaly detection purposes.
III. INTRINSIC CHARACTERISTICS OF FLOW TRAFFIC
Flow-based datasets are required for the evaluation of the proposed sampling method. In this study, CAIDA traces 2013, CAIDA traces 2012, and CAIDA DDoS datasets are used to generate flow-based datasets for the evaluation of the proposed method [7]–[9]. CAIDA datasets are very large; therefore, subsets of these datasets are randomly selected in this study. Table 1 shows the detailed information of the generated flow-based datasets. Each flow has eight features: a) source IP address, b) source port, c) destination IP address, d) destination port, e) packets, f) octets, g) TCP flags, and h) IP protocol. The NetFlow simulators, Softflowd and Flowd, generate the flow-based datasets in this study.
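To make the eight flow features concrete, a flow record could be represented as in the following sketch. This is only an illustration: the class name `FlowRecord`, the field names, and the example values are assumptions introduced here and do not reproduce the Flowd record format.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class FlowRecord:
    """One flow record with the eight features listed in the text (names are illustrative)."""
    src_ip: str      # source IP address
    src_port: int    # source port
    dst_ip: str      # destination IP address
    dst_port: int    # destination port
    packets: int     # number of packets in the flow
    octets: int      # flow size in bytes
    tcp_flags: int   # TCP flags
    protocol: int    # IP protocol number, e.g. 6 for TCP


# Hypothetical record; every value below is invented for illustration.
example = FlowRecord("10.0.0.1", 51234, "10.0.0.2", 80, 3, 180, 0x02, 6)
print(len(fields(FlowRecord)), example.dst_ip, example.octets)
```

Only `dst_ip` and `octets` are needed by the proposed sampling method; the remaining fields are kept so that the sampled records stay usable by a flow-based analyser.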
Table 1. Generated flow-based CAIDA datasets.
| Dataset | Total flows | Total destination IPs |
|---|---|---|
| Flow CAIDA DDoS | 6,730,479 | 225 |
| Flow CAIDA traffic traces 2013 | 1,273,303 | 337 |
| Flow CAIDA traffic traces 2012 | 225,177 | 304 |
A. Characteristics of Flow Sizes
An important flow feature is "octet", which is the number of bytes carried by a flow record and is called the flow size. Bytes in flow-based traffic follow a heavy-tailed distribution [22]: a large number of the flows are small, but a small number are large. Small flows carry few bytes, while large flows carry many bytes (see Fig. 2). Fig. 2 shows the distribution of flows in the CAIDA datasets. According to this figure, most DDoS flows and benign flows are small. The cumulative number of bytes shows the number of bytes included in the flows so far. Traditional sampling methods are biased toward large flows, which are important for traffic monitoring, and they do not sample small flows effectively. On the other hand, a number of methods focus on small flows to improve anomaly detection, but they lose many large flows [20].
B. Distribution of Destination IP Addresses in Flow Traffic
Small flows often carry anomaly information, while large flows carry the majority of the network information [20]. For efficient sampling, both large and small flows should be preserved. Our proposed method uses two flow features: destination IP address and octet. The destination IP address is an important feature of a flow, because IPs which receive a considerable number of flows and bytes are important for monitoring and for anomaly detection. The proposed method can sample both small and large flows based on the sum of bytes targeting a particular destination IP address.
Definition 1: If $dstIP_i$ denotes the $i$-th destination IP address, the sum of all octets (bytes) targeting this particular destination IP address is denoted by $s(o \mid dstIP_i)$, defined in (1), where $n$ is the number of flows with destination $dstIP_i$, $m$ is the number of destination IP addresses, and $dstIP_i \in \{dstIP_1, dstIP_2, \ldots, dstIP_m\}$.
$$ s(o \mid dstIP_i) = \sum_{j=1}^{n} octet_j(dstIP_i) \tag{1} $$
Definition 2: The sum of bytes targeting all destination IP addresses is denoted by $s_{total}$, defined in (2).
$$ s_{total} = \sum_{i=1}^{m} s(o \mid dstIP_i) \tag{2} $$
This parameter is used in this study to normalize the probability of each $dstIP_i$ whose $s(o \mid dstIP_i)$ is larger than a particular threshold. In this regard, $s(o \mid dstIP_i)/s_{total}$ provides the normalized probability. The sum of all normalized probabilities is 1. This guarantees that all destination IP addresses with normalized probabilities will be sampled at least once. In the proposed method, it is assumed that $l$ is the number of samples from destination IPs receiving $s(o \mid dstIP_i)$ above the threshold. Therefore, $p(dstIP_i) \times l$ gives the number of flows that our proposed method samples from the flows targeting $dstIP_i$.
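The two definitions can be illustrated with a short sketch that computes $s(o \mid dstIP_i)$ for every destination and then $s_{total}$. The function name `bytes_per_destination` and the toy flows are assumptions made for this example; they are not taken from the datasets used in the study.

```python
from collections import defaultdict


def bytes_per_destination(flows):
    """Return {dst_ip: s(o|dstIP_i)}: the sum of octets over all flows sharing a destination IP (Eq. 1)."""
    totals = defaultdict(int)
    for dst_ip, octets in flows:
        totals[dst_ip] += octets
    return dict(totals)


# Toy flows as (destination IP, octets) pairs; all values are invented.
flows = [("A", 100), ("B", 40), ("A", 300), ("C", 60)]

s = bytes_per_destination(flows)
s_total = sum(s.values())                              # Eq. (2)
normalized = {ip: v / s_total for ip, v in s.items()}  # s(o|dstIP_i) / s_total
print(s, s_total, normalized)
```

In this toy input, destination "A" receives most of the bytes, so it obtains the largest normalized share, mirroring the observation that a few IP addresses receive the majority of bytes.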
Volume anomalies such as DDoS attacks, worms, and port scans cause large changes in the volume of the flow traffic targeting victim destination IP addresses [28]. These anomalies generate a large number of bytes. An ideal sampling method should preserve the victim IP addresses which are attacker targets. In addition, IP addresses which receive large flows, carrying a great number of bytes, are important for monitoring purposes. Both large flows and volume anomalies cause an increase in the number of bytes received by the destination IP addresses. In this study, informative IP addresses are defined as IP addresses receiving more bytes than a particular threshold. The proposed method samples informative IP addresses with greater priority and can preserve the distribution of flows. Fig. 3 shows s(o|dstIPi ) versus destination IP address for different flow-based datasets. IP addresses are anonymized in this figure. According to this figure, a few IP addresses receive the majority of bytes and the others receive a small number of bytes. The logarithms of s(o|dstIPi ) in the CAIDA DDoS datasets shown in Fig. 3 confirm that a number of IPs receive more DDoS bytes. Therefore, using a suitable threshold helps to preserve as much information as possible.
IV. PROPOSED SAMPLING METHOD
The proposed sampling method is a probabilistic method in which each flow is sampled with the probability p shown in (3), where t is the threshold.
$$ p(dstIP_i) = \begin{cases} b, & s(o \mid dstIP_i) < t \\ \dfrac{s(o \mid dstIP_i)}{s_{total}}, & s(o \mid dstIP_i) \ge t \end{cases} \tag{3} $$
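As a small illustration of (3), the probability assigned to a destination IP can be written as a single function. The function name `sampling_probability` and the numeric values in the usage lines are assumptions chosen for this sketch; the constraint 0 < b ≤ 1 is the one stated for the constant b in the text.

```python
def sampling_probability(s_dst, s_total, t, b):
    """Eq. (3): constant probability b below the threshold t, normalized byte share otherwise."""
    if not 0 < b <= 1:
        raise ValueError("b must satisfy 0 < b <= 1")
    if s_total <= 0:
        raise ValueError("s_total must be positive")
    if s_dst < t:
        return b
    return s_dst / s_total


# Hypothetical values: two destinations in traffic totalling 500 bytes, threshold 100.
print(sampling_probability(400, 500, t=100, b=0.01))  # above the threshold
print(sampling_probability(40, 500, t=100, b=0.01))   # below the threshold
```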
IP addresses which receive bytes above the threshold are sampled with the normalized probability. All of these informative IPs are sampled at least once. This is very helpful for anomaly detection, as our proposed method can sample all victim destination IP addresses. The number of samples from each informative IP address is directly proportional to the number of bytes that it receives. Therefore, in anomaly situations, the number of samples from the victim IPs is increased. For IP addresses receiving fewer bytes than the threshold, this method uses a constant value $b$, $0 < b \le 1$. A small value of $b$ is desirable in order to decrease the sampling probability in this region. The pseudo-code of the proposed sampling method is shown in Fig. 4. This pseudo-code is for a flow group in which the destination IP is $dstIP_i$.
The sampling rate in NetFlow is a fixed (static) rate which is based on the worst-case situation [10]. This study also uses a static sampling rate, which is adjusted by the threshold and $b$. To the best of our knowledge, the use of destination IP addresses has not previously been considered for the improvement of anomaly detection. Using the threshold, the proposed method filters out unimportant destination IPs. Therefore, the majority of samples are selected from the desired area, which is above the threshold. As a result, flows are preserved efficiently.
The sampled traffic can be used for flow-based anomaly detection, which is useful for detecting volume anomalies. In the following sections, the performance of the proposed method is compared with that of other sampling methods at the same sampling rate. The proposed method decreases the amount of information loss compared with the other methods.
Fig. 2. Distribution of flow-based CAIDA datasets: (a) Distribution of flow sizes (bytes) in CAIDA traces 2013, (b) distribution of flow sizes (bytes) in CAIDA traces 2012, and (c) distribution of flow sizes (bytes) in CAIDA DDoS.
Fig. 3. Distribution of destination IP addresses in flow CAIDA datasets based on their received bytes: (a) Distribution of flow CAIDA traces 2013, (b) distribution of flow CAIDA traces 2012, (c) distribution of flow CAIDA DDoS, and (d) distribution of logarithms of bytes in flow CAIDA DDoS. Axes: destination IP addresses (horizontal) versus s(o|dstIP) in (a)–(c) and Log(s(o|dstIP)) in (d).
Table 2. Comparison of different sampling methods in preserving malicious flows.
| | Flow-based dataset 1 | Flow-based dataset 2 |
|---|---|---|
| Source datasets | CAIDA_DDoS, CAIDA_traces_2013 | CAIDA_DDoS, CAIDA_traces_2012 |
| Sampling rate | 27,000 | 27,000 |
| Preserved DDoS flows in selective sampling | 10,255 | 8,952 |
| Preserved DDoS flows in smart sampling | 6,101 | 4,672 |
| Preserved DDoS flows in the proposed method | 9,989 | 7,654 |
while (there is $dstIP_i$) do
    Compute $s(o \mid dstIP_i) = \sum_{j=1}^{n} octet_j(dstIP_i)$ for the $dstIP_i$
    if ($s(o \mid dstIP_i) \ge$ threshold)
        Sample flows with a normalized probability
    else
        Sample flows with a constant probability $b$
    end if
end while
Fig. 4. Pseudo-code of the proposed sampling method.
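The pseudo-code of Fig. 4 can be turned into a runnable sketch as follows. Several details are assumptions made only for this illustration, since the pseudo-code does not fix them: flows are reduced to `(dst_ip, octets)` pairs, a group above the threshold contributes about p(dstIPi) × l flows (at least one, at most the group size) drawn uniformly at random, and each flow below the threshold is kept independently with probability b. The function name `sample_flows` and the `seed` parameter are likewise hypothetical.

```python
import random
from collections import defaultdict


def sample_flows(flows, t, b, l, seed=0):
    """Sketch of Fig. 4 for flows given as (dst_ip, octets) pairs; returns the sampled flows."""
    rng = random.Random(seed)

    # Group flows by destination IP (one stratum per destination).
    groups = defaultdict(list)
    for flow in flows:
        groups[flow[0]].append(flow)

    # s(o|dstIP_i) for every group and s_total over all groups.
    s = {ip: sum(octets for _, octets in group) for ip, group in groups.items()}
    s_total = sum(s.values())

    sampled = []
    for ip, group in groups.items():
        if s[ip] >= t:
            # Assumption: take p * l flows, at least one and at most the group size.
            k = min(len(group), max(1, round(s[ip] / s_total * l)))
            sampled.extend(rng.sample(group, k))
        else:
            # Assumption: independent draw with constant probability b per flow.
            sampled.extend(f for f in group if rng.random() < b)
    return sampled


# Toy input: "A" receives 5,000 bytes (above t), "B" receives 500 bytes (below t).
toy = [("A", 100)] * 50 + [("B", 10)] * 50
picked = sample_flows(toy, t=1000, b=0.1, l=10)
print(sum(1 for ip, _ in picked if ip == "A"), "flows sampled for A")
```

With this reading, every destination above the threshold is represented by at least one sampled flow, which is the property the text relies on for preserving victim IP addresses.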
A. Other Probabilistic Flow Sampling Methods
Two probabilistic flow sampling methods, smart sampling and selective sampling, are used in this study for the evaluation of our proposed method.
• Smart sampling: It calculates the probability of each flow based on the flow size [20]. The probability of each flow is defined in (4), where $x$ is the flow size in bytes and $t$ is a threshold. In this method, flows larger than the threshold are sampled with probability 1. On the other hand, the probabilities of flows smaller than the threshold are proportional to their sizes. This method cannot sample small malicious flows effectively.
$$ p(x) = \begin{cases} x/t, & x < t \\ 1, & x \ge t \end{cases} \tag{4} $$
• Selective sampling: It is biased toward small flows, which are the source of a large number of anomalies [20], [21].
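The smart sampling rule described above (probability 1 for flows at or above the threshold, and a probability proportional to the flow size below it) can be sketched as follows. The function name `smart_sampling_probability` and the example sizes are assumptions used only for illustration.

```python
def smart_sampling_probability(x, t):
    """Smart sampling: flows of size x >= t are always kept; smaller flows are kept with probability x / t."""
    if t <= 0:
        raise ValueError("t must be positive")
    return 1.0 if x >= t else x / t


# Hypothetical sizes in bytes with a threshold of 6,000 bytes.
print(smart_sampling_probability(60, 6000))    # small flow
print(smart_sampling_probability(9000, 6000))  # large flow
```

The first call shows why small malicious flows are rarely kept by this rule: a flow that is one hundredth of the threshold is sampled with probability one in a hundred.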
V. EXPERIMENTAL RESULTS AND ANALYSIS
The efficiency of the proposed flow sampling method in preserving traffic information is compared with that of smart and selective sampling in this section. The number of sampled flows in the proposed method is directly proportional to the number of bytes that each IP address receives. In volume anomalies such as DDoS attacks, victim IP addresses receive a large number of malicious flows and hence, this method samples these anomalies with high priority. The same situation occurs for IP addresses which receive large flows with a large number of bytes. The performance of the proposed sampling method in preserving DDoS flows and large flows is evaluated in this section.
Table 2 shows the performance of different sampling methods in preserving DDoS flows from the flow-based CAIDA datasets. Two datasets are randomly selected from the large number of flows in Table 1. Each of these datasets has 270,000 flows, which are the input of the sampling methods. The ratio of malicious to benign flows is 1:5 for both datasets. As CAIDA traces 2012 contain only 225,177 flows, each selected dataset has 270,000 flows in order to obtain the 1:5 ratio. A fixed sampling rate of 27,000 flows is selected in this study. Therefore, the sampling algorithms sample 0.1 of the flow records based on their probabilities.
Selective sampling improves anomaly detection and it is always biased toward small flows, which are the source of a large number of anomalies [20]. According to Table 2, the results from the proposed sampling method and selective sampling are very close. In both datasets, they sample considerably more DDoS flows than smart sampling. This experiment is repeated with ratios of 1:2 and 1:10, and the results confirm the similarity of the proposed method and selective sampling in preserving DDoS flows. Therefore, our method is almost as effective as selective sampling for preserving DDoS flows. However, the advantage of the proposed method is in preserving large flows and all informative destination IPs.
According to [20], some anomalies generate large flows, for example, a large number of requests to download a particular new service released for Linux. In addition, a number of DDoS flows are large (see Fig. 2). These large anomalies carry many bytes, but there are only a small number of them in the flow traffic. The destination IPs of these anomalies receive more bytes than the threshold in the proposed sampling method; therefore, they are sampled with the normalized probability. As these anomalies are large, they are not sampled by the selective sampling method [20].
Smart sampling is biased toward large flows and it does not sample malicious flows effectively [5], [20]. In spite of its poor anomaly detection, smart sampling can preserve the majority of bytes because it samples large anomalies. According to Fig. 5, the results from the proposed sampling and smart sampling in terms of preserving bytes are close.
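The experimental setup can be checked with a few lines of arithmetic. The sketch below assumes that the counts in Table 2 are the DDoS flows found among the 27,000 sampled flows; the variable names are hypothetical and the derived percentages are not reported in the original text.

```python
total_flows = 270_000
malicious = total_flows // 6       # ratio 1:5, so one flow in six is malicious
benign = total_flows - malicious
sample_size = 27_000               # fixed sampling rate used in the experiments

# Preserved DDoS flows from Table 2 as (dataset 1, dataset 2).
preserved = {
    "selective": (10_255, 8_952),
    "smart": (6_101, 4_672),
    "proposed": (9_989, 7_654),
}

# Assumption: share of the sampled flows that are DDoS flows, in percent.
share = {
    method: tuple(round(100 * count / sample_size, 1) for count in counts)
    for method, counts in preserved.items()
}
print(malicious, benign, sample_size / total_flows)
print(share)
```

Under this reading, the benign part of each dataset is 225,000 flows, which is consistent with the 225,177 flows available in CAIDA traces 2012.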
The proposed method has merit in sampling both small and large flows. Therefore, it can sample both small and large anomalies, and hence, it preserves more DDoS bytes than selective sampling, which most often samples small anomalies. The proposed sampling method can sample all destination IP addresses which receive DDoS attacks. The high performance of our method is shown in Fig. 5.
Fig. 5. Comparison of preserved bytes.
A challenging issue in sampling methods is the number of unsampled attacked IP addresses. An ideal sampling method should be able to preserve the informative attacked IPs that are important for flow-based anomaly detection. Table 3 compares the number of sampled victim destination IPs. The highest number of unsampled victim destination IP addresses belongs to smart sampling, which loses many attacks. However, the proposed method can preserve all informative victim destination IP addresses.
Table 3. Comparison of sampled victim IPs.
| Dataset | Sampling method | Sampled victim IPs | Sampled informative victim IPs |
|---|---|---|---|
| Dataset 1 | Original traffic | 207 | 104 |
| Dataset 1 | Selective sampling | 115 | 71 |
| Dataset 1 | Smart sampling | 77 | 49 |
| Dataset 1 | Proposed sampling | 101 | 104 |
| Dataset 2 | Original traffic | 212 | 92 |
| Dataset 2 | Selective sampling | 127 | 67 |
| Dataset 2 | Smart sampling | 88 | 54 |
| Dataset 2 | Proposed sampling | 106 | 92 |
VI. SAMPLING COMPLEXITY
The processing times of the proposed sampling method are compared with those of the smart and selective sampling methods using 10 different datasets. In this regard, 10 subsets of the CAIDA datasets are randomly selected. The sampling rate is 27,000 for all of these datasets. The processing times are shown in Fig. 6. The results illustrate that the proposed method needs less processing time than the other methods.
Fig. 6. Processing times of sampling methods with ten trials. Axes: time windows 1 to 10 (horizontal) versus processing time in seconds (vertical); curves: smart sampling, selective sampling, and proposed sampling.
The proposed sampling method is also evaluated in preserving traffic information with different sampling rates, and the results are compared with other sampling methods (see Fig. 7). The percentage of preserved bytes in the proposed method is similar to that in smart sampling for all sampling rates. However, with respect to sampled malicious bytes, the performance of the proposed method is significantly better than that of smart sampling and similar to that of selective sampling.
The proposed method is also evaluated in terms of preserving packets, and it is compared with other sampling methods and other studies in Table 4. The results show the similarity of our method and smart sampling in preserving packets.
Table 4. Comparison of preserved packets.
| Method | Preserved packets (%) | Preserved bytes (%) |
|---|---|---|
| Selective sampling (dataset1) | 12.9 | 5.9 |
| Smart sampling (dataset1) | 62.3 | 52.5 |
| Proposed sampling (dataset1) | 59.1 | 44.5 |
| Selective sampling (dataset2) | 9.3 | 7.8 |
| Smart sampling (dataset2) | 41.4 | 36.6 |
| Proposed sampling (dataset2) | 31.2 | 27.1 |
| Smart sampling [29] | 85.4 | ..... |
| Selective sampling [29] | 0.7 | ..... |
| Smart sampling [1] | 68.9 | ..... |
| Selective sampling [30] | 2.1 | ..... |
VII. CONCLUSION
Sampling is widely deployed in high-speed networks to reduce the number of flow records used for monitoring and anomaly detection. Although several sampling methods have been proposed to improve flow-based monitoring in sampled traffic, the impact of sampling on flow-based anomaly detection has not been investigated sufficiently. This study addressed this problem by proposing a sampling method using two flow features: destination IP address and octet. Informative IPs were defined as IP addresses which received more bytes than a predefined threshold. This threshold was used as a filter to omit undesired flows. The proposed method mostly sampled flows from the area above the threshold, and flows with informative destination IP addresses were sampled with higher priority. Three packet-based datasets were used to generate flow-based traffic: CAIDA traces 2013, CAIDA traces 2012, and CAIDA DDoS. The number of preserved DDoS flows in the proposed method was similar to that of selective sampling, which is used for anomaly detection. However, the proposed method improved the number of preserved DDoS bytes. This method also sampled all informative victim IP addresses; therefore, it was comparable with smart sampling, which is used for monitoring purposes. The proposed method, which samples both small and large flows, can be used for monitoring high-speed networks without decreasing the performance of flow anomaly detection. A fixed sampling rate is used in this study. However, the future aim is to develop a method in which the sampling rate is adaptable to the existing memory budget.
ACKNOWLEDGMENTS
The authors would like to thank CAIDA for the three datasets provided for generating flow-based CAIDA datasets.
Fig. 7. Comparison of sampling methods with different sampling rates.
REFERENCES
[1] J. Mai, C.-N. Chuah, A. Sridharan, T. Ye, and H. Zang, “Is sampled data sufficient for anomaly detection?,” in Proc. ACM SIGCOMM, 2006, pp. 165–176.
[2] P. Winter, E. Hermann, and M. Zeilinger, “Inductive intrusion detection in flow-based network data using one-class support vector machines,” in Proc. IFIP NTMS, 2011, pp. 1–5.
[3] A. Sperotto and A. Pras, “Flow-based intrusion detection,” in Proc. IFIP/IEEE IM, 2011, pp. 958–963.
[4] B. Li, J. Springer, G. Bebis, and M. Hadi Gunes, “A survey of network flow applications,” J. Netw. Comput. Appl., vol. 36, pp. 567–581, 2013.
[5] K. Bartos and M. Rehak, “Towards efficient flow sampling technique for anomaly detection,” in Proc. TMA, 2012, pp. 93–106.
[6] J. Mai, A. Sridharan, C.-N. Chuah, H. Zang, and T. Ye, “Impact of packet sampling on portscan detection,” J. Sel. Areas Commun., vol. 24, pp. 2285–2298, 2006.
[7] The CAIDA UCSD “DDoS Attack 2007” Dataset, [Online]. Available: http://www.caida.org/data/passive/ddos-200708nct04_dataset.xml
[8] The CAIDA UCSD Anonymized Internet Traces 2013, [Online]. Available: http://www.caida.org/data/passive/passive_2013_dataset.xml
[9] The CAIDA UCSD Anonymized Internet Traces 2012, [Online]. Available: http://www.caida.org/data/passive/passive_2012_dataset.xml
[10] A. Sperotto et al., “An overview of IP flow-based intrusion detection,” IEEE Commun. Surveys Tuts., vol. 12, pp. 343–356, 2010.
[11] Z. Jadidi, V. Muthukkumarasamy, and E. Sithirasenan, “Metaheuristic algorithms based flow anomaly detector,” in Proc. APCC, 2013, pp. 717–722.
[12] Z. Jadidi, V. Muthukkumarasamy, E. Sithirasenan, and M. Sheikhan, “Flow-based anomaly detection using neural network optimized with GSA algorithm,” in Proc. IEEE NFSP, 2013, pp. 76–81.
[13] M. Sheikhan and Z. Jadidi, “Flow-based anomaly detection in high-speed links using modified GSA-optimized neural network,” Neural Comput. Appl., vol. 24, pp. 599–611, 2014.
[14] P. Gogoi, D. Bhattacharyya, B. Borah, and J. K. Kalita, “MLH-IDS: A multi-level hybrid intrusion detection method,” The Computer Journal, vol. 57, pp. 602–623, 2014.
[15] N. Hohn and D. Veitch, “Inverting sampled traffic,” IEEE/ACM Trans. Netw., vol. 14, pp. 68–80, 2006.
[16] N. Duffield, C. Lund, and M. Thorup, “Estimating flow distributions from sampled flow statistics,” IEEE/ACM Trans. Netw., vol. 13, pp. 933–946, 2005.
[17] B.-Y. Choi, J. Park, and Z.-L. Zhang, “Adaptive packet sampling for accurate and scalable flow measurement,” in Proc. IEEE GLOBECOM, 2004, pp. 1448–1452.
[18] N. Duffield, C. Lund, and M. Thorup, “Properties and prediction of flow statistics from sampled packet streams,” in Proc. ACM SIGCOMM, 2002, pp. 159–171.
[19] C. Estan and G. Varghese, “New directions in traffic measurement and accounting,” in Proc. ACM SIGCOMM, vol. 32, 2002.
[20] G. Androulidakis, V. Chatzigiannakis, and S. Papavassiliou, “Network anomaly detection and classification via opportunistic sampling,” IEEE Netw., vol. 23, pp. 6–12, 2009.
[21] G. Androulidakis and S. Papavassiliou, “Improving network anomaly detection via selective flow-based sampling,” IET Commun., vol. 2, pp. 399–409, 2008.
[22] V. Carela-Espanol, P. Barlet-Ros, A. Cabellos-Aparicio, and J. Sole-Pareta, “Analysis of the impact of sampling on NetFlow traffic classification,” Computer Netw., vol. 55, pp. 1083–1099, 2011.
[23] Z. Jadidi, V. Muthukkumarasamy, E. Sithirasenan, and K. Singh, “Performance of flow-based anomaly detection in sampled traffic,” J. Netw., vol. 10, pp. 512–520, 2016.
[24] Z. Jadidi, V. Muthukkumarasamy, E. Sithirasenan, and K. Singh, “Intelligent sampling using an optimized neural network,” J. Netw., vol. 11, pp. 16–27, 2016.
[25] Q. A. Tran, F. Jiang, and J. Hu, “A real-time netflow-based intrusion detection system with improved BBNN and high-frequency field programmable gate arrays,” in Proc. IEEE TrustCom, 2012, pp. 201–208.
[26] [Online]. Available: http://www.mindrot.org/projects/softflowd/, as of June 2014.
[27] [Online]. Available: http://www.mindrot.org/projects/flowd/, as of June 2014.
[28] T. Qin, X. Guan, W. Li, P. Wang, and M. Zhu, “A new connection degree calculation and measurement method for large scale network monitoring,” J. Netw. Comput. Appl., vol. 41, pp. 15–26, 2014.
[29] I. Paredes-Oliva, P. Barlet-Ros, and J. Sole-Pareta, “Scan detection under sampling: A new perspective,” Computer, vol. 46, pp. 38–44, 2013.
[30] G. Androulidakis and S. Papavassiliou, “Intelligent flow-based sampling for effective network anomaly detection,” in Proc. IEEE GLOBECOM, 2007, pp. 1948–1953.
Zahra Jadidi finished her Ph.D. study in Information and Communication Technology at Griffith University in Australia in 2016. She received her Master of Electronic Engineering at Islamic Azad University (South-Tehran Branch) in Iran. She has more than 10 years of work experience in computer networks. She also has teaching experience in IT networking in Australia. Her research interests are artificial intelligence and its application in the management of computer networks, and network security.
Vallipuram Muthukkumarasamy obtained a B.Sc.Eng. with 1st Class Hons from the University of Peradeniya, Sri Lanka, and a Ph.D. from Cambridge University, England. He is currently attached to the School of Information and Communications Technology, Griffith University, Australia, as Associate Professor. His current research areas include investigation of security issues in wireless networks, sensor networks, trust management in MANETs, key establishment protocols, and medical sensor networks. He is currently leading the Network Security Research Group at the Institute for Integrated and Intelligent Systems at Griffith University. He has also received a number of best teacher awards.
Elankayer Sithirasenan obtained his Ph.D. in the field of network security and his Master's degree in Software Engineering from Griffith University, Australia, in 2009 and 2004, respectively. He received his B.Sc. degree in Electrical and Electronic Engineering from the University of Peradeniya, Sri Lanka, in 1991. He is currently teaching IT networking at TAFE Queensland Gold Coast and is an adjunct researcher at Griffith University. His current research interests include wireless/wired network security, authentication and access control, disaster recovery, intrusion detection and prevention, outlier detection on multilevel, multivariate data sets, and software requirements analysis. He was a lecturer in Information and Communication Technology at Griffith University, Gold Coast, Australia, from December 2010 to December 2013. He worked as a lecturer at the Faculty of Engineering, University of Peradeniya, Sri Lanka, from 1999 to 2003 and as the technical director at Integrated Digital Systems, Sri Lanka, from 1998 to 2008.
Kalvinder Singh obtained a B.Sc. with 1st Class Hons from James Cook University, Australia, and his Ph.D. at Griffith University, Australia. He is currently employed at Silver Spring Networks, and his research areas include investigation of security issues in sensor networks, key establishment protocols, and medical sensor networks.