Ann. Telecommun. DOI 10.1007/s12243-016-0495-x
Performance evaluation of Enhanced Very Fast Decision Tree (EVFDT) mechanism for distributed denial-of-service attack detection in health care systems Haider Abbas 1,2 & Rabia Latif 2 & Seemab Latif 2 & Ashraf Masood 2
Received: 23 June 2015 / Accepted: 26 February 2016 © Institut Mines-Télécom and Springer-Verlag France 2016
Abstract Securing a cloud-assisted Wireless Body Area Network (WBAN) environment with a security mechanism that consumes few resources remains a challenging task, and this research attempts to address it. One of the most prominent attacks in cloud-assisted WBAN is the Distributed Denial of Service (DDoS) attack, which not only disrupts communication but also diminishes network bandwidth and capacity. This work extends our previous research, in which an Enhanced Very Fast Decision Tree (EVFDT) was proposed that could detect DDoS attacks successfully. In that work, however, the algorithm was evaluated on a dataset generated by implementing the LEACH protocol in NS-2. In this paper, a real-time cloud-assisted WBAN test bed is deployed to investigate the efficiency and accuracy of the proposed EVFDT algorithm on real-time sensor network traffic. Four metrics are used to evaluate the algorithm on the real-time WBAN: classification accuracy, time, memory, and computational cost. EVFDT was observed to outperform the existing algorithms on these metrics even in the presence of extreme noise. Experimental results show that the EVFDT algorithm attains significantly high detection accuracy with a lower false alarm rate.
Keywords Cloud-assisted wireless body area network · Distributed Denial of Service attack · Very Fast Decision Tree · Data mining · Machine learning
* Haider Abbas
Rabia Latif
Seemab Latif
Ashraf Masood
1 King Saud University, Riyadh, Saudi Arabia
2 National University of Sciences and Technology, Islamabad, Pakistan
1 Introduction
Nowadays, the integration of Wireless Body Area Network (WBAN) and cloud computing platforms has been gaining popularity because of the rapid development of these technologies. This new platform provides the basis for real-time eHealth monitoring and management services at competitive cost. Despite its tremendous benefits, secure data communication between the WBAN and the cloud platform is still an open issue. Alongside other critical issues in the WBAN environment such as energy efficiency, Quality of Service (QoS), and standardization, the security and privacy of patients’ critical data are of utmost importance and need special attention. Among these, one of the most pressing security issues is the availability of patients’ critical data at all times. The Distributed Denial of Service (DDoS) attack is the most powerful attack on the availability of patient health data and of the services of health care professionals, and it severely affects the capacity and performance of a WBAN. Therefore, timely detection is essential to minimize the damage caused by a DDoS attack [1, 2]. Detecting a DDoS attack in cloud-assisted WBAN requires a defensive approach that understands the network semantics and the flow of traffic in the network. When a victim node is flooded with a volume of packets that exceeds its processing capacity, the excess must be dropped. The
packet-based dropping approach helps to differentiate legitimate traffic from attack traffic in order to limit the impact of attack traffic on legitimate service requesters [3]. Analysis of the network traffic flow shows that its patterns have an irregular structure, and therefore, statistical pattern identification approaches are required. Existing attack detection and defense approaches are not directly applicable to a real-time WBAN because of its resource-constrained nature. Therefore, there is a need for an approach that is lightweight and efficient in handling real-time WBAN streaming data. For this paper, data mining techniques have been studied and explored. Among these, the Very Fast Decision Tree (VFDT) has proved to be the most efficient at handling stream data and is hence considered more appropriate for low-power WBAN sensors. The reasons for selecting VFDT are as follows: (1) it is a lightweight data mining algorithm; (2) it can progressively build a decision tree from scratch; (3) a test-and-train process is performed upon the arrival of every new data segment, thus keeping the stored data up to date; (4) it does not require reading the full dataset, thus consuming less memory space; (5) it is appropriate for the huge amount of non-stationary, streaming data obtained from real-time WBAN sensors; and (6) it provides a transparent learning process. These features make VFDT a suitable candidate for implementing an autonomous decision maker for DDoS attack detection in cloud-assisted WBAN [4]. In this paper, we devise an improvement of VFDT [5], namely enhanced VFDT (EVFDT), that differs from the existing algorithms by providing sustainable classification accuracy and regulating the tree size growth to a reasonable extent while consuming less memory and time.
Our proposal is to build a decision-tree-based classification algorithm that is capable of handling noisy data and detects a DDoS attack efficiently, with high accuracy and a low false-positive rate, while allowing legitimate requesters to access the resources. The proposed algorithm is deployed at the victim node on a real-time WBAN test bed. The contributions of this paper include the following:
1. A novel EVFDT classification algorithm that classifies network traffic and detects DDoS attacks with high accuracy and few false alarms. EVFDT achieved an average classification accuracy of about 97.4 % with an average false-positive rate of 2.1 %.
2. An analysis of the effect of noise on stream data and its impact on accuracy and false alarm rate when detecting malicious behavior in the network.
3. A discussion of the shortcomings of the existing DDoS detection techniques.
4. A real-time WBAN test bed deployed to monitor the health of patients, on which both legitimate and attack traffic are generated for the evaluation of the proposed algorithm.
5. An evaluation of the proposed algorithm and a comparison of results.
The remainder of this paper is structured as follows: Section 2 summarizes the existing data mining and stream mining techniques for detecting DDoS attacks along with their limitations when applied in a resource-scarce WBAN environment. Section 3 describes the proposed system model and the proposed EVFDT algorithm, which consists of the accuracy enhancement and tree pruning mechanisms. In Sect. 4, the proposed EVFDT algorithm is applied on a real-time WBAN test bed for DDoS attack detection, followed by a detailed performance analysis and result comparison. Finally, Sect. 5 concludes the paper.
2 Related work
2.1 DDoS attack
A Distributed Denial of Service (DDoS) attack is a large-scale, coordinated attack on the availability of the network resources and services of a victim node. Multiple nodes are deployed to initiate an attack by sending a huge number of packets toward the victim node in order to consume its key resources and make them unavailable to legitimate nodes. These resources mainly include network bandwidth, computing power, and memory [6]. DDoS attacks are broadly classified into bandwidth depletion and resource depletion attacks. A complete taxonomy of DDoS attacks is shown in Fig. 1. In a bandwidth depletion attack, the attacker aims to flood the victim node with an excessive amount of unwanted traffic to prevent legitimate traffic from reaching it. It is subdivided into flood attacks and amplification attacks. In a resource depletion attack, the attacker aims to tie up the critical resources of the victim node, thus making it unable to process legitimate requests for services [7]. A thorough analysis of DDoS attacks in a real-time cloud-assisted WBAN environment and their implications shows that a DDoS attack has the following characteristics:
1. During an attack, the packet length, sequence number, and window size remain fixed.
2. Source node ID and destination node ID are spoofed and generated randomly.
3. Packet throughput, defined as the number of bytes transferred from source to destination per unit time, decreases for legitimate users.
4. Packet loss increases for legitimate users because of the interaction of legitimate traffic with attack traffic.
5. Packet delays increase as network congestion builds up.
6. Traffic jitter may also increase significantly, which especially affects real-time traffic.
Fig. 1 Taxonomy of DDoS attack [Figure: a DDoS attack divides into bandwidth depletion attack (flood attack, amplification attack) and resource depletion attack (protocol exploitation attack, malformed packet attack)]
2.2 Data mining techniques
Data mining techniques have been gaining popularity in the field of network computing and are considered one of the most promising solutions for identifying the malicious activities of nodes in a network. For this paper, data mining techniques have been studied and analyzed for the detection of DDoS attacks in a real-time cloud-assisted WBAN environment. Existing data mining techniques (Subbulakshmi [8], Wu et al. [9], Lee et al. [10], Arun et al. [3], Thwe et al. [11]) for DDoS attack detection can be either source based or destination based. Source-based detection mechanisms are deployed near the source of an attack, whereas destination-based detection techniques are deployed near the victim of an attack. None of the existing schemes has been tested in a real-time environment. The details and comparison of these techniques along with their advantages and disadvantages can be found in [12]. Studies show that data mining techniques are not suitable for mining high-speed real-time streaming data and require a huge amount of memory to store the data. Stream mining techniques have been proposed to overcome these limitations.
2.3 Stream mining techniques: VFDT
An analysis of the feasibility of VFDT in cloud-assisted WBAN [3] shows that VFDT is a lightweight data mining technique that is able to process a huge amount of high-speed streaming data using few memory resources. In [4], the author deploys VFDT for the detection of DDoS attacks and gives an objective-based comparison. The results show that VFDT is an efficient tool for detecting malicious activity in the network. Therefore, VFDT is selected and optimized in order to make it an efficient tool for detecting DDoS attacks in a real-time cloud-assisted WBAN environment.
2.3.1 Variants of VFDT
VFDT is a stream-based data classification method that learns using a complete set of N training samples expressed as (X, y), where X is a vector of n attributes given as {X1, X2, …, Xn} and y is a discrete class label. The goal is to construct a model of a mapping function y = f(X) that will predict the classes of subsequent samples x with high accuracy. To design a VFDT for DDoS attack detection, the mathematical preliminaries used for the classification are first discussed [5, 13].
Hoeffding bound This gives a certain level of confidence about the best attribute on which to split the node. For N independent observations of a real-valued random variable r whose bounded range is R, the Hoeffding bound states that, with confidence level 1 − δ, the true mean of r is at least r̄ − ε, where r̄ is the observed mean and ε can be calculated using Eq. 1.
\[ \epsilon = \sqrt{\frac{R^{2} \ln(1/\delta)}{2N}} \tag{1} \]
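As a minimal sketch of Eq. 1, the bound can be computed directly. The function name and the example values (R = 1, which is the range of the information gain for a two-class problem, δ = 10⁻⁷, N = 1000) are illustrative assumptions and are not taken from the paper.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Return epsilon from Eq. 1 for range R, confidence 1 - delta, and N observations."""
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie strictly between 0 and 1")
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


# Illustrative values only: the bound tightens as more samples arrive.
eps_1k = hoeffding_bound(1.0, 1e-7, 1_000)
eps_10k = hoeffding_bound(1.0, 1e-7, 10_000)
```

Because ε shrinks with 1/√N, a leaf that has seen more samples needs a smaller gap between the two best attributes before a split is justified.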
Information gain VFDT uses the information gain as a heuristic evaluation function to find the upper and lower bounds with high confidence. The upper bound G(.)+ and lower bound G(.)− are calculated using Eqs. 2 and 3.
\[ G(A, T)^{+} = \sum_{v \in A} P(T, A, v) + H(\mathrm{Sel}(T, A, v)) + \sqrt{\frac{\ln(1/\delta)}{2N}} \tag{2} \]
\[ G(A, T)^{-} = \sum_{v \in A} P(T, A, v) + H(\mathrm{Sel}(T, A, v)) - \sqrt{\frac{\ln(1/\delta)}{2N}} \tag{3} \]
where A is an attribute in the set T of training samples, P(T, A, v) is the fraction of the training samples in T that hold the value v for attribute A, and Sel(T, A, v) selects all the training samples having value v for attribute A from set T. Although VFDT classifies stream data efficiently, it has limitations: it cannot handle noisy data, and its classification accuracy decreases as noise increases. In the recent past, a few variations of VFDT have been proposed. In this section, these variants are discussed and briefly analyzed for their feasibility and limitations when employed for DDoS attack detection in cloud-assisted WBAN. They are analyzed on the following parameters: attack detection accuracy, cost, time, and memory. Domingos [5] proposed VFDT based on the Hoeffding bound (HB) of Eq. 1 to control the error in selecting the splitting attribute. The information gain G(.) given in Eqs. 2 and 3 is used as a heuristic evaluation function (HEF) to decide the split attribute when a leaf is converted into a decision node. ΔG = G(Xa) − G(Xb) defines the difference between the two best attributes Xa (highest G(.)) and Xb (second highest G(.)). If ΔG > ε, then Xa is considered the attribute with the highest value of G(.). At this stage, the split occurs on attribute Xa and the leaf is converted into a decision node. The major drawback of this technique is that there are cases in which the two information gains differ very little and both attributes are equally good candidates for the split. At this stage, a tie condition occurs and the process gets stuck. Resolving the tie is a computation-intensive task that increases the processing time and decreases the overall accuracy of the decision tree. To overcome this limitation of VFDT, the same authors proposed a fixed tie-breaking threshold τ. Whenever the difference between the two information gains is very small, τ acts as a quick decisive parameter to resolve the tie condition. The node splitting occurs on the current best attribute regardless of how good the second best attribute might be. The value of τ is chosen arbitrarily and remains fixed throughout the process. Excessive tie breaking reduces the performance of VFDT-τ significantly on noisy and complex streaming data, even with the use of parameter τ. VFDT-τ does not support pruning, as the tree size itself is very small. While improving the accuracy, however, the tree size explodes. Therefore, a suitable pruning mechanism is required at this stage. To overcome the limitation of the fixed tie-breaking threshold, Yang [14] proposed an algorithm based on an adaptive tie-breaking threshold computed directly from the HB mean. The value of the HB mean fluctuates strongly as the noise percentage increases, thus reducing the accuracy of attack classification. Concept-adaptive VFDT (CVFDT) [13] maintains two trees simultaneously in memory. The tree with the shortest depth is retained, and the other one is discarded. The main drawback of this technique is that it consumes more memory and time to maintain two trees. Also, CVFDT does not handle noisy data efficiently.
Fig. 2 Proposed system architecture
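The split test with a fixed tie-breaking threshold can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the function name is hypothetical, and the tie rule (split when ε has shrunk below τ) follows the usual formulation of VFDT rather than anything spelled out in this section.

```python
def should_split(g_best: float, g_second: float, epsilon: float, tau: float) -> bool:
    """Decide whether a leaf should split on its current best attribute.

    g_best, g_second: heuristic values G(Xa) and G(Xb) of the two best attributes.
    epsilon: Hoeffding bound for the samples seen at the leaf (Eq. 1).
    tau: fixed, user-chosen tie-breaking threshold (assumed rule: split if epsilon < tau).
    """
    delta_g = g_best - g_second
    if delta_g > epsilon:
        return True          # Xa is better than Xb with confidence 1 - delta
    return epsilon < tau     # tie: the candidates are too close to separate


clear_winner = should_split(0.30, 0.10, epsilon=0.05, tau=0.02)
unresolved = should_split(0.30, 0.29, epsilon=0.05, tau=0.02)
tie_broken = should_split(0.30, 0.29, epsilon=0.015, tau=0.02)
```

With a fixed τ, the third case splits as soon as ε falls below the threshold, regardless of how noisy the stream is, which is the behavior the adaptive schemes discussed here try to improve on.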
2.4 Effect of noise in streaming data
Noisy data is meaningless or extraneous data that makes the identification of data patterns more difficult. Noise is an inherent aspect of data streams coming from sensor nodes and is interpreted as outliers. As the noise in a sensor data stream increases, the number of outliers also increases. In sensor networks, noise arises from changes in system behavior and from malicious activity in the network. There are two major sources of noise [15]:
2.4.1 Error
An error is a noisy value coming from an erroneous sensor. Outliers caused by such errors have a very high probability of occurrence.
2.4.2 Event
An event refers to a particular phenomenon, in this case the occurrence of an attack, that changes the state of a system. Outliers caused by events occur with small probability, but they are lasting and modify the historical patterns of sensor data.
In both cases, the presence of outliers due to noisy data decreases the measurement accuracy and raises the false alarm rate, which might lead to unacceptable consequences in critical health-related applications. At the same time, the tree size grows tremendously, which results in added memory consumption. The goal is to remove these outliers from the sensor data in order to ensure high detection accuracy with fewer false alarms while keeping the resource consumption of the network to a minimum.
Table 1 List of statistical features
No. | Features | Description
1. | Packet loss percentage (LP) | The number of packets or bytes lost due to the interaction of the legitimate traffic with the attack traffic
2. | Delay or latency | The time taken by the packets to reach from source to destination
3. | Jitter (delay variation) | The variation in delay or packet delay variation. It is the variation in the time between packets arriving within a particular window.
4. | Throughput | The number of bytes transferred per unit time from source to destination
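The four features of Table 1 could be derived from captured packets roughly as sketched below. The `Packet` record, the function name, and the exact formulas are assumptions made for illustration: the paper does not give them, and jitter in particular is taken here as the mean absolute difference between consecutive packet delays.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional, Sequence


@dataclass
class Packet:
    """Hypothetical per-packet record for one observation window."""
    sent_at: float                 # send time in seconds
    received_at: Optional[float]   # receive time in seconds, None if the packet was lost
    size_bytes: int


def window_features(packets: Sequence[Packet]) -> dict:
    """Compute packet loss (%), mean delay (s), jitter (s), and throughput (bytes/s)."""
    delivered = [p for p in packets if p.received_at is not None]
    if not delivered:
        raise ValueError("need at least one delivered packet")
    loss_pct = 100.0 * (len(packets) - len(delivered)) / len(packets)
    delays = [p.received_at - p.sent_at for p in delivered]
    # Assumed jitter definition: mean absolute change between consecutive delays.
    gaps = [abs(b - a) for a, b in zip(delays, delays[1:])]
    jitter = mean(gaps) if gaps else 0.0
    duration = max(p.received_at for p in delivered) - min(p.sent_at for p in packets)
    throughput = sum(p.size_bytes for p in delivered) / duration if duration > 0 else 0.0
    return {
        "loss_pct": loss_pct,
        "delay": mean(delays),
        "jitter": jitter,
        "throughput": throughput,
    }


sample = [
    Packet(0.0, 0.1, 100),
    Packet(1.0, 1.3, 100),
    Packet(2.0, None, 100),   # lost packet
    Packet(3.0, 3.1, 100),
]
features = window_features(sample)
```

Each window then yields one feature vector, which is the unit that the labeling step described in Sect. 3.1 marks as attack or non-attack.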
Fig. 3 Proposed EVFDT flowchart
3 Proposed system architecture
The proposed Distributed Denial of Service (DDoS) attack detection system studies the behavior of real-time sensor network traffic and classifies it as normal or malicious based on the observed traffic patterns. The proposed system architecture is shown in Fig. 2. It is composed of the following two phases:
• Pre-processing phase
• Attack classification phase
3.1 Pre-processing phase
The input to the pre-processing phase is the sensor data stream captured from real-time WBAN sensors, and the output of this phase is the labeled dataset for training purposes. Each instance of an incoming data stream is specified as a collection of features and is represented in feature vector space. The pre-processing phase is further divided into a packet feature extraction phase followed by a labeling phase. In the feature extraction phase, real-time packets are captured from the network traffic in order to construct the new statistical features. These statistical features are listed in Table 1 and are used for DDoS attack detection and analysis. The identified features are important in defining the Quality of Service (QoS) of the network and in classifying the network traffic pattern under a DDoS attack. In the labeling phase, classes are assigned to these statistical features. The entire dataset is divided into two classes labeled “1” for attack and “0” for non-attack packets. After labeling, the resulting dataset, consisting of both attack and non-attack data, is used for training the classification tree.
3.2 Attack classification: proposed classification algorithm
A new classification model, namely the Enhanced Very Fast Decision Tree (EVFDT), is proposed. It is an optimization of the original VFDT-τ that makes it efficient in classifying DDoS attacks in a real-time cloud-assisted WBAN environment. A real-time environment inevitably has a very high noise ratio. Due to this noise, the detection accuracy degrades and the size of the classification tree grows tremendously. To overcome the effect of this noise on classification results, EVFDT incorporates two additional approaches, namely accuracy enhancement and tree pruning, to improve the detection accuracy and control the tree size. The proposed EVFDT flowchart is given in Fig. 3. The tree initialization procedure is the same as in the original VFDT-τ discussed in Sect. 2. The tree growing procedure is updated in terms of accuracy and tree size.
3.2.1 Accuracy enhancement
As stated in Sect. 2.3.1, HB fluctuation intensifies as noise increases, which has a detrimental effect on the accuracy and tree size of VFDT-τ. To overcome this issue, the proposed EVFDT modifies the attribute splitting procedure by using an adaptive tie-breaking threshold τ that governs whether a node is split on the candidate attribute. In the existing VFDT-τ and its variants, the value of τ is pre-configured by the user and remains fixed throughout the tree building process. The best value of τ cannot be found until all the possibilities have been tried by brute force, and testing a large number of different values for τ is not practical in a real-time environment. Instead, EVFDT assigns an adaptive tie-breaking threshold for splitting, equal to the mean of the differences between HB values, which provides the basis for node splitting throughout the tree building process. With this method, EVFDT has a dynamic tie-breaking threshold τ whose value is no longer fixed and pre-defined but instead depends upon the arrival of new instances, their HB value, and the mean of the differences between HB values.
Fig. 4 Procedure for accuracy enhancement
Fig. 5 Arduino XBee shield over e-Health sensor shield
EVFDT incorporates the procedure of accuracy enhancement with dynamic τ as presented in Fig. 4 and explained as follows: Let XS be the sorted list of HB values seen at leaf l, HBCount be the total number of HB values in XS seen at leaf l, and n be the new HB value seen at leaf l. Starting with HBCount ≥ 3, the mean of difference between HB values in XS is calculated. This mean is stored as threshold Tr as given in the following equation:
\[ Tr = \frac{\sum_{j=1}^{HBCount} \left( XS_{j+1} - XS_{j} \right)}{HBCount} \]
where XS is a sorted list of HB values and HBCount is the total number of values in XS. The threshold Tr is updated for each new value in the sorted list. Find the position of the new HB value in XS, let XS_n be the position of HB in XS, and the PrunedMean is calculated with the new value of HB as given in the following equation:
\[ PrunedMean = \frac{\left( XS_{n} - XS_{n-1} \right) + \left( XS_{n+1} - XS_{n} \right)}{2} \]
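The threshold update, together with the accept-or-discard test that the text describes next, can be sketched as a small class. This is a hedged illustration only: the class and method names are hypothetical, the sum in Tr is taken over the HBCount − 1 adjacent pairs that actually exist while keeping HBCount as the divisor, and the handling of a new value that falls at either end of the sorted list (only one neighbor) is an assumption the paper does not cover.

```python
import bisect
from typing import List, Optional


class AdaptiveTieThreshold:
    """Keeps the sorted HB values (XS) seen at one leaf and the threshold Tr."""

    def __init__(self) -> None:
        self.xs: List[float] = []         # sorted HB values seen at the leaf
        self.tr: Optional[float] = None   # threshold Tr, defined once HBCount >= 3

    def _update_tr(self) -> None:
        gaps = [b - a for a, b in zip(self.xs, self.xs[1:])]
        self.tr = sum(gaps) / len(self.xs)   # divisor HBCount, as in the equation for Tr

    def observe(self, hb: float) -> Optional[float]:
        """Process a new HB value and return the value to use as the tie threshold."""
        if len(self.xs) < 3:                  # warm-up until three values are stored
            bisect.insort(self.xs, hb)
            if len(self.xs) >= 3:
                self._update_tr()
            return self.tr
        n = bisect.bisect_left(self.xs, hb)   # position the new value would take in XS
        left = self.xs[n - 1] if n > 0 else None
        right = self.xs[n] if n < len(self.xs) else None
        if left is None:                      # assumed edge case: no lower neighbor
            pruned_mean = right - hb
        elif right is None:                   # assumed edge case: no upper neighbor
            pruned_mean = hb - left
        else:
            pruned_mean = ((hb - left) + (right - hb)) / 2
        if pruned_mean < self.tr:
            bisect.insort(self.xs, hb)        # accept the value and refresh Tr
            self._update_tr()
            return pruned_mean
        return self.tr                        # treat the value as an outlier


threshold = AdaptiveTieThreshold()
for value in (0.10, 0.12, 0.14):
    threshold.observe(value)
accepted = threshold.observe(0.13)   # close to its neighbors, so it is kept
rejected = threshold.observe(0.90)   # far from every stored value, so it is discarded
```

In this sketch a value far from the stored HB values leaves XS unchanged, so a burst of noisy samples cannot drag the threshold with it.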
If this PrunedMean is less than the updated threshold Tr, then HB is added to XS and the PrunedMean is returned; otherwise, HB is discarded as an outlier and Tr is returned as the PrunedMean. This process continues throughout tree building. Whenever a new HB is added to XS, threshold Tr is updated.
3.2.2 Tree pruning
Pruning plays a very important role in the decision tree learning process. It minimizes the tree size by cutting off the tree nodes that contribute little to classifying the instances, and it lessens the overall complexity of the decision tree that arises from noisy or erroneous data. Analysis shows that the improvement in classification accuracy causes an enormous increase in tree size, which in turn takes more memory space and computational time. To overcome the problem of tree size explosion, we propose a pruning mechanism, marked by the red dotted box in Fig. 3 and explained as follows: Let HT be the Hoeffding tree to be pruned, S0 be the stream of examples, S be an example belonging to S0, i.e., S ∈ S0, and DataSeenAtLeaf n0 be the number of samples seen at leaf node n0. Every example S in S0 is passed through the tree starting from the root node. If S is filtered to the leaf node n0, then DataSeenAtLeaf n0 is incremented; otherwise, the HT continues growing. If DataSeenAtLeaf n0 is less than τ, where τ is the threshold for checking the eligibility of a node to be part of the HT, then the tree is pruned by deleting the leaf node n0 and updating the HT. Otherwise, the HT is not pruned. The eligibility of a node to be part of the HT is thus checked by comparing the number of samples seen at the leaf node with τ. A count below τ indicates that the leaf node contributes little to classification, as few instances are filtered to it on the current HT.
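The pruning rule can be sketched on a toy tree as follows. The `Node` structure and the function name are hypothetical and much simpler than a real Hoeffding tree; the sketch only illustrates the eligibility test that compares DataSeenAtLeaf with τ.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Node:
    """Hypothetical tree node; a node without children is a leaf."""
    data_seen: int = 0                                   # DataSeenAtLeaf for leaves
    children: Dict[str, "Node"] = field(default_factory=dict)

    def is_leaf(self) -> bool:
        return not self.children


def prune_leaves(node: Node, tau: int) -> int:
    """Delete every leaf that has seen fewer than tau samples; return how many were removed."""
    removed = 0
    for key in list(node.children):        # copy the keys because the dict is modified
        child = node.children[key]
        if child.is_leaf():
            if child.data_seen < tau:
                del node.children[key]
                removed += 1
        else:
            removed += prune_leaves(child, tau)
    return removed


root = Node(children={
    "a": Node(data_seen=2),                              # rarely reached leaf
    "b": Node(data_seen=50),                             # well supported leaf
    "c": Node(children={"d": Node(data_seen=1)}),        # subtree with a weak leaf
})
removed_count = prune_leaves(root, tau=10)
```

In this single pass, the node "c" loses its only child and becomes a leaf itself; whether such a node should be re-examined in the same pass is a design choice the paper does not specify.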
4 Experiments and results
4.1 Experimental test bed
To evaluate the performance of the proposed EVFDT algorithm for DDoS attack detection, a real-time WBAN test bed is deployed using the e-Health sensor platform. The proposed EVFDT algorithm has been implemented using the Very Fast Machine Learning (VFML) libraries [16]. The experiments were run on an Ubuntu 14.04 64-bit workstation with a 2.8-GHz processor and 8 GB of RAM, with all background processes switched off. Wireshark 1.10.3 is installed to capture the real-time packets coming from the WBAN via the Zigbee module.
Fig. 6 E-Health sensor shield with ten sensors
4.1.1 Test bed setup
The test bed uses the e-Health sensor shield v2.0 by Libelium communications distribution to demonstrate the real-time WBAN scenario [17]. It monitors patients’ health by deploying different medical sensors on the patient’s body to collect sensitive data for subsequent analysis by physicians. Figure 5 shows the “Arduino XBee shield” over the e-Health sensor shield. It is an 802.15.4 Arduino shield embedded with a Digi XBee 802.15.4 OEM RF module with a transmission range of up to 100 m. It is specifically developed for low-power wireless communication applications such as WBAN and sensor networks. The Arduino XBee shield fits in the XBee socket of the e-Health sensor shield in order to sense and control data from the physical world and transfer it to the base station wirelessly. From the base station, the data is moved to the cloud for permanent storage. Ten different medical sensors are deployed at different locations on the patient’s body as shown in Fig. 6. These sensors include the following: air flow (breathing), oxygen in blood (SPO2), body temperature, electrocardiogram (ECG), glucometer, galvanic skin response (GSR-sweating), blood pressure (sphygmomanometer), patient position (accelerometer), and muscle/electromyography (EMG) sensor.
Table 2 Confusion matrix
Actual \ Predicted | Normal | Attack
Normal | True negative | False positive
Attack | False negative | True positive
The gathered patient information can be sent wirelessly to the base station via the 802.15.4/Zigbee shield and then forwarded to the cloud for permanent storage.
4.1.2 Topology design
In the WBAN test bed, a star topology is deployed in which each sensor is connected directly to the aggregate node (e-Health sensor shield), which in turn is connected to the base station (PC/laptop) wirelessly using Zigbee as shown in Fig. 6. Sensor data is then transferred to the cloud server for permanent storage.
4.1.3 Traffic generation
The traffic is generated by running the Arduino application and the Wireshark application simultaneously. The Arduino application is used to gather data from the sensors, while Wireshark is used to capture packets coming from the WBAN sensor network, which are then used for calculating the statistical features given in Table 1. Sensor nodes are identified through their sensor node ID. Legitimate traffic is generated with a fixed delay, fixed packet size, and fixed packet rate, whereas the attack traffic is generated by attaching the DDoS attack code to four sensor nodes using the Arduino application. The packet rate of the attack traffic is 150 packets per second, while the packet size remains fixed. The complete experiment is run for 1 h, during which 50,000 data instances are collected. These data instances are divided into five datasets containing different numbers of instances for evaluation purposes.
Table 3 Experimental results of classification algorithms for real-time dataset
No. of instances | VFDT-τ Accuracy (%) | VFDT-τ False-positive rate (%) | CVFDT Accuracy (%) | CVFDT False-positive rate (%) | OVFDT Accuracy (%) | OVFDT False-positive rate (%) | EVFDT Accuracy (%) | EVFDT False-positive rate (%)
10,000 | 88.1 | 9.7 | 90.0 | 8.4 | 91.3 | 6.1 | 95.7 | 3.2
20,000 | 88.9 | 8.5 | 91.2 | 7.6 | 92.5 | 5.3 | 96.8 | 2.9
30,000 | 89.6 | 7.6 | 92.1 | 6.8 | 93.7 | 4.1 | 97.6 | 2.1
40,000 | 91.0 | 6.8 | 93.0 | 5.3 | 94.6 | 3.6 | 98.1 | 1.4
50,000 | 92.2 | 5.8 | 93.8 | 4.1 | 96.4 | 2.9 | 98.8 | 1.1
Avg | 89.8 | 7.6 | 92.0 | 6.4 | 93.6 | 4.4 | 97.4 | 2.1
The resulting dataset contains both attack and non-attack data and is fed into the pre-processing phase. After pre-processing, the dataset is divided into 80 % training data and 20 % testing data to train the classifier in the attack classification phase.
4.2 Performance metrics
To evaluate the performance of the proposed algorithm on the real-time cloud-assisted WBAN test bed, the following evaluation parameters are employed:
4.2.1 Attack detection accuracy
Detection accuracy is defined as the ratio of the number of correct predictions to the total number of tested examples and is measured using a confusion matrix. Table 2 presents the confusion matrix. Based on the confusion matrix, attack detection accuracy (Eq. 4) and false-positive rate (Eq. 5) are calculated as follows:
\[ \text{Detection Accuracy} = \frac{\text{True Positives} + \text{True Negatives}}{\text{Total number of tested examples}} \tag{4} \]
\[ \text{False Positive Rate} = \frac{FP}{FP + TN} \tag{5} \]
[Figure: ROC curve; x-axis: 100-Specificity; series: EVFDT, VFDT, OVFDT, CVFDT]
True positive (TP) = samples that are correctly classified as attack class.
True negative (TN) = samples that are correctly classified as non-attack class.
False positive (FP) = samples that are incorrectly classified as attack class.
False negative (FN) = samples that are incorrectly classified as non-attack class.
The proposed classification algorithm EVFDT is compared with existing stream mining classification algorithms [5, 13, 14] in terms of detection accuracy (Eq. 4) and false-positive rate (Eq. 5) on the datasets generated by deploying the real-time WBAN test bed. From Table 3, it can be seen that EVFDT reaches a detection accuracy of 98.8 % with a 1.1 % false-positive rate for 50,000 data instances. For the same dataset, EVFDT achieves a maximum gain of 6.6 % and a minimum gain of 2.4 % in attack detection accuracy over the existing algorithms. Similarly, the sensitivity and specificity of the proposed and existing algorithms were calculated using Eq. 6 (sensitivity) and Eq. 7 (specificity):
\[ \text{Sensitivity} = \frac{TP}{TP + FN} \tag{6} \]
\[ \text{Specificity} = \frac{TN}{TN + FP} \tag{7} \]
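Equations 4 to 7 can be evaluated together from the four cells of the confusion matrix in Table 2. The sketch below uses a hypothetical function name and made-up counts chosen only to keep the arithmetic easy to follow; they are not results from the paper.

```python
def detection_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Compute Eqs. 4-7 from confusion-matrix counts."""
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total,          # Eq. 4
        "false_positive_rate": fp / (fp + tn),  # Eq. 5
        "sensitivity": tp / (tp + fn),          # Eq. 6
        "specificity": tn / (tn + fp),          # Eq. 7
    }


# Made-up counts: 100 attack samples and 100 normal samples.
metrics = detection_metrics(tp=90, tn=95, fp=5, fn=10)
```

Note that the false-positive rate equals 1 − specificity, which is why the ROC curve can be drawn against 100-specificity when the values are expressed as percentages.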
Based on the sensitivity and specificity, the receiver operating characteristic (ROC) curve is plotted, which is used to assess the effectiveness of a given detection technique. Figure 7 shows the ROC curves of EVFDT and the existing algorithms. From the figure, it is evident that the proposed EVFDT has high sensitivity (true positive rate) with a lower false-positive rate (100-specificity) than the existing algorithms.
Fig. 7 ROC curves showing the tradeoff between sensitivity and false-positive rate (100-specificity) of DDoS attacks
Table 4 Cost matrix
Actual \ Predicted | Normal | Attack
Normal | 0 | λ
Attack | 1 | 0
[Figure: bar chart of cost per sample by algorithm; bars in order: VFDT 0.493, CVFDT 0.482, OVFDT 0.306, EVFDT 0.22]
Fig. 8 Cost per sample for existing algorithms and proposed EVFDT
Fig. 10 Number of instances vs memory resources
4.2.2 Cost per sample

In addition to accuracy, it is also important to consider cost when evaluating the performance of a classification algorithm on a real-time WBAN test bed. A cost matrix is a means of influencing the decision making of a classification model. It gives the model a basis for minimizing costly misclassifications and maximizing useful, accurate classifications. Table 4 shows the cost matrix. Because the objective of the proposed DDoS attack classification algorithm is to avoid rejecting legitimate users' access, the cost of a false alarm was assumed to be higher. In this paper, the cost of a false positive is assumed to be five times the cost of a false negative. The cost function was employed to simplify the performance comparison of the proposed EVFDT algorithm with existing stream mining algorithms. It is given in Eq. 8 and is based on the number of samples that are classified incorrectly. A lower cost means better performance of the detection system.

\[ \text{Cost} = (1 - \text{Detection Accuracy}) + \lambda \, (\text{False-Positive Rate}) \tag{8} \]

where the parameter λ weights the cost of a false alarm relative to that of a miss. For this paper, λ is set to 5. The costs of the proposed algorithm and the existing classification algorithms are calculated using Eq. 8, and the results are compared in Fig. 8. The results show that the cost per sample of the proposed EVFDT is lower than that of VFDT and its variants.

4.2.3 Resource usage
Resource usage covers the overall CPU time and memory needed to run the algorithm. Computational time is the time taken, in seconds, to process a full data stream. The time usage of the proposed algorithm is proportional to O(l_p d v c), where l_p is the length of a pruned tree, d is the total number of attributes, v is the number of values per attribute, and c is the number of classes. VFDT-τ has a small running time because of its small, fixed value of τ. EVFDT takes slightly more time than VFDT-τ because of pruning and threshold computation, as shown in Fig. 9. Similarly, the total amount of memory required to run EVFDT is the sum of the memory allocated for learning and the memory allocated for training. The total memory required for running EVFDT is shown in Fig. 10 and is proportional to O(n d v c), where n is the number of decision nodes in the tree and d, v, and c are as defined above.

The detailed comparison of EVFDT with existing techniques [18] shows that only OVFDT and EVFDT can handle noisy data. OVFDT, however, handles noisy data only to some extent and becomes inaccurate as the noise percentage increases, owing to the presence of outliers. VFDT-τ and CVFDT do not provide tree pruning at all. They maintain a small tree size from the beginning, but at the price of a higher classification error. Only the proposed EVFDT algorithm handles noisy data efficiently while maintaining a small tree size with less resource usage.
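The two growth terms given for time and memory in this subsection can be compared with a rough sketch. It assumes one unit of work or storage per (attribute, value, class) combination and ignores the constant factors that the O-notation hides, so the results are relative units rather than seconds or bytes. The function names and sample sizes are hypothetical.

```python
def time_units(pruned_tree_length: int, num_attributes: int,
               values_per_attribute: int, num_classes: int) -> int:
    """Growth term l_p * d * v * c for processing time (relative units)."""
    return (pruned_tree_length * num_attributes
            * values_per_attribute * num_classes)


def memory_units(num_decision_nodes: int, num_attributes: int,
                 values_per_attribute: int, num_classes: int) -> int:
    """Growth term n * d * v * c for memory (relative units)."""
    return (num_decision_nodes * num_attributes
            * values_per_attribute * num_classes)


# Hypothetical problem size: 10 attributes, 4 values each, 2 classes.
d, v, c = 10, 4, 2

small_tree = memory_units(num_decision_nodes=15, num_attributes=d,
                          values_per_attribute=v, num_classes=c)
large_tree = memory_units(num_decision_nodes=30, num_attributes=d,
                          values_per_attribute=v, num_classes=c)
path_cost = time_units(pruned_tree_length=6, num_attributes=d,
                       values_per_attribute=v, num_classes=c)

print(f"memory, 15 nodes: {small_tree} units")
print(f"memory, 30 nodes: {large_tree} units")
print(f"time, pruned length 6: {path_cost} units")
```

Both terms are linear in the tree-size factor, so under these assumptions halving the number of decision nodes halves the memory estimate, which is why a small tree matters on resource-scarce nodes.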
Fig. 9 Number of instances vs computation time

5 Conclusion
Nowadays, zombie-based DDoS attacks are mixed with legitimate traffic flows. It is therefore very difficult to detect such attacks even when stored attack traffic signatures are available. The challenge is to distinguish legitimate traffic from DDoS attack traffic. Existing data classification schemes fail when deployed on real-time WBAN streaming data because they demand a huge amount of memory space for data storage. Stream mining techniques, on the other hand, handle real-time WBAN streams efficiently and are thus appropriate candidates for resource-scarce WBANs. An algorithm based on VFDT is proposed in this paper. Our main contribution is a novel Enhanced Very Fast Decision Tree (EVFDT) classification algorithm, which differs from existing algorithms in attack classification accuracy and tree size. The performance of the EVFDT algorithm is evaluated by deploying a real-time WBAN test bed. Both legitimate and attack traffic are generated from the test bed in order to train the classification algorithm, which is then used for DDoS attack detection. Experimental results show that the proposed algorithm detects attacks with high classification accuracy (98.8 %) and a low false-positive rate (1.1 %) with less time and memory overhead. The proposed model is deployed at the victim node in a real-time network environment.

Acknowledgments The authors would like to extend their sincere appreciation to the Deanship of Scientific Research at King Saud University for its funding of this research through the Research Group Project no. RG-1435-048. The authors would also like to thank the National University of Sciences and Technology, Islamabad, Pakistan, for its support during the research.
References

1. Latif R, Abbas H, Assar S (2014) Distributed Denial of Service (DDoS) attack in cloud-assisted wireless body area networks: a systematic literature review. J Med Syst 38(128):1–10
2. Silva BM, Rodrigues JJ, de la Torre Díez I, López-Coronado M, Saleem K (2015) Mobile-health: a review of current state in 2015. J Biomed Inform 56:265–272, ISSN 1532–0464
3. Arun RK, Selvakumar S (2013) Detection of distributed denial of service attacks using an ensemble of adaptive and hybrid neurofuzzy systems. J Comput Commun 36(3):303–319
4. Latif R, Abbas H, Assar S, Latif S (2014) Analyzing feasibility for deploying very fast decision tree for DDoS attack detection in cloud-assisted WBAN. In: Proceedings of 10th International Conference on Intelligent Computing, pp 507–519
5. Domingos P, Hulten G (2000) Mining high-speed data streams. In: Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp 71–80
6. Deepak A, Puneet S, Vineet S (2014) Impact analysis of Denial of Service (DoS) due to packet flooding. Int J Eng Res Appl 4(6):144–149
7. Zargar ST, Joshi J, Tipper D (2013) A survey of defense mechanisms against Distributed Denial of Service (DDoS) flooding attacks. IEEE Commun Surv Tutor 15(4):2046–2069
8. Subbulakshmi T (2011) Detection of DDoS attacks using Enhanced Support Vector Machines with real time generated dataset. In: Proceedings of 3rd International Conference on Advanced Computing (ICoAC), pp 17–22
9. Wu YC, Tseng HR, Yang W, Jan RH (2011) DDoS detection and traceback with decision tree and grey relational analysis. Int J Ad Hoc Ubiquit Comput 7(7):121–136
10. Lee SM, Kim DS, Park JS (2012) Detection of DDoS attacks using optimized traffic matrix. Int J Comput Math Appl 63:501–510
11. Thwe T, Thandar P (2014) Statistical anomaly detection of DDoS attacks using K-nearest neighbour. Int J Comput Commun Eng Res (IJCCER) 2(1)
12. Latif R, Abbas H, Latif S, Masood A (2015) EVFDT: an Enhanced Very Fast Decision Tree algorithm for detecting distributed denial of service attack in cloud-assisted wireless body area network. Mobile Information Systems, Hindawi Publishing Corporation, Article ID 260594. doi:10.1155/2015/260594
13. Hulten G, Spencer L, Domingos P (2001) Mining time-changing data streams. In: Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. KDD 2001. ACM, New York, pp 97–106
14. Yang H, Fong S (2011) Moderated VFDT in stream mining using adaptive tie threshold and incremental pruning. In: Proceedings of the 13th International Conference, DaWaK, pp 471–483
15. Fawzy A, Hoda MO, Hegazy O (2013) Outliers detection and classification in wireless sensor networks. Egypt Inform J 14:157–164
16. VFML (Very Fast Machine Learning) toolkit. Available online: http://www.cs.washington.edu/dm/vfml/ (accessed on 30 January 2014)
17. E-Health Sensor Platform V2.0 for Arduino and Raspberry Pi (Biometric / Medical Applications). https://www.cooking-hacks.com/documentation/tutorials/ehealth-biometric-sensor-platform-arduino-raspberry-pi-medical (accessed on 04 March 2014)
18. Yang H, Fong S, Sun G, Wong R (2012) A Very Fast Decision Tree algorithm for real-time data mining of imperfect data streams in a distributed wireless sensor network. Int J Distrib Sens Netw. doi:10.1155/2012/863545