A Perceptron based Classifier for Detecting Malicious Route Floods in Wireless Mesh Networks Lakshmi Santhanam*, Anindo Mukherjee*, Raj Bhatnagar Ψ , and Dharma P. Agrawal* ∗
Ψ
Department of ECECS, CDMC Lab, University of Cincinnati, Cincinnati, OH, USA Department of ECECS, Machine Intelligence Lab, University of Cincinnati, Cincinnati, OH, USA Email: {santhal, mukherao, dpa}@ececs.uc.edu,
[email protected]
Abstract Wireless Mesh Networks (WMN) are evolving as a new paradigm for broadband Internet, in which a group of static mesh routers employ multihop forwarding to provide wireless Internet connectivity. All routing protocols in WMNs naively assume nodes to be nonmalicious. But, the plug-in-and–play architecture of WMNs paves way for malicious users who could exploit some loopholes of the underlying routing protocol. A malicious node can inundate the network by conducting frequent route discovery which severely reduces the network throughput. In this paper, we investigate the detection of route floods by incorporating a machine learning technique. We use a perceptron training model as a tool for intrusion detection. We train the perceptron model by feeding various network statistics and then use it as a classifier. We illustrate using an experimental wireless network (ns-2) that the proposed scheme can accurately detect route misbehaviors with a very low false positive rate.
open wireless communication channel and the multihop nature of communication can pave way to malicious intruders. A malicious intruder can exploit the hidden loopholes in the routing protocol [2], to conduct various kinds of attack such as route disruption attacks (e.g. blackhole attack, selfish node attack, malicious route flood, route table poisoning, route looping, modification of route requests) and packet forwarding attacks (e.g. selfish node attack, Denial of Service (DoS) attack). Bhargava et al. [3] demonstrated that a false distance vector and a false sequence number attack can drastically reduce the network throughput by 75%. Building a secure routing protocol (SEAD [4], Ariadne [5]) is not a complete solution to thwart attacks on the routing protocol, as MRs are often deployed in public locations (like rooftops, streetlights, poles etc.) that are easily accessible to potential adversaries.
Keywords - Anomaly Detection, AODV, Intrusion Detection System, Normalization, Perceptron , and Wireless Mesh Networks.
1. Introduction Wireless Mesh Networks (WMNs) [1] are becoming increasingly popular as they provide cheap and ubiquitous broadband Internet access. WMNs consist of a group of static mesh routers (MRs) that are connected by wireless links. It obviates the need for extensive wired backhaul network by connecting only a small subset of MRs (known as Internet Gateways (IGWs)) to the wired backbone. Other MRs forward their traffic in a multihop fashion towards the IGW. Fig. 1 shows a typical mesh network scenario. Security in WMNs is very much in its infancy and there are several issues that remain to be addressed. The
Figure 1. Sample Wireless Mesh Network Scenario In this paper, we study the problem of detecting malicious nodes that imitates a normal node in all respects, except in performing unnecessary route discoveries. These malicious nodes frequently initiate route discovery to unknown destinations with intent to flood the network with route request packets. As it is difficult to distinguish between a route discovery initiated with a malicious intent and a legitimate route discovery for repairing broken/stale routes, this type of attack is hard to detect. Though, several broadcast management techniques exists that try to alleviate
*This work is has been supported by the Ohio Board of Regents, Doctoral Enhancement Funds Proceedings of the International Multi-Conference on Computing in the Global Information Technology (ICCGI'07) 0-7695-2798-1/07 $20.00 © 2007 0-7695-2798-1/07 $25.00 ©2007
redundant broadcasts [6]; they fail to address the problem of bogus broadcast packets. Desilva et al. [7] propose a statistical method to mitigate malicious route floods; in which when the generation rate of a node exceeds a certain threshold, all the route requests of that node are dropped by its neighbors. We however, detect the same based on the examination of the network. In order to detect malicious route floods in a WMN; we propose to deploy an Intrusion Detection System (IDS) on every MR in the network. As WMNs don’t suffer from power constraints, each MR can be made as a monitoring agent. We propose an IDS based on a machine learning technique called perceptron training. The IDS collects a set of statistics on few route and data parameters such as RREQ (Route Request), RREP (Route reply), and RERR (Route error), data packets originated, and data packets forwarded. The above listed parameters are then fed to the perceptron model in order to learn the normal routing behavior and establish a normal routing profile. Typically, it is difficult to judge a safe threshold value for the number of RREQs packets that can be generated by a MR in a WMN as it depends on several other network conditions like link failure, stale route, network congestion, etc. Thus, we propose a threshold independent scheme called perceptron based IDS to detect malicious route flood attack. It adaptively modifies the detection model based on the training data and detects attack in the face of new data. The remainder of the paper is organized as follows: Section 2 describes the related work. Section 3 presents the various loop holes in the route discovery phase of Ad hoc On-Demand Vector (AODV) routing protocol. We study the impact of a malicious route flood attack through simple simulations. Section 4 delineates the architecture of our perceptron based IDS. Section 5 illustrates the effectiveness of our proposed scheme through extensive simulations. We finally conclude the paper and present the future work in Section 6.
2. Related Work Most of the traditional IDS, detect attack by processing audit trails for deviations from normal activity (anomaly detection [8]) or by matching typical intrusion instances (misuse detection [9]). The main drawback of misuse detection is that, it cannot detect new attacks. On the other hand, an anomaly detector successfully detects unknown intrusions also. But, if an attacker changes its profile slowly, even well-known attacks remain undetected. Intelligent machine learning techniques like perceptron, artificial neural networks (ANN), and decision trees can be used to train the IDS accurately. ANN has innumerable advantages in the detection of
Proceedings of the International Multi-Conference on Computing in the Global Information Technology (ICCGI'07) 0-7695-2798-1/07 $20.00 © 2007
intrusions [10]. Zhang et al. [11] propose to detect DoS attacks using a hybrid ANN classifier based on perceptron and backpropagation. It derives a statistical similarity decision vector between the observed profile and the established normal profile. As in the presence of a DoS attack the similarity vector is very small, an alert is raised by the IDS. But, the detection metrics for a route flooding attack and DoS attack are different. Hence, such a scheme is not suitable for detecting misbehavior in the routing protocol. Ghost et al. [12] employ a neural network to detect anomalous intrusions on a software system (Buffer overflow attack). Siaterlis et al. [13] employ a Multi-Layer Perceptron (MLP) to detect Distributed DoS attacks by monitoring several passive flow statistics. Thus, most of the works in the literature focus on developing a IDS for detecting DoS attacks only and are unsuitable for detecting route floods. We however, focus on the problem of detecting malicious route floods by using the intelligence provided by our perceptron model. Huang et al. [14] propose a cooperative detection system by electing Custer Heads (CHs). The CHs apply simple rules to identify various route attacks. As the CHs are elected randomly, there is a possibility that a malicious attacker can be elected as a CH which would be detrimental to the network. We however, employ a distributed approach to detect route attacks that also decreases the detection time when compared to [14].
3. Malicious Route Flood & its Impact In this paper, we use an AODV routing protocol for our analysis. In this section, we explain the route discovery phase in the AODV protocol and highlight the vulnerabilities in it that can be exploited by an attacker. We then investigate the impact of a malicious route flood attack on the network.
3.1 Vulnerabilities of AODV AODV is a reactive ad hoc routing protocol. A source node in AODV initiates a route discovery to a destination only if its route entry is unknown or old. It broadcasts a route request packet (RREQ) requesting for a route to the required destination. This is repeatedly re-broadcasted by other nodes until it reaches the destination itself or an intermediate node with a fresh enough route. However, a malicious node can periodically generate large number of false RREQ packets to non-existent destination nodes so that it is repeatedly re-broadcasted till the lifetime of the packet. This attack is known as a malicious route flood attack. This creates a deluge of packets which drastically affects the achievable throughput at the application layer. If the network is saturated, a malicious flooding further aggravates the network performance [7]. An
Number of MAC Transmissions
150
1800
100 50
2000
No attack
1500
Under attack
1000
0 450
375
300
225
150
0
500
75
Instantaneous Throughput (Kbps)
Flow-2
0
Time (Secs)
Figure 2. Instantaneous throughput of flows during attack
1529
109 212
200
1 RREQ/sec 5 RREQ/sec
150 100 50 0
AODV
UDP DATA
Figure 3. Average Number of Packets Transmitted by each MR
attacker can intensify the effect of a malicious route flood by further manipulating the protocol. Typically, each RREQ packet is associated with an identification number (ID) to prevent redundant broadcast. A malicious node can modify the ID of a RREQ packet so that it always appears to be a fresh request to other nodes and is repeatedly re-broadcasted by them. A malicious route flood attack results in route invasion, packet dropping, and network congestion.
3.2 Impact of Malicious Route Flood In this sub-section, we study and illustrate the impact of a malicious route flood attack in WMNs through simulations in ns-2 [15]. We consider a simple IEEE 802.11s based mesh network with 49 MRs (7 x 7) deployed in a grid like fashion in an area of 1500 x 1500 meters. We randomly attach 2-3 mesh clients to each of the MRs. One MR is elected as the IGW. The MRs communicate with each other using legacy IEEE 802.11 based interfaces. We start flows from few mesh clients (attached to MRs). We initiate 30 UDP flows, sending traffic at a constant rate of 200 Kbps. We use a constant packet size of 512 bytes. We use IEEE 802.11 as the channel arbitration protocol. The transmission range and the channel capacity are set to 250 meters and 11 Mbps respectively. AODV is used as the routing protocol. The total simulation time is set to 500 seconds. Each simulation is repeated with 10 different traffic profiles containing randomly chosen traffic sources. We randomly choose one MR as malicious, which periodically generates (every 1 second) large number of RREQ packets to arbitrarily chosen unknown destinations. We consider the case when all MRs are backlogged with traffic. Fig. 2 shows the effect of a malicious route flood (1 RREQ/ sec) on the instantaneous throughput of flows at the IGW (all flows not shown for clarity), under a randomly chosen traffic profile. We see that while the throughput of flow-3 is affected by a small percentage (45%) only, the throughput of flow-1 and flow-2 are drastically reduced by 99% in the presence of a malicious route flood
Proceedings of the International Multi-Conference on Computing in the Global Information Technology (ICCGI'07) 0-7695-2798-1/07 $20.00 © 2007
0 RREQ/sec 2 RREQ/sec 10 RREQ/sec
250
Throughput (Kbps)
Flow-1 Flow-3
200
50
100
200
400
600
Offered load (Kbps)
800
Figure 4. Throughput obtained vs. offered load conditions
attack. Fig. 3 illustrates the average number of packets transmitted at each MR in the presence and the absence of a malicious route flood attack. It can be seen from Fig. 3 that during an attack, the average number of RREQs packets received at each MR is abnormally high (212), while the average number of data packets transmitted by each MR is cut down from 1800 to 1529. This is because; control packets are given higher priority over data packets to prevent link breakages. We next study the effect of a malicious MR generating RREQs at different rates (1-10 RREQs /sec), for different offered load. We clearly see from Fig. 4 that as the generation rate of malicious requests increases, the aggregate throughput of flows drastically decreases. Thus, a malicious route flood attack is a serious threat.
4. Perceptron based IDS In this section, we present the application of a machine learning technique called perceptron in IDS. We first present the high level architecture and then discuss the configuration of our perceptron based classifier.
4.1 System Architecture An IDS is deployed at every MR in the WMN. The architecture of the proposed IDS is shown in Fig. 5.
Figure 5. Architecture of perceptron based IDS
It consists of the following components: • Packet Monitor & Feature Extractor Packet monitor at every MR intercepts the incoming traffic and passes it to the feature extractor. Feature selection plays an important role in determining the accuracy of any classifier. The feature extractor gathers a set of statistics for each of its neighbors | N j | (where
| N j | represents the set of neighbors of a node N i ). A sample snapshot of the features collected is shown in Table. 1. For broadcast packets like RREQ, we maintain only the number of packets sent by a node’s neighbor. But, for unicast packets like RREP, we gather two statistics i.e., the number of packets sent by node’s neighbor with destination as the node itself and with destination as some other node; the latter being collected by listening in promiscuous mode. Table 1. Sample snapshot of nodal table number of route # RREQ sent ( N j to N i ) request packets Number of route # RERR sent ( N j to N i ) error packets number of route # RREP sent ( N j to N i ) reply packets number of data # Packets originated packets originated ( N j to N i ) with source address as self number of data # Packets forwarded packets forwarded ( N j to N i ) with source address as self • Anomaly Detector It consists of the trained perceptron model. The monitored network statistics is fed as input to the perceptron classifier, which gives a binary output (t). A positive output is indicative of a normal condition and a negative output is indicative of an attack. When an attack is detected, an alert is issued by the alert module to block the intrusive activity and a report is lodged in the intrusive log. • Known Attack Profile The training data for the perceptron model is generated by conducting experimental simulations using ns-2 [15]. The known attack is grafted into the simulation and the accumulated network statistics gathered during this period are labeled as attack instances.
4.2 Perceptron Training Algorithm We model the detection of malicious flood attack as a typical classification problem. Perceptron is used as a classifier to assess the network status based on the monitored network activity. Perceptron is the simplest form of a linear classifier [16]. The configuration of a
Proceedings of the International Multi-Conference on Computing in the Global Information Technology (ICCGI'07) 0-7695-2798-1/07 $20.00 © 2007
linear perceptron is shown in Fig. 6. As seen in Fig. 6, a perceptron accepts a linear combination of real-valued inputs; and outputs 1, if the result is greater than a certain threshold and outputs -1, otherwise. For a given set of n inputs x1, x2, …, xn, the output is,
o(x1, x2,...,xn ) = {1 if w0 + w1x1 + w2 x2 + ...+ wn xn > 0 -1 otherwise},
(1)
where each wi denotes the contribution of each input xi to the perceptron output. Here, w0 denotes the threshold that the linear combination of inputs must surpass for an output of 1. For simplification, we augment an additional unity constant x0 = 1 to the original input vector so that the above equation can be rewritten in a simpler form as, n
o( x1 , x2 ,..., x n ) = {1 if ¦ w i x i > 0 ; else -1 (2) i=0
Figure 6. Linear Perceptron Model The perceptron training algorithm involves the G evolution of the weight vector w , which would correctly classify all the instances presented to it in the training stage. We input a feature vector of route and data parameters to the perceptron along with the label ( ω1 or ω 2 ), which represents the class of the instance
(where
ω1
represents
a
normal
instance
and
ω 2 represents an attack instance). We begin the training with some random assignment of weights. When the perceptron misclassifies an example, the weights are actively adapted according to the following perceptron training rule, w i = w i + η x i if w i x i ≤ 0 and x i ∈ ω1 , and
w i = w i − η x i if w i x i ≥ 0 and xi ∈ ω 2 , (3) where η is the learning rate of the algorithm which
moderates the extent to which weights are changed. We set η to a small value of 0.02. Thus, N training examples consisting of normal and attack instances are presented to the algorithm iteratively. The algorithm is repeated until all the examples are correctly classified.
Cluster
1
2
0
0.2
0.4 0.6 Silhouette Value
0.8
1 0.8 0.6 0.4 0.2 0 500
1500
2500
4000
12000
Training Sample Size
1
Figure 7. Linearly separable training data
True Positive Rate
True P ositiv e R ate
1.2
Figure 8. Effect of training sample size on true positive rate
It is important to note that a perceptron can be applied for linearly separable data points only. To highlight the fact that the attack instances and the normal instances are linearly separable in the space of detection metrics, we tested the data by performing k-means clustering test [16]. We see from the silhouette plot (MATLAB) in Fig. 7, that our training sample is perfectly classified into two clusters (attack and non-attack instances).
5. Performance analysis In this section, we study the performance of our proposed perceptron based IDS using simulations performed in ns-2 [15]. We use the same scenario and attack configuration as described in Section 3. Each simulation is run for 600 seconds. We generate a sample size of 33,500 feature points, comprising of 50 % of attack instances and 50 % of normal instances. The training period is set for 500 epochs so that the weights of the perceptron model converge for sure. We use about 50 % of the generated data for our training model and the other half is used later for testing and validation purposes. We evaluate the network performance based on the following detection metrics: • True positives (TP): Number of times an alert is raised, when an attack is present. • False negatives (FN): Number of times when no alert is raised, but attack is present. • False positives (FP): Number of times when alert is raised, but attack is not present. • True negatives (TN): Number of times when no alert is raised, when no attack is present. The performance of our classification algorithm is thus based on TPR (True positive rate) and FPR (False positive rate). TPR = TP , which is the ratio of TP + FN number of alerts when there is an attack to total number of attacks. Similarly, FPR = FP . TN + FP We illustrate the detection accuracy (TPR) of the trained perceptron model for various sizes of training
Proceedings of the International Multi-Conference on Computing in the Global Information Technology (ICCGI'07) 0-7695-2798-1/07 $20.00 © 2007
1.01 1 0.99 0.98 0.97 0.96 0.95 0
0.02
0.04
0.06
0.08
False Positive Rate
Figure 9. ROC Curve
sample in Fig. 8. We find that the accuracy of the model is less for a small sample size. The model achieves best results for a training size of 4000-8000 instances. We next study the Receiver Operating Characteristics (ROC) curve (TPR vs. FPR) which reflects the tradeoffs in the sensitivity of the detection algorithm. Fig. 9 shows the ROC curve for our detection scheme. We observe that in our scheme very less number of normal instances are misclassified as anomalies (as seen by 0 FPR value) and all attack instances are correctly identified as intrusions (as seen by the high TPR value close to 1). Table. 2 summerizes the misclassification rate of our trained algorithm for different generation rate of RREQs by a single malicious MR. Misclassification rate is defined as the percentage of inputs misclassified during one training epoch (include FP and FN). Table 2. Misclassification rates of perceptron Bogus RREQ generation Misclassification rate rate 1 RREQ/sec 0.9811 % 5 RREQ/sec 0.9881 % 10 RREQ/sec 0.9995 % Fig. 10 demonstrates the detection ability of our system when the number of malicious MRs generating malicious requests is increased. Even if only a small percentage of the MRs are malicious, we see that our perceptron based IDS accurately detects them with a high TPR of 98% and has close to 100% detection rate for largely compromised network. This is because for a high attack rate, the data is easily separable in the space of detection metrics and is thus easily classifiable by the perceptron model. Similarly, Fig. 11 shows the FPR for varying number of good MRs. A large number of good MRs imply very few MRs are compromised. It can be seen that the maximum value of FPR is within 23%, for a largely compromised WMN. We also see that as the attack rate increases, the FPR increases indicating that a
0.98 0.97
1 RREQ
2 RREQ
0.96
5 RREQ
10 RREQ
0.95 5
10
15
20
1 RREQ
2 RREQ
0.2
5 RREQ
10 RREQ
0.15 0.1 0.05 0 25
25
30
35
40
45
Number of Well Behaving MRs
Number of Misbehaving MRs
Figure 10. TPR under varying number of misbehaving MRs
0.25
Figure 11. FPR under varying number of well behaving MRs
small percentage of false alarms would be raised from time-to-time. Both graphs prove that the proposed perceptron based IDS has a very high TPR (100%) and a low FPR (23%). The proposed IDS, thus detects malicious route discoveries accurately and efficiently. It also illustrates the fact that the features (detection metrics) selected as inputs for training the perceptron model, are accurate predictors of the attack. We next study the effect of learning rate (η ) on the detection rate of our proposed IDS. It is important to choose a small value for the perceptron to correctly classify the data with a low FPR and a high TPR. We see from Fig. 12, that a lower learning rate has a higher detection rate over higher learning rate. Hence, we choose an ideal learning rate of 0.02 for our model.
[3]
6. Conclusion and future work
[9]
In this paper, we propose to use an intelligent machine learning technique based on linear perceptron classifier to detect malicious route discoveries. We model the detection of malicious route flood as a classification problem. As malicious route flood attack has a global effect at every MR in the WMN, the IDS raises a timely alert on the occurrence of the attack. We prove through extensive simulations, that our perceptron based IDS model has a high detection rate and a low false positive rate. As a part of our future work, we plan to use other machine learning techniques like multi-layer perceptron model to detect attacks like selfish node and blackhole attack, which fail to get classified by a linear perceptron model.
7. References N. Nandiraju, D. Nandiraju, L. Santhanam, B. He, J. Wang, and D. P. Agrawal, “Wireless mesh networks: current challenges and future directions of web-in-thesky,” To appear in IEEE Communication Magazine. [2] P. Ning and K. Sun, “How to misuse AODV: a case study of insider attacks against mobile ad-hoc routing protocols,” Ad Hoc Networks, 3(6), pp. 795-819, 2005.
[4]
[5]
[6]
[7]
[8]
[10] [11]
[12]
[13]
[14]
[1]
Proceedings of the International Multi-Conference on Computing in the Global Information Technology (ICCGI'07) 0-7695-2798-1/07 $20.00 © 2007
[15] [16]
True Positive Rat
0.99
Fals e P os itiv e Rate
True Positive Rates
1
1.2 1 0.8 0.6 0.4 0.2 0 0.002
0.005
0.02
0.2
Learning Rate
Figure 12. Effect of learning threshold on the detection rate
B. Bhargava, W. Wang, and Y. Lu, “On vulnerability and protection of ad hoc on-demand distance vector prototol,” In the Proc. of International Conference on Telecommunication, 2003. Y. Hu, D. B. Johnson, and A. Perrig, “SEAD: secure efficient distance vector routing for mobile wireless ad hoc networks,” Ad Hoc Networks, pp. 175-192, 2003. Y. Hu, A. Perrig, and D. B. Johnson, “Ariadne: a secure on-demand routing protocol for ad hoc networks,” In the Proc. of ACM Mobicom, pp. 12-23, 2002. S.-Y. Ni, Y.–C. Tseng, Y. Shyan, and J.-P. Sheu, “The broadccast storm problem in a mobile ad hoc networks,” In the Proc. of IEEE MCN, pp. 152-162, 1999. S. Desilva and R. V. Boppana, “Mitigating malicious control packets floods in ad hoc networks,” In the Proc. of IEEE WCNC, Vol.4, pp. 2112-2117, 2005. A. Valdes and D. Anderson, “Statistical methods for computer usage anomaly detection using NIDES,” Technical report, SRI International, January 1995. W. Lee, S. J. Stolfo, and K. W. Mok, “A data mining framework for building intrusion detection models,” In the Proc. of IEEE Symposium of Security and Privacy, pp. 120-132,1999. J. Cannady, “Artificial neural networks for misuse detection,” In the Proc. NISSC, pp. 443–456, 1998. Z. Zhang, J. Li, C. N. Manikopoulos, J. Jorgenson, and J. Ucles, “HIDE: A heirarchial network intrusion detection system using statistical preprocessing and neural network classification,” In the Proc. of Workshop on Information Assurance and Security, pp. 85-90, 2001. A. K. Ghosh, J. Wanken, and F. Charron, “Detecting anomalous and unknown intrusions against programs,” In the Proc. of Computer Security Applications Conference, pp. 259-267, 1998. C. Siaterlis and V. Maglaris, “Detecting incoming and outgoing DDoS attacks at the edge using a single set of network charactersitics,” In the Proc. of 10th IEEE Intl. Symposium on Computers and Communication Systems, pp. 469-475, 2005. Y.-A. Huang and W. Lee, “A cooperative intrusion detection system for ad hoc networks,” In the Proc. of st 1 ACM Workshop on Ad hoc and Sensor Networks, pp. 135-147, 2003. Network Simulator (NS-2), http://www.isi.edu/nsnam/ns/index.html. S. Theodoridis and K. Koutroumbas, “Pattern Recognition,” 3rd Edition, Academic Press, 1997