Sybil Attack Detection using Sequential Hypothesis Testing in Wireless Sensor Networks

P. Raghu Vamsi and Krishna Kant
Department of Computer Science and Engineering
Jaypee Institute of Information Technology, Noida, India.
[email protected], k.kant@jiit.ac.in

Abstract: The Sybil attack poses a serious threat to geographic routing. In this attack, a malicious node attempts to broadcast incorrect location information, identity, and secret key information. A Sybil node can tamper with its neighboring nodes for the purpose of converting them into malicious nodes. As the number of Sybil nodes in the network increases, network traffic is seriously affected and data packets will never reach their destinations. To address this problem, researchers have proposed several schemes to detect Sybil attacks. However, most of these schemes assume a costly setup, such as the use of relay nodes, expensive devices, or expensive encryption methods to verify the location information. In this paper, the authors present a method to detect Sybil attacks using Sequential Hypothesis Testing. The proposed method has been examined using the Greedy Perimeter Stateless Routing (GPSR) protocol with analysis and simulation. The simulation results demonstrate that the proposed method is robust in detecting Sybil attacks.

Keywords: Sequential hypothesis testing, Sybil attack, geographic routing, wireless sensor networks.

I. INTRODUCTION

Wireless Sensor Networks (WSNs) are usually deployed in outdoor environments to monitor, sense, and report on a process or object of interest. These networks consist of low-cost sensor nodes with limited resources in terms of energy, computation, and memory. Applications of WSNs include pungent gas detection in chemical factories or coal mines, weather forecasting, forest fire detection, and monitoring of buildings and bridges [1]. Each sensor node has an identity that uniquely identifies it in the WSN. A sensor node's task is to report the sensed data, along with its identity and location information, to the base station (BS) in a single-hop or multi-hop fashion, using localized routing decisions along a shortest path. The base station takes the necessary action based on the received data. In routing schemes such as hierarchical clustering routing [2], a set of sensor nodes reports events to the cluster head (CH), and the CH performs data aggregation and reports the aggregated data to the base station. Correct and reliable location information is therefore needed for reporting events, making routing decisions, data aggregation, and data storage and retrieval.

However, the Sybil attack is a harmful attack on geographic and ad hoc routing in which an adversary captures and tampers with benign nodes for the purpose of converting them into malicious nodes. In this attack, a Sybil node purposefully broadcasts false node identity, location, and secret key information to disrupt, attract, or obtain full control over the network traffic. The Sybil attack was first addressed in peer-to-peer networks as subverting the reputation

978-1-4799-3140-8/14/$31.00 ©2014 IEEE

system by duplicating the network identities [3]. An adversary can launch a Sybil attack in any or all of the following cases [17]:

1) During direct or indirect communication with a Sybil node. In indirect communication, the Sybil node passively listens to ongoing packets and attempts to capture the secret key and identity information.
2) The attack can be initiated by malicious nodes in the network incrementally or simultaneously in many places.
3) The attack can be launched with stolen or fabricated identities and secret key information.

In this paper, a method is proposed to detect and isolate malicious nodes that falsify identity and location information using sequential hypothesis testing (SHT). Each node in the network runs the SHT to identify a Sybil attack. With SHT, a node accepts one of two competing hypotheses ($H_0$ (null): the neighboring node is not a Sybil node; $H_1$ (alternate): the neighboring node is a Sybil node). Based on observations of the activities carried out by neighboring nodes, a node computes a test statistic $T(X)$ and compares it against two thresholds $t_0$ and $t_1$ to decide among three alternatives. First, the null hypothesis is accepted if $T(X) \le t_0$. Second, the alternate hypothesis is accepted if $T(X) \ge t_1$. Finally, the test statistic is computed again with a new observation if $t_0 < T(X) < t_1$. In addition, the test is carried out using user-configured false positive and false negative values. Throughout this paper, the terms adversary and malicious are used synonymously with Sybil behavior.

The rest of the paper is structured as follows. Section II presents related work on solutions to the Sybil attack. Section III explains the network model and the assumptions made for the proposed method. Section IV presents Sybil attack detection with sequential hypothesis testing. Section V presents a simulation study that examines the proposed model. Finally, Section VI concludes the paper.

II. RELATED WORK

In this section, related work on the detection of Sybil attacks is presented. As discussed in Section I, a Sybil attack may be launched by an adversary who physically tampers with sensor nodes. The tampered sensor node is then placed back in the sensor field, where it attempts to convert its neighboring nodes into malicious nodes. In this way, a single tampered sensor device keeps converting benign nodes into malicious nodes as the


network operation progresses. The purpose of a Sybil attack is to falsify node identities, location information, and secret key information. Consequently, at any time during network operation there could be several malicious nodes attempting a Sybil attack to disrupt network operations. In the literature, there are numerous methods to identify Sybil attacks [4-10]. However, most of these schemes assume a costly setup, such as the use of relay nodes, expensive devices, or expensive encryption methods to verify the location information.

Newsome et al. [4] proposed several techniques to defend against the Sybil attack using conventional random key distribution schemes and public keys generated from polynomials. Along with these techniques, the authors showed that the Sybil attack is exceedingly detrimental to sensor network operations. Zhang et al. [5] proposed a method based on a one-way hash chain. This method outperforms conventional public key cryptography implementations for Sybil attack detection, and it is best suited to networks that communicate in a tree or hierarchical fashion.

Wang et al. [6] proposed a solution based on the received signal strength (RSS) observed from neighboring nodes. In addition to RSS, status messages from member nodes are used to improve the accuracy of the decision about a Sybil attack. This model uses the Jakes channel model, status messages, and the RSS of neighboring nodes together with their identity, location information, and power levels. In this method, a node is required to keep track of several variables to detect a Sybil attack, which requires large storage space. Demirbas and Song [7] proposed a lightweight solution based on RSS. However, this method requires special verification nodes to check the RSS values, and it needs modification to scale to large networks.

Zeng and Chen [8] proposed a metaheuristic method, SybilACO, in which Ant Colony Optimization (ACO) is used to defend against the Sybil attack. It is well known that ACO algorithms often achieve good optimization results but are slow compared to classical heuristics. Li et al. [9] proposed a regional statistics detection scheme against Sybil attacks. This distributed method extends the solutions in [6] and [7]. A node collects RSS statistics from the neighboring nodes in its region and records them in its regional statistics table, which is then used to defend against the Sybil attack. However, the scheme still needs to be extended to large networks.

Ho et al. [10] proposed an interesting method called ZoneTrust. In this method, the entire network is divided into a set of zones, and sequential hypothesis testing is used to detect suspect regions. ZoneTrust detects even a substantial fraction of compromised nodes while reducing false positive and false negative rates. The detection technique is modeled using game theoretic analysis, with optimization strategies defined for the attacker and the defender. ZoneTrust shows that node compromise is greatly limited by the defender if the attacker and defender follow their optimal strategies.

An approach for detecting replica clusters in WSNs using SHT has been proposed in [11]. This method uses a sequential probability ratio test to detect and isolate replica clusters with user-configured false positive and false negative rates. Motivated by the methods in [10] and [11], this paper proposes a method to detect and isolate Sybil nodes using sequential hypothesis testing with user-configured false positive and false negative rates. The method is formulated to suit large-scale WSNs through distributed detection of Sybil attacks.

III. NETWORK MODEL AND ASSUMPTIONS

In this section, the network model and the assumptions made for the proposed method are presented.

A. Network Model

Let $S = \{s_1, s_2, \ldots, s_n\}$ be a set of $n$ sensor nodes deployed in a geographical region, with node $s_i$ located at $(x_i, y_i)$. These nodes interact directly with each other to forward packets. In this model, it is assumed that each node has a unique identity and is aware of its own location. Generally, location information is obtained by installing a Global Positioning System (GPS) receiver or by using a localization scheme [12]. WSNs are basically static, that is, the nodes are immobile. However, a node may be replaced or relocated manually in case of node failure or battery depletion. Every node in the network uses a symmetric key for encrypting data and generating a hash code to maintain packet integrity. Every node communicates using a bidirectional transceiver and operates its network interface in promiscuous mode. In promiscuous mode, a node can observe all packets passing within its radio range.

B. Performance Metrics

The following performance metrics are considered to validate our model.

• Average number of samples: the average number of samples required by a node to accept one of the two competing hypotheses.
• False positive: the probability that a non-Sybil node is misidentified as a Sybil node.
• False negative: the probability that a Sybil node is misidentified as a benign node.

IV. SYBIL ATTACK DETECTION

The Sequential Probability Ratio Test (SPRT) is a statistical decision process developed by Wald [13]. It is also known as Sequential Hypothesis Testing (SHT). SPRT is widely employed in manufacturing industries to identify defective items among manufactured items. Unlike conventional hypothesis testing, SPRT reaches a decision without a fixed sample size and with only a limited number of samples of evidence. In the literature, researchers have modeled SPRT as a random walk with a lower and an upper bound. In this random walk model, the decision process starts from a point between the two bounds and moves towards the upper or lower bound with respect to incoming input

2014 International Conference on Signal Propagation and Computer Technology (ICSPCT)


evidence. When the random walk reaches or crosses the lower bound, the SPRT accepts the null hypothesis. On the other hand, if the random walk reaches or exceeds the upper bound, the SPRT rejects the null hypothesis and accepts the alternate hypothesis.

A. Formulating hypotheses

Let $S_j$ be a neighboring node of node $S_i$. Node $S_i$ observes all outgoing messages from its neighboring nodes within its transmission range $R$. Assume that the time domain of each sensor node is divided into non-overlapping time intervals ($t_n$). During these time intervals, $S_i$ counts the number of distinct outgoing messages originated from $S_j$, where distinctness is in terms of node identity and location information. Each node maintains two tables to collect evidence. The first is the observation table, which consists of neighboring node identities and the corresponding locations. The second is the distance table, which consists of neighbor identities and their relative distances. An observation of a non-distinct message is considered a success, and a message distinct from the previous observation is considered a failure. Let $O_1, O_2, O_3, \ldots, O_n$ be independent and identically distributed (iid) observations; then evidence is collected in binary values as follows.

• Evidence from direct observations:

$$E_0(i,j) = \begin{cases} 1 & \text{if } O_i \rightarrow O_j \\ 0 & \text{otherwise} \end{cases} \tag{1}$$

• Evidence from distance estimation:

$$E_1(i,j) = \begin{cases} 1 & \text{if } d_{ij} \le R \\ 0 & \text{otherwise} \end{cases} \tag{2}$$

where $E_0(i,j)$ is the first evidence, collected from direct observations between $S_i$ and $S_j$. It is 1 when $S_i$ can observe $S_j$ (i.e., $O_i \rightarrow O_j$) and 0 otherwise. $E_1(i,j)$ is based on the distance measurement between $S_i$ and $S_j$. Let $(x_i, y_i)$ and $(x_j, y_j)$ be the locations of $S_i$ and $S_j$ respectively; then the distance between these two locations is calculated as $d_{ij} = \sqrt{(x_j - x_i)^2 + (y_j - y_i)^2}$. The second evidence $E_1$ is 1 when the measured distance is less than or equal to the transmission range $R$ and 0 otherwise. With the collected evidence, the consolidated value $E_c(i,j)$ is calculated as

$$E_c(i,j) = E_0(i,j) \oplus E_1(i,j) \tag{3}$$

where $\oplus$ is the XOR operation between the two evidences. With the value of $E_c(i,j)$, malicious activity can be identified. For example, if $E_0(i,j) = 1$ and $E_1(i,j) = 0$ then $E_c(i,j) = 1$, which means that $S_i$ can observe $S_j$ but the location is falsified by $S_j$. Since $E_c(i,j)$ takes binary values, its outcomes are modeled as iid observations. Let $X_i$ be a Bernoulli random variable defined as

$$X_i = \begin{cases} 1 & \text{if } E_c(i,j) = 1 \\ 0 & \text{if } E_c(i,j) = 0 \end{cases} \tag{4}$$

where $E_c(i,j)$ is the consolidated evidence value, and the success probability ($p$) of the Bernoulli distribution is defined as $p = \Pr(X_i = 1) = 1 - \Pr(X_i = 0)$. A node models the consolidated evidence as $X \sim f(x|\theta)$, where $f(x|\theta)$ represents the conditional density of the data vector $X$ given the parameter $\theta \in \Theta$, and $\Theta$ denotes the parameter space $\Theta = \{\Theta_0 = \text{Benign node}, \Theta_1 = \text{Sybil node}\}$ with prior density $\pi(\theta)$ for $\theta$ on $\Theta$. The prior $\pi_i(\theta)$ with support on $\Theta_i$, for $i = 0, 1$, is given as

$$\pi_i(\theta) = \frac{I(\theta \in \Theta_i)\,\pi(\theta)}{\int_{\Theta_i} \pi(\theta)\,d\theta} \tag{5}$$

where $I(\cdot)$ takes the value 1 if the given condition is satisfied and 0 otherwise. The marginal density of $X$ under the constraint $H_i$, for $i = 0, 1$, is defined as

$$m_i(x) = \int_{\Theta_i} f(x|\theta)\,\pi(\theta)\,d\theta. \tag{6}$$

The problem of comparing two competing hypotheses can be stated as

$H_0$: Neighboring node $S_j$ is a benign node.
$H_1$: Neighboring node $S_j$ is a Sybil node.

where $H_0$ and $H_1$ are the null and alternative hypotheses, respectively. For the hypothesis test, the false positive ($\alpha$) and false negative ($\beta$) errors are defined as follows.

$\alpha$: false positive error, in which the decision leads to acceptance of $H_1$ when $H_0$ is true.
$\beta$: false negative error, in which the decision leads to acceptance of $H_0$ when $H_1$ is true.

In hypothesis testing, it is desirable to achieve $\alpha = \beta = 0$; however, $\alpha$ and $\beta$ trade off against each other, which means that as $\alpha$ decreases $\beta$ increases. Minimizing both $\alpha$ and $\beta$ requires a large number of samples. However, with sequential hypothesis testing, a decision can be reached with few samples while maintaining the desired $\alpha$ and $\beta$ values. Since $\beta$ is the probability of a false negative, $1 - \beta$ is the probability of detecting a Sybil attack. The lower bound on detecting a Sybil attack is given by [13] as the inequality

$$(1 - \beta) \ge \frac{1 - \alpha^* - \beta^*}{1 - \alpha^*},$$

where $\alpha^*$ and $\beta^*$ are the user-configured false positive and false negative values.

B. Test Statistics

Let $T(X)$ be the test statistic, defined as the logarithm of the Bayes factor $BF(X)$, i.e., $T(X) = \ln(BF(X))$, where

$$BF(X) = \left(\frac{\Pr(\theta \in \Theta_1 | X)}{\Pr(\theta \in \Theta_0 | X)}\right)\left(\frac{\Pr(\theta \in \Theta_0)}{\Pr(\theta \in \Theta_1)}\right) \tag{7}$$

Since the observations are iid, the conditioning can be dropped and $BF(X)$ for $n$ observations becomes

$$BF(X) = \prod_{i=1}^{n} \frac{\Pr(X_i | \Theta_1)}{\Pr(X_i | \Theta_0)}. \tag{8}$$
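As a quick numerical check of the lower bound on the detection probability $(1 - \beta)$ given earlier in this section, one can plug in $\alpha^* = \beta^* = 0.01$, the values configured later in the simulation study. This is only a worked instance of that inequality, not an additional result:

$$1 - \beta \;\ge\; \frac{1 - \alpha^* - \beta^*}{1 - \alpha^*} = \frac{1 - 0.01 - 0.01}{1 - 0.01} = \frac{0.98}{0.99} \approx 0.9899$$

That is, with both error rates configured at 1 percent, the probability of detecting a Sybil node is bounded below by roughly 0.99.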

$T(X)$ can be used as a decision rule by comparing it to a non-negative constant threshold ($t$), such that $T(X) = \ln(BF(X)) > t$. With this comparison, the test becomes sensitive: a positive (resp. negative) value of $T(X)$ favors $H_1$ (resp. $H_0$). Hence, $T(X)$ becomes the optimal test statistic, and it can be rewritten in terms of the marginal distributions as

$$T(X) = \sum_{i=1}^{n} \ln(BF(X_i)) = \sum_{i=1}^{n} \ln\left(\frac{m_1(X_i)}{m_0(X_i)}\right). \tag{9}$$

Let $P_0 = \Pr(X_i = 1 | \Theta_0)$ and $P_1 = \Pr(X_i = 1 | \Theta_1)$, with $P_0 < P_1$. Among $n$ observations, let $\gamma_n$ be the number of observations with $X_i = 1$. Since each $X_i$ is a Bernoulli variable, each term of the sum equals $\ln(P_1/P_0)$ when $X_i = 1$ and $\ln((1 - P_1)/(1 - P_0))$ when $X_i = 0$, so $T(X)$ can be rewritten as

$$T(X) = \gamma_n \ln\frac{P_1}{P_0} + (n - \gamma_n)\ln\frac{1 - P_1}{1 - P_0} \tag{10}$$

TABLE I. SIMULATION PARAMETERS

| Parameter | Value |
|---|---|
| Simulator | ns-2.35 |
| Examined protocol | GPSR with SHT |
| MAC | 802.11 |
| Simulation time | 600 seconds |
| Area | 150 x 150 meters |
| Nodes | 75 |
| Propagation model | Two ray ground reflection |
| Transmission range | 40 meters |
| Transmission power | 0.0522 mW |
| Receive power | 0.0522 mW |
| Pt | 0.001 |
| Traffic type | CBR over UDP |
| Bandwidth | 200 kbps |
| Maximum connections | 25 |
| Packet size | 64 bytes |
| Packet rate (drate) | 4 packets/second (1/4 = 0.25) |
| Security attack | Sybil attack |

C. Stopping criteria

The test statistic expressed through the marginal distributions again amounts to monitoring a likelihood ratio, which makes the procedure an SHT. The test compares $T(X)$ to two thresholds $t_0$ and $t_1$, and is carried out as follows:

• $T(X) \le t_0$: accept $H_0$ and terminate the test.
• $T(X) \ge t_1$: accept $H_1$ and terminate the test.
• $t_0 < T(X) < t_1$: continue the process with another observation.

in which $t_0 = \ln\left(\frac{\beta^*}{1 - \alpha^*}\right)$ and $t_1 = \ln\left(\frac{1 - \beta^*}{\alpha^*}\right)$. Since the number of samples required to decide between the two competing hypotheses is not known in advance, according to [13], the expected number of samples required to terminate the test is given as

$$E[n] = \frac{E[T(X)]}{E\left[\ln\frac{\Pr(X_i | \Theta_1)}{\Pr(X_i | \Theta_0)}\right]} \tag{11}$$
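The detection procedure of this section can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' ns-2 implementation: the function names, the tuple-based positions, and the example probabilities P0 = 0.1 and P1 = 0.9 are hypothetical choices made here, and the expected-sample helper uses Wald's standard approximation, which ignores overshoot at the thresholds.

```python
import math
from typing import Iterable, Tuple

Point = Tuple[float, float]


def consolidated_evidence(observed: bool, pos_i: Point, pos_j: Point,
                          tx_range: float) -> int:
    """Return E_c = E_0 XOR E_1, following Eqs. (1)-(3)."""
    e0 = 1 if observed else 0                    # Eq. (1): S_i overhears S_j
    d_ij = math.hypot(pos_j[0] - pos_i[0], pos_j[1] - pos_i[1])
    e1 = 1 if d_ij <= tx_range else 0            # Eq. (2): claimed position within R
    return e0 ^ e1                               # Eq. (3)


def thresholds(alpha: float, beta: float) -> Tuple[float, float]:
    """Return (t0, t1) with t0 = ln(beta/(1-alpha)), t1 = ln((1-beta)/alpha)."""
    return math.log(beta / (1.0 - alpha)), math.log((1.0 - beta) / alpha)


def log_bayes_factor(gamma_n: int, n: int, p0: float, p1: float) -> float:
    """Test statistic T(X) of Eq. (10) for gamma_n ones among n samples."""
    return (gamma_n * math.log(p1 / p0)
            + (n - gamma_n) * math.log((1.0 - p1) / (1.0 - p0)))


def sprt(samples: Iterable[int], p0: float, p1: float,
         alpha: float = 0.01, beta: float = 0.01) -> Tuple[str, int]:
    """Apply the stopping criteria; return (decision, samples consumed)."""
    t0, t1 = thresholds(alpha, beta)
    gamma_n = 0
    n = 0
    for n, x in enumerate(samples, start=1):
        gamma_n += x
        t = log_bayes_factor(gamma_n, n, p0, p1)
        if t <= t0:
            return "H0", n                       # neighbor accepted as benign
        if t >= t1:
            return "H1", n                       # neighbor flagged as Sybil
    return "undecided", n                        # evidence exhausted before a decision


def expected_samples_h1(p0: float, p1: float,
                        alpha: float = 0.01, beta: float = 0.01) -> float:
    """Eq. (11) under H1, using Wald's approximation for E[T(X)] (assumption)."""
    t0, t1 = thresholds(alpha, beta)
    expected_t = beta * t0 + (1.0 - beta) * t1
    expected_step = (p1 * math.log(p1 / p0)
                     + (1.0 - p1) * math.log((1.0 - p1) / (1.0 - p0)))
    return expected_t / expected_step


if __name__ == "__main__":
    # Hypothetical neighbor that is overheard but claims a position 100 m away.
    print(consolidated_evidence(True, (0.0, 0.0), (100.0, 0.0), 40.0))
    print(sprt([1, 1, 1, 1], p0=0.1, p1=0.9))
```

With these hypothetical probabilities, `sprt([1, 1, 1, 1], p0=0.1, p1=0.9)` stops after the third sample in favor of H1, because three consecutive suspicious observations push the statistic of Eq. (10) past t1. Keeping only the running count of ones and the sample count means a node needs constant memory per monitored neighbor.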

The formulated hypothesis test has been examined with the Greedy Perimeter Stateless Routing (GPSR) algorithm [14]. The simulation study used to examine the proposed method is explained in the next section.

V. SIMULATION STUDY

The proposed method was examined with the network simulator ns-2.35 [15]. Each node in the network was configured according to the CC2420 specification [16]. This chip follows the IEEE 802.15.4 specification and operates at 2.4 GHz. Details of the simulation parameters are given in Table I. It is assumed that the network contains no malicious nodes at the time of deployment; nodes become malicious as network operation progresses. The simulation was run for 600 seconds, of which 0 to 180 seconds are provided for establishing UDP traffic between a maximum of 25 nodes.

Fig. 1. Average number of samples vs packet rate for false positive case

In general, while network operations are ongoing, an adversary captures a node and tampers with it to obtain the information related to identity, location, and secret key material. The tampered node is placed back in the sensor network so that it slowly captures neighboring nodes for the purpose of converting them into malicious nodes. Accordingly, the simulation was done with an incremental appearance of Sybil attacks. The Sybil attack is configured to start with 3 nodes 10 seconds after the beginning of the simulation, and the number of attacking nodes doubles every 10 seconds. This increase continues until 50 percent of the nodes in the network have become malicious. To test the effectiveness of the proposed method, data packets are transmitted at rates of 1, 2, 3, and 4 packets per second. When a data packet is received by a node, the node registers its identity and location information in the data packet and forwards the packet to the next eligible node. Meanwhile, every node broadcasts its location and identity information as a HELLO control message every $|u \ast 10.0|$ seconds, where $u$ is a uniform random number generated between 0 and 7.5. The user-configured false positive and false negative values $\alpha^*$ and $\beta^*$ were both set to 0.01, and the lower and upper bounds $t_0$ and $t_1$ were set to 0.1 and 0.9, respectively. The results presented here are the mean of the outputs obtained from 40 simulation runs. Every node conducts the SHT for each and every observation. Since every node in the network

operates in promiscuous mode, each node observes and records all ongoing activities carried out by its neighboring nodes. Note that the SHT is repeated with new observations as long as the computed test statistic remains between the lower and upper bounds. The simulation runs show that the method detected Sybil nodes accurately, which means that false positives and false negatives had no impact on detecting the Sybil nodes.

Figure 1 plots the average number of samples needed to accept the null hypothesis. The average number of samples required to accept the null hypothesis decreases gradually as the packet rate increases: it varied from 6.484 to 6.032 as the packet rate went from 1 packet per second to 4 packets per second. Hence, the proposed method requires at most 6.484 observations on average to accept the null hypothesis. Figure 2 plots the average number of samples needed to accept the alternate hypothesis. As in the null hypothesis case (Figure 1), the average number of samples required to reject the null hypothesis decreases gradually as the packet rate increases: it varied from 8.112 to 7.712 as the packet rate increased from 1 packet per second to 4 packets per second. This means that in either case (i.e., acceptance of the null or the alternate hypothesis) the average number of samples required varies between 6 and 8, without being influenced by false positives and false negatives.

Fig. 2. Average number of samples vs packet rate in false negative case

VI. CONCLUSION

In this paper, a sequential hypothesis test based technique to detect Sybil attacks has been proposed and applied. Owing to its simplicity, the proposed method works with general observations made by each node in the network. In addition, the presented method detects Sybil attacks accurately, without being affected by false positives and false negatives. The distributed nature of the proposed method makes it suitable for networks deployed over a large area. Finally, the simulation results show that the presented method is robust against Sybil attacks.

REFERENCES

[1] I. F. Akyildiz, W. Su, Y. Sankarasubramaniam, and E. Cayirci, “Wireless sensor networks: a survey,” Computer Networks, vol. 38, no. 4, pp. 393–422, 2002.
[2] W. R. Heinzelman, A. Sinha, A. Wang, and A. P. Chandrakasan, “Energy-scalable algorithms and protocols for wireless microsensor networks,” in Acoustics, Speech, and Signal Processing, 2000. ICASSP’00. Proceedings. 2000 IEEE International Conference on, vol. 6, pp. 3722–3725, IEEE, 2000.
[3] J. R. Douceur, “The sybil attack,” in Peer-to-peer Systems, pp. 251–260, Springer, 2002.
[4] J. Newsome, E. Shi, D. Song, and A. Perrig, “The sybil attack in sensor networks: analysis & defenses,” in Proceedings of the 3rd international symposium on Information processing in sensor networks, pp. 259–268, ACM, 2004.
[5] Q. Zhang, P. Wang, D. S. Reeves, and P. Ning, “Defending against sybil attacks in sensor networks,” in Distributed Computing Systems Workshops, 2005. 25th IEEE International Conference on, pp. 185–191, IEEE, 2005.
[6] J. Wang, G. Yang, Y. Sun, and S. Chen, “Sybil attack detection based on rssi for wireless sensor network,” in Wireless Communications, Networking and Mobile Computing, 2007. WiCom 2007. International Conference on, pp. 2684–2687, IEEE, 2007.
[7] M. Demirbas and Y. Song, “An rssi-based scheme for sybil attack detection in wireless sensor networks,” in Proceedings of the 2006 International Symposium on World of Wireless, Mobile and Multimedia Networks, pp. 564–570, IEEE Computer Society, 2006.
[8] B. Zeng and B. Chen, “Sybilaco: Ant colony optimization in defending against sybil attacks in the wireless sensor network,” in Computer and Communication Technologies in Agriculture Engineering (CCTAE), 2010 International Conference On, vol. 1, pp. 357–360, IEEE, 2010.
[9] M. Li, Y. Xiong, X. Wu, X. Zhou, Y. Sun, S. Chen, and X. Zhu, “A regional statistics detection scheme against sybil attacks in wsns,” in Trust, Security and Privacy in Computing and Communications (TrustCom), 2013 12th IEEE International Conference on, pp. 285–291, IEEE, 2013.
[10] J.-W. Ho, M. Wright, and S. K. Das, “Zonetrust: fast zone-based node compromise detection and revocation in wireless sensor networks using sequential hypothesis testing,” Dependable and Secure Computing, IEEE Transactions on, vol. 9, no. 4, pp. 494–511, 2012.
[11] J.-W. Ho, “Sequential hypothesis testing based approach for replica cluster detection in wireless sensor networks,” Journal of Sensor and Actuator Networks, vol. 1, no. 2, pp. 153–165, 2012.
[12] H. Wymeersch, J. Lien, and M. Z. Win, “Cooperative localization in wireless networks,” Proceedings of the IEEE, vol. 97, no. 2, pp. 427–450, 2009.
[13] A. Wald, Sequential analysis. Courier Corporation, 1973.
[14] B. Karp and H.-T. Kung, “Gpsr: Greedy perimeter stateless routing for wireless networks,” in Proceedings of the 6th annual international conference on Mobile computing and networking, pp. 243–254, ACM, 2000.
[15] “Network simulator ns-2.35.” http://www.isi.edu/nsnam/ns. Accessed: 2014-06-04.
[16] “Cc2420 data sheet.” http://www.ti.com/product/cc2420. Accessed: 2014-06-04.
[17] Y. Yu, K. Li, W. Zhou, and P. Li, “Trust mechanisms in wireless sensor networks: Attack analysis and countermeasures,” Journal of Network and Computer Applications, vol. 35, no. 3, pp. 867–880, 2012.
