A Scheme for Secure and Reliable Distributed Data Storage in ...

This full text paper was peer reviewed at the direction of IEEE Communications Society subject matter experts for publication in the IEEE Globecom 2010 proceedings.

A Scheme for Secure and Reliable Distributed Data Storage in Unattended WSNs
Yi Ren, Vladimir Oleshchuk, and Frank Y. Li
Dept. of Information and Communication Technology, University of Agder, Norway
Email: {yi.ren, vladimir.oleshchuk, frank.li}@uia.no

Abstract—Unattended Wireless Sensor Networks (UWSNs) operated in hostile environments face data security risks due to the absence of real-time communication between sensors and sinks, which forces sensors to accumulate data until the next visit of a mobile sink that off-loads it. Ensuring forward secrecy, backward secrecy and reliability of the accumulated data is therefore a great challenge. For example, if a sensor is compromised, the pre-compromise data accumulated in the sensor is exposed. In addition, by holding the secret key of the compromised sensor, attackers can also learn post-compromise data in the sensor. Furthermore, in practical UWSNs, once sensors stop working because of node crash or battery depletion, all the accumulated data is lost. To address these challenges, we propose a secure and reliable data distribution scheme in this paper. Detailed analysis shows that our scheme can provide forward secrecy, probabilistic backward secrecy and data reliability. To further improve probabilistic backward secrecy and data reliability, a constrained optimization data distribution scheme is proposed. Detailed analysis and simulation results show the superiority of the proposed scheme in comparison with several previous approaches developed for UWSNs.

I. INTRODUCTION
Recently, the security aspects of Unattended Wireless Sensor Networks (UWSNs) have gained more attention in the research community [1], [2], [3], [4]. In a UWSN, sensors cannot off-load data to a sink at will or in real time due to the absence of an on-line sink or a base station in the network. Instead, a mobile sink visits the network periodically for data collection. In other words, in the time interval between any two consecutive visits, the sensors have to accumulate and store the sensed data until the next visit of the mobile sink.
The design of UWSNs is motivated by scenarios where historical information, rather than real-time information, is of interest. For example, [5] introduces a military UWSN application for border surveillance, target acquisition, situational awareness, etc., where unattended ground sensors are deployed in an adversarial environment to gather information about adversary activities. In addition, the U.S. Defense Advanced Research Projects Agency (DARPA) developed a robotic radio relay node, National LANdroid [6], for battlefield data collection. Nodes are deployed in the battlefield for data collection and then transmit the collected data to allied units (e.g., tanks or soldiers) when they arrive.
Compared with traditional WSNs, the properties of UWSNs pose many new security challenges. For example, a mobile adversary which roams in the UWSN periodically compromises and releases sensors to enrich its knowledge of all

collected data when the mobile sink is absent. Since data is accumulated and stored in sensors, one important issue is Forward Secrecy (FSe): how to ensure that pre-compromise data will not be revealed if a sensor is compromised? On the other hand, the mobile adversary may release the sensor and then turn to compromise other sensors, so another issue is Backward Secrecy (BSe): how to guarantee that post-compromise data will not be exposed? Moreover, data reliability is also critical: how to keep the accumulated data available if sensor nodes stop working due to power depletion, corrosion or physical destruction?
To deal with the aforementioned problems, DISH [3] and POSH [4] were proposed to provide FSe and only a certain probabilistic BSe in ideal networks where sensors and communication channels are reliable. However, they are not resilient to node failure and Byzantine failure. To address this problem, the authors in [7] take advantage of (k, n) secret sharing and (m, n) Reed-Solomon (RS) Codes, in which m (or k) of n data parts are required to reconstruct data, adding data redundancy to provide resilience to node invalidation and Byzantine failure. However, their work addresses neither FSe and BSe nor how to specify (m, n). Two other approaches, key-insulated [8] and intrusion-resilient [9] encryption schemes, are designed to provide both FSe and BSe. Both approaches require public-key cryptosystems and are not suitable for resource-constrained sensors. Therefore, none of the above-mentioned schemes satisfies the overall requirements of FSe, BSe and data reliability needed for UWSNs.
This paper makes two main contributions. Firstly, we propose a secure and reliable distributed data storage scheme based on (m, n) RS Codes. The proposed scheme can provide FSe, probabilistic BSe and reliability of data without relying on reliable nodes and communication channels.
Secondly, to further improve probabilistic BSe and reliability of data, we propose a constrained optimization data distribution scheme that takes into account that nodes may be compromised. Based on the optimized data distribution scheme, suitable values of (m, n) can be selected to maximize the security level and at the same time maximize data reliability. We further show through detailed analysis and simulation that our scheme provides FSe and enhanced probabilistic BSe, and is resilient to node and message failure.
The rest of the paper is organized as follows. In Section II, the network model, threat model and design goals are presented. Section III provides the detailed description of our

978-1-4244-5638-3/10/$26.00 ©2010 IEEE


proposed scheme. How to distribute data to neighbor nodes is presented in Section IV. Then Section V analyzes and simulates the performance of the scheme. Finally, Section VI concludes the paper.

II. NETWORK MODEL, THREAT MODEL AND DESIGN GOAL
A. Network Model
We consider a UWSN that consists of $N$ sensor nodes. It can be formulated as an undirected graph $G(\mathcal{N}, \mathcal{E})$, where the sensor node set is $\mathcal{N} = \{s_1, s_2, \cdots, s_N\}$ and the edge set is $\mathcal{E} = \{e_1, e_2, \cdots, e_M\}$. We assume that a node $s_i$ has $nb_i$ neighbors, which compose a neighbor node set $NB_i$. There is a mobile sink that visits the UWSN periodically to collect data. The time interval between the current visit and the previous visit is denoted as $T$. The sensor $s_i$ generates data at each round, and the data generated at round $r$ is denoted as $d_i^r$. Once a data value $d_i^r$ is generated, it is stored locally and waits until an authorized mobile sink off-loads it. Each sensor has the ability to perform one-way hashing and symmetric key encryption. We assume that the mobile sink is a trusted party which cannot be compromised. Additionally, the mobile sink re-initializes the secret keys and resets the round counters when it visits the network.

Definition 1. We define the secrecy of the data generated before round $r_1$ as FSe. The FSe of a sensor $s_i$ is compromised if the data generated and encrypted before round $r_1$ can be decrypted by an ADV which holds the secret obtained during the reside period $T_{rp}$.
Definition 2. We define the secrecy of the data generated after round $r_2$ as BSe. The BSe of a sensor $s_i$ is compromised if the data generated and encrypted after round $r_2$ can be decrypted by an ADV which holds the secret obtained during the reside period $T_{rp}$.
Data reliability: The proposed scheme should be resilient to node crash, meaning that data can be retrieved even if some nodes have lost their functionality.
Our design goal is to guarantee FSe, BSe and data reliability.

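[Figure 1: timeline over the interval T between "Sink arrives" and "Sink leaves", marking compromise at round r1, the reside period Trp, release at round r2, and return at round r3, with the forward secrecy and backward secrecy regions labeled.]

Figure 1. Illustration: a node is compromised by ADV at round r1, and released at round r2; at round r3, the ADV returns again.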

B. Threat Model
UWSNs could be attacked in many ways. In this paper, we focus on a Mobile Adversary that roams in the UWSN while the mobile sink is absent. We refer to it as ADV hereafter. The ADV has the following capabilities [3]:
• Compromise power: The ADV can compromise up to k < N sensors during a time interval T.
• No interference: The ADV does not interfere with the communication between nodes, and does not modify any data sensed by, or stored on, sensors it compromises. In other words, the ADV is read-only.
• Strictly local eavesdropping: The ADV is unable to monitor and record all communications. It can only eavesdrop on incoming and outgoing communications of currently compromised nodes.
Besides the capabilities listed above, the ADV can also randomly select some sensor nodes and physically destroy them (e.g., smash, melt or corrode them), or sensor nodes may fail due to power depletion or natural disaster. In this case, the nodes completely lose their functionality.

C. Design Goals
Our design goals are to guarantee data confidentiality and data reliability against the attacks launched by the ADV.
Data confidentiality: As shown in Fig. 1, we further divide data confidentiality into FSe and BSe. Assume that an ADV compromises a sensor node $s_i$ at round $r_1$ and releases $s_i$ at round $r_2$ ($r_1 < r_2$). Between rounds $r_1$ and $r_2$, the ADV resides in $s_i$, and we define this time interval as the reside period $T_{rp}$.

D. Preliminaries
1) Erasure Code: An (m, n) erasure code encodes a block of data into n fragments, each of which has $\frac{1}{m}$ the size of the original block, so that any m fragments can be used to reconstruct the original block. An example of such an erasure coding scheme is RS Codes [10]. We define an n-party RS Codes algorithm with data space $DATA$ as a pair $\Pi = (Share^{RS}, Recover^{RS})$, where:
• $Share^{RS}$ is a probabilistic algorithm that takes an input $d \in DATA$ and generates the n-vector $P \xleftarrow{R} Share^{RS}(d)$, where $P = \{p_1, p_2, \cdots, p_n\}$, $R$ means random output, and $p_i \in \{0, 1\}^*$. If $d \notin DATA$, $Share^{RS}$ returns $\perp$ ("undefined").
• $Recover^{RS}$ is a deterministic algorithm that takes an input $P \in (\{0, 1\}^* \cup \Diamond)^n$, where $\Diamond$ represents a data part that is missing (or not available). $Recover^{RS}$ outputs $Recover^{RS}(P) \in DATA \cup \perp$, where $\perp$ is a distinguished value denoting failed recovery.

III. THE PROPOSED SCHEME

In this section, we propose a secure and reliable data distribution scheme to provide FSe, BSe and data reliability. To provide FSe for sensor $s_i$, a simple way is to update its secret key $K_i$ at each round by applying a hash function, e.g., $K_i^r = h(K_i^{r-1})$ with $K_i^0 = K_i$. Due to the one-way property of the hash function, the ADV cannot derive the keys of previous rounds (before the sensor was compromised). Thus, FSe is provided. However, an ADV which holds the secret key $K_i^r$, $r \in [r_1, r_2]$, can still derive the future keys which will be used in the


following rounds. In other words, if the ADV returns at round $r_3$ ($r_2 < r_3$), it can still decrypt the data which was encrypted in the time interval $[r_2, r_3]$ by mimicking the key update, meaning that BSe is not guaranteed. To guarantee both FSe and BSe, we propose a data distribution scheme as follows.
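The asymmetry between FSe and BSe under hash-based key update can be illustrated with a minimal sketch. It assumes SHA-256 as the hash function h(.) and uses a made-up initial key; neither is specified in the paper, and the helper names `h` and `round_key` are hypothetical.

```python
import hashlib

def h(data: bytes) -> bytes:
    """One-way hash h(.). SHA-256 is an assumption; the paper does not fix one."""
    return hashlib.sha256(data).digest()

def round_key(k0: bytes, r: int) -> bytes:
    """K_i^r = h(K_i^{r-1}) with K_i^0 = K_i, i.e. h applied r times to K_i."""
    k = k0
    for _ in range(r):
        k = h(k)
    return k

# Hypothetical initial key K_i, for illustration only.
K_i = b"initial-key-of-sensor-i"

# The ADV captures the key in use at round 3 (inside its reside period).
captured = round_key(K_i, 3)

# Rolling the captured key forward twice yields the round-5 key, so the ADV
# can decrypt post-compromise data: hashing alone gives no backward secrecy.
forged_future = h(h(captured))
assert forged_future == round_key(K_i, 5)

# Going backward (from the round-3 key to the round-2 key) would require
# inverting h, which is what protects pre-compromise data (FSe).
```

The sketch only demonstrates the key schedule; it says nothing about the encryption algorithm itself.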

IV. OPTIMIZED DATA DISTRIBUTION SCHEME
In this section, we discuss a data distribution scheme that takes into account the probability of nodes being compromised, in order to achieve enhanced data confidentiality and reliability.
A. Node selection scheme

A. The proposed scheme
We observe that data encrypted by symmetric encryption cannot guarantee BSe as long as a sensor relies only on itself for security. However, as we discuss later, BSe can be probabilistically achieved if sensors cooperate with their neighbors. The new scheme, which satisfies the above-mentioned requirements, contains the following steps:

Step 1: System initialization. The mobile sink picks a secure hash function, denoted as $h(.)$, and a master key denoted as $K_m$. Before deploying each sensor node $s_i$, the mobile sink preloads the sensor with the hash function $h(.)$ and an initial data encryption key $K_i$, computed as $K_i = h(K_m || i)$. At the end of each round, the round index $r$ and the encryption key $K_i^r$ are updated as $K_i^r = h(K_i^{r-1})$, where $r = 1, 2, \cdots$ and $K_i^0 = K_i$. Thus, the mobile sink only needs to store a single master key $K_m$, and all round keys $K_i^r$ can be derived as needed.

Step 2: Distributed data storage. Each sensor $s_i$ first generates a keyed hash value with the round key $K_i^r$ as $MAC_i^r = h(d_i^r || K_i^r)$. Then the plaintext, which consists of $d_i^r$, $MAC_i^r$, and the values $r$ and $s_i$, denoted as $PLtext_i^r = \{d_i^r || MAC_i^r || r || s_i\}$, is encrypted using the updated key $K_i^r$. The encrypted data is denoted as

$$ENtext_i^r = Enc(K_i^r, PLtext_i^r) = Enc(K_i^r, \{d_i^r || MAC_i^r || r || s_i\}).$$

Thus, the integrity and FSe of the sensed data are guaranteed. $d_i^r$ is packaged in

$$m_i^r = \{ENtext_i^r, r || s_i\}.$$

Step 3: Data parts generation. $s_i$ employs an (m, n) RS code ($Share^{RS}$) to encode $ENtext_i^r$ into n data parts, denoted as the set $P_i = \{p_{i,1}^r, p_{i,2}^r, \cdots, p_{i,n}^r\}$.

Step 4: Data distribution. $s_i$ selects the n neighbors with the highest security level in the set $NB_i$ (e.g., $s_j$) based on the node selection scheme (more details in Section IV), and sends one randomly selected distinct data part $m_{i,j}^r = \{p_{i,j}^r, r || s_i\}$ to $s_j$, using the pairwise secret key $K_{i,j}$ to encrypt the packet: $s_i \rightarrow s_j : \{Enc(K_{i,j}, m_{i,j}^r)\}$. After the data is distributed, the original data is erased securely.

Step 5: Data reconstruction. The mobile sink collects m data parts from nodes and reconstructs the data using (m, n) RS Codes.
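The keyed hash of Step 2 and the share/recover cycle of Steps 3 and 5 can be sketched as follows. This is a toy illustration under several assumptions that are not taken from the paper: SHA-256 stands in for h(.), the erasure code is a small Reed-Solomon-style polynomial code over the prime field GF(257) rather than a production RS codec, the sharing is deterministic, and the encryption Enc is omitted (Python's standard library has no symmetric cipher), so the bytes passed to `share` merely stand in for $ENtext_i^r$. All function names and sample values are hypothetical.

```python
import hashlib

P = 257  # smallest prime above 255, so every byte value is a field element

def mac(d: bytes, key: bytes) -> bytes:
    """MAC_i^r = h(d_i^r || K_i^r); SHA-256 as h is an assumption."""
    return hashlib.sha256(d + key).digest()

def lagrange_eval(points, x):
    """Evaluate at x the unique polynomial of degree < len(points)
    passing through the given (x_k, y_k) points, modulo P."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if j != i:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, P - 2, P)) % P
    return total

def share(data: bytes, m: int, n: int) -> dict:
    """Toy Share: n parts (n <= 256), each about 1/m of the input size."""
    framed = len(data).to_bytes(4, "big") + data
    framed += bytes(-len(framed) % m)            # zero-pad to a multiple of m
    parts = {x: [] for x in range(1, n + 1)}
    for k in range(0, len(framed), m):
        # The m data symbols of a block are the polynomial's values at x = 1..m.
        pts = list(zip(range(1, m + 1), framed[k:k + m]))
        for x in parts:
            parts[x].append(lagrange_eval(pts, x))
    return parts

def recover(parts: dict, m: int):
    """Toy Recover: rebuild the data from any m available parts.
    Returns None (standing in for the failure symbol) if fewer than m exist."""
    if len(parts) < m:
        return None
    xs = sorted(parts)[:m]
    out = bytearray()
    for b in range(len(parts[xs[0]])):
        pts = [(x, parts[x][b]) for x in xs]
        out.extend(lagrange_eval(pts, x) for x in range(1, m + 1))
    length = int.from_bytes(out[:4], "big")
    return bytes(out[4:4 + length])

# Hypothetical round key and sensor reading.
K = b"round-key-demo"
d = b"temp=21.5C"
payload = d + mac(d, K)            # stands in for ENtext (encryption omitted)
parts = share(payload, m=4, n=5)
del parts[2]                       # one data part is lost
assert recover(parts, 4) == payload
```

The sketch handles erasures (missing parts) only; it does not detect or correct corrupted parts, which a full RS decoder combined with the MAC check would address.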

Inspired by the routing path selection algorithm in [11], we assume each node has a Probability Vector (PV) $PV_i = [P_{i,1}, P_{i,2}, \cdots, P_{i,nb_i}]$ to reflect the security level of its neighbor nodes in $NB_i$, where $P_{i,j}$ ($j = 1, 2, \cdots, nb_i$) is the probability that $s_{i,j}$, a neighbor node of $s_i$, is compromised in time interval $T$. $P_{i,j}$ could be evaluated from the feedback of certain security monitoring software and/or assigned manually by the mobile sink based on information such as the physical protection, the location, or the role of the nodes. For example, nodes buried under the ground have a higher security level (lower $P_{i,j}$) than exposed nodes, and nodes deployed in enemy ground would have a lower security level (higher $P_{i,j}$). Without loss of generality, we further assume $P_{i,1} \le P_{i,2} \le \cdots \le P_{i,nb_i}$, meaning that the security levels are ordered from high to low.
Given a probability threshold value $PT_i$, $s_i$ can select the $t$ qualified neighbor nodes whose probability of being compromised does not exceed the threshold value $PT_i$, denoted as the set $NB_{qlf\_i} = \{s_{i,1}, s_{i,2}, \cdots, s_{i,t}\}$, where $P_{i,1} \le P_{i,2} \le \cdots \le P_{i,t} \le PT_i$. Then, the data distribution scheme of $s_i$ can be formulated as a constrained optimization problem:

minimize $Pr_{recov}(m, n)$
subject to $P_{i,j} \le PT_i$

where $Pr_{recov}(m, n)$ is the probability that the original data is recovered by an ADV. Given a redundancy factor $\tau = \frac{n}{m}$ of the (m, n) RS Codes, the data distribution scheme can be divided into two classes depending on $\tau$.
1) Maximum security without redundancy, that is $\tau = 1$ or $m = n$. To provide maximum security, in other words, to minimize $Pr_{recov}(m, n)$, the data distribution scheme must force the ADV to compromise all the qualified data holders. In this data distribution scheme, $s_i$ encodes data into $n = t$ parts and distributes them to the $t$ qualified neighbor nodes in $NB_{qlf\_i}$.
$Pr_{recov}(m, n)$ is thus equal to the probability that all the $t$ nodes are compromised,

$$Pr_{recov}(m, n) = \prod_{j=1}^{t} P_{i,j}. \qquad (1)$$
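The threshold-based selection and Eq. (1) can be sketched in a few lines. The probability vector and threshold below are the ones used in the worked example later in this section ($PV_9$ and $PT_9 = 25\%$); the function names are hypothetical, and the product form relies on the same independence of compromise events that Eq. (1) assumes.

```python
from math import prod

def qualified_neighbors(pv, threshold):
    """Compromise probabilities P_{i,j} that do not exceed PT_i,
    ordered from most secure (lowest probability) to least secure."""
    return sorted(p for p in pv if p <= threshold)

def pr_recov_no_redundancy(qualified):
    """Eq. (1): with m = n = t the ADV must compromise every data holder."""
    return prod(qualified)

pv_9 = [0.05, 0.05, 0.10, 0.10, 0.20, 0.30, 0.40]
qualified = qualified_neighbors(pv_9, threshold=0.25)
t = len(qualified)                      # number of qualified neighbors
p_recov = pr_recov_no_redundancy(qualified)
print(t, p_recov)
```

With these inputs the sketch selects five neighbors and yields a recovery probability of roughly 5e-6, i.e. the 0.0005% quoted in the worked example.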

It is easy to see that the higher the number of qualified neighbor nodes, the lower $Pr_{recov}(m, n)$. However, too large a $t$ may cause large storage and communication overhead. Given a required security level $\lambda_i$, and considering storage overhead and communication overhead, $s_i$ can choose the top $n = t'$ ($t' \le t$) security level nodes, which satisfy $Pr_{recov}(m, n) = \prod_{j=1}^{t'} P_{i,j} \le \lambda_i$, to distribute the data.
Discussion: When $\tau = 1$, the data distribution scheme is able to provide maximum security, but it cannot improve reliability. In other words, it is not resilient to node failure and


message failure. If even one node stops functioning, or one data part is not delivered, the original data cannot be recovered. Because in practical networks sensors may stop working due to node crash and messages cannot always be delivered, it is necessary to add redundancy for data reliability.
2) Maximum security with redundancy, that is $\tau > 1$ or $m < n$. When the data is encoded by an (m, n) RS code with $m < n$, the original data can still be recovered if $\alpha$ ($\alpha \le n - m$) data parts are corrupted or lost. Note that the higher $\tau$ is, the more data reliability is obtained, but the easier it is for the ADV to recover the data. The tradeoff is thus: given a required redundancy threshold, e.g., $\tau < 1 + \frac{2}{t}$, how to distribute data parts among nodes that satisfy the required security level so as to obtain the maximum security while having the maximum data reliability. $s_i$ encodes data into $n = t$ parts and distributes them to the $t$ qualified neighbor nodes in $NB_{qlf\_i}$. Considering that the data redundancy is upper-bounded by $\tau < 1 + \frac{2}{t}$, to maximize the data reliability, $m$ can be chosen as the smallest integer satisfying

$$m > \frac{nt}{t+2}. \qquad (2)$$

Thus, $ENtext$ can be recovered by the ADV only if the ADV has compromised at least $m$ nodes in $\{s_{i,1}, s_{i,2}, \cdots, s_{i,n}\}$, which has the probability

$$\prod_{j=1}^{m} P_{i,j} \le Pr_{recov}(m, n) \le \prod_{j=n-m+1}^{n} P_{i,j}. \qquad (3)$$

To reduce storage overhead and communication overhead, $s_i$ can choose the top $n = t'$ ($t' \le t$) security level nodes, which satisfy $Pr_{recov}(m, n) = \prod_{j=1}^{t'} P_{i,j} \le \lambda_i$, to distribute the data, for a given required security level $\lambda_i$.
Below, we state our claims for the security of the proposed scheme. We defer the proofs to the Appendix.

Claim 3. The proposed scheme can guarantee FSe.

Lemma 4. The BSe of the sensor $s_i$ can be compromised by an ADV if and only if the following three conditions are satisfied:
1) the sensor $s_i$ is compromised by the ADV;
2) the ADV's compromising ability $k > m$;
3) the ADV has compromised at least $m$ neighbor nodes of $s_i$ that store the corresponding data parts.
Proof: Straightforward.

Claim 5. Let $PT_i$ be a probability threshold, $\tau$ be the redundancy factor and $P_i$ be the probability of $s_i$ being compromised in time interval $T$. If Conditions 1-3 of Lemma 4 are satisfied, then the probability $Pr_{BSe\_comp}$ that the ADV compromises the BSe of $s_i$ is as follows:

$$\begin{cases} Pr_{BSe\_comp} = 0, & k < m \\ \prod_{j=1}^{m} P_{i,j} P_i \le Pr_{BSe\_comp} \le \prod_{j=n-m+1}^{n} P_{i,j} P_i, & k > m,\ \tau > 1,\ P_{i,j} \le PT_i \\ Pr_{BSe\_comp} = \prod_{j=1}^{t} P_{i,j} P_i, & k > m,\ \tau = 1,\ P_{i,j} \le PT_i. \end{cases}$$

An example. For simplicity, we assume that a sensor $s_9$ has 7 neighbor nodes, denoted as $NB_9 = \{s_{9,1}, s_{9,2}, \cdots, s_{9,7}\}$, with $PV_9 = \{5\%, 5\%, 10\%, 10\%, 20\%, 30\%, 40\%\}$. Given a threshold value $PT_9 = 25\%$, the qualified nodes are selected as the set $NB_{qlf\_9} = \{s_{9,1}, s_{9,2}, \cdots, s_{9,5}\}$. At round 8, $s_9$ generates data $d_9^8$, encrypts it into $ENtext_9^8$, encodes $ENtext_9^8$ into $n = t = 5$ parts, distributes the 5 parts to all the 5 nodes in $NB_{qlf\_9}$, and then follows the steps below depending on $\tau$.
1) $\tau = 1$. Since $m = n = 5$, the ADV is forced to compromise all 5 nodes to recover $ENtext_9^8$, which happens with probability $Pr_{recov}(5, 5) = \prod_{j=1}^{5} P_{9,j} = 5\% \cdot 5\% \cdot 10\% \cdot 10\% \cdot 20\% = 0.0005\%$. To compromise the BSe of $s_9$, the ADV has to recover $ENtext_9^8$. In addition, it has to compromise $s_9$ to get the secret key needed to decrypt $ENtext_9^8$. Assuming $P_9 = 20\%$ is the probability of $s_9$ being compromised by the ADV, the probability of the BSe of $s_9$ being compromised is $Pr_{BSe\_comp} = P_9 \cdot Pr_{recov}(5, 5) = 0.0001\%$.
2) $\tau > 1$. Based on Eq. (2), $m$ should be chosen as $m = 4$ ($m > \frac{25}{7}$). If 1 ($n - m = 1$) data part is corrupted or lost, $ENtext_9^8$ can still be recovered. The ADV has to compromise at least 4 nodes to recover $ENtext_9^8$, with probability $0.0025\% \le Pr_{recov}(4, 5) \le 0.01\%$. Given $P_9 = 20\%$, the probability of the BSe of $s_9$ being compromised is $0.0005\% \le Pr_{BSe\_comp} \le 0.002\%$.

V. PERFORMANCE ANALYSIS
In this section, we compare results obtained with a MATLAB simulator [12] that we developed. We consider a UWSN where 200 nodes are randomly distributed in a 500m by 500m area. Each sensor node has a transmission range of $TR = 60$m. The simulation results are averaged over 100 randomly deployed networks. Nodes are divided into four sets with different compromise probabilities $P_i$: 20% of nodes with $P_i = 50\%$; 30% of nodes with $P_i = 40\%$; 30% of nodes with $P_i = 20\%$; and 20% of nodes with $P_i = 10\%$. We set the probability threshold value of the proposed node selection scheme to $PT_i = 30\%$, meaning that a node with $P_i > PT_i$ is considered too risky to hold data parts and is not selected by the node selection scheme. The required security level is set as $\lambda_i = 0.1\%$. Since both [3] and [4] operate in ideal networks without node and message failure, we compare the proposed scheme with the scheme used in [7] and with a naive scheme in which no security mechanism is adopted.
As shown in Fig. 2 (A), (B), (D) and (E), we observe that the proposed scheme guarantees the best probabilistic BSe with respect to [7] and the naive scheme, no matter what the redundancy factor $\tau$ is. When $\tau > 1$, the proposed scheme has the highest probability of data reliability. However, when $\tau = 1$, both [7] and the proposed scheme have a lower probability of data reliability than the naive scheme. This observation agrees with the discussion in Section IV-A 1) on maximum security without redundancy, namely that the original data cannot be recovered if one data part is lost. Since the nodes are randomly distributed in the simulation, the number of neighbor nodes of $s_i$ varies, which affects $Pr_{BSe\_comp}$ and data reliability. Such effects with


[Figure 2 (plot residue, only partially recoverable). Panel (A): τ = 1. Axes: probability of BSe to be compromised and probability of data reliability versus index of nodes (0 to 200). Legend: Naive scheme, Wang et al. [7], Ours. The title of panel (D) is cut off in the source.]
