Online data storage using implicit security - Semantic Scholar

4 downloads 7676 Views 286KB Size Report
It is advantageous to use implicit security for online data storage in a cloud ... sharing scheme it is assumed that the encrypted data is stored in a secure place and ... have been proposed [6,10,30], but have a complexity of OПnlog2nч at best, ...
Information Sciences 179 (2009) 3323–3331

Contents lists available at ScienceDirect

Information Sciences journal homepage: www.elsevier.com/locate/ins

Online data storage using implicit security Abhishek Parakh *, Subhash Kak Computer Science Department, Oklahoma State University, Stillwater, OK 74075, USA

a r t i c l e

i n f o

Article history: Received 27 October 2008 Received in revised form 24 April 2009 Accepted 24 May 2009

Keywords: Online data storage Data partitioning Implicit security architecture

a b s t r a c t It is advantageous to use implicit security for online data storage in a cloud computing environment. We describe the use of a data partitioning scheme for implementing such security involving the roots of a polynomial in finite field. The partitions are stored on randomly chosen servers on the network and they need to be retrieved to recreate the original data. Data reconstruction requires access to each server, login password and the knowledge of the servers on which the partitions are stored. This scheme may also be used for data security in sensor networks and internet voting protocols. Ó 2009 Elsevier Inc. All rights reserved.

1. Introduction Securing data stored on distributed servers is of fundamental importance in cloud computing. The traditional (explicit) approach to securing data is to store and back it up on a single server and allow access upon the use of passwords that are needed to be frequently changed. But there is a tendency among users to keep passwords simple and memorable [11,21,23] leading to the possibility of brute force attacks. Furthermore since the data on the Web is archived, keys that provide adequate encryption today are likely to be broken in the future. Therefore, explicit security architecture may not be adequate for many applications. To go beyond the present approach, one may incorporate further security within the system by using puzzles [14] or otherwise by another layer that increases the space that the intruder must search in order to break the system. Here, we propose an implicit security architecture in which security is distributed amongst many entities. In contrast to hash-based distributed security models for wireless sensor networks [27–29,31,12], we consider a more general method of data partitioning. In this approach, stored data is partitioned into two or more pieces and stored at randomly chosen places on the network that are known only to the owner of the data. Access to these pieces not only depends on the knowledge of a password but also on the knowledge of where the pieces are stored. The division of data is performed in such a way that the knowledge of all the pieces is required to recreate the data and that none of the individual pieces reveals any useful information. In scenarios where one or more pieces may be at the danger of being lost or inaccessible due to system or network failure, one may employ schemes that can recreate the data from a subset of original pieces. Formally, our scheme consists of two parts – the first part is a ðk; kÞ partitioning scheme where all the k partitions are required to recreate the data and the second part extends the first part of the scheme to a ðk; nÞ partitioning scheme, where k 6 n and k P 2, i.e. only k out of n partitions are required to recreate the data. A number of schemes have been proposed in the communications context for splitting and sharing of decryption keys [25,8,22]. These schemes fall under the category of ‘‘secret sharing schemes”, where the decryption key is considered to

* Corresponding author. Tel.: +1 405 744 5740. E-mail addresses: [email protected] (A. Parakh), [email protected] (S. Kak). 0020-0255/$ - see front matter Ó 2009 Elsevier Inc. All rights reserved. doi:10.1016/j.ins.2009.05.013

3324

A. Parakh, S. Kak / Information Sciences 179 (2009) 3323–3331

be a secret. Motivated by the need to have an analog of the case where several officers must simultaneously use their keys before a bank vault or a safe deposit box can be opened, these schemes do not consider the requirement of data protection for a single party. Further, in any secret sharing scheme it is assumed that the encrypted data is stored in a secure place and that none of it can be compromised without the decryption key. Some schemes that directly apply secret sharing for distrib2 uted data storage in sensor networks have been proposed [6,10,30], but have a complexity of Oðnlog nÞ at best, whereas our scheme can achieve linear complexity. Other schemes aim at sharing multiple secrets [7] but have a complexity of Oðn2 Þ or generate group keys for group sharing of information [17]. Certain probability models have been proposed for reconstructing secret sharing over the Internet [18], but only focus on strategies for share distribution over the network and do not provide methods for data partitioning. In this paper, we therefore protect data by distributing its parts over various servers. The idea of doing these partitions is a generalization of the use of 3 or 9 roots of a number in a cubic transformation [15]. The scheme we present is simple and easily implementable. We would like to stress that the presented scheme is different from Shamir’s secret sharing scheme [25] which takes the advantage of polynomial interpolation and maps the secret on the y-axis; whereas we map the data as roots of a polynomial on the x-axis. A potential application of the idea of implicit security is in secure internet voting. Internet voting is an inherent part of the online virtual worlds, such as Second Life, Active Worlds, Habbo Hotel, and others, and is only a few years away from large scale real-world implementations [26,1–3]. These virtual worlds provide an excellent test-bed for online voting schemes, which once successful may be used in real-world democratic process. To make use of the implicit security architecture in online voting, the voter’s cast ballot is considered as his data and the partitions of this ballot may be stored on different servers for distributed security. Such a system will protect the intermediate election results from being revealed while distributing the security of the ballots over different servers [20]. 2. Proposed data partitioning scheme 2.1. The ðk; kÞ data partitioning scheme We consider the model of Fig. 1 to illustrate the difference between explicit and implicit security. We need to partition data and send it to randomly chosen servers. We now describe our data partitioning scheme. By the fundamental theorem of algebra, every equation of degree k has k roots. We use this fact to partition data into k partitions such that each of the partition is stored on a different server. No explicit encryption of data is required to secure each partition. The partitions in themselves do not reveal any information and hence are implicitly secure. Only when all the partitions are brought together is the data revealed. Consider an equation of order k

xk þ ak1 xk1 þ ak2 xk2 þ    þ a1 x þ a0 ¼ 0

ð1Þ

Eq. (1) has k roots denoted by fr1 ; r2 ; . . . ; rk g # fset of complex numbersg and can be rewritten as

ðx  r 1 Þðx  r 2 Þ    ðx  r k Þ ¼ 0

ð2Þ

In cryptography, it is more convenient to use the finite field Zp where p is a large prime. If we replace a0 in (1) with the data d 2 Zp that we wish to partition then,

xk þ

k1 X

aki xki þ d  0 mod p

ð3Þ

i¼1

where 0 6 ai 6 p  1 and 0 6 d 6 p  1. (Note that one may alternatively use d in (3) instead of d.) This may be rewritten as

Fig. 1. Illustration of implicit and explicit security architectures.

A. Parakh, S. Kak / Information Sciences 179 (2009) 3323–3331 k Y ðx  ri Þ  0 mod p

3325

ð4Þ

i¼1

where 1 6 ri 6 p  1. The r i are the partitions (Fig. 2). It is clear that the term d in (3) is independent of variable x and therefore k Y

r i  d mod p

ð5Þ

i¼1

If we allow the coefficients in (3) to take values a1 ¼ a2 ¼    ¼ ak1 ¼ 0, then (3) will have k roots only if GCDðp  1; kÞ–1 and 9b 2 Zp such that d is the kth power of b. One simple way to chose such a p would be to choose a prime of the form ðk  s þ 1Þ, where s 2 N. However, such a choice would not provide good security because knowledge of the number of roots and one of the partitions would be sufficient to recreate the original data by computing the kth power of that partition. Furthermore, not all values of d will have a kth root and hence one cannot use any arbitrary data, which would ideally be required. Therefore, one of the restrictions on choosing the coefficients is that not all of them are simultaneously zero. For example, if the data needs to be divided into two parts then an equation of second degree is chosen and the roots computed. If we represent this general equation by

x2 þ a1 x þ d  0 mod p

ð6Þ

then the two roots can be calculated by solving the following equation modulo p:



a1 

qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi a21  4d

ð7Þ

2

qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi which has an solution in Zp only if the square root a21  4d exists modulo p. If the square root does not exist then a different value of a1 needs to be chosen. We present a practical way of choosing the coefficients below. However, this brings out the second restriction on the coefficients, i.e. they should be so chosen such that a solution to the equation exists in Zp . Theorem 1. If the coefficients ai ; 1 6 i 6 k  1 in Eq. (3) are randomly and uniformly chosen and are not all simultaneously zero, then the knowledge of any k  1 roots of the equation, such that Eq. (4) holds, does not provide any information about the value of d with a probability greater than that of a random guess of 1=p. Proof. Given a specific d, the coefficients in (3) can be chosen to satisfy (4) in Zp in the following manner. Choose at random from the field k  1 random roots r1 ; r2 ; . . . ; rk1 . Then kth root r k can be computed by solving the following equation,

rk ¼ d  ðr 1  r2      r k1 Þ1 mod p

ð8Þ

Since the roots are uniformly and randomly distributed in Zp , the probability of guessing r k without knowing the value of d is 1=p. Conversely, d cannot be estimated with a probability greater than 1=p without knowing the kth root rk . It follows from Theorem 1 that data is represented as a multiple of k numbers in the finite field. h

Fig. 2. The details of the partitioning process.

3326

A. Parakh, S. Kak / Information Sciences 179 (2009) 3323–3331

Example 1. Let data d ¼ 10, prime p ¼ 31, and let k ¼ 3. We need to partition the data into three parts for which we will need to use a cubic equation, x3 þ a2 x2 þ a1 x  d  0 mod p. We can find the equation satisfying the required properties using Theorem 1. Assume, ðx  r1 Þðx  r 2 Þðx  r 3 Þ  0 mod 31. We randomly choose two roots from the field, r1 ¼ 19 and r 2 ¼ 22. Therefore, r 3  d  ðr1  r2 Þ1  10  ð19  22Þ1 mod 31  11. The equation becomes ðx  r 1 Þðx  r2 Þðx  r 3 Þ  x3  21x2 þ x  10  0 mod 31, where the coefficients are a1 ¼ 1 and a2 ¼ 21 and the partitions are 11, 22 and 19. 2.2. Choosing the coefficients In the previous section, we described two conditions that must be satisfied by the coefficients. The first condition was that not all the coefficients are simultaneously zero and second that the choice of coefficients should result in an equation with roots in Zp . Since no generalized method for solving equations of degree higher than 4 exists [5], a numerical method must be used which becomes impractical as the number of partitions grow. An easier method to compute the coefficients is exemplified by Theorem 1 and Example 1. One might ask why should we want to compute the coefficients if we already have all the roots? We answer this question in Section 3. 2.3. Introducing redundancy In situations when the data pieces stored on one or more (less than a threshold number of) servers over the Internet may not be accessible, the user should be able to recreate the data from the available pieces. The procedure outlined below extend the k partitions to n, n P k partitions such that only k of them need to be brought together to recreate the data. If fr1 ; r2 ; . . . ; r k g is the original set of partitions then they can be mapped into a set of n partitions fp1 ; p2 ; . . . ; pn g by the use of a mapping function based on linear algebra. If we construct n linearly independent equations such that

a11 r 1 þ a12 r 2 þ    þ a1k rk ¼ c1 a21 r 1 þ a22 r 2 þ    þ a2k rk ¼ c2 .. . an1 r 1 þ an2 r 2 þ    þ ank rk ¼ cn where numbers aij are randomly chosen from the finite field Zp , then the n pi ¼ fai1 ; ai2 ; . . . ; aik ; ci g; 1 6 i 6 n. The above linear equation can be written as matrix operation,

a11

a12

6a 6 21 6 . 6 . 4 . an1

a22

3 2 3 r1 c1 6 7 6 7 . . . a2k 7 76 r 2 7 6 c2 7 76 . 7 ¼ 6 . 7 76 . 7 6 . 7 54 . 5 4 . 5

an2

. . . ank

2

. . . a1k

new

partitions

are

32

rk

cn

To recreate rj ; 1 6 j 6 k from the new partitions, any k of them can be brought together,

2

3

2

31 2

am1

am2

. . . amk

6r 7 6 a 6 2 7 6 n1 6 . 7¼6 . 6 . 7 6 . 4 . 5 4 .

an2

...

ank 7 7 7 7 5

ai2

...

aik

r1

rk

ai1

kk

c1

3

6c 7 6 27 6 . 7 6 . 7 4 . 5 ck

k1

A feature of the presented scheme is that new partitions may be added and deleted without affecting any of the existing partitions. 2.4. Complexity of the proposed protocols Following subsections discuss the complexity issues of the proposed data partitioning scheme. 2.4.1. The complexity of the ðk; kÞ data partitioning scheme The complexity of the ðk; kÞ partitioning scheme is OðkÞ. This is illustrated by the following algorithm. Algorithm 1a. (k,k) Data Partitioning Scheme Input data d 1. Choose randomly and uniformly from a finite field Zp ; k  1 numbers r1 ; r2 ; . . . ; r k1 . 2. Compute rk ¼ d  ðr 1  r2      rk1 Þ1 mod p.

A. Parakh, S. Kak / Information Sciences 179 (2009) 3323–3331

3327

3. Construct the kth degree polynomial: pðkÞ ¼ ðx  r1 Þðx  r2 Þ    ðx  rk Þ mod p ¼ xk þ ak1 xk1 þ ak2 xk2 þ    þ a1 x þ a0 mod p. 4. The roots r1 ; r2 ; . . . ; r k of the polynomial pðkÞ represent the data partitions. Output ? vector ~ R ¼ ½r 1 ; r 2 ; . . . ; rk  As seen from the above algorithm, partition creation requires k multiplications and one inversion operation modulo prime p. Hence the time complexity of OðkÞ. Similarly, data reconstruction requires k multiplications which is illustrated by the following algorithm. vector ~ R ¼ ½r1 ; r 2 ; . . . ; rk 

Algorithm 1b. (k,k) Data ReconstructionInput 1. Data d ¼ r 1  r 2      r k mod p. Output ? d

Linear complexity of Algorithms 1a and 1b is highly desirable for resource constraint environments such as that of sensor networks. 2.4.2. The complexity of the ðk; nÞ data partitioning scheme The proposed ðk; nÞ data partitioning scheme uses the ðk; kÞ partitioning scheme as a sub-algorithm. That is, we first obtain k partitions using Algorithm 1a and then introduce redundancy to cope with possible inaccessibility of some of the partitions. The ðk; nÞ partitioning scheme works as follows. Algorithm 2a. (k,n) Data Partitioning Scheme Input data d 1. Use Algorithm 1a to create k partitions of data d. Obtain vector ~ R ¼ ½r 1 ; r2 ; . . . ; rk . 2. Generate n  k matrix A such that all the rows of matrix are linearly independent.Note. A simple way to choose such a matrix would be to choose a Vandermonde matrix of the following form

2

1 x1

6 6 1 x2 6 A¼6. 6 .. 4 1 xn

x21

x31

x22

x32

x2n

x3n

. . . x1k1

3

7 . . . x2k1 7 7 7 7 5 k1 . . . xn

3. Create n partitions by computing ½Ank  ½~ RT k1 ¼ ½Cn1 , where C ¼ ½c1 ; c2 ; . . . ; cn T . 4. The partitions are denoted by the pair pi ¼ fxi ; ci g, 1 6 i 6 n. Output ! pi ¼ fxi ; ci g; 1 6 i 6 n Algorithm 2b. (k,n) Data Reconstruction Input any k partitions pi ¼ fxi ; ci g 1. Construct the k  k matrix B by choosing the rows of matrix A corresponding to the given pairs pi and compute B1 . 2. Reconstruct the k shares by evaluating ½~ RT k1 ¼ ½B1 kk  ½Ck1 . 3. Use Algorithm 1b to reconstruct data d, where the input is the algorithm is ~ R. Output ? d 2

The complexity of Algorithms 2a and 2b is Oðnlog nÞ [4,16]. A more efficient method, that eliminates multiplications during partition creation, would be to use a n  k matrix with elements from GF(2). Such matrices are used in error-correction coding [19]. This will reduce all multiplication operations to additions operations for partition creation (assuming the cost of multiplications with 0 and 1 can be ignored). And inversion of such a matrix during data reconstruction would require at most Oðn3 Þ operations and Oðn2 Þ multiplications. An example of such a matrix in GF(2) is given below, where we have assumed n ¼ 4 and k ¼ 3.

2

3 1 0 1 60 1 07 6 7 A¼6 7 40 1 15 1 1 0

3328

A. Parakh, S. Kak / Information Sciences 179 (2009) 3323–3331

Any k ¼ 3 columns the above matrix A are linearly independent. Since the above matrix only requires multiplications with 0 and 1, the time complexity of the scheme becomes linear for partition creation. This is one of the advantages of our scheme over traditional schemes, such as that of Shamir’s, where a matrix GF(2) cannot be used to simplify computations. 2.4.3. A note on applications 2 As seen above, Algorithms 1a and 1b require OðkÞ computations while Algorithms 2a and 2b require Oðnlog nÞ computations. Both these implementations are efficient and suitable for resource constraint environments such as wireless sensor networks. Further, while sensors generate crucial data and transmit it securely to the base station, the reconstruction of data happens for most cases only at the base station. Since base stations have much higher computational capacity than the sensors themselves, a matrix in GF(2) may be used to reduce the computational burden on the sensors. 3. An alternate approach to partitioning Once the user has computed all the k roots then he may compute the equation resulting from (4) and store one or all of the roots on different servers and the coefficients on different servers. Recreation of original data can now be performed in two ways: either using (5) or choosing one of the roots at random and retrieving the coefficients and substituting the appropriate values in the equation (3) to compute 0

d  xk þ ak1 xk1 þ ak2 xk2 þ    þ a1 x mod p

ð9Þ

0

Therefore, the recreated data d  ðp  d Þ mod p. Parenthetically, this may provide a scheme for fault tolerance. Additionally, a user may store just one of the roots and k  1 coefficients on the network. These together represent k partitions. Note. Distinct sets of coefficients (for a given constant term in the equation) result in distinct sets roots and vice versa. This is because two distinct sets of coefficients represent distinct polynomials because two polynomials are said to be equal if and only if they have the same coefficients. By the fundamental theorem of algebra, every polynomial has unique set of roots. Two sets of roots R1 and R2 are distinct if and only if 9r i 8rj ðr i – rj Þ, where r i 2 R1 and rj 2 R2 . To compute the correspondQk and ing polynomials and the two sets coefficients C 1 and C 2 , we perform i¼1;r i 2R1 ðx  r i Þ  0 mod p Qk j¼1;r j 2R2 ðx  r j Þ  0 mod p and read the coefficients of from the resulting polynomials, respectively. It is clear the at least one of the factors of the two polynomials is distinct because at least one of the roots is distinct; hence the resulting polynomials for a distinct set of roots are distinct. Theorem 2. Determining the coefficients of a polynomial of degree k P 2 in a finite field Zp , where p is prime, by brute force, pk1 eÞ computations. requires Xðdðk1Þ! Proof. If A represents the set of coefficients A ¼ fa1 ; a2 ; . . . ; ak1 g, where 0 6 ai 6 p  1, then by the above note each distinct instance of set A gives rise to a distinct set of roots R ¼ fr1 ; r2 ; . . . ; r k g, and conversely every distinct instance of set R gives rise to a distinct set of coefficients A. Therefore, if we fix d to a constant then (5) can be used to compute a set of roots. Every distinct set of roots is therefore a k  1 combination of a multiset, where each element has infinite multiplicity, and equivalently a k  1 combination of set S ¼ f0; 1; 2; . . . ; p  1g with repetition allowed. Thus, the number of possibilities for the choices of coefficients is given by the following expression:



p k1





¼

p  1 þ ðk  1Þ



k1 ðp þ k  2Þ! ¼ ðp  1Þ!ðk  1Þ! ðp þ k  2Þðp þ k  3Þ    ðp þ k  kÞðp  1Þ! ¼ ðp  1Þ!ðk  1Þ! ðp þ k  2Þðp þ k  3Þ    p ¼ ðk  1Þ!

Pd

ð10Þ

pk1 e ðk  1Þ!

Here we have used the fact that in practice p  k P 2, hence the result. We have ignored the one prohibited case of all coefficients being zero, which has no effect on our result. 4. A stronger variation to the protocol modulo a composite number An additional layer of security may be added to the implementation by performing computations modulo a composite number n ¼ p  q; p and q are primes, and using an encryption exponent to encrypt the data before computing the roots of the equation. In such a variation, knowledge of all the roots and coefficients of the equation will not reveal any information about the data and the adversary will require to know the secret factors of n. For this we can use an equation such as the follows:

A. Parakh, S. Kak / Information Sciences 179 (2009) 3323–3331

xk þ

k1 X

y

aki xki þ d  0 mod n

3329

ð11Þ

i¼1

where y is a secretly chosen exponent and GCDðy; uðnÞÞ ¼ 1. If the coefficients are chosen such that (11) has k roots then k Y

y

r i  d mod n

ð12Þ

i¼1

Appropriate coefficients may be chosen in a manner similar to that described in previous sections. It is clear that compromise y of all the roots and coefficients will at the most reveal c ¼ d mod n. In order to compute the original value of d, the adversary will require the factors of n which are held secret by the user. 5. Addressing the data partitions The previous sections consider the security of the proposed scheme when prime p and composite n are public knowledge. However, there is nothing that compels the user to disclose the values of p and n. If we assume that p and n are secret values then the partitions may be stored in the form of an ‘‘encrypted link list”. We define an encrypted link list as a link list in which every pointer is in encrypted form and in order to find out which node the present node points to, the user needs to decrypt the pointer which can be done only if certain secret information is known. If we assume that p and n are public values then the pointer can be so encrypted that each decryption either leads to multiple addresses or depends on the knowledge of the factors or both. Only the authentic user will know which of the multiple addresses is to be picked. Alternatively, one may use a random number generator and generate a random sequence of server IDs using a secret seed. One way to generate this seed may be to find the hash of the original data and use it to seed the generated sequence and keep the hash secret. 6. Security by key distribution In some cases, when there is a large amount of data, a user may find it inefficient to create data partitions and distribute them over the network but may wish to encrypt the data and store it on a single server which he trusts and keep encryption key secret. Almost always, encryption keys are very large random numbers and cannot be memorized. Therefore, the user may create partitions of the key and spread them over the network. This approach may be more efficient, if not more secure, than creating data partitions of enormous amounts of data. Furthermore, the access to the servers on which the key partitions are stored may be password restricted. We stress that this scenario is different from the secret sharing scenario where multiple parties are involved since our case involves data security for a single party. Keys may be partitioned using the scheme presented in the previous sections where the data is replaced by the key. Alternatively one may use the key partitioning scheme presented in the following sub-section that uses polynomials in Galois Field GF(2m ), for which, given the distributed nature of the scheme, the security will be better than polynomial generalization of power encryption transformations [13,9]. 6.1. Key partitioning in galois field 2m A Galois Field (2m ) consists of polynomials of degree m  1 or less, such that their coefficients lie in GF(2). For example, am1 xm1 þ am2 xm2 þ    þ a1 x þ a0 , where ai ¼ 1 or 0 represents a polynomial belonging to GFð2m Þ and it may be written in

Fig. 3. Illustration of key partitioning.

3330

A. Parakh, S. Kak / Information Sciences 179 (2009) 3323–3331

binary representation as (am1 am2 . . . a1 a0 ). Multiplication in GFð2m Þ is performed modulo an irreducible polynomial gðxÞ ¼ xm þ bm1 xm1 þ    þ b1 x þ b0 , where bi 2 GFð2Þ. A polynomial gðxÞ is said to be irreducible if it cannot be factored into two or more polynomials each with coefficients in GF(2) and each of degree less than m. Therefore, a user can generate a random key and partition it into k partitions using the following procedure. He generates k random polynomials of any random degree in GF(2m ) and computes their product modulo the irreducible polynomial gðxÞ. The resultant polynomial of degree m  1 with binary coefficients is taken as the required key in its binary representation and the randomly generated polynomials are the partitions (Fig. 3). Example 2. In order to generate a random 8 bit key and create five partitions of it, the user may proceed as follows. Let gðxÞ ¼ x8 þ x2 þ 1 be an irreducible polynomial in GF(2m ). He chooses five polynomials of degree m  1 or less (at random), such as p1 ðxÞ ¼ x7 þ x6 þ x, p2 ðxÞ ¼ x4 þ x2 þ x þ 1; p3 ðxÞ ¼ x5 þ x3 ; p4 ðxÞ ¼ x7 þ x6 þ x5 þ x and p5 ðxÞ ¼ x6 þ x2 þ x þ 1 and computes there product, p1 ðxÞp2 ðxÞp3 ðxÞp4 ðxÞp5 ðxÞ  kðxÞ mod gðxÞ, where kðxÞ is the ‘‘key polynomial” and the coefficients are the binary representation of the key.

kðxÞ  p1 ðxÞp2 ðxÞp3 ðxÞp4 ðxÞp5 ðxÞ  x6 þ x4 þ x3 þ x þ 1 mod gðxÞ Therefore, the random key generated is k ¼ 01011011 and the partitions are, p1 ¼ 11000010; p2 ¼ 00010111; p3 ¼ 00101000; p4 ¼ 11100010; p5 ¼ 01000111. Note that the knowledge of k  1 partitions, in the scheme presented above, leaves the kth partition completely undetermined and the adversary still requires Hð2m1 Þ trials to determine the encryption key. This is because all the partitions 2 GFð2m ) and that each one of them is independently and randomly chosen from the field. Hence, knowledge of k  1 partitions does not reveal any information about the choice made for the kth partition. Further, since the kth partition can be any random polynomial from the field, in order to compute this polynomial by brute force would require on an average half the number of trial of 2m , i.e. Hð2m1 Þ. Since the right hand side of the equation (the key) is completely dependent on the choice of left hand side of the equation (the factors of the key), one would require Hð2m1 Þ trial to compute the key with the knowledge of k  1 partitions [24]. In practice, keys are of 128–512 bits in AES, i.e. Hð2127 Þ to Hð2511 Þ possible trials on an average. The above scheme can be extended to a ðk; nÞ partitioning scheme by converting the existing partitions into decimal integers and using the extension presented in Section 2.3. 7. Partition redundancy v/s no redundancy In cases where the accessibility to the partitions is guaranteed and the data disks are sufficiently backed-up, partition redundancy may lead to un-necessary wastage of storage space. This is certainly true for in-system storage where data partitions may be distributed over numerous sectors randomly and not contiguously stored. Also in case of network storage larger number of partitions would require larger upload time which may be avoidable. 8. Application in Internet voting protocols Internet voting is a challenge for cryptography because of its opposite requirements of confidentiality and verifiability. There is the further restriction of ‘‘fairness” that the intermediate election results must be kept secret. One of the ways to solve this problem is to use multiple layers of encryption such that the decryption key for each layer is available with a different authority. This obviously leaves open the question as to who is to be entrusted the encrypted votes. A more effective way to implement fairness would be to avoid encryption keys altogether and divide each cast ballot into k or more pieces such that each authority is given one of the pieces [20]. This solves the problem of entrusting any one authority with all the votes and if any of the authorities (less than the threshold) try to cheat by deleting some of the cast ballots, then the votes may be recreated using the remaining partitions. Such a system implicitly provides a back-up for the votes. 9. Conclusions and future work We have described an implicit security architecture suited for the application of online storage. In this scheme data is partitioned in such a way that each partition is implicitly secure and does not need to be encrypted. These partitions are stored on different servers on the network which are known only to the user. Reconstruction of the data requires access to each server and the knowledge as to which servers the data partitions are stored. Several variations of this scheme are described, which include the implicit storage of encryption keys rather than the data, and where a subset of the partitions may be brought together to recreate the data. One can propose variations to the scheme presented in this paper where the data partitions need to be brought together in a definite sequence. One way to accomplish it is by representing partition in the following manner:

A. Parakh, S. Kak / Information Sciences 179 (2009) 3323–3331

3331

p1 ; p1 ðp2 Þ; p2 ðp3 Þ; . . ., where pi ðpj Þ represents the encryption of pj by means of pi . Such a scheme will increase the complexity of the brute force decryption task for an adversary. The proposed scheme is also of potential use in sensor networks. To protect sensitive information on sensors that are physically compromised, one may partition the data and store these parts on other sensors in the network. The key that is used to find the location of these parts is encrypted and this encryption is changed from time to time. This provides solutions that are more general than other distributed security models used in sensor networks [27]. Our approach leads to interesting research issues such as the optimal way to distribute the parts in a network of n sensors, so that the network can work even if some of the sensors happen to malfunction by themselves or as a consequence of intervention by an adversary. It also leads to the question of reliability and tolerance to faults in such a system. Acknowledgement This work was partly funded by Central Rural Electric Corporation, Stillwater, Oklahoma. References [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13] [14] [15] [16] [17] [18] [19] [20] [21] [22] [23] [24] [25] [26] [27] [28] [29] [30] [31]

Reuters – second life. . Second life herald. . Expanding the Use of Electronic Voting Technology for UOCAVA Citizens, Department of Defence, May 2007. A. Aho, J. Hopcroft, J. Ullman, The Design and Analysis of Computer Algorithms, Addison-Wesley, 1974. A. Bharucha-Reid, M. Sambandham, Random Polynomials, Academic Press, New York, 1986. T. Claveirole, M. de Amorim, M. Abdalla, Y. Viniotis, Securing wireless sensor networks against aggregator compromises, Communications Magazine IEEE 46 (4) (2008) 134–141. M.H. Dehkordi, S. Mashhadi, New efficient and practical verifiable multi-secret sharing schemes, Information Sciences 178 (9) (2008) 2262–2274. D. Denning, Cryptography and Data Security, Addison-Wesley Publishing Company, 1982. L. Dickson, Linear Groups with an Exposition of the Galois Field Theory, Dover Publications, 1958. A. Eldin, A. Ghalwash, H. ElShandidy, Enhancing packet forwarding in mobile ad hoc networks by exploiting the information dispersal algorithm, Communications, Computers and Applications, 2008. MIC-CCA 2008. Mosharaka International Conference, August 2008, pp. 20–25. E. Gheringer, Choosing passwords: security and human factors, Proceedings of IEEE 90 (2002) 369–373. B. Greenstein, D. Estrin, R. Govindan, S. Ratnasamy, S. Shenker, Difs: a distributed index for features in sensor networks, Ad Hoc Networks 1 (2003) 333–349. S. Kak, Exponentiation modulo a polynomial for data security, International Journal of Computer and Information Science 13 (1983) 337–346. S. Kak, On the method of puzzles for key distribution, International Journal of Computer and Information Science 14 (1984) 103–109. S. Kak, A cubic public-key transformation, Circuits, Systems and Signal Processing 26 (2007) 353–359. D. Knuth, The Art of Computer Programming, vol. 2, Addison-Wesley, 1969. C. Kuang-Hui, C. Wen-Tsuen, A new group key generating model for group sharing, Information Sciences 64 (1-2) (1992) 83–94. C.-Y. Lee, Y.-S. Yeh, D.-J. Chen, K.-L. Ku, A probability model for reconstructing secret sharing under the internet environment, Information Sciences 116 (2-4) (1999) 109–127. T. Moon, Error Correction Coding: Mathematical Methods and Algorithms, Wiley-Interscience, 2005. A. Parakh, S. Kak, Internet voting protocol based on implicit data security, in: Proceedings of 17th International Conference on Computer Communications and Networks 2008, ICCN 08, pp. 1-4. R. Proctor, M. Lien, K. Vu, G. Salvendy, Improving computer security for authentication of users: influence of proactive password restrictions, Behavior Research Methods, Instruments and Computers 34 (2002) 163–169. A. Renvall, C. Ding, A nonlinear secret sharing scheme, Information Security and Privacy, LNCS 1172 (1996) 56–66. S. Riley, Password security: what users know and what they actually do, Usability News 8.1 (2006). K. Rosen, Discrete Mathematics and its Applications, McGraw-Hill, 2007. A. Shamir, How to share a secret, Communication of ACM 22 (11) (1979) 612–613. R. Sinnott, First report-december 2004 (irish commission on electronic voting) appendix 2c, 2004, pp. 153–191. N. Subramanian, C. Yang, W. Zhang, Securing distributed data storage and retrieval in sensor networks, Pervasive and Mobile Computing 3 (6) (2007) 659–676. December. G. Wang, W. Zhang, G. Cao, T.L. Porta, On supporting distributed collaboration in sensor networks, in: IEEE Military Communications Conference, October 2003. F. Ye, H. Luo, J. Cheng, S. Lu, L. Zhang, A two-tier data dissemination model for large-scale wireless sensor networks, in: ACM International Conference on Mobile Computing and Networking, 2002, pp. 148–159. T. Yuan, S. Zhang, Secure fault tolerance in wireless sensor networks, in: CITWORKSHOPS’08: Proceedings of the 2008 IEEE Eighth International Conference on Computer and Information Technology Workshops, 2008, pp. 477–482. W. Zhang, G. Cao, T.L. Porta, Data dissemination with ring-based index for sensor networks, in: IEEE International Conference on Network Protocol, November 2003.

Suggest Documents