2012 International Conference on Communication, Information & Computing Technology (ICCICT), Oct. 19-20, Mumbai, India
Distributed Verification Protocols for Data Storage Security in Cloud Computing
Priodyuti Pradhan∗, P. Syam Kumar∗, Gautam Mahapatra†, and R. Subramanian∗
∗ Department of Computer Science, School of Engineering and Technology, Pondicherry University, India
[email protected], [email protected], [email protected]
† Department of Computer Science, Asutosh College, University of Calcutta, India
[email protected]
Abstract—Storing large volumes of data in the Cloud has become a common practice in modern computing because of its dynamic nature. After uploading, users typically delete their original copies of the data files and therefore no longer have direct control over that data. This lack of control introduces security issues in Cloud data storage, one of the most important being the integrity of the remotely stored data. Here, we propose a distributed algorithmic approach to this problem based on a publicly verifiable, probabilistic scheme. Because a single Third Party Auditor carries a heavy workload, we distribute the verification task among several SUBTPAs. We use the Sobol random sequence to generate the random block numbers, which preserves the uniformity property. In addition, our method keeps each subtask uniform, which we achieve through an analytical adjustment of the task distribution. Owing to this uniformity, our protocols verify the integrity of the data efficiently and quickly. We also take special care of critical data by using overlapping Task Distribution Keys.
Index Terms—Data integrity, public verifiability, distributed algorithm, Sobol random sequence, string reconciliation, modified de Bruijn graph.
I. INTRODUCTION
Storing data files in Cloud storage benefits large enterprises as well as individual users, because they can dynamically increase their storage space without buying any storage devices. In addition, users can access the remotely stored data files anytime and from anywhere, and can grant authorized users permission to share the data. Besides these advantages of data outsourcing, storing data files on remote servers carries security risks, and one of the most important tasks is checking the integrity of the remotely stored data. Integrity may be lost through internal failures (such as disk failure) on the server side [14], or through external attacks in which unauthorized users delete or edit parts of the files [7], because Clouds use multi-tenancy, where the processes of multiple users run on the same physical hardware [5]. Sometimes, to increase profit, the Cloud Service Provider may even delete rarely accessed files [6]. Thus, a user who keeps a huge amount of data in the Cloud cannot be sure of the integrity of the stored data [12]. The user therefore needs a scheme to check the integrity of the outsourced data periodically, without downloading the whole files. Prior work [6], [7], [8], [9] uses a pseudo-random sequence to select the file blocks to verify, but due to its nonuniform
978-1-4577-2078-9/12/$26.00©2011 IEEE
nature, the error-block detection probability is lower and verification takes more time than with a Sobol random sequence. More recently, [10], [11] use a Sobol random sequence to verify the integrity of the data and achieve strong integrity guarantees. However, all of the above work uses a single Third Party Auditor (TPA) to verify the integrity of the data. Here, we use multiple TPAs, termed SUBTPAs, working under a Main TPA to check the integrity of the outsourced data. In single-TPA verification protocols, one TPA receives the request from the Client and performs the whole verification task. If that TPA crashes under heavy workload, the entire verification process is aborted. Moreover, during verification the network traffic near the TPA organization is very high and may cause network congestion, which degrades performance. We therefore devise a distributed verification scheme in which the Main TPA distributes the verification task uniformly among a number of SUBTPAs. Figure 1 shows the proposed model. In our protocols, the Client/Main TPA acts as a Coordinator and all SUBTPAs work under the Coordinator. The SUBTPAs perform their verification tasks concurrently and report the verification results to the Coordinator, and this concurrency improves the performance of the scheme. In addition, we use the Sobol random number generator [4] instead of a pseudo-random number generator to generate the random file block numbers to be verified, and then distribute the generated sequence among the SUBTPAs in a uniform manner. In Section II, we briefly discuss our basic approach to task distribution. In Section III, we explain the theoretical underpinnings of our protocols. In Section IV, we give our proposed algorithms. We then analyze performance and give the experimental results in Section V.
Finally, in Section VI, we present a brief conclusion.
II. APPROACH
Here, the Coordinator randomly generates a bit string for each SUBTPA, termed the Task Distribution Key (TDK). Each SUBTPA successively applies its TDK as a mask on the generated Sobol sequence until the sequence is exhausted, and takes the sequence numbers selected by the mask as the block numbers to verify.
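As a rough illustration of this masking step, the following Python sketch applies a TDK repeatedly along a sequence and keeps the entries that fall under a 1 bit. The function name `apply_tdk` and the representation of the key as a string of '0' and '1' characters are our own illustrative choices, not part of the protocol specification; the sample sequence and keys are the ones used in the worked example of this section.

```python
def apply_tdk(sequence, tdk):
    """Return the entries of `sequence` that fall under a '1' bit of the TDK.

    The TDK is reused window after window (position modulo its length),
    which mirrors sliding the key to the right by its own length.
    """
    return [block for pos, block in enumerate(sequence)
            if tdk[pos % len(tdk)] == "1"]


sobol_blocks = [1216, 5312, 3264, 7360, 704, 4800, 2752, 6848, 1728]
subtask_1 = apply_tdk(sobol_blocks, "10101")   # key of SUBTPA1
subtask_2 = apply_tdk(sobol_blocks, "01010")   # key of SUBTPA2
print(subtask_1)  # [1216, 3264, 704, 4800, 6848]
print(subtask_2)  # [5312, 7360, 2752, 1728]
```

Because the two sample keys are bitwise complements, the two subtasks are disjoint and together cover the whole sequence.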
[Figure 1 components: Client, Main TPA, SUBTPA1, SUBTPA2, ..., SUBTPAn, Cloud Data Storage Server]
Fig. 1. Block Diagram of Distributed Audit System Architecture
For example, let the TDKs for SUBTPA1 and SUBTPA2 be 10101 and 01010, respectively, and let the generated Sobol random sequence be {1216, 5312, 3264, 7360, 704, 4800, 2752, 6848, 1728}, where file blocks are numbered from 0 to 8191. We place the TDK for SUBTPA1 at the left end of the generated sequence and take the block numbers that correspond to a 1. We then slide the TDK to the right by its own length and repeat the procedure. This generates the subtask for SUBTPA1 in (1) and, similarly, the subtask for SUBTPA2 in (2).

{1216, 5312, 3264, 7360, 704, 4800, 2752, 6848, 1728}
    1     0     1     0    1
{1216, 5312, 3264, 7360, 704, 4800, 2752, 6848, 1728}
                                 1     0     1     0     1
{1216, 3264, 704, 4800, 6848}    (1)

{1216, 5312, 3264, 7360, 704, 4800, 2752, 6848, 1728}
    0     1     0     1    0
{1216, 5312, 3264, 7360, 704, 4800, 2752, 6848, 1728}
                                 0     1     0     1     0
{5312, 7360, 2752, 1728}    (2)

In our protocols, we use two types of TDK to distribute the task uniformly among the SUBTPAs, and we sometimes adjust the TDK length to balance the subtask of each SUBTPA.

III. THEORETICAL BACKGROUND
A. Sobol Sequence
The Sobol sequence [3], [4] is a low-discrepancy, quasi-random sequence that generates numbers in the interval [0, 1). One salient feature is that the numbers are uniformly distributed over [0, 1). Uniformity also holds for any segment of the sequence; that is, the Sobol sequence is segment-wise uniform. Generating the sequence requires a primitive polynomial 𝒫 of degree 𝑑 over the finite field ℤ2 and direction numbers 𝑉𝑖 [3].

B. String Reconciliation
The general string reconciliation protocol [1] states that if two distinct hosts A and B hold two strings 𝜎𝐴 and 𝜎𝐵, then by applying this protocol host A will learn 𝜎𝐵 and host B will learn 𝜎𝐴 with minimum communication and optimal computational complexity. In this protocol, each host independently divides its string into a multiset of “puzzle pieces” using a predetermined mask length, stores the resulting multiset as a Modified de Bruijn graph, and enumerates the Eulerian cycles to obtain the index of the original string. The multisets are reconciled using the Set Reconciliation Protocol [2]. In the final step, each host independently constructs the other's Modified de Bruijn graph from the reconciled multisets, enumerates the Eulerian cycles, and uses the index supplied by the other host to determine the other host's string. As an example, consider the string $10010101$ (including $ as an end marker) under the mask 111 of length 3. The multiset of puzzle pieces for the string is given in (3), and the corresponding graph is shown in Figure 2.

{$10, 100, 001, 010, 101, 010, 101, 01$}    (3)

[Figure 2: graph on the nodes $1, 00, 01, 10, 11, 1$ with edge labels 0 and 1]
Fig. 2. Modified de Bruijn Graph Corresponding to the string $10010101$

IV. EFFICIENT DISTRIBUTED VERIFICATION PROTOCOL
Our distributed verification protocols are based on a probabilistic verification scheme, and we classify them into two types according to how the task is distributed. The first, basic protocol uses a simple partition approach; the second uses TDKs to partition the task. To enhance performance, we use an (m, n) threshold scheme [15] with 𝑚 < 𝑛, in which the Coordinator can stop the audit operation or detect the fault region after receiving responses from any subset of 𝑚 out of 𝑛 SUBTPAs, because the use of the Sobol sequence spreads each subtask uniformly over the entire set of file blocks.

A. Protocol 1: Simple Partition with Threshold Scheme
In the first protocol, the Coordinator randomly chooses one Sobol random key 𝜎𝑟 and generates the Sobol random block sequence using 𝑓𝜎𝑟(⋅). The key 𝜎𝑟 consists of one primitive polynomial 𝒫 of degree 𝑑, chosen at random from the 𝜙(2^𝑑 − 1)/𝑑 primitive polynomials of that degree [4], randomly chosen initial values 𝑚𝑖, where 𝑖 ∈ {1, 2, . . . , 𝑑}, and the 𝑆𝐾𝐼𝑃 and 𝐿𝐸𝐴𝑃 values. In the next step, the generated sequence ℒ is partitioned using the partition function 𝑓𝑝𝑎𝑟𝑡𝑖𝑡𝑖𝑜𝑛(⋅) with partition length 𝑃𝐿𝑒𝑛𝑖, and each subsequence is denoted 𝑆𝑢𝑏𝑖. The subsequences should satisfy the equivalence relation property (they are pairwise disjoint and together cover ℒ) and maintain the uniformity property. Algorithm 1 details the key generation and distribution phase.
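The generation and partition steps of Protocol 1 can be sketched in Python as follows. This is a minimal sketch under strong assumptions, not the generator used in our experiments: `sobol_1d` is a simplified one-dimensional generator in which all initial values 𝑚𝑖 are set to 1 (to our understanding this corresponds to the first Sobol dimension, produced in Gray-code order as in [4]), it ignores 𝑆𝐾𝐼𝑃 and 𝐿𝐸𝐴𝑃, and `partition` simply cuts the sequence into contiguous pieces of near-equal length. Both function names are hypothetical.

```python
def sobol_1d(count, bits=13):
    """First `count` integer points of a simplified one-dimensional Sobol-type sequence.

    Assumption: every initial value m_i is 1, so the i-th direction number is
    2**(bits - i). SKIP and LEAP are not modelled.
    """
    if count > (1 << bits):
        raise ValueError("count must not exceed 2**bits")
    directions = [1 << (bits - i) for i in range(1, bits + 1)]
    value, blocks = 0, []
    for n in range(count):
        if n > 0:
            c = 0
            while ((n - 1) >> c) & 1:      # position of the rightmost zero bit of n - 1
                c += 1
            value ^= directions[c]         # Gray-code update of the current point
        blocks.append(value)
    return blocks


def partition(sequence, n_parts):
    """Split the sequence into n_parts contiguous, pairwise disjoint subsequences."""
    base, extra = divmod(len(sequence), n_parts)
    parts, start = [], 0
    for i in range(n_parts):
        length = base + (1 if i < extra else 0)
        parts.append(sequence[start:start + length])
        start += length
    return parts


blocks = sobol_1d(16, bits=13)      # block numbers in the range 0..8191
subtasks = partition(blocks, 4)     # one subtask per SUBTPA
print(subtasks[0])                  # [0, 4096, 6144, 2048]
```

In this small instance every subtask contains exactly one block number from each quarter of the range 0..8191, which is the segment-wise uniformity that the simple partition relies on.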
In the distributed challenge and verification phase, each 𝑆𝑈𝐵𝑇𝑃𝐴𝑖 independently communicates with the Cloud Servers to obtain a proof. Here, 𝑆𝑈𝐵𝑇𝑃𝐴𝑖 sends 10% of its subsequence to the Cloud Servers at a time as a challenge, instead of sending the whole subsequence. This reduces the workload on the server side as well as network congestion. After sending each 10% challenge, 𝑆𝑈𝐵𝑇𝑃𝐴𝑖 waits for the proof from any server, because Cloud Computing is based on the Distributed Control and Distributed Data paradigm. If the proof matches the stored metadata, 𝑆𝑈𝐵𝑇𝑃𝐴𝑖 stores TRUE in its own table, Report, sends the next 10% of the subsequence, and waits for the next proof. If a mismatch occurs during proof verification, 𝑆𝑈𝐵𝑇𝑃𝐴𝑖 immediately signals the fault region to the Coordinator and stores FALSE in the Report table. Algorithm 2 details the distributed challenge and verification phase.
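The challenge rounds described above can be sketched as follows. This is an illustrative Python sketch, not our C implementation: the proof is modelled as a plain list of per-block tags, `request_proof` stands in for the network exchange with a Cloud Server, and the names `challenge_batches`, `audit_subtask`, `honest_server` and `corrupt_server` are hypothetical. We also assume that any blocks left over when the subtask length is not a multiple of ten go into an extra round.

```python
def challenge_batches(subtask, rounds=10):
    """Cut a subtask into consecutive challenges of floor(len / rounds) blocks."""
    size = max(1, len(subtask) // rounds)
    return [subtask[i:i + size] for i in range(0, len(subtask), size)]


def audit_subtask(subtask, request_proof, stored_metadata):
    """Run the challenge rounds; return the Report table and the faulty rounds."""
    report, faulty_rounds = [], []
    for k, challenge in enumerate(challenge_batches(subtask), start=1):
        proof = request_proof(challenge)               # wait for the server's proof
        expected = [stored_metadata[b] for b in challenge]
        report.append(proof == expected)               # TRUE or FALSE in the Report table
        if proof != expected:
            faulty_rounds.append(k)                    # would be signalled to the Coordinator
    return report, faulty_rounds


subtask = list(range(100, 120))                        # 20 hypothetical block numbers
metadata = {b: f"tag-{b}" for b in subtask}            # hypothetical per-block metadata


def honest_server(challenge):
    return [metadata[b] for b in challenge]


def corrupt_server(challenge):
    return ["corrupt" if b == 105 else metadata[b] for b in challenge]


print(audit_subtask(subtask, honest_server, metadata)[1])    # []
print(audit_subtask(subtask, corrupt_server, metadata)[1])   # [3]
```

Sending small rounds lets a SUBTPA report a fault as soon as the first mismatching round is seen, rather than after the whole subtask has been checked.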
Algorithm 1: Key Generation and Distribution
  Coordinator randomly chooses one primitive polynomial, 𝒫, of degree 𝑑 and initialization numbers 𝑚𝑖, 𝑖 ∈ {1, 2, . . . , 𝑑};
  Coordinator decides the Sobol random key as 𝜎𝑟 = ⟨𝒫, 𝑚𝑖, 𝑆𝐾𝐼𝑃, 𝐿𝐸𝐴𝑃, 𝑆𝑒𝑞𝐿𝑒𝑛, 𝐶𝑂𝑁𝑆𝑇𝐴𝑁𝑇⟩;
  Generated sequence ℒ ← 𝑓𝜎𝑟(𝑆𝑒𝑞𝐿𝑒𝑛);
  Multiply ℒ by 𝐶𝑂𝑁𝑆𝑇𝐴𝑁𝑇, a power of 2, to make each element an integer block number;
  Coordinator determines the number of SUBTPAs, 𝑛, and the threshold value, 𝑚;
  for 𝑖 ← 1 to 𝑛 do
    𝑆𝑢𝑏𝑖 ← 𝑓𝑝𝑎𝑟𝑡𝑖𝑡𝑖𝑜𝑛(𝑃𝐿𝑒𝑛𝑖, ℒ)
  end
  where 𝑃𝐿𝑒𝑛𝑖 is the length of each partition, ℒ ⊆ ∪ 𝑆𝑢𝑏𝑖 over 𝑖 ∈ {1, . . . , 𝑛}, and 𝑆𝑢𝑏𝛼 ∩ 𝑆𝑢𝑏𝛽 = 𝜙 for any 𝛼, 𝛽 ∈ {1, . . . , 𝑛} with 𝛼 ≠ 𝛽;
  Distribute subtask 𝑆𝑢𝑏𝑖 to 𝑆𝑈𝐵𝑇𝑃𝐴𝑖;

Algorithm 2: DistributedChalandProofVerification1( )
  𝑆𝑈𝐵𝑇𝑃𝐴𝑖 calculates 10% of 𝑆𝑢𝑏𝑖;
  for each 𝑆𝑈𝐵𝑇𝑃𝐴𝑖 do
    𝑙 ← 𝑙𝑒𝑛𝑔𝑡ℎ(𝑆𝑢𝑏𝑖);
    𝐶𝑜𝑢𝑛𝑡𝑒𝑟𝑖 ← ⌊(10/100) ∗ 𝑙⌋;
  end
  for 𝑘 ← 1 to 10 do
    for 𝑠 ← 1 to 𝐶𝑜𝑢𝑛𝑡𝑒𝑟𝑖 and 𝑡 ⩽ 𝑙 do
      𝐶ℎ𝑎𝑙𝑖,𝑘[𝑠] ← 𝑆𝑢𝑏𝑖[𝑡];
    end
    Send ⟨𝐶ℎ𝑎𝑙𝑖,𝑘⟩ as a challenge to the Cloud Server;
    Wait for the proof, 𝑃𝑅𝑖,𝑘, from any Server;
    𝑃𝑅𝑖,𝑘 ← 𝑅𝑒𝑐𝑒𝑖𝑣𝑒();
    if 𝑃𝑅𝑖,𝑘 equals the stored metadata then
      𝑅𝑒𝑝𝑜𝑟𝑡[𝑘] ← 𝑇𝑅𝑈𝐸;
    else
      𝑅𝑒𝑝𝑜𝑟𝑡[𝑘] ← 𝐹𝐴𝐿𝑆𝐸;
      Send signal ⟨𝑃𝑎𝑐𝑘𝑒𝑡𝑖,𝑘, 𝐹𝐴𝐿𝑆𝐸⟩ to the Coordinator;
    end
  end

Analysis of Protocol 1
Protocol 1 follows the Centrally Controlled and Distributed Data paradigm, in which all SUBTPAs are controlled by the Coordinator but may communicate with any Cloud Data Storage Server for verification. The Coordinator decides the partition length 𝑃𝐿𝑒𝑛𝑖 and divides the sequence among the 𝑆𝑈𝐵𝑇𝑃𝐴𝑖. Because a Sobol sequence is used, each subsequence is uniform. After partitioning the sequence, the Coordinator sends the subsequence 𝑆𝑢𝑏𝑖 to each 𝑆𝑈𝐵𝑇𝑃𝐴𝑖. This protocol performs well at detecting errors in the file blocks. However, sending 𝑆𝑢𝑏𝑖 from the Coordinator to 𝑆𝑈𝐵𝑇𝑃𝐴𝑖 consumes extra network bandwidth, and the protocol cannot take any extra care of critical data. To reduce bandwidth usage, increase efficiency, and take extra care of critical data, we devise the Task Distribution Key (TDK) to divide the sequence
into subsequences. Our second scheme describes the TDK-based technique in more detail.
B. Protocol 2: Task Distribution Key Based Distribution Scheme
In our second protocol, the Coordinator and each 𝑆𝑈𝐵𝑇𝑃𝐴𝑖 know the Sobol random key 𝜎𝑟 used to generate the Sobol random sequence. For each new verification, the Coordinator decides the parameters of the Sobol random key 𝜎𝑟 and sends it publicly to all 𝑆𝑈𝐵𝑇𝑃𝐴𝑖. In addition, the Coordinator generates 𝑛 random TDKs, 𝜎𝑆𝑖, and distributes them among the 𝑛 SUBTPAs using the String Reconciliation Protocol [1] with some modifications. Here, the Coordinator knows 𝜎𝑟, 𝜎𝑆1, 𝜎𝑆2, . . . , 𝜎𝑆𝑛, and each 𝑆𝑈𝐵𝑇𝑃𝐴𝑖 knows only 𝜎𝑟; thus, for reconciling, each 𝑆𝑈𝐵𝑇𝑃𝐴𝑖 performs most of the computation (characteristic polynomial interpolation [2]) needed for string reconciliation. The Coordinator maintains an individual Modified de Bruijn graph for each 𝜎𝑆𝑖 and for 𝜎𝑟, while each 𝑆𝑈𝐵𝑇𝑃𝐴𝑖 maintains only the Modified de Bruijn graph for 𝜎𝑟. After reconciling, each 𝑆𝑈𝐵𝑇𝑃𝐴𝑖 knows its TDK, 𝜎𝑆𝑖. Sending 𝜎𝑆𝑖 to 𝑆𝑈𝐵𝑇𝑃𝐴𝑖 therefore requires minimal communication. Each SUBTPA then generates the Sobol random sequence and interprets its subsequence using its own TDK. Algorithm 3 gives the Sobol random key and TDK generation and distribution. Algorithm 4 describes subtask interpretation and the distributed challenge and verification for Protocol 2. In this protocol, we use two types of TDK: non-overlapping and overlapping. Overlapping TDKs are applied when we want to verify critical data. The steps for generating non-overlapping TDKs are as follows:
Non-Overlapping Task Distribution Key Generation
1. First, the Coordinator decides the total number of SUBTPAs, 𝑛, and the number of 1's, 𝑡, that each TDK will contain.
2. The Coordinator obtains the TDK length, 𝑇𝐷𝐾𝐿𝑒𝑛, by multiplying the total number of SUBTPAs by 𝑡.
3. To place the 1's inside each TDK, the Coordinator generates a random permutation of the indices 1, 2, . . . , 𝑇𝐷𝐾𝐿𝑒𝑛.
4. The first 𝑡 indices of the permutation give the positions of the 1's in the first TDK, the next 𝑡 indices give the positions of the 1's in the second TDK, and so on until all TDKs have been generated.
5. After generating the TDKs, if the TDK length is co-prime to the sequence length, the Coordinator distributes the TDKs among the SUBTPAs; otherwise, the Coordinator adjusts the TDK length to the next nearest prime, or to the next length co-prime to the sequence length, 𝑆𝑒𝑞𝐿𝑒𝑛, to maintain uniformity among the subtasks.
6. The generated TDKs for the 𝑛 SUBTPAs are 𝜎𝑆1, 𝜎𝑆2, 𝜎𝑆3, . . . , 𝜎𝑆𝑛, respectively.

Algorithm 3: Key Generation and Distribution
  Coordinator randomly chooses one primitive polynomial, 𝒫, of degree 𝑑 and initialization numbers 𝑚𝑖, 𝑖 ∈ {1, 2, . . . , 𝑑};
  Coordinator decides the Sobol random key, 𝜎𝑟 = ⟨𝒫, 𝑚𝑖, 𝑆𝐾𝐼𝑃, 𝐿𝐸𝐴𝑃, 𝐶𝑂𝑁𝑆𝑇𝐴𝑁𝑇, 𝑆𝑒𝑞𝐿𝑒𝑛⟩;
  Coordinator determines the number of SUBTPAs, 𝑛, and the threshold value, 𝑚;
  Coordinator sends 𝜎𝑟 to all 𝑆𝑈𝐵𝑇𝑃𝐴𝑖;
  Determine the number of 1's, 𝑡, that each 𝑇𝐷𝐾 will contain;
  𝑇𝐷𝐾𝐿𝑒𝑛 ← 𝑛 × 𝑡;
  Generate a random permutation, 𝑅𝑎𝑛𝑑𝑃𝑒𝑟𝑚, of the indices 1, 2, . . . , 𝑇𝐷𝐾𝐿𝑒𝑛;
  for 𝑖 ← 1 to 𝑛 do
    for 𝑗 ← 1 to 𝑇𝐷𝐾𝐿𝑒𝑛 do
      𝑇𝐷𝐾[𝑖][𝑗] ← 0;
    end
  end
  𝑖 ← 1; 𝑘 ← 1;
  while 𝑖 ⩽ 𝑛 do
    for 𝑗 ← 1 to 𝑡 do
      𝑙 ← 𝑅𝑎𝑛𝑑𝑃𝑒𝑟𝑚[𝑘++];
      𝑇𝐷𝐾[𝑖][𝑙] ← 1;
    end
    𝑖 ← 𝑖 + 1;
  end
  if 𝑔𝑐𝑑(𝑇𝐷𝐾𝐿𝑒𝑛, 𝑆𝑒𝑞𝐿𝑒𝑛) = 1 then
    𝑇𝐷𝐾𝐿𝑒𝑛 is acceptable;
  else
    adjust 𝑇𝐷𝐾𝐿𝑒𝑛 to the next nearest prime;
  end
  The generated 𝑇𝐷𝐾𝑖 for 𝑆𝑈𝐵𝑇𝑃𝐴𝑖, denoted 𝜎𝑆1, 𝜎𝑆2, 𝜎𝑆3, . . . , 𝜎𝑆𝑛 respectively, are distributed among the SUBTPAs using the String Reconciliation Protocol;

Algorithm 4: DistributedChalandProofVerification2( )
  Each 𝑆𝑈𝐵𝑇𝑃𝐴𝑖 generates the sequence ℒ ← 𝑓𝜎𝑟(𝑆𝑒𝑞𝐿𝑒𝑛);
  Multiply ℒ by 𝐶𝑂𝑁𝑆𝑇𝐴𝑁𝑇, a power of 2, to make each element an integer block number;
  Interpret the subsequence using 𝜎𝑆𝑖 as 𝑟𝑖,𝑗, where ℒ ← ∪ 𝑟𝑖,𝑗 over 𝑖 ∈ [1, . . . , 𝑛], 𝑗 ∈ [1, . . . , 𝑝];
  𝑆𝑈𝐵𝑇𝑃𝐴𝑖 calculates 10% of 𝑟𝑖,𝑗;
  for each 𝑆𝑈𝐵𝑇𝑃𝐴𝑖 do
    𝑙 ← 𝑙𝑒𝑛𝑔𝑡ℎ(𝑟𝑖,𝑗);
    𝐶𝑜𝑢𝑛𝑡𝑒𝑟𝑖 ← ⌊(10/100) ∗ 𝑙⌋;
  end
  for 𝑘 ← 1 to 10 do
    for 𝑠 ← 1 to 𝐶𝑜𝑢𝑛𝑡𝑒𝑟𝑖 and 𝑡 ⩽ 𝑙 do
      𝐶ℎ𝑎𝑙𝑖,𝑘[𝑠] ← 𝑟𝑖,𝑗[𝑡];
    end
    Send ⟨𝐶ℎ𝑎𝑙𝑖,𝑘⟩ as a challenge to the Cloud Server;
    Wait for the proof, 𝑃𝑅𝑖,𝑘, from any Cloud Server;
    𝑃𝑅𝑖,𝑘 ← 𝑅𝑒𝑐𝑒𝑖𝑣𝑒();
    if 𝑃𝑅𝑖,𝑘 equals the stored metadata then
      𝑅𝑒𝑝𝑜𝑟𝑡[𝑘] ← 𝑇𝑅𝑈𝐸;
    else
      𝑅𝑒𝑝𝑜𝑟𝑡[𝑘] ← 𝐹𝐴𝐿𝑆𝐸;
      Send signal ⟨𝑃𝑎𝑐𝑘𝑒𝑡𝑖,𝑘, 𝐹𝐴𝐿𝑆𝐸⟩ to the Coordinator;
    end
  end
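The non-overlapping key generation steps listed above can be sketched in Python as follows. This is a sketch under stated assumptions rather than our implementation: the function name `generate_tdks` is hypothetical, keys are represented as strings of '0' and '1', indices are 0-based, and the length adjustment is modelled by appending 0 bits until the TDK length is co-prime to the sequence length. How the extra positions are filled is not specified in the steps above, so the padding rule is purely an assumption of this sketch.

```python
import random
from math import gcd


def generate_tdks(n, t, seq_len, rng=None):
    """Return n non-overlapping TDKs (bit strings), each containing t ones."""
    rng = rng or random.Random()
    tdk_len = n * t
    positions = list(range(tdk_len))
    rng.shuffle(positions)                       # random permutation of the indices
    keys = [["0"] * tdk_len for _ in range(n)]
    for i in range(n):
        for pos in positions[i * t:(i + 1) * t]: # next t indices go to the i-th key
            keys[i][pos] = "1"
    # Assumption: extend the key with 0 bits until its length is co-prime
    # to the sequence length (seq_len must be positive).
    pad = 0
    while gcd(tdk_len + pad, seq_len) != 1:
        pad += 1
    return ["".join(k) + "0" * pad for k in keys]


keys = generate_tdks(n=4, t=4, seq_len=512, rng=random.Random(7))
for key in keys:
    print(key)                                   # four 17-bit keys with four 1's each
```

With 𝑛 = 4 and 𝑡 = 4 the initial length is 16, which shares the factor 16 with a sequence length of 512, so the sketch extends every key to 17 bits.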
Overlapping Task Distribution Key Generation
1. After generating the non-overlapping TDKs, the Coordinator decides the percentage of overlap needed.
2. The Coordinator generates a random permutation index, of a size equal to the chosen percentage of overlap, within 1, 2, . . . , 𝑇𝐷𝐾𝐿𝑒𝑛, and places 1's at those indices inside the TDKs wherever 0's were previously present.
Analysis of Protocol 2
In the TDK generation phase, we take the mask length to be prime or co-prime to the sequence length, because otherwise the subsequence 𝑟𝑖,𝑗 obtained by applying the TDK on ℒ becomes nonuniform; this adjustment makes it uniform. In Algorithm 3, the Coordinator generates the Sobol random key and sends it to the SUBTPAs. In addition, it sends a different TDK, 𝜎𝑆𝑖, to each 𝑆𝑈𝐵𝑇𝑃𝐴𝑖. In Algorithm 4, 𝑆𝑈𝐵𝑇𝑃𝐴𝑖 generates the Sobol random sequence using the key 𝜎𝑟 and stores it in ℒ. Then each 𝑆𝑈𝐵𝑇𝑃𝐴𝑖 interprets its task using the corresponding TDK, 𝜎𝑆𝑖. We denote the subtask for 𝑆𝑈𝐵𝑇𝑃𝐴𝑖 as 𝑟𝑖,𝑗, which is
defined as
$$\mathcal{L} = \bigcup_{\substack{i \in [1,\ldots,n] \\ j \in [1,\ldots,p]}} r_{i,j}$$
where
$$p = \frac{\mathit{SequenceLength}}{\mathit{TDKLength}} + \xi$$
$$\xi = \text{number of 1's in the first } (\mathit{SequenceLength} \bmod \mathit{TDKLength}) \text{ positions of the TDK}$$
Then 𝑆𝑈𝐵𝑇𝑃𝐴𝑖 calculates 10% of 𝑟𝑖,𝑗, creates the challenge 𝐶ℎ𝑎𝑙𝑖,𝑘, sends it to the server, and waits for the proof 𝑃𝑅𝑖,𝑘. After receiving the proof, the SUBTPA verifies it against the stored metadata. If the proof is correct, it stores TRUE in its table; otherwise, it stores FALSE and signals the corrupt file blocks to the Coordinator. The Coordinator receives signals from any subset of 𝑚 out of 𝑛 SUBTPAs and then determines the fault location or stops the audit operation. In the final step, the Main TPA gives the audit result to the Client. Here, we generalize the integrity verification protocol in a distributed manner. Our protocols can therefore be applied on top of existing RSA-based [11], [13] or ECC-based [10] protocols to obtain distributed RSA or ECC protocols. In the next section, we discuss the performance of our protocols.

V. IMPLEMENTATION AND EXPERIMENTAL RESULTS
Audit activities naturally increase the communication and computational overhead of audit services. To enhance performance, we use the String Reconciliation Protocol to distribute the TDKs, which reduces the communication overhead between the Main TPA and the SUBTPAs. For each new verification, the Coordinator can change the TDK of any SUBTPA and send only the differing part of the multiset elements to that SUBTPA. In addition, we use a probabilistic verification scheme based on the Sobol sequence, which provides uniformity not only for the whole sequence but also for each subsequence, so each SUBTPA independently verifies blocks spread over the whole file. Thus, there is a high probability of detecting the fault location efficiently and quickly, and the Sobol sequence provides a strong integrity proof for the remotely stored data. Table I compares the two protocols.

TABLE I
PERFORMANCE COMPARISON BETWEEN TWO PROPOSED PROTOCOLS

                            Protocol 1    Protocol 2
Public Verifiability        Yes           Yes
Coordinator Controlled      Yes           Yes
Probabilistic               Yes           Yes
Privacy Preserving          Yes           Yes
Task Distribution           uniform       uniform
Fault Detection             fast          very fast
Coordinator Computation     more          more
Communication Complexity    more          less

In our protocols, we use the Sobol random sequence generator to generate the file block numbers, because the sequence is uniformly distributed over [0, 1) and covers the whole region. To obtain integers, we multiply the generated sequence by a constant power of two. As a concrete example, we take 32 numbers from the Sobol sequence and from a pseudo-random sequence, group them into successive runs of four consecutive numbers, and compute the arithmetic mean of each run; the results are shown in Figure 3.

[Figure 3: successive means (vertical axis, 60 to 180) against position in the sequence (horizontal axis, 0 to 30) for the Random Sequence and the Sobol Random Sequence]
Fig. 3. Comparison between Successive Mean of Pseudo Random and Sobol Random Sequences

Figure 3 shows that the arithmetic means of consecutive Sobol numbers span 95-145, whereas those of the pseudo-random sequence span 65-165. This implies that, for the Sobol sequence, the uniformity property holds for any run of consecutive numbers, so the subtask of each SUBTPA is balanced. For instance, let the number of file blocks be 10000, logically partitioned into four consecutive segments. If the Coordinator generates a Sobol sequence of length 128 and distributes it among 4 SUBTPAs, each SUBTPA receives 32 block numbers and is guaranteed to verify 8 blocks from each of the four segments; a pseudo-random sequence gives no such guarantee. We also observed that if the TDK length is a power of 2, the subsequence of each SUBTPA forms clusters, which results in a nonuniform subtask. To handle this problem, we adjust the TDK length to the nearest prime or to a length co-prime to the sequence length. Figure 4 illustrates this observation for a single subtask: if a SUBTPA applies a 16-bit TDK to a Sobol sequence, the subtask becomes nonuniform, but if the TDK is extended to 17 bits and applied to the same sequence, the generated subsequence maintains the uniformity property.

[Figure 4: plot along the Sobol Sequence (horizontal axis, 0 to 512; vertical axis, −0.1 to 0.2) for the 16-bit TDK and the 17-bit TDK]
Fig. 4. 16-bit TDK Extends to 17-bit
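The clustering effect of a power-of-two TDK length can be reproduced in a small experiment. The Python sketch below is illustrative only and rests on simplifying assumptions: `sobol_1d` is a simplified one-dimensional generator with all initial values set to 1 (not the MATLAB generator used for our results), each TDK contains a single 1 in its first position, and the helper names `sobol_1d`, `apply_tdk` and `spread` are hypothetical.

```python
def sobol_1d(count, bits=9):
    """Simplified one-dimensional Sobol-type integers (all initial values m_i = 1)."""
    if count > (1 << bits):
        raise ValueError("count must not exceed 2**bits")
    directions = [1 << (bits - i) for i in range(1, bits + 1)]
    value, out = 0, []
    for n in range(count):
        if n > 0:
            c = 0
            while ((n - 1) >> c) & 1:      # rightmost zero bit of n - 1
                c += 1
            value ^= directions[c]
        out.append(value)
    return out


def apply_tdk(sequence, tdk):
    """Keep the entries that fall under a '1' bit of the repeatedly applied TDK."""
    return [x for pos, x in enumerate(sequence) if tdk[pos % len(tdk)] == "1"]


def spread(values):
    """Width of the interval covered by the selected points."""
    return max(values) - min(values)


points = [v / 512 for v in sobol_1d(512, bits=9)]   # 512 points in [0, 1)
sub16 = apply_tdk(points, "1" + "0" * 15)           # 16-bit TDK
sub17 = apply_tdk(points, "1" + "0" * 16)           # 17-bit TDK
print(spread(sub16))   # below 0.125: every selected point lies in [0, 0.125)
print(spread(sub17))   # above 0.5: the selection includes 0 and 0.59375
```

In this toy setting the 16-bit key always selects positions whose low-order index bits are identical, so the selected points share their leading binary digits and fall into one narrow interval, whereas the 17-bit key cycles through different index residues and reaches across the range.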
We now present and discuss the implementation and experimental results of our protocols. To evaluate performance, we use a desktop with a Core2 Duo 2.2 GHz processor, 2 GB RAM, and a 500 GB SATA hard drive. All programs are written in C on Windows Server 2008. We use MATLAB R2009a to generate the Sobol random sequences [4]. We test with 300,000 file blocks, 1% of which are corrupt, and compare the detection probability for the Sobol random sequence and the pseudo-random sequence. Table II shows our experimental results.
TABLE II
DETECTION PROBABILITY FOR 1% CORRUPTION OUT OF 300000 BLOCKS

                         Number of samples as percentage of total samples
Detection Probability    Sobol Sequence    Pseudorandom Sequence
0.6                      10%               20%
0.7                      14%               27%
0.8                      16%               30%
0.85                     21%               37%
0.9                      23%               41%
0.95                     26%               49%
0.9999                   30%               54%
Theorem 1: The subtask of each SUBTPA is uniform after applying the TDK if the TDK length is prime or relatively prime to the sequence length.
Proof: Sobol sequences are quasi-random sequences with the uniformity property. If we generate random block numbers with the Sobol random generator for a given length, the result is uniform. Moreover, if we simply partition the sequence into subsequences and distribute them among the SUBTPAs, each subsequence maintains uniformity. When we use a TDK, however, the subtask may or may not be uniform. We observed that when the TDK length is a power of 2, the generated subtask does not maintain the uniformity property. The reason is that the Sobol sequence follows a pattern: if we divide the full range into four regions, any 4 consecutive numbers fall into four different regions, and the same holds for 8, 16, 32, . . . regions. When we place the TDK over the generated sequence, the subtask contains those numbers whose corresponding TDK bit is 1, and the TDK is applied successively to generate the subsequence. Thus, if the TDK length is a power of two, the block numbers chosen at each successive shift of the TDK are very close to each other and form clusters. If the TDK length is prime, the block numbers chosen at each successive shift spread over the segment, which maintains uniformity for each subtask or subsequence. If the TDK length is co-prime to the sequence length, that is, 𝑔𝑐𝑑(𝑇𝐷𝐾𝐿𝑒𝑛𝑔𝑡ℎ, 𝑆𝑒𝑞𝑢𝑒𝑛𝑐𝑒𝐿𝑒𝑛𝑔𝑡ℎ) = 1, then there is no common factor equal to a power of 2, so at each successive shift of the TDK the block numbers spread over the whole sequence and maintain the uniformity property for each subtask. Therefore, the generated subtask must be
uniform if the TDK length is prime or relatively prime to the sequence length.

VI. CONCLUSION
In this paper, we presented efficient distributed verification protocols based on the Sobol random sequence. We have shown that our protocols distribute the task uniformly among the SUBTPAs. Most importantly, because of this uniformity, our protocols can tolerate failures of SUBTPAs and also perform better over unreliable communication links. We mainly focused on uniform task distribution among the SUBTPAs in order to detect erroneous blocks as soon as possible. We used the String Reconciliation Protocol to minimize the communication bandwidth between the Coordinator and the SUBTPAs. In addition, distributing the task reduces the workload on the server side and the chance of network congestion on both the server side and the Coordinator side. Thus, our distributed verification protocols increase the efficiency and robustness of data integrity verification in Cloud Computing.

REFERENCES
[1] Sachin Agarwal, Vikas Chauhan, Ari Trachtenberg, Bandwidth Efficient String Reconciliation Using Puzzles, IEEE Trans. on Parallel and Distributed Systems, Volume 17, Issue 11, pp. 1217-1225, November 2006.
[2] Y. Minsky, A. Trachtenberg, and R. Zippel, Set Reconciliation With Nearly Optimal Communication Complexity, IEEE Trans. on Information Theory, Volume 49, Issue 9, pp. 2213-2218, September 2003.
[3] I. M. Sobol and Y. L. Levitan, A Pseudo-Random Number Generator for Personal Computers, Computers and Mathematics with Applications, Volume 37, pp. 33-40, 1999.
[4] P. Bratley and B. L. Fox, Algorithm 659: Implementing Sobol's Quasirandom Sequence Generator, ACM Trans. on Mathematical Software, Volume 14, Issue 1, pp. 88-100, March 1988.
[5] T. Jaeger and J. Schiffman, Outlook: Cloudy with a chance of Security Challenges and improvements, IEEE Security and Privacy, January/February 2010.
[6] G. Ateniese, R. Burns, R. Curtmola, J. Herring, L. Kissner, Z. Peterson, and D. Song, Remote Data Checking using Provable Data Possession, ACM Trans. on Information and System Security, Volume 14, Issue 1, Article 12, pp. 12.1-12.34, May 2011.
[7] C. Wang, Q. Wang, K. Ren, N. Cao, W. Lou, Towards Secure and Dependable Storage Services in Cloud Computing, IEEE Trans. on Services Computing, 2011.
[8] Q. Wang, C. Wang, K. Ren, W. Lou, and J. Li, Enabling Public Auditability and Data Dynamics for Storage Security in Cloud Computing, IEEE Trans. on Parallel and Distributed Systems, Volume 22, Issue 5, May 2011.
[9] Y. Zhu, H. Hu, G.J. Ahn, and S.S. Yau, Efficient audit service outsourcing for data integrity in clouds, Elsevier, The Journal of Systems and Software, No. of pages 13, 2011.
[10] P. Syam Kumar, R. Subramanian, An Efficient and Secure Protocol for Ensuring Data Storage Security in Cloud Computing, IJCSI, Volume 8, Issue 6, No. 1, pp. 261-274, November 2011.
[11] P. Syam Kumar, R. Subramanian, RSA-based Dynamic Public Audit Service for Integrity Verification of Data Storage in Cloud Computing Using Sobol Sequence, Special Issues Security, Privacy and Trust in IJCC, Volume 1, No. 2/3, pp. 167-200, 2012.
[12] N. Oualha, M. Onen, Y. Roudier, A Security Protocol for Self-Organizing Data Storage, Tech. Rep. EURECOM 2399, Institute Eurecom, France, 2008.
[13] Z. Hao, S. Zhong, and N. Yu, A Privacy-Preserving Remote Data Integrity Checking Protocol with Data Dynamics and Public Verifiability, IEEE Trans. on Knowledge and Data Engg., Volume 23, No. 9, September 2011.
[14] E. Pinheiro, W.D. Weber, and L.A. Barroso, Failure Trends in a Large Disk Population, In Proc. FAST, February 2007.
[15] A. Shamir, How to Share a Secret, Communications of the ACM, Volume 22, Issue 11, November 1979.