Secure Multi-party Computation Using Virtual Parties for Computation on Encrypted Data

Rohit Pathak¹, Satyadhar Joshi², and Durgesh Mishra¹
¹ Acropolis Institute of Technology & Research, Indore, M.P., India
² Shri Vaishnav Institute of Technology & Science, Indore, M.P., India
{rohitpathak, satyadhar_joshi}@ieee.org

Abstract. In this paper, we propose a new protocol, the Virtual Party Protocol (VPP), for Secure Multi-Party Computation (SMC). Many computations and surveys involve confidential data from many parties or organizations. Since this data is the property of the organization or party concerned, its preservation and security are of prime importance in such computations. Although the computation requires data from all the parties, none of the parties wants to reveal its data to the others. We propose a new protocol to perform computation on encrypted data. The data is encrypted in a manner that does not affect the result of the computation. The protocol uses modifier tokens, which are distributed among virtual parties and finally used in the computation. The computation function uses the received data and the modifier tokens to compute the correct result from the encrypted data. Thus the correct result can be computed without revealing the data, and the privacy of the parties is maintained. We give a probabilistic security analysis and show how the probability of hacking can be driven to nearly zero with proper configuration.

Keywords: Secure Multi-party Computation (SMC), Information Security, Privacy, Protocol.

1 Introduction

Yao described the millionaires' problem, gave a solution using deterministic computations, and introduced the view of secure computation [1]. The collaborative benchmarking problem has been studied together with a solution in which the private shares are changed in such a manner that their sum remains the same [2]. Atallah et al. provided privacy-preserving solutions to collaborative forecasting and benchmarking that can be used to increase the reliability of local forecasts and data correlations and to evaluate local performance against global trends [3]. Du and Zhan proposed a new paradigm for developing practical solutions to SMC problems, in which an acceptable security model allows partial information disclosure [4]. Null and Wong present a unified approach to multilevel database security based on two ideas: a trusted filter and an inference engine [5]. Du and Atallah study the privacy-preserving cooperative linear system of equations problem and the privacy-preserving cooperative linear least-squares problem [6]. Canetti et al. have shown that adaptively secure multi-party computation can be carried out even in the case where uncorrupted parties may deviate from the protocol by keeping a record of all past configurations [7]. Atallah has given a protocol for sequence comparisons in which neither party reveals anything about its private sequence to the other party [8]. Atallah et al. proposed Secure Supply-Chain Collaboration (SSCC) protocols that enable supply-chain partners to cooperatively achieve desired system-wide goals without revealing the private information of any party, even though the jointly computed decisions require the information of all the parties [9]. Maurer discusses the problem of defining and achieving security in a context where the database is not fully trusted, i.e., where users must be protected against a potentially malicious database [10]. Agrawal and Srikant described how to build a decision-tree classifier from training data in which the values of individual records have been perturbed, together with a reconstruction procedure that accurately estimates the distribution of the original data values [11]. The Anonypro protocol [12] makes the identity of incoming data anonymous, but it assumes that the connection between each party and the anonymizer is secure.

If a calculation involves data from many organizations, the safety of each organization's data is the prime concern. Suppose a statistical calculation is to be performed among several organizations. The calculation includes information about various persons associated with each organization, whether employees working for the organization or its customers, such as the customers of a bank.
In this case, the information of every person must be kept secure in order to preserve the privacy of every individual. We propose a new protocol, the Virtual Party Protocol (VPP), which can be used to ensure the privacy of individuals and to protect the data of the organization as a whole by not revealing the actual data. In this method we create some fake data and some virtual parties. Since the calculation does not depend on the number of parties, we can create any desired number of virtual parties. We then encrypt the data and create corresponding modifier tokens, and the modified data is mixed with the fake data. The modifier tokens are related to the modification made to the data and are used in the final computation to obtain the correct result. The modified data and the modifier tokens are distributed among the virtual parties. These parties send their data to trusted anonymizers, the trusted anonymizers send it to untrusted anonymizers, and the untrusted anonymizers send it to the third party for computation. The third party uses the data and the modifier tokens to compute the result; the modifier tokens correct the result obtained from the encrypted data values. The modifier tokens do not in any way reveal the identity of a party. A modifier is a short collection of information used in the final computation to ensure the correct result. The method of encryption, the modifier tokens, the encrypted data, and the method of computation are all interdependent.

2 Proposed Protocol – VPP (Virtual Party Protocol)

2.1 Informal Description

We have to compute the function f(a1, a2, a3, …, an), where the function depends on the number of data items sent by the organizations. There are n parties P1, P2, P3, …, Pn. Each party Pi has data Xi1, Xi2, Xi3, …, Xim. Each party Pi has trusted anonymizers Ai1, Ai2, Ai3, …, Aix, and there are z untrusted anonymizers A1, A2, A3, …, Az. Each party Pi creates fake, trivial data entries Fi1, Fi2, Fi3, …, Fiq, where q is the total number of fake entries. The number of fake entries q may differ from party to party, but for simplicity of explanation it is kept the same for every party. The fake data is generated in a manner that does not affect the overall result. It is grouped with the original data entries Xi1, Xi2, Xi3, …, Xim, giving a new group of m+q data items Di1, Di2, Di3, …, Di(m+q). Each data item Di1, Di2, Di3, …, Di(m+q) is encrypted to obtain the encrypted data Ei1, Ei2, Ei3, …, Ei(m+q).

Fig. 1. Data flow in VPP with a five-layer structure consisting of, from start to end, the party layer (P1, P2, …, Pn), the virtual party layer (P11, P12, …, P1k, P21, …, P2k, P31, …, P3k), the trusted anonymizer layer (A11, A12, …, A1x, A21, …, A2x, A31, …, A3x), the untrusted anonymizer layer (A1, A2, A3, …, Az), and the computation layer (TTP).

Every party Pi creates k virtual parties Pi1, Pi2, Pi3, …, Pik. The encrypted data Ei1, Ei2, Ei3, …, Ei(m+q) is distributed randomly among these virtual parties.

Modifier tokens Ti1, Ti2, Ti3, …, Tik are generated for every party Pi and distributed randomly among the virtual parties Pi1, Pi2, Pi3, …, Pik so that every virtual party gets one modifier token. The encryption of data and the generation of modifier tokens are explained in Sections 3 and 4. The virtual parties Pi1, Pi2, Pi3, …, Pik then distribute their data and modifier tokens randomly among the trusted anonymizers Ai1, Ai2, Ai3, …, Aix. The trusted anonymizers distribute their data randomly among the untrusted anonymizers A1, A2, A3, …, Az. Anonymizers can take data from multiple parties. The data of the untrusted anonymizers is sent to the third party. The function h() uses the encrypted data and the modifier tokens to compute the correct result; h() varies with the type of computation and depends strongly on f(). The third party computes h(E11, E12, E13, …, E1j, …, Ei1, Ei2, Ei3, …, Eij, T11, T12, T13, …, T1j, …, Ti1, Ti2, Ti3, …, Tij), which is the desired result, identical to the result of f(X11, X12, …, X1m, X21, X22, …, X2m, X31, X32, …, X3m, …, Xn1, Xn2, …, Xnm), and this result is declared publicly. The whole scenario is shown in Fig. 1.

2.2 Formal Description

VPP Algorithm

Identifier list:
Pi – parties, where i ranges from 1 to n
Xij – data of party Pi, where j ranges from 1 to m
Fij – fake data of party Pi, where j ranges from 1 to q
Dij – total data of party Pi, including the fake and the original data, where j ranges from 1 to m+q
Pij – virtual party of party Pi, where j ranges from 1 to k
Eij – encrypted data associated with party Pi, where j ranges from 1 to m+q
Tij – modifier token of virtual party Pij, where j ranges from 1 to k
Aij – trusted anonymizer of party Pi, where j ranges from 1 to x
Ay – untrusted anonymizer, where y ranges from 1 to z
TP – third party

Start VPP
• Create k virtual parties Pij for every party Pi
• Create fake data Fij for every party Pi
• Group the fake data Fij with the original data Xij to get Dij
• Encrypt the data Dij to get Eij
• Create modifier tokens Tij for the virtual parties Pij
• Distribute the encrypted data Eij among the virtual parties Pij
• Send the data and modifier tokens from virtual party Pij to trusted anonymizer Aij
• Send the data and modifier tokens from trusted anonymizer Aij to untrusted anonymizer Ay
• Send the data from untrusted anonymizer Ay to TP
• Calculate the result at TP using the encrypted data and the modifier tokens
• TP announces the result
End of Algorithm
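The steps above can be illustrated with a small simulation of the data flow. This is a minimal sketch under stated assumptions, not the authors' implementation: the computation is taken to be a summation with g as the identity, so the fake entries sum to zero and no modifier correction is needed (every token is 0), and the helper names encrypt_for_sum, add_fake_data, scatter and run_vpp are hypothetical. Each layer forwards every message to a randomly chosen member of the next layer, so the third party receives (values, token) pairs that carry no party identifier.

```python
import random

# Hypothetical helpers; the paper does not prescribe an API.


def add_fake_data(data, q, rng):
    """Group X_i1..X_im with q fake entries whose sum is zero (q >= 1)."""
    fake = [rng.randint(-1000, 1000) for _ in range(q - 1)]
    return data + fake + [-sum(fake)]


def encrypt_for_sum(data, rng):
    """Summation encryption (g assumed to be the identity): random values for
    every entry but the first, and the first entry solved so the sum is kept."""
    rest = [rng.randint(-1000, 1000) for _ in range(len(data) - 1)]
    return [sum(data) - sum(rest)] + rest


def scatter(items, buckets, rng):
    """Randomly distribute items among `buckets` lists (some may stay empty)."""
    out = [[] for _ in range(buckets)]
    for item in items:
        out[rng.randrange(buckets)].append(item)
    return out


def run_vpp(encrypted_by_party, tokens_by_party, x, z, h, rng):
    """Route each party's encrypted entries and tokens through the VPP layers.

    encrypted_by_party[i] -- E_i1 .. E_i(m+q) of party P_i
    tokens_by_party[i]    -- T_i1 .. T_ik of party P_i (its length gives k)
    x -- trusted anonymizers per party, z -- untrusted anonymizers
    h -- function computed by the third party on (values, tokens)
    """
    untrusted = [[] for _ in range(z)]
    for encrypted, tokens in zip(encrypted_by_party, tokens_by_party):
        k = len(tokens)
        shares = scatter(encrypted, k, rng)       # E_ij spread over P_i1..P_ik
        shuffled = rng.sample(tokens, k)          # one token per virtual party
        virtual = list(zip(shares, shuffled))     # messages carry no party id
        for trusted in scatter(virtual, x, rng):  # virtual -> trusted A_i1..A_ix
            for message in trusted:               # trusted -> untrusted A_1..A_z
                untrusted[rng.randrange(z)].append(message)
    received = [m for bucket in untrusted for m in bucket]
    rng.shuffle(received)                         # TP sees no ordering by party
    values = [v for share, _ in received for v in share]
    token_list = [t for _, t in received]
    return h(values, token_list)


rng = random.Random(42)
data = [[12, 7, 30], [5, 5], [100, -3, 8, 1]]     # hypothetical X_ij per party
encrypted = [encrypt_for_sum(add_fake_data(d, 3, rng), rng) for d in data]
tokens = [[0] * 4 for _ in data]                  # summation needs no correction
total = run_vpp(encrypted, tokens, x=2, z=3,
                h=lambda values, toks: sum(values), rng=rng)
print(total)  # 165
```

Because the fake entries sum to zero and the encryption preserves each party's sum, the third party obtains 49 + 10 + 106 = 165 regardless of how the anonymizers shuffle the messages.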

3 Encryption

Suppose each party sends multiple data items, so that party Pi has data Xi1, Xi2, Xi3, …, Xim, where m is the total number of data items. The number of data items m may differ from party to party, but for simplicity of explanation it is kept the same for every party. Suppose we have to perform one of the following calculations.

Summation:

$$f(X_{11}, \dots, X_{1m}, X_{21}, \dots, X_{2m}, \dots, X_{n1}, \dots, X_{nm}) = \sum_{i=1}^{n} \sum_{j=1}^{m} g(X_{ij})$$

For a single party, f(Xi1, Xi2, …, Xim) = g(Xi1) + g(Xi2) + … + g(Xim), so we can create fake data Fi1, Fi2, …, Fiq such that

$$f(F_{i1}, F_{i2}, \dots, F_{iq}) = g(F_{i1}) + g(F_{i2}) + \dots + g(F_{iq}) = 0$$

Multiplication:

$$f(X_{11}, \dots, X_{1m}, X_{21}, \dots, X_{2m}, \dots, X_{n1}, \dots, X_{nm}) = \prod_{i=1}^{n} \prod_{j=1}^{m} g(X_{ij})$$

For a single party, f(Xi1, Xi2, …, Xim) = g(Xi1) × g(Xi2) × … × g(Xim), so we can create fake data Fi1, Fi2, …, Fiq such that

$$f(F_{i1}, F_{i2}, \dots, F_{iq}) = g(F_{i1}) \times g(F_{i2}) \times \dots \times g(F_{iq}) = 1$$

One way to find such fake data is to take random values for all entries but one and then solve for the remaining one. Taking Fi2, Fi3, Fi4, …, Fiq as random values (in the multiplicative case, values with g(Fij) ≠ 0), Fi1 is found such that

Summation: g(Fi1) = 0 − (g(Fi2) + g(Fi3) + g(Fi4) + … + g(Fiq))
Multiplication: g(Fi1) = 1 / (g(Fi2) × g(Fi3) × g(Fi4) × … × g(Fiq))

This fake data is then grouped with the original data, so that the total number of data entries becomes m+q:

{Di1, Di2, Di3, …, Di(m+q)} = {Xi1, Xi2, Xi3, …, Xim} ∪ {Fi1, Fi2, Fi3, …, Fiq}

Party Pi now holds data Di1, Di2, Di3, …, Di(m+q), consisting of both the fake and the original data. This data must be encrypted in a manner that does not affect the overall result, i.e., we have to find encrypted data Ei1, Ei2, Ei3, …, Ei(m+q) such that:

$$f(E_{i1}, E_{i2}, \dots, E_{i(m+q)}) = f(D_{i1}, D_{i2}, \dots, D_{i(m+q)})$$

One way to do this is to take random values for Ei2, Ei3, Ei4, …, Ei(m+q) and find the Ei1 that satisfies the above equation, i.e., find Ei1 such that

$$f(E_{i1}, r_{i2}, r_{i3}, \dots, r_{i(m+q)}) = f(D_{i1}, D_{i2}, \dots, D_{i(m+q)})$$

where ri2, ri3, ri4, …, ri(m+q) are randomly generated values that are assigned directly to Ei2, Ei3, Ei4, …, Ei(m+q). The method of encryption depends heavily on the type of computation and may vary from one type of computation to another.
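The "random values for all but one entry, then solve for the first" construction used for both the fake data and the encryption can be sketched as follows. This is a minimal sketch that assumes g is the identity, so f is a plain sum or a plain product; the function names are hypothetical, and exact fractions are used only so that the invariants can be checked with equality.

```python
import random
from fractions import Fraction
from math import prod

# Sketch only: assumes g is the identity, so f is a plain sum or a plain product.
# Fraction keeps the arithmetic exact, so the invariants hold with ==.


def combine(values, op):
    """f() for the two cases of Section 3: op is "sum" or "prod"."""
    if op == "sum":
        return sum(values, Fraction(0))
    return prod(values, start=Fraction(1))


def solve_first(target, rest, op):
    """Return the first entry v such that combine([v] + rest, op) == target."""
    if op == "sum":
        return target - combine(rest, op)
    return target / combine(rest, op)  # rest holds nonzero values only


def random_values(count, rng):
    """Nonzero random values, so the multiplicative case never divides by zero."""
    return [Fraction(rng.randint(1, 1000)) for _ in range(count)]


def make_fake_data(q, op, rng):
    """F_i1..F_iq with sum 0 ("sum") or product 1 ("prod"); requires q >= 1."""
    rest = random_values(q - 1, rng)
    neutral = Fraction(0) if op == "sum" else Fraction(1)
    return [solve_first(neutral, rest, op)] + rest


def encrypt(data, op, rng):
    """E_i1..E_i(m+q): random r_i2..r_i(m+q), then E_i1 solved so f is unchanged."""
    rest = random_values(len(data) - 1, rng)
    return [solve_first(combine(data, op), rest, op)] + rest


rng = random.Random(7)
x = [Fraction(v) for v in (12, 7, 30)]             # hypothetical X_i1..X_im
for op in ("sum", "prod"):
    d = x + make_fake_data(4, op, rng)             # D_i: X_i grouped with F_i, m + q = 7
    e = encrypt(d, op, rng)
    print(op, combine(e, op) == combine(x, op))    # True for both operations
```

The same solve-for-one-entry helper serves both steps: fake data targets the neutral element (0 or 1), while encryption targets the value of f on the grouped data.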

4 Modifier Tokens

Certain kinds of computations are affected by encryption and by the increase in the number of data items. For these computations, the computation method must be modified to process encrypted data and to use some additional information that compensates for the increase in the number of data items. This additional information is sent in the form of modifier tokens, which contain information or fixes that ensure correct computation on the encrypted data. The method of generating modifier tokens may vary for different types of computation.

For average: suppose we have to take a number from each of n parties, calculate the average of all the values, and announce the average publicly. P1, P2, P3, …, Pn are the parties, with data X1, X2, X3, …, Xn, and Pi1, Pi2, Pi3, …, Pik are the virtual parties created by party Pi. The number of virtual parties k may differ from party to party, but for simplicity of explanation it is kept the same for every party. For the average we have

$$f(X_1, X_2, X_3, \dots, X_n) = \frac{1}{n} \sum_{i=1}^{n} X_i$$

where n is the total number of parties. If we create fake parties, the value of n changes and hence the result may change; to obtain the correct result we must modify the average computation according to the modifier tokens and the encryption method. The modified method of average computation is

$$h(E_{11}, \dots, E_{1k}, \dots, E_{n1}, \dots, E_{nk}, T, T_{11}, \dots, T_{1k}, \dots, T_{n1}, \dots, T_{nk}) = \frac{\sum_{i=1}^{n} (E_{i1} + E_{i2} + \dots + E_{ik})}{T - \sum_{i=1}^{n} (T_{i1} + T_{i2} + \dots + T_{ik})}$$

where Eij is the encrypted data of virtual party Pij, Tij is the modifier token of virtual party Pij, and T is the total number of data entries in the computation. The data of the virtual parties of Pi is Xi1, Xi2, Xi3, …, Xik, where k is the total number of virtual parties.
To keep the average the same, this data has to satisfy

$$X_{i1} + X_{i2} + X_{i3} + \dots + X_{ik} = X_i$$

Choosing the data of the virtual parties subject to this equation, we can encrypt the data while keeping the sum of the data values equal to the original sum. The modifier tokens of the virtual parties are Ti1, Ti2, Ti3, …, Tik, where k is the total number of virtual parties, and they must satisfy

$$T_{i1} + T_{i2} + T_{i3} + \dots + T_{ik} = k - 1$$

If each party creates k virtual parties, the total number of parties is T = k × n. Since each party's tokens sum to k − 1,

$$\sum_{i=1}^{n} (T_{i1} + T_{i2} + \dots + T_{ik}) = n(k - 1)$$

and hence

$$T - \sum_{i=1}^{n} (T_{i1} + T_{i2} + \dots + T_{ik}) = kn - n(k - 1) = n$$

Therefore h() divides the original sum by n and yields the correct average.
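The token arithmetic above can be checked with a short sketch. It assumes one value per party (as in the average example), splits each Xi into k random integers that sum to Xi, and draws k random integer tokens that sum to k − 1; the random-integer splitting and the function names are illustrative choices, not prescribed by the protocol.

```python
import random
from fractions import Fraction

# Sketch of the Section 4 average with modifier tokens. Function names are
# hypothetical; random integer splits are one possible choice, not the paper's.


def split_with_sum(total, k, rng):
    """k random integers whose sum is exactly `total` (last one solved for)."""
    parts = [rng.randint(-100, 100) for _ in range(k - 1)]
    return parts + [total - sum(parts)]


def virtual_party_data(x_i, k, rng):
    """Values for P_i1..P_ik summing to X_i, and tokens T_i1..T_ik summing to k - 1."""
    return split_with_sum(x_i, k, rng), split_with_sum(k - 1, k, rng)


def h_average(values, tokens, total_entries):
    """Modified average: sum of E_ij divided by (T - sum of T_ij)."""
    return Fraction(sum(values), total_entries - sum(tokens))


rng = random.Random(3)
salaries = [52000, 61000, 47000, 70000]   # hypothetical X_1..X_n, one per party
k = 5
values, tokens = [], []
for x_i in salaries:
    e, t = virtual_party_data(x_i, k, rng)
    values += e
    tokens += t
T = k * len(salaries)                     # T = k * n entries reach the third party
print(h_average(values, tokens, T))       # 57500
```

The denominator T − ΣTij is 20 − 4 × 4 = 4 = n, so the third party divides the preserved total 230000 by the true number of parties, even though it only sees k × n anonymous entries.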

5 Security Analysis

If the TTP is malicious, it can reveal the identity of the source of the data. A set of anonymizers in the anonymizer layer makes the source of the data anonymous and preserves the privacy of individuals. The more anonymizers there are in the anonymizer layer, the lower the possibility of hacking the privacy of the data. Each virtual party reaches the TTP on its own, as an individual party, so the TTP does not know which actual party created a given virtual party. The probability of hacking the data of virtual party Pir is

$$P(VP_{ir}) = \frac{1}{\sum_{i=1}^{n} k_i} \qquad (1)$$

When party Pi has ki virtual parties, the probability of hacking the data of any virtual party of party Pr is

$$P(VP_r) = \frac{k_r}{\sum_{i=1}^{n} k_i} \qquad (2)$$

Even if the data of a virtual party is hacked, security is not breached, because this data is encrypted. The probability of hacking the data of any party r is calculated as

$$P(P_r) = \frac{k_r}{\sum_{i=1}^{n} k_i} \times \frac{k_r - 1}{\sum_{i=1}^{n} k_i - 1} \times \cdots \times \frac{1}{\sum_{i=1}^{n} k_i - k_r + 1} \qquad (3)$$

Fig. 2. Graph of the number of virtual parties (x axis) versus the probability of hacking (y axis).

Fig. 3. Graph of the number of parties (x axis) versus the probability of hacking (y axis).
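Eq. (3) can be evaluated exactly for the settings of Fig. 2 (n = 4, varying k) and Fig. 3 (k = 8, varying n). The sketch below assumes one reading of Eq. (3): an attacker who hacks k_r virtual parties chosen uniformly at random without repetition exposes party r only if all of them belong to it. Under that reading the product simplifies to 1 / C(K, k_r) with K = Σ k_i, which the code checks. The function name p_hack_party is hypothetical.

```python
from fractions import Fraction
from math import comb

# Assumed reading of Eq. (3): k_r virtual parties are hacked uniformly at random
# without repetition, and party r is exposed only if all of them belong to it.


def p_hack_party(ks, r):
    """Eq. (3): k_r/K * (k_r-1)/(K-1) * ... * 1/(K-k_r+1), with K = sum(ks)."""
    K, k_r = sum(ks), ks[r]
    p = Fraction(1)
    for j in range(k_r):
        p *= Fraction(k_r - j, K - j)
    return p


# The product simplifies to 1 / C(K, k_r).
ks = [3, 3, 3, 3]                     # n = 4 parties, k = 3 virtual parties each
assert p_hack_party(ks, 0) == Fraction(1, comb(sum(ks), ks[0]))
print(p_hack_party(ks, 0))            # 1/220

for k in range(1, 6):                 # the n = 4 setting of Fig. 2
    print(k, float(p_hack_party([k] * 4, 0)))
for n in range(2, 6):                 # the k = 8 setting of Fig. 3
    print(n, float(p_hack_party([8] * n, 0)))
```

For n = 2 and k = 8 this gives 1/12870, about 7.8 × 10⁻⁵, and the value shrinks further as n grows.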

The graph of the probability of hacking P(Pr) against the number of virtual parties k for n = 4 is shown in Fig. 2; it shows that the probability of hacking is nearly zero when there are three or more virtual parties. The graph of the probability of hacking against the number of parties for k = 8 is shown in Fig. 3: with eight virtual parties, the probability of hacking is of the order of 10⁻⁵, i.e., nearly zero. Suppose that the number of virtual parties is ka; then

$$P(P_a) = \frac{k_a}{\sum_{i=1}^{n} k_i} \times \frac{k_a - 1}{\sum_{i=1}^{n} k_i - 1} \times \cdots \times \frac{1}{\sum_{i=1}^{n} k_i - k_a + 1} \qquad (4)$$

For kb virtual parties we have

$$P(P_b) = \frac{k_b}{\sum_{i=1}^{n} k_i} \times \frac{k_b - 1}{\sum_{i=1}^{n} k_i - 1} \times \cdots \times \frac{1}{\sum_{i=1}^{n} k_i - k_b + 1} \qquad (5)$$

If ka > kb, then P(Pa) < P(Pb) by Eq. (4) and Eq. (5), the numbers of virtual parties of the other parties being unchanged. Thus, as the number of virtual parties increases, the probability of hacking the data decreases, as quantified in the following two special cases. Special Case 1 – When the number of virtual parties of party a is increased from ka to ka + 1, the effect on the probability of hacking is evaluated as

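$$P(P_a) = \frac{k_a}{\sum_{i=1}^{n} k_i} \times \frac{k_a - 1}{\sum_{i=1}^{n} k_i - 1} \times \cdots \times \frac{1}{\sum_{i=1}^{n} k_i - k_a + 1} \qquad (6)$$

$$P(P_{a+1}) = \frac{k_a + 1}{\sum_{i=1}^{n} k_i + 1} \times \frac{k_a}{\sum_{i=1}^{n} k_i} \times \cdots \times \frac{1}{\sum_{i=1}^{n} k_i - k_a + 1} \qquad (7)$$

where P(Pa+1) denotes the probability of hacking party a once it has ka + 1 virtual parties, and Σ ki is the total number of virtual parties before the increase.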

From Eq. (6) and Eq. (7), the ratio is

$$\frac{P(P_{a+1})}{P(P_a)} = \frac{k_a + 1}{\sum_{i=1}^{n} k_i + 1} \qquad (8)$$

Increasing the number of virtual parties by one therefore gives only a linear increase in the security of the data and no significant change in the security ratio. Special Case 2 – When the number of virtual parties is increased from ka to kb, where kb > ka, the security ratio is evaluated as

$$\frac{P(P_b)}{P(P_a)} = \frac{k_a + 1}{\sum_{i=1}^{n} k_i + 1} \times \frac{k_a + 2}{\sum_{i=1}^{n} k_i + 2} \times \cdots \times \frac{k_b}{\sum_{i=1}^{n} k_i + (k_b - k_a)} \qquad (9)$$

where Σ ki is the total number of virtual parties before party a's count is increased. The change in probability is thus a product of one such factor per added virtual party, so if the number of virtual parties is increased in multiples, there is a significant change in the security ratio. We should therefore increase the number of virtual parties in multiples to increase security. Even if the data of all virtual parties of a particular party is hacked, security is not breached: the data is encrypted, can only be used for the computation, and the exact values cannot be obtained from it.
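Eqs. (8) and (9) can be checked numerically. This sketch uses the same assumed reading of Eq. (3) as a closed form, P(Pa) = 1 / C(K, ka) with K the total number of virtual parties, and compares the observed ratios with the right-hand sides of Eq. (8) and Eq. (9). The names p_hack and ratio_eq9 and the chosen party sizes are hypothetical.

```python
from fractions import Fraction
from math import comb

# Numerical check of Eqs. (8) and (9), assuming Eq. (3) equals 1 / C(K, k_a)
# where K is the total number of virtual parties.


def p_hack(k_a, others):
    """Probability of hacking party a when it has k_a virtual parties and the
    other parties have the virtual-party counts listed in `others`."""
    return Fraction(1, comb(k_a + sum(others), k_a))


def ratio_eq9(k_a, k_b, K):
    """Right-hand side of Eq. (9): product over j = 1..k_b-k_a of (k_a+j)/(K+j),
    where K is the total number of virtual parties before party a grows."""
    r = Fraction(1)
    for j in range(1, k_b - k_a + 1):
        r *= Fraction(k_a + j, K + j)
    return r


others = [4, 4, 4]                    # three other parties, 4 virtual parties each
k_a, K = 4, 4 + sum(others)           # K = 16
print(p_hack(k_a + 1, others) / p_hack(k_a, others))   # Eq. (8): 5/17
print(p_hack(2 * k_a, others) / p_hack(k_a, others))   # doubling k_a: 14/969
print(ratio_eq9(k_a, 2 * k_a, K))                      # Eq. (9): 14/969
```

In this configuration one extra virtual party multiplies the probability by 5/17 (about 0.29), whereas doubling ka multiplies it by 14/969 (about 0.014), which illustrates the recommendation to increase the number of virtual parties in multiples.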

Conclusion

In this paper we have proposed an SMC protocol named the Virtual Party Protocol (VPP). We have shown that fake data can be created and distributed among generated virtual parties, and that this data can be sent along with modifier tokens to carry out computations on encrypted data using a modified computation method. Anonymizers are used to hide the identity of the parties. An example of computing the average salary across several organizations without revealing the actual salary data has been presented. Encryption methods have been built for certain common functions, and the process of generating modifier tokens for a collective method has been shown. SMC protocols are used for many large surveys and large-scale statistical calculations; with VPP, most statistical calculations and other computations can be performed without revealing the data to the other parties or even to the third party. A probabilistic security analysis was given, and it was shown how the probability of hacking can be driven to nearly zero with proper configuration.

References

1. Andrew C. Yao: Protocols for secure computations. Proc. of the 23rd Annual Symposium on Foundations of Computer Science, 160–164
2. Mikhail Atallah, Marina Bykova, Jiangtao Li, Keith Frikken, Mercan Topkara: Private collaborative forecasting and benchmarking. Proc. of the 2004 ACM Workshop on Privacy in the Electronic Society (2004)
3. Mikhail Atallah, Marina Bykova, Jiangtao Li, Keith Frikken, Mercan Topkara: Private collaborative forecasting and benchmarking. Proc. of the 2004 ACM Workshop on Privacy in the Electronic Society (2004) 103–114
4. Wenliang Du, Zhijun Zhan: A practical approach to solve secure multi-party computation problems. Proc. of the New Security Paradigms Workshop (2002)
5. Linda M. Null, Johnny Wong: A unified approach for multilevel database security based on inference engines. Transaction of ACM, New York, NY, USA, Vol. 21, Issue 1 (Feb 1989)
6. Wenliang Du, Mikhail J. Atallah: Privacy-preserving cooperative scientific computations. Proc. of the 14th IEEE Computer Security Foundations Workshop (Jun 11–13, 2001) 273–282
7. Ran Canetti, Uri Feige, Oded Goldreich, Moni Naor: Adaptively secure multi-party computation. Proc. of the 28th Annual ACM Symposium on Theory of Computing
8. Mikhail J. Atallah: Secure and private sequence comparisons. Proc. of the 2003 ACM Workshop on Privacy in the Electronic Society (2003)
9. M.J. Atallah, H.G. Elmongui, V. Deshpande, L.B. Schwarz: Secure supply-chain protocols. Proc. of the IEEE International Conference on E-Commerce (2003)
10. Ueli Maurer: The role of cryptography in database security. Proc. of the 2004 ACM SIGMOD International Conference on Management of Data (2004)
11. Rakesh Agrawal, Ramakrishnan Srikant: Privacy-preserving data mining. Proc. of the ACM SIGMOD Conference on Management of Data (2000)
12. D.K. Mishra, M. Chandwani: Anonymity enabled secure multi-party computation for Indian BPO. Proc. of IEEE TENCON 2007: International Conference on Intelligent Information Communication Technologies for Better Human Life, Taipei, Taiwan (29 Oct.–2 Nov. 2007) 52–56
