A Distributed Randomization Framework for Privacy Preservation in Big Data
Samiksha Shukla*, Dr. G Sadashivappa**
* Department of Computer Science, Christ University, Bangalore, India
[email protected]
** R V College of Engineering, Mysore Road, Bangalore-59, India
[email protected]
Abstract— Privacy preservation is a major challenge for data generated from various sources such as social networking sites, online transactions, and weather forecasting, to name a few. Owing to the socialization of the internet and cloud computing, petabytes of unstructured data with intrinsic value are generated online. The inflow of big data and the need to move this information throughout an organization have become a new target for hackers. This data is subject to privacy laws and should be protected. The proposed protocol is one step toward security in such circumstances, where data comes from multiple participants and all of them are concerned about individual privacy and confidentiality.
Keywords— Anonymization, Confidentiality, Packetization, Privacy, Security, Secure Multi-Party Computation (SMC), Trusted third party (TTP)
978-1-4799-3064-7/14/$31.00 ©2014 IEEE
I.
INTRODUCTION
Today, the growth of information has reached a new velocity. The volume of information is exploding from diverse sources, which calls for a redefinition of security. As noted in [14], big data represents an inviting opportunity for business, and security vendors have identified that big data requires a different approach to security; secure multi-party computation (SMC) can be one solution to the security issues of big data. The report "Future of data security and privacy: controlling big data" [15] identified that security authorities apply most controls at the network level. In that case, an attacker who breaches the network perimeter gains complete, unrestricted access to the big data. It is therefore beneficial to keep these controls closer to the data. Because the participants' first concern is the security of their data, the data packets must be highly secure against attacks. Retailers often analyse consumer data to plan production and maintain their inventory systems. In this scenario, individual privacy and confidentiality are the main concern. The proposed Secure Sum protocol can be applied in such circumstances. SMC is required whenever individuals are interested in a joint computation, for example for financial growth or to identify consumer behavior. Here, individual participants are more concerned about their privacy and confidentiality. The proposed protocol is an initiative to address users' privacy concerns and to maintain confidentiality during computations.
II.
RELATED WORKS
As a strategy for preserving user privacy, a two-party knowledge exchange protocol was proposed by Yao [1], [2]. The work in [3] showed how a polynomial-time algorithm can be used to solve mental game problems when the majority of players are honest. A look-ahead approach for SMC to attain distributed k-anonymity was demonstrated in [4]; it helps the parties decide whether the utility gain from the protocol is satisfactory before starting the protocol. Mauro et al. [5] described how to use SMC for processing biomedical signals; they presented an automatic diagnosis system for privacy preservation in which a remote server classifies biomedical signals provided by the parties without acquiring any information about the signals. Their system demonstrated that, in the semi-honest model, difficult tasks such as ECG classification in the encrypted domain are efficiently attainable. An architecture for secure outsourcing of data and computation that provides integrity and privacy was demonstrated in [6]. Justin and Vitaly [7] considered a setting where two parties, each holding a graph, want to perform a computation over the joint graph while maintaining the privacy of their inputs, except for what can be revealed from the final output. That paper presents different SMC approaches, namely (1) APSD (all pairs shortest distance), (2) SSSD (single source shortest distance), and (3) secure set union, provided the players are "honest but curious" (semi-honest). The work in [8] presents an efficient secure function evaluation (SFE) framework that allows mutually untrusted parties to correctly compute any function on their private inputs without revealing those inputs. It could be beneficial in any client-server environment where privacy is a key feature, such as medical applications, financial analysis, and face recognition. According to [9], genomic data raise security concerns because they can uniquely identify an individual and contain sensitive information about the individual's health and diseases. For a more comprehensive analysis, genomic information must be combined with environmental and clinical data, which are also sensitive. That paper proposes a system for disease risk tests that securely processes genomic, clinical, and environmental data using privacy-preserving integer comparison and homomorphic encryption. A solution to the SMC problem that ensures zero hacking was presented in [10]; the authors consider multiple TTPs, one of which is selected unanimously for the joint computation. An application of SMC in Indian BPO was described in [11]. A new SMC algorithm based on the FKN protocol [13] and using randomization was presented in [12].
III.
PROPOSED ARCHITECTURE AND PROTOCOL
In this paper, randomization and packetization are used together to solve the secure sum problem. The protocol works properly when the majority of participants are "honest but curious". An anonymization layer is used to hide the identity of the parties. Each party chooses different anonymizers to send its randomized data and its random numbers. On receiving them, the trusted third party (TTP) keeps the data in the data pool and the random numbers in the random number pool. Once all data and all random numbers have been received, the TTP validates the data packets and random numbers by comparing the expected number of packets with the number received; if validation succeeds, the TTP proceeds with the computation.
Fig. 1. Architecture of Proposed Protocol
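The data flow described above (packetize, randomize, route through anonymizers, pool at the TTP, validate, compute) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `drss_secure_sum`, the additive rule used to split an input into k packets, the value ranges of the pseudo-random numbers, and the default parameters are all hypothetical, and the anonymizer capacity limit is omitted.

```python
import random


def drss_secure_sum(party_inputs, k=3, num_anonymizers=4, seed=None):
    """Illustrative secure-sum flow: packetize, randomize, anonymize, validate, sum."""
    if k < 1 or num_anonymizers < 2:
        raise ValueError("need k >= 1 and at least two anonymizers")
    rng = random.Random(seed)
    # Each anonymizer holds masked data packets and random-number totals separately.
    anonymizers = [{"data": [], "rand": []} for _ in range(num_anonymizers)]

    for value in party_inputs:
        # Packetization (assumed rule): k integer packets that add up to the input.
        cuts = [rng.randint(-1000, 1000) for _ in range(k - 1)]
        packets = cuts + [value - sum(cuts)]
        # Randomization: add a separate pseudo-random number to every packet.
        masks = [rng.randint(0, 10**6) for _ in range(k)]
        masked = [d + r for d, r in zip(packets, masks)]
        # Reserve one anonymizer for the random-number total so that it never
        # shares an anonymizer with this party's data packets.
        rand_anon = rng.randrange(num_anonymizers)
        data_anons = [a for a in range(num_anonymizers) if a != rand_anon]
        for packet in masked:
            anonymizers[rng.choice(data_anons)]["data"].append(packet)
        anonymizers[rand_anon]["rand"].append(sum(masks))

    # TTP side: collect both pools, then validate the packet counts.
    data_pool = [x for a in anonymizers for x in a["data"]]
    rand_pool = [x for a in anonymizers for x in a["rand"]]
    n = len(party_inputs)
    if len(data_pool) != n * k or len(rand_pool) != n:
        raise RuntimeError("Packet Lost")
    # Removing the pooled random numbers leaves the sum of the original inputs.
    return sum(data_pool) - sum(rand_pool)


print(drss_secure_sum([10, 20, 30], k=3, num_anonymizers=4, seed=1))  # prints 60
```

Because the masks are integers, they cancel exactly at the TTP, so the result equals the plain sum of the inputs whatever seed is used. Reserving one anonymizer per party for the random-number total is one simple way to satisfy the requirement that data packets and random numbers never reach the same anonymizer.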
A. Protocol (proposed)
Input: (D1, D2, …, Dn) are the data of the respective parties.
Step 1: Each party divides its data into 'k' packets, generates 'k' pseudo-random numbers, and forms 'k' pairs (Dji, rji), where j denotes the party whose data is pseudo-randomized and rji is the pseudo-random number added to packet Dji.
Step 2: All the packets (Dji, rji) are sent to anonymizers such that if anonymizer Ai receives Dji, the total of all the random numbers added to that party's packets must be sent to another anonymizer Aj.
Step 3: After receiving all the packets, the anonymizers forward the data packets to the data pool and the pseudo-random numbers to a separate pool.
Step 4: After receiving all the packets, the TTP first verifies whether the number of received packets equals the expected number of packets.
Step 5: If the verification is successful, the TTP proceeds with the secure collaborative computation over the received data packets such that no information is leaked other than what can be inferred from the output.
IV.
ALGORITHM (PROPOSED)
Algorithm: Distributed Randomized Secure Sum (DRSS)
Assumptions:
1. Participating parties are "semi-honest".
2. All parties use the same number of packets.
3. Data and pseudo-random numbers are provided to different anonymizers.
4. The TTP is trusted.
5. Parties provide correct input.
Input: (D1, D2, …, Dn) are the inputs of the respective parties.
Variable list:
n: number of parties.
m: number of anonymizers.
k: number of packets for each party.
rj: accumulated pseudo-random number of party j. // initialized to 0.
SD: data pool.
SR: pseudo-random number pool.
Countdttp: total number of data packets at TTP (initialized to zero).
Countrttp: total number of pseudo-random number packets at TTP (initialized to zero).
Edttp: expected number of data packets at TTP.
Erttp: expected number of pseudo-random number packets at TTP.
Max_Limit_A: maximum limit of an anonymizer. // Max_Limit_A ≥ n*tpk.

Phase 1: (Packetization)
for (j = 1 to n) do
begin
  a) Each party divides data Dj into k packets;
  b) Each party individually generates k pseudo-random numbers;
  for (i = 1 to k) do
  begin
    c) Add each Dji to rji;
    d) Forward (Dji) to randomly selected anonymizer; // each data packet to different anonymizer.
    e) rj = rj + rji;
  end; // end of loop i.
  f) Forward rj to randomly selected anonymizer; // random number and data packets should not be given to the same anonymizer.
end; // end of loop j.

Phase 2: (Anonymization)
for (i = 1 to n) do
begin
  for (j = 1 to k) do
  begin
    a) Randomly select one anonymizer Amd;
    b) if (count(Amd) < Max_Limit_A) then
       begin
         c) send Dij to Amd;
         d) increase count(Amd) by 1;
       end;
    e) else
         f) Repeat step 2(a);
       endif; // end of if statement.
  end; // end of loop j.
  g) Randomly select an anonymizer Amr;
  h) if (count(Amr) < Max_Limit_A) then
     begin
       i) send ri to Amr; // ri is the accumulated pseudo-random number of party i.
       j) increase count(Amr) by 1;
     end;
  k) else
       l) Repeat step 2(g);
     endif; // end of if statement.
end; // end of loop i.

Phase 3: (Receive Data and Pseudo-Random Numbers at TTP)
for (i = 1 to m) do
begin
  while (count(Ami) >= 1) do
  begin
    a) Send data packets to SD;
    b) Countdttp = Countdttp + 1;
    c) Send pseudo-random number packets to SR;
    d) Countrttp = Countrttp + 1;
  end;
end;

Phase 4: (Data Validation and Computation)
a) Edttp = n × k; Erttp = n;
if (Countdttp = Edttp and Countrttp = Erttp) then
begin
  b) Sum = Σ SD − Σ SR; // total of the data pool minus total of the pseudo-random number pool.
  c) broadcast Sum;
end;
else
  d) Return "Packet Lost";
endif; // end of if
V.
PERFORMANCE ANALYSIS
Case 1: In [12] it is shown that when only one random number 'r' is added to the data, there is some probability of privacy loss even if just one of the 'n' parties becomes malicious and shares 'r' with an anonymizer. In the current scenario this privacy loss can be avoided: let 'k' be the number of packets; after adding 'r', the number of packets becomes 'k+1'. One party's data distribution to 'p' malicious anonymizers out of 'm' is represented as
$$\Pr(1, p, m) = \left(\frac{p}{m}\right)^{k+1} \tag{1}$$
If 'p' out of 'm' anonymizers collude, the probability of breaking the protocol is
$$\Pr(p, m) = \frac{1}{n}\left(\frac{p}{m}\right)^{k+1} \tag{2}$$
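As a worked illustration of Case 1, the following Python sketch evaluates the two probabilities, assuming that (1) reads Pr(1, p, m) = (p/m)^(k+1) and that (2) scales it by 1/n. The function names and the values of k and n are hypothetical choices for illustration; only m = 8 matches the number of anonymizers considered in the analysis, and the output is not claimed to reproduce the plotted curve.

```python
from fractions import Fraction


def pr_single_party(p, m, k):
    """Assumed form of (1): all k+1 packets of one party reach the p malicious anonymizers."""
    return Fraction(p, m) ** (k + 1)


def pr_break(p, m, k, n):
    """Assumed form of (2): the single-party probability scaled by 1/n."""
    return Fraction(1, n) * pr_single_party(p, m, k)


# Illustrative parameters: m = 8 anonymizers; k and n are assumptions.
m, k, n = 8, 3, 8
for p in range(1, m + 1):
    print(p, float(pr_break(p, m, k, n)))
```

Under these assumptions the value grows with p and is bounded by 1/n, which is reached only when every anonymizer is malicious (p = m).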
[Figure 2: plot of the probability of breaking the protocol (y-axis, 0 to 0.14) against the number of malicious anonymizers (x-axis, 1 to 8).]
Fig. 2. Probability of Breaking Protocol in Case of Malicious Anonymizers
Fig. 2 shows that even if all 'm' anonymizers collude (where m is the total number of anonymizers), the probability of breaking the protocol is insignificant. (Eight anonymizers are considered in the analysis.) Case 2: When a different random number r1, r2, …, rk is added to each packet, as D11 + r11, D12 + r12, …, D1k + r1k, and 'l' out of 'm' parties collude to obtain another party's data as well as its random numbers, the probability of breaking the protocol is given by (3). This probability is shown in Fig. 3 for different numbers of packets.
$$\Pr(l, m) = \left(\frac{l}{m}\right)^{k+1} \tag{3}$$
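The form of (3) can be checked numerically with a small Monte Carlo sketch in Python. It rests on two assumptions that are not stated in the text: the exponent in (3) is k + 1, as in (1), and each of a party's k + 1 packets is routed to one of the m anonymizers independently and uniformly at random. The function names, trial count, and parameter values are hypothetical.

```python
import random


def pr_formula(l, m, k):
    """Assumed form of (3): (l/m) raised to the power k+1."""
    return (l / m) ** (k + 1)


def pr_simulated(l, m, k, trials=100_000, seed=0):
    """Estimate how often all k+1 packets of one party reach colluding anonymizers."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Anonymizers 0..l-1 are treated as the colluding ones.
        if all(rng.randrange(m) < l for _ in range(k + 1)):
            hits += 1
    return hits / trials


# Compare the closed form with the estimate for 3, 4, and 5 packets.
for k in (3, 4, 5):
    print(k, pr_formula(4, 8, k), pr_simulated(4, 8, k))
```

With these assumptions the estimate tracks the closed form, and both shrink as k grows, which is the trend the packet-count comparison is meant to show. The protocol's rule that a party's random number must go to a different anonymizer than its data packets would break the independence assumed here, so the simulation is only an approximation of the real routing.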
[Figure 3: plot of the probability of breaking the protocol (y-axis, 0 to 0.5) against the number of anonymizers (x-axis, 1 to 8), with one curve each for 3, 4, and 5 packets.]
Fig. 3. Probability of Breaking when different random number is added
Fig. 3 shows that when a different random number is added to each data packet, the probability of breaking the protocol is very low, and that it decreases further as the number of packets increases. Case 3: When 'n-1' out of 'n' parties collude to obtain the private input of the nth party, they cannot, in the proposed framework, break the protocol and obtain that input. To extract the data of the nth party from the computation result, they would need the random numbers of the nth party, which are known only to that party.
VI.
CONCLUSIONS
In this paper, the authors presented a privacy-preserving secure sum technique in which randomization and packetization are used together to achieve the twofold objective of privacy and security. It can be applied to big data environments where petabytes of data come from various sources. The proposed protocol can be extended by adding extra TTPs so that the computational work can be divided among them for faster processing. According to the probabilistic analysis, the proposed protocol performs better in the case of semi-honest adversaries. In future, further efforts can be made to improve the computational complexity.
REFERENCES
[1] A. C. Yao, "Protocols for secure computations," in IEEE Symposium on Foundations of Computer Science (FOCS '82), pp. 160-164, IEEE Computer Society, 1982.
[2] A. C. Yao, "How to generate and exchange secrets," in IEEE Symposium on Foundations of Computer Science (FOCS '86), pp. 162-167, IEEE Computer Society, 1986.
[3] O. Goldreich, S. Micali, and A. Wigderson, "How to play ANY mental game," in ACM Symposium on the Theory of Computing (STOC '87), pp. 218-229, ACM, 1987.
[4] Mehmet Ercan Nergiz et al., "A look-ahead approach to secure multiparty protocols," IEEE Transactions on Knowledge and Data Engineering, vol. 24, no. 7, July 2012.
[5] Mauro Barni et al., "Privacy-preserving ECG classification with branching programs and neural networks," IEEE Transactions on Information Forensics and Security, vol. 6, no. 2, June 2011.
[6] Sonali Lunawat and Abhijit Patankar, "Architecture for secure cloud computing using garbled circuits," International Journal of Engineering Research & Technology (IJERT), ISSN: 2278-0181, vol. 2, no. 6, June 2013.
[7] Justin Brickell and Vitaly Shmatikov, "Privacy-preserving graph algorithms in the semi-honest model," ASIACRYPT 2005, LNCS 3788, pp. 236-252, 2005.
[8] Vladimir Kolesnikov, Ahmad-Reza Sadeghi, and Thomas Schneider, "A systematic approach to practically efficient general two-party secure function evaluation protocols and their modular design," Journal of Computer Security, vol. 21, no. 2, pp. 283-315, 2013.
[9] Erman Ayday et al., "Privacy-preserving computation of disease risk by using genomic, clinical, and environmental data," 2013.
[10] Durgesh Kumar Mishra and Manohar Chandwani, "A zero-hacking protocol for secure multiparty computation using multiple TTP," in TENCON 2008 - 2008 IEEE Region 10 Conference, IEEE, 2008.
[11] D. K. Mishra and M. Chandwani, "Anonymity enabled secure multiparty computation for Indian BPO," in TENCON 2007 - 2007 IEEE Region 10 Conference, pp. 1-4, IEEE, 2007.
[12] Samiksha Shukla and G. Sadashivappa, "An algorithm for SMC with analysis of malicious conduct," International Journal of Advanced Research in Computer Science and Software Engineering, vol. 3, no. 10, pp. 667-673, October 2013.
[13] Uri Feige, Joe Kilian, and Moni Naor, "A minimal model for secure computation (extended abstract)," in ACM Symposium on Theory of Computing (STOC '94), pp. 554-563, New York, NY, USA, ACM, 1994.
[14] Kui Ren, Cong Wang, and Qian Wang, "Security challenges for the public cloud," IEEE Internet Computing, vol. 16, no. 1, pp. 69-73, 2012.
[15] Randal Bryant, Randy H. Katz, and Edward D. Lazowska, "Big-data computing: Creating revolutionary breakthroughs in commerce, science and society," pp. 1-15, 2008.