May 8, 2013 - Without Sharing Key with Query Users. Youwen ... scheme for k-NN query on encrypted cloud data in which th
Secure k-NN Computation on Encrypted Cloud Data Without Sharing Key with Query Users Youwen Zhu
Institute of Mathematics for Industry, Kyushu University, Fuokuoka, 819-0395, Japan
[email protected]
Rui Xu
School of Graduate, Kyushu University, Fuokuoka, 819-0395, Japan
[email protected]
ABSTRACT
[email protected]
economies of scale. However, it also arouses the security and privacy concerns, since the direct control will be transferred to cloud service provider while data owner outsources his dataset to a remote cloud server. Thus, data owner has to encrypt the sensitive information of his outsourced data, such as income level, health records, personal photos [1] before the dataset is uploaded to the cloud server such that his privacy is not breached. On the other hand, data owner may plan to make use of the strong computation ability of cloud service provider to process or query the database stored in the cloud server to obtain beneficial knowledge. However, the goal of most existing traditional encryption schemes is to protect the hidden plaintext, and their ciphertext cannot be as smoothly processed/analyzed as the plain dataset. Therefore, a big number of efficient secure schemes have been proposed [2, 3, 4, 5, 6, 7, 8] to support the execution of application on encrypted data in the new promising cloud computing paradigm. As the fundamental data mining query operation, the goal of knearest neighbors (k-NN) computation [9, 10] is to search k nearest points of a given query point according to some distance metric measures, such as Minkowski distance, Euclidean distance, and edit distance. To securely process k-NN query on outsourced encrypted data, Wong et al [2] proposed an Asymmetric Scalar-productPreserving Encryption (ASPE) scheme. ASPE can achieve better security through using an invertible matrix, instead of orthogonal matrix in [11, 12], to transform the database point. Heretofore, Wong’s ASPE scheme has been used as a black-box in many problems [13, 14, 15]. However, in ASPE, data owner will share the key for encryption and decryption with all query users, thus ASPE only supports fully trusted query users to conduct the k-NN query on encrypted cloud data. It will bring about serval problems. Firstly, since each query user has the access to the total key, the adversary can obtain the key if he successfully decomposes one query user, i.e, the wide distribution will seriously increase the risk of key leakage. Secondly, in numerous concrete situations, data owner only has scant trust on each query user. For instance, some online social networking service site can take advantages of cloud service, and outsource the database of its members to cloud server for reducing the expense and enjoying other benefits of cloud. Simultaneously, its members may want to search the kNN members with some similar social proximity. If using ASPE scheme, the members will encrypt the attributes with the same key as the one that the site encrypts the outsoured database. However, it is not realistic, since the site cannot share the key with the members otherwise its business confidentiality and members’ privacy may be violated. Thirdly, data owner cannot control the queries of the users who has received the key, and it is of much difficulty for data owner to re-
In cloud computing, secure analysis on outsourced encrypted data is a significant topic. As a frequently used query for online applications, secure k-nearest neighbors (k-NN) computation on encrypted cloud data has received much attention, and several solutions for it have been put forward. However, most existing schemes assume the query users are fully trusted and all query users share the total key which is used to encrypt and decrypt data owner’s outsourced data. It is constitutionally not feasible in lots of real-world applications. In this paper, we propose a novel secure and efficient scheme for k-NN query on encrypted cloud data in which the key of data owner to encrypt and decrypt ousourced data will not be completely disclosed to any query user. Therefore, our scheme can efficiently support the secure k-NN query on encrypted cloud data even when query users are not trustworthy enough.
Categories and Subject Descriptors H.2.7 [Database Administration]: security, integrity, and protection
General Terms Algorithm, Security
Keywords cloud computing, privacy, k-nearest neighbors
1.
Tsuyoshi Takagi
Institute of Mathematics for Industry, Kyushu University, Fuokuoka, 819-0395, Japan
INTRODUCTION
Recently, cloud computing is becoming a more and more prevalent computing paradigm. At the same time, much attention has been paid to consider the special security and privacy problems in cloud computing. On the one hand, as cloud computing can provide convenient pay-as-you-go storage space, huge data are being increasingly centralized into the cloud to enjoy the coherence and
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. CloudComputing’13, May 8, 2013, Hangzhou, China. Copyright 2013 ACM 978-1-4503-2067-2/13/05 ...$15.00.
55
voke the key of a query user. At last but not the least, a malicious cloud server may employ some bot-net user to steal the key in this system. Generally speaking, the existing query schemes which allow query users to have the access to the key of data owner are still far from being feasible in most real-world situations. In this paper, considering a part of the above problems, we study the secure kNN query computation on encrypted cloud data in the application situation that (1) data owner encrypts his private database and outsources the encrypted dataset to cloud server for enjoying the storage and computation ability of the cloud; (2) cloud server is honest-but-curious, that is, he will exactly follow the demand of data owner and query user to perform the computation but he also tries to infer as much private original value of encrypted outsourced dataset as possible by analyzing the information from data owner and query user; (3) each query user would like to find out the kNN of her private query point without disclosing any privacy to data owner and cloud server, and before being submitted to cloud server for kNN computation, her query point will be encrypted by using the key associated with the one of data owner such that cloud server can smoothly compute the kNN while learning nothing about the original query point, but data owner cannot reveal his key to query user because query user is not trustworthy enough and she may reveal her knowledge about the key to the adversary (the cloud server in this work) or it will seriously violate the business secret and privacy. For the problems, a simple solution is that data owner and query user collaboratively perform a secure two-party computation protocol [16, 17], which takes the key of data owner and query user’s private point as input, such that query user receives the encrypted query points and data owner learns nothing. However, most of secure computation protocols are based on garbled-circuit protocol [17, 18, 19], secret sharing [20, 21] and homomorphic encryption system [22] which are highly inefficient and thus not feasible for practical use. We consider the aforementioned problems and propose a new secure kNN query scheme. Similar to the existing schemes [2, 11, 12], the novel scheme is also a linear transformation and it can efficiently and effectively meet the requirements of our above application. A significant advantage of our scheme is that data owner will only release partial information of his key to query users, instead of the complete disclosure in existing schemes [2, 11, 12]. The rest of the paper is organized as follows. Section 2 discusses the related work. In Section 3, we introduce the system model, design goals, some notations and the framework of our solution. In Section 4, we propose the new secure kNN query scheme on encrypted cloud data, following by its analysis. At last, Section 5 concludes the paper.
2.
[2] extends the scheme, by random asymmetric splitting, to resist a stronger attacker that also learns a set of original tuples and their corresponding encrypted items. As the better security, ASPE has been used as the basic building of many secure query solutions [13, 14, 15]. However, nearly all of them directly apply ASPE as a substep and still require that data owner share the full key with each query user. In this paper, we consider the security risk of releasing the total key to query users, and propose an improved scheme where query users can efficiently conduct the k-NN computation on cloud server but not obtain the full key of data owner.
Figure 1: Architecture of k-Nearest Neighbor Queries on Encrypted Cloud Data
3. PROBLEM STATEMENT AND FRAMEWORK In this section, we introduce our system model, design goals and the framework of our new solution.
3.1 System Model This paper focuses on the security and privacy problems of cloud data storage and query service, which involves three types of entities: a data owner, a cloud server and some query users. They are shown in Figure 1. The cloud server owns huge storage and computing resources. The data owner possesses a large private database D consisting of m d-dimensional points: p1 , p2 , · · · , pm ∈ Rd . In practice, we have d p0h (q 0 )T iff. D(pi , q) > D(ph , q), i (1) p0i (q 0 )T = p0h (q 0 )T iff. D(pi , q) = D(ph , q), 0 0 T 0 0 T pi (q ) < ph (q ) iff. D(pi , q) < D(ph , q).
p0i (q 0 )T −p0h (q 0 )T = αβ
d d ³X ´ X (p2ij −2pij qj )− (p2hj −2phj qj ) j=1
Here, iff. denotes “if and only if".
= αβ
j=1
d d ³X ´ X (pij − qj )2 − (phj − qj )2 j=1
P ROOF. In QueryEncryption stage, for the j-th element of q 0 , there is qj0 = Xj +
j=1
β
Correctness Analysis
d X
d d X X (rj −2αpij )qj +β sj tj +rd+1 β + j=1
• Outsourcingk-NNComputation(D0 , q 0 , k) 7→ {Iq }: Query user uploads q 0 to cloud server, then cloud server computes the index set of k-NN in D0 of the encrypted query point q 0 according to the distance p0i (q 0 )T , and sends the index set Iq of the k-NN in D0 to the corresponding query user.
4.2
βq (Mj,2l − Mj,2l−1 )(ql + 2tl )
l=1
j=1
³ ´ = αβ D(pi , q) − D(ph , q) .
Yjl (ql + 2tl )
As α and β are positive, then, the equation (1) holds.
l=1
58
Scheme
ASPE in [2]
Our scheme
Scheme ASPE in [2] Our scheme
Table 1: Computation Complexity Stage data owner query user Key Generation O(d2 ) Tuple Encryption O(md2 ) Query Encryption O(d2 ) kNN Computation Total O(md2 ) O(d2 ) Key Generation O(d2 ) Tuple Encryption O(md2 ) Query Encryption O(d2 ) O(d2 ) kNN Computation Total O(md2 ) O(d2 )
Tuple Encryption m(d + 1)b0 m(2d + 2)b0
Table 2: Communication Overheads Query Encryption Protocol kNN Computation (d2 + 2d + 1)b0 (d + 1 + k)b0 (2d2 + 5d + 2)b0 (2d + 2 + k)b0
P ROOF. According to Theorem 1, the cloud server can exactly compare the distance between q and different tuples in D through using the value p0i (q 0 )T . Consequently, cloud server can correctly compute k-NN of q in our scheme which completes the proof of Theorem 2.
Cost Analysis and Comparison
Here, we evaluate our scheme’s computation complexity and communication overheads, and compare it with Wong’s scheme ASPE in [2]. The computation complexity of each stage of ASPE [2] and our scheme is illustrated as Table 1. The only difference between them is that during query encryption, data owner in ASPE does nothing, but our scheme introduces new computing load to the data owner. The reason is that to achieve better security, the data owner in our scheme will not directly reveal the key to query user, and he also can effectively manage each query, thus, data owner will burden extra O(d2 ) computational task. Generally speaking, d is not big in real-world applications, therefore, the extra computation load of data owner is still feasible. If each dimension or the index is b0 bit, then the communication cost of ASPE [2] and our scheme is as Table 2 where the communication overheads of Tuple Encryption is brought about by uploading the encrypted database to cloud server. The comparison shows that communication overheads of our scheme is about as twice as that of ASPE [2], since we extend the d-dimensional original tuples into (2d + 2)-dimensional encrypted ones for high security.
4.4
Total O(d2 ) O(md2 ) O(d2 ) O(md) O(md2 ) O(d2 ) O(md) O(d2 ) O(md) O(md2 )
Total (md + m + d2 + 3d + k + 2)b0 (2md + 2m + 2d2 + 7d + k + 4)b0
βM q¨T where β and M are privately kept to data owner, cloud server cannot figure out the query point q, either. In general, the query point can be well preserved while cloud server does not collude with data owner. As being mentioned in our foregoing design goals (Section 3.2), we do not consider the security of query points against the collusion attack of data owner and cloud server, because data owner and cloud server can inherently infer the query point with high accuracy from its k-NN, which is the output of the query, by collusion. Data Privacy: While executing k-NN query, the query user receives X and Y , from which query user can derive out (Mj,2l − Mj,2l−1 )/(Mx,2l − Mx,2l−1 ) = Yjl /Yxl (only for j, x ∈ [2d + 2] and l ∈ [d]) about the matrix M(2d+2)×(2d+2) in the key of data owner Key = {α, r, s, M , M −1 }. In the scheme, the encrypted database D0 ={p01 , p02 , · · · , p0m } is stored in cloud server which can also legally obtain the encrypted query point q 0 . While no collusion occurs and cloud server only achieves D0 and q 0 , our scheme is almost the same as the secure scheme against level-2 attack in [2] and it has been shown that the data can be secure to resist an adversary who knows the encrypted database and some plain tuples. If some corrupted query users release their knowledge about the key of data owner to cloud server, in ASPE [2], the original database D will be completely disclosed. However, in our novel scheme, query users only learn X and Y but not the full key of data owner. Even when X and Y are revealed to cloud server by some unreliable query users, the cloud server still cannot deduce more information about M in Key = {α, r, s, M , M −1 }, therefore, it cannot recover the database D. Additionally, we use a similar manner to encrypt the query points and the tuples of data owner, thus, in this situation, the query points of other honest query users are also secure.
T HEOREM 2. In the Outsourcingk-NNComputation of our scheme, the cloud server can correctly find out the k-NN of the query point q according to the distance p0i (q 0 )T .
4.3
cloud server O(md) O(md) O(md) O(md)
Security Discussion
The data privacy and query privacy are discussed, respectively, as follows. Query Privacy: For the query point q, data owner receives nothing but {q˙1 , q˙2 , · · · , q˙d } from query user in which q˙x = qx + tx (for ∀x ∈ [d]), as tx is random to data owner, thus he cannot find out qx . Cloud server can learn q 0 about the query point q. As q 0 =
5. CONCLUSION In this paper, we focused on the problem of supporting efficient k-NN computation over encrypted cloud data while data owner cannot share its key with query users and proposed a new solution through which k-NN query can be smoothly performed on encrypted data while releasing only partial information about the
59
key to query users. For the future work, we will address the problem and devote to achieving schemes with better properties, such as completely hiding the key.
6.
[12] S.R.M. Oliveira and O.R. Zaiane. Privacy preserving clustering by data transformation. In Proc. of the 18th Brazilian Symposium on Databases, pages 304–318, 2003. [13] N. Cao, C. Wang, M. Li, K. Ren, and W. Lou. Privacy-preserving multi-keyword ranked search over encrypted cloud data. In Proceedings of IEEE International Conference on Computer Communications (INFOCOM), pages 829–837, 2011. [14] N. Cao, Z. Yang, C. Wang, K. Ren, and W. Lou. Privacy-preserving query over encrypted graph-structured data in cloud computing. In 31st International Conference on Distributed Computing Systems (ICDCS),, pages 393–402. IEEE, 2011. [15] M. Li, S. Yu, W. Lou, and Y.T. Hou. Toward privacy-assured cloud data services with flexible search functionalities. In IEEE ICDCS Workshops, pages 466–470, 2012. [16] A. C. Yao. Protocols for secure computations. In the 23rd Annual IEEE Symposium on Foundations of Computer Science, pages 160–164, 1982. [17] O. Goldreich. Foundations of Cryptography: Volume II, Basic Applications. Cambridge: Cambridge University Press, 2004. [18] A.C. Yao. How to generate and exchange secrets. In 27th Annual Symposium on Foundations of Computer Science (FOCS), pages 162–167. IEEE, 1986. [19] O. Goldreich, S. Micali, and A. Wigderson. How to play any mental game, or a completeness theorem for protocols with an honest majority. In Proc. of the 19th Annual ACM Symposium on Theory of Computing (STOC),, pages 218–229. ACM Press, 1987. [20] T. Pedersen. Non-interactive and information-theoretic secure verifiable secret sharing. In CRYPTO, pages 129–140. Springer, 1991. [21] I. Damgård, M. Fitzi, E. Kiltz, J. Nielsen, and T. Toft. Unconditionally secure constant-rounds multi-party computation for equality, comparison, bits and exponentiation. pages 285–304, 2006. [22] P. Paillier. Public-key cryptosystems based on composite degree residuosity classes. In EUROCRYPT, pages 223–238. LNCS 1592, Springer, 1999. [23] C.C. Aggarwal and S.Y. Philip. Privacy-preserving data mining: models and algorithms, volume 34. Springer, 2008. [24] K. Chen, G. Sun, and L. Liu. Towards attack-resilient geometric data perturbation. In SIAM data mining conference, 2007. [25] K. Liu, C. Giannella, and H. Kargupta. An attacker’s view of distance preserving maps for privacy preserving data mining. In 10th European Conference on Principles and Practice of Knowledge Discovery in Databases, pages 297–308, 2006. [26] K. Liu, C. Giannella, and H. Kargupta. A survey of attack techniques on privacy-preserving data perturbation methods. In Privacy-Preserving Data Mining, pages 359–381. Springer, 2008.
ACKNOWLEDGMENTS
We would like to thank the anonymous reviewers for their valuable comments and Prof. Mingwu Zhang for his helpful discussions. This work was supported in part by Japan Society of the Promotion of Science and the National Natural Science Foundation of China (No. 61272404).
7.
REFERENCES
[1] S. Kamara and K. Lauter. Cryptographic cloud storage. In Financial Cryptography: Workshop on Real-Life Cryptographic Protocols and Standardization, LNCS 6054, pages 136–149, 2010. [2] W.K. Wong, D.W. Cheung, B. Kao, and N. Mamoulis. Secure knn computation on encrypted databases. In Proceedings of the 35th SIGMOD, pages 139–152, 2009. [3] M.D. Singh, P.R. Krishna, and A. Saxena. A cryptography based privacy preserving solution to mine cloud data. In the 3rd Annual ACM Bangalore Conference, 2010. [4] H. Hu, J. Xu, C. Ren, and B. Choi. Processing private queries over untrusted data cloud through privacy homomorphism. In IEEE 27th International Conference on Data Engineering (ICDE), pages 601–612, 2011. [5] B. Hore, S. Mehrotra, M. Canim, and M. Kantarcioglu. Secure multidimensional range queries over outsourced data. The VLDB Journal, 21(3):333–358, 2012. [6] C. Wang, K. Ren, S. Yu, and K.M.R. Urs. Achieving usable and privacy-assured similarity search over outsourced cloud data. In Proceedings of IEEE INFOCOM, pages 451–459, 2012. [7] C. Wang, N. Cao, K. Ren, and W. Lou. Enabling secure and efficient ranked keyword search over outsourced cloud data. IEEE Transactions on Parallel and Distributed Systems, 23(8):1467–1479, 2012. [8] C. Wang, N. Cao, J. Li, K. Ren, and W. Lou. Secure ranked keyword search over encrypted cloud data. In 30th IEEE International Conference on Distributed Computing Systems (ICDCS), pages 253–262, 2010. [9] Y. Qi and M.J. Atallah. Efficient privacy-preserving k-nearest neighbor search. In the 28th IEEE International Conference on Distributed Computing Systems(ICDCS), pages 311–319, 2008. [10] M. Shaneck, Y. Kim, and V. Kumar. Privacy preserving nearest neighbor search. In IEEE ICDM Workshops, pages 541–545, 2006. [11] K. Chen and L. Liu. Privacy preserving data classification with rotation perturbation. In 5th IEEE International Conference on Data Mining (ICDM), 2005.
60