Journal of Network and Computer Applications ∎ (∎∎∎∎) ∎∎∎–∎∎∎

Contents lists available at ScienceDirect

Journal of Network and Computer Applications journal homepage: www.elsevier.com/locate/jnca

New order preserving encryption model for outsourced databases in cloud environments

Zheli Liu a, Xiaofeng Chen b, Jun Yang a, Chunfu Jia a, Ilsun You c,*

a College of Computer and Control Engineering, Nankai University, China
b State Key Laboratory of Integrated Service Networks, Xidian University, China
c School of Information Science, Korean Bible University, South Korea

Article info

Article history: Received 18 October 2013; Received in revised form 7 June 2014; Accepted 7 July 2014

Keywords: Order preserving encryption; Outsourced database; Privacy protection; Cloud computing; Ciphertext-only attack

Abstract

Because order-preserving encryption (OPE) keeps the order of the plaintexts in the ciphertexts, an OPE scheme is under threat if the adversary is allowed to issue many queries. To hide the order in the ciphertext, the only ideal-security OPE scheme (Popa et al., 2013) requires the database server to maintain extra information and to realize comparison and range queries through user-defined functions (UDFs). Order operations are then no longer performed directly on the ciphertext, which reduces efficiency and makes the scheme unsuitable for some cases. In this paper, we aim at constructing an efficient and programmable OPE scheme for outsourced databases. First, we introduce the system model of the outsourced database in which the OPE scheme will be used, and show that security against the ciphertext-only attack is the basic and practical security goal. Second, we discuss the statistical attack on OPE schemes and point out that hiding the data distribution and data frequency is important when designing OPE schemes. Third, we propose a new, simple OPE model, which uses message space expansion and nonlinear space splitting to hide the data distribution and frequency, and we analyze its security against the two kinds of attack in detail. Finally, we discuss implementation details, including how to use our OPE scheme in database applications, and evaluate its performance experimentally. The security analysis and performance evaluation show that our OPE scheme is secure enough and more efficient. © 2014 Elsevier Ltd. All rights reserved.

1. Introduction

Order-preserving encryption (OPE) is a common encryption scheme which ensures that the order of the plaintexts is preserved in the ciphertexts. It is appealing because systems can perform order operations on ciphertexts in the same way as on plaintexts: for example, a database server can build an index, perform SQL range queries, and sort encrypted data, all in the same way as for plaintext data. This property results in good performance and requires minimal changes to existing software, making it easier to adopt.

In cloud computing and big data environments, OPE will be even more useful, because: (1) outsourced databases have attracted much attention recently due to the emergence of cloud computing, but how to protect outsourced data stored on an untrusted cloud server has become a serious problem. Because it preserves order, OPE allows an untrusted server to perform database operations, such as comparison and range queries, over encrypted data without decrypting them; (2) in the big data environment, a fruitful direction for future research in data mining will be the development of cryptographic techniques that incorporate privacy concerns. Because most data mining algorithms rely on the order of data, OPE will also be an ideal tool for protecting data privacy with cryptographic techniques while ensuring that the right results can be mined.

The ideal security goal for an OPE scheme, IND-OCPA (Boldyreva et al., 2009), is to reveal no additional information about the plaintext values besides their order (which is the minimum requirement for the order-preserving property). Until now, the only ideal-security OPE scheme is the mutable order-preserving encoding (mOPE) scheme proposed by Popa et al. (2013), in which the ciphertexts reveal nothing except the order of the plaintext values. mOPE works by building, on the database side, a balanced search tree containing all of the plaintext values encrypted by the application. It requires the encryption protocol to be interactive and a small number of ciphertexts of already-encrypted values to change as new plaintext values are encrypted (e.g., it is straightforward to update a few ciphertexts stored in a database); these operations on the database side can be implemented by user-defined functions (UDFs).

A long-standing problem for OPE is how to improve security while retaining functionality and efficiency. Although mOPE has ideal security, the interaction and tree balancing affect its efficiency. Moreover, the UDFs and the maintained balanced tree make it unsuitable for cases in which: (1) the user has no permission to create UDFs in the database, for example, when a small company deploys its web application on a rented web server with a rented database; (2) the application requires direct order comparison on the ciphertext, for example, when OPE is used to achieve privacy-preserving data publishing for a special data mining task. Another scheme (Boldyreva et al., 2009) has provable security guarantees: the encryption is equivalent to a random mapping that preserves order. However, the experiment in Popa et al. (2011) shows that it has poor efficiency, with an encryption time of 9 ms. Some other OPE schemes (Kadhem et al., 2010; Seungmin et al., 2009; Yum et al., 2012) have also been proposed, but they all leak more information than just the order of values. Thus, an efficient OPE scheme with practical security is still needed.

In this paper, we aim at proposing a feasible, programmable and secure OPE scheme which is practical for outsourced databases or privacy-preserving data publishing (Fung et al., 2010). In particular, we assume that: (1) the database should support direct order comparison on the ciphertext, i.e., the ciphertext should also be numerical data; (2) the new OPE scheme should have good performance and lead to minimal changes to existing software, and the ideal OPE ciphertext can be stored in the original field; (3) the basic security goal for an outsourced database is security against the ciphertext-only attack; in addition, security against the chosen-plaintext attack can also be achieved if we place some restrictions on the database system, depending on the scenario.

* Corresponding author. E-mail address: [email protected] (I. You).

http://dx.doi.org/10.1016/j.jnca.2014.07.001
1084-8045/© 2014 Elsevier Ltd. All rights reserved.

Please cite this article as: Liu Z, et al. New order preserving encryption model for outsourced databases in cloud environments. Journal of Network and Computer Applications (2014), http://dx.doi.org/10.1016/j.jnca.2014.07.001

 

2. System model

In this section, we briefly discuss the system model for database applications based on cloud storage, in which the OPE scheme will be applied, and then discuss its adversary model.

2.1. Basic model

As shown in Fig. 1, there are three different roles in the model: the owner, the cloud database service provider and the application server.

Fig. 1. System model for outsourced database.

• Cloud database service provider: The service provider offers the cloud storage service and allows paying customers to store their application data. It helps customers reduce management and maintenance costs and avoid purchasing expensive hardware and database software. However, it must be regarded as untrusted and can be defined as "honest but curious", i.e., it is interested in the users' private data.
• Owner: The data provider, who stores data in the rented cloud database.
• Application server: It is not a necessary role in our model, but database applications based on a three-layer architecture usually use it to process business operations. For database applications based on the "client/server" model, the owner and the application server are the same role. So, in our model, we assume that the application server is as trusted as the owner, and we call both of them the "OPE client".

There are also two data flows in the model, i.e., storing data and querying data.

• Storing data: To store data, the data owner first uses OPE in the OPE client (owner or application server) to encrypt the data whose order needs to be preserved, and then stores the ciphertext in the cloud.
• Querying data: To perform a query, the data owner first uses OPE to encrypt the keyword in the range-query or exact-query SQL statement, and then sends the new SQL query to the cloud. The cloud database can directly execute the SQL statement and return the results to the OPE client.

Notice: In our system model, the OPE operations happen in the OPE client, but comparison and range queries are directly supported by the database server. Thus, the OPE scheme is also suitable for privacy-preserving data publishing.

2.2. Adversary model

If the OPE encryption executes on the application server side, we assume that sufficient access control or other effective methods are applied to make sure that the application server will not leak the key information. We consider two types of attackers:

1. Attackers who have access rights to the database, such as the DBA or the cloud service provider of the outsourced database. They can see the encrypted data and the database structure, but can only launch a ciphertext-only attack. Security against such attackers is the basic and practical security notion.
2. Attackers who have access rights to both the application system and the database. They can access the SQL interpretation interface deployed in database applications, construct SQL statements with plaintext, obtain the interpreted SQL statements with encrypted data, and view all fields and the structure of the database. They have more information with which to guess the encryption details, and can launch chosen-plaintext or chosen-ciphertext attacks in order to guess the encryption key. Security against such attackers is the advanced security notion.

In practical applications, the main threat is the first type of attacker. In this case, a curious attacker can easily get the data stored in the database, but can hardly get the encryption key, which can be protected by cryptographic methods. Therefore, security against the ciphertext-only attack is our basic and practical security goal.

3. Related works

In this section, we summarize the related works, discuss the statistical attack on OPE schemes and introduce two typical OPE schemes.


Table 1. Comparison between typical OPE schemes.

Scheme | Efficiency level | Security level | Order comparison
Agrawal'04 (Agrawal et al., 2004) | Medium | Low | Directly
Boldyreva'09 (Boldyreva et al., 2009) | Low | Medium | Directly
Agrawal'09 (Agrawal et al., 2009) | Medium | Medium | Directly
Boldyreva'11 (Boldyreva et al., 2011) | Low | Medium | Directly
Liu'13 (Liu and Wang, 2013) | High | Low | Directly
Popa'13 (Popa et al., 2013) | Low | High | UDFs

3.1. Summary

Table 1 shows the comparison between the typical OPE schemes.

About the usage, i.e., the order comparison operation: except for Popa et al. (2013) and Boldyreva et al. (2011), the OPE schemes support direct comparison on the ciphertext, because their ciphertexts are numerical data. For a database, performing other operations (e.g., UDFs as in the scheme of Popa et al., 2013) to realize ciphertext comparison would obviously cost plenty of time. The ideal ciphertext is numerical data that carries the natural order and can be stored in the original field.

About the security: Boldyreva et al. (2009) were the first to provide a rigorous treatment of the security; in fact, they showed that it is infeasible to achieve ideal security for OPE under certain implicit assumptions. As a result, they settled on a weaker security guarantee that was later shown to leak at least half of the plaintext bits (Boldyreva et al., 2011). Popa et al. (2013) presented the first ideal-security order-preserving encoding scheme, in which the ciphertexts reveal nothing except the order of the plaintext values. Although Liu and Wang's scheme (Liu and Wang, 2013) leaks more information and is dangerous for users, it has good efficiency and is programmable; further research based on their scheme may therefore help in proposing a practical and efficient OPE scheme for database applications.

About the efficiency: security and efficiency are in conflict; the higher the security, the lower the efficiency.

From the above summary we can also draw a conclusion: to achieve high security, OPE must hide the order in the ciphertext and use an additional function to carry out order comparison with privacy concerns, such as the operations in Popa et al. (2013); however, this approach leads to low efficiency, extra storage and no direct comparison on the ciphertext.

3.2. Statistical attack for OPE schemes

As described in the section above, schemes that support direct order comparison on the ciphertext have relatively low security, because the order gives the adversary more background knowledge. But these schemes have good efficiency and require minimal changes to existing database application software, so further research on them is necessary. In this paper, we consider a practical OPE scheme for outsourced databases, where the most common attack is the ciphertext-only attack. Under this condition, the statistical attack may be the most effective method. We consider an adversary with background knowledge, i.e., one who can obtain some statistical information from other data providers. This kind of adversary is often mentioned in privacy-preserving data mining (Agrawal and Srikant, 2000; Lindell and Pinkas, 2000; Vaidya and Chris, 2000) in the big data environment, but has not yet been discussed for other OPE schemes. Such an adversary can obtain useful statistical information to launch an attack, including:

• Data distribution: By comparing the known distribution of the plaintexts with the distribution of the ciphertexts, the adversary can easily identify the ciphertext range that corresponds to a plaintext range. For example, for a field such as "salary", assume that the adversary knows the distribution of employees' salaries shown in Fig. 2; he can first compute statistics on the ciphertexts and then try to determine which ciphertext range contains the dense data from 4000 to 5000.
• Data frequency: Knowing which data values occur with high frequency, the adversary can easily identify the ciphertext values with the same frequency and then launch a further attack. As shown in Fig. 2, the adversary may know that the salary with the highest frequency is 5000; he can then easily guess the ciphertext of 5000 by a frequency attack.

Fig. 2. Data distribution of salaries.
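The frequency attack on the salary field can be illustrated with a minimal Python sketch. It assumes a deterministic order-preserving function (`toy_enc`, a hypothetical stand-in that is not any scheme from the literature) and a made-up salary column; the adversary is assumed to know only that 5000 is the most frequent salary.

```python
from collections import Counter


def toy_enc(v):
    # Hypothetical deterministic order-preserving map, for illustration only.
    return 3 * v + 11


def guess_most_frequent(ciphertexts, known_mode):
    """Pair the most frequent ciphertext with the plaintext known to be most frequent."""
    top_ciphertext, _count = Counter(ciphertexts).most_common(1)[0]
    return {top_ciphertext: known_mode}


# Made-up salary column; the adversary sees only `column`, never `salaries`.
salaries = [3000, 4000, 4500, 5000, 5000, 5000, 5000, 6000, 4500, 8000]
column = [toy_enc(s) for s in salaries]
guess = guess_most_frequent(column, known_mode=5000)
```

Because equal plaintexts always produce equal ciphertexts under such a deterministic map, the mode of the ciphertext column identifies the ciphertext of 5000 without any knowledge of the key.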

Except for Agrawal et al. (2004), none of the proposed OPE schemes takes statistical characteristics into consideration, and even Agrawal et al. (2004) did not offer a feasible defense against the statistical attack. Hiding the data distribution and data frequency is therefore very important for an OPE scheme that supports direct order comparison, and it is the goal of our OPE scheme.

3.3. Liu and Wang's linear scheme

In 2013, Liu and Wang presented a basic scheme which works as follows. The secret key $(a, b)$ is drawn from the key space $K = \{(a, b) : a, b \in \mathbb{Z}^{+}\}$. To each plaintext value $v$ we assign the encryption

$$\mathrm{Enc}(v) = a v + b + \mathit{noise}.$$

The noise is chosen randomly from $\{0, 1, \ldots, a-1\}$ so that the order of the plaintexts does not change after encryption. Unfortunately, this scheme offers no security guarantee, because $a$ and $b$ are constant across all encryptions. Suppose the adversary obtains two plaintext-ciphertext pairs:

$$\begin{cases} a v_1 + b + \mathit{noise}_1 = \mathrm{Enc}(v_1) \\ a v_2 + b + \mathit{noise}_2 = \mathrm{Enc}(v_2) \end{cases}$$

Subtracting the two equations gives $a(v_1 - v_2) + \Delta\mathit{noise} = \mathrm{Enc}(v_1) - \mathrm{Enc}(v_2)$, where $\Delta\mathit{noise} = \mathit{noise}_1 - \mathit{noise}_2$ satisfies $|\Delta\mathit{noise}| \le a - 1$. For $v_1 - v_2 > 1$, the adversary then obtains a range for $a$:

$$\frac{\mathrm{Enc}(v_1) - \mathrm{Enc}(v_2)}{v_1 - v_2 + 1} \le a \le \frac{\mathrm{Enc}(v_1) - \mathrm{Enc}(v_2)}{v_1 - v_2 - 1}$$

So the key space $K$ shrinks considerably and the parameter $a$ is easily learned. After some statistical analysis of a series of values of the form $b + \mathit{noise}$, the value of $b$ can be obtained as well. Liu and Wang's quasi-linear encryption scheme is therefore insecure against such an attack. Although it is clearly feasible for outsourcing, it fails to protect the information in the database. We should mention, however, that this quasi-linear scheme is indeed efficient, since encryption and decryption have a fixed closed form.

3.4. Popa et al.'s mutable tree scheme

To achieve ideal security, Popa et al. proposed an order-preserving encoding scheme based on a binary search tree. In their scheme, all plaintext values are put in order and each of them is given its sequence number as the encoding. Suppose the plaintext values are $\{v_i\}$ and we already have $v_j < v_{j+1}$ for each $j = 0, 1, \ldots, n$; then the encoding value is $\mathrm{Enc}(v_j) = j$. In this way, we can put all the pairs $(j, v_j)$ in a binary tree. According to its path from the root node, each pair $(j, v_j)$ owns a binary sequence, and these sequences have different lengths. Therefore, to obtain a fixed length, the OPE encoding is defined as follows:

OPE encoding of $v_j$ = [path]10…0

The order of the plaintext values is preserved here. To prevent the encodings of some nodes in the binary tree from getting too long, they apply tree balancing to keep a B-tree. But the cost of the operations (insert, delete and lookup) is always too high. For instance, a full binary tree of height $h$ has $n = \sum_{i=0}^{h-1} 2^i = 2^h - 1$ nodes. A common operation on such a tree visits about $h$ nodes, so its cost is $O(\ln n)$.

As a result, it is indeed a strictly secure scheme: it leaks only the order of the plaintexts and nothing else. Unfortunately, the interaction and tree balancing affect its efficiency.
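As an illustration of the [path]10…0 encoding only, the following Python sketch assigns codes over a static, balanced binary search tree built from an already sorted list. The function name `path_encodings` and the parameter `length` are hypothetical, and the interactive insertion and rebalancing of mOPE are not modelled.

```python
def path_encodings(sorted_values, length):
    """Return {value: code}, where code is the integer value of [path]10...0."""
    codes = {}

    def walk(lo, hi, path):
        if lo > hi:
            return
        if len(path) + 1 > length:
            raise ValueError("length is too small for the tree height")
        mid = (lo + hi) // 2  # root of the current subtree
        # Path bits (0 = left, 1 = right), then a 1, then zeros up to `length`.
        bits = (path + "1").ljust(length, "0")
        codes[sorted_values[mid]] = int(bits, 2)
        walk(lo, mid - 1, path + "0")
        walk(mid + 1, hi, path + "1")

    walk(0, len(sorted_values) - 1, "")
    return codes


values = [10, 20, 30, 40, 50, 60, 70]
codes = path_encodings(values, length=4)
```

In this toy run the root 40 has an empty path and receives the code 1000 (binary), its left child 20 receives 0100, and so on; comparing the padded codes as integers reproduces the order of the plaintext values.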

4. Technical preliminaries

4.1. Order-preserving encryption

A regular cryptosystem $\{\mathcal{M}, \mathcal{C}, \mathcal{K}, \mathrm{Enc}, \mathrm{Dec}\}$ contains the plaintext space $\mathcal{M}$, the ciphertext space $\mathcal{C}$, the key space $\mathcal{K}$, the encryption algorithm Enc and the decryption algorithm Dec. An order-preserving encryption is an encryption or encoding scheme such that, given an order $<$ on the plaintexts, $x < y$ implies $\mathrm{Enc}(x) < \mathrm{Enc}(y)$ (and vice versa: if $\mathrm{Enc}(x) < \mathrm{Enc}(y)$, then we must have $x < y$).

4.2. General idea

The order of the plaintexts remains in the ciphertexts, so such a cryptosystem is under threat if the adversary issues many queries, which is proved in Lemma 4.1. Although mOPE can achieve ideal security, the interaction and tree balancing affect its efficiency. Liu's scheme is not secure because any two successful chosen-plaintext queries lead to the leak of the whole cryptosystem; however, its linear operation gives it good efficiency and makes it programmable. Building on Liu's scheme, we try to present an efficient and more secure OPE scheme for outsourced databases.

The intuition for how our OPE scheme works is simple. First, we randomly split the original message space into successive intervals of different lengths. Second, we select an extended ciphertext space and split it into the same number of intervals. Finally, we use nonlinear mapping functions to map each original element into an element of the extended message space. A different mapping function is used for each interval. There are two key points:

1. Extend message space: Extending the message space is the precondition of our OPE scheme, for two reasons. The first reason is that databases support storing the ciphertext in the extended message space. For example, it is feasible to change the original field's datatype from "number(8,0)" to "number(10,6)" in Oracle, and such a change from low precision to high precision will not cause a loss of data. The other reason is that it helps hide the data frequency: the same data value can be randomly mapped into a range of the extended ciphertext space, so the frequency of this value is hidden.
2. Nonlinear split message space: Splitting the message space provides an effective way to hide the data distribution. For example, one adopted method is: a range that contains more data is split into more intervals; for a dense interval containing high-frequency data, the corresponding ciphertext interval has a large range.

4.3. Extended message space

Both the plaintext space $\mathcal{M}$ and the ciphertext space $\mathcal{C}$ are treated as metric spaces. That is, there is a function $d(x, y)$ that measures distance, where

• $d(x, y) = 0$ if and only if $x = y$;
• $d(x, y) = d(y, x)$;
• $d(x, y) + d(y, z) \ge d(x, z)$.

Since the plaintext space $\mathcal{M}$ is encoded into a successive subset where $d(x, y) = |x - y|$, it always satisfies the above three conditions. The ciphertext space $\mathcal{C}$, however, needs to satisfy only the first two, since we only care about the dispersion. Suppose $\mathcal{M} = \{1, 2, \ldots, M\}$, thus $|\mathcal{M}| = M$, i.e., the number of elements in $\mathcal{M}$ is $M$, and the distance between any adjacent elements in $\mathcal{M}$ is obviously 1. We can set $|\mathcal{C}| = M$ and $\mathcal{C} = \{c_1, c_2, \ldots, c_M\}$, where $c_i = \mathrm{Enc}(i)$, and the distance between any two adjacent elements (e.g., $c_i$ and $c_{i+1}$) is always much greater than 1. Otherwise, if $d(c_i, c_{i+1}) = 1$, the ciphertext space is almost the same as the plaintext space ($\mathcal{C}$ is just a shift of $\mathcal{M}$). In fact, we hope $\mathcal{C}$ is much larger than $\mathcal{M}$, i.e., $|\mathcal{C}| \gg M$.

Lemma 4.1. Suppose $|\mathcal{M}| = M$ and Enc is order-preserving. Then the range including any $\mathrm{Enc}(x)$ can be obtained after about $\log M$ queries under a chosen-plaintext attack.

Proof. Consider an adversary who wants to get the plaintext $x$ of a known $\mathrm{Enc}(x)$. Under a chosen-plaintext attack, the adversary can select a value $m$ and query the encryption oracle to get its ciphertext $\mathrm{Enc}(m)$. For convenience, let $\mathcal{M}'$ be the space in which $x$ may lie.


In the beginning, $\mathcal{M}' = \{1, 2, \ldots, M\}$, and the adversary first selects $m \in \mathcal{M}'$. For efficiency, $m$ can be selected as about $M/2$. Because the function Enc is order-preserving, if $\mathrm{Enc}(m) < \mathrm{Enc}(x)$, then $x \in \{m+1, m+2, \ldots, M\}$; otherwise, $x \in \{1, 2, \ldots, m\}$. So, after this query, the reduced range $\mathcal{M}'$ in which $x$ lies is confirmed. By repeating the above operations, i.e., selecting $m \in \mathcal{M}'$ ($m$ can be selected as the middle element of $\mathcal{M}'$) and further confirming the reduced range $\mathcal{M}'$, after about $k = \log M$ queries he will finally get $x$. For example, for $|\mathcal{M}| = 2^k$, the range is halved by each query, and it contains only one element after at most $k$ queries. □

Lemma 4.2. If $\mathcal{M}$ is a set of integers with $|\mathcal{M}| = M$ and the encrypt function $\mathrm{Enc}(x)$ satisfies

$$k \le \frac{d(\mathrm{Enc}(i), \mathrm{Enc}(j))}{d(i, j)} \le K,$$

then, if we add a random noise $\delta$ with $|\delta| < \frac{1}{2}k$ to $\mathrm{Enc}(i)$, $\mathrm{Enc}(i)$ is still reversible. In other words, there is a certain $i$ mapping to each $\mathrm{Enc}(i)$.

Proof. In this case, $\mathcal{C}$ is a $k$ to $K$ times extension of $\mathcal{M}$. For any adjacent integers $i, i+1$, we have

$$k \le \frac{d(\mathrm{Enc}(i), \mathrm{Enc}(i+1))}{d(i, i+1)} = d(\mathrm{Enc}(i), \mathrm{Enc}(i+1)) \le K,$$

so the gap between $\mathrm{Enc}(i)$ and $\mathrm{Enc}(i+1)$ is at least $k$. If we replace $\mathrm{Enc}(i)$ by $\mathrm{Enc}(i) + \delta$, i.e., $\mathrm{Enc}(i) + \delta \to \mathrm{Enc}(i)$, then $\mathrm{Enc}(i)$ is still reversible, since the values in the set $\{\mathrm{Enc}(i)\}_{i=1}^{M}$ are still isolated from each other. So none of them will merge, and the order is preserved. □

We should mention that if $d(\mathrm{Enc}(i), \mathrm{Enc}(j))/d(i, j) = k$ and $k$ is a constant, then the encrypt function is linear, such as $\mathrm{Enc}(x) = kx + b$. In this case, the function $\mathrm{Enc}(x)$ is broken after just two queries. Similarly, if $|d(\mathrm{Enc}(i), \mathrm{Enc}(j))/d(i, j) - k| \le \delta$ and $\delta = o(k)$, then $\mathrm{Enc}(x)$ is semi-linear, as discussed above for Liu's scheme. The conclusion we want to present here is: the more the value of $d(\mathrm{Enc}(i), \mathrm{Enc}(j))/d(i, j)$ varies over different pairs $(i, j)$, the more secure the order-preserving cryptosystem is.
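Two observations from this section can be checked with a short Python sketch: the binary search of Lemma 4.1, and the leakage of the slope of a (semi-)linear scheme from two known pairs, using the bound of Section 3.3. The function names `recover_plaintext` and `slope_bounds`, the toy oracle and all numeric values are assumptions made for illustration; the oracle is assumed to be deterministic.

```python
def recover_plaintext(target, enc, size):
    """Binary search of Lemma 4.1: find x in {1, ..., size} with enc(x) == target."""
    lo, hi, queries = 1, size, 0
    while lo < hi:
        m = (lo + hi) // 2
        queries += 1              # one chosen-plaintext query
        if enc(m) < target:
            lo = m + 1            # x lies in {m + 1, ..., hi}
        else:
            hi = m                # x lies in {lo, ..., m}
    return lo, queries


def slope_bounds(v1, c1, v2, c2):
    """Interval containing a for Enc(v) = a*v + b + noise, noise in {0, ..., a-1}."""
    d, diff = v1 - v2, c1 - c2
    if d < 2:
        raise ValueError("need v1 - v2 >= 2")
    return diff / (d + 1), diff / (d - 1)


# Toy deterministic oracle (an assumption): any strictly increasing function works.
x, queries = recover_plaintext(7 * 389 ** 2 + 3, lambda m: 7 * m * m + 3, 1024)

# Toy linear scheme with a = 50, b = 1000 and noises 37 and 4.
low, high = slope_bounds(20, 50 * 20 + 1000 + 37, 10, 50 * 10 + 1000 + 4)
```

With a message space of 1024 values the search needs 10 queries, matching the $\log M$ estimate, and the two pairs confine the slope to an interval of about ten candidate values around the true $a = 50$.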

5. New OPE model

Based on the two key points described in Section 4.2, we extend the ciphertext to be more discrete and propose a new OPE model.

5.1. Notations

For convenience, we introduce the notations used in the rest of the paper:

• Let $D_i$ be an interval of the original message space, $D_i = (l_i, r_i]$ $(i = 1, 2, \ldots, m)$, where $l_i$ is the minimum value and $r_i$ is the maximal value of $D_i$, and if $x \in D_i$, then $l_i < x \le r_i$. For two adjacent intervals $D_i$ and $D_{i+1}$, we further have $r_i = l_{i+1}$.
• Let $C_i$ be an interval in the extended message space, $C_i = (l'_i, r'_i]$ $(i = 1, 2, \ldots, m)$.
• Let $\mathrm{Enc}(\cdot)$ be a family of monotone increasing functions, composed of a different function $\mathrm{Enc}_i(\cdot)$ for each interval $D_i$. Thus, $\mathrm{Enc}_i(x)$ denotes a concrete function, where $x \in D_i$, i.e., $l_i < x \le r_i$.
• Let range($i$) be the function that returns the left and right values of the interval $D_i$; its output has the form $(l, r)$.
• Let range′($i$) be the function that returns the left and right values of the interval $C_i$; its output has the form $(l', r')$.
• Let index($x$) be the function that finds the index number $i$ of the interval $D_i$ which contains $x$.

Fig. 3. Operational principle of OPE.

5.2. Operational principle

As shown in Fig. 3, there are three steps in our OPE model:

5.2.1. Splitting the message space

To extend the message space, the first step is to split the message space $\mathcal{M}$ into a sequence of intervals $D_i = (l_i, r_i]$ $(i = 1, 2, \ldots, m)$. Because $\mathcal{M}$ is always a discrete space, we set $l_i, r_i \in \mathbb{Z}$ which satisfy:

$$\begin{cases} \mathcal{M} = \bigcup_{i=1}^{m} D_i = \bigcup_{i=1}^{m} [l_i, r_i] \\ [l_i, r_i] \cap [l_j, r_j] = \emptyset \quad (i \ne j) \end{cases}$$

Splitting the message space helps destroy the regularity of the data distribution: the more data a region of the data collection contains, the more intervals it should be divided into. How to split is described in detail in Section 5.3.

5.2.2. Splitting the ciphertext space

The second step is to split the ciphertext space $\mathcal{C}$ into a sequence of intervals $C_i = (l'_i, r'_i]$ $(i = 1, 2, \ldots, m)$, and we set $l'_i, r'_i \in \mathbb{Z}$ which satisfy:

$$\begin{cases} \mathcal{C} = \bigcup_{i=1}^{m} C_i = \bigcup_{i=1}^{m} [l'_i, r'_i] \\ [l'_i, r'_i] \cap [l'_j, r'_j] = \emptyset \quad (i \ne j) \end{cases}$$

Given an original data interval $D_i$, its corresponding ciphertext interval is $C_i$, and we further have $\mathrm{Enc}(l_i) = l'_i$ and $\mathrm{Enc}(r_i) = r'_i$. Notice that this split also helps destroy the regularity of the data distribution: the more data there is in $D_i$, the larger the range of $C_i$ should be.

5.2.3. Mapping each data to extended space

After both splits, the third step is to map each data value to the extended ciphertext space $\mathcal{C}$. We use the function $\mathrm{Enc}_i(x)$ (a piecewise function) to extend $D_i = [l_i, r_i]$ to $C_i = [l'_i, r'_i]$ as follows. For each value $x$ in $\mathcal{M}$, we find the interval $D_i = [l_i, r_i]$ which contains $x$. For different intervals $D_i$, a different function $\mathrm{Enc}_i(\cdot)$ is used to map each value to the target interval $C_i = [l'_i, r'_i]$. The overall process is:

$$x \in D_i = [l_i, r_i] \xrightarrow{\ \mathrm{Enc}_i(\cdot)\ } \mathrm{Enc}_i(x) \in C_i = [l'_i, r'_i]$$

To achieve security against the frequency attack, an effective way is to use a one-to-many mapping function $\mathrm{Enc}(\cdot)$. So, the key of our OPE scheme is how to design such encryption functions, and we discuss the details in Section 5.4.

5.3. Split methods

To implement the split, we should give the parameters

$$\mathit{paras} = (x_{\min}, x_{\max}, \{T_i\}, d_{\min})$$

where $x_{\min}$ and $x_{\max}$ are the start and end points of the plaintext space, $\{T_i\}$ is the set of dense intervals and $d_{\min}$ is the minimal interval length we can set. In fact, even with no dense intervals offered, our extension plan still breaks the statistical characteristics. It is common that there is no information about the distribution before encryption. In the next section, we will prove that our scheme is secure against the ciphertext-only attack because it breaks the distribution.

The goal of the split is to break the statistical characteristics. We discuss how to split the message space $\mathcal{M}$ and the ciphertext space $\mathcal{C}$ to achieve it. The principles of designing such functions are:

• For an original data collection, the more data a region contains, the more intervals it should be divided into; we call these intervals dense intervals.
• For an original dense interval $D_i$ containing high-frequency data, its corresponding ciphertext interval $C_i$ should have a large range, i.e., for two adjacent elements $x_1, x_2 \in C_i$, $d(x_1, x_2) \gg 1$, or $|r'_i - l'_i| \gg |r_i - l_i|$.

The above principles are helpful to destroy the data distribution, because the dense interval will be extended to a ciphertext interval with large range, but the sparse interval to a ciphertext one with small range, and by this way, the ciphertexts will be close to uniform distribution. Notice: In order to be feasible, the method to split the ciphertext space should satisfy: (a) for any ciphertext y A C, it is easy to get the index index(y) which is the number of split interval contains y; (b) for any index number i, it is also easy to get the 0 range bound ðli ; r 0i Þ of the interval Ci. Moreover, for the range and index, paras is the input of them which requires feasibility in computing. Designing such an ideal split may be difficult. For our OPE model, we will provide a simple solution in Section 7.2. 5.4. Encrypt function The encrypt function EncðÞ should satisfy the following two properties: 1. EncðÞ is solvable: With any x A Di , the cipher Enci(x) is easy for programable computation. And if get yi ¼ Enci ðxi Þ, the corresponding xi ¼ Enci 1 ðyi Þ is computable also. 2. The ki and Ki in ki r dðEnci ðx1 Þ; Enci ðx2 ÞÞ=dðx1 ; x2 Þ r K i : the ki must not be too small to keep the security, the Ki must not be too large to avoid wasting the storage space. Computational process: To be programable, for ith interval and x A Di , Enci(x) will take the range of Di and Ci as input, i.e., 0 ðli ; r i Þ’rangeðiÞ and ðli ; r 0i Þ’range0 ðiÞ, and output the result yA C i . The simplest computational process is as follows:

 compute scale ¼ ðl0i  r0i Þ=ðli  ri Þ;  map the x to the ciphertext interval, i.e., x0 ¼ l0i þ scalenðx  lÞ;  add noise to x0 , i.e., compute x0 ¼ x0 þ r, where r is the random value in ð0; scaleÞ;

In fact, Enc(x) can be generated from any increasing function to achieve nonlinear mapping. But here, we use the above linear mapping to introduce our OPE model, and for convenience, we use the form of Liu's scheme to describe the Enc(x) as Enci ðxÞ ¼ ai x þ bi þ δi ; and in this description:

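• $a_i = scale$ and $b_i = l'_i - scale \cdot l_i$.
• $\delta_i$ is the noise, with $|\delta_i| < \frac{1}{2} a_i$, and $k_i = E'_i(x) = a_i = K_i$.

As a result, the encrypt function $Enc(\cdot)$ can be written as

$$Enc(x) = \sum_{i=1}^{m} Enc_i(x) \cdot \delta(x, D_i) = \sum_{i=1}^{m} (a_i x + b_i + \delta_i) \cdot \delta(x, D_i),$$

where $\delta(x, D_i)$ is the indicator that selects the interval containing $x$:

$$\delta(x, D_i) = \begin{cases} 1, & x \in D_i, \\ 0, & x \notin D_i. \end{cases}$$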

Noise: It should be mentioned that the noise $\delta_i$ helps break the statistical characteristics, for instance for some $x_i$ that occurs with high probability. If $\delta_i$ obeys the uniform distribution under the restriction $|\delta_i| < a_i/2$, the distribution of the ciphertext becomes

$$\Pr\{y = a_i x_i + b_i + \delta_i\} = \frac{\Pr\{x = x_i\}}{|\delta_i|}.$$

As shown in Section 4, this defines a one-to-many function

$$x_i \to y = a_i x_i + b_i + \delta_i \quad \text{with} \quad |\delta_i| < \tfrac{1}{2} a_i.$$

Viewed as a map from a point to a set, it is also a one-to-one function; in other words, it is reversible. The dense distribution around $x_i$ can therefore be reduced in this way, and the statistical characteristics can be altered.

5.5. OPE cryptosystem

Based on the above split function and encrypt function, we use three algorithms OPE = (Setup, Encrypt, Decrypt) to describe our OPE cryptosystem:

• Setup(): This algorithm is run by the OPE client to set up the scheme. It must set the following parameters:
  1. Message space $M$: the minimum value $x_{min}$ and the maximum value $x_{max}$.
  2. Ciphertext space $C$: the minimum value $y_{min}$ and the maximum value $y_{max}$.
  3. Parameters for the message space split: in our implementation, the dense intervals $\{T_i\}$ and the minimal interval length $d_{min}$.
  4. Parameters for the ciphertext space split: in our implementation, the corresponding dense intervals $\{T'_i\}$.
  The final key sk is composed of ($x_{min}$, $x_{max}$, $\{T_i\}$, $\{T'_i\}$, $d_{min}$).

• Encrypt(x, sk): This algorithm is run by the OPE client to encrypt the data $x$ and output its OPE ciphertext. To encrypt $x$, it first obtains $i$ by computing $index(x)$ and then runs $Enc_i(x)$ to output the ciphertext. Note that $\delta_i$ is generated randomly and satisfies $|\delta_i| < \frac{1}{2} a_i$.

Please cite this article as: Liu Z, et al. New order preserving encryption model for outsourced databases in cloud environments. Journal of Network and Computer Applications (2014), http://dx.doi.org/10.1016/j.jnca.2014.07.001i

Z. Liu et al. / Journal of Network and Computer Applications ∎ (∎∎∎∎) ∎∎∎–∎∎∎

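• Decrypt(y, sk): This algorithm is run by the OPE client to decrypt the data $y$ and output its plaintext. The decryption is computed as

$$x = \left[ \frac{y - b_i}{a_i} \right],$$

where $[\cdot]$ denotes rounding to the nearest integer; the rounding removes the noise because $|\delta_i| < \frac{1}{2} a_i$. Before decryption, the client must first determine the index $i$ of the interval $C_i$ that contains $y$.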
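The three algorithms can be put together in a short Python sketch. It is a toy illustration under stated assumptions rather than the paper's implementation: the class name `LinearOPE` is hypothetical, plaintexts are integers, the key is simplified to two lists of interval endpoints, and $b_i$ is shifted by $a_i/2$ (an assumption made here) so that symmetric noise with $|\delta_i| < a_i/2$ never leaves the ciphertext interval $C_i$.

```python
import bisect
import random

class LinearOPE:
    """Toy piecewise-linear OPE over integer plaintexts (illustrative only)."""

    def __init__(self, plain_bounds, cipher_bounds, seed=None):
        # Setup: increasing endpoints of the m split intervals, e.g.
        # [0, 70, 90, 101] means D_1=[0,70), D_2=[70,90), D_3=[90,101).
        if len(plain_bounds) != len(cipher_bounds) or len(plain_bounds) < 2:
            raise ValueError("need two endpoint lists of equal length >= 2")
        self.p = list(plain_bounds)
        self.c = list(cipher_bounds)
        self.rng = random.Random(seed)

    def _params(self, i):
        # a_i stretches D_i onto C_i; b_i places a_i*x + b_i at the centre
        # of the ciphertext cell that belongs to x (assumed a_i/2 shift).
        a = (self.c[i + 1] - self.c[i]) / (self.p[i + 1] - self.p[i])
        b = self.c[i] - a * self.p[i] + a / 2
        return a, b

    def encrypt(self, x):
        if not self.p[0] <= x < self.p[-1]:
            raise ValueError("plaintext outside the message space")
        i = bisect.bisect_right(self.p, x) - 1        # index(x)
        a, b = self._params(i)
        delta = (self.rng.random() - 0.5) * a * 0.99  # |delta_i| < a_i / 2
        return a * x + b + delta

    def decrypt(self, y):
        if not self.c[0] <= y < self.c[-1]:
            raise ValueError("ciphertext outside the ciphertext space")
        i = bisect.bisect_right(self.c, y) - 1        # index(y)
        a, b = self._params(i)
        return round((y - b) / a)                     # rounding drops the noise
```

For example, `LinearOPE([0, 70, 90, 101], [0.0, 1000.0, 9000.0, 10000.0])` stretches the dense plaintext interval [70, 90) over most of the ciphertext space. Encrypting the same plaintext twice then gives two different ciphertexts that decrypt to the same value.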

6. Security and maintenance

For the above OPE cryptosystem, we analyze its security against the ciphertext-only attack and against a particular chosen-plaintext attack, which we define below. In our model, only the ciphertext is stored in the untrusted database, so the adversary can obtain some ciphertexts but nothing else.

6.1. Ciphertext-only attack

In fact, we can prove that the base of our model, $Enc(x) = ax + b + \delta$, is safe enough under the ciphertext-only attack.

Lemma 6.1. Suppose $Enc(x) = ax + b + \delta$ $(x \in X)$, and the expectation $E(X)$ and the variance $Var(X)$ of $X$ are known. Suppose further that the parameter $\delta$ is independent of $x$, with expectation $E(\Delta)$ and variance $Var(\Delta)$. Then

$$E(Enc(X)) = aE(X) + b + E(\Delta), \qquad Var(Enc(X)) = a^2 Var(X) + Var(\Delta).$$

The proof follows easily from basic statistics: linearity of expectation gives the first identity, and the independence of $\delta$ and $x$ gives the second.

6.1.1. Method by statistic

The common ciphertext-only attack is the statistical attack, which counts the frequency of each ciphertext. Suppose the plaintexts $M = \{1, 2, \ldots, M\}$ have clearly distinct probabilities, that is, $P\{x = 1\}, P\{x = 2\}, \ldots, P\{x = M\}$ can be sorted into an obvious order by value:

$$P\{x = \sigma(1)\} > P\{x = \sigma(2)\} > \cdots > P\{x = \sigma(M)\},$$

where $\sigma$ is a permutation of $\{1, 2, \ldots, M\}$. We can then assert that the ciphertexts maintain this order:

$$P\{y = Enc(\sigma(1))\} > \cdots > P\{y = Enc(\sigma(M))\}.$$

Meanwhile, by statistical analysis, the adversary can also order the ciphertexts by frequency, obtaining $P\{y = y_1\}, P\{y = y_2\}, \ldots, P\{y = y_M\}$. After comparing the two sequences, the adversary can reasonably conclude that $y_i = Enc(\sigma(i))$, and can thus recover the entire cryptosystem.

6.1.2. Prevention

In fact, it is easy to prevent the statistical attack before encryption by spreading out the large probabilities. We can spread some elements of the sequence. For example, there can be a one-to-many map

$$g : \{1, 2, \ldots, M\} \to \{1, 2, \ldots, M'\}, \qquad g : i \to S_i = \{k_{i,1}, k_{i,2}, \ldots, k_{i,n(i)}\},$$

where $n(i)$ is the size of the set. From another point of view, $g$ is a one-to-one map from a number to a set. Any dense distribution can be reduced in this way, and the final sequence can be a series of numbers with almost equal probability, such as a uniform distribution.

As mentioned in Section 5, the noise $\delta_i$ helps prevent such an attack. The map here has the form

$$x \xrightarrow{Enc_i(\cdot)} \{a_i x + b_i + \delta_i\}, \qquad \{a_i x + b_i + \delta_i\} \xrightarrow{Dec_i(\cdot)} x,$$

where $|\delta_i| < a_i/2$ and the size of the set is $|\{a_i x + b_i + \delta_i\}| \approx a_i$, so the distribution is altered by adding the noise $\delta_i$. As a result, the probability becomes

$$\Pr\{y = a_i x_i + b_i + \delta_i\} = \frac{\Pr\{x = x_i\}}{|\delta_i|}.$$

So it succeeds in preventing the statistical attack. In addition, to avoid being attacked by statistical analysis, $X$ can be made to roughly obey a uniform distribution, as $\delta$ does; the statistical results are then almost uniformly distributed. It is also obvious that our model is no less safe than this basic single linear function.

6.2. A particular chosen-plaintext attack

We have mentioned that, under continuous arbitrary queries, almost every OPE cryptosystem is insecure against the common chosen-plaintext attack. We therefore turn to a restricted chosen-plaintext attack, which we call the sparse and random chosen-plaintext attack (SR-CPA). The SR-CPA has the following properties:

Sparse: If an adversary wants the ciphertexts of $\{x_1, x_2, \ldots, x_k\}$ but this set is dense, he will only get $\{0, \ldots, Enc(x_{i_1}), \ldots, Enc(x_{i_l}), \ldots, 0\}$, which contains only $l$ ciphertexts with $l \ll k$. Here we impose the restriction that, after returning any ciphertext, the database will not allow any query for ciphertexts close enough to it for some time.

Random: In SR-CPA, the adversary cannot immediately obtain whatever sequence he wants. For instance, if he wants the ciphertexts of $\{x_1, x_2, \ldots, x_k\}$, he cannot get the ciphertext sequence $\{Enc(x_1), Enc(x_2), \ldots, Enc(x_k)\}$ all at the same time. In other words, he can only get $\{0, \ldots, Enc(x_{i_1}), \ldots, Enc(x_{i_l}), \ldots, 0\}$, which contains only $l$ ciphertexts with $l \ll k$.

In fact, these two restrictions are reasonable for queries to a database or cloud server. Obviously, no client will keep asking for adjacent ciphertexts at a high frequency and within a time limit on replies. In this way, we can prevent any interval from being attacked by arbitrary queries and from leaking the encryption function on it.
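The "sparse" restriction of the SR-CPA described above, under which a value close to a recently answered one is refused for some time, could be enforced by a small gate in front of the encryption oracle. The sketch below is hypothetical: the class name `SparseQueryGate` and the parameters `min_gap` and `cooldown` do not come from the paper, closeness is measured on the queried plaintexts as a simplification, and the "random" restriction is not modelled.

```python
import time

class SparseQueryGate:
    """Refuse queries that are too close to a recently answered one."""

    def __init__(self, min_gap, cooldown, clock=time.monotonic):
        self.min_gap = min_gap      # minimal distance between answered values
        self.cooldown = cooldown    # seconds during which a neighbourhood stays blocked
        self.clock = clock          # injectable clock, which makes testing easy
        self.recent = []            # (timestamp, value) of answered queries

    def allow(self, x):
        now = self.clock()
        # Forget answers whose cooldown has expired.
        self.recent = [(t, v) for t, v in self.recent if now - t < self.cooldown]
        # Refuse the query if it falls near a still-blocked value.
        if any(abs(v - x) < self.min_gap for _, v in self.recent):
            return False
        self.recent.append((now, x))
        return True
```

With `min_gap=5` and `cooldown=60`, a query for 102 issued right after a query for 100 is refused, while the same query is answered once a minute has passed.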

6.2.1. Two kinds of attack

(a) Whole-range attack: In this situation, we suppose that the adversary obtains a sequence of ciphertexts $\{c_1, c_2, \ldots, c_n\}$ at random, that is, without selecting them, simply by recording them.

(b) Exact-height attack: In this situation, the adversary is curious about only one exact range of the ciphertext, say $[D, U]$, and also obtains a sequence of ciphertexts $\{c_1, c_2, \ldots, c_n\}$, all of which satisfy $c_i \in [D, U]$.

Notably, the second attack can take two forms, one basic but dangerous and the other complex but safe:

1. $\exists i \in \{1, 2, \ldots, m\}$ such that $[D, U] \subseteq [Enc_i(l_i), Enc_i(r_i)]$.
2. $\nexists i \in \{1, 2, \ldots, m\}$ such that $[D, U] \subseteq [Enc_i(l_i), Enc_i(r_i)]$, in which case $[D, U] \subseteq \bigcup_{i=m_1}^{m_2} [Enc_i(l_i), Enc_i(r_i)]$.

First, owing to the restrictions on the attack, any adversary needs plenty of time to obtain the information he wants. Should the scheme nevertheless fail to resist the attack, the updating algorithm offered in the following part allows the encryption on any leaked interval to be altered within a certain time, so that the leaked encryption becomes out of date.


6.3. The updating and maintenance

In case the scheme fails to resist the attack, there can be an algorithm to alter or update our cryptosystem. Obviously, the new cryptosystem should have different parameters to keep the security. Meanwhile, the algorithm should be feasible on a large database, that is, easy for the server. In other words, our cryptosystem should be both mutable and feasible. Here we offer just one idea and algorithm to keep the cryptosystem mutable.

6.3.1. Change $(a_i, b_i)$

In case some parameter pairs $(a_i, b_i)$ leak, we keep all of them mutable but alter only one pair at a time. For instance, if $(a_i, b_i)$ is revealed, we take the following steps to update the data:

(a) Traverse the database; if any $y$ satisfies $a_i l_i < y < a_i r_i$, decrypt it and get the corresponding $x$.
(b) Set a new parameter pair: $(a'_i, b'_i) \to (a_i, b_i)$.
(c) Replace $y$ by $y' = a'_i x + b'_i + \delta_i$.

Finally, we should mention that this is only an algorithm for the case of leakage on some interval. The operation above may incur a high cost in computing and updating time, so unless it is necessary, we do not recommend running such an updating scheme as routine maintenance.
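Steps (a) to (c) can be sketched in Python as below. This is a hedged illustration only: the function name `rekey_interval` is hypothetical, the database is modelled as a plain list of ciphertexts, membership in the leaked interval is tested against explicit ciphertext bounds `lo` and `hi` (an assumption used here in place of the condition in step (a)), and the new pair is assumed to keep every new ciphertext inside the same ciphertext interval so that the order relative to neighbouring intervals is unchanged.

```python
import random

def rekey_interval(ciphertexts, lo, hi, old, new, rng=random):
    """Re-encrypt every ciphertext in [lo, hi) from the pair `old` to the pair `new`."""
    a, b = old
    a_new, b_new = new
    updated = []
    for y in ciphertexts:
        if lo <= y < hi:                                   # (a) y belongs to interval i
            x = round((y - b) / a)                         #     decrypt with the leaked pair
            delta = (rng.random() - 0.5) * a_new * 0.99    #     fresh noise, |delta| < a'/2
            y = a_new * x + b_new + delta                  # (c) re-encrypt with the new pair
        updated.append(y)
    return updated
```

Only the ciphertexts of the affected interval are touched, which is why the cost grows with the number of rows stored in that interval.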

7. Implementation and evaluation

7.1. Implementation details

Popa's scheme (Popa et al., 2013) proves that any stateful OPE scheme that is IND-OCPA-secure has ciphertext size exponential in the plaintext size; thus, extending the ciphertext space can effectively enhance security. Our OPE scheme can be implemented in two ways:

• Preserving the data format and generating a numerical OPE ciphertext. In practical applications, the number range is generally not large, and the database provides datatypes with a large range for numerical data; for example, the range of "real" in SQL Server is from −3.40E+38 to 3.40E+38, which is enough for some OPE applications. In this way, the OPE ciphertext is a real number and can be stored in the original field, which leads to minimal changes to existing software.

• Generating the OPE ciphertext with the string datatype. To use a larger ciphertext space, the OPE ciphertext can be a character string that preserves order. However, strings are compared differently from numerical data, so to ensure correctness all OPE ciphertexts must have the same length. To represent a big number, we can use a fixed-length hex string. Data whose hex string is shorter than the fixed length len is padded with leading '0' characters; for example, assuming len = 10, the hex string of the integer 15 is "000000000F". In this way, we can use a datatype like "varchar" of length len to represent a range with a maximal value of $16^{len}/2$. This way is less efficient than the first.
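The fixed-length hex encoding can be illustrated with a few lines of Python. This is a sketch under assumptions: the helper names `to_fixed_hex` and `from_fixed_hex` are hypothetical, ciphertexts are taken to be non-negative integers, and upper-case digits are used so that plain character-by-character comparison matches numeric order.

```python
def to_fixed_hex(value, length):
    """Encode a non-negative integer as an upper-case hex string of fixed length."""
    if value < 0 or value >= 16 ** length:
        raise ValueError("value does not fit into the fixed length")
    return f"{value:0{length}X}"       # left-pad with '0' up to `length` digits

def from_fixed_hex(text):
    """Decode a fixed-length hex string back into the integer."""
    return int(text, 16)
```

Here `to_fixed_hex(15, 10)` gives "000000000F". Because every string has the same length, comparing two encoded values character by character agrees with comparing the underlying integers, although the collation configured for the database column should still be checked.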

Figure 4 shows how to use our OPE scheme in database applications. The OPE operations, including encryption and decryption, are executed only in the OPE client. As a result, the scheme is easy to deploy and can be realized in any programming language, including Java, C++, C#, and so on. The field should be changed to one of two possible datatypes, i.e., "real" or "varchar": the former suits most database applications, and the latter is for security considerations. If "real" is used, fewer changes to existing software are needed.

Fig. 4. OPE usage in database application.

7.2. Performance evaluation

To evaluate the performance of the OPE scheme, we focus on the following two issues: (1) whether the performance of the OPE encryption algorithms can meet the needs of batch data encryption and concurrency among multiple users; (2) whether the distribution of the original data is protected in the OPE ciphertext.

7.2.1. Concrete split method

To implement our OPE model and run the experiment, we need a concrete split function. We suppose there is only one dense interval $T_1$ in our experiment, so $M = R_1 + T_1 + R_2$, where $R_1, R_2$ are the remaining intervals. We then randomly select two integers $r_1, r_2$, and split $R_1, R_2$ into $r_1$ and $r_2$ equal intervals, respectively. The dense interval $T_1 = (l, r]$, however, is split as follows (shown in Fig. 5):

$$T_1 = (d_{min} + m) \cup \cdots \cup d_{min} \cup \cdots \cup (d_{min} + m).$$

Besides, the ciphertext space $C$ can be split in the same way as the plaintext space, and we omit the details. With the above split method, $index(x)$ and $range(i)$ are determined, i.e.,

$$index(x) = \begin{cases} \dfrac{x - x_0}{t_1}, & x \le l, \\[2mm] \dfrac{D_{min} + m + \sqrt{(D_{min} + m)^2 + 4(x - R_1)}}{2} + r_1, & l < x \le r, \\[2mm] \dfrac{x - x_2}{t_3} + r_1 + r_2, & r < x. \end{cases}$$

The parameters left undefined here are the endpoint values or the total numbers of the corresponding intervals. Thus $index(x)$ can be put in a form that is easy to compute, and so can $range(i)$ under the split method above.
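As an alternative to a closed-form index, the split can be stored as a sorted list of interval endpoints and searched by binary search. The Python sketch below uses that swapped-in technique and is not the paper's formula. The names `build_split`, `index_of` and `range_of` are hypothetical, the sub-interval lengths of $T_1$ are assumed to change in steps of 1 (from $d_{min}+m$ down to $d_{min}$ and back up), intervals are taken as closed on the left, and indices start at 0.

```python
import bisect

def build_split(x_min, x_max, l, d_min, m, r1, r2):
    """Return the sorted endpoints of the split M = R1 + T1 + R2."""
    # Dense interval T1: lengths d_min+m, ..., d_min+1, d_min, d_min+1, ..., d_min+m.
    lengths = list(range(d_min + m, d_min, -1)) + list(range(d_min, d_min + m + 1))
    r = l + sum(lengths)
    if not x_min < l < r < x_max:
        raise ValueError("dense interval does not fit into the message space")
    bounds = [x_min + k * (l - x_min) / r1 for k in range(r1)]   # R1: r1 equal parts
    edge = l
    for length in lengths:                                       # T1: unequal parts
        bounds.append(edge)
        edge += length
    bounds += [r + k * (x_max - r) / r2 for k in range(r2)]      # R2: r2 equal parts
    bounds.append(x_max)
    return bounds

def index_of(bounds, x):
    """0-based number of the interval [bounds[i], bounds[i+1]) that contains x."""
    if not bounds[0] <= x < bounds[-1]:
        raise ValueError("value outside the split space")
    return bisect.bisect_right(bounds, x) - 1

def range_of(bounds, i):
    """Endpoints of the i-th interval."""
    return bounds[i], bounds[i + 1]
```

For instance, `build_split(0, 100, 70, 2, 2, 7, 2)` splits [0, 70) into seven equal parts, the dense interval [70, 86) into parts of length 4, 3, 2, 3 and 4, and [86, 100) into two equal parts. The binary search costs a number of steps logarithmic in the number of intervals, which keeps both lookups easy to compute in the sense required above.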

7.2.2. Experimental result

We set the ciphertext space $C$ to the range of "real" and implemented the scheme in C++ to measure the average execution time.


Fig. 5. An example of splitting the message space.

Fig. 6. System model for outsourced database. (a) Execution time of OPE, (b) data distribution of score and (c) data distribution of ciphertexts.

We measured the performance on a machine with an Intel Core(TM) i7-3517U processor running Windows 7. The results are shown in Fig. 6, and we can see that:

1. Figure 6a shows the average execution time of OPE. The average execution time is about 0.00025us, so our OPE scheme is highly efficient.
2. Figure 6b shows the data distribution of the original data collection of students' scores. In Fig. 6b, the scores of the students lie mainly between 70 and 88.
3. Figure 6c shows the data distribution of the OPE ciphertexts of the students' scores. In Fig. 6c, the original data distribution is obviously destroyed.

7.3. Comparison

Table 2 shows the comparison between our OPE scheme and other typical OPE schemes, including Popa'13 (Popa et al., 2013) and Liu'13 (Liu and Wang, 2013).

Table 2
Comparison between our scheme and other typical OPE schemes.

| Scheme | Efficiency level | Security level | Programmability |
|---|---|---|---|
| Our scheme (Boldyreva et al., 2011) | High | Medium | High |
| Liu'13 (Liu and Wang, 2013) | High | Low | High |
| Popa'13 (Popa et al., 2013) | Low | High | Low |

From Table 2, we can see that:

1. Efficiency: Popa's scheme has the lowest performance. In Popa's scheme, the client must interact with the server whenever it encrypts a value, and the server must rebalance the encoding tree whenever a node is added or removed. Our scheme and Liu's scheme are constructed from linear mathematical functions without any interaction, and they can be regarded as having the same efficiency.
2. Security: Popa's scheme has ideal security, whereas Liu's scheme has the lowest security. Compared with Liu's scheme, our scheme achieves security against the ciphertext-only attack; in particular, it uses message space expansion and nonlinear space split to hide the data distribution and frequency, and thus it can resist the statistical attack.


3. Programmability: Our scheme and Liu's scheme are better than Popa's scheme. Linear OPE schemes, including our scheme and Liu's scheme, are easy to implement, and the implementation of our scheme has been discussed above. In Popa's scheme, however, in addition to the interaction and tree-balancing operations, user-defined functions must be implemented for each database, which increases the difficulty of implementation.

8. Conclusion

From our survey of the proposed OPE schemes, we conclude that an OPE scheme must hide the order in the ciphertext to achieve high security. However, with this approach the database server can no longer support direct order operations, which limits the application of the OPE scheme. We also find that most of the proposed OPE schemes do not take statistical characteristics into consideration, and we therefore introduce the practical statistical attack. We point out that hiding the rule of data distribution and data frequency, while still supporting direct order comparison, is very important for an OPE scheme, and it is also the goal of an OPE scheme. Based on further research on Liu's scheme (Liu and Wang, 2013), we proposed a new OPE model. With the help of the noise and the extended space, we offer several ways to break the statistical characteristics of the plaintext in order to resist the ciphertext-only attack. The security analysis and performance evaluation show that our OPE scheme is both secure and efficient. Our OPE model can be implemented in any programming language, and users can define their own split methods and encrypt function. We will further study how to provide a formal nonlinear encrypt function and a new, general and perfect split function.

Acknowledgments

This work is supported by the National Key Basic Research Program of China (No. 2013CB834204), National Natural Science Foundation of China (Nos. 61272423 and 61300241), National Natural Science Foundation of Tianjin (Nos. 12JCYBJC10100, 13JCQNJC00300 and 14JCYBJC15300), and Specialized Research Fund for the Doctoral Program of Higher Education of China (No. 20120031120036).

References

Agrawal R, Srikant R. Privacy-preserving data mining. ACM Sigmod Record 2000;29(2):439–50.
Agrawal R, Kiernan J, Srikant R, Xu Y. Order preserving encryption for numeric data. In: Proceedings of the 2004 ACM SIGMOD international conference on management of data. ACM; 2004.
Agrawal D, El Abbadi A, Emekci F, Metwally A. Database management as a service: challenges and opportunities. Data engineering. In: IEEE 25th international conference on ICDE'09. IEEE; 2009.
Boldyreva A, Chenette N, Lee Y, O'Neill A. Order-preserving symmetric encryption. Advances in Cryptology-EUROCRYPT 2009. Berlin, Heidelberg: Springer; 2009. p. 224–41.
Boldyreva A, Chenette N, O'Neill A. Order-preserving encryption revisited: improved security analysis and alternative solutions. Advances in Cryptology-CRYPTO 2011. Berlin, Heidelberg: Springer; 2011. p. 578–95.
Fung B, Wang K, Chen R, Yu PS. Privacy-preserving data publishing: a survey of recent developments. ACM Comput Surv CSUR 2010;42(4):14.
Kadhem H, Amagasa T, Kitagawa H. MV-OPES: multivalued-order preserving encryption scheme: a novel scheme for encrypting integer value to many different values. IEICE Trans Inf Syst 2010;93(9):2520–33.
Lindell Y, Pinkas B. Privacy preserving data mining. Advances in Cryptology-CRYPTO 2000. Berlin, Heidelberg: Springer; 2000.
Liu D, Wang S. Nonlinear order preserving index for encrypted database query in service cloud environments. Concurr Comput Pract Exp 2013;25(13):1967–84.
Popa RA, Redfield MSC, Zeldovich N, Balakrishnan H. CryptDB: protecting confidentiality with encrypted query processing. In: Proceedings of the twenty-third ACM symposium on operating systems principles. ACM; 2011.
Popa RA, Li FH, Zeldovich N. An ideal-security protocol for order-preserving encoding. In: 2013 IEEE symposium on Security and Privacy (S&P). 2013.
Lee S, Park TJ, Lee D, Nam T, Kim S. Chaotic order preserving encryption for efficient and secure queries on databases. IEICE Trans Inf Syst 2009;92(11):2207–17.
Vaidya J, Clifton C. Privacy preserving association rule mining in vertically partitioned data. In: Proceedings of the eighth ACM SIGKDD international conference on knowledge discovery and data mining. ACM; 2002.
Yum DH, Kim DS, Kim JS, Lee PJ, Hong SJ. Order-preserving encryption for nonuniformly distributed plaintexts. Information security applications. Berlin, Heidelberg: Springer; 2012. p. 84–97.
