Correctness Verification in Database Outsourcing: A Trust-Based Fake Tuples Approach

Simin Ghasemi, Morteza Noferesti, Mohammad Ali Hadavi, Sadegh Dorri Nogoorani, and Rasool Jalili

Data and Network Security Lab, Sharif University of Technology, Tehran, Iran
{sghasemi@ce., mnoferesti@ce., mhadavi@ce., dorri@ce., jalili@}sharif.edu

Proc. 8th Int'l Conf. Information Systems Security (ICISS), Guwahati, India, Dec. 2012, pp. 343-351. DOI: 10.1007/978-3-642-35130-3_24. The original publication is available at www.springerlink.com.

Abstract. An important security challenge in database outsourcing scenarios is the correctness verification of query results. The approaches proposed in the literature impose a high overhead on both the service provider and, especially, the clients. In this paper, we propose the Trust-Based Fake Tuples approach to audit the correctness of query results. In this approach, some fake tuples are included among the real ones in order to verify the correctness of the results. The experience gained from past results is used to evaluate the trust toward the service provider. This trust value is used to tune the number of fake tuples and, consequently, the imposed overhead: as the trust toward the service provider increases, the number of fake tuples and the imposed overhead decrease. The experimental results confirm the effectiveness of our approach in reducing the overhead while keeping the probability of incorrect results at an acceptable level.

Keywords: Database Outsourcing, Correctness Verification, Trust, Fake Tuple.

1 Introduction

In data outsourcing scenarios, or the Database-As-a-Service (DAS) model [1], data and its management are outsourced to an untrusted third-party service provider. This scenario introduces new security challenges, with confidentiality and correctness verification among the most important ones. An untrusted service provider may access or distribute data and violate the confidentiality of sensitive data [2]. Moreover, as database management is handed over to the service provider, it must execute the queries honestly. Hence, there should be a way to examine to what extent the returned tuples satisfy the query condition. This problem is referred to as correctness assurance or correctness verification of query results, which includes integrity, completeness, and freshness of the results. Integrity means that the result is generated solely from the outsourced data and has not been tampered with. Completeness indicates that all tuples satisfying the query condition are included in the query result. Freshness signifies that the result is generated from the latest updates on the outsourced data items; this aspect of correctness is particularly important in dynamic environments where data is updated frequently. This paper focuses on the correctness assurance problem, especially on the completeness and freshness aspects, and proposes a novel way to efficiently verify the correctness of query results.

In our setting of the DAS model, the data owner, in the role of a Client (C), outsources the data and its management as well as query execution to a Service Provider (SP). The SP has a significant amount of resources and is proficient in building and managing distributed cloud storage servers. We assume that C is the single querier that sends queries to SP and audits the correctness of the returned results. To address the correctness verification issue, a database outsourcing scheme should provide a correctness proof for every possible query that C may execute. C should be equipped with security means so that it can verify the correctness of the outsourced data. Therefore, SP sends some extra information to C as a Verification Object (VO) along with the query results. The VO is used by C to audit the correctness of the query results. In a database outsourcing scheme, the client-side verification overhead is more crucial than the server-side overhead because the client has limited computation, communication, and storage resources.

The existing approaches for correctness verification use methods such as digital signatures [3, 4] and the Merkle Hash Tree (MHT) [5]. Xie et al. [6, 7] have proposed a probabilistic approach that adds a number of fake tuples to the outsourced database. In their approach, the fake tuples are generated by a deterministic function. The client verifies the completeness of query results by checking whether all the fake tuples satisfying the query conditions are returned by the server. The integrity of query results is guaranteed by encrypting the outsourced data. To verify the freshness of the results, the data owner frequently modifies the outsourced database with some deterministic functions and checks the outsourced database status. If the server modifies the database based on the update queries, the data owner concludes with some probability that the server's results are fresh.

A practical approach to correctness verification must be acceptably efficient in operation. However, previous approaches impose a high overhead on the DAS components. In this paper, we propose a more efficient probabilistic approach based on the concept of trust. In this approach, the correctness ratio of past query results provides some sense of the behavior of SP. Based on this factor, the overhead can be decreased when the past behavior of SP indicates its honesty. We experimentally show that our approach has higher performance compared to the existing methods.

The remainder of this paper is organized as follows. Our proposed approach is described in Section 2 and its pros and cons are discussed in Section 3. The empirical evaluation is explained in Section 4. Finally, Section 5 concludes the paper.

2 The Proposed Approach

Our proposal builds upon the approach of Xie et al. [6], in which the correctness of query results is verified by introducing some fake tuples among the real ones. These fake tuples induce storage, bandwidth, and processing overheads on C and SP. The number of fake tuples, and consequently these overheads, is fixed in [6] because the level of trust between C and SP does not change. In our proposal, we remove this constraint and dynamically tune the overheads according to the behavior of SP. The Bayesian interpretation of probability (belief) is used to represent the trust between C and SP. Considering this level of trust, the number of fake tuples in the outsourced database is controlled according to the history of SP's behavior. More specifically, C calculates a trust value that reflects the past behavior of SP. Based on this value, the number of fake tuples maintained in the outsourced database is determined. Hence, we call our proposal the Trust-Based Fake Tuples (TBFT) approach. TBFT is a history-based, light-weight, probabilistic correctness verification technique.

Fig. 1. General Scheme of TBFT

The overall scheme of TBFT is illustrated in Fig. 1. There are three main components in our scheme, namely Deterministic Fake Tuple Generation, Updating Trust, and Updating Fake Tuples. The Deterministic Fake Tuple Generation component refers to the initialization performed by C before outsourcing the database to SP. C creates a number of fake tuples using some deterministic functions and outsources them alongside the real tuples to the untrusted SP. This component is explained in more detail in Section 2.1. As each transaction between C and SP is processed, the correctness verification process changes C's trust value toward SP. In TBFT, the trust value increases additively and decreases multiplicatively. The trust parameter and the trust update operations are explained in Section 2.2. An increase in the trust value gradually decreases the number of fake tuples outsourced by C. Similarly, a reduction in the trust value leads to a more precise correctness auditing process through the insertion of extra fake tuples. These operations are explained in more detail in Section 2.3.

We focus on the completeness and freshness aspects of the outsourced database in TBFT. Nevertheless, the confidentiality and integrity aspects are guaranteed by adopting an approach built on top of an Order Preserving Encryption scheme similar to [6]. More specifically, we encrypt each field $a_i$ of the table with an encryption function $E$ and a private key $k$, producing $E_k(a_i)$.
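As an illustration only, this encryption layer can be abstracted behind a minimal interface like the sketch below; the interface name and method signatures are ours, not part of [6], and the concrete order preserving construction is left unspecified.

// Illustrative sketch (not the scheme of [6]): per-field order preserving
// encryption with a private key k held by C. For numeric fields,
// a < b implies encrypt(a) < encrypt(b), so SP can evaluate range
// predicates such as BETWEEN directly on ciphertexts.
public interface OrderPreservingEncryption {
    long encrypt(long plaintext);   // computes E_k(a_i)
    long decrypt(long ciphertext);  // inverts E_k at the client side
}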

2.1 Deterministic Fake Tuple Generation

We use the predefined, deterministic functions proposed by Xie et al. [6] to generate fake tuples. These functions map $n-1$ attributes of a table of $n$ attributes to the remaining attribute, $f : D_1 \times D_2 \times \cdots \times D_{n-1} \rightarrow D_n$, where $D_i$ is the domain of the $i$-th attribute. C stores these functions along with some other metadata, such as the initial values and the number of fake tuples generated by each function, to audit the correctness of query results. During the verification process, C uses these functions to obtain the number of fake tuples that satisfy the query conditions and compares it with the number of fake tuples in the query result. If they are equal, the query results are correct with some probability; otherwise, they are definitely wrong.

The number of fake tuples is an important parameter at the beginning (when the owner outsources the database), as it determines the initial overhead on the C and SP sides. This number is proportional to the number of real tuples, as shown in Equation 1, where $K$ is the number of fake tuples, $N$ is the number of real tuples in the outsourced table, and $I$ is a fixed coefficient. The coefficient $I$ ($0 \le I \le 1$) is application-dependent and initialized by C. If an application needs more assurance, $I$ is initialized closer to 1.

$$K = I \times N \quad (1)$$

In order to generate fake tuples according to the distribution of the real ones, the whole $n$-dimensional feature space is partitioned into a grid of cells, and the ratio of real tuples in each cell determines the number of fake tuples in that cell [6].
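As a minimal sketch of this construction, the following class generates fake tuples for the four-attribute test table of Section 4, using a keyed hash as the deterministic function f and a simple modular rule as a stand-in for the grid-based placement; all names, constants, and the hash itself are illustrative assumptions, not the concrete functions of [6].

import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: deterministic fake tuple generation and recognition.
public class FakeTupleGenerator {
    private final long seed;           // secret kept by C, never revealed to SP
    private final double iCoefficient; // the coefficient I of Equation 1

    public FakeTupleGenerator(long seed, double iCoefficient) {
        this.seed = seed;
        this.iCoefficient = iCoefficient;
    }

    // Deterministic map f: (price, weight) -> header, known only to C.
    private long f(long price, long weight) {
        long h = seed;
        h = h * 6364136223846793005L + price;
        h = h * 6364136223846793005L + weight;
        return h;
    }

    // Generate K = I * N fake tuples (Equation 1) from deterministic initial values.
    // A real implementation would place them cell by cell to mimic the data distribution.
    public List<long[]> generate(int realTupleCount) {
        int k = (int) (iCoefficient * realTupleCount);
        List<long[]> fakes = new ArrayList<>(k);
        for (int i = 0; i < k; i++) {
            long price  = Math.floorMod(seed + 31L * i, 10_000L); // stand-in for grid placement
            long weight = Math.floorMod(seed + 37L * i, 1_000L);
            fakes.add(new long[] { i, price, weight, f(price, weight) });
        }
        return fakes;
    }

    // C recognizes a fake tuple by recomputing its last attribute;
    // a real tuple satisfies this equality only with negligible probability.
    public boolean isFake(long[] tuple) {
        return tuple[3] == f(tuple[1], tuple[2]);
    }
}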

2.2 Updating Trust

The trust parameter $T$ is a value in $[0,1]$ indicating our trust toward SP. Initially, it is set to zero because SP is fully untrusted. The trust value increases as transactions are validated, which smoothly reduces the verification overheads. In order to keep the verification process in place even for a fully trusted SP, the trust value is limited to an upper bound called the Trust Threshold ($0 < TT < 1$), which is initialized by C.

The trust value is calculated and updated according to the correctness of the returned results per query. If the number of fake tuples in the result is the same as the number expected by C, then C increases the trust value and accepts the results with some probability. Otherwise, C decreases the trust value and rejects the results. In this case, C also uses the mechanisms specified in the Service Level Agreement between C and SP to penalize SP. The trust value increases additively with the correct execution of queries (Equation 2) and decreases multiplicatively with an incorrect answer (Equation 3) to decrease the risk of sudden changes in the behavior of SP.

$$T_{new} = \min\left(T_{current} + \alpha \cdot \frac{Q_K}{Q_N},\; TT\right) \quad (2)$$

$$T_{new} = \min\left(\frac{Q_K}{Q_N} \cdot T_{current},\; TT\right) \quad (3)$$

In both equations, $T_{current}$ is the current trust value toward SP, $Q_N$ is the number of tuples in the query result, $Q_K$ is the number of fake tuples in the query result ($Q_K \le Q_N$), and $\alpha$ is a coefficient in $[0,1]$ that adjusts the speed of trust changes. Fig. 2 depicts an example of trust variation: the trust value increases gradually over 1000 valid transactions, but drops sharply after just two incorrect query results. This asymmetry keeps the risk of trusting a misbehaving SP low.

[Fig. 2. Examples of Trust Variations: trust value vs. number of transactions for trust increment (with Alpha = 0.02 and Alpha = 0.1) and for trust decrement.]
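The update rules of Equations 2 and 3 translate directly into code. The following is a minimal sketch under our own naming; the SLA penalty handling is omitted.

// Illustrative sketch: additive-increase / multiplicative-decrease trust update.
public class TrustManager {
    private double trust = 0.0;           // SP starts fully untrusted
    private final double trustThreshold;  // TT, the upper bound chosen by C
    private final double alpha;           // speed of trust increase, in [0,1]

    public TrustManager(double trustThreshold, double alpha) {
        this.trustThreshold = trustThreshold;
        this.alpha = alpha;
    }

    // Equation 2: additive increase after a correct query result.
    public void onCorrectResult(long qK, long qN) {
        trust = Math.min(trust + alpha * ((double) qK / qN), trustThreshold);
    }

    // Equation 3: multiplicative decrease after an incorrect query result.
    public void onIncorrectResult(long qK, long qN) {
        trust = Math.min(((double) qK / qN) * trust, trustThreshold);
    }

    public double currentTrust() { return trust; }
}

For example, with alpha = 0.1, a correct result containing 100 fake tuples among 1000 raises the trust value by 0.01, while a single incorrect result with the same ratio cuts the trust value by a factor of ten.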

2.3 Updating Fake Tuples

Variations in the trust value affect the number of fake tuples on the SP side, which optimizes the overhead. When the trust value increases/decreases, C decreases/increases the number of fake tuples in the outsourced database by deleting/inserting some fake tuples. Equation 4 shows how the new number of fake tuples, $K_{new}$, is calculated.

$$K_{new} = I \times N - T_{current} \times I \times N \quad (4)$$

In order to prevent SP from detecting the fake tuples, these insertion and deletion queries must have the same distribution as the real update queries. In addition, C performs these updates in batch mode when SP is not busy.

Another advantage of the deterministic fake operations is auditing the freshness of query results. More specifically, the fake update queries resulting from trust variations change the outsourced fake tuples. Accordingly, an honest SP must produce its results from an up-to-date snapshot of the data.
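A sketch of this adjustment, taken directly from Equation 4 (the names are ours), might be:

// Illustrative sketch of Equation 4: the target number of outsourced fake tuples.
public class FakeTupleBudget {
    private final double iCoefficient; // I, fixed when the database is outsourced
    private final long realTupleCount; // N

    public FakeTupleBudget(double iCoefficient, long realTupleCount) {
        this.iCoefficient = iCoefficient;
        this.realTupleCount = realTupleCount;
    }

    // K_new = I*N - T_current*I*N = (1 - T_current)*I*N
    public long targetFakeTupleCount(double currentTrust) {
        return Math.round((1.0 - currentTrust) * iCoefficient * realTupleCount);
    }

    // Positive delta: insert fake tuples; negative: delete some.
    // The resulting updates are issued in batch, off-peak, and shaped
    // to follow the distribution of real update queries.
    public long delta(double currentTrust, long currentFakeTupleCount) {
        return targetFakeTupleCount(currentTrust) - currentFakeTupleCount;
    }
}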

3 Analysis and Discussion of TBFT

Correctness verification in TBFT is more efficient than in previous work because it takes the past behavior of SP into account. Recommendation systems with numerous transactions between C and SP are among the most relevant applications of TBFT. In such systems, there is no need for an absolutely correct query answer; rather, a correct answer with some desired probability is acceptable. In other words, we trade the accuracy of the query result set for lower overhead and higher performance.

In probabilistic approaches, including TBFT, SP has a chance to delete some tuples from the result set without being spotted; this chance is called the Escape Probability (EP). In TBFT, assuming $K$ fake tuples, $N$ real tuples, and a trust value of $T_{current}$, the probability of deleting $m$ tuples without being detected can be calculated by Equation 5.

$$EP = \frac{\dbinom{N - T_{current} \cdot N}{m}}{\dbinom{N - T_{current} \cdot N + K}{m}} = \prod_{i=0}^{m-1} \frac{N - T_{current} \cdot N - i}{N - T_{current} \cdot N + K - i} \quad (5)$$

Here, the maximum value of $m$ is $N$, and EP is zero for $m \ge N$, because deleting more than $N$ tuples inevitably deletes some fake tuples and is definitely detected by C. The equation is based on the fact that when C has trust $T_{current}$ in SP, $T_{current} \times N$ tuples are assumed to be safe and cannot be removed without detection; if any deleted tuple were among these safe tuples, C would catch the deletion. The Correctness Probability (CP) of query results is the complement of EP.

[Fig. 3. The Escape Probability of TBFT: EP vs. number of deletions for 10% fake tuples (left) and 30% fake tuples (right), each with an untrusted SP and with 25%, 50%, and 75% trust values.]
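The product form of Equation 5 is straightforward to evaluate numerically; the following sketch (our own helper, used here only to illustrate how curves like those in Fig. 3 can be reproduced) computes EP for given N, K, T_current, and m.

// Illustrative sketch of Equation 5: the probability that deleting m tuples goes undetected.
public final class EscapeProbability {
    private EscapeProbability() {}

    public static double compute(long n, long k, double trust, long m) {
        double unprotected = n - trust * n;  // N - T_current*N tuples not assumed safe
        if (m > unprotected) {
            return 0.0;  // a protected or fake tuple would necessarily be deleted
        }
        double ep = 1.0;
        for (long i = 0; i < m; i++) {
            ep *= (unprotected - i) / (unprotected + k - i);
        }
        return ep;
    }

    public static void main(String[] args) {
        // E.g., 10% fake tuples and 50% trust: delete 20 of 1000 real tuples.
        System.out.println(compute(1000, 100, 0.5, 20));
    }
}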

The diagrams in Fig. 3 show different values of EP according to Equation 5. We calculated EP for two fake tuple percentages and several trust values to show the remarkable effect of trust on the EP. Comparing these results indicates that a small trust value compensates for a reduction in the number of fake tuples. For example, the EP curve for 10% fake tuples and a 50% trust value is the same as the EP curve for 30% fake tuples and an untrusted SP. This means that with the introduction of trust, the overhead decreases significantly, which is the main advantage of our approach.

In TBFT, there is a possibility that SP behaves honestly at the beginning and misbehaves infrequently later. To prevent this strategy, we suggest that the current trust value occasionally be reset to zero at random times. With this measure, SP can no longer guess the value of trust.

4 Empirical Evaluation

We used a PC with an Intel Core i5 processor and 4 GB of RAM to act as both SP and C. The Java language was used to simulate the method on both sides, and the MySQL DBMS was used to store the outsourced tuples. A table for an online shopping site was generated as the test table with four attributes, namely Object_ID, Object_Price, Object_Weight, and Header. The Header attribute is a digest of the tuple which is used to audit the integrity of tuples and to distinguish the fake tuples from the real ones. SP executes queries on the hosted database and sends the result to C. Then, C verifies the correctness of the results. In this evaluation, we set I = 0.1. We generated a table with 1,000,000 real tuples and 100,000 fake tuples and used the following query template to evaluate the approach:

SELECT * FROM TableName WHERE Price BETWEEN A AND B;
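To make the client-side check concrete, the following sketch verifies the completeness of a result for the range query template above by comparing the expected and returned numbers of fake tuples; it reuses the hypothetical FakeTupleGenerator sketched in Section 2.1, and all names are again illustrative.

import java.util.List;

// Illustrative sketch: client-side completeness check for a Price range query.
public class CompletenessVerifier {
    private final FakeTupleGenerator generator; // hypothetical, from Section 2.1

    public CompletenessVerifier(FakeTupleGenerator generator) {
        this.generator = generator;
    }

    // True iff the result holds exactly the fake tuples C expects in [a, b].
    // Equal counts mean the result is correct with some probability;
    // unequal counts mean it is definitely wrong.
    public boolean verify(List<long[]> result, List<long[]> allFakeTuples, long a, long b) {
        long expected = allFakeTuples.stream()
                .filter(t -> t[1] >= a && t[1] <= b)  // fake tuples whose Price is in range
                .count();
        long returned = result.stream()
                .filter(generator::isFake)
                .count();
        return expected == returned;
    }
}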

4.1 Comparison with Other Methods

In TBFT, the trust value toward SP increases as transactions between SP and C are verified, and the number of fake tuples decreases proportionally. After executing a number of queries, the required storage on SP decreases, especially when the trust value reaches its maximum. This effect is shown in Fig. 4 (a). Although the SP's storage is less critical in the database outsourcing scenario, the reduction in the SP's storage overhead improves performance during query execution. The extra storage in TBFT is proportional to the number of fake tuples, which itself is inversely proportional to the trust value. In the experiment on which Fig. 4 (a) is based, we initially populated the test table with 1,100,000 tuples. As can be seen in Fig. 4 (a), the required storage on SP decreases smoothly as more queries are executed. After 500 executions of a query with a large result set, the storage has decreased significantly and the server-side storage overhead reaches its minimum value, as determined by TT.

Fig. 4 (b) shows the reduction of communication cost for TBFT in comparison with other approaches. In this experiment, we repeated the execution of a query with 10,000 tuples in the result set. As the increase of the trust value reduces the number of outsourced fake tuples, the number of fake tuples in the query result decreases. This leads to a reduction in communication cost, which is one of the important overheads in the correctness verification process. The staircase shape indicates that the trust effect takes hold gradually. For example, from P1 to P2 the number of outsourced fake tuples is constant and consequently the communication cost is stable; after the new, increased trust value is applied, the number of outsourced fake tuples and the communication overhead decrease.

[Fig. 4. Storage and Communication Analysis: (a) server storage (kilobytes) and (b) communication cost (kilobytes) vs. number of query executions, comparing TBFT with the fake tuples [6], digital signature-based [3], and MHT-based [5] approaches. Points P1 and P2 mark a constant-trust interval in (b).]

The most important performance improvement of TBFT is the reduction of client computation, shown in Fig. 5. A trust increase reduces the number of outsourced fake tuples and subsequently the computation time at C, as the computation overhead is mostly related to fake tuple verification. Therefore, by reducing the number of fake tuples, the computation time at C decreases. As Fig. 5 confirms, the reduction in computation cost after a series of queries that leads to a high trust value is remarkable.

[Fig. 5. Client Computation Analysis: client computation cost (milliseconds) vs. result size (tuples), comparing TBFT with 75% trust against the fake tuples [6], digital signature-based [3], and MHT-based [5] approaches.]

5 Conclusion

The database outsourcing scenario faces several security challenges, including the correctness assurance of query results. In this paper, we focused on this issue and proposed TBFT, a trust-based approach using fake tuples, to audit the completeness and freshness of the query results returned by the service provider. The approach is probabilistic and imposes an acceptable overhead, which makes it appropriate for recommendation systems and similar applications in which an absolutely correct result is less crucial than a light-weight verification overhead. As future work, we plan to extend our approach to the multi-querier model of database outsourcing [8], utilizing extended models of trust management for distributed environments.

References

1. Hacigumus, H., Iyer, B.: Providing Database as a Service. In: International Conference on Data Engineering (2002)
2. Samarati, P., De Capitani di Vimercati, S.: Data Protection in Outsourcing Scenarios: Issues and Directions. In: ASIACCS '10, Proceedings of the 5th ACM Symposium on Information, Computer and Communications Security, pp. 1-14 (2010)
3. Narasimha, M., Tsudik, G.: Authentication of Outsourced Databases Using Signature Aggregation and Chaining. In: Database Systems for Advanced Applications (DASFAA), pp. 420-436 (2006)
4. Noferesti, M., Hadavi, M., Jalili, R.: A Signature-Based Approach of Correctness Assurance in Data Outsourcing Scenarios. In: Information Systems Security (ICISS 2011), pp. 374-378 (2011)
5. Goodrich, M., Tamassia, R., Triandopoulos, N.: Super-Efficient Verification of Dynamic Outsourced Databases. In: The Cryptographers' Track at the RSA Conference on Topics in Cryptology, pp. 407-424 (2008)
6. Xie, M., Wang, H., Yin, J.: Integrity Auditing of Outsourced Data. In: Conference on Very Large Data Bases (VLDB) (2007)
7. Xie, M., Wang, H., Yin, J.: Providing Freshness Guarantees for Outsourced Databases. In: Conference on Extending Database Technology (EDBT) (2008)
8. Mykletun, E., Narasimha, M.: Authentication and Integrity in Outsourced Databases. In: NDSS (2004)