An Efficient Algorithm for Range and Fuzzy Match Queries over Encrypted Data in Relational Databases
Shaukat Ali, Azhar Rauf*, Huma Javed, Zahoor Jan, Shahid Waheed*, Muhammad Haleem
Department of Computer Science, University of Peshawar, N.W.F.P, Pakistan
{shaukat191, humajaved15, zahoor_jan2003, mu_haleem}@yahoo.com
* {azhar.rauf, khasori}@upesh.edu.pk
Abstract: Data is the central asset of today’s dynamically operating organizations and their businesses, and it is usually stored in databases. An important issue for IT professionals is to secure such data from unauthorized access and intruders. Many levels of security are used to protect business-centric data; among these, data encryption is the final layer. Although encryption makes this layer difficult to breach, it has the potential disadvantage of degrading performance, particularly for queries that require operations on encrypted data. This paper tries to fill this gap. The proposed technique allows users to query over an encrypted column without decrypting the entire column, which improves the performance of the system. The system retrieves only those records that fulfill the user’s search criteria, and the data is decrypted on the fly. The proposed algorithm works well for range and fuzzy match queries. In initial results, the algorithm has retrieved data more efficiently than the existing techniques, with 100% accuracy for range and fuzzy match queries.
I. INTRODUCTION
Data is one of the most prominent assets of an organization, and keeping it secure is important for the organization’s growth. Every organization’s operational data is stored in databases. It is a continuous challenge for IT professionals and companies to develop and implement strategies that keep this data away from unauthorized access. Techniques used to secure databases include Identification/Authentication [1], Views [2], Kerberos models [3], Mandatory Access Control (MAC) [4], Discretionary Access Control (DAC) [5] and data encryption [6]. Traditional security mechanisms alone cannot provide security as adequate as encryption does. A number of security layers are provided to secure data in relational databases, and encryption is the final layer. Even if an intruder bypasses all the other security layers and successfully penetrates the database, he or she still has to break the encryption applied to the sensitive data. In other words, the intruder obtains the data only in unreadable form [8].
ISBN: 978-1-902560-22-9 © 2009 PGNet
Data encryption is a strong option for securing data in databases, especially in organizations where the security risks are high. However, encryption degrades system performance significantly because Structured Query Language (SQL) queries cannot be executed directly on encrypted data: the encrypted data must first be decrypted before a SQL query can operate on it, and this whole process requires a lot of processing time. Some techniques have been proposed to mitigate this performance degradation, but they are limited in their applicability. For example, the hashing technique does not decrypt the content of the entire encrypted column; rather, it decrypts data on the fly by using hash values of the search criteria [9], [10]. However, this technique is limited to full text matches and does not work for range queries or fuzzy match queries.
The proposed research work presents a solution that retrieves data more quickly than the existing techniques. The proposed technique works with all types of queries, including range queries and fuzzy match queries, and it can work with all types of encryption algorithms. The proposed algorithm retrieves only those records that fulfill the search criteria, which improves both system performance and data confidentiality.
The rest of the paper is structured as follows. Section II describes related work. Section III discusses the research problem and hypothesis. Section IV presents the proposed security technique, and Section V describes how to search in the encrypted column. Section VI covers the architecture of the proposed security system, and Section VII presents the algorithm. Section VIII gives the results of testing the algorithm, and Section IX concludes the work.
II. RELATED WORK
For a long time, security was treated as an additional concern in databases, even as attacks on the security and integrity of data occurred [7]. The encryption mechanisms of DBMSs add strong security to data, but encryption has both advantages and drawbacks. Additional security layers protect data against threats, but they require additional expertise and reduce system performance [12]. Database security is an active area of research [6], [7]. Encryption reduces system performance significantly [10], [11], [13], [14].
Reference [10] proposes a hashing technique that executes quickly over encrypted data. The technique uses hash values together with a number called the “confuse number”, which distinguishes two similar values in different records. The cipher index method is used to improve performance while keeping data confidential on untrusted servers [15]. In this method, hash values are used to search data on the untrusted server without decrypting it on the server side; the records matching the hash values are extracted and decrypted on the client side. Reference [16] uses an index method for searching that also supports range queries, but the technique is useful only for numeric data and does not fit character data. Reference [9] proposes a model for querying over encrypted data that uses separate cipher indexes for character and numeric data. Reference [11] proposes a technique that executes SQL directly over encrypted data, using a mapping function to translate queries from the client side to the server side. It preserves data privacy and confidentiality.
III. RESEARCH PROBLEM AND HYPOTHESIS
A search on an encrypted column of a table is performed by decrypting the entire column before data retrieval, which takes a significant amount of time and reduces the performance of SELECT queries in relational databases. Some methods, such as hashing, address this performance problem, but they are not suitable for range and fuzzy match queries. Our research hypothesis is as follows: storing a decrypted copy of the encrypted column, along with the encrypted key column, in a different table increases the performance of the data retrieval process, and using this separate table for searching resolves the problem of range and fuzzy match queries over encrypted data.
IV. PROPOSED SECURITY TECHNIQUE
The proposed technique uses two tables in place of a single main table. The first, named Actual_Table (Table 4.1), contains the actual data; the second, named Search_Table (Table 4.2), contains only the data on which the search query runs. The Actual_Table is essentially the main table of the database, the only difference being that its sensitive columns are stored in encrypted form, encrypted with a strong encryption algorithm. A copy of the sensitive data column, along with the key column, is placed in the Search_Table. In the Search_Table, the data column copied from the Actual_Table is kept in unencrypted form and the key column in encrypted form. The rows of the Search_Table are reordered randomly, so their order differs from that in the Actual_Table. Encrypting the sensitive data column in the Actual_Table and the key column in the Search_Table hides the relationship between the two tables.
The Search_Table is stored in the Secure_Schema, a schema accessible only to users who are also authorized to access the encrypted data. Extra security is added by introducing noise into those records of the Search_Table from which an intruder could draw inferences. In addition, to deceive automated schema generation tools, the columns of the Search_Table use different headings from those of the Actual_Table. In summary, the security of the proposed technique rests on hiding the relationship between the Actual_Table and the Search_Table, on restricting the Search_Table to the Secure_Schema, and on the noise added to the Search_Table.
TABLE 4.1: ACTUAL_TABLE
Key    Emp_Name    Salary       Job Title
1      Ikram       Encrypted    Manager
2      Umar        Encrypted    Assistant manager
3      Shahid      Encrypted    N/A admin
…      …           …            …
TABLE 4.2: SEARCH_TABLE
ABC (Key column of the Actual_Table)    XYZ (Salary column of Actual_Table)
Encrypted                               12000
Encrypted                               10000
Encrypted                               9000
…                                       …
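The two-table layout of Tables 4.1 and 4.2 can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes SQLite as the database, and `toy_encrypt`/`toy_decrypt` are hypothetical placeholder functions (an insecure XOR transform) standing in for the strong encryption algorithm the technique calls for. The pairing of salaries with employees is assumed for the example, and the noise records and the Secure_Schema access restriction are not modeled.

```python
import base64
import random
import sqlite3

SECRET = b"demo-key"  # hypothetical key, for illustration only


def toy_encrypt(value) -> str:
    """Placeholder XOR + base64 transform. NOT secure; stands in for a strong cipher."""
    data = str(value).encode("utf-8")
    mixed = bytes(b ^ SECRET[i % len(SECRET)] for i, b in enumerate(data))
    return base64.b64encode(mixed).decode("ascii")


def toy_decrypt(token: str) -> str:
    """Inverse of toy_encrypt."""
    mixed = base64.b64decode(token)
    data = bytes(b ^ SECRET[i % len(SECRET)] for i, b in enumerate(mixed))
    return data.decode("utf-8")


def build_tables(conn, rows, seed=0):
    """Create both tables from rows of (key, emp_name, salary, job_title)."""
    # Actual_Table: the sensitive Salary column is stored encrypted.
    conn.execute(
        'CREATE TABLE Actual_Table ("Key" INTEGER PRIMARY KEY, '
        "Emp_Name TEXT, Salary TEXT, Job_Title TEXT)"
    )
    # Search_Table: encrypted key (ABC), plaintext salary (XYZ), renamed columns.
    conn.execute("CREATE TABLE Search_Table (ABC TEXT, XYZ INTEGER)")
    conn.executemany(
        "INSERT INTO Actual_Table VALUES (?, ?, ?, ?)",
        [(k, name, toy_encrypt(salary), job) for k, name, salary, job in rows],
    )
    search_rows = [(toy_encrypt(k), salary) for k, _, salary, _ in rows]
    # Shuffle so the row order no longer mirrors Actual_Table.
    random.Random(seed).shuffle(search_rows)
    conn.executemany("INSERT INTO Search_Table VALUES (?, ?)", search_rows)


conn = sqlite3.connect(":memory:")
build_tables(conn, [
    (1, "Ikram", 12000, "Manager"),
    (2, "Umar", 10000, "Assistant manager"),
    (3, "Shahid", 9000, "N/A admin"),
])
```

In this sketch the link between the two tables can be recovered only by decrypting the ABC column, which mirrors the idea of hiding the relationship between Actual_Table and Search_Table.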
V. SEARCH METHODOLOGY
Whenever an authorized user searches the Actual_Table with a condition on the encrypted column, the search is performed on the Search_Table. The search query returns the keys of the matching Actual_Table records, and the records are then returned to the user based on those keys. This technique returns only the records that satisfy the user’s query and no extra records, which improves both performance and data confidentiality.
Whenever a query is posed on the encrypted data column, the proposed algorithm performs decryption at two points: first in the Search_Table, to decrypt the keys, and second in the Actual_Table, to decrypt the actual column values. Although this may appear to degrade performance more than previous techniques do, the experimental findings show a substantial performance gain over the existing techniques for querying over encrypted data. The reason is that the proposed approach does not need to decrypt every value of the encrypted column; it decrypts only those values that satisfy the user’s query. The proposed technique is efficient whenever the amount of data retrieved is less than 40% of the total data. In a typical environment a search query retrieves only a small fraction of the data, so the proposed technique suits the typical environment. The searching mechanism of the algorithm is illustrated by the following example.
EXAMPLE (with reference to Table 4.1)
Suppose a user poses the following query over the Actual_Table:
SELECT Emp_Name, Salary FROM Actual_Table WHERE Salary = 12000
The algorithm interprets this query and transforms it as follows:
SELECT Emp_Name, DecryptFunction(Salary) FROM Actual_Table WHERE Key IN (SELECT DecryptFunction(ABC) FROM Search_Table WHERE XYZ = 12000)
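The rewritten query above can be tried out in a small sketch. It assumes SQLite, registers a hypothetical, insecure placeholder `toy_decrypt` under the name `DecryptFunction`, and pairs salaries with employees arbitrarily for illustration; the `search` helper and its `condition` argument are likewise assumptions, not part of the paper. Because the XYZ column is stored in plaintext, the same rewriting serves exact, range, and fuzzy (LIKE) conditions.

```python
import base64
import sqlite3

SECRET = b"demo-key"  # hypothetical key, for illustration only


def toy_encrypt(value) -> str:
    """Placeholder XOR + base64 transform. NOT secure; stands in for a strong cipher."""
    data = str(value).encode("utf-8")
    mixed = bytes(b ^ SECRET[i % len(SECRET)] for i, b in enumerate(data))
    return base64.b64encode(mixed).decode("ascii")


def toy_decrypt(token: str) -> str:
    """Inverse of toy_encrypt."""
    mixed = base64.b64decode(token)
    data = bytes(b ^ SECRET[i % len(SECRET)] for i, b in enumerate(mixed))
    return data.decode("utf-8")


conn = sqlite3.connect(":memory:")
# Expose the placeholder decryption to SQL under the name used in the example.
conn.create_function("DecryptFunction", 1, toy_decrypt)
conn.execute(
    'CREATE TABLE Actual_Table ("Key" INTEGER PRIMARY KEY, Emp_Name TEXT, Salary TEXT)'
)
conn.execute("CREATE TABLE Search_Table (ABC TEXT, XYZ INTEGER)")
for key, name, salary in [(1, "Ikram", 12000), (2, "Umar", 10000), (3, "Shahid", 9000)]:
    conn.execute(
        "INSERT INTO Actual_Table VALUES (?, ?, ?)", (key, name, toy_encrypt(salary))
    )
    conn.execute("INSERT INTO Search_Table VALUES (?, ?)", (toy_encrypt(key), salary))

# The inner query filters on the plaintext XYZ column and decrypts only the
# matching keys; the outer query decrypts only the matching salaries.
REWRITTEN = """
SELECT Emp_Name, DecryptFunction(Salary)
FROM Actual_Table
WHERE "Key" IN (SELECT CAST(DecryptFunction(ABC) AS INTEGER)
                FROM Search_Table
                WHERE {condition})
ORDER BY "Key"
"""


def search(condition: str, params=()):
    """Run the rewritten query; `condition` is trusted SQL text over the XYZ column."""
    return conn.execute(REWRITTEN.format(condition=condition), params).fetchall()


exact = search("XYZ = ?", (12000,))                    # exact match
in_range = search("XYZ BETWEEN ? AND ?", (9500, 12000))  # range query
fuzzy = search("CAST(XYZ AS TEXT) LIKE ?", ("1%",))     # fuzzy (pattern) match
```

The `CAST` in the inner query is a choice made for this sketch, since the placeholder decryption returns text while the key column is an integer; a real decryption function might return the key in its native type.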
In this example, the user wants to retrieve the records whose salary is 12000. Because the encrypted column appears in the WHERE clause of the query, the proposed algorithm performs the search on the Search_Table. The inner query searches the Search_Table for the keys of the records that satisfy the user’s search criteria and returns them to the WHERE clause of the outer query. The outer query then uses those keys to retrieve exactly the records of interest. The decryption function is called twice: once in the inner query, to decrypt the keys from the Search_Table, and once in the outer query, to decrypt the actual values from the Actual_Table.
VI. ARCHITECTURE OF THE PROPOSED SYSTEM
The architecture of the proposed security model, shown in Figure 6.1, consists of three main parts: the user, the Actual_Table, and the Search_Table, which is stored in the Secure_Schema.
Figure 6.1: Architecture of the Proposed Model
C1: Checks whether the query is on an encrypted column.
C2: Checks the user’s authorization for the Secure_Schema.
C3: Checks whether any record(s) were found.
The user poses a search query. The query is checked to see whether it targets an encrypted column. If it does not, the data is retrieved directly from the Actual_Table; otherwise, the user is authenticated to the Secure_Schema and the data is retrieved indirectly from the Actual_Table via the Search_Table.
VII. ALGORITHMIC OUTLINES
The formal outline of the proposed technique is given in the form of an algorithm (Figure 7.1).
The security model works using the following algorithmic steps:
1. [User Query] User poses query
2. [Check the Searching Column]
   If (searching column is not encrypted)
      Goto step 3
   Else if (Authorized User)
      Goto step 4
   Else
      Goto step 5
3. [Retrieval of Data]
   Retrieve data from Actual_Table
   Goto step 5
4. [Passing Control to the Secure Schema]
   [Check for query match]
   If (no query match) then
      i. Display (“Search is Unsuccessful”)
      ii. Goto step 5
   Else [Retrieval of Encrypted Keys]
      i. Retrieve the corresponding encrypted key(s)
      ii. Decrypt the key(s)
      iii. Retrieve the data from Actual_Table based on key(s)
5. Exit
Figure 7.1: Outlines of the Algorithm
VIII. TESTING AND RESULTS
The proposed algorithm has been tested on a database of Lady Reading Hospital (LRH), Peshawar, Pakistan. A table of blood donors with 59299 records was used for testing. From these records, data was retrieved in steps of 2%, up to 44% of the total data. The results are satisfactory whenever the percentage of data retrieved is less than 40% of the total. In a typical environment a query retrieves only a small fraction of the data, so the proposed technique suits the typical environment.
[Plot: “Comparative Analysis of Querying over Encrypted Column”. x-axis: percentage of retrieved data (2 to 44); y-axis: time in seconds (0 to 2.5); curves: Proposed Technique, Existing Technique]
Figure 8.1: Analysis Results of the Algorithm
Figure 8.1 shows two curves. The upper curve depicts the time taken to decrypt the contents of the entire column and then search it. The lower curve represents the time taken to decrypt only those values that satisfy the user’s search criteria. The proposed technique decrypts two values for each record retrieved: first the key value in the Search_Table, and then the actual value in the Actual_Table. As per the working rules of the algorithm, the two curves should intersect when 50% of the total data is retrieved, but they intersect at 44%. The remaining 6% degradation is due to the extra overhead of joining the Search_Table and the Actual_Table.
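The 50% crossover expected in the discussion of Figure 8.1 follows from a simple count of decryptions: the existing approach decrypts every value in the column, while the proposed approach decrypts two values per retrieved record. The sketch below is a simplified model under that assumption only; the function names are hypothetical, and the join overhead that the paper cites as the reason for the observed 44% crossover is deliberately left out.

```python
def decryptions_existing(total_rows: int) -> int:
    """Existing approach: decrypt every value of the column, then search."""
    return total_rows


def decryptions_proposed(matching_rows: int) -> int:
    """Proposed approach: one key plus one value decrypted per retrieved record."""
    return 2 * matching_rows


def proposed_needs_fewer(total_rows: int, matching_rows: int) -> bool:
    """True when the proposed approach performs strictly fewer decryptions."""
    return decryptions_proposed(matching_rows) < decryptions_existing(total_rows)


TOTAL = 59299  # size of the blood-donor table reported in the paper
for percent in (10, 40, 60):
    matching = TOTAL * percent // 100
    print(percent, decryptions_proposed(matching), proposed_needs_fewer(TOTAL, matching))
```

Under this count the two approaches tie when exactly half of the rows are retrieved, so any per-record cost beyond decryption, such as the join between the two tables, moves the break-even point below 50%.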
IX. CONCLUSION
This work proposes an efficient algorithm for searching over encrypted data. The proposed algorithm eliminates the limitations of the existing techniques for fuzzy match and range queries. It is efficient whenever the data retrieved is less than 40% of the total data.
REFERENCES
[1] P.B. Ambhore, B.B. Meshram and V.B. Waghmare, “A implementation of object oriented database security,” Fifth International Conference on Software Engineering Research, Management and Applications, pp. 359-365, August 2007.
[2] E. Bertino, C. Bettini, E. Ferrari and P. Samarati, “Supporting periodic authorization and temporal reasoning in database access control,” Proceedings of the 22nd Intl. Conf. on Very Large Data Bases, Bombay (India), pp. 472-483, September 3-6, 1996.
[3] J. Kohl and C. Neuman, “The Kerberos Network Authentication Service V5,” IETF RFC 1510, September 1993.
[4] E. Bertino, P. Samarati and S. Jajodia, “An extended authorization model,” IEEE Trans. on Knowledge and Data Engineering, pp. 85-101, January/February 1997.
[5] J.A. Solworth and R.H. Sloan, “A layered design of discretionary access controls with decidable properties,” in Proc. IEEE Symp. Security and Privacy, pp. 56-67, 2004.
[6] D.E. Denning, “Cryptography and Data Security,” Addison-Wesley Publishing Company, Inc., ISBN: 0-201-10150-5, 1982.
[7] R. Agrawal, J. Kiernan, R. Srikant and Y. Xu, “Hippocratic Databases,” Proceedings of the 28th VLDB Conference, Hong Kong, China, 2002.
[8] D. Kiely, “Protect Sensitive Data Using Encryption in SQL Server 2005,” SQL Server Technical Article, 2006.
[9] Z. Wang, W. Wang and B. Shi, “Storage and Query over Encrypted Character and Numerical Data in Database,” Proceedings of the Fifth International Conference on Computer and Information Technology, pp. 77-81, 2005.
[10] Y. Zhang, W.X. Li and X.M. Niu, “A Secure Cipher Index Over Encrypted Character Data In Database,” Proceedings of the Seventh International Conference on Machine Learning and Cybernetics, Kunming, pp. 1111-1115, July 2008.
[11] H. Hacigümüs, B. Iyer, C. Li and S. Mehrotra, “Executing SQL over Encrypted Data in the Database-Service-Provider Model,” ACM SIGMOD, Madison, Wisconsin, USA, pp. 216-227, June 2002.
[12] U.T. Mattsson and C. Protegrity, “A Practical Implementation of Transparent Encryption and Separation of Duties in Enterprise Databases,” Proceedings of the Seventh IEEE International Conference on E-Commerce Technology, pp. 559-565, 2005.
[13] R. L. Rivest and L. Adleman, “On Data Banks and Privacy Homomorphisms,” Computer and Security, pp. 169-178, 1993.
[14] G. Feng and Z. Danfeng, “A Cryptograph Index Technology Based on Wrong Hit Expectation,” International Conference on Electronic Computer Technology, pp. 301-305, 20-22 Feb. 2009.
[15] D. X. Song, D. Wagner and A. Perrig, “Practical techniques for searches on encrypted data,” IEEE Symposium on Security and Privacy, Berkeley, CA, USA, pp. 44-55, 2000.
[16] J. Li and E.R. Omiecinski, “Efficiency and Security Trade-Off in Supporting Range Queries on Encrypted Databases,” Technical Report, pp. 69-83, 2005.