2009 Fourth International Conference on Internet and Web Applications and Services
A Novel Framework for Database Security based on Mixed Cryptography
1
Hasan Kadhem1 , Toshiyuki Amagasa1,2 , Hiroyuki Kitagawa1,2 Department of Computer Science, Graduate School of Systems and Information Engineering 2 Center for Computational Sciences, University of Tsukuba 1-1-1 Tennodai, Tsukuba, Ibaraki 305-8573, Japan
[email protected],{amagasa, kitagawa}@cs.tsukuba.ac.jp Abstract
sensitive data (www.searchsecurity.techtarget.com/news). The true cost of such security breaches is manifold in itself. As mentioned by Cavusoglu et al [2], security lapses can lead to the loss of consumer confidence and trust, over and above lost business and exposure to third-party liability. Ensuring a suitable level of protection to database content affects the overall security model. Traditional techniques rely on access control, user authentication, intrusion detection and policies on how data is used to prevent such thefts and intrusion. Unfortunately, existing techniques cannot ensure that a database is fully immune to intrusion and unauthorized access and these mechanisms are ineffective against most inside attacks. Encryption is a well-studied technique to protect sensitive data so that when a database is compromised by an intruder, data remains protected even when a database is successfully attacked or stolen [5]. Even though encrypting the data provides important protection, there are implementation decisions that affect the encryption process. Who will encrypt data? Where will the data encryption be done? How is the data transferred? How are the encryptions keys managed and protected? The answers to these questions affect data confidentiality, privacy and integrity. We can see that there are three approaches to database servers where encryption takes place: first, the trusted database server where the creator, or owner, of the data operates a database server, which processes queries and signs the results; second, the untrusted server where the owner’s database is stored at the service provider (Database As a Service). The third and final model we call the semi-trusted server where the database is shared between many parties. Here, part of the data is stored as trusted while other parts are considered untrusted. In traditional client server based encryption, the data is encrypted using either a server key in a trusted database or a client key in an untrusted database. Simply, if the data is encrypted using server key(s) and the administrator has the authority to use this key(s), then the whole system becomes vulnerable. The Database Administrator (DBA) or
Database security has become a vital issue in modern Web applications. Critical business data in databases is an evident target for attack. Therefore, ensuring the confidentiality, privacy and integrity of data is a major issue for the security of database systems. Recent high profile data thefts have shown that perimeter defenses are insufficient to secure sensitive data. This paper studies security of the databases shared between many parties from a cryptographic perspective. We propose Mixed Cryptography Database (MCDB), a novel framework to encrypt databases over untrusted networks in a mixed form using many keys owned by different parties. The encryption process is based on a new data classification according to the data owner. The proposed framework is very useful in strengthening the protection of sensitive data even if the database server is attacked at multiple points from the inside or outside.
1 Introduction The World Wide Web has experienced massive growth in recent years. Individuals, businesses and governments have intensively used web applications that can provide effective, efficient and reliable solutions to the challenges of communicating, managing and directing commerce in the current century. However, these web-based applications have numerous entry points that can put databases at risk. Recently, the number of reported data breaches involving sensitive private information at governmental, organizational and company levels has grown at an alarming rate. In some extreme cases, sensitive information belonging to millions of individuals has been revealed. For example, in May 2008, researchers at security vendors uncovered a server containing the sensitive email and Web-based data of thousands of people, including healthcare information, credit card numbers and business personnel documents and other 978-0-7695-3613-2/09 $25.00 © 2009 IEEE DOI 10.1109/ICIW.2009.31
163
Authorized licensed use limited to: Sichuan University. Downloaded on July 3, 2009 at 07:32 from IEEE Xplore. Restrictions apply.
Internet Client
Trusted Third Party
rity based data classification. In our framework, data is classified based on data ownership and other criteria. Formally, each relation Ri over a database schema S in a plaintext database, RiE is the encrypted relation where the encryption is performed at the attribute level based on data classification using encryption function E k (k is the key owned by a party). Also, we propose the Query Management System (QMS) to manage query requests executed over a mixed encrypted database and their query results. MCDB keeps the confidential data secure from (internal or external) intruders even if one key used in the encryption process has been compromised. Beyond that, data transmitted over an untrusted network is secured by more than one layer of encryption using different keys. This paper discusses a hospital database as an illustrative example based on our framework. A malicious intruder trying to compromise patient records will be able to collect large amounts of data easily if these records are available electronically as plaintext. And if this information is encrypted using only server keys, it is an easy target for inside attack. Therefore, protection of a patient’s privacy is a basic requirement for ethical and legal use of information technology in health information systems; protecting confidential information related to hospitals is also crucial. In this scenario, many hospitals databases could be stored on one or many servers. Also, it is possible to have other related databases such as health insurance databases. The client makes query requests to many hospital servers, coordinated by a trusted third party.
Internet DB Server 1 Server n
Figure 1. A semi-trusted database system. any other employee who has access to the key(s) can access all data in plaintext. On the other hand, if each client encrypts his data using his key, then many problems might appear, such as how can organizations use the data? In both cases, the most important issue is the loss of the key used in the encryption process. A lost key means all the data is lost. Recognizing the importance of encryption techniques, several database vendors offer an integrated solution that provides encryption functionality in their product. Those functions are used mostly in trusted servers. The weakness of this approach is that a user who has access to both the key table and the data table, and who can derive the key transformation algorithm, can break the encryption scheme. The goal of this paper is to outline a cryptographic framework that supports the design of semi-trusted databases and provides protection for sensitive data even if the database server is attacked at multiple points by an inside or outside attacker.
1.1
System Model
We view a semi-trusted database system as shown in Figure 1. This system consists of three sides: client, Trusted Third Party (TTP) and server. The server side contains the encrypted databases and is responsible for replying to query requests on the encrypted databases. The client side makes query requests to retrieve, update, insert and delete data. The role of the trusted third party is to organize query requests and replies between clients and server(s). This system could be used in many applications over the Web such as e-government, e-commerce and e-banking, where databases hold critical and sensitive information transferred over untrusted networks like the Internet. In egovernment services, for example, the client might make query requests to many servers in different ministries and organizations. In this case, ensuring an appropriate level of protection for database content and for data transmitted over an untrusted network is, therefore, a fundamental part of any comprehensive security program. Using data encryption, we propose a novel framework to ensure database security in the semi-trusted server scenario where the data in databases is shared by many parties. In this scenario, the data is encrypted in a mixed form using many keys owned by different parties. Our solution, called Mixed Cryptography Database (MCDB), is based on a new data classification that differs from traditional secu-
1.2
Organization of Paper
This paper is organized as follows: Section 2 discusses related work. Section 3 addresses the problem of confidentiality, privacy and integrity, and then characterizes the basic attacks that can be conducted against semi-trusted databases. Section 4 discusses the proposed data classification and encryption process. Section 5 addresses query management issues. Section 6 analyzes security of data storage and data transmission. Finally, Section 7 concludes the paper and sketches future research directions.
2 Related Work Much previous work is reported in literature that deals with database cryptography [5, 13, 9, 3, 12]. Previous relational database cryptography research studied the encryption and decryption process of fields within records using the Chinese Reminder Theorem (CRT) [5], symmetric encryption [9] or asymmetric encryption [3]. Other studies go farther theoretically to show how to integrate modern cryptography techniques into a Relational Database Management System (RDBMS) [13, 12]. 164
Authorized licensed use limited to: Sichuan University. Downloaded on July 3, 2009 at 07:32 from IEEE Xplore. Restrictions apply.
Davida et al [5] studied a cryptography system on the server side based on the Chinese Remainder Theorem (CRT). The system has sub-keys that allow the encryption and decryption of fields within records. Using symmetric encryption, Ge and Zdonik [9] propose a database encryption schema called FCE for column stores in data warehouses with a trusted server. Recent years have seen extensive database cryptography research [1, 14, 16, 6, 4] conducted in the area of Database As a Service (DAS). The idea behind this research is to encrypt data so that it becomes accessible only to the client, because the data belongs to the client and is stored on an untrusted server. Bouganim and Pucheral [1] propose chipped secured data access solution C-SDA, which enforces data confidentiality and controls personal privileges with a client based security component acting as a mediator between the client and an encrypted database. This component is embedded in a smartcard to prevent tampering. Another approach proposed by Mykletun and Tsudik [14] introduces a server coprocessor (SC), which is hardware with a processor, secure memory, an input device, a backup battery and a tamper proof container. An SC is installed on the server and supports cryptography operations. The client can communicate with the SC via server over a secure channel. As discussed earlier, the database encryption operation in previous database cryptography research is done either by server or client key(s) using different algorithms. This paper introduces a new cryptography database framework that can deal with multi-tiered environments and can be used over the web. The new framework supports databases that are shared between many parties over untrusted networks by encrypting databases in a mixed form, so not only the client or server encrypts the databases.
This section starts by introducing three security principles. It then illustrates the classification of attacks that are commonly directed against databases on the web.
3.1
Security Principles
3.1.1 Data Confidentiality Data confidentiality is a major concern in database systems. There are many definitions of confidentiality in different organizations. According to ISO/IEC [11], to protect and preserve the confidentiality of information means to ensure that it is not made available or disclosed to unauthorized entities. Hammer and Schneider [10] defined confidentiality as the concept of: 1) ensuring that information is accessible for reading, listening, recording or physical removal only to subjects entitled to it, and 2) that subjects only read or listen to information to the extent permitted. Generally, data confidentiality is the protection of private information from surveillance or leaks when it is stored, or is transmitted across vulnerable networks such as the Internet. Actually, the confidentiality of data varies, depending on organization type and data classification. As a consequence, a particular piece of data could be confidential in one organization, but may not be confidential in another. In semi-trusted databases, each party needs to ensure confidentiality of its data even from other parties who share the same database. 3.1.2 Data Privacy Society is experiencing exponential growth in the amount and variety of information collected on individuals. It is not surprising that organizations collect more personal-specific information than ever before, and often do so without any particular purpose [15]. From a user’s perspective, privacy is the right to secure sensitive personal data in digital form to protect it against fraud, identity theft or unauthorized use. Data privacy is the prevention of confidential or personal information from being viewed by parties and the control over its collection, use and distribution. Privacy differs from confidentiality in the sense that the data to be protected is personal. A special case of data confidentiality is data privacy [1]. The importance of data privacy is to make a trusted relationship between the client and company or organization that collects and stores their personal information. In our hospital database example, privacy means to protect the patient’s personal information from being viewed or changed by unauthorized parties.
3 The Problem The increasing amount of sensitive data, especially personal data collected in databases, makes data confidentiality, privacy and integrity primary issues vital for governments, organizations and companies, and raises the risk of data theft. Using server-based encryption or client-based encryption is not sufficient to encrypt semi-trusted databases where data is shared among many parties. Here, the problem is to ensure that confidentiality, privacy and integrity are achieved for a semi-trusted database where data is exchanged over an untrusted network such as the Internet. Consequently, we must protect databases from unauthorized disclosures and unauthorized modifications that may occur from inside and outside attacks. In our example, a hospital database, security of highly sensitive data must be ensured.
3.1.3 Data Integrity Data integrity is the assurance that unauthorized parties are prevented from changing data. Integrity includes the protec165
Authorized licensed use limited to: Sichuan University. Downloaded on July 3, 2009 at 07:32 from IEEE Xplore. Restrictions apply.
tion of information during storage, transition, manipulation and data backup. Maintaining data integrity is essential to the privacy, security and reliability of data.
3.2
that data against theft and inappropriate use. Traditional security-based data classification is the conscious decision to assign a level of sensitivity to data as it is being created, amended, enhanced, stored or transmitted. We introduce new data classifications according to the data owner. Large databases contain many relations, and to ensure confidentiality, privacy and integrity, our data classification in the three tiered environment relies on who owns the data. Each side is responsible for encrypting the data owned. According to the framework, we classify data into three groups.
Class of Attackers
Before discussing the principles of the proposed Mixed Cryptography Database (MCDB), we first identify two basic classes of attackers that can break data confidentiality, privacy and integrity in semi-trusted databases: • Outside attackers: An outside attacker is a person who may access the database directly or indirectly with the objective of achieving personal gain or causing harm to data. This is the commonly accepted ecracker attack from outside trying to commit fraud or interruption.
• Client data: data that belongs to the client. The client is responsible for encryping this data to ensure privacy. This data directly reveals identity, such as name. • Trusted third party data: data used to join many databases on one or many servers such as ID number. This data also includes data that may be used by servers without client permission. This data could indirectly reveal identity.
• Inside attackers: We classify inside attackers into two subclasses. Inside attackers in the first class are individuals with privileged access to the database; those who may misuse that position to create false transactions or interfere with legitimate transactions. This is the commonly accepted insider attack from other parts of the database server side not directly connected with the database. The attackers in the second class are individuals with privileged access to, or management responsibility for, the database, who may misuse those rights to interfere with or exploit the database. This is the generally accepted insider attack from operational staff responsible for the database system. An example is DBA use of privileged capabilities to alter records, create false accounting trails or create phantom users.
• Server data: data that belongs to an organization or company, which is responsible for collecting and maintaining it. This data could be used in statistical operations. The organization may apply traditional data classification on that data, so that some data will be encrypted and other data will be in plaintext. The data classification of our hospital database example is shown in Figure 2. We chose an example of health information to show how our framework will satisfy data confidentiality, privacy and integrity over important and essential information sectors. While health information has come into use by many organizations and individuals, there is a need to have regulatory protections for this highly sensitive and deeply personal information. We applied our data classification method to the hospital database. The data is classified into three classes: patient data, hospital data and trusted third party data. In addition, there is unencrypted data that is considered as insensitive, such as date.
4 MCDB Baselines The data exchange scenario comprising the setting for this paper is shown in Figure 1. It may include many database servers and many clients with an intermediate party. The database servers host primary data, and the intermediate party joins and coordinates between servers and clients. Database servers host what is owned by several parties, such as clients and intermediate parties, i.e, they contain data related to clients, servers and intermediate parties. The intermediate party could be a trusted third party or a public organization that coordinates many servers. Before discussing the encryption operation in our framework, we first illustrate data classification based on data owner. We then propose a mixed cryptography database based on that classification of data.
4.1
4.2
Database Encryption
In most databases used on the Web, data is stored in tables in the form they are loaded, mostly in plaintext, which does not satisfy high level security and privacy requirements. Several approaches exist to implement the encrypted operation in the database, as described in the related work section. Our proposed system, whose basic architecture is shown in Figure 1, adds encryption/decryption layers accumulatively while data moves from/to the database. The purpose of such design is to implement encrypted storage satisfying data confidentiality, privacy and integrity. In addition, it makes data usable for the server side, and makes it
Data Classification
Data classification is the process of placing data categories that will reflect the security controls to protect 166
Authorized licensed use limited to: Sichuan University. Downloaded on July 3, 2009 at 07:32 from IEEE Xplore. Restrictions apply.
p_id 1234 2345 3456 p_id 1234 1234 2345
p_fname aa cc ee
p_lname bb bb ff
d_id pd_date d_id 1111 1/1/2008 1111 2222 2/2/2008 2222 1111 3/3/2008 3333 patient_drug (a) Unencrypted Database
patient city Tokyo Tokyo Tsukuba d_name Adol Baclofen Vicodin drug
yhaf 27632 35435 68945
Hospital data Trusted Third Party data Patient data Server data (unencrypted)
yhaf 27632 27632 35435
gvsfq uidfpa kfndfj iperuo
asdui piruop hfguyk mvnsvb
iuwiw 149611 277927 149611
wdsfvc ousjn rwercsa rwercsa uerjbkj
pd_date iuwiw 1/1/2008 149611 2/2/2008 277927 3/3/2008 379945 rthgdff (b) Encrypted Database
yuhjskl ouijhere bvccrtw axzvuew olkmjh
Figure 2. Data classification of a hospital database. possible for the database to be joined to other databases by the coordinator (trusted third party). The encryption operation in our system is based on data classification as described in the previous section. Such classification is needed to draw a full map of the encryption process. Following the convention, E denotes the encryption function, and D denotes the decryption function. The general approach to encryption is the field level; that is, sensitive fields need to be encrypted to protect the information from inside or outside attack. Data is encrypted using a symmetric encryption algorithm. After the data classification process, each side encrypts the data it owns. Definition 1: For each database schema S(R1 , ..., Rn ) , where R1 to Rn are relations created on the database server, Ri name is encrypted using the server secret key SES . Definition 2: For each relation schema R(F1 , ..., Fl , Fl+1 , ..., Fm , Fm+1 , ..., Fn ) in a relational database, where Fi (1 ≤ i < l) field is classified as trusted third party data, Fj (l ≤ j < m) field is classified as client data, and Fk (m ≤ k < n) field is classified as server data, we store an encrypted relation: C C S RE = (FlT , ..., FlT , Fl+1 , ..., Fm , Fm+1 , ..., FnS ) where, FiT (1 ≤ i < l) in the encrypted relation RE stores the encrypted value of Fi in relation R using trusted third party secret key SET , viz. FiT = E SET (Fi ), FjC (l ≤ j < m) in the encrypted relation RE stores the encrypted value of Fj in relation R using trusted client secret key SEC , viz. FjC = E SEC (Fj ), FkS (m ≤ k < n) in the encrypted relation RE stores the encrypted value of Fk in relation R using server secret key SES , viz. FkS = E SES (Fk ). In a real database, the fields may be in different order, but we used this order to simplify the notation. Field names are encrypted as follows: Fi and Fj names are encrypted using trusted third party secret key SET . Fk name is encrypted using server secret key SES . With many clients, it is impossible to encrypt field names of client data using a client secret key. Therefore, field names of all data are classified as client data encrypted using the same key, which is the TTP secret key. Definition 3: Metadata rules updated in client, trusted third party and server whenever a change happens in the
relation schema R on the server side. In the hospital example, the database is encrypted based on the data classification proposed in this paper. Figure 2(a) shows an unencrypted hospital database and Figure 2(b) shows the encrypted database. After the data classification process, each party encrypts existing unencrypted data using his key simply by calling the SQL update statement. We assume that a patient encrypts his personal information using a secret key stored on a secure smart card. The patient may allow other specialists to view his health information. In this case, the patients smart card could be used by other users to retrieve or update data integrated with user access control and authorization rules. For example, doctors may have the authority to read and update specific records while a nurse does not have that authority. In addition, the hospital wants to keep its information secure and ensure confidentiality, so it encrypts data using the server secret key. While a trusted third party coordinates between many hospital servers and other related servers, the data used to join many databases on one or many servers is encrypted using the trusted third party secret key.
5 Query Processing To ensure data confidentiality, privacy and integrity, we should take care of two security issues: secure data storage and secure data transmission. We already discussed the first issue in database encryption. The second issue is related to client query requests and database server replies. The secure transmission of data is well studied and well supported in today’s e-business. Most web browsers and web servers support SSL (Secure Socket Layer) [8] or TLS (Transport Layer Security) [7]. In the proposed system, data is transferred via network using one of the secure transport protocols. In addition, transmitted data is already encrypted using either symmetric encryption or asymmetric encryption. This section proposes a query processing system that addresses the secure data transmission issue. In our framework, the client might send a query request (Q) to many databases servers. The trusted third party coordinates the query requests and distributes them (Q1, ..., Qn) to the responsible servers, then combines the query results 167
Authorized licensed use limited to: Sichuan University. Downloaded on July 3, 2009 at 07:32 from IEEE Xplore. Restrictions apply.
Encryption (Data transport protocol) Decryption (Data transport protocol) RA QM A patient.p_id, patient.p_fname, SELECT patient.p_lname, drug.d_name, drug.pd_date FROM patient,drug,patient_drug WHERE patient.p_id=patient_drug.p_id and drug.d_id=tient_drug.d_id and patient.p_id=1234 1
Plain Query
2
Request Result SELECT patient.yhaf, patient.gvsfq, patient. asdui, drug.d_name, drug.pd_date FROM patient,drug,patient_drug WHERE patient.yhaf =patient_drug.yhaf and drug.d_id=tient_drug.d_id and patient.yhaf =27632 SELECT wdsfvc.yhaf, wdsfvc.gvsfq, wdsfvc. asdui, olkmjh.yuhjskl, olkmjh.pd_date FROM wdsfvc, olkmjh,rthgdff WHERE wdsfvc.yhaf = rthgdff.yhaf and olkmjh.iuwiw = rthgdff.iuwiw and wdsfvc.yhaf =27632
Query Rewriter Metadata Query Result
5
Query Rewriter
Trusted Third Party
3
Metadata 6
Query Rewriter Metadata
Query Result
Server 4
Client
Execute
Encrypted Database
Query Result
p_id p_fnamep_lname d_name pd_date p_id p_fnamep_lname vbxdrw sdgweq yhaf gvsfq asdui d_name pd_date 1234 aa bb Figure Adol 1/1/20 1234 uidfpa piruop ersadaaand 232134a scenario. 27632 uidfpa piruop Adol 1/1/20 3. MCDB architecture 1234 aa bb Baclofen 2/2/20 1234 uidfpa piruop wqaioyr 566231 27632 uidfpa piruop Baclofen 2/2/20
Relation patient patient drug patient patient
before submitting the final result to the client. This paper discusses an example of a query request over one server, which is the fundamental component of query processing in the MCDB framework. Two components deal with queries in our framework. The first is Query Management Agent (QMA), which deals with query requests; the second is Result Analysis (RA), which deals with query results. The two components use metadata to perform the query process.
Owner TTP TTP client client
prepares the query request to be executed over the encrypted database on the server side. In the hospital database example, the query we discuss is a client who wants to retrieve all drugs taken by a patient. This query over the hospital database includes patient id, name and the drug names. Figure 3 shows execution of this query over mixed encrypted databases. The query request follows the sequence of algorithms in Figure 5. At the end of the SQL query rewriting process, the translated SQL statement is executed over the encrypted database and returns the result to a temporary query result in server result analyzer RA. Figure 4 shows the query executed on the server side.SES is the server secret key and SET is the trusted third party secret key.
Figure 4. Query executed on server side.
Metadata
Metadata contains the minimum data required to rewrite and analyze the query. Each party has different metadata according to the data classification. For example, a client needs data only on fields he owns while the trusted third party needs data on fields owned by the client or TTP. The data consists of field name, type, owner and table name. Table 1 shows an example of the TTP metadata used to perform algorithms in QMA and RA. That data is also used by TTP to encrypt existing data and schema.
5.2
Type Big integer Big integer text text
Table 1. Trusted Third Party (TTP) metadata in the hospital example.
SELECT ESEs(patient).ESEt(p_id),ESEs(patient).ESEt(p_fname),ESEs (patient).ESEt(p_lname),ESEs(drug).ESEs(d_name),ESEs(drug).pd_date FROM ESEs(patient), ESEs(drug), ESEs(patientdrug) WHERE ESEs(patient).ESEt(p_id)=ESEs(patientdrug).ESEt(p_id) and ESEs (drug).ESEs(d_id)=ESEs(patientdrug).ESEs(d_id) and ESEs(patient).ESEt(p_id) = ESEt (1234)
5.1
Field p id p id p fname p lname
5.3
Result Analyzer (RA)
The second component in the query management system is the result analyzer (RA). The objective of RA is to decrypt the query result to make it readable for the client. In addition, RA uses client public key (P UC ) to add an additional asymmetric encryption layer to the result before it is transmitted to other parties; in this way only the client can decrypt it using the client private key (P RC ). The query result follows the sequence of algorithms in Figure 6.
Query Management Agent (QMA)
Figure 3 shows the query management system in MCDB. The first component in the query management system is the Query Management Agent (QMA). The main objective of QMA is to encrypt, decrypt and rewrite the query request. The structure of QMA is similar for the client, the trusted third party and the server, but the procedure differs. QMA 168
Authorized licensed use limited to: Sichuan University. Downloaded on July 3, 2009 at 07:32 from IEEE Xplore. Restrictions apply.
// 1. Client rewrites the query Input: plaintext query (Qp) Output: client query (Qc) Begin For each field (Fi) owned by client in (Qp) Do If value (Vi) assigned to (Fi) then replace (Vi) with ESEc(Vi) End if End for End // 2. TTP rewrites the client query Input: client query (Qc) Output: TTP query (Qt) Begin For each field (Fi) owned by TTP in (Qc) Do If value (Vi) assigned to (Fi) then replace (Vi) with ESEt(Vi) End if End for For each field (Fi) owned by client in (Qc) or TTP Do replace (Fi) with ESEt(Fi) End for End // 3. Server rewrites the TTP query Input: TTP query (Qt) Output: server query (Qs) Begin For each relation (Ri) in (Qt) Do replace (Ri) with ESEs(Ri) End for For each field (Fi) owned by server in (Qt) Do If value (Vi) assigned to (Fi) then replace (Vi) with ESEs(Vi) End if End for For each field (Fi) owned by server in (Qt) TTP Do replace (Fi) with ESEs(Fi) End for End
// 4. Server analyzes the query results Input: dataset from encrypted database (DSs) Output: server dataset (DSt) Begin For each encrypted field (Fi) owned by server in (DS) Do decrypt field name and data Fi=DSEs(Fi) encrypt field name and data Fi=EPUc(Fi) End for End // 5. TTP analyzes the query results Input: dataset from server (DSs) Output: TTP dataset (DSt) Begin For each encrypted field (Fi) owned by TTP in (DSs) Do decrypt field name and data Fi=DSEt(Fi) encrypt field name and data Fi=EPUc(Fi) End for For each encrypted field (Fi) owned by client in (DSs) Do decrypt field name Fi=DSEt(Fi) encrypt field name Fi=EPUc(Fi) End for End // 6. Client analyzes the query results Input: dataset from TTP (DSt) Output: client dataset (DSc) (plaintext) Begin For each encrypted field (Fi) in (DSt) Do decrypt field name Fi=DPRc(Fi) End for For each field (Fi) owned by client in (DSt) Do decrypt field name and data Fi=DSEc(Fi) End for For each encrypted field (Fi) not owned by client in (DSt) Do decrypt field data Fi=DPRc(Fi) End for // fields owned by server or TTP End
Figure 6. Result analyser algorithms.
Figure 5. Query rewriter algorithms.
ent encryption algorithms. So, even if an inside or outside attacker gets one key, it is difficult to decrypt all data in the database. Generally, the security measure depends on the encryption algorithm and the key size. Let us analyze a brute force attack on our framework and compare it with traditional server encryption approaches. Case 1: Assuming the encryption algorithm is known, in the server encryption approach, the probability that an outside attacker gets the encryption key using brute force attack is 21S , where S is the key size. On the other hand, in a mixed cryptography database, the probability is much less, because of the three different keys needed to decrypt 1 1 1 1 information; the probability is 2SS , where 2ST N 2SC 3 CN SS is the size of the server key, ST is the size of the trusted third party key, SC is the size of the client key, N is the number of clients in the relation, and CN is the number of columns in the relation. Case 2: In the server encryption approach, assume that an inside attacker gets the server key, the probability that this attacker decrypts data will be 1. But in the mixed 1 1 cryptography database, the probability will be 2ST , N 2SC where ST is the size of the trusted third party key , N is the number of clients in the relation, and SC is the size of the client key.
6 The Security Measure The objective of this paper is to propose a general database cryptography framework, without explaining the fixed symmetric or asymmetric encryption algorithms used in that framework. The encryption algorithms affect the performance of query processing and security analysis. We will investigate these issues in detail in future work. Our framework does not rely on a specific symmetric encryption algorithm; any symmetric algorithm with the appropriate key size can be used. Also, it is possible to apply different encryption algorithms on different sides. For example, the client could use algorithm A, the trusted third party could apply algorithm B, and the server could do it with algorithm C. This provides a flexible and more secure encrypting environment. This section analyzes the security of data storage and data transmission that deals with basic attack classes, comprising inside and outside attacks, without technical details related to the encryption techniques used.
6.1
Secure Data Storage
In the encrypted relational schema, values of sensitive fields based on data classification are stored in an encrypted form. We believe that data is safe as long as the keys are secure. The database is encrypted using many keys owned by different parties, and different parties might use differ-
6.2
Secure Data Transmission
Secure data transmission is performed by the secure transport protocol. In addition, personal information is 169
Authorized licensed use limited to: Sichuan University. Downloaded on July 3, 2009 at 07:32 from IEEE Xplore. Restrictions apply.
transmitted with either asymmetric or symmetric encryption using the client private or secret key. That means two layers of encryption are used. Still, the ciphertext could be modified by inside or outside attackers without knowledge of the keys, resulting in meaningful modification of the data when it is decrypted by the recipient. We can add a hash function or digital signatures to ensure integrity. Case 1: When an intruder tries to decrypt a query result transmitted over an untrusted network, the probability will 1 be 2SP1RC 2SC , where SP RC is the size of the client private key, and SC is the size of the client secret key. This means an intruder should first decrypt the result using the client asymmetric algorithm key, then encrypt the result using the client symmetric algorithm key. This procedure provides privacy for a client’s personal information that is transmitted over an untrusted network to ensure data integrity.
[4]
[5]
[6]
[7]
7 Conclusion and Future Work
[8]
The number of web applications involving sensitive private information at the governmental, organizational and company levels is growing rapidly. Preserving data confidentiality, privacy and integrity in the semi-trusted database context, where the database is shared between many parties, is becoming one of the most challenging issues for the database community. This paper addresses this issue and contributes the following. First, it introduces new data classification based on who owns the data. Second, it proposes a mixed cryptography database based on data classification methods. Third, it illustrates a query management system over a mixed encryption database. Finally, it analyzes the security of data storage and data transmission in the new cryptography framework. There are other important research issues related to our framework: first, the best encryption algorithm used in the mixed cryptography database on performance and security perspectives; second, access control methods used to control access for all parities using the database; and finally indexing and joining between different databases. We will discuss these in future papers.
[9]
[10]
[11] [12]
[13]
[14]
[15]
References
[16]
[1] L. Bouganim and P. Pucheral. Chip-secured data access: confidential data on untrusted servers. In VLDB ’02: Proceedings of the 28th international conference on Very Large Data Bases, pages 131–142. VLDB Endowment, 2002. [2] H. Cavusoglu, B. Mishra, and S. Raghunathan. The effect of Internet security breach announcements on market value: Capital market reactions for breached firms and internet security developers. Int. J. Electron. Commerce, 9(1):70–104, 2004. [3] C.-C. Chang and C.-W. Chan. A database record encryption scheme using the RSA public key cryptosystem and its
master keys. In ICCNMC ’03: Proceedings of the 2003 International Conference on Computer Networks and Mobile Computing, page 345, Washington, DC, USA, 2003. IEEE Computer Society. E. Damiani, S. D. C. Vimercati, S. Jajodia, S. Paraboschi, and P. Samarati. Balancing confidentiality and efficiency in untrusted relational DBMSs. In CCS ’03: Proceedings of the 10th ACM conference on Computer and communications security, pages 93–102, New York, NY, USA, 2003. ACM. G. I. Davida, D. L. Wells, and J. B. Kam. A database encryption system with subkeys. ACM Trans. Database Syst., 6(2):312–328, 1981. S. D. C. di Vimercati, S. Foresti, S. Jajodia, S. Paraboschi, and P. Samarati. A data outsourcing architecture combining cryptography and access control. In CSAW ’07: Proceedings of the 2007 ACM workshop on Computer security architecture, pages 63–69, New York, NY, USA, 2007. ACM. T. Dierks and E. Rescorla. The TLS protocol version 1.2, 2006. A. O. Freier, P. Karlton, and P. C. Kocher. The SSL protocol version 3.02, 1996. T. Ge and S. Zdonik. Fast, secure encryption for indexing in a column-oriented DBMS. In Data Engineering, 2007. ICDE 2007. IEEE 23rd International Conference on, pages 676–685, 2007. J. H. Hammer and G. Schneider. On the definition and policies of confidentiality. In IAS ’07: Proceedings of the Third International Symposium on Information Assurance and Security, pages 337–342, Washington, DC, USA, 2007. IEEE Computer Society. ISO/IEC. Information security management standard. 2005. B. Iyer, S. Mehrotra, E. Mykletun, G. Tsudik, , and Y. Wu. A framework for efficient storage security in RDBMS. In EDBT, 2004. H. Jingmin and M. Wang. Cryptography and relational database management systems. In IDEAS ’01: Proceedings of the International Database Engineering & Applications Symposium, pages 273–284, Washington, DC, USA, 2001. IEEE Computer Society. E. Mykletun and G. Tsudik. Incorporating a secure coprocessor in the Database-as-a-Service model. In IWIA ’05: Proceedings of the Innovative Architecture on Future Generation High-Performance Processors and Systems, pages 38–44, Washington, DC, USA, 2005. IEEE Computer Society. L. Sweeney. Information explosion. in confidentiality, disclosure, and data access: Theory and practical applications for statistical agencies. 2001. W. K. Wong, D. W. Cheung, E. Hung, B. Kao, and N. Mamoulis. Security in outsourcing of association rule mining. In VLDB ’07: Proceedings of the 33rd international conference on Very large data bases, pages 111–122. VLDB Endowment, 2007.
170
Authorized licensed use limited to: Sichuan University. Downloaded on July 3, 2009 at 07:32 from IEEE Xplore. Restrictions apply.