Preserving Database Privacy in Cloud Computing
Saed Alrabaee¹, Khaled Khateeb², and Fawaz A. Khasawneh³
¹ Computer Security Laboratory, Concordia University, Montreal, Canada
[email protected]
² Jordan University of Science and Technology, Irbid, Jordan
[email protected]
³ ETS - University of Quebec, Montreal, Canada
[email protected]
Abstract. Rapid advances in networking technologies and the continued growth of the Internet have triggered a new trend in cloud computing: outsourcing data to cloud service providers. The ever cheaper and more powerful database-as-a-service (DaaS) computing paradigm enables organizations to minimize their operational costs, since they no longer need to purchase infrastructure or hire staff to run it. In this paper, we present a secure model that provides data security in the cloud and anonymizes data in a way that satisfies ε-differential privacy.
Keywords: SAAS, Cloud Computing, Data Privacy.
1 Introduction
Rapid advances in networking technologies and the continued growth of the Internet have triggered a new trend in cloud computing: outsourcing data to cloud service providers. The ever cheaper and more powerful database-as-a-service (DaaS) computing paradigm enables organizations to minimize their operational costs, since they no longer need to purchase infrastructure or hire staff to run it. By outsourcing the workload to a cloud service provider, organizations can use virtually unlimited computing resources for affordable service charges, without investing in software, hardware, and operational overhead [1]. Despite these benefits, the major obstacle to wide adoption of the cloud computing paradigm is concern over data security and privacy. In cloud computing (DaaS), the data owner outsources his private, sensitive data and querying services to the cloud provider, which is essentially an untrusted server. The data owner needs to protect his data from both the cloud and the querying clients. While the data offer querying clients valuable opportunities for data mining tasks, they also raise privacy issues. The client, in turn, seeks privacy for his queries with respect to the cloud and the data owner. In this paper, we present a secure model that provides data security in the cloud and anonymizes data in a way that satisfies ε-differential privacy [2].
G. Martínez Pérez et al. (Eds.): SNDS 2014, CCIS 420, pp. 485–495, 2014. © Springer-Verlag Berlin Heidelberg 2014
1.1 Motivations
Existing frameworks that provide security and privacy in the cloud computing environment fall into two kinds. In the trusted-server framework, an additional trusted server placed outside the cloud mediates between the client and the cloud to preserve secrecy. In the other framework, the client performs encryption, decryption, distance, and other computations itself in order to avoid a trusted server. Moreover, existing database-as-a-service (DaaS) models cannot support advanced queries such as aggregation while simultaneously maintaining the secrecy of the data [3]. Aggregate queries retrieve succinct information, such as counts, from a database: they can cover many data items while returning a small result. Our work is motivated by the shared storage area. For example, different users access the storage for different purposes: a hospital database could be accessed by doctors as well as by researchers. Hence, for some types of information and some classes of cloud computing users, privacy and confidentiality rights, obligations, and status may change when a user discloses information to a cloud provider.
1.2 Contributions
Theft of sensitive private data and trade of that information on the open market for profit is a significant problem. Online database management systems (DBMSs) are a lucrative target for malicious users, whether for gaining sensitive information or just for the thrill of penetrating an organization's network, because they often contain huge volumes of sensitive information. When individual users or enterprises store their sensitive data in a DBMS today, they must trust that the server hardware and software are uncompromised and that the data center itself is physically protected, and they must assume that the system and database administrators (DBAs) are trustworthy. Nowadays, more and more organizations, both small and large, rely on the Internet to capture business, and many lack the budget to set up their own data center, so they are increasingly turning to cloud services. However, because the organizations' sensitive data will be stored at a third-party location under that third party's full control, ensuring the privacy of the data against illegitimate access is a real concern from the organizations' perspective. In this paper, to address this database privacy concern, we design and implement a new framework that meets organizations' long-standing demand to store a corporate database at a third-party location securely, while also enforcing secure access control so that only legitimate users can access the data. We assume that the key server on the client end is trusted; the only untrusted part of our setup is the cloud storage server, which is hosted at an untrusted third-party location. Figure 1 shows a general cloud computing architecture. The rest of this paper is organized as follows. Section 2 reviews related work and the cloud computing architecture. Section 3 presents our proposed approach, including the cloud setup, the encryption scheme, and key management. Section 4 presents experimental results, and Section 5 gives the conclusion and future directions.
Fig. 1. A general cloud computing architecture
2 Related Work
Over the past few years, researchers have developed cryptographic tools for encrypted databases stored in the cloud, as well as for searching keywords over encrypted text [1, 2], and some researchers have proposed using these tools to process SQL queries on encrypted data [3, 4]. The works most closely related to our framework are [5, 6]. In [5], a trusted cloud is placed between the storage cloud and the user. In our framework, we use a trusted key server instead of a trusted cloud; it is located either on the data owner's server or on a separate server, away from the cloud. A single server is cheaper than a trusted cloud, offers more flexibility, and places no restriction on the space available for storing keys. A secure channel is established with SSL/TLS, which offers strong capabilities for this purpose, such as the handshake protocol, MAC computation, and a pseudorandom function (PRF) for deriving the master secret and key material. In [6], the authors use a new encryption scheme called onion encryption; from it we adopted the idea of encryption layers and the idea of joining two different tables. Section 2.1 reviews cloud computing architecture, and Section 2.2 describes the use of the cloud as storage.
2.1 Cloud Computing Architecture
Cloud computing, which dynamically provides reliable services over the Internet, is one of the fastest-emerging technologies today. Recently, many academic and industrial organizations have started investigating and developing technologies and infrastructure for cloud computing. Representative cloud platforms include Amazon Elastic Compute Cloud (EC2), Google App Engine, and Microsoft Live Mesh. Cloud service providers mainly offer three types of services: SaaS, PaaS, and IaaS.
Fig. 2. Storage Cloud Computing Architecture
2.2 SaaS in Cloud Computing
Cloud computing involves highly available, massive compute and storage platforms offering a wide range of services. One of the most popular and basic cloud computing services is storage-as-a-service (SAAS). It provides companies with affordable storage, professional maintenance, and adjustable space. On the one hand, because of these benefits, companies are excited by the public debut of SAAS. On the other hand, companies are reticent about adopting SAAS. One of the major concerns is privacy, because the cloud service is generally provided by a third party.
3 Proposed Approach
In our framework, we address the following challenges in preserving the privacy of a database on the cloud:
• Challenge 1: How to protect outsourced data from theft by hackers or malware infiltrating the cloud server?
• Challenge 2: How to protect outsourced data from abuse by the cloud server?
• Challenge 3: How to realize content-level, fine-grained access control for users?
The general idea of our proposed method, depicted in Fig. 3, can be summarized in five phases:
(1) Preliminary stage: implementation of the cloud using UEC (Ubuntu Enterprise Cloud) in VMware.
(2) Key server (trusted): a key server is deployed for users' access control and legitimate access to the cloud database. At the end of this stage, the connection between the user and the key server is also encrypted using SSL/TLS.
(3) Encryption of the whole database based on dynamic key generation by the key server.
(4) Migration of the organization's database to the cloud.
(5) An access control mechanism for users accessing data stored in the cloud.
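Phase (2) protects the user-to-key-server connection with SSL/TLS. A minimal client-side sketch using Python's standard `ssl` module follows. The host name `keyserver` comes from the URL in the process flow, whereas the port 443, the optional CA file, and the TLS 1.2 minimum are our assumptions rather than settings reported for the framework.

```python
import socket
import ssl
from typing import Optional


def make_client_context(cafile: Optional[str] = None) -> ssl.SSLContext:
    """Build a TLS context for a user connecting to the key server.

    Assumption: the key server presents a certificate signed by a CA the
    client trusts (the system store, or the optional `cafile`).
    """
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=cafile)
    # Refuse legacy protocol versions; TLS 1.2 is an assumed minimum.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    # create_default_context already enables these; set explicitly for clarity.
    ctx.check_hostname = True
    ctx.verify_mode = ssl.CERT_REQUIRED
    return ctx


def open_channel(host: str = "keyserver", port: int = 443) -> ssl.SSLSocket:
    """Open a TLS-protected socket to the (hypothetical) key server host."""
    ctx = make_client_context()
    raw = socket.create_connection((host, port), timeout=10)
    try:
        # The handshake (certificate check, key exchange) happens here.
        return ctx.wrap_socket(raw, server_hostname=host)
    except Exception:
        raw.close()
        raise
```

Only `make_client_context()` can be exercised without a running key server; `open_channel()` performs the handshake when it wraps the socket. In the framework as described, the user's browser plays this client role when opening the key server's HTTPS page.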
Fig. 3. Proposed cloud setup
3.1 Cloud Setup
To implement a cloud computing environment, we first need to define what types of services our cloud will provide and what resources we need to support those services. In our case, we consider an HR department of an organization that wants to store a database on a cloud. This database consists of several tables, some of which contain more sensitive information. In practice, large databases require a certain resource capacity and well-trained personnel to manage them. With this in mind, our HR department needs a storage location that can meet all of these requirements. Therefore, our cloud should provide storage services, delivered to the users of the HR department in a remotely accessible fashion. Several deployment models exist: public cloud, community cloud, hybrid cloud, and private cloud. For the purposes of our framework, we simulate a public cloud, which is considered untrusted storage managed by a cloud provider. To set up our cloud, we need two separate machines: one will be the cloud controller and the other will be the node controller. One of these physical machines hosts the cloud controller as a virtual machine on VMware and also accommodates the user host. The other physical machine is used only for the node controller, and it has CPU virtualization enabled, which is required for a successful setup. The cloud controller consists of the storage controller, the cluster controller, Walrus, and the cloud controller itself. The cloud controller provides the front end to the entire cloud infrastructure. The node controller is used to manage the instances that can be run on the node. For the installation of the cloud and node controllers we use Ubuntu Enterprise Cloud (UEC), which is free software and includes the Eucalyptus cloud platform. First, we install the cloud controller using the UEC software. Because of hardware limitations, we use VMware to install the cloud controller as a virtual machine. Important steps during the installation are naming our cluster and specifying a range of IP addresses that will later be assigned to each instance that connects to the cloud controller. The next step is to connect the machine that hosts the cloud controller with the machine that will be used for the node controller. For simplicity, we connect the two machines physically with a crossover cable and configure them to be in the same subnet. We then install the node controller while the two machines are connected, which ensures that the node installation automatically detects the existing cloud controller and associates with it. During this installation, the node is registered with the cloud, public SSH keys are exchanged, and the services are configured and published. Having set up the main components of the cloud, we need several more configuration steps to provide full operability. The next step is to make sure that each user of the cloud can obtain credentials from the cloud. An admin user manages all the user accounts. The credentials are downloaded from the cloud through a web browser, either from the node or from any other location in the network, via the URL //cloud-ipaddress:8443/. After that, the cloud offers a set of images that can be installed on it. These images are stored on the Walrus controller and are used to create instances on the cloud.
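The two network requirements above (controller and node in the same subnet, and a reserved IP range handed out to instances) can be illustrated with Python's standard `ipaddress` module. This is only a sketch: the addresses are hypothetical examples, the `InstancePool` class is our own illustration, and in the actual setup UEC/Eucalyptus performs the address assignment itself.

```python
import ipaddress


def same_subnet(a: str, b: str, network: str) -> bool:
    """True if hosts a and b both lie inside the given network (CIDR)."""
    net = ipaddress.ip_network(network)
    return ipaddress.ip_address(a) in net and ipaddress.ip_address(b) in net


class InstancePool:
    """Hypothetical allocator: hands out addresses from a reserved range, in order."""

    def __init__(self, first: str, last: str) -> None:
        self._next = ipaddress.ip_address(first)
        self._last = ipaddress.ip_address(last)

    def allocate(self) -> str:
        """Return the next free address, or raise once the range is used up."""
        if self._next > self._last:
            raise RuntimeError("instance IP range exhausted")
        addr = self._next
        self._next = addr + 1
        return str(addr)


if __name__ == "__main__":
    # Example: cloud controller and node controller on a crossover link.
    print(same_subnet("192.168.1.10", "192.168.1.11", "192.168.1.0/24"))
    pool = InstancePool("192.168.1.200", "192.168.1.220")
    print(pool.allocate(), pool.allocate())
```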
Installing one of these images is enough for this simulation, so we installed the Ubuntu 9.10 (Karmic Koala) image. The instances can then be managed with different tools, such as Elasticfox or Hybridfox, and the euca2ools command-line tools. Before running any instance, however, we need to create a key pair that is used to log in as root. Having set this up, we can start our instance, which is essentially a virtual machine running on the node. On this instance we install a Linux-Apache-MySQL-PHP (LAMP) server, which will be used to store the client's database. Clearly, the cloud provider has full access to the data that the client wants to store on the cloud, so the confidentiality and integrity of the data can be compromised. To protect this data, the client has to encrypt the database before it is stored on the cloud. This complicates providing queried information to the users, but we have implemented a way that is both functional and secure.
3.2 Encryption Scheme
Cloud storage plays a very active role as a storage area: it provides ample space, but the main concern is privacy. Encryption ensures that the stored data remains confidential and that its privacy is preserved against the untrusted cloud, or against any attacker who tries to steal useful data. In our framework, encryption is achieved with several layers. The encryption algorithm is AES, and to strengthen security, the column titles are hashed with MD5.
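The layered scheme described in this subsection (one outer key shared by the tables, different inner keys per column, detailed with Fig. 4) can be sketched as follows. To stay dependency-free, the sketch substitutes a four-round Luby-Rackoff Feistel cipher built from HMAC-SHA256 for AES-128. Like the deterministic encryption that equality search relies on, it is invertible and maps equal plaintexts to equal ciphertexts, but it works on 32-byte blocks and is an illustration only, not a replacement for a vetted AES implementation. The key names `ka`, `kb`, and `kc` follow Fig. 4; the function names are hypothetical, and treating the employee id as protected by the outer key alone follows the description of Fig. 4.

```python
import hashlib
import hmac
import os

BLOCK = 32           # bytes per encrypted cell (two 16-byte Feistel halves)
HALF = BLOCK // 2
ROUNDS = 4           # four rounds: the classic Luby-Rackoff construction


def _f(key: bytes, rnd: int, half: bytes) -> bytes:
    """Round function: HMAC-SHA256 of (round number, half), truncated."""
    return hmac.new(key, bytes([rnd]) + half, hashlib.sha256).digest()[:HALF]


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encrypt_block(key: bytes, block: bytes) -> bytes:
    left, right = block[:HALF], block[HALF:]
    for rnd in range(ROUNDS):
        left, right = right, _xor(left, _f(key, rnd, right))
    return left + right


def decrypt_block(key: bytes, block: bytes) -> bytes:
    left, right = block[:HALF], block[HALF:]
    for rnd in reversed(range(ROUNDS)):
        left, right = _xor(right, _f(key, rnd, left)), left
    return left + right


def pad(value: str) -> bytes:
    """Length-prefix and zero-pad a cell value to one block (max 31 bytes)."""
    data = value.encode("utf-8")
    if len(data) > BLOCK - 1:
        raise ValueError("value too long for this sketch")
    return bytes([len(data)]) + data + bytes(BLOCK - 1 - len(data))


def unpad(block: bytes) -> str:
    return block[1:1 + block[0]].decode("utf-8")


def encrypt_outer_only(value: str, outer_key: bytes) -> str:
    """Column protected by Ka alone (the employee id in Fig. 4)."""
    return encrypt_block(outer_key, pad(value)).hex()


def encrypt_cell(value: str, inner_key: bytes, outer_key: bytes) -> str:
    """Inner layer with a column key (Kb or Kc), then outer layer with Ka."""
    return encrypt_block(outer_key, encrypt_block(inner_key, pad(value))).hex()


def decrypt_cell(cipher_hex: str, inner_key: bytes, outer_key: bytes) -> str:
    inner = decrypt_block(outer_key, bytes.fromhex(cipher_hex))
    return unpad(decrypt_block(inner_key, inner))


if __name__ == "__main__":
    # Hypothetical 128-bit keys; in the framework they live on the key server.
    ka, kb, kc = (os.urandom(16) for _ in range(3))
    c = encrypt_cell("Ali", kb, ka)
    assert c == encrypt_cell("Ali", kb, ka)      # deterministic, so searchable
    assert decrypt_cell(c, kb, ka) == "Ali"
```

Because the construction is deterministic, the cloud can match a WHERE clause against stored ciphertexts without decrypting them; the flip side is that equal values in a column remain visibly equal to anyone who sees the ciphertexts. Holding `ka` alone recovers the outer-only column but yields only inner ciphertext for the other columns, which is the property the layering is meant to provide.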
Fig. 4. Encrypted Table in Cloud
1. Advanced Encryption Standard (AES): AES is a symmetric-key cipher standardized by the National Institute of Standards and Technology (NIST). We chose AES for its simple design, high speed, and low memory cost. Since its security depends largely on the key size, we use a 128-bit key (AES-128).
Encryption layers: In our framework, we use two layers of encryption. The outer layer of both tables is encrypted with the same key, whereas the inner columns are encrypted with different keys, as shown in Fig. 4. In the figure, even if an intruder obtains the key Ka, he can access only the employee id column, because the other columns are additionally encrypted with either Kb or Kc. Thus, layering provides more confidentiality. However, if two columns in different tables are to be joined, they must be encrypted with the same key; Ka is used for this purpose.
2. Encrypted query using MD5: MD5 is a cryptographic hash function that produces a 128-bit hash value. Query privacy has become an important issue in the past few years because of confidentiality problems with client queries, so we decided to use MD5 to strengthen the security of the queries. In some cases, a client requires his query to be kept secret. To achieve this, we use MD5 to hash the column titles that appear in the queries.
Table 1. Hashing Table
Table 1 lists the MD5 values for some column titles. When the client or user needs to search by first name, say "Ali", the trusted key server replaces the column title with its MD5 hash value, encrypts the search value, and sends the query through the network in the following form:
• mysql> SELECT * FROM Table1 WHERE 20db0bfeecd8fe60533206a2b5e9891a = '6d1baa8615bb02a4c779949127b612d4';
where '20db0bfeecd8fe60533206a2b5e9891a' is the MD5 hash of the column title (first name), and '6d1baa8615bb02a4c779949127b612d4' is obtained as follows:
• "Ali" is first encrypted with Kb, giving '7a9b46ab6d983a85dd4d9a1aa64a3945', and this value is then encrypted with Ka, giving '6d1baa8615bb02a4c779949127b612d4'.
Moreover, if an intruder captures both the client's query and its result, he cannot tell what the client is looking for by combining them, because he sees only hashed and encrypted values.
3. Key Management: Having settled which scheme is used to encrypt the database, another challenge arises: managing the keys used to encrypt and decrypt it. In particular, we must decide which keys are provided to which users. To do this, we first distinguish the different types of users, based on the access control mechanism. The next step is to provide trusted storage for the keys. For this purpose, we use a key server located in an entirely different place from the cloud. We implemented the key server as a standard Ubuntu Server virtual machine on VMware. The key server is the connection between the user and the cloud: whatever the user wants to search for in the database, the query first passes through the key server and is then forwarded to the cloud. The key server is where the query is encrypted and where the encrypted data returned from the cloud is decrypted for the user.
1. Key Generation: To provide strong security, it is essential that the keys used for encryption and decryption are stored in a safe place and managed in a way that makes it hard for any adversary to capture and use them. Therefore, our keys are stored on the key server and generated dynamically: the keys are refreshed after a certain period of time, which ensures that even if an encryption key is captured, it cannot be used after that period. Specifically, all keys are symmetric: a "public" key that is the same for all departments and a "private" key that is different for each department. The public key is used to encrypt and decrypt the whole database, whereas the private key is used to encrypt and decrypt specific fields of the tables. The mapping between departments and keys is stored on the key server in a table that associates each department with its assigned keys. In this way, key distribution is completely transparent to the users: they do not hold the keys and do not need to know them.
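The key server's duties described in this subsection (a department-to-key table, periodic key refresh, and rewriting a plaintext equality query into hashed-column, protected-value form) can be sketched as a small class. The class, the one-hour refresh period, and the mapping of the shared ("public") key to the outer layer and the department ("private") key to the inner layer are our assumptions. A deterministic HMAC tag stands in for the two-layer AES encryption purely to keep the sketch self-contained; unlike real encryption, a tag cannot be decrypted.

```python
import hashlib
import hmac
import os
import time
from typing import Callable, Dict, Iterable, Tuple

REFRESH_SECONDS = 3600  # hypothetical key-refresh period (one hour)


class KeyServer:
    """Toy key server: one shared key plus one private key per department."""

    def __init__(self, departments: Iterable[str],
                 clock: Callable[[], float] = time.time) -> None:
        self._clock = clock
        self._departments = list(departments)
        self._rotate()

    def _rotate(self) -> None:
        # Generate fresh 128-bit keys. Stored data is NOT re-encrypted here;
        # a real deployment would have to handle that on every refresh.
        self._shared = os.urandom(16)
        self._private: Dict[str, bytes] = {
            d: os.urandom(16) for d in self._departments
        }
        self._issued_at = self._clock()

    def keys_for(self, department: str) -> Tuple[bytes, bytes]:
        """Return (shared, private) keys, regenerating them once the period expires."""
        if self._clock() - self._issued_at >= REFRESH_SECONDS:
            self._rotate()
        return self._shared, self._private[department]

    def rewrite_query(self, department: str, table: str, column: str,
                      value: str) -> Tuple[str, Tuple[str]]:
        """Turn `column = value` into a hashed-column, protected-value query."""
        shared, private = self.keys_for(department)
        # Column title -> MD5 hex digest, as in the hashing table.
        col_hash = hashlib.md5(column.encode("utf-8")).hexdigest()
        # Stand-in for the two layers: private (inner) key, then shared (outer) key.
        inner = hmac.new(private, value.encode("utf-8"), hashlib.sha256).digest()
        outer = hmac.new(shared, inner, hashlib.sha256).hexdigest()
        # Table names come from the key server's own configuration, not user input.
        sql = f"SELECT * FROM `{table}` WHERE `{col_hash}` = %s"
        return sql, (outer,)
```

The returned statement and parameter tuple use the `%s` placeholder style of common MySQL drivers, so the protected value is passed as a bound parameter rather than spliced into the SQL text. Injecting the clock makes the refresh logic easy to exercise without waiting an hour.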
2. Process Flow: The data owner wants to store the data in encrypted form on the cloud, and wants the encryption keys to be stored at a separate location, which in our case is the key server. First, an operator user inserts data for the employees:
− the operator browses to the URL https://keyserver/CloudDBPrivacy/index.html, which is hosted on the key server (the connection is secured via SSL/TLS);
− enters his credentials;
− inserts the data and submits it.
The data is encrypted using the keys assigned to the operator and sent by the key server to the cloud for storage. Then an HR user wants to check who receives a certain salary:
− HR logs in with his credentials.
− The key server verifies the credentials.
− HR queries the salaries.
− The key server encrypts the query and sends it to the cloud.
− The cloud returns the queried data in encrypted form to the key server. The salary is stored in a different table from the general information, so the cloud returns only the queried data from that specific table.
− The key server uses the keys assigned to HR to decrypt the data and presents it in HR's interface.
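The two flows above differ mainly in what each role may do. The sketch below models that check on the key server: a hypothetical credential store (salted PBKDF2 hashes) and a hypothetical policy table listing which operations each role may perform on which table. A request is forwarded to the cloud only if both checks pass. The role names, table names, accounts, and policy are illustrative assumptions; encryption and decryption are omitted to keep the focus on authorization.

```python
import hashlib
import hmac
import os
from dataclasses import dataclass
from typing import Dict, FrozenSet, Tuple


def _pw_hash(password: str, salt: bytes) -> bytes:
    """Salted PBKDF2 hash, so the key server never stores plaintext passwords."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)


def _make_user(password: str, role: str) -> Tuple[bytes, bytes, str]:
    salt = os.urandom(16)
    return salt, _pw_hash(password, salt), role


# Hypothetical accounts and policy held by the key server.
USERS: Dict[str, Tuple[bytes, bytes, str]] = {
    "operator1": _make_user("op-pass", "operator"),
    "hr1": _make_user("hr-pass", "hr"),
}
POLICY: Dict[Tuple[str, str], FrozenSet[str]] = {
    ("operator", "employees"): frozenset({"insert"}),
    ("operator", "salaries"): frozenset({"insert"}),
    ("hr", "employees"): frozenset({"select"}),
    ("hr", "salaries"): frozenset({"select"}),
}


@dataclass(frozen=True)
class Request:
    user: str
    password: str
    operation: str  # "insert" or "select"
    table: str


def authorize(req: Request) -> str:
    """Verify credentials and policy; return the role or raise PermissionError."""
    record = USERS.get(req.user)
    if record is None:
        raise PermissionError("invalid credentials")
    salt, stored, role = record
    # Constant-time comparison of the stored and recomputed hashes.
    if not hmac.compare_digest(stored, _pw_hash(req.password, salt)):
        raise PermissionError("invalid credentials")
    if req.operation not in POLICY.get((role, req.table), frozenset()):
        raise PermissionError(f"role {role!r} may not {req.operation} on {req.table!r}")
    return role
```

For example, `authorize(Request("hr1", "hr-pass", "select", "salaries"))` returns `"hr"`, while the same user attempting an `insert` is rejected before anything reaches the cloud.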
4 Experimental Results
The following diagram outlines the network topology used for the implementation.
Fig. 5. Cloud network topology
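We simulated several cases and obtained the expected behavior: Fig. 6 shows a database record as stored in encrypted form in the cloud, and Fig. 7 shows a decrypted record of the database.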
Fig. 6. Encrypted record of the database
Fig. 7. Decrypted record of the database
In this paper, we addressed the challenges as follows:
• To protect the outsourced data from theft by hackers, we implemented encryption so that no adversary can decode the database without the proper secret key. This may, however, incur some performance degradation.
• To protect the outsourced data from abuse by the cloud server, we encrypt the database with secret keys before outsourcing it to the cloud, so that even the cloud administrator cannot infer the content of the stored database. The SQL queries from the client end are also encrypted, so that eavesdropping on a query does not reveal information about the database.
• To achieve content-level, fine-grained access control, we implemented a trusted key server that creates and stores secret keys based on user roles and applies these keys on the fly when a user accesses the database in the cloud. The users themselves never learn the keys.
5 Conclusion
In this paper, we presented a new privacy-preserving scheme for the cloud storage environment that addresses a need of current industry: preserving the privacy of the mission-critical, sensitive data of the enterprise world. Some aspects of our design can still be improved. One weakness of our scheme is that it has no mechanism for checking the integrity of the data stored in the cloud. In this work we concentrated on the privacy of the stored data, so data integrity was not carefully considered. We plan to address this integrity problem in future work.
References
[1] Hu, H., Xu, J., Ren, C., Choi, B.: Processing Private Queries over Untrusted Data Cloud through Privacy Homomorphism. In: Proc. of the 27th IEEE International Conference on Data Engineering (ICDE 2011), Hannover, Germany (2011)
[2] Dwork, C.: Differential Privacy. In: ICALP (2006)
[3] Thompson, B., Haber, S., Horne, W.G., Sander, T., Yao, D.: Privacy-Preserving Computation and Verification of Aggregate Queries on Outsourced Databases. In: Goldberg, I., Atallah, M.J. (eds.) PETS 2009. LNCS, vol. 5672, pp. 185–201. Springer, Heidelberg (2009)
[4] Chang, Y.-C., Mitzenmacher, M.: Privacy Preserving Keyword Searches on Remote Encrypted Data. In: Ioannidis, J., Keromytis, A.D., Yung, M. (eds.) ACNS 2005. LNCS, vol. 3531, pp. 442–455. Springer, Heidelberg (2005)
[5] Bugiel, S., Sadeghi, A.R., Schneider, T., Nürnberger, S.: Twin Clouds: An Architecture for Secure Cloud Computing (Extended Abstract). In: Workshop on Cryptography and Security in Clouds (CSC 2011) (March 2011)
[6] Popa, R.A., Zeldovich, N., Balakrishnan, H.: CryptDB: A Practical Encrypted Relational DBMS. Technical Report MIT-CSAIL-TR- 011-005, Computer Science and Artificial Intelligence Laboratory, Cambridge (2011)