Application Layer Encryption for Cloud

Amitabh Saxena

Vikrant Kaulgud

Vibhu Sharma

Accenture Technology Labs Bangalore, India Email: {amitabh.saxena, vikrant.kaulgud, vibhu.sharma}@accenture.com

Abstract—As we move to the next generation of networks such as Internet of Things (IoT), the amount of data generated and stored on the cloud is going to increase by several orders of magnitude. Traditionally, storage or middleware layer encryption has been used for protecting data at rest. However, such mechanisms are not suitable for cloud databases. More sophisticated methods include user-layer-encryption (ULE) (where the encryption is performed at the end-user’s browser) and application-layer-encryption (ALE) (where the encryption is done within the web-app). In this paper, we study security and functionality aspects of cloud encryption and present an ALE framework for Java called JADE that is designed to protect data in the event of a server compromise.

I. Introduction

The biggest concern in a web application is that of the server getting hacked. Often the real damage is done when the attacker gains access to the database. Even read-only access can compromise the privacy of customers or cause financial loss [1]. Further, if the attacker gets write access to the database, the damage can be even greater, since he could corrupt records in ways that go undetected for years. Hence it is not surprising that the biggest concern in hosting an application on the cloud is: how do I protect my data at rest?

The suggested solutions range from Storage-Layer Encryption (SLE) at one extreme (where encryption is done by the filesystem) to User-Layer Encryption (ULE) at the other (where encryption is performed in the user's web browser [2], [3]). Other solutions, such as Database-Layer Encryption (DLE) (where the DBMS encrypts data before writing to the filesystem), Middleware-Layer Encryption (MLE) (where a middleware sitting between the web-app and the DBMS does the encryption), and Application-Layer Encryption (ALE) (where the web-app itself is responsible for encryption), fall in between. We claim that of these methods, ALE provides the optimal level of security and efficiency for cloud databases [2]. However, implementing ALE is a non-trivial task because, in addition to significant knowledge of cryptography, the developer needs to consider factors such as data compliance, performance/functionality tradeoffs, security risks and key management strategies. Due to these difficulties, most cloud databases do not use ALE and consequently, incidents of data compromise are commonplace.

Our contribution: We analyze cloud database security with respect to three tunable parameters: Functionality (F), Security (S), and Encryption type (E). The functionality refers to the type of queries that can be issued to the underlying DBMS. The security we desire is measured in terms of integrity, confidentiality and unlinkability with respect to various types of attackers. The encryption type defines the type and strength of the underlying cryptography. We describe a classification system using these parameters that can be used by designers to fine-tune the security, functionality and compliance requirements of their data. We then describe an ALE framework for Java called JADE (Java Application Data Encryption) that takes this classification and applies the optimal encryption in a transparent manner. Finally, we describe how we can secure the ALE keys in the event of server compromise.

II. Overview

In this section we summarize the various problems and solutions for cloud database security. Let us consider confidentiality, which is achieved using encryption.

A. Data Encryption Layers

There are typically five distinct places for cloud encryption [2], shown in Figure 1. The classification is based on the depth in the software stack at which encryption is done.

Fig. 1. Cloud encryption methods

1) User-layer encryption (ULE): Encryption happens at the top of the stack, at the user layer, inside the browser or at a web proxy. If used correctly, ULE provides the highest security but the lowest functionality, since the application never has access to plaintexts. To increase functionality, the security is often weakened (for example, by the use of order-preserving encryption [4]). ULE is also very restrictive in terms of the ciphertext format and database queries, and is primarily suited for SaaS applications.

2) Application-layer encryption (ALE): Encryption is done inside the application, below the user layer. Unlike ULE, in ALE the application has unrestricted access to plaintexts. Consequently, the functionality provided by ALE is much higher than in ULE. Since the application is aware of the data security and compliance requirements, ALE can deliver highly targeted protection only when necessary, thereby ensuring optimal performance. Additionally, ULE can be used on top of ALE for enhanced security. ALE is most suited for PaaS clouds.

3) Middleware-layer encryption (MLE): Encryption is done in between the application and the DBMS. This could be implemented using predefined stored procedures, views and triggers in the DBMS itself or via middleware. Some DBMS functionality (such as cross-references) can be lost depending on the encryption being used.

4) Database-layer encryption (DLE): Data is encrypted by the DBMS before being written to disk. The encryption is performed at cell-level granularity. In other words, each time a page is loaded from disk, all encrypted values in that page are decrypted (each one separately), and each time a page is stored to disk, all sensitive values in that page are encrypted (again, each one separately).

5) Storage-layer encryption (SLE): Pages are encrypted/decrypted by the operating system when they are written to/read from disk. This layer has the advantage of being totally transparent, thus avoiding any changes to the DBMS and to existing applications. However, selective encryption is not possible, nor is per-user access control [5].

B. Application Layer Encryption

In many ways the application is the obvious place for encryption because it knows exactly which data is sensitive and can apply selective protection. Contrast this with encryption at other layers, where data is typically encrypted on an all-or-nothing basis because those systems have no knowledge of the data. With ALE, access can be controlled in a fine-grained manner and data protection mandates (such as PCI-DSS) can be easily enforced. However, there are some subtleties in using ALE:
1) Encryption can degrade performance, limiting capacity and introducing latency.
2) Existing tools and frameworks require the developer to have at least some knowledge of cryptography.
3) It may not always be possible to utilize database search functionality on encrypted data.
4) Attackers can gain access to encryption keys or simply turn off encryption.

C. Overview of Solution

Our solution is an ALE framework called JADE (Java Application Data Encryption) that tries to overcome some of the issues discussed above. JADE can be used with JVM languages such as Scala and Java to encrypt application data. In particular, we have the following design goals:
1) It should allow a domain expert without any cryptography knowledge to specify data security and operational requirements in a fine-grained manner.
2) Various security configurations created using expert knowledge (both domain and cryptographic) would be stored in an extensible knowledge-base, which is used by developers to secure their applications.
3) It should retain as much DBMS functionality as possible with minimal data expansion.

JADE operates at column level and provides a library API for developers to integrate ALE in their applications. The API is a wrapper on JDBC and provides seamless encryption capability without requiring additional effort from the developer. The API provides support for the following:
1) Data Classification: A domain expert begins by classifying table columns using various criteria relating to functionality (F) and security (S). The functionality is related to the way the data will be used by the application, while the security is based on the sensitivity of the data or other compliance requirements.
2) Requirement Matching: After the F and S values for all columns are defined, a tool analyzes these values and determines the optimal encryption method E (or outputs an error if the values are incompatible).
3) Cryptographic Mapping: Using the E values, internal mappings are derived between the original (unencrypted) data and the stored (encrypted) data. The table mapping converts the tables that a developer defines to the encrypted tables that are actually used. The query mapping maps the developer's queries to queries on the encrypted tables. Finally, the result mapping converts the encrypted results back to the plaintext that the application uses.
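As an illustration of the three mappings, the following Scala sketch encrypts every cell deterministically, rewrites an equality predicate into a comparison of ciphertexts, and decrypts the matching rows. It is a minimal sketch and not JADE's API: the object `MappingSketch`, its method names, the in-memory table and the fixed demo key are all assumptions made for the example.

```scala
import java.nio.charset.StandardCharsets.UTF_8
import java.util.Base64
import javax.crypto.Cipher
import javax.crypto.spec.SecretKeySpec

// Illustrative only: all names here are hypothetical, not JADE's API.
object MappingSketch {
  // Fixed 128-bit demo key; a real deployment would obtain it from a key manager.
  private val key = new SecretKeySpec("0123456789abcdef".getBytes(UTF_8), "AES")

  // Deterministic encryption (AES in ECB mode): equal plaintexts give equal
  // ciphertexts, so the DBMS can still test equality on stored values.
  def enc(plain: String): String = {
    val c = Cipher.getInstance("AES/ECB/PKCS5Padding")
    c.init(Cipher.ENCRYPT_MODE, key)
    Base64.getEncoder.encodeToString(c.doFinal(plain.getBytes(UTF_8)))
  }

  def dec(stored: String): String = {
    val c = Cipher.getInstance("AES/ECB/PKCS5Padding")
    c.init(Cipher.DECRYPT_MODE, key)
    new String(c.doFinal(Base64.getDecoder.decode(stored)), UTF_8)
  }

  // Table mapping: a developer-visible row is stored with every cell encrypted.
  def storeRow(row: Map[String, String]): Map[String, String] =
    row.map { case (col, v) => col -> enc(v) }

  // Query mapping: "where col = literal" becomes an equality test on ciphertexts.
  def whereEq(table: Seq[Map[String, String]], col: String,
              literal: String): Seq[Map[String, String]] = {
    val target = enc(literal)
    table.filter(_.get(col).contains(target))
  }

  // Result mapping: matching rows are decrypted before being returned.
  def loadRow(stored: Map[String, String]): Map[String, String] =
    stored.map { case (col, v) => col -> dec(v) }
}
```

Because the cipher is deterministic, identical values remain recognizable in the stored table, which is the kind of functionality/security compromise that the classification in Section III is meant to make explicit.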
A developer primarily interacts with the framework during the first step (data classification), when he defines his database schema. When the application needs to access data, the library provides an API for querying the DBMS in a type-safe manner: a query using a column is allowed only if the column's security and functionality classification does not prohibit it. Incompatible queries are caught at compile time. In this situation, the developer has two options: either use an alternate valid query or adjust the data classification to make the query compatible. Once all possible query use-cases have been tested, the classification can be finalized.

III. Data Classification

The first step in our approach entails classifying the columns based on functionality and security requirements. In this section we describe the classification method.

A. Based on DBMS Functionality (F)

We first classify data based on the DBMS functionality (F) using levels F0 to F6, in increasing order of functionality. We only consider relational databases in this work. This classification is based on the types of SQL queries that the app can issue to the DBMS for retrieving the data. The DML/DDL query operations we consider, which are based on a typical application, are given in Figure 2.

Property  SQL meaning         Example query snippet
SL        Select              select name or select *
E         Equals/Primary-key  where name = 'Alice'
I         Inequality          where age < 10
M         Min/Max             select max(age)
S         Sum                 select sum(age)
A         Avg                 select avg(age)
L         Like                where name like '%Alice%'
G         Groupby/Having      group by age
FK        Foreign-key         foreign key age

Fig. 2. Properties defining SQL functionality of a column

In this classification, a column is assigned a subset of the above properties to enable the corresponding SQL operations. For example, any column with property E can be used as a primary key in DDL and for equality in DML. Without any security a column has all properties. As we gradually add security, some of the properties are lost. Therefore, there is a compromise between functionality and security. For optimal protection, only the necessary properties must be assigned to a column.

Assigning properties to columns this way entails a fine-grained classification. For convenience, we also define a coarse-grained functionality classification by grouping maximal subsets of properties that are satisfied together in various encryption mechanisms and layers. This classification, with levels F0-F6, is given in Figure 3. F0 has the lowest functionality: it gives no access to the column. The app can only insert data into the column but it cannot select the data. This 'write-only' functionality can be assigned to columns that the app never reads (such as those for storing uploaded KYC documents for a bank, where the app only needs to be able to write data and a different app will read the data). F1 is the next level of functionality, where we can select the data but issue no other queries. For instance, if Name is assigned F1, then a query such as select Name from Cust is allowed while select * from Cust where Name = 'Bob' is not. However, for F2, this would be allowed. Similarly, levels F3-F5 have increasing functionality that supersedes the previous levels. Finally, level F6 has full DBMS functionality, allowing all queries. The purpose of having lower F-levels is to achieve higher security, which varies inversely with functionality.

B. Based on Desired Security (S)

We next classify the data based on the desired security (S). This is a measure of the sensitivity of data and entails

F-level  SL  E   I   M   S   A   L   G   FK
F6       Y   Y   Y   Y   Y   Y   Y   Y   Y
F5       Y   Y   -   -   Y   Y   -   Y   Y
F4       Y   Y   Y   Y   -   -   -   Y   Y
F3       Y   Y   -   -   -   -   -   Y   Y
F2       Y   Y   -   -   -   -   -   Y   -
F1       Y   -   -   -   -   -   -   -   -
F0       -   -   -   -   -   -   -   -   -

Fig. 3. Functionality (F) Matrix

protection from theft, malicious modification and information leakage. In particular, we focus on the following:

1) Confidentiality (C): This pertains to data theft when an attacker gets read-access to the DBMS and subsequently gains knowledge of plaintexts. We classify confidentiality based on the attacker capability: A1 is an external attacker, A2 is the DBMS or 3rd party middleware, while A3 is the application hosting provider. We have three levels of confidentiality:
C1 A1 cannot access plaintexts.
C2 Neither A1 nor A2 can access plaintexts.
C3 None of A1, A2 or A3 can access plaintexts.

2) Integrity (IG): Integrity is broken when the attacker makes an undetected modification to data after obtaining write-access to the DBMS. In contrast to confidentiality, here we do not differentiate based on attacker capability. We only have one integrity classification (IG) that requires all of the following:
a) Attacker cannot delete or add an entire row.
b) Attacker cannot modify a field.
c) Attacker cannot duplicate rows.

3) Unlinkability (U): If an attacker gets read-access to a DBMS containing encrypted data, he may not be able to violate confidentiality but he could still infer certain information from the ciphertexts, depending on the encryption used. In particular, an attacker could infer which rows in the same table have identical values for a particular column, or which rows in two different tables have a primary-foreign key relationship. Our two unlinkability requirements say that an attacker should not be able to decide if two data elements are identical. Specifically:
U1 He cannot infer identical data across tables.
U2 He cannot infer identical data in a table.

Figure 4 gives the security levels and their features.

S-level  C1  C2  C3  IG  U1  U2  Highest F
S6       Y   Y   Y   Y   Y   Y   F0
S5       Y   Y   Y   -   Y   Y   F0
S4       Y   Y   -   Y   Y   Y   F1
S3       Y   Y   -   Y   Y   -   F2
S2       Y   Y   -   Y   -   -   F5
S1       Y   -   -   Y   -   -   F5
S0       -   -   -   Y   -   -   F6

Fig. 4. Security (S) Matrix
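The upper limit on functionality for each security level (the "Highest F" values of Figure 4) lends itself to a simple lookup. The sketch below is only an illustration of that coarse-grained check; the object `Compatibility` and its method names are hypothetical, and levels are represented by their numeric index (F3 as 3, S3 as 3).

```scala
// Hypothetical encoding of the "Highest F" values of Figure 4; not JADE's API.
object Compatibility {
  // Highest functionality level permitted at each security level S6..S0.
  val highestF: Map[Int, Int] =
    Map(6 -> 0, 5 -> 0, 4 -> 1, 3 -> 2, 2 -> 5, 1 -> 5, 0 -> 6)

  // A coarse-grained pair (F, S) is admissible if F does not exceed the limit for S.
  def allowed(f: Int, s: Int): Boolean = {
    require(f >= 0 && f <= 6 && s >= 0 && s <= 6, "levels range from 0 to 6")
    f <= highestF(s)
  }

  // When a request is rejected, the nearest admissible functionality level
  // at the same security level is the limit itself.
  def clampF(f: Int, s: Int): Int = math.min(f, highestF(s))
}
```

With this encoding, `Compatibility.allowed(3, 3)` is false, matching the forbidden combination (F3, S3), while `Compatibility.allowed(2, 3)` is true.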

Similar to the functionality, we allow an advanced user to specify security at a fine-grained level using the individual security properties outlined above (C1-C3, IG, U1, U2). However, we also provide a coarse-grained classification indicated by the S-matrix, which can be used by standard users. The security properties are grouped into levels based on the encryption layers and the available cryptographic primitives. For instance, levels S5 and S6 are separated because ULE generally does not provide a way to detect integrity (IG) violations such as row deletion or duplication, while ALE does (see Section IV-B). Therefore, ULE only achieves up to S5, while ALE can go up to S6. A higher security level implies better protection.

For any given scenario, we desire the highest F and S values. However, it is not possible to achieve both a high level of security and a high level of functionality, as seen by the following example: we cannot achieve unlinkability of a field across tables (U1) and also use that field as a foreign key (FK), since a foreign key implies linkage between tables. Hence any combination that includes both FK and U1 (for example, (F3, S3)) is forbidden. Using this idea, we have a strict upper limit on the functionality for any given security level, as shown in the last column of Figure 4. A designer can use this classification to get the optimal S and F levels for each column of the data.

C. Classification of Encryption Layers

We can use the above classification to study the various encryption layers discussed in Section II-A. Each layer has an upper limit on the functionality available and the security provided, as depicted in Figure 5.

Layer  Max F  Max S  Remarks  Example
ULE    F4     S5     low F    CipherCloud [6]
ALE    F5     S6     optimal  CipherDB [7]
MLE    F5     S1     low S    Stored procedures
DLE    F5     S1     low S    Oracle TDE [8]
SLE    F6     S1     low S    Bitlocker [9]

Fig. 5. Encryption layers and their F/S levels

The Max F column gives the functionality when the minimal security at that layer is enabled (with security disabled, the highest F is always F6). The Max S column gives the maximum security that can be practically achieved at that layer. Note that the examples do not necessarily provide the maximum values. For instance, Oracle's TDE, an example of DLE, provides only F2. The maximum F values are determined as follows. In ALE and ULE, a weak form of encryption called OPE (Section B of the Appendix) allows us to do inequality (I) and MIN/MAX (M), enabling both to achieve F4. Another weak form of encryption in ALE called MOD (Section A of the Appendix) allows us to do SUM/AVG, thereby allowing it to go up to F5. Layers below ALE can achieve the same functionality using identical algorithms. Layers below ALE also have low security due to reliance on 3rd parties or the DBMS. Hence, we do not consider them to be a viable solution. Of the remaining options (ALE, ULE), we claim that ULE is much more restrictive because:
1) ULE does not allow an easy way to achieve integrity (IG), in particular for row deletion or duplication.
2) ULE ciphertexts often need to satisfy various formatting constraints to ensure that the application does not break [10], thereby weakening security.
3) There is no possibility of query rewriting in ULE. Consequently, operations like AVG cannot be performed easily. Furthermore, there is no known encryption method that preserves both AVG and SUM. Hence, we do not consider ULE to achieve F5.

CipherDB [7] is an example of an ALE framework similar to the one we propose. However, there are some differences. Firstly, our framework works on the JVM, while CipherDB does not. Secondly, CipherDB achieves only F3 functionality, while ours achieves F5 with the same security.

IV. Description of Solution

The above classification defines the security and functionality of a column, which in turn determine the underlying cryptographic primitives used.

A. Cryptographic Primitives

Figure 6 lists the cryptographic primitives used. The first table lists the algorithms. We use standard primitives for most operations except when extra functionality is needed. We use two non-standard primitives described in the Appendix. These are for performing operations on encrypted numeric data directly via the DBMS.
1) OPE: This is an order-preserving cipher that allows us to do MIN/MAX/Inequality (Section B of the Appendix).
2) MOD: A homomorphic cipher using modular arithmetic that allows us to do SUM/AVG (Section A of the Appendix).

The second table gives the E-levels that result from combining the primitives in different ways, along with the corresponding F and S levels. The MOD cipher, for example, when used with a table-specific salt (represented by E2) results in an F-level slightly below F5 (everything except FK). Similarly, the OPE cipher without a table-specific salt (represented by E3) achieves F4. Furthermore, both ciphers have weak security due to statistical leakage.

B. Integrity Checks

Integrity checks are done as follows. Each encrypted table has a column containing a Message Authentication Code (MAC) of all the remaining fields of every row. This is called the row-MAC. The row-MAC uses a table-specific salt that is stored in a separate meta-data table. The key of the MAC is generated from the master ALE key. The meta-data table, shown in Figure 7, has a row corresponding to each encrypted table (the first row represents itself). The columns include the row-count, the table-specific salt and the last modified date. The rows of this table are also protected using a row-MAC.

Term  Meaning                       Default
PUB   Pub-key probabilistic enc     RSA-OAEP
PROB  Sym-key probabilistic enc     AES-OCB
SYM   Sym-key deterministic enc     AES-ECB
MOD   Sym-key mod arithmetic enc    Section A
OW    One-way hash                  SHA256
MAC   Message authentication code   HMAC
OPE   Sym-key order-preserving enc  Section B
TS    Table-specific salt           Random(256)

E-Level  Crypto used  TS  F-level  S-level
E8       PUB          -   F0       S6
E7       PROB         -   F1       S4
E6       OW/SYM       Y   F2       S3
E5       OW/SYM       -   F3       S2
E4       OPE          Y   F4 - FK  S3*
E3       OPE          -   F4       S2*
E2       MOD          Y   F5 - FK  S3*
E1       MOD          -   F5       S2*
E0       MAC          Y   F6       S1
* Weakened security due to statistical leakage.

Fig. 6. Cryptographic tools in JADE toolbox
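The second table of Figure 6 can be read as a lookup from coarse requirements to an E-level. The sketch below encodes its rows and picks a row that meets a requested (F, S) pair. It is a simplification under stated assumptions: F-levels are treated as a simple total order, rows marked "- FK" are excluded for F3 and above (since those levels include the foreign-key property), and the preference order (rows without weakened security first, then the least functionality, then the most security) is our own choice for the example, not a rule stated for JADE. All names are hypothetical.

```scala
// Rows of the second table of Figure 6; the selection rule below is an assumption.
object ELevelLookup {
  final case class ELevel(name: String, crypto: String, salted: Boolean,
                          f: Int, withoutFk: Boolean, s: Int, weakened: Boolean)

  val table: List[ELevel] = List(
    ELevel("E8", "PUB",    false, 0, false, 6, false),
    ELevel("E7", "PROB",   false, 1, false, 4, false),
    ELevel("E6", "OW/SYM", true,  2, false, 3, false),
    ELevel("E5", "OW/SYM", false, 3, false, 2, false),
    ELevel("E4", "OPE",    true,  4, true,  3, true),
    ELevel("E3", "OPE",    false, 4, false, 2, true),
    ELevel("E2", "MOD",    true,  5, true,  3, true),
    ELevel("E1", "MOD",    false, 5, false, 2, true),
    ELevel("E0", "MAC",    true,  6, false, 1, false)
  )

  // A row satisfies a request if it offers at least the requested levels;
  // rows marked "- FK" cannot serve F3 and above, which need foreign keys.
  private def satisfies(e: ELevel, f: Int, s: Int): Boolean =
    e.f >= f && e.s >= s && !(e.withoutFk && f >= 3)

  // Assumed preference: unweakened rows first, then least functionality,
  // then most security.
  def choose(f: Int, s: Int): Option[ELevel] =
    table.filter(satisfies(_, f, s))
      .sortBy(e => (e.weakened, e.f, -e.s))
      .headOption
}
```

Under these assumptions a request for (F2, S3) resolves to E6, and a request for (F4, S4) has no match, so the caller would have to relax one of the two requirements.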

table     rows  table salt  modified    row-MAC
Metadata  4     r2tWAfzo    ...9953655  3wSj2l0K
Cust      23    pnJ45nya    ...9952982  mlkE9BRZ
Orders    40    Zv3QrdwE    ...2094599  k4wEn2E2
Items     5     MJk9k4jw    ...9953655  TRkEj3JY

Fig. 7. Metadata table for integrity
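A row-MAC of the kind described in this section could be computed as in the following sketch. It uses HMAC-SHA256 from the standard Java library; the object `RowMacSketch`, the method names and the length-prefixed field encoding are assumptions of the example and are not taken from JADE.

```scala
import java.nio.ByteBuffer
import java.nio.charset.StandardCharsets.UTF_8
import java.security.MessageDigest
import javax.crypto.Mac
import javax.crypto.spec.SecretKeySpec

// Hypothetical helper names; the field encoding is an assumption of this sketch.
object RowMacSketch {
  // HMAC-SHA256 over the table-specific salt followed by the row's fields.
  // Each value is length-prefixed so that ("ab", "c") and ("a", "bc") differ.
  def rowMac(macKey: Array[Byte], tableSalt: String, fields: Seq[String]): Array[Byte] = {
    val mac = Mac.getInstance("HmacSHA256")
    mac.init(new SecretKeySpec(macKey, "HmacSHA256"))
    (tableSalt +: fields).foreach { f =>
      val bytes = f.getBytes(UTF_8)
      mac.update(ByteBuffer.allocate(4).putInt(bytes.length).array())
      mac.update(bytes)
    }
    mac.doFinal()
  }

  // Constant-time comparison of the stored tag with a freshly computed one.
  def verifyRow(macKey: Array[Byte], tableSalt: String, fields: Seq[String],
                tag: Array[Byte]): Boolean =
    MessageDigest.isEqual(rowMac(macKey, tableSalt, fields), tag)

  // Row deletion is detected by comparing the number of rows actually read
  // with the row-count recorded in the meta-data table.
  def countMatches(rowsRead: Int, recordedCount: Int): Boolean =
    rowsRead == recordedCount
}
```

Including the table-specific salt in the MAC input means that a row copied verbatim into a different table fails verification there.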

The various tables are kept updated as follows:
1) Each time a row is inserted or updated, its row-MAC is also updated. It is verified each time any data is read. Consequently, the entire row is read on each select query irrespective of the actual columns being selected. This prevents malicious row modification.
2) Each table has a primary key (user-defined or hidden). This, combined with the row-MAC, prevents an attacker from duplicating rows, because of the unique primary-key constraint.
3) Finally, because of the row-count in the meta-data, it is not possible for the attacker to delete rows.

C. System Components

The JADE framework consists of the following components.
1) Schema Designer: This module allows the designer to define the various security (S) and functionality (F) requirements for each column of the table.
2) Requirements Analyzer: This module uses the functionality (F) and security (S) requirements given by a designer and computes the optimal E-level satisfying both (see Figure 8), or a suggested adjustment to these requirements if no security configuration matches (for example, no E satisfies S4 and F4).
3) Schema Wrapper: This module uses the output of the requirements analyzer to define the encrypted schema based on the specified E-level. Additional security is gained by table/column name encryption and type remapping. For instance, an Int column will be mapped to a Varchar if its functionality specification does not involve any numeric operations.
4) Query Wrapper: This module wraps the user's SQL queries into encrypted queries for the encrypted schema. Each column's E-level is used to decide if a query is allowed or not (via the S and F matrices).
5) Data Wrapper: This module is used to wrap the data passed between the user's SQL queries and the encrypted queries. This module also authenticates the data being read via various integrity checks.
6) Key Manager: This module is used to manage keys for the various crypto primitives used. A master key is used to seed a key generator for the individual keys, which are computed in a deterministic manner. Thus, the problem of securing several keys reduces to the problem of securing the master key. In the case of public keys, the master key is used for authentication.

Fig. 8. Security requirements analyzer

Figure 9 summarizes this. An architect defines the schema along with the security/functionality classification. The security analyzer uses this to generate the cryptographic mapping, which is transparently accessed via an API.

Fig. 9. JADE high-level architecture
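The deterministic derivation of individual keys from a master key, as performed by the key manager, could look like the sketch below. It uses a generic HMAC-based derivation and is not taken from JADE: the object name, the method name and the label format are assumptions made for illustration.

```scala
import java.nio.charset.StandardCharsets.UTF_8
import javax.crypto.Mac
import javax.crypto.spec.SecretKeySpec

// A generic HMAC-based derivation; the label format is an assumption of this sketch.
object KeyManagerSketch {
  // Derives a 256-bit sub-key from the master key and a purpose label such as
  // "usr/name/SYM". The same inputs always yield the same key, so only the
  // master key has to be protected and supplied at boot time.
  def deriveKey(masterKey: Array[Byte], label: String): Array[Byte] = {
    val mac = Mac.getInstance("HmacSHA256")
    mac.init(new SecretKeySpec(masterKey, "HmacSHA256"))
    mac.doFinal(label.getBytes(UTF_8))
  }
}
```

Distinct labels give unrelated keys, so a separate key per table, column or primitive does not require storing anything beyond the master key.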

D. Secure ALE Bootstrap

Recall that the key manager uses a master key to generate the individual keys deterministically. If an attacker compromises the server and this key is stored alongside the code (say, in a configuration file), then he can access it and decrypt the entire database. Therefore, additional measures need to be taken to secure this key in the event of server compromise. We describe below a method to securely bootstrap a key manager.

Assumptions: We allow the attacker to compromise the server under the following assumptions:
1) The attacker has read-write access to the database, configuration files and application code.
2) The attacker cannot run a modified version of the application code on the live server undetected.
3) The attacker cannot inspect the run-time memory of an already bootstrapped server running unmodified code.
The second assumption can be realized with tools that detect file modification (such as Tripwire [11]) used alongside techniques for authenticated bootstrap (such as UEFI [12], Trusted Computing [13], [14] and Trusted VM Snapshots [15]). The third assumption can be realized by hypervisor-based techniques such as [16] and [17].

Approach: We aim not for prevention but for detection (a weaker requirement). The basic idea is that the master key is never stored on the server. The key is provided by a human or a trusted key-server at boot time, after ensuring that the server is in a secure state (using Assumption 2 above). If the server is later compromised, the key cannot be extracted under Assumption 3 above. In order to prevent the attacker from supplying a different key, a password (i.e., a secondary key) is used for authorization. The key-server/human must additionally supply this password along with the master key. The problem then boils down to 'securely' storing the password on the remote server. Our modified requirements are: (1) the attacker cannot guess the password, (2) the attacker cannot change the password, and (3) the attacker cannot bypass the password check.

Notation: Let $C$ be the server code, $p$ the password we want to hide alongside $C$, and $k$ a secret unknown to the attacker. Assume that the attacker knows $C$.

Storing the password: Instead of storing $p$ alongside $C$, we store its 'obfuscated' form $F(p)$ with the properties:
1) Given $\{(p, F(p))\}_{p \in P}$ and $F(p')$ for any set of passwords $P$ and $p' \notin P$, it is hard to compute $p'$.
2) Given $\{(p, F(p))\}_{p \in P}$ for any set of passwords $P$, it is hard to compute $F(p')$ for $p' \notin P$.
3) Given $(F(p), p')$, it is easy to verify if $p = p'$.
4) Given $(k, p)$, it is easy to compute $F(p)$ for any $p$.
Code $C$ encodes the following logic: on receiving password $p'$, read $F(p)$ from a configuration file, verify if $p \stackrel{?}{=} p'$ (via Property 3) and abort if the check fails.

Security: Given $F(p)$, $C$ and $S$ (a running instance of $C$), it is hard for the attacker to do any of the following:
1) Compute a valid $F(p')$ for $p \neq p'$.
2) Compute the password $p$.
3) Make $S$ accept $p'$ as a password if $p \neq p'$.

Construction: We can construct $F$ via any UF-CMA signature scheme (UF-CMA: unforgeable under an adaptive chosen-message attack). Let $(SK, PK)$ be the (private, public) key-pair for a signature scheme and let $H$ be a one-way hash function. Then $F(p) = (h, \mathrm{Sig}_{SK}(h))$, where $h = H(p)$.
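The construction can be sketched with the standard Java security API as follows. The choices of SHA-256 for the hash and RSA with SHA-256 for the signature are assumptions of this example (any one-way hash and UF-CMA signature scheme would fit the description), and the object and method names are hypothetical.

```scala
import java.nio.charset.StandardCharsets.UTF_8
import java.security.{KeyPair, KeyPairGenerator, MessageDigest, PrivateKey, PublicKey, Signature}

// Sketch only: SHA-256 as H and RSA with SHA-256 as the signature scheme are
// assumptions; any one-way hash and UF-CMA signature scheme would do.
object PasswordObfuscation {
  final case class Obfuscated(h: Array[Byte], sig: Array[Byte])

  def hash(p: String): Array[Byte] =
    MessageDigest.getInstance("SHA-256").digest(p.getBytes(UTF_8))

  // F(p) = (h, Sig_SK(h)) with h = H(p); computed off-server by the holder of SK.
  def obfuscate(sk: PrivateKey, p: String): Obfuscated = {
    val h = hash(p)
    val s = Signature.getInstance("SHA256withRSA")
    s.initSign(sk)
    s.update(h)
    Obfuscated(h, s.sign())
  }

  // Server-side check: the hash of the candidate must equal the stored h, and
  // the stored signature must verify on that hash under PK.
  def check(pk: PublicKey, stored: Obfuscated, candidate: String): Boolean = {
    val h2 = hash(candidate)
    val v = Signature.getInstance("SHA256withRSA")
    v.initVerify(pk)
    v.update(h2)
    MessageDigest.isEqual(h2, stored.h) && v.verify(stored.sig)
  }

  // Demo key pair; in the scheme above SK never resides on the server.
  def newKeyPair(): KeyPair = {
    val g = KeyPairGenerator.getInstance("RSA")
    g.initialize(2048)
    g.generateKeyPair()
  }
}
```

Only the public key and $F(p)$ need to reside on the server, so an attacker with write access can overwrite $F(p)$ but cannot produce a valid pair for a password of his choosing without $SK$. Because $h$ is a plain hash, the sketch presumes a high-entropy password.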

To validate a password $p'$, the code $C$ has logic as follows: first compute $h' = H(p')$ and validate that $F(p)$ contains a valid signature on $h'$ under $PK$. The secret $k$ is $SK$.

Why a password? Observe that we could have stored the obfuscated ALE key directly. However, any obfuscated value stored on the server is subject to static analysis and therefore has reduced security, so it needs to be changed frequently. Using a password allows us to make this change without needing to change the ALE key.

Password changes: To change the password to $p'$, replace $F(p)$ with $F(p')$, which can be computed using $(k, p')$.

Preventing old password replay: An attacker could gain access to an older password $p'$ along with its obfuscation $F(p')$, which he can then replace on the server to conduct a successful attack (replacing the ALE key). One way to protect against such replay attacks would be to enforce password expiry by embedding additional information in $F(p)$. Another way would be to store $F(p)$ in a trusted public database instead of on the hosting server.

V. JADE Library

The library defines high-level types that can be used with any JVM language such as Java or Scala. These internally map to native SQL types as shown in the first table of Figure 10. The first column has the library types; the second column has the SQL types, which are hidden from the user. The remaining columns show the operations allowed at the lowest security level (S0) using the symbols E, I, M, S and G of Section III-A. The second table gives the JVM values that can be assigned to these types.

JADE type Int UInt Long ULong JBigInt UJBigInt VarChar VarBin

DB type (hidden) INT INT BIGINT BIGINT VARCHAR VARCHAR VARCHAR VARBINARY

E Y Y Y Y Y Y Y Y

I Y Y Y Y Y -

M Y Y Y Y Y -

S Y Y Y Y -

G Y Y Y Y -

The 'U' prefix in the type indicates an unsigned value.

JADE type   Numeric?  Simple?  JVM types
(U)Int      Y         Y        int
(U)Long     Y         Y        long
(U)JBigInt  Y         N        BigInteger
VarChar     N         N        String
VarBin      N         N        Byte[]

Fig. 10. JADE library primitives

Figure 11 shows how the library primitives can be combined with various encryption methods to obtain the desired functionality. The first column represents the E-level. The second column gives the permitted library type for that E-level. The remaining columns give the functionality available at that E-level using the notation of Section III-A.

E-level E0 E1 E2 E3 E3 E4/SYM E4/OW E5/SYM E5/OW E6 E7

JADE type Any Numeric Numeric Simple (U)JBigInt Any Any Any Any Any Any

SL Y Y Y Y Y Y Y Y -

E Y Y Y Y Y Y Y Y Y -

I Y Y Y -

M Y Y Y -

S Y Y -

G Y -

FK Y Y Y Y Y Y -

Fig. 11. JADE library primitives with encryption
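The compile-time rejection of queries that a column's classification prohibits can be obtained in Scala with phantom types and type bounds. The following is a minimal sketch of that idea for levels F0 to F2 only; it is not JADE's actual API, and every name in it is hypothetical.

```scala
// Hypothetical types illustrating compile-time query checking; not JADE's API.
object TypedColumns {
  sealed trait F0                  // write-only
  sealed trait F1 extends F0       // adds select (SL)
  sealed trait F2 extends F1       // adds equality (E)

  // The type parameter records the column's functionality level.
  final case class Col[L <: F0](name: String)

  // Selecting a column requires at least F1.
  def select[L <: F1](c: Col[L], table: String): String =
    s"select ${c.name} from $table"

  // Equality search requires at least F2; the value is left as a bind parameter.
  def whereEq[L <: F2](c: Col[L], table: String): String =
    s"select * from $table where ${c.name} = ?"
}
```

With `val name = TypedColumns.Col[TypedColumns.F1]("Name")`, the call `TypedColumns.select(name, "Cust")` compiles, whereas `TypedColumns.whereEq(name, "Cust")` is rejected by the compiler because F1 does not conform to the bound F2. This mirrors the Name example of Section III-A.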

A. Example Library Usage

The JADE library (written in Scala) allows developers to define columns using the data types described above. It optionally allows them to specify the S and F values for these columns. These values can either be specified at a coarse-grained level (F0-F6 and S0-S6) or at a more fine-grained level using the individual columns of the F and S matrices. Once these values are specified, the underlying encrypted tables are created. Below we give examples of the library usage in Scala. In all the cases, the rows and row objects will contain identical data. If some of the security specs are incompatible with the operations encountered, an error will be thrown at compile time.

    type Str = VarChar(255)                  // create an alias
    val (name, age, id) =                    // define columns
      (Col("name", Str), Col("age", Int), Col("id", Str))
    val usr = Tab("usr")(name, age, id)(id)  // id is the primary key

    /********* EXAMPLE 1: no security **************/
    val db = DB(usr)

    /********* EXAMPLE 2: default security *********/
    val db = SecDB(usr)

    /********* EXAMPLE 3: pre-set security *********/
    val nSpec = Spec(name, F1, S2)           // spec for name
    val aSpec = Spec(age, F2, S3)            // spec for age
    val db = SecDB(usr)(nSpec, aSpec)

    /********* EXAMPLE 4: custom security **********/
    // first define functionality for name and age
    val (fName, fAge) = (Func(E, SE), Func(E, I, S))
    // next define security for name and age
    val (sName, sAge) = (Sec(IG, C1, U1), Sec(IG, C2))
    val nSpec = Spec(name, fName, sName)     // spec for name
    val aSpec = Spec(age, fAge, sAge)        // spec for age
    val db = SecDB(usr)(nSpec, aSpec)

    /******** how to use db (all examples) *********/
    db.insert("Bob", 20, "id1")                // insert row
    val rows = db.selectAll                    // read all rows
    val row = db.select(Whr(name, Eq, "Bob"))  // search

B. Performance Evaluation

Here we give some performance metrics of JADE. The benchmarks were done on a Windows 7 PC (Intel Core i3) with a 500 GB HDD, 4 GB RAM and 64-bit Java 7, using H2 DB. We used the table usr defined below:

    type Str = VarChar(255); type Num = UJBigInt(100)
    val (name, tel, id) =                    // define columns
      (Col("name", Str), Col("tel", Str), Col("id", Num))
    val usr = Tab("usr")(name, tel, id)(id)
    val (n1, n2) = (Spec(name, F3, S2), Spec(name, F2, S3))
    val (t1, t2) = (Spec(tel, F3, S2), Spec(tel, F2, S3))
    val (i1, i2) = (Spec(id, F6, S0), Spec(id, F2, S3))
    val db1 = SecDB(usr)(n1, t1, i1)         // Configuration 1
    val db2 = SecDB(usr)(n2, t2, i2)         // Configuration 2

Configuration 1 has lower security (S2, S0) compared to Configuration 2 (S3). We measured the time to insert and select indexed rows. Our results are given in Figure 12.

Rows     Operation  Insecure  Config 1  Config 2
100000   insert     9033      30433     30764
100000   selectAll  1778      5740      7035
1000000  insert     291050    564254    606889
1000000  selectAll  18580     62244     75379

Fig. 12. Operation times (in milliseconds)
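The relative cost of encryption implied by the timings in Figure 12 can be computed directly. The short sketch below copies those timings and derives, for each measurement, the slowdown of the two secure configurations relative to the unencrypted baseline. The object and field names are ours and serve only this illustration.

```scala
// Timings copied from Figure 12 (milliseconds); names are illustrative only.
object Overhead {
  final case class Timing(rows: Int, op: String, insecure: Long, config1: Long, config2: Long)

  val timings: List[Timing] = List(
    Timing(100000,  "insert",    9033L,   30433L,  30764L),
    Timing(100000,  "selectAll", 1778L,   5740L,   7035L),
    Timing(1000000, "insert",    291050L, 564254L, 606889L),
    Timing(1000000, "selectAll", 18580L,  62244L,  75379L)
  )

  // Slowdown of each configuration relative to the unencrypted baseline.
  def slowdown(t: Timing): (Double, Double) =
    (t.config1.toDouble / t.insecure, t.config2.toDouble / t.insecure)

  def main(args: Array[String]): Unit =
    timings.foreach { t =>
      val (c1, c2) = slowdown(t)
      println(f"${t.rows}%8d ${t.op}%-9s config1=${c1}%.2fx config2=${c2}%.2fx")
    }
}
```

For these measurements the slowdown factors lie between roughly 1.9 and 4.1, and Configuration 2 is slower than Configuration 1 in every row.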

VI. Conclusion

Application Layer Encryption (ALE) is an important tool for protecting data stored on the cloud. However, there are currently no easy-to-use ALE frameworks for developers. Furthermore, the kind of encryption must be carefully selected for each column vis-à-vis the type of database queries to be made. As a general rule, stronger protection leads to reduced DBMS functionality. Hence, over-protection needs to be avoided. Finally, problems such as protecting encryption keys in the event of server compromise remain.

To this end, we presented JADE, a framework for JVM languages that allows developers to seamlessly use ALE without requiring any crypto knowledge. We also presented a data classification system based on DBMS functionality (F) and security (S) that allows developers to classify the columns of their schema using only domain knowledge. The framework uses these F and S values and applies the optimal level of encryption (E) for each column, ensuring maximum DBMS query efficiency. JADE also performs automatic integrity checks as it reads data to ensure that an attacker cannot modify the database without detection. Finally, we presented a method to securely bootstrap ALE keys so that data on the cloud remains secure in the event of server compromise.

References

[1] Elizabeth Palermo. 10 worst data breaches of all time, 2015.
[2] Iqra Basharat, Farooque Azam, and Abdul Wahab Muzaffar. Database security and encryption: A survey study. International Journal of Computer Applications, 47(12):28–34, June 2012.
[3] Luc Bouganim and Yanli Guo. Database encryption. In Henk C. A. van Tilborg and Sushil Jajodia, editors, Encyclopedia of Cryptography and Security, 2nd Ed., pages 307–312. Springer, 2011.
[4] Rakesh Agrawal, Jerry Kiernan, Ramakrishnan Srikant, and Yirong Xu. Order preserving encryption for numeric data. In Proceedings of the 2004 ACM SIGMOD International Conference on Management of Data, pages 563–574. ACM, 2004.
[5] Erez Shmueli, Ronen Vaisenberg, Ehud Gudes, and Yuval Elovici. Implementing a database encryption solution, design and implementation issues. Computers & Security, 44:33–50, 2014.
[6] CipherCloud. CipherCloud for Salesforce, 2013.
[7] Crypteron. CipherDB developer's guide, 2014.
[8] Oracle. Transparent Data Encryption (TDE) frequently asked questions, 2015.
[9] Jesse D. Kornblum. Implementing BitLocker drive encryption for forensic analysis. Digital Investigation, 5(3-4):75–84, 2009.
[10] Mihir Bellare, Thomas Ristenpart, Phillip Rogaway, and Till Stegers. Format-preserving encryption. In Selected Areas in Cryptography, pages 295–312. Springer, 2009.
[11] Gene H. Kim and Eugene H. Spafford. The design and implementation of Tripwire: A file system integrity checker. In Proceedings of the 2nd ACM Conference on Computer and Communications Security, CCS '94, pages 18–29, New York, NY, USA, 1994. ACM.
[12] Richard Wilkins and Brian Richardson. UEFI secure boot in modern computer security solutions. 2013.
[13] Stéphanie Delaune, Steve Kremer, Mark D. Ryan, and Graham Steel. A formal analysis of authentication in the TPM. In Pierpaolo Degano, Sandro Etalle, and Joshua Guttman, editors, Revised Selected Papers of the 7th International Workshop on Formal Aspects in Security and Trust (FAST'10), volume 6561 of Lecture Notes in Computer Science, pages 111–125, Pisa, Italy, September 2010. Springer.
[14] Najwa Aaraj, Anand Raghunathan, and Niraj K. Jha. Analysis and design of a hardware/software trusted platform module for embedded systems. ACM Trans. Embed. Comput. Syst., 8(1):8:1–8:31, January 2009.
[15] Abhinav Srivastava, Himanshu Raj, Jonathon Giffin, and Paul England. Trusted VM snapshots in untrusted cloud infrastructures. In Davide Balzarotti, Salvatore J. Stolfo, and Marco Cova, editors, Research in Attacks, Intrusions, and Defenses, volume 7462 of Lecture Notes in Computer Science, pages 1–21. Springer Berlin Heidelberg, 2012.
[16] Dag Arne Osvik, Adi Shamir, and Eran Tromer. Cache attacks and countermeasures: The case of AES. In David Pointcheval, editor, Topics in Cryptology – CT-RSA 2006, volume 3860 of Lecture Notes in Computer Science, pages 1–20. Springer Berlin Heidelberg, 2006.
[17] J. Xu, Z. Kalbarczyk, and R. K. Iyer. Transparent runtime randomization for security. In Reliable Distributed Systems, 2003. Proceedings. 22nd International Symposium on, pages 260–269, Oct 2003.
[18] Alexandra Boldyreva, Nathan Chenette, Younho Lee, and Adam O'Neill. Order-preserving symmetric encryption. In Advances in Cryptology – EUROCRYPT 2009, pages 224–241. Springer, 2009.
[19] Alexandra Boldyreva, Nathan Chenette, and Adam O'Neill. Order-preserving encryption revisited: Improved security analysis and alternative solutions. In Advances in Cryptology – CRYPTO 2011, pages 578–595. Springer, 2011.

Appendix A: The MOD Cipher

For encrypted SUM and AVG, we need a symmetric-key cipher that preserves SUM (from which AVG can be computed). An example of this, called the MOD cipher, is given below. Assume that the plaintext domain is $[1..n]$. Typically $n = 2^{32}$.

Key Gen: Select a random prime $m > n$ and an integer $k$ such that $m > k > m/2$. The (en/de)cryption key is $(k, m)$.

Encryption/Decryption: To encrypt $x \in [1..n]$, compute $c = xk \bmod m$. The ciphertext is $c$. To decrypt $c$, compute $x = c\,k^{-1} \bmod m$. The cipher is additively homomorphic, and thus preserves SUM.

Security: Since $k > m/2$, any plaintext $> 1$ ensures a wrap-around in the multiplication modulo $m$. If plaintexts are uniformly distributed, then the cipher is information-theoretically secure. However, the cipher has statistical leakage if the distribution of the plaintexts is known. We leave a detailed security analysis of this cipher as future work.
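The key generation, encryption and decryption steps above can be sketched in Scala as follows. This is an illustrative sketch under stated assumptions, not the JADE implementation: the names ModCipher, Key, keyGen, add and the extraBits parameter are hypothetical, and m is only a probable prime. The sketch also makes explicit a condition implied by the arithmetic: a sum of ciphertexts decrypts to the true plaintext sum only while that sum stays below m, which is why extraBits makes m larger than n.

```scala
import java.security.SecureRandom
import scala.util.Random

object ModCipher {
  // Key (k, m): m is a prime above the plaintext bound n, and m/2 < k < m.
  final case class Key(k: BigInt, m: BigInt)

  // Key Gen for the plaintext domain [1..n].
  // extraBits is an assumption of this sketch: it makes m larger than n so
  // that sums of many plaintexts stay below m and decrypt correctly.
  def keyGen(n: BigInt, extraBits: Int = 32,
             rnd: Random = new Random(new SecureRandom())): Key = {
    require(extraBits >= 1, "m must have more bits than n")
    // A (probable) prime with more bits than n is necessarily greater than n.
    val m = BigInt.probablePrime(n.bitLength + extraBits, rnd)
    // Rejection-sample k uniformly from the integers with m/2 < k < m.
    var k = BigInt(m.bitLength, rnd)
    while (k <= m / 2 || k >= m) k = BigInt(m.bitLength, rnd)
    Key(k, m)
  }

  // c = x * k mod m
  def encrypt(key: Key, x: BigInt): BigInt = (x * key.k).mod(key.m)

  // x = c * k^{-1} mod m (the inverse exists because m is prime and 0 < k < m)
  def decrypt(key: Key, c: BigInt): BigInt =
    (c * key.k.modInverse(key.m)).mod(key.m)

  // Encrypted SUM: add ciphertexts modulo m without touching the key k.
  def add(key: Key, cs: Seq[BigInt]): BigInt = cs.sum.mod(key.m)

  def main(args: Array[String]): Unit = {
    val key = keyGen(BigInt(2).pow(32))
    val plaintexts = Seq(BigInt(20), BigInt(35), BigInt(41))
    val encryptedSum = add(key, plaintexts.map(encrypt(key, _)))
    // Decrypting the sum of ciphertexts recovers the sum of plaintexts.
    println(decrypt(key, encryptedSum) == plaintexts.sum)
  }
}
```

AVG then follows by dividing the decrypted sum by the row count, which is not secret in this setting.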

Appendix B: The OPE Cipher

Order Preserving Encryption (OPE), a block cipher that preserves the lexicographic ordering of plaintexts, must leak some information. However, a weak cipher can be created, as shown in several works [4], [18], [19]. We describe another such scheme that preserves the lexicographic ordering of the first $n$ characters and is simpler to implement than the other proposals. We assume that plaintexts have at least $n$ characters (using padding, if necessary).

Basic idea: The encryption uses a standard algorithm such as AES and additionally includes a hash of the first $n$ characters of the plaintext. The sorting is done using the hash, which has an "order preserving" property. We call this an OP-Hash. To construct this OP-Hash function, we use the character encryption scheme described below. For each $i \in [1 .. n]$, the character encryption encrypts the $i$th character using an encryption key $k_i$. The hash is generated by combining these $n$ ciphertexts.

Character Encryption: The encryption scheme $E_k$ for key $k$ and characters from alphabet $\Lambda$ works as follows:
1) The encryption is similar to ECB mode.
2) The ciphertext space is the integers in $[x .. y]$ such that:
   a) $x, y \in [-\infty .. \infty]$; $x < y$ and $y - x > |\Lambda|$
   b) $E_k(a) < E_k(b)$ if lexicographically $a < b$
   c) The values $x, y$ are determined by $k$
3) The gaps between the encryptions of consecutive characters are also pseudo-random and determined by $k$.

Construction: Let $\ell = |\Lambda|$ be the size of the alphabet. Let $\sigma_1, \sigma_2, \ldots, \sigma_\ell$ be the ordered characters of $\Lambda$. $\lambda$ is a security parameter that can be made larger for better security. Let $k$ be the key. Typically $\lambda = 10$ and $k \approx 2^{25}$.
1) Initialization: Choose a random number $r_1 = \mathrm{Random}([-\lambda k \ell .. \lambda k \ell])$.
2) Code gen: For $i \in [2 .. \ell]$, set $r_i = \mathrm{Random}([1 .. k])$. For any such $i$, the gap between the encryptions of the two consecutive characters $\sigma_{i-1}, \sigma_i \in \Lambda$ is $r_i$.
3) Normalization: Define $c_1 = r_1$ and $c_i = c_{i-1} + r_i$ for $i \in [2 .. \ell]$.
4) Encryption: $E_k(\sigma_i) = c_i$ for $i \in [1 .. \ell]$.

The above is the encryption of the $\ell$ possible single characters of our alphabet using one key $k$.

The OP-hash: To compute the hash of the first $n$ characters, encrypt these characters individually, each with a different key. Therefore, for characters $(a_1, a_2, \ldots, a_n)$, the hash is $(b_1, b_2, \ldots, b_n)$ with $b_i = E_{k_i}(a_i)$ for $i \in [1..n]$.

Order-preserving: The hash preserves the order. First we sort by the first number, then the second, and so on.

Security: If the initialization vector $r_1$ is chosen from a large enough range (in this case $[-\lambda k \ell .. \lambda k \ell]$), and $k$ is large enough, then the above scheme is secure as a character encryption scheme provided that the attacker does not know a large number of distinct ciphertexts. Otherwise, it is vulnerable to frequency analysis attacks.
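The construction and the OP-hash can be sketched in Scala as follows. This is a minimal sketch under several assumptions that are not part of the scheme as described: the names OpHash, codeTable, Hasher and compare are hypothetical; k is treated as the bound on the gaps, while a separate per-position seed stands in for the key $k_i$ so that the code table is reproducible; scala.util.Random is used only for illustration and is not cryptographically secure (a real implementation would use a keyed pseudo-random generator); short inputs are padded with the first character of the alphabet.

```scala
import scala.util.Random

object OpHash {
  // Uniform sample from the inclusive range [lo .. hi] by rejection sampling.
  private def uniform(lo: Long, hi: Long, rnd: Random): Long = {
    val span = BigInt(hi) - BigInt(lo) + 1
    var r = BigInt(span.bitLength, rnd)
    while (r >= span) r = BigInt(span.bitLength, rnd)
    (BigInt(lo) + r).toLong
  }

  // Steps 1-4 of the construction: returns (c_1, ..., c_ell), where c_i is
  // the ciphertext of the i-th character of the ordered alphabet.
  def codeTable(ell: Int, k: Long, lambda: Long, rnd: Random): Vector[Long] = {
    val bound = lambda * k * ell
    val r1 = uniform(-bound, bound, rnd)                  // 1) initialization
    val gaps = Vector.fill(ell - 1)(uniform(1L, k, rnd))  // 2) code gen
    gaps.scanLeft(r1)(_ + _)                              // 3) normalization
  }

  // OP-hash over the first n characters. seeds(i) stands in for the key k_i
  // (an assumption of this sketch); alphabet must list characters in order.
  final class Hasher(alphabet: String, n: Int, seeds: Seq[Long],
                     k: Long = 1L << 25, lambda: Long = 10L) {
    require(seeds.length == n, "one seed per hashed position")

    // One independent code table per character position.
    private val tables: Vector[Vector[Long]] =
      seeds.toVector.map(seed =>
        codeTable(alphabet.length, k, lambda, new Random(seed)))

    // b_i = E_{k_i}(a_i) for the first n characters of the text.
    def hash(text: String): Vector[Long] = {
      val chars = text.take(n).toVector.padTo(n, alphabet.head)
      chars.zip(tables).map { case (ch, table) =>
        val idx = alphabet.indexOf(ch)
        require(idx >= 0, s"character '$ch' is not in the alphabet")
        table(idx)
      }
    }
  }

  // Lexicographic comparison: first number, then the second, and so on.
  def compare(a: Seq[Long], b: Seq[Long]): Int =
    a.zip(b)
      .collectFirst { case (x, y) if x != y => java.lang.Long.compare(x, y) }
      .getOrElse(0)
}
```

For example, with new OpHash.Hasher("abcdefghijklmnopqrstuvwxyz", 3, Seq(11L, 22L, 33L)), sorting names by their hashes using OpHash.compare orders them by their first three characters. Names that share those three characters compare as equal, and the padding choice means that a short string ties with its padded extension.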
