An Architecture for a Resilient Cloud Computing Infrastructure Joshua Baron, Karim El Defrawy and Aleksey Nogin

Rafail Ostrovsky

{jwbaron, kmeldefrawy, anogin}@hrl.com Information Science and Systems Laboratory HRL Laboratories, LLC

[email protected] Departments of Computer Science and Mathematics UCLA

Abstract—This paper proposes an architecture for a resilient cloud computing infrastructure that provably maintains cloud functionality against persistent successful corruptions of cloud nodes. The architecture is composed of a self-healing software mechanism for the entire cloud, as well as hardware-assisted regeneration of compromised (or faulty) nodes from a pristine state. Such an architecture aims to secure critical distributed cloud computations well beyond the current state of the art by tolerating, in a seamless fashion, a continuous rate of successful corruptions up to a certain limit, e.g., 30% of all cloud nodes may be corrupted within a tunable window of time. The proposed architecture achieves these properties based on a principled separation of distributed task supervision from the computation of user-defined jobs. Task supervision and end-user communication are performed by a new software mechanism called the Control Operations Plane (COP), which builds a trustworthy, resilient, self-healing cloud computing infrastructure out of the underlying untrustworthy and faulty hosts. The COP leverages provably-secure cryptographic protocols that are efficient and robust in the presence of many corrupted participants; such a cloud regularly and unobtrusively refreshes itself by restoring COP nodes from a pristine state at regular intervals.
Index Terms—Cloud computing, secure computation, resilient computation, proactive security.

I. INTRODUCTION
Cloud computing is transforming how organizations perform required computation. Cloud-based solutions provide significant scalability, are cost effective, can be quickly deployed, and enable full transparency in managing operational costs. The Department of Homeland Security (DHS) and other federal agencies, in addition to the private sector, are rapidly migrating many critical applications to the cloud. However, the scale, the potentially high homogeneity of cloud nodes, and the multi-task/multi-user nature of clouds introduce new security risks and amplify many existing ones that could pose a significant danger to the nation's information and computing infrastructure. This paper proposes a novel architecture to implement a resilient cloud computing infrastructure. The architecture leverages the distributed nature of cloud computing to provide a high level of operational resilience even in the face of persistent and successful security breaches of nodes comprising the cloud computing infrastructure. Current approaches for securing cloud computing are largely based on virtualization, separation, and access control [1]. A comprehensive cloud security solution must be resilient in

the face of on-going node corruptions¹, especially when such corrupted nodes, which try to disrupt operations, comprise a large percentage, e.g., up to 30%, of the cloud nodes. A comprehensive cloud security solution must also be regenerative, so that lost nodes are automatically recovered in order to be mission effective later in the computation. Preventing an attacker that compromises a node from stealing data, corrupting computation, and learning about the cloud control state currently requires unacceptably high overhead. Furthermore, most cloud security approaches assume a static architecture that favors the attackers, since this provides them with time to discover and implement an effective attack strategy and execute various attack vectors. This paper proposes an architecture that addresses the above shortcomings by creating a resilient, secure, self-healing cloud supervised by a novel software mechanism, the Control Operations Plane (COP). The goals of the COP-secured cloud² are: 1) to maintain critical integrity of the cloud's control plane with minimal communication and computation overhead beyond the state of the art; 2) to enable computing control plane functionalities, such as job assignment and result checking, even in the face of ongoing coordinated attacks that corrupt up to a significant fraction of nodes during a tunable regeneration period. By separating COP-secured cloud control from user-defined computations, the architecture turns the distributed nature of cloud computing into a security advantage: numerous successful attacks on cloud nodes will not erode the integrity of the overall system or its operational capabilities. Homeland Security Use Case for a COP-secured Cloud: As an example of a homeland-security-relevant application of a COP-secured cloud, consider DHS' US-VISIT [2] (and its replacement since March 2013, the Office of Biometric Identity Management, OBIM). A key component of DHS' US-VISIT is its biometric IDENT system [3].
As of 2010, it had two-fingerprint biometric data for over 120 million subjects, processing over 150,000 transactions per day [4] and serving 30,000 authorized federal, state and local government users.
¹This paper uses the following terminology: a physical host (or simply cloud host), usually a server or another physical machine that makes up a physical component of the cloud, contains many computation nodes, where each node is a separate virtual machine.
²Throughout this paper, "COP-secured cloud" is used to mean a COP-secured cloud computing infrastructure.

Fig. 1. The Architecture of a COP-Secured Cloud

According to [5], the IDENT system is an ideal candidate for migration to cloud computing given the growth of the database, the desire to increase biometric matching to more accurate 10-fingerprint data and other modalities (e.g., iris, face, voice, handwriting), the resource demands of multi-modal biometrics, and increased interest by US and allied customers. In particular, DHS may use the Software as a Service (SaaS) model of cloud computing on a community cloud owned by U.S. Government-controlled entities [6]. US-VISIT (or similar systems) hosted on a COP-secured cloud will be able to tolerate persistent attacks that continuously compromise up to 30% of the cloud nodes involved in performing the computations without affecting the performance of critical cloud control and operations tasks, including job assignment, assurance of job correctness, and node communication. Furthermore, an attacker will gain no information usable to map out cloud operations and jobs being executed on other nodes. The end result will be that customers of the US-VISIT IDENT service will be assured that their operations will compute correctly and will not be disrupted, even by persistent and fast-spreading threats, worms, and/or automated attacks.
Outline of Paper: Section II describes the architecture of a COP-secured cloud and how it processes data. Section III overviews the various building blocks required to instantiate such a cloud architecture. Section IV provides initial performance data supporting the feasibility of realizing a COP-secured cloud as described in this paper. Finally, Section V discusses related work.

II. ARCHITECTURE AND OPERATION OF A COP-SECURED CLOUD
This section provides the technical details of a COP-secured cloud. It first describes the considered adversary model and then overviews the proactive security model, the core principle behind the resilience provided by the COP. The section then follows with a description of the architecture and operational details of a COP-secured cloud.
A. The Adversary Model and the Proactive Security Model
Adversary Model: The considered adversary can launch any remote active and/or passive attacks. It can compromise any node in the cloud (under certain restrictions on the rate of compromise, as explained in the proactive security model below). One of the main assumptions of a COP-secured cloud is that an attacker does not have physical access to cloud nodes and cannot alter the booting mechanism; if this is not the case, then additional security mechanisms should be employed, e.g., a Trusted Platform Module (TPM); these mechanisms, however, are outside the scope of this paper.
Proactive Security Model: The proactive security model was first proposed in [7] for secure multiparty computation protocols, with several follow-up works. In this model, at any given time, only a certain number of parties can be corrupted (i.e., any specific point in time is governed by the rules of the honest majority model), but over time, all parties can be corrupted. Therefore, security in the proactive model is defined by a corruption rate over time; for instance, one can assume that 10% of parties may be corrupted every hour. [7] also introduced the notion of proactive refreshing: after some fixed interval of time, parties are refreshed to a pristine state. A physical-world analogy of such proactive refresh is an adversary that over time may wiretap every room of a facility, but periodically each room is swept for bugs and is newly certified as secure. In the cloud setting, where processes are frequently executed by virtual machines that are spawned for single jobs, each new virtual machine instantiation may be thought of as a lightweight form of proactive refreshing. A more robust form of proactive refreshing is periodic reboot from Read Only Memory (ROM) under the control of a secure boot system (SBS) and a trusted hardware timer.
B. Overview of a COP-Secured Cloud
The COP-secured cloud relies on three main components: two major novel software components and a third one that leverages existing virtualization and secure boot mechanisms. Figure 1 shows how these components fit together. On each physical cloud host, there is at most one node denoted the COP node; all other nodes are denoted service nodes, and these nodes perform the bulk of the cloud computation work. The collection of COP nodes constitutes the control operations plane; the collection of service nodes constitutes the computation and storage plane. The first major software component is the actual COP, the root of trust of the cloud defenses. It is responsible for job supervision and overall cloud control, but does not perform computations required by users, which are executed by service nodes directly. Devoting the highest degree of security to a small set of critical operations (e.g., job allocation and result checking) rather than to job computation strikes the right balance of security and efficiency, since resilient secure computation mechanisms have high computational cost. COP nodes communicate using advanced but efficient and robust cryptographic protocols (described in more detail in Section III). The COP can perform distributed computations on protected cloud state information without ever reconstituting or revealing it at any single node of the cloud. One of the central tasks of the COP is anonymous allocation and assignment of jobs: by using its decentralized and resilient computation protocol, the COP is able to allocate jobs in such a way that only the host to which a component of the job is allocated learns where the job is. The other physical hosts and their nodes, an attacker monitoring the other physical hosts' traffic, and the user submitting the job are all unable to tell where in the cloud the job is executed and how many times it is replicated (when the COP is tuned to perform replication). Similarly, the COP is able to deliver job output to the user without revealing any information that would allow an attacker to "map out" the cloud or learn which parts of the cloud would be the most advantageous to attack. The robust result delivery mechanism also provides the ability to spot-check the result by selectively replicating a small portion of the jobs and comparing the results of the replicas in a way that is unpredictable to an attacker. A higher degree of replication and more COP supervision can be used to provide more security, as a majority of the replicas must be successfully attacked to overcome majority voting and to keep the correct result from being reported to the user. The second major component to be implemented in software is the Auxiliary COP Services, which implement a set of auxiliary COP tasks that are apart from the core COP protocols. In order to prevent an attacker from mapping out (and then targeting) the cloud job assignments, it is essential that any single component of a distributed multi-node job does not know the allocation of the other job components and that these allocations change dynamically. To achieve this, the COP sets up, in a private manner, multi-node routing tables for a job, where a node required to send a message will only know its next-hop neighbor in the routing table. This implements low-overhead anonymous communication channels without resorting to the higher overhead of traditional anonymous communication protocols, such as onion routing. The third component is a mechanism that periodically reboots physical hosts and nodes and restores them to a pristine state. Rebooting physical hosts can be achieved using a secure boot system (SBS). The SBS can be realized with features present in commodity CPUs, e.g., by modifying the Basic Input Output System (BIOS) configuration to load an operating system image from a read-only disk or optical drive and securing the BIOS so that it cannot be changed without physical access to individual physical hosts in the cloud. Rebooting cloud nodes can be achieved using a similar mechanism by only allowing them to boot from a read-only image that is stored on an optical drive or a read-only flash drive and protecting them using a trusted microhypervisor; if necessary, such a hypervisor may be further hardened and/or formally verified [8]; however, such activity is outside the scope of this paper.

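The replication-with-majority-voting defense described above can be illustrated with a short sketch. This is not the paper's implementation; the function names, the in-memory "nodes" (modeled as plain callables), and the parameter values are hypothetical and chosen only to show the voting logic.

```python
# Illustrative sketch of replicated subjob execution with majority voting.
# Service nodes are modeled as callables; a corrupted node returns a wrong
# result, and the COP keeps only the majority output across replicas.
from collections import Counter
import random

def run_replicated(subjob_input, nodes, replicas=5, rng=random):
    """Run one subjob on `replicas` randomly chosen service nodes and
    return the majority output, failing if no strict majority exists."""
    chosen = rng.sample(nodes, replicas)
    outputs = [node(subjob_input) for node in chosen]
    winner, votes = Counter(outputs).most_common(1)[0]
    if votes <= replicas // 2:
        raise RuntimeError("no majority: too many corrupted replicas")
    return winner

honest = lambda x: x * x                # the intended subjob computation
corrupted = lambda x: x * x + 1         # a node returning a wrong result
nodes = [honest] * 8 + [corrupted] * 2  # some corrupted service nodes
result = run_replicated(6, nodes, replicas=5, rng=random.Random(0))
```

With at most 2 corrupted nodes in the pool and 5 replicas, at least 3 honest outputs always agree, so the correct value survives the vote; raising the replication factor trades efficiency for tolerance of more corruptions, mirroring the tunable repetition rate described in the text.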
C. Operation and Data Flow for a COP-Secured Cloud
The overall data flow in a COP-secured cloud is as follows (also shown in Figure 2): A user, or an application on behalf of the user, has a job to be computed by the cloud. The job is securely transmitted by the user to the COP, either by using an appropriate secret sharing scheme (see Section III-A) to disseminate shares of the job information to COP control nodes, or by encrypting it using a key that is already shared with the COP and then broadcasting the encrypted job assignment simultaneously to each COP node. The job defines a sequence of subjobs, which are individual tasks to be performed to execute the job, as well as how many rounds of interaction are needed to execute each subjob. The COP nodes then jointly compute a schedule, which is a list of host assignments for each job, via a resilient and secure instance of secure multiparty computation (see Section III-B for a technical discussion of secure multiparty computation as it applies to the COP). The schedule is only available to the COP as a whole, while up to 30% of COP nodes together cannot learn anything about it. At the start of execution for a particular subjob, the COP node on whose host the subjob is tasked receives the required input for the computation as a distributed input from the entire COP. It is important that a COP node cannot know any job assignment before that assignment is carried out, preventing an attacker from being able to "map out" the system. The COP node on the host tasked with the subjob execution then assigns the task to a particular service node on that host. After the service node has executed the subjob, the COP node on that host distributes this output accordingly (again, either to the entire COP or to a particular COP node, depending on the security settings). Since the COP node on a host may be corrupted, assigning a subjob to a corrupted host could break the resilience of the computation. Therefore, the COP can assign the same subjobs in the computation to multiple service nodes on multiple hosts. This repetition rate is a tunable parameter that can be changed throughout the computation, and ranges from spot-checking a small subset of subjobs to large-scale subjob replication. In order to ensure that both intermediate and final outputs are consistent, the COP executes majority voting on the (unencrypted) outputs of the repeated service node subjob computations. Service nodes are generally reset after computing a job; in particular, service nodes, once used to compute a job component, are not used for the next subjob task of the same job, but rather might be used for another subjob task of a different job, so that jobs cannot be mapped out by an adversary. The rate of service node reset is another tunable parameter that can be dynamically changed over the course of the computation. Output to the user consists of the result of the final output consistency check, computed by the COP on the final service node outputs and returned by the COP to the user.

III. FUNDAMENTAL COP BUILDING BLOCKS
This section reviews the required cryptographic building blocks for a COP-secured cloud.
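The next-hop routing tables of Section II-B, where each hop learns only its immediate successor, can be sketched as follows. This is a hypothetical illustration, not the paper's data structure: node names, the dictionary representation, and the route length are assumptions made for the example.

```python
# Sketch of per-job next-hop routing (Section II-B): the COP builds a random
# route and gives every hop only its immediate successor, so no single node
# can reconstruct the full path from its local view.
import random

def build_routing_table(nodes, length, rng=random):
    """Pick a random route through distinct nodes; return the entry point
    and a {node: next_hop} table holding only per-hop knowledge."""
    route = rng.sample(nodes, length)
    table = {route[i]: route[i + 1] for i in range(length - 1)}
    return route[0], table

def forward(entry, table, message):
    """Deliver `message` hop by hop; each lookup uses only local knowledge."""
    hop, path = entry, [entry]
    while hop in table:
        hop = table[hop]
        path.append(hop)
    return hop, path              # the final hop holds the message

nodes = [f"node{i}" for i in range(10)]
entry, table = build_routing_table(nodes, length=4, rng=random.Random(7))
final, path = forward(entry, table, message="job-output")
```

In the real system the table entries would themselves be delivered privately by the COP (one entry per node), so that the full `route` list never exists at any single participant; here it is materialized in one process only for illustration.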

Fig. 2. Operation of a COP-Secured Cloud

A. Secure Cloud State Maintenance
One can use both proactively and non-proactively secure secret sharing schemes to securely distribute cloud state information across the COP. This can be the primary way that data is securely transmitted in the COP, and it is the underlying cryptographic primitive for the efficient secure multiparty computation that the COP requires. A secret-sharing scheme, introduced independently by Blakley [9] and Shamir [10], allows a dealer with a secret s to distribute shares of the secret to the other parties such that only authorized subsets of parties can reconstruct the secret. More specifically, a t-out-of-n secret-sharing scheme among n parties is one where a dealer (D) who has a secret (s) distributes shares to all parties such that any t parties can reconstruct s from their shares, and no group of fewer than t parties can learn any information about s. The first practical extension of secret sharing secure in the proactive security model (see Section II-A) was introduced by Herzberg et al. [11], who built on the initial work of [7] to proactivize the Shamir secret sharing scheme so that the polynomials corresponding to the secrets are periodically rerandomized, and all previously held data is erased. The per-secret communication is quadratic in the number of parties. For a dealing party Pd to share a secret s amongst n parties P1, . . . , Pn, a random polynomial u(x) ∈ Zq[x] of degree d is selected so that u(0) = s, where q is a large prime. Each party Pi receives as her share the point u(i) on the polynomial; d + 1 parties must work together to jointly reconstruct s (or compromise security). Refreshing the shares occurs by a joint computation performed by the parties to obtain a new polynomial u′(x) such that u′(0) = s but u′(i) is distributed independently from u(i), so that an adversary cannot use one share to determine previous or future shares. Intuitively, shares can be rerandomized by jointly constructing a degree-d random polynomial r(x) ∈ Zq[x] such that r(0) = 0; u′(x) is set to be u(x) + r(x), which can be obtained locally (e.g., from the secret shares) by having each party Pi compute u(i) + r(i). All parties then securely erase u(i) so that it cannot be obtained upon future corruption. Clearly, u′(i) is distributed independently from u(i), and u′(0) = u(0) + r(0) = u(0) + 0 = s. See [11] for the full protocol specification.

B. Lightweight Secure Multiparty Computation
A COP-secured cloud utilizes secure multiparty computation (MPC) protocols to allow COP nodes to jointly and securely compute the functionalities necessary to maintain the COP. Such functionalities include secure job assignment and cloud state management (see Section II-B). MPC protocols allow multiple parties, e.g., a set of servers, to securely compute a function without revealing either their own private inputs or their private outputs. More precisely, n parties P1, . . . , Pn have inputs x1, . . . , xn, respectively, and wish to compute a given function f such that each party Pi receives the output f(x1, . . . , xn)i without any party learning any other party's input or output. Generally, functions are represented by circuits, which fall into two categories: Boolean circuits and arithmetic circuits over a field. Typically, the number of communication rounds is close to the depth of the circuit; at a given circuit depth, all intermediate gates are computed in parallel, using the information recovered from previous gates to compute the current ones without revealing the intermediate values. The goal of the MPC implementation for the COP is not to compute arbitrary functions but rather to maintain a lightweight control loop of the cloud's most critical COP control functionalities, as well as to maintain the COP state. Such continuously running MPC protocols are called "reactive functionalities"; one needs a highly efficient implementation of MPC to accomplish such a functionality.

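The Shamir sharing and proactive refresh of Section III-A can be sketched in a few lines. This is a minimal illustration under simplifying assumptions: the prime and degree are small, there are no authenticated channels or verifiability checks, and the zero-polynomial is generated by a single dealer, whereas in the full protocol of [11] it is generated jointly by the parties.

```python
# Sketch of Shamir sharing with proactive refresh (Section III-A).
import random

q = 2**31 - 1  # a Mersenne prime, chosen here only for illustration

def share(s, d, n):
    """Deal a degree-d Shamir sharing of secret s to parties 1..n."""
    coeffs = [s] + [random.randrange(q) for _ in range(d)]
    return {i: sum(c * pow(i, j, q) for j, c in enumerate(coeffs)) % q
            for i in range(1, n + 1)}

def refresh(shares, d):
    """Proactive refresh: add shares of a random degree-d r(x) with r(0)=0.
    (Here one dealer generates r; in [11] it is generated jointly.)"""
    r = share(0, d, len(shares))
    return {i: (shares[i] + r[i]) % q for i in shares}

def reconstruct(points):
    """Lagrange interpolation at x = 0 from (i, share) pairs."""
    s = 0
    for i, yi in points:
        num, den = 1, 1
        for j, _ in points:
            if j != i:
                num = num * (-j) % q
                den = den * (i - j) % q
        s = (s + yi * num * pow(den, q - 2, q)) % q  # den^{-1} via Fermat
    return s

secret, d, n = 424242, 3, 7
shares = share(secret, d, n)
new_shares = refresh(shares, d)
# Any d+1 refreshed shares still reconstruct the secret, while each
# individual share has been re-randomized.
recovered = reconstruct(list(new_shares.items())[:d + 1])
```

The refresh step is exactly the u′(x) = u(x) + r(x) rerandomization described above: each party adds its share of the zero-polynomial locally, and the constant term, i.e., the secret, is unchanged.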
IV. PRELIMINARY SECURITY ANALYSIS, PERFORMANCE AND EXPERIMENTAL RESULTS AND DATA
This section provides initial results that demonstrate the feasibility of implementing a COP-secured cloud. Results are provided for the following: 1) simulating random party refreshing to maintain a low overall system corruption rate; 2) implementation-based performance results for an existing protocol to proactively secure and distribute cloud state; 3) asymptotic communication requirements of the most efficient MPC in the literature; and finally 4) an estimate of the computational complexity of implementing COP tasks using MPC. Together these results demonstrate that by carefully combining existing cryptographic primitives (with some additional modifications and optimizations) a practical implementation of a COP-secured cloud is possible. All performance data in this section was obtained using a Macbook Pro with a 2.5GHz dual-core Intel Core i5 processor and 4GB of 1600MHz DDR3 memory.
A. Proactive Node Refresh Rates
In the proactive security model, parties are periodically refreshed to a pristine state in order to eliminate system corruption. Existing detection and prevention mechanisms (e.g., antivirus, intrusion detection systems, and firewalls) may not be enough to accurately discover which hosts must be reset.

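The random-reset dynamics analyzed in this section can be illustrated with a toy simulation. This is an illustrative sketch, not the authors' simulation code; the assumed dynamics are that each stage the adversary corrupts a fraction c of the currently honest parties, after which a uniformly random fraction r of all parties is reset to a pristine state.

```python
# Toy simulation of the random-reset proactive model (Section IV-A).
import random

def simulate(n=1000, c=0.10, r=0.5, stages=200, seed=1):
    """Return the peak corrupt fraction observed over all stages."""
    rng = random.Random(seed)
    corrupt = set()
    peak = 0
    for _ in range(stages):
        honest = [p for p in range(n) if p not in corrupt]
        # Adversary corrupts c*n previously honest parties per stage.
        corrupt.update(rng.sample(honest, min(int(c * n), len(honest))))
        # Defender resets r*n uniformly random parties to a pristine state.
        for p in rng.sample(range(n), int(r * n)):
            corrupt.discard(p)
        peak = max(peak, len(corrupt) / n)
    return peak

# With t = 1/3 and c chosen so that c/t < r (here c = 0.10, r = 0.5),
# the corrupt fraction should stay well below the threshold t.
peak_fraction = simulate()
```

The steady-state corrupt fraction in this model is roughly c(1 − r)/r (0.10 for the parameters above), which is comfortably below t = 1/3, consistent with Lemma 1's condition; pushing c toward t·r makes the failure probability climb, as in Figure 3.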
This section investigates the effects of randomly resetting hosts in order to maintain a level of system corruption below a fixed threshold. The following parameters are involved in assessing the effectiveness of randomly refreshing parties: 1) n is the number of parties involved in the computation, e.g., the COP nodes. 2) t is the total threshold of corruption; that is, if tn + 1 parties are ever corrupt at one time, the proactive protocol fails to guarantee security. 3) c is the per-stage corruption rate of the adversary; that is, at every stage, the adversary can corrupt cn parties. 4) r is the fraction of parties selected uniformly at random to be reset per stage. We define r̃ = 1 − r. Formal analysis of the model results in the following lemma, whose proof follows from a straightforward use of Chernoff and union bounds.
Lemma 1: If t⁻¹c < r, proactive security in the random reset model can be maintained except with negligible probability.
Figure 3 provides simulation results to augment Lemma 1. The simulation results indicate that the required per-stage refresh rate is indeed on the order of t⁻¹c < r, thus proactive security as required can be achieved. In summary, by carefully selecting the refreshing period and the fraction of nodes that will be randomly refreshed, proactive security can be guaranteed against a total corruption threshold of 1/3.

Fig. 3. Party Refresh Rate, t = 1/3, r = 1/2, 1000 stages. (Plot: security failure probability versus per-stage adversary corruption rate c, from 0.11 to 0.17, shown for 100 and 1000 parties.)

B. Efficiency of Proactively Secure Secret Sharing
A Python implementation of the proactive secret sharing scheme in [11] yields the following performance results. The secret is shared among 6 Amazon EC2 servers [12]. Initial sharing phase of a secret of 50 bytes: required memory: 3.2 MBytes; required computing time: 0.4 seconds; required communication bandwidth: 31 KBytes. Refreshing phase: required memory per Amazon EC2 server: 10-15 MBytes; required computing time per Amazon EC2 server: 0.56 seconds for no corrupted parties and 0.77 seconds for 2 corrupted parties; required communication bandwidth per Amazon EC2 server: 17 KBytes for no corruptions, and 41 KBytes of broadcasted data (multiplied by 5 if broadcast is not native in the underlying network) for 2 corrupted parties.
The data above reveal that the required computation and bandwidth scale linearly with the number of secrets. Refreshing 5 MB of secret-shared data using [11] requires 170 MB in communication, which takes seconds on a local area network. This does not constitute significant overhead, since such refreshing is expected to occur on the order of minutes, hours, or even on a daily basis. A more detailed analysis of the overhead for varying refreshing periods, utilizing other recent proactive secret sharing schemes, is planned as future work.

C. Efficiency of Secure Multiparty Computation
Table I shows the asymptotic complexity of the most efficient MPC schemes in the literature.

Paper   Threshold     Communication Complexity
[13]    n/3           O(Cnk + Dnk) + poly(nk)
[14]    (1/3 − ε)n    O(C log n log C) + D² poly(n, log C)
[15]    n/2           O(C(n log n + k) + Dn²k + n⁷k)

TABLE I. Asymptotic complexity of MPC. C = arithmetic circuit size, D = circuit depth, n = number of parties, k = security parameter, ε = arbitrary positive constant.

The protocols of [14] and [15] are impractical to implement: [14] has very large practical complexity owing to its use of party virtualization [16] to obtain its corruption threshold, while the n⁷ term in the complexity of [15] dominates for most settings. As an alternative, one should consider the practical complexity of the relatively simple scheme of [13]. The protocol of [13] is sufficiently simple to be implementable in practice; see their paper for full details of their scheme. Its practical complexity is dominated by calls to the Berlekamp-Welch (BW) error correction algorithm [17] in order to process shares, i.e., to interpolate degree-O(n) polynomials. One particular benefit of the BW algorithm is that it can interpolate polynomials, and therefore reconstruct states, even in the presence of incorrect points on the polynomial, which in this case correspond to data that corrupted parties send. Table II contains performance results of an implementation of the BW algorithm as given in [17] in C++ using the NTL library³.
³Note that performance time could, in further work, be improved per the algorithm of [18].

Number of parties   512-bit prime   1024-bit prime   2048-bit prime
10                  1ms             2ms              4ms
50                  28ms            63ms             183ms
100                 225ms           587ms            1548ms

TABLE II. Berlekamp-Welch implementation data; polynomial degree = number of parties.
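The Berlekamp-Welch step discussed above can be sketched in Python. This is an illustrative sketch under small, hypothetical parameters, not the paper's C++/NTL implementation: it recovers a degree < k polynomial from n ≥ k + 2e points of which up to e are incorrect, which is exactly the robustness used to tolerate data sent by corrupted parties.

```python
# Illustrative Berlekamp-Welch decoder over GF(p) (Section IV-C).

def solve_mod(A, b, p):
    """Gaussian elimination mod p; returns one solution of A x = b."""
    rows, cols = len(A), len(A[0])
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    pivots, r = [], 0
    for c in range(cols):
        piv = next((i for i in range(r, rows) if M[i][c] % p), None)
        if piv is None:
            continue
        M[r], M[piv] = M[piv], M[r]
        inv = pow(M[r][c], p - 2, p)
        M[r] = [v * inv % p for v in M[r]]
        for i in range(rows):
            if i != r and M[i][c] % p:
                f = M[i][c]
                M[i] = [(vi - f * vr) % p for vi, vr in zip(M[i], M[r])]
        pivots.append(c)
        r += 1
    x = [0] * cols
    for i, c in enumerate(pivots):
        x[c] = M[i][cols]
    return x

def berlekamp_welch(xs, ys, k, e, p):
    """Solve Q(x) = y*E(x) for Q (deg < k+e) and monic E (deg e); return Q/E."""
    A, b = [], []
    for x, y in zip(xs, ys):
        row = [pow(x, j, p) for j in range(k + e)]            # Q coefficients
        row += [(-y * pow(x, j, p)) % p for j in range(e)]    # E coefficients
        A.append(row)
        b.append(y * pow(x, e, p) % p)                        # monic x^e term
    sol = solve_mod(A, b, p)
    Q, E = sol[:k + e], sol[k + e:] + [1]
    # Exact polynomial long division Q / E (coefficients low-to-high).
    out = [0] * (len(Q) - len(E) + 1)
    for i in range(len(out) - 1, -1, -1):
        out[i] = Q[i + len(E) - 1] % p
        for j, d in enumerate(E):
            Q[i + j] = (Q[i + j] - out[i] * d) % p
    return out

p = 2**13 - 1  # small prime field for illustration
poly = [5, 17, 3]                                  # hidden polynomial, k = 3
xs = list(range(1, 8))                             # n = 7 = k + 2e with e = 2
ys = [sum(c * pow(x, j, p) for j, c in enumerate(poly)) % p for x in xs]
ys[1], ys[4] = (ys[1] + 9) % p, (ys[4] + 40) % p   # two corrupted shares
decoded = berlekamp_welch(xs, ys, k=3, e=2, p=p)
```

Despite two of the seven evaluation points being wrong, the decoder recovers the original coefficients, illustrating why [13] can process shares sent by up to e corrupted parties without interaction.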

D. Efficiency of COP-Secured Functionalities
The COP-secured cloud requires securely computing a limited number of functionalities as MPC executions. As seen in Table I above, the complexity of executing MPC is dominated by the number of parties, the size of the circuit, the depth of the circuit, and the security parameter. Table III therefore provides approximate circuit complexities of the required COP functionality and primitives. The tasks to be computed through MPC by the COP nodes include anonymous allocation and assignment of jobs, as well as the construction of a multi-node routing table; the reader can refer to Sections II-B and II-C for details. In particular, the majority voting by the COP nodes is performed "in the clear" and need not be computed through MPC. Constructing the exact circuits that compute the other functionalities is beyond the scope of this paper; such circuits can be instantiated through repeated calls to primitive functionalities such as value equality and inequality, in order to perform measurements on hidden cloud state values. The arithmetic circuit complexity of such functionalities has been detailed in the work of [19], and the relevant complexities are shown in Table III below. In all recent MPC work, complexity is dominated by interactive multiplication computations, and so Table III provides the number of multiplications rather than circuit size. In addition, circuit depths are given without including preprocessing rounds, which, over repeated calls to these functionalities, add only a negligible amount to the complexity.

Functionality                 Multiplications   Circuit depth
Equality of two elements      17k + 1           8
Inequality of two elements    84k + 5           12
Maximal value of ℓ elements   2ℓ − 2            10 log₂ ℓ

TABLE III. Circuit size of required COP functionalities from [19]. k = security parameter.

V. RELATED WORK
Due to space constraints, we give a brief review of related work. An overview of various cloud computing security challenges and solutions can be found in [1]. The most relevant work is that of the MEERKATS cloud security architecture [20]. MEERKATS relies on instruction set randomization and the deliberate injection of noise and decoys to achieve a moving target defense with checkpointing and anomaly detection. MEERKATS also uses proactive secret sharing (see Section III-A), but only to enable migration of data by proactively sharing the data encryption key. In order to process the data, each cloud node must decrypt the data and then store the unencrypted data locally. By contrast, the proposed COP-secured cloud processes cloud state data in a distributed and unreconstructed form, thus providing more security without assuming trustworthiness of any individual cloud node. Other traditional security approaches (e.g., [21]) rely on various forms of access control, anomaly detection, and error-recovery techniques. By contrast, a COP-secured cloud does not attempt to distinguish between normalcy and anomaly, instead remaining agnostic of the system internals and host-specific details that are key in most other approaches.

VI. CONCLUSION
This paper proposes an architecture for a resilient cloud computing infrastructure that provably maintains cloud functionality against persistent successful corruptions of cloud nodes. The proposed architecture achieves these properties based on a principled separation of distributed task supervision from the computation of user-defined jobs. Task supervision and end-user communication are performed by a new software mechanism called the Control Operations Plane (COP), which builds a trustworthy and resilient, self-healing cloud computing infrastructure out of the underlying untrustworthy and faulty hosts. The COP leverages provably-secure cryptographic protocols that are efficient and robust in the presence of many corrupted participants. Initial experimental and analytical results for the underlying cryptographic building blocks demonstrate the feasibility of implementing a COP-secured cloud.

REFERENCES
[1] K. Trivedi and K. Pasley, Cloud Computing Security. WebEx Communications, 1st ed., 2012.
[2] "DHS Office of Biometric Identity Management," tech. rep.
[3] W. Graves, "IDENT multimodal limited production pilot," 2011. Accessed 09/04/12.
[4] K. Lewis, "US-VISIT overview," December 2010. Accessed 09/04/12.
[5] M. Colosimo, "Cloud computing for biometrics," 2011. Accessed 09/04/12.
[6] "The NIST definition of cloud computing." Special Publication 800-145.
[7] R. Ostrovsky and M. Yung, "How to withstand mobile virus attacks (extended abstract)," in PODC, pp. 51-59, 1991.
[8] G. Klein, K. Elphinstone, G. Heiser, J. Andronick, D. Cock, P. Derrin, D. Elkaduwe, K. Engelhardt, R. Kolanski, M. Norrish, T. Sewell, H. Tuch, and S. Winwood, "seL4: Formal verification of an OS kernel," in SOSP, pp. 207-220, 2009.
[9] G. R. Blakley, "Safeguarding cryptographic keys," in International Workshop on Managing Requirements Knowledge, (Los Alamitos, CA, USA), p. 313, IEEE Computer Society, 1979.
[10] A. Shamir, "How to share a secret," Commun. ACM, vol. 22, no. 11, pp. 612-613, 1979.
[11] A. Herzberg, S. Jarecki, H. Krawczyk, and M. Yung, "Proactive secret sharing or: How to cope with perpetual leakage," in CRYPTO, pp. 339-352, 1995.
[12] "Amazon Elastic Compute Cloud (Amazon EC2)." https://aws.amazon.com/ec2/.
[13] I. Damgård and J. B. Nielsen, "Scalable and unconditionally secure multiparty computation," in CRYPTO, pp. 572-590, 2007.
[14] I. Damgård, Y. Ishai, and M. Krøigaard, "Perfectly secure multiparty computation and the computational overhead of cryptography," in EUROCRYPT, pp. 445-465, 2010.
[15] E. Ben-Sasson, S. Fehr, and R. Ostrovsky, "Near-linear unconditionally-secure multiparty computation with a dishonest minority," in CRYPTO, pp. 663-680, 2012.
[16] G. Bracha, "An O(log n) expected rounds randomized Byzantine generals protocol," J. ACM, vol. 34, no. 4, pp. 910-920, 1987.
[17] E. R. Berlekamp, Algebraic Coding Theory. Aegean Park Press, 1984.
[18] M. Sudan, Efficient Checking of Polynomials and Proofs and the Hardness of Approximation Problems. PhD thesis, University of California, Berkeley, 1992.
[19] T. Toft, Primitives and Applications for Multi-party Computation. PhD thesis, University of Aarhus, 2007.
[20] A. Keromytis, R. Geambasu, S. Sethumadhavan, S. Stolfo, J. Yang, A. Benameur, M. Dacier, M. Elder, D. Kienzle, and A. Stavrou, "The MEERKATS cloud security architecture," in ICDCSW, pp. 446-450, June 2012.
[21] "IBM's cloud security solutions," tech. rep.