2011 IEEE Conference on Open Systems (ICOS2011), September 25 - 28, 2011, Langkawi, Malaysia
CloudZone: Towards an Integrity Layer of Cloud Data Storage Based on Multi Agent System Architecture

Amir Mohamed Talib
Rodziah Atan, Rusli Abdullah & Masrah Azrifah Azmi Murad
Faculty of Computer Science & IT Information System Department, University Putra Malaysia, 43400 UPM, Serdang, Selangor, Malaysia
[email protected]
Faculty of Computer Science & IT Information System Department, University Putra Malaysia, 43400 UPM, Serdang, Selangor, Malaysia
rodziah, rusli, [email protected]
Abstract— The computing power in a cloud computing environment is supplied by a collection of data centers, or cloud data storages (CDSs), housed in many different locations and interconnected by high-speed networks. CDS, like any other emerging technology, is experiencing growing pains. Integrity checking of data and data structures has grown in importance recently in cloud computing due to the expansion of online cloud services, which have become reliable and scalable. In this paper we propose an integrity layered architecture for a typical cloud based on MAS architecture, consisting of two main layers: the cloud resources layer (cloud server-side) and the MAS architecture layer (cloud client-side). At the cloud resources layer there exist massive physical cloud resources (storage servers and cloud application servers) that power the CDS. The MAS architecture layer consists of two agents: the Cloud Service Provider Agent (CSPA) and the Cloud Data Integrity Backup Agent (CDIBA). This layered architecture is named “CloudZone”. A prototype of our proposed “CloudZone” will be designed using the Prometheus methodology and implemented using the Java Agent Development Framework Security (JADE-S).
Cloud Data Storage (CDS) is composed of thousands of cloud storage devices clustered by network, distributed file systems and other storage middleware that together provide cloud storage services to users. CDS provides cloud storage resources to all kinds of clients, with fees charged periodically based on CDS capacity or CDS bandwidth. CDS systems offer services to assure the integrity of data transmission (typically through checksum backup). However, they do not provide a solution to the CDS integrity problem. Thus, the cloud client would have to develop its own solution, such as a backup of the cloud data items, in order to verify that cloud data returned by the CDS server has not been tampered with. Multi Agent Systems (MASs) are techniques from the artificial intelligence area focusing on systems in which several agents communicate with each other. In [3], a MAS is defined as “a loosely coupled network of problem-solver entities that work together to find answers to problems that are beyond the individual capabilities or knowledge of each entity”.
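The client-side checksum backup idea mentioned above can be sketched as follows. This is a minimal illustration only (the class and method names are ours, not part of any CDS API): the client records a digest of the data at upload time and later compares it against a digest of whatever the server returns.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

public class IntegrityCheck {
    // Compute a SHA-256 digest of a cloud data block.
    static byte[] digest(byte[] data) {
        try {
            return MessageDigest.getInstance("SHA-256").digest(data);
        } catch (NoSuchAlgorithmException e) {
            throw new RuntimeException(e); // SHA-256 is always available on the JVM
        }
    }

    // Verify that data returned by the CDS matches the digest recorded at upload time.
    static boolean verify(byte[] returned, byte[] recordedDigest) {
        return Arrays.equals(digest(returned), recordedDigest);
    }

    public static void main(String[] args) {
        byte[] original = "cloud data block".getBytes();
        byte[] recorded = digest(original);              // kept on the client side
        byte[] tampered = "cloud data block!".getBytes();
        System.out.println(verify(original, recorded));  // true
        System.out.println(verify(tampered, recorded));  // false
    }
}
```

Note that this detects tampering but, unlike the backup-based approach proposed in this paper, it cannot by itself restore the original data.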
Keywords-cloud computing; cloud data storage; cloud service provider; integrity; multi agent system
I. INTRODUCTION

Integrity of cloud data concerns how secure and reliable the data is in cloud computing. This could mean that you have provided secure backups, addressed security concerns, and increased the likelihood that your data will be there when you need it. In a cloud environment a certification authority is required to certify the entities involved in interactions; these include certifying physical infrastructure servers, virtual servers, environment users and the network devices. Data integrity in the cloud system means preserving information integrity (i.e., the information is not lost or modified by unauthorized users). As data is the base for providing cloud computing services, such as Data as a Service (DaaS), Software as a Service (SaaS) and Platform as a Service (PaaS), keeping data integrity is a fundamental task. The integrity requirement lies in applying due diligence within the cloud domain, mainly when accessing data. Therefore the ACID (atomicity, consistency, isolation and durability) properties of the cloud’s data should without a doubt be robustly imposed across all cloud computing delivery models.
Cloud computing can be defined as “a type of parallel and distributed system consisting of a collection of interconnected and virtualized computers that are dynamically provisioned and presented as one or more unified computing resources based on service-level agreements established through negotiation between the CSP and cloud users” [1]. The computing power in a cloud computing environment is supplied by a collection of data centers, or cloud data storages (CDSs), housed in many different locations and interconnected by high-speed networks. The ultimate challenge in cloud computing is data-level security, and sensitive data is the domain of the enterprise, not the cloud computing provider. Security will need to move to the data level so that enterprises can be sure their data is protected wherever it goes. For example, with data-level security, the enterprise can specify that certain data is not allowed to go outside of a specific cloud server. It can also force encryption of certain types of data, and permit only specified users to access the data [2].
978-1-61284-931-7/11/$26.00 ©2011 IEEE
The main contribution of this paper is to introduce a practical scheme that backs up the cloud data regularly and enables reconstruction of the original cloud data by downloading the cloud data vectors from the cloud servers.
In this paper, Section II presents a discussion of related work. Section III provides an overview of our research methodology. Section IV describes our proposed integrity layer based on MAS architecture (the proposed “CloudZone”). Section V presents some concluding remarks.

II. RELATED WORKS

We believe data security in cloud computing, an area full of challenges and of paramount importance, is still in its infancy, and many research problems are yet to be identified. In cloud computing security there are two main areas that security experts look at securing in a cloud system: Virtual Machine (VM) vulnerabilities and message integrity between cloud systems.

Santos et al. [4] proposed a trusted cloud computing platform (TCCP) for Infrastructure as a Service (IaaS). Using TCG/TPM, TCCP verifies the platform integrity of a computing node in IaaS before it offers services to cloud users; this can be leveraged in our CloudZone architecture to enhance the trustworthiness of weblet containers running on the cloud.

Ari Juels et al. [5] described a Proofs of Retrievability (POR) model to ensure the security of outsourced data. This scheme efficiently detects data corruptions and achieves a guarantee of file retrievability. Shacham et al. [6] introduced a new POR model, which enables an unlimited number of queries for public verifiability with less overhead. Bowers et al. [7] proposed a theoretical framework for the design of PORs, improving on the models of [5, 6]. All of these schemes provide only weak security, because they work only for a single server. Later, in their subsequent work, Bowers et al. [8] introduced the HAIL protocol, which extends the POR schemes to multiple servers. HAIL achieves the integrity and availability of data in the cloud. However, this protocol does not address all data security threats.

Ateniese et al. [9] described Provable Data Possession (PDP) to verify the integrity of outsourced data; it detects a large fraction of file corruption, but gives no guarantee of file retrievability. In subsequent work, Pietro et al. [10] proposed Scalable Data Possession (SDP), which overcomes the problems of the PDP scheme [9], but again works only for a single server. Later, Curtmola et al. [11] described Multiple-Replica Provable Data Possession (MR-PDP), an extension of PDP that ensures the availability and reliability of outsourced data on multiple servers. Compared to PDP [9], it requires only a single set of tags for challenging the servers.

Filho et al. [12] proposed to verify data integrity using an RSA-based hash to demonstrate uncheatable data possession in peer-to-peer file sharing networks. However, their proposal requires exponentiation over the entire data file, which is clearly impractical for the server whenever the file is large. In related work, Schwarz and Miller [13] proposed to ensure file integrity across multiple distributed servers using erasure coding and block-level file integrity checks. However, their scheme only considers static data files and does not explicitly study the problem of data error localization. Data stored in Amazon S3, Amazon SimpleDB, or Amazon Elastic Block Store is redundantly stored in multiple physical locations as a normal part of those services and at no additional charge. Data that is maintained within running instances on Amazon EC2, or within Amazon S3 and Amazon SimpleDB, is all customer data and therefore Amazon Web Services (AWS) does not perform backups.

III. RESEARCH METHODOLOGY

This research shall be carried out in five steps, defined as a Secure System Development Life Cycle (SecSDLC) as illustrated in Fig. 1 [14]:

Phase 1: Investigation
Phase 2: Analysis
Phase 3: Design
Phase 4: Implementation
Phase 5: Testing & Validation

Figure 1. Secure System Development Life Cycle (SecSDLC)

Secure System Development Life Cycle (SecSDLC) phases:

A. Phase 1 - Investigation: This phase defines the processes and goals, and documents them in the program security policies and their attributes. Table 1 shows the investigation phase with the proposed data security policy, its attributes and their definitions.

TABLE 1. INVESTIGATION PHASE
DATA SECURITY POLICY: INTEGRITY
DATA SECURITY POLICY ATTRIBUTES: Backup the cloud data regularly (“CloudZone”)
DESCRIPTION: Ensuring that data held in a system is a proper representation of the data and that it has not been modified by an unauthorized person.
A key aspect of information security is integrity. Integrity means that assets can be modified only by authorized parties or in authorized ways, and refers to data, software and hardware. Data integrity in cloud computing refers to protecting cloud data from unauthorized deletion, modification or fabrication. Managing an entity’s admittance and rights to specific enterprise cloud resources ensures that valuable data and services are not abused, misappropriated or stolen. By preventing unauthorized access, cloud computing environments can achieve greater confidence in data and system integrity. Additionally, such mechanisms offer greater visibility into determining who or what may have altered data or system information, potentially affecting their integrity. Authorization is the mechanism by which a system determines what level of access a particular authenticated user should have to secure resources controlled by the system. Due to the increased number of entities and access points in a cloud environment, authorization is crucial in assuring that only authorized entities can interact with data. A cloud computing provider is trusted to maintain data integrity and accuracy. The cloud model presents a number of threats, including sophisticated insider attacks on these data attributes.
Figure 2. Cloud Data Security Adversary Analysis Approach
Integrity requires that only authorized changes are allowed, that changes are detected and tracked, and that changes are limited to a specific scope. Integrity is defined as a property of the object, not of a mission. To verify integrity, one must examine the net effects on the controlled cloud data related to integrity. That is, one must be able to: isolate the object, isolate all the behaviors that can modify the object, detect any modifications to the cloud data, and ensure that all transformations of the cloud data across the object are within the pre-defined allowable subset. Our MAS architecture facilitates the security policy of integrity for CDS. It is used to enable the cloud user to reconstruct the original cloud data by downloading the cloud data vectors from the cloud servers. The main responsibility of this agent (the CDIBA) is backing up the cloud data regularly from “CloudZone” and sending security reports and/or alarms to the CSPA upon:
- Human errors when cloud data is entered.
- Errors that occur when cloud data is transmitted from one computer to another.
- Software bugs or viruses.
- Hardware malfunctions, such as disk crashes.
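One simple way to realize the “reconstruct the original cloud data by downloading the cloud data vectors” idea is majority voting over replicas, which tolerates the weak-adversary case where only a minority of servers is compromised. This sketch assumes plain replication rather than the paper’s actual encoding, and all names are illustrative:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MajorityReconstruct {
    // Reconstruct the original data block by majority vote over the replicas
    // ("cloud data vectors") downloaded from the cloud servers. A minority of
    // corrupted servers cannot outvote the honest copies.
    static String reconstruct(List<String> replicas) {
        Map<String, Integer> votes = new HashMap<>();
        for (String r : replicas) votes.merge(r, 1, Integer::sum);
        return Collections.max(votes.entrySet(), Map.Entry.comparingByValue()).getKey();
    }

    public static void main(String[] args) {
        // Three servers hold the block; one has been tampered with.
        List<String> replicas = Arrays.asList("payload-v1", "payload-v1", "CORRUPT");
        System.out.println(reconstruct(replicas)); // payload-v1
    }
}
```

In the strong-adversary case, where all servers collude, voting alone fails; that is precisely why the CDIBA keeps an independent regular backup.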
B. Phase 2 - Analysis: In this phase, our system requirements, components and performance will be analyzed using an adversary model [15, 8]. We will also evaluate the performance of our secure MAS architecture via implementation of the proposed agent types (agent agency). The results of this analysis will be described based on the Cloud Data Security Adversary Analysis Approach, which considers two types of adversary with different levels of capability, as illustrated in Fig. 2.

- Weak Adversary: The adversary is interested in corrupting the user’s CDS stored on individual servers. Once a server is compromised, an adversary can pollute the original CDS by modifying or introducing its own fraudulent cloud data to prevent the original cloud data from being retrieved by the cloud user.

- Strong Adversary: This is the worst-case scenario, in which we assume that the adversary can compromise all the cloud servers, so that it can intentionally modify the CDS as long as the modifications are internally consistent. In fact, this is equivalent to the case where all servers collude to hide a cloud data loss or corruption incident.

C. Phase 3 - Design: To ensure the security of CDS under the aforementioned secure MAS architecture, we aim to design efficient mechanisms for the integrity of cloud users’ data in “CloudZone” and to achieve the following goals:

Integrity: To enable the cloud user to reconstruct the original cloud data by downloading the cloud data vectors from the cloud servers, using a new technique of backing up the cloud data regularly.

Backup and Recovery: Even if you don’t know where your data is, a CSP should tell you what will happen to your data and service in case of a disaster. “Any offering that does not replicate the data and application infrastructure across multiple sites is vulnerable to a total failure,” Gartner says. Ask your CSP if it has “the ability to do a complete restoration, and how long it will take.”
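The “backing up the cloud data regularly” design goal can be sketched as a periodically scheduled task. The interval and the backup body below are placeholder assumptions, not the paper’s implementation:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class BackupScheduler {
    // Run the given backup task immediately and then at a fixed interval,
    // implementing the "backup the cloud data regularly" policy.
    static ScheduledExecutorService scheduleBackups(Runnable backupTask, long periodMillis) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(backupTask, 0, periodMillis, TimeUnit.MILLISECONDS);
        return scheduler;
    }

    public static void main(String[] args) throws InterruptedException {
        CountDownLatch ranTwice = new CountDownLatch(2);
        ScheduledExecutorService s = scheduleBackups(ranTwice::countDown, 50);
        // Wait until the backup task has fired at least twice, then stop.
        ranTwice.await();
        s.shutdown();
        System.out.println("backups completed: " + (2 - ranTwice.getCount()));
    }
}
```

In the real system the `Runnable` would be the CDIBA backup routine; the scheduling skeleton is independent of that body.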
The MAS design shall be specified to determine the types of agents, events, protocols and agent capabilities, using the Prometheus methodology [16]. The sections below go through each of Prometheus’ three phases in more detail and discuss the notations used by the methodology as well as some specific techniques.
The Prometheus methodology consists of three phases:

- System Specification: where the system is specified using goals (see Fig. 3) and scenarios; the system’s interface to its environment is described in terms of actions, percepts and external data; and functionalities are defined.

- Architectural design: where agent types are identified; the system’s overall structure is captured in a system overview diagram; and scenarios are developed into interaction protocols.

- Detailed design: where the details of each agent’s internals are developed and defined in terms of capabilities, data, events and plans; process diagrams are used as a stepping stone between interaction protocols and plans.

Each of these phases includes models that focus on the dynamics of the system, (graphical) models that focus on the structure of the system or its components, and textual descriptor forms that provide the details for individual entities.
Figure 3. “CloudZone” Design Goals
Figure 4. Software architecture of one JADE agent platform
D. Phase 4 - Implementation: Our system architecture will be developed using the FIPA-compliant JADE-S agent framework version 2. JADE (Java Agent DEvelopment framework) is a FIPA-compliant software framework fully implemented in the Java programming language, which simplifies the implementation of MASs. The platform can be seen as a middleware (Fig. 4) providing a set of useful tools that support the debugging and deployment phases [17, 18, 20].

JADE-S is formed by the combination of the standard version of JADE with the JADE security plug-in [19]. JADE-S adds security features such as user/agent authentication, authorization and secure communication between agents within the same platform. In more detail:

Permissions and Policies: a permission is an object that describes the possibility of performing an action on a certain resource, such as a piece of code, and also of executing that code. A policy specifies which permissions are available to various principals.

Certificates and Certification Authority: the Certification Authority (CA) is the entity that signs all the certificates for the whole platform, using a public/private key pair (PKI infrastructure).

Delegation: this mechanism allows the “lending” of permissions to an agent. Besides its identity certificate, an agent can also own other certificates given to it by other agents.

Secure Communication: communication between agents on different containers/hosts is performed using the Secure Socket Layer (SSL) protocol, providing solid protection against malicious packet-sniffing attempts.

E. Phase 5 - Testing & Validation: To test and validate our system, we have asked permission of the Cloud Service Provider (CSP) of the Malaysian Institute of Microelectronic Systems (MIMOS) to allow us to test and validate our secure MAS architecture on their cloud computing platform.

In order to test and validate our MAS architecture against the scale of the CDS system, we will measure the times required for the agents to travel around different numbers of cloud users before and after implementing our MAS technique, based on the Round Trip Time (RTT) for each agent, which is expected to scale linearly.
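The RTT measurement planned for the testing phase can be sketched as follows; the remote agent call is simulated here by a local delay, since the real measurement would run against the JADE platform:

```java
public class RttProbe {
    // Measure the round-trip time of one agent "hop". Total travel time over
    // a tour of cloud users is expected to grow linearly with the number of
    // users visited, which is the property the validation phase checks.
    static long measureRttNanos(Runnable remoteCall) {
        long start = System.nanoTime();
        remoteCall.run();
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        int users = 4;
        long total = 0;
        for (int i = 0; i < users; i++) {
            // Simulated remote call: a 10 ms delay stands in for a JADE message exchange.
            total += measureRttNanos(() -> {
                try { Thread.sleep(10); } catch (InterruptedException e) { }
            });
        }
        System.out.printf("total travel time over %d users: %.1f ms%n", users, total / 1e6);
    }
}
```

Replacing the simulated delay with an actual agent request/reply yields the per-agent RTT figures described in the text.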
IV. PROPOSED “CLOUDZONE”
A. “CloudZone” Overview
CloudZone consists of two main layers, the cloud resources layer (cloud server-side) and the MAS architecture layer (cloud client-side), as shown in Fig. 5. At the cloud resources layer there exist massive physical cloud resources (storage servers and cloud application servers) that power the CDS. The MAS architecture layer consists of two agents: the Cloud Service Provider Agent (CSPA) and the Cloud Data Integrity Backup Agent (CDIBA). The responsibilities of these agents are shown in Table 2.
B. “CloudZone” Requirements
- “CloudZone” only backs up MS SQL databases; it does not back up other MS SQL files, such as program installation files.
- “CloudZone” does not support component-based backup.
- “CloudZone” does not use the Volume Shadow Copy Service (VSS) for backup and restore.
- “CloudZone” supports backup and recovery of Windows Oracle 11i.
The CSPA additionally displays the security policies specified by the CSP and the rest of the agents, provides user interfaces designed to prevent the input of invalid cloud data, and creates security reports and alarm systems. The CDIBA’s main responsibility is to enable CDS backup via the new backup technique using Structured Query Language (SQL) programming.
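For Microsoft SQL Server targets, the SQL-based backup the CDIBA performs could amount to issuing a T-SQL BACKUP DATABASE statement over JDBC. The statement builder below is a sketch; the database name and backup path are hypothetical placeholders:

```java
public class SqlBackupCommand {
    // Build the T-SQL statement the backup agent would issue over JDBC.
    // Caller-supplied names are assumed to be trusted configuration values.
    static String backupStatement(String database, String backupPath) {
        return "BACKUP DATABASE [" + database + "] TO DISK = '" + backupPath + "'";
    }

    public static void main(String[] args) {
        System.out.println(backupStatement("CloudUserDB", "/backups/CloudUserDB.bak"));
        // BACKUP DATABASE [CloudUserDB] TO DISK = '/backups/CloudUserDB.bak'
    }
}
```

In the running system the resulting string would be passed to `java.sql.Statement.execute` against the target server; other backup targets (Oracle, MySQL, Exchange) would need their own per-engine commands.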
With “CloudZone” Cloud Backup, you can select any of the following as backup objects:
- Oracle Server 11g running on Windows
- Microsoft SQL Server 2000, 2005 and 2008
- Microsoft Exchange Server 2003 and 2007
- MySQL Server 5.x

Figure 5. “CloudZone”

V. CONCLUSIONS
This paper proposed a MAS architecture based on an integrity policy for secure CDS. The architecture consists of two types of agents: the Cloud Service Provider Agent (CSPA) and the Cloud Data Integrity Backup Agent (CDIBA). The CSPA provides a graphical interface to the cloud user that facilitates access to the services offered by the system. The CDIBA enables the cloud user to reconstruct the original cloud data by downloading the cloud data vectors from the cloud servers, using a new technique of backing up the cloud data regularly with Structured Query Language (SQL) programming. “CloudZone” is proposed to meet the need for an integrity layer in the era of cloud computing.
Cloud data travels between cloud servers and cloud users as a series of standardized messages, queries, and events written in language scripts and sent using existing cloud protocols. The CDIBA defines such events and the “choreography” (CloudZone) that allows data to move back and forth between the cloud servers and cloud users. The CDIBA defines the set of rules for backing up the cloud data within a “CloudZone”, using a logical grouping of cloud components (user, server, application, software, etc.) in which agents communicate with each other through a central cloud communication point. The CDIBA appears as software that exists either internal to a cloud application or installed next to it. The functionality of the CDIBA appears as an extension of each cloud application, serving as the intermediary between the cloud software application and the “CloudZone”.
TABLE 2. RESPONSIBILITIES OF OUR PROPOSED MAS ARCHITECTURE

Cloud Service Provider Agent (CSPA):
- Provide the security service task according to the authorized service level agreements (SLAs) and the original message content sent by the CDIBA and CDAuA.
- Receive the security reports and/or alarms from the rest of the agents.
- Monitor specific activities concerning a part of the CDS or a particular cloud user.
- Translate the attack in terms of goals.

Cloud Data Integrity Backup Agent (CDIBA):
- Back up the cloud data regularly from “CloudZone” and send security reports and/or alarms to the CSPA.

ACKNOWLEDGMENT

The authors would like to thank the anonymous reviewers for their valuable remarks and comments.

REFERENCES
[1] R. Buyya, C. S. Yeo, and S. Venugopal, “Market-Oriented Cloud Computing: Vision, Hype, and Reality for Delivering IT Services as Computing Utilities,” in Proceedings of the 10th IEEE International Conference on High Performance Computing and Communications, 2008.
[2] J. Rittinghouse and J. F. Ransome, Cloud Computing: Implementation, Management, and Security, CRC Press, 2009.
[3] E. H. Durfee, V. R. Lesser, and D. D. Corkill, “Trends in Cooperative Distributed Problem Solving,” IEEE Transactions on Knowledge and Data Engineering, 1989, pp. 63-83.
[4] N. Santos, K. P. Gummadi, and R. Rodrigues, “Towards Trusted Cloud Computing,” in USENIX HotCloud, 2009.
[5] A. Juels, J. Burton, and S. Kaliski, “PORs: Proofs of Retrievability for Large Files,” in Proceedings of CCS ’07, 2007, pp. 584–597.
[6] H. Shacham and B. Waters, “Compact Proofs of Retrievability,” in Proceedings of Asiacrypt ’08, Dec. 2008.
[7] K. D. Bowers, A. Juels, and A. Oprea, “Proofs of Retrievability: Theory and Implementation,” Cryptology ePrint Archive, Report 2008/175, 2008, http://eprint.iacr.org/.
[8] K. D. Bowers, A. Juels, and A. Oprea, “HAIL: A High-Availability and Integrity Layer for Cloud Storage,” Cryptology ePrint Archive, Report 2008/489, 2008, http://eprint.iacr.org/.
[9] G. Ateniese, R. Burns, R. Curtmola, J. Herring, L. Kissner, Z. Peterson, and D. Song, “Provable Data Possession at Untrusted Stores,” in Proceedings of CCS ’07, 2007, pp. 598–609.
[10] G. Ateniese, R. D. Pietro, L. V. Mancini, and G. Tsudik, “Scalable and Efficient Provable Data Possession,” in Proceedings of SecureComm ’08, 2008, pp. 1–10.
[11] R. Curtmola, O. Khan, R. Burns, and G. Ateniese, “MR-PDP: Multiple-Replica Provable Data Possession,” in Proceedings of ICDCS ’08, 2008, pp. 411–420.
[12] D. L. Gazzoni Filho and P. Barreto, “Demonstrating Data Possession and Uncheatable Data Transfer,” Cryptology ePrint Archive, Report 2006/150, 2006.
[13] T. S. J. Schwarz and E. L. Miller, “Store, Forget, and Check: Using Algebraic Signatures to Check Remotely Administered Storage,” IEEE, 2006.
[14] M. E. Whitman and H. J. Mattord, Management of Information Security, Course Technology Press, Boston, MA, United States, 2004, pp. 1-18.
[15] C. Wang, Q. Wang, K. Ren, and W. Lou, “Ensuring Data Storage Security in Cloud Computing,” IEEE, 2009, pp. 1-9.
[16] L. Padgham and M. Winikoff, Developing Intelligent Agent Systems: A Practical Guide, Wiley, ISBN 978-0-470-86120-2, 2004, pp. 1-240.
[17] F. Bellifemine, A. Poggi, and G. Rimassa, “JADE—A FIPA-Compliant Agent Framework,” CSELT Internal Technical Report; part of this report has also been published in Proceedings of PAAM ’99, 1999, pp. 97–108.
[18] F. Bellifemine, A. Poggi, and G. Rimassa, “Developing Multi-Agent Systems with JADE,” in Proceedings of the 7th International Workshop on Intelligent Agents: Theories, Architectures and Languages (ATAL 2000), LNCS vol. 1986, Springer, pp. 89-103, ISBN 3-540-42422-9.
[19] A. Poggi, G. Rimassa, and M. Tomaiuolo, “Multi-User and Security Support for Multi-Agent Systems,” in Proceedings of the WOA Workshop, Modena, Italy, 2001, pp. 1-7.
[20] F. Bellifemine, A. Poggi, and G. Rimassa, “Developing Multi-Agent Systems with a FIPA-Compliant Agent Framework,” Software—Practice and Experience, no. 31, 2001, pp. 103–128.