60. IEEE TRANSACTIONS ON NETWORK AND SERVICE MANAGEMENT, VOL. 11, NO. 1, MARCH 2014. Security as a Service Model for Cloud Environment.
Security as a Service Model for Cloud Environment Vijay Varadharajan, Senior Member, IEEE, and Udaya Tupakula, Member, IEEE
Abstract—Cloud computing is becoming increasingly important for provision of services and storage of data in the Internet. However there are several significant challenges in securing cloud infrastructures from different types of attacks. The focus of this paper is on the security services that a cloud provider can offer as part of its infrastructure to its customers (tenants) to counteract these attacks. Our main contribution is a security architecture that provides a flexible security as a service model that a cloud provider can offer to its tenants and customers of its tenants. Our security as a service model while offering a baseline security to the provider to protect its own cloud infrastructure also provides flexibility to tenants to have additional security functionalities that suit their security requirements. The paper describes the design of the security architecture and discusses how different types of attacks are counteracted by the proposed architecture. We have implemented the security architecture and the paper discusses analysis and performance evaluation results. Index Terms—Cloud security, security architecture, security and privacy.
I. INTRODUCTION
Cloud computing [1]–[3] has become an important technology in which cloud service providers provide computing resources to their customers (tenants) to host their data or perform their computing tasks. Cloud computing can be categorized into different service delivery models such as Software as a Service (SaaS), Platform as a Service (PaaS), and Infrastructure as a Service (IaaS). Virtualisation [4] is one of the key technologies used in IaaS cloud infrastructures. For instance, virtualisation is used by some of the major cloud service providers such as Amazon [2] and Microsoft [3] in the provision of cloud services.
We will use the term tenant to refer to cloud customers who wish to access services from cloud providers. Tenants can themselves be using their virtual machines to provide services to their own customers; we will refer to customers (or users) as those who use the services of the tenants. Hence customers in our architecture are the customers of the tenants.
In general, the tenants in the cloud can run different operating systems and applications in their virtual machines. As the operating systems and applications of the tenants can be potentially large and complex, they may contain security vulnerabilities. Furthermore, there can be several tenants on the same physical platform sharing resources in a cloud infrastructure. The vulnerabilities in operating systems and
Manuscript received December 25, 2012; revised November 13, 2013 and February 24, 2014. The associate editor coordinating the review of this paper and approving it for publication was G. Martinez Perez. The authors are with the Advanced Cyber Security Research Centre, Faculty of Science, Macquarie University, Sydney, Australia (e-mail: {vijay.varadharajan, udaya.tupakula}@mq.edu.au). This work was supported by the Australian Research Council under Grant DP140100410. Digital Object Identifier 10.1109/TNSM.2014.041614.120394
applications can be potentially exploited by an attacker to generate different types of attacks. These attacks can be targeted against the cloud infrastructure as well as against virtual machines belonging to other tenants. So there is a need to design a security architecture and develop techniques that can be used by the cloud service provider for securing its infrastructure and tenant virtual machines.
However, there are several issues that arise when developing security as a service for cloud infrastructures. In the current environment, cloud service providers do not generally offer security as a service to their tenants. For example, in [5] Amazon mentions that security of tenant virtual machines is the responsibility of the tenants since they are free to run any of the operating systems or applications¹ (though it claims to secure the underlying infrastructure). Hence tenants need to make their own arrangements for securing their virtual machines that are hosted in the cloud. Although tenants can use different security tools such as anti-virus and host based intrusion detection systems to secure their virtual machines, these tools have limitations [6] because they reside in the same system as the one being monitored and are hence themselves vulnerable to attacks. Also, some tenants may not be capable of securing their virtual machines. Hence there is a need for the cloud service provider to offer security as a service to such tenants.
Furthermore, security requirements for tenants may vary, and some tenants may opt for more security services from the cloud provider while others may opt for the baseline default security. For example, a tenant who is running financial services on its virtual machines is likely to need more security measures compared to a tenant who is providing basic web hosting.
However, the greater the level of security measures taken up by the tenant from the provider, the greater the possibility for the cloud provider to get to know more about the tenant’s system. That is, the security mechanisms and tools offered by the cloud provider (as part of its security as a service) can gather more information about the operating system and applications running in the tenant’s virtual machines. This in turn may lead to greater privacy concerns for the tenant. Here privacy concerns refer to the ability of the provider to find details about the services and applications’ data in a tenant’s machine.²
¹ Google goes on to say that it reserves the right to review the tenant’s applications including the tenant customer data. (https://developers.google.com/storage/docs/terms)
² A tenant requiring such privacy has to employ techniques such as homomorphic encryption for protecting application layer information.
© 2014 IEEE 1932-4537/14/$31.00
Our main contribution in this paper is a security architecture that provides a flexible security as a service model that a cloud provider can offer to its tenants and customers of its tenants. Our security as a service model, while offering a baseline security to the provider to protect its own cloud infrastructure, also provides flexibility to tenants to determine how much control they wish to have over their own virtual machines. The baseline security is needed by the provider to ensure that malicious tenants are not attacking the cloud infrastructure or even hosting malicious software. Every tenant has to have the security functionalities that form part of the security baseline, which offers basic security guarantees in its default mode of operation. However, there will be other tenants who require additional security services (on top of the baseline) from the cloud provider to meet their requirements as well as to protect them from other malicious tenants. Hence our security as a service model provides options to have additional security functionalities that suit tenants’ security requirements. These additional security functionalities can require the tenant to reveal more information about its services and applications, which may create privacy concerns for the tenant.³ Our approach offers the tenant a choice in managing this tension between the privacy concerns and the security controls offered by the cloud provider. An important feature of our model is that it makes this trade-off between security and privacy explicit. Furthermore, the choice by a tenant to opt in for additional security services can enable the cloud provider to develop a framework for charging the tenants for these additional security services.
The paper is organized as follows. We describe the threat model in Section II and consider the different types of attackers and attacks that can occur in the infrastructure as a service cloud environment. Then we summarize the capabilities of the security architecture that is proposed in this paper. Section III describes the security as a service model for cloud infrastructure.
It describes the design of the security architecture and discusses how different types of attacks are counteracted by the proposed architecture. Section IV describes the implementation and analysis of the security architecture, and discusses the performance evaluation results. Section V describes relevant related work and provides a comparison with the capabilities of our security architecture. Finally, Section VI concludes the paper.
³ In a separate piece of work, we have developed a role-based encryption (RBE) scheme for cloud [35], which allows the tenant to store his data in an encrypted form in the cloud and to grant access to that data for users with specific roles.
II. THREAT MODEL
Our system model involves the cloud service provider, which includes cloud system administrators; tenant administrators (or operators), who manage the tenant virtual machines; and tenant users (or tenant’s customers), who use the applications and services running in the tenant virtual machines. Cloud providers are entities such as Amazon EC2 and Microsoft Azure who have a vested interest in protecting their reputations. The cloud system administrators are individuals from these corporations entrusted with system tasks and maintaining cloud infrastructures, who will have access to privileged domains. We assume that, as cloud providers have a vested interest in protecting their reputations and resources, the adversaries from the cloud provider perspective are malicious cloud system administrators.⁴
Consider a typical configuration of our system architecture shown in Fig. 1. In determining the threat model we need to look at the different types of attacks that are possible in such a configuration. The circle in the figure shows the source of the attack and the arrow head shows the target of the attack.
Fig. 1. Threat model.
We identify three domains in our architecture that are relevant to the threat model. There is the tenant domain comprising tenant administrators and tenant users. Each tenant has its own tenant domain. There is the cloud system domain, which consists of cloud system administrators and the VMM platform (with its privileged domain and hardware). Then there is the cloud cluster domain comprising the cloud system domains that constitute the cloud infrastructure.
There can be attacks from tenant administrators on the tenant virtual machines. That is, the tenant administrators can exploit the vulnerabilities in the tenant virtual machine for malicious purposes. Such attacks can target both the cloud infrastructure and co-located tenants. For example, attacks such as VM escape [7] enable the virtual machine to exploit the vulnerabilities in the virtual machine monitor (VMM) and access the privileged domain and the host operating system. As virtual machines that belong to different tenants can be hosted on the same physical server, a malicious tenant who has obtained access to the privileged domain can perform different attacks on the co-located tenant virtual machines [7]–[10]. For example, the malicious tenant can perform denial of service attacks by crashing the server or starving other tenant virtual machines of resources, or can capture and/or tamper with other tenant virtual machine traffic.
⁴ We envisage the existence of malicious cloud system administrators; this is despite the cloud providers having processes and methodologies to ensure the trustworthiness and integrity of their cloud system administrators.
If the resources
allocated to other virtual machines are significantly reduced, then this can result in tenant virtual machines not being able to serve all the legitimate client requests, leading to violation of the Service Level Agreement (SLA) policies between the cloud service provider and the tenants. For example, [7], [8] present vulnerabilities which enable the malicious tenant virtual machine administrator to access the console of the privileged domain, and [9] presents the case of exploiting the VMM scheduler to cause resource starvation to the co-located tenant virtual machines. The paper [10] describes signature wrapping and XSS techniques that exploit the control interfaces of the Amazon and Eucalyptus clouds to gain complete control over the victim. Such attacks cannot be easily detected with the current state of the art.
Then there can be attacks from tenant users (customers). Consider, for instance, a tenant which is a software development company making use of cloud resources. Although the tenant administrators have provided host based security tools in their tenant virtual machines, a malicious tenant user (tenant employee) may be able to circumvent such security tools. Consider Fig. 2, where the virtual machines which belong to a single tenant are hosted on multiple physical servers. This can happen for efficiency reasons where the cloud service provider hosts different customer virtual machines on a single physical server. In Fig. 2, although VMM11, VMM13 and VMM14 are hosting virtual machines that belong to a single tenant, VMM12 is hosting virtual machines that belong to different tenants. In such cases, tenants may want to ensure that data can be shared among their own virtual machines but at the same time may wish to protect their virtual machines from other tenants’ virtual machines. For instance, in Fig. 2, if Tenant 1 has requested free communication between its virtual machines, and the cloud service provider hence decides not to monitor any of the communication between Tenant 1’s virtual machines, then a malicious tenant user will be able to exploit vulnerabilities in TVM13 to generate attacks on the other virtual machines (TVM11, TVM12). In this case, the tenant administrator can ask the cloud service provider for additional security mechanisms to deal with such malicious insider attacks within the tenant organization.
In a general situation, a malicious tenant user or, more importantly, a malicious tenant administrator can generate attacks against a virtual machine belonging to another tenant (e.g. TVM21). The cloud service provider needs to provide secure isolation between the tenant virtual machines. However, the cloud service provider may not be aware of the operating systems and applications running in a tenant virtual machine. Hence it is not an easy task for the cloud service provider to enforce security policies on the tenant virtual machines. Furthermore, since the elastic nature of the cloud allows the resources allocated to tenant virtual machines to be increased dynamically, the attacker can use this capability in compromised tenant virtual machines to generate sophisticated attacks.
Alternatively, there can also be attacks from malicious cloud system administrators on the tenant virtual machines. These administrators have the technical expertise and the monetary motivation to misuse their privileges to examine the tenant data and monitor the tenant’s services at will. For instance, the privileged domain Dom0 in a Xen [11] based cloud system has
access to runtime state information of tenants. Even if the tenant is using secure communications, Dom0 administrators can obtain all the required information from the memory allocated to the tenant when it is in unencrypted form. If the VMM fails to provide proper isolation between the tenant virtual machines, then the co-located tenant can also access the victim tenant’s information. For example, [12] discusses techniques for retrieving the private key from the co-located tenant virtual machine. Hence the malicious party which has access to the private key information of the tenant can easily perform man-in-the-middle attacks. Even if the system administrators are themselves benign, there could be exploits against the privileged domain, Dom0, which can compromise the tenants’ data. Such attacks on commodity VMMs can arise both out of the complexity of the VMM software and due to misconfigurations [7]–[10], [12]. Hence there can be: (i) insider attacks from the tenant domain, such as the malicious tenant user described above, and (ii) insider attacks from the cloud service provider administrators.
Finally, there can be attacks from the Internet targeting the cloud service infrastructure (including the tenants), and attacks from the cloud infrastructure on external hosts in the Internet. The cloud has several features that make it attractive for attackers to compromise the tenant virtual machines and use them for generating attacks. For instance, a tenant virtual machine can be the victim of a denial of service attack, or an attacker can compromise the tenant machine and use it for the generation of the attack traffic [13]. As the tenant is often charged on usage, both these attacks can incur financial losses to the victim tenant. Furthermore, some of the attacks can also lead to disputes between the cloud service provider and its tenants.
For example, if the tenant virtual machine is compromised and used for generating attack traffic with a spoofed address, this could result in disputes between the cloud service provider and its tenants. The cloud service provider can charge the tenants for the generated attack traffic and the tenant can deny these charges since the attack traffic has a spoofed identity. The study in the Spoofer project [14] confirms that such spoofing attacks are easily possible.⁵ Furthermore, the attacker can use the compromised tenant virtual machine to generate different types of attack traffic such as ICMP flood, UDP flood, TCP SYN flood and Smurf attacks.
⁵ Something of the order of 25% of the autonomous systems permit such spoofing attacks.
Fig. 2. Cloud scenario.
Our Approach: Our security architecture assumes that the cloud service provider provides a trusted VMM platform (for example, equipped with a Trusted Platform Module (TPM) [15]). We also assume that the security components of our architecture embedded within the VMM are trusted. The cloud provider also provides controls and auditing procedures which ensure the physical security of the cloud infrastructure to overcome hardware based attacks such as cold-boot attacks.
Our security architecture allows a cloud provider to provide a default baseline set of security measures even if the tenants do not require any security feature for their virtual machines. The rationale behind this design choice is as follows. If security measures are not provided by the cloud provider, then
the provider will not be able to detect attacks on its own infrastructure, on other tenant virtual machines, or on hosts on the Internet. For example, without any monitoring done by the cloud provider, malicious tenants can use their virtual machines to flood other tenant virtual machines or target different services (such as DNS and web servers) within the provider infrastructure. On the other hand, the greater the level of security measures, the greater the possibility for the provider to get to know more details about the tenant’s operating system and the applications that are running in its virtual machines. It is up to the tenant to decide whether this is acceptable depending on its privacy concerns.
Our security architecture protects the cloud infrastructure from attacks generated within a tenant virtual machine by the tenant administrator and tenant users. It also protects co-located tenants from the attacks generated by such tenant entities. In our architecture, the baseline security mechanisms (in the SPAD component, Section III.D) allow the cloud service provider to protect its infrastructure, legitimate tenants and external hosts from attacks by malicious tenants. The security policies and mechanisms (in the security component TSAD, Section III.D) in our architecture deal with the insider attacks from malicious tenant users.
Our security architecture also protects tenants from threats posed by cloud system administrators who misuse their privileges and from exploits against the privileged domain. It enforces role based access control to partition the system administrators into different roles, which makes it possible to restrict their privileges to a specified set of actions. This in turn helps to contain the harm of insider attacks; in particular, it removes the possibility of a single administrator (or a single role) having all the privileges.
Our security architecture also provides mechanisms to deal with some attacks on the VMM.
This is done using a Security Gateway component which specifies cluster wide policies and mechanisms to detect attacks on the VMM platforms.
Finally, our security architecture provides the ability to charge a tenant depending on the security services that are required by the tenant. For example, a tenant virtual machine that is running financial services may need more security measures than a tenant that is running basic web hosting.
III. SECURITY AS A SERVICE MODEL
In this section, we describe our security as a service model for cloud infrastructure. First we give an outline of the cloud architecture in Section A. We describe the assumptions made in the design of the security architecture in Section B. Section C gives an overview of the basic security architecture. Section D describes the details of the components in the security architecture. Section E considers the different types of attacks and how our architecture is able to deal with such attacks. Finally, Section F extends the security architecture to counteract further attacks on the VMM such as VMM compromise.
A. Cloud Architecture Overview
Let us consider a generic cloud service provider architecture as shown in Fig. 2. Tenants (T1, T2, T3) are hosting one or more virtual machines on the cloud service provider infrastructure and remotely managing their virtual machines. The Cloud Controller (CLC) is the main interface for the cloud tenants and it is the top level management for the IaaS cloud. It can query other controllers such as the Cluster Controllers (CC), Node Controllers (NC) and Storage Controller (SC) to make high level decisions on the implementation of the tenant virtual machines and storage of the data. The CLC holds the policies required in the IaaS infrastructure. It also handles the authentication service for the users. The Storage Controller provides storage for the VM images and user data. A Node Controller is implemented on each physical server and is responsible for managing the tenant virtual machines hosted on each VMM. A group of Node Controllers reports to the Cluster Controller.
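The controller hierarchy described above can be pictured with a minimal sketch. The class names mirror the CLC, CC and NC roles, but the fields, the `place_vm` method and the "fewest hosted virtual machines" placement rule are assumptions made purely for illustration; the paper does not specify how the CLC makes its placement decisions.

```python
from dataclasses import dataclass, field


@dataclass
class NodeController:
    """One per physical server; manages the tenant VMs on that server's VMM."""
    name: str
    vms: list = field(default_factory=list)


@dataclass
class ClusterController:
    """Receives reports from a group of Node Controllers."""
    name: str
    nodes: list = field(default_factory=list)


@dataclass
class CloudController:
    """Top-level management; queries the controllers below it."""
    clusters: list = field(default_factory=list)

    def place_vm(self, vm_id: str) -> str:
        # Hypothetical placement rule: pick the node hosting the fewest VMs.
        candidates = [n for c in self.clusters for n in c.nodes]
        node = min(candidates, key=lambda n: len(n.vms))
        node.vms.append(vm_id)
        return node.name


nc1 = NodeController("nc1", vms=["tvm11"])
nc2 = NodeController("nc2")
clc = CloudController(clusters=[ClusterController("cc1", nodes=[nc1, nc2])])
chosen = clc.place_vm("tvm12")  # nc2 hosts fewer VMs, so it is selected
```

A real controller would also consult the Storage Controller and the tenant's registration data; the sketch only shows the direction of the queries.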
The cloud service provider can provide prebuilt VM images with generic OS and applications (such as web server) or the tenants can transfer their specific VM that is currently running
in their environment into the cloud. Our security architecture extends the NC with the VMM security functionalities.
In our implementation, the tenant virtual machines are hosted on the Xen [11] VMM. We have used Xen because it is open source and we have prior experience in using Xen for system development. Furthermore, some commercial cloud service providers such as Amazon [2] use a modified version of Xen for hosting the tenant virtual machines. In Xen, there is a privileged domain called Dom0 which is used for administering the guest domains, referred to as DomU or unprivileged domains. The Xen VMM has physical control of the resources used by the tenant virtual machines, and it is able to monitor the interactions of its tenants.
The security architecture proposed in this paper focuses mainly on the infrastructure-as-a-service (IaaS) platform. There are also other delivery models for cloud such as software as a service (SaaS) and platform as a service (PaaS). In the case of SaaS or PaaS, the tenants have very limited access to the cloud resources compared to IaaS. Hence the attacks that can be generated in SaaS or PaaS are limited to the specific application software or platforms to which they have access. For example, if an attacker can exploit a vulnerability in Gmail, the attacks are limited to the Gmail application. The SaaS and PaaS providers can use security features available in the operating system and traditional security tools to protect against such malicious tenants. Hence the proposed techniques can be used as an additional layer of defence in SaaS and PaaS deployments. In the case of IaaS, the tenants have complete control over the virtualised systems (applications, operating system and the resources allocated to the virtualised system). Hence the tenants can install any malicious software on their virtual machines to generate attacks. As mentioned in [16], there are significantly more security challenges in IaaS compared to SaaS and PaaS.
Hence we address the IaaS platform in this paper. The assumptions that our architecture makes are considered in Section B.
B. Assumptions
We assume that tenant virtual machines accept the security baseline functionalities (mentioned earlier) specified by the cloud service provider. If there are any special requirements for the tenant which do not comply with the baseline security requirements of the cloud service provider, then these need to be resolved at the time of registration. The security baseline is enforced by our architecture in the node controller.
With respect to the applications running in the tenants, we assume that the tenants are aware of the applications that are running in their own machines. We also assume that tenants may have their own host based security tools (HBST) running in their virtual machines. Furthermore, the default security baseline provides the best option for those tenants who are concerned about the privacy of the applications and services running on their virtual machines. That is, in this case, the tenants do not reveal any additional information on their applications and services to the cloud provider.
Alternatively, a tenant may choose to use the applications provided by the cloud provider. For example, home users who are temporarily using the cloud may use the default virtual machine images provided by the cloud service provider. In this case, the cloud tenants may not have complete knowledge of the applications running in the default images. However, the cloud provider will be providing the security baseline. We do not consider in this paper the situation of a malicious cloud provider providing malicious applications to its tenants. As we mentioned earlier in the threat model section, the cloud service provider has a vested interest in protecting its reputation, and hence does not deliberately provide malicious applications. Also, a tenant can take some precautions in choosing a suitable cloud provider; we have addressed this issue of how a tenant can find a trusted cloud service provider in another paper [17].
As mentioned earlier, we assume that the VMM platform is trusted. That is, the VMM software is booted up in a trusted manner and is trusted to function correctly, and the VMM software is assumed not to have any security vulnerabilities. The security components developed in this architecture which reside in the VMM are also trusted.
C. Security Architecture Overview
Consider the basic security architecture diagram shown in Fig. 3. As mentioned above, the tenants may wish to have their own host based security tools (HBST) to run on the virtual machines that they are obtaining from the cloud provider. Since host based security tools have good visibility into the system being monitored, this acts as a primary layer of defense in our security architecture. The other important components in our security architecture shown in Fig. 3 are the Service Provider Attack Detection (SPAD) and the Tenant Specific Attack Detection (TSAD) components.
Fig. 3. Basic security architecture.
First let us look at the operation of our architecture at a high level. The tenant virtual machine traffic is received by the SPAD component. SPAD enforces the security baseline policies required by the cloud service provider.
If a tenant virtual machine’s traffic violates any of the security policies in the SPAD, then the tenant virtual machine is isolated and an alert is generated to the tenant administrator and the cloud system administrator. In such cases, the tenant virtual machine can be activated only after the issues are resolved by the tenant administrator and the cloud system administrator. The security policies enforced by the SPAD component are concerned with the detection of spoofed source address and
associated attacks. The SPAD component also has policies for logging traffic from tenants. Since the SPAD security policies are enforced on all the tenant virtual machines, they are designed to be lightweight and provide a basic security baseline with minimal privacy violations. As the SPAD security mechanisms are needed for secure operation of the cloud provider, we envisage that these services will be offered to the tenant by the provider without any additional charges. In Section III.D we discuss the SPAD security mechanisms in detail and describe the types of attacks that these mechanisms are able to counteract.
The traffic which is validated by the SPAD component is then forwarded to the actual destination or to the TSAD depending on the tenant’s service registration requirements. The traffic is forwarded to the actual destination if the tenant has not requested any additional security services (apart from the default security baseline) from the cloud provider. If the tenant requires additional security from the cloud provider, then the traffic is forwarded to the TSAD component. The TSAD enforces tenant specific security policies on the tenant traffic. The security policies in the TSAD are decided by the tenants at the time of registration. As tenants’ requirements can change, tenants are able to update the security policies in TSAD. However, to minimize misuse, policies can be updated only with the consent of the cloud provider.
Now let us consider the operation of the TSAD component in our security architecture. If the traffic does not violate any of the security policies in TSAD, then it is forwarded to the actual destination. If the traffic violates any of the security policies in TSAD, then the traffic is dropped or rate limited according to the tenant requirements. For example, the tenants can specify policies to drop the traffic if it matches any of the attack signatures.
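The forwarding decision described above can be summarised in a short sketch. Everything named here (`TenantProfile`, `route_after_spad`, the byte-pattern signatures and the per-interval packet threshold) is hypothetical and chosen only to illustrate the flow; it is not the policy format used by the architecture.

```python
from dataclasses import dataclass, field
from enum import Enum


class Verdict(Enum):
    FORWARD = "forward"
    DROP = "drop"
    RATE_LIMIT = "rate_limit"


@dataclass
class TenantProfile:
    # True if the tenant registered for services beyond the baseline.
    uses_tsad: bool = False
    # Hypothetical byte patterns standing in for attack signatures.
    signatures: list = field(default_factory=list)
    # Hypothetical packet threshold per monitoring interval.
    max_packets_per_interval: int = 100


def tsad_verdict(profile: TenantProfile, payload: bytes, packets_seen: int) -> Verdict:
    """Apply the tenant-specific policies: drop on signature match, else rate limit."""
    if any(sig in payload for sig in profile.signatures):
        return Verdict.DROP
    if packets_seen > profile.max_packets_per_interval:
        return Verdict.RATE_LIMIT
    return Verdict.FORWARD


def route_after_spad(profile: TenantProfile, payload: bytes, packets_seen: int) -> Verdict:
    """Decide what happens to traffic that SPAD has already validated."""
    if not profile.uses_tsad:
        # Baseline-only tenants: traffic goes straight to its destination.
        return Verdict.FORWARD
    return tsad_verdict(profile, payload, packets_seen)


baseline_tenant = TenantProfile()
protected_tenant = TenantProfile(uses_tsad=True, signatures=[b"attack"],
                                 max_packets_per_interval=10)
```

With these two profiles, the same payload is forwarded untouched for the baseline tenant but inspected for the tenant that opted in, which is the security versus privacy choice the model leaves to each tenant.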
The policies can alternatively be configured to rate limit the traffic in the case of a sudden burst of traffic. TSAD can also be used for pre-monitoring the traffic destined to the tenant virtual machines. In this case, the incoming traffic is first received by the TSAD. This is similar to the case in traditional networks where a security gateway such as a firewall monitors all the incoming traffic that is destined to the servers. The tenants are charged additional amounts depending on the overhead of the TSAD security policies.
The TSAD component contains policies based on signatures and anomalies to detect attacks. These include dropping traffic that matches attack patterns, rate limiting traffic above specified thresholds (for instance, for ICMP, UDP and SYN packets), dynamic allocation of resources to tenants, and identifying malicious communications using access policies on protocols and ports. TSAD policies also address runtime state based mechanisms, not only to detect suspicious processes running in the tenants but also to ensure that the specified security related processes are running in the tenants, thereby validating the security of tenants. We describe the mechanisms associated with the TSAD policies in Section III.D. Specific examples of policy specifications are given in the appendix.
D. Component Description
Service Provider Attack Detection (SPAD): SPAD is designed to enforce security policies in the baseline that is
offered by the cloud provider. These policies are intended to minimize attacks on the cloud provider infrastructure as well as to prevent attacks between the tenant virtual machines. Note that SPAD policies are enforced on all the tenant virtual machines. In our architecture, the basic SPAD security policies prevent attacks with spoofed source addresses from a compromised tenant virtual machine and maintain logs of traffic originating from the tenant virtual machines for detecting anomalies.6

Spoofing is one of the fundamental challenges that make it extremely difficult to deal with attacks in the current environment [14]. Hence one security objective of the SPAD is to ensure that a tenant virtual machine cannot send malicious traffic with spoofed source addresses to external hosts. This is achieved by the SPAD monitoring all the traffic that originates from the tenant virtual machines and dropping the traffic with spoofed source addresses. The traffic with a correct source address is logged and forwarded to the TSAD or to the actual destination, depending on the tenant’s security requirements. This mechanism in SPAD helps to prevent attacks from the tenant virtual machines with spoofed source addresses. Even if an attacker succeeds in exploiting a vulnerability in the operating system or applications in the tenant machine, the attacker cannot generate attacks with spoofed source addresses on other cloud tenant virtual machines, on the cloud infrastructure, or on external hosts on the Internet.

In practice, the cloud provider will have (in fact, has to have) basic monitoring mechanisms for billing and charging. For example, if the cloud provider does not maintain a traffic log, then it will not be possible to resolve billing issues that can arise between the tenant and the cloud provider.
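The SPAD baseline described above (drop traffic with a spoofed source address, log the rest, then forward it to the TSAD or to the destination) can be illustrated with a minimal Python sketch. The class and field names, the in-memory address registry and the string return values are hypothetical and chosen only for illustration; they are not part of our implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Packet:
    vm_id: str   # sending VM as identified by the platform, not by packet content
    src: str     # source address claimed in the packet header
    dst: str
    size: int


@dataclass
class SpadBaseline:
    # Hypothetical registry: vm_id -> set of addresses assigned by the provider.
    assigned: dict
    # Hypothetical set of VMs whose tenants registered for TSAD services.
    tsad_tenants: set = field(default_factory=set)
    traffic_log: list = field(default_factory=list)

    def handle(self, pkt: Packet) -> str:
        """Return 'drop', 'tsad' or 'destination' for one outgoing packet."""
        if pkt.src not in self.assigned.get(pkt.vm_id, set()):
            return "drop"  # source address does not belong to the sending VM
        # Only traffic with a correct source address is logged and forwarded.
        self.traffic_log.append((pkt.vm_id, pkt.src, pkt.dst, pkt.size))
        return "tsad" if pkt.vm_id in self.tsad_tenants else "destination"
```

The sketch assumes that the sending virtual machine is identified independently of the packet header, since the header itself may be spoofed.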
Furthermore, since tenants may not even be aware that their virtual machines have been compromised, it is important for the cloud provider to maintain the traffic logs from the tenant virtual machines. For example, consider a tenant who has downloaded and executed a confidential file on its virtual machine because it appears to have been sent by a manager or friend, but which has in fact been sent by an attacker. This can be a common occurrence. As soon as the file is executed, the malicious program in the file, say, makes copies of the file and distributes them to other tenants and/or hosts in the Internet, with a correct or spoofed source address. Not only are the contents of the file compromised, but this also increases the tenant’s usage of network resources, which is chargeable by the cloud provider. However, since the tenant is not aware of the compromise of its virtual machine, the tenant may refuse to pay the additional charges incurred. These issues can be further complicated if the malicious file was distributed with a spoofed source address. Hence there is a need for the cloud provider to resolve such issues, which requires secure logging of traffic.

Furthermore, note that the security enforced by the SPAD is only at the network level. Those tenants who are concerned about the privacy of their application data need to consider the use of encryption techniques such as homomorphic encryption [18] to protect their data.

6 Recently, Amazon has started validating the IP addresses of all the tenant virtual machines in the EC2 environment.

If the traffic originating from a tenant virtual machine has a spoofed source address, then SPAD isolates the tenant virtual
Fig. 4. Security processes of Sophos Security tool.
machine and generates an alert to the tenant administrator and the cloud system administrator. Hence, SPAD enables the tenant administrator to become aware of the compromise of its virtual machine. Note that attacks are still possible from a compromised tenant virtual machine with a correct source address. However, since the traffic from the tenant virtual machine is logged in SPAD, this provides a mechanism for the attacks to be traced and validated. For example, consider an attacker using a compromised tenant virtual machine to flood other tenants or the cloud infrastructure. If the victim is able to detect such an attack,7 then the victim can inform the cloud system administrator, who can in turn determine the attacking source from the source address of the attack traffic and isolate the compromised tenant virtual machine.

Tenant Specific Attack Detection (TSAD): The TSAD component enforces tenant specific attack detection policies. In our architecture, the tenant can request the cloud service provider to enforce signature based detection and/or anomaly based detection policies in the TSAD. First, the tenant is able to specify application specific attack signatures depending on the applications that are running in its virtual machines. These will vary with the tenant, as different tenants can have different applications and services at different times. Furthermore, the more information a tenant is willing to reveal about the applications and services running in its virtual machines, the more specific the security mechanisms that the cloud service provider can implement to detect service or application specific attacks. Though this provides a higher level of security, the tenant reveals more information to the provider and hence its privacy can potentially be reduced. Let us now consider how such fine-grained security can be enforced by the cloud service provider on the tenant virtual machines with the process validation (Pro_Val) module in the TSAD.
We will also discuss how different types of attacks can be detected with the Pro_Val module.

7 However, it may not always be easy for the victim to determine that it is being attacked (this depends on the type of attack) or where the attack is coming from (in order to identify the cloud system administrator who should be contacted).

Pro_Val is implemented in the anomaly detection part of TSAD and it monitors the resources used by each tenant virtual machine, such as memory, CPU, network and disk space, as well as the interactions of the tenant virtual machines. For example, Fig. 4 shows the security related processes of the Sophos security tool that are running in a Windows tenant virtual machine. Pro_Val checks (i) whether any of the required critical processes are not running in the tenant virtual machine and (ii) whether there is any suspicious process running in the tenant virtual machine. For example, attacks such as Conficker and Torpig disable security tools and critical services such as error reporting, automatic updates and Windows Defender. In addition, there can be malicious hidden processes running in the virtual machine. If Pro_Val observes any suspicious behaviour, then the tenant virtual machine is isolated and further analysis is performed to detect whether there is an ongoing zero-day attack.

Let us now consider the operation of the Pro_Val module. Pro_Val queries the tenant virtual machine to report the processes running in the machine. In addition to the report TVM_Rep in which the tenant virtual machine lists its running processes, Pro_Val itself obtains the list of processes that are actually running in the memory assigned to the tenant virtual machine. The report from the tenant virtual machine is untrusted, since the machine could have been compromised by an attacker and the report may omit the processes that are controlled by the attacker. However, the report obtained by Pro_Val is trusted, since it contains the list of all the processes that are actually running in the memory allocated to the tenant virtual machine. Since the attacker does not have access to the Pro_Val component, the attacker cannot alter the process list observed by the Pro_Val component. (We will mention later some attacks which can bypass the VMM itself.) Pro_Val first checks whether the processes related to the host based security tool (see Fig. 4) are running in the tenant virtual machine.
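The two checks that Pro_Val applies to the untrusted report and the trusted view can be sketched in Python as follows. This is only an illustration under simplifying assumptions: processes are identified by name, the function name and return labels are hypothetical, and the process names used in any call (for example "av_service") are placeholders rather than actual Sophos process names.

```python
def validate_processes(tvm_rep, trusted_view, required):
    """Classify one validation round.

    tvm_rep      -- process names reported by the tenant VM (untrusted)
    trusted_view -- process names observed from outside the VM (trusted)
    required     -- security tool processes that must be running
    """
    tvm_rep, trusted_view = set(tvm_rep), set(trusted_view)

    # Check (i): every required security process must appear in the trusted view.
    missing = set(required) - trusted_view
    if missing:
        return ("security_tool_disabled", sorted(missing))

    # Check (ii): any difference between the two lists is treated as suspicious,
    # for example a process that runs in memory but is absent from the VM's report.
    variation = trusted_view ^ tvm_rep
    if variation:
        return ("variation", sorted(variation))

    return ("ok", [])
```

For instance, a call with a trusted view that contains one process missing from the tenant report returns the label "variation" together with that process name.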
If the tenant virtual machine has been compromised by malware that disables the security tool, then the processes related to the security tool will not appear in the Pro_Val report. In such cases, the tenant virtual machine is considered to be compromised with malware. Hence attacks such as Conficker and Torpig, which disable security tools in the tenant virtual machine, are detected by Pro_Val. If the security related processes are found to be running in the Pro_Val report, then the processes listed in TVM_Rep are compared with the Pro_Val report. If there is any variation between the list of processes reported by the tenant machine and the list of processes observed by Pro_Val, then the tenant machine is considered to be compromised or infected with malware. For example, the attacker can use rootkits such as URK and AFT to compromise the tenant virtual machines and ensure that the attacker’s processes are not detected by the security tool in the tenant virtual machine. Such attacks are detected by Pro_Val.

The query by Pro_Val is carried out at certain specified times. Hence there is the possibility of a new malicious process being initiated and terminated between successive Pro_Val query times. To minimize the risk of false positives and false negatives, this query step is therefore repeated several times, especially if there is a variation in the number of processes. If the variation persists after repeated validations, then the tenant machine is considered to be compromised and the hidden processes are identified by comparing the process reports. On the other hand, if security related processes are running
VARADHARAJAN and TUPAKULA: SECURITY AS A SERVICE MODEL FOR CLOUD ENVIRONMENT
Fig. 5. TVM process validation.
in the tenant virtual machine and no hidden processes are detected, then the traffic is forwarded to the destination. In our architecture, we analyze not only the processes but also the files that have been accessed by these processes.

However, as mentioned above, there can be attacks that bypass VMM introspection. This can happen if rootkits are able to modify fields in the kernel data structures, thereby manipulating the views created by the VMM introspection [19]. In order to overcome these attacks, it is necessary to remove the ability to manipulate kernel data structures such as global descriptor tables (GDTs) and interrupt descriptor tables (IDTs). Alternatively, hardware based mechanisms can be used to detect all the processes that are actually running, including the rootkit processes. In the case of the x86 architecture, the CR3 register can be used to identify all the processes that are actually running, as it holds the address of the top-level page directory for the current context. Hence these kernel level attacks can be detected, as writes to CR3 are trapped [20].

Furthermore, though Pro_Val allows us to detect different types of attacks, it reveals information about the tenant applications to the cloud service provider. For example, Fig. 5 shows the output using VMM based introspection. From the process list, the cloud service provider can easily determine that the tenant is running a DNS server application in its virtual machine. Hence the privacy of the tenant is reduced. Also, as the number of security policies requested by the tenant increases, the overhead on the servers of the cloud service provider increases. The cloud service provider therefore has the ability to charge additional amounts to the tenant depending on the overhead cost of these policies.

E. Detection of Attacks

Insider Attack from Tenant Domain

In this section, we discuss how our architecture can be used by the tenants to deal with insider attacks from their users (tenant users in their domain). In our architecture, the tenant administrators can make use of the TSAD component to detect insider attacks. TSAD can be used to log the activity of the users on their systems. The logs can help to extract the user behaviour, to identify security policies that need to be enforced on the user, and also to analyse the attacks if a malicious insider succeeds in exploiting vulnerabilities in the tenant virtual machines. However, monitoring past user activity may not be effective against malicious insiders who have joined recently and who do not have much history. Hence TSAD detects the attacks by
monitoring the user activity and system state for suspicious behaviour. TSAD maintains logs of all activities of users and processes within the system. For example, the execve system call corresponds to the commands typed by the user. The user activity is extracted by filtering the execve system calls and relating the activity based on timestamps and events such as login and chdir. In addition, the system state is extracted by monitoring the applications or processes running in the system, the usage of resources by different processes, the privileges of users accessing the system, and the traffic that is generated.8 At any time, if the user activity or the system state is found to be suspicious, or if the traffic from the user system is found to match the attack signatures, then TSAD generates alerts to the tenant and cloud system administrators. For instance, TSAD is able to detect if a user installs an application that is not permitted (according to the tenant’s policies) or disables a mandatory application (such as a security tool). This may happen, for instance, if a program developer with admin privileges installs a Skype application that is not permitted by the tenant organisation’s policies. Our architecture is thus able to detect insider attacks by tenant users by monitoring user activities, system state and traffic originating from the system for suspicious behaviour.

Insider Attacks from Cloud Service Provider

In this paper, as mentioned earlier, we consider the cloud service provider to be a trusted entity. However, there can be cloud system administrators involved in the management of the cloud infrastructure performing a range of activities such as software development, support, testing, and system and network administration. These system administrators will need different levels of access to the resources in the cloud to perform their tasks.
However, the privileged domains in the current VMMs do not support fine-grained access control for the cloud administrators. An important principle in the design of secure systems is the notion of least privilege; that is, system administrators should have only those privileges that are needed for the tasks at hand. Accordingly, in our current system, we have developed a simple role based access control model for users of the cloud infrastructure from the cloud provider domain. The roles in our model are similar to the roles in traditional role based access control techniques, in that each role is a collection of privileges related to tasks in our system. In the current system, we have identified four types of cloud system administrators in different domains, namely Cloud Tenant Operator (CTO), Cloud Tenant Administrator (CTA), Cloud System Administrator (CSA) and Cloud Cluster Administrator (CCA). The roles CTO and CTA have privileges that interface with the administration of tenant virtual machines, whereas the roles CSA and CCA correspond to the more traditional roles in the cloud infrastructure, which are not directly related to the tenants. In this paper, we will treat the CSA role as a single role, though this role can be decomposed into several other roles.

8 Techniques such as the one described in [21] can be used to determine the user behaviour by analysing the commands typed by the users.

We will not address role decomposition in this
paper. The CTO role has access privileges related to the SPAD component in our architecture. The CTO has read level access to the alerts from the SPAD. In case of a SPAD policy violation, the CTO can stop the tenant virtual machine that is generating attack traffic with spoofed source addresses. CTOs require this level of access on the tenant virtual machines since the violation of SPAD policies can lead to attacks on the cloud provider infrastructure and on other tenant virtual machines. Also, all actions by the CTOs are logged and monitored by the CTAs.

In terms of the role hierarchy, CTAs inherit all the privileges of CTOs. The CTAs have access privileges on the TSAD component. At the time of registration, the tenants discuss their requirements with the CTAs. Since TSAD supports techniques such as process validation, the CTAs have access to fine-grained application layer information in the tenant virtual machines (except when the data is encrypted by the tenant, for example using homomorphic encryption). The CTA has privileges to specify TSAD policies in conjunction with the CSA, to read SPAD and TSAD alerts, and to respond to TSAD alerts.

The Cloud System Administrator (CSA) has the privilege to access SPAD and TSAD alerts and to respond to them (e.g. to rate limit or drop the traffic). The CSA role’s privileges also include the ability to configure TSAD policies and to manage them for consistency.

Such a role based model allows us to partition the privileges into various categories, thereby ensuring that only users in appropriate roles can perform administrative operations and access sensitive information related to tenants and the cloud infrastructure. This not only gives the benefits of a role based system, such as flexible security management and the ability to better handle dynamic changes in the user population, but also helps to make the system more accountable.
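The role hierarchy described above can be illustrated with a small Python sketch of a role based access check with inheritance and an audit log. The privilege names are hypothetical labels derived from the description above, the CCA role is omitted, and the sketch is not the access control code of our system.

```python
# Hypothetical privilege labels for the CTO, CTA and CSA roles.
ROLE_PRIVILEGES = {
    "CTO": {"read_spad_alerts", "stop_vm_on_spad_violation"},
    "CTA": {"specify_tsad_policies", "read_tsad_alerts", "respond_tsad_alerts"},
    "CSA": {
        "read_spad_alerts", "read_tsad_alerts",
        "respond_spad_alerts", "respond_tsad_alerts",
        "configure_tsad_policies",
    },
}

# Role hierarchy: a CTA inherits all the privileges of a CTO.
INHERITS = {"CTA": ["CTO"]}


def privileges(role):
    """Return the privileges of a role, including inherited ones."""
    privs = set(ROLE_PRIVILEGES.get(role, set()))
    for parent in INHERITS.get(role, []):
        privs |= privileges(parent)
    return privs


def is_permitted(role, action, audit_log):
    """Check an action against the role and record the decision for accountability."""
    allowed = action in privileges(role)
    audit_log.append((role, action, allowed))
    return allowed
```

Recording every decision, including denied requests, is one simple way to support the accountability property mentioned above.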
Denial of Service Attacks

Not only can a tenant machine be subjected to a denial of service attack, but a compromised tenant virtual machine can also be used by the attacker to generate further attack traffic. For instance, a tenant virtual machine can be used to flood victims with attack traffic such as ICMP, UDP and TCP SYN floods. Also, attackers can make use of the elastic nature of the cloud to allocate more resources to the compromised tenant virtual machines and use them in the generation of the attack traffic. This could lead to serious attacks in the Internet. Hence there is a need for techniques to minimise denial of service attacks originating from tenant virtual machines.

Our architecture counteracts denial of service attacks in the following manner. The default security policies in the SPAD minimize different types of denial of service attacks from compromised tenant virtual machines. For example, since SPAD validates all the tenant traffic for a correct source address, it helps to eliminate Smurf attacks and ICMP, UDP and TCP SYN attacks that are generated with spoofed source addresses. In a Smurf attack, the attacker can command the compromised tenant virtual machine to spoof the source address with that of the victim machine and send a flood of request
messages to a broadcast address. Since the source address of the attack traffic is spoofed with the address of the victim, all the machines that receive this request will send a reply message to the victim machine. The reply packets are thus generated with an amplification factor N, where N is the number of hosts present in the broadcast domain. This can result in a severe reply flood at the victim’s machine or network. Since SPAD eliminates any traffic with spoofed addresses, this attack is minimized.

Let us now consider how flooding with a correct source address can be minimized. In the case of ICMP and UDP traffic, the tenant specifies a maximum threshold (λ) for the ICMP and UDP traffic that can be generated from its tenant virtual machines. The tenant can request the cloud service provider to drop the traffic if the ICMP or UDP traffic exceeds the threshold and to raise an alarm to the tenant administrator. Such a policy in TSAD will be negotiated at the time of registration.

In the case of TCP SYN flooding attacks, the attacker exploits the weaknesses present in the three-way handshake process. The attacker floods the victim machine with SYN packets and does not respond with the final acknowledgement packets. This leads to half-open connections at the victim machine. The victim has a data structure in its memory describing all the half-open connections. If the attacker can cause this data structure to overflow with half-open connections, then the victim cannot accept any new incoming connections. In our architecture, a tenant virtual machine can be the victim or it can itself be the attacker. There are several scenarios in this attack, not all of which we will go through here. TCP SYN flooding is minimized by setting a threshold on the ratio of SYN packets to final acknowledgement packets ($\frac{\text{SYN}}{\text{Final ACK}} \geq \gamma$). The threshold is enforced only when the tenant virtual machine is not responding with the final acknowledgement after receiving the SYN-ACK packet.
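The two thresholds above can be illustrated with a minimal Python sketch. The function names, the use of packet counts per measurement window and the handling of a window without any final acknowledgement are assumptions made for illustration; the units of λ and the length of the window are not fixed by the description above.

```python
def icmp_udp_action(packets_in_window, lam):
    """Apply the tenant's ICMP/UDP threshold (lambda) to one measurement window."""
    # Traffic above the negotiated threshold is dropped and an alarm is raised.
    return "drop_and_alert" if packets_in_window > lam else "forward"


def syn_flood_suspected(syn_sent, final_acks_sent, gamma):
    """Check whether the SYN to final-ACK ratio reaches the threshold gamma.

    Assumption: a window with no final ACK is treated as if it had one,
    so that a handful of in-flight handshakes does not trigger the policy.
    """
    return syn_sent >= gamma * max(final_acks_sent, 1)
```

A virtual machine that completes nearly all of its handshakes keeps the ratio close to one and is therefore not affected, whereas one that sends many SYN packets without final acknowledgements quickly reaches the threshold.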
Because the threshold is enforced only in this case, it has no impact on the legitimate cases where the tenant virtual machine responds with final acknowledgements.

F. Extended Security Architecture

Let us now extend our basic security architecture for VMM platforms to the scenario shown in Fig. 2 with cloud clusters. The extended security architecture has an additional security component, namely the Cloud Cluster Security Gateway (CCSG), which is responsible for cluster-wide security policies and mechanisms (see Fig. 6). The main objectives of this extended security architecture are to provide mechanisms that can help to detect attacks on a VMM itself as well as attacks across multiple VMM platforms. Hence CCSG can be thought of as providing an additional layer of defense. The extended architecture introduces another role, namely the Cloud Cluster Administrator (CCA), who is responsible for managing security policies at the cluster level. CCSG is implemented as a virtual machine on the cluster controller and only the CCA has access to this privileged domain.

Consider now, for instance, a VMM attack that compromises the security policies in the VMM (in the SPAD and TSAD components). Such a compromise can lead to attackers creating subtle attacks that give more resources to a particular tenant and starve the other co-located tenants. CCSG has policies
and mechanisms to detect such attacks. We outline the CCSG policies in the following four categories.

In the category related to tenants, there is the policy on single sign-on for users of tenants. A tenant’s users may need to access services provided by several virtual machines in different VMM cloud platforms and clusters. In this context, it is efficient to perform authentication once at the cloud clusters domain, and this is achieved at the CCSG. Also, when a tenant requests a default virtual machine anywhere in a cluster, it is the policy in the CCSG that determines what services are offered in a default virtual machine configuration (such as a VM with Microsoft Windows or a VM with Linux/Unix) as well as how often these services need to have security updates.

Then there is the category of policies that relate to attacks happening over multiple VMM platforms in clusters. For instance, CCSG has policies on when migration of virtual machines can occur between different VMM platforms in the clusters. For migration, though the policy in the source TSAD (from where the migration is happening) and the policy in the destination TSAD (to where the migration occurs) need to be satisfied, there are also meta-policies in the CCSG (such as Chinese wall policies that ensure there is no conflict) which determine when migration can occur. CCSG also has an important role to play in the detection of attacks that rely on correlating traffic from multiple VMM cloud platforms. In our architecture, CCSG has policies related to logging traffic from multiple platforms and analyzing it to detect end to end attacks and denial of service attacks where each compromised machine may contribute only a small amount of attack traffic.

Then there is a category of policies that are related to overall Service Level Agreements (SLAs) with the tenants.
It is normal to expect such SLAs to have conditions that can only be observed at the cluster level in our architecture rather than at an individual platform level. These conditions vary with the services offered (such as DNS and web services), and they can be affected by attacks in any one of the platforms, but the policy violation is only detected at the CCSG level. For instance, attacks may increase the response times above the threshold specified in the SLA or cause the traffic level to fall below the minimum threshold specified in the SLA. Consider the situation where, as part of the SLA, a tenant has requested a response time of 1 s; if the observed response time exceeds this value, an appropriate alert can be triggered by the policies in the CCSG. Such a situation might arise due to an attacker compromising a VMM and starving certain co-located tenant virtual machines of resources.

The final category of policies in CCSG refers to meta-policies related to specific security policies already enforced in the VMM platform (by the SPAD and TSAD components). For instance, policies such as process validation in a tenant virtual machine clearly need to be specified in TSAD and enforced by TSAD on the VMM platform. However, CCSG has meta-policies on the frequency with which TSAD needs to perform process validation of its tenant virtual machines. Similarly, CCSG has meta-policies to detect traffic with an IP
Fig. 6. Implementation setup.
address not belonging to the cluster (just as SPAD in the VMM detects traffic with an IP address not belonging to the VMM platform).

IV. ANALYSIS

In this section, we discuss the implementation and analysis of our security architecture. Section A presents the implementation setup. Section B describes the implementation of our security architecture. Section C discusses performance evaluation results.

A. Implementation Setup

We have used the open source Xen hypervisor to implement our architecture. However, it is to be noted that our security architecture can be implemented using other VMM based systems such as VMware or Hyper-V. Fig. 6 shows the basic implementation of our security architecture at a single VMM platform level using the Xen hypervisor. A tenant hosts its services on virtual machines that are running on the Xen hypervisor, which belongs to the cloud provider. We have used different subnets for the cloud provider network, the tenant domain, the tenant users (who are the customers of the tenant), and the attack domain. The tenant admin manages the tenant virtual machines that are hosted in the cloud. The RIP protocol is used between the routers R1 and R2. The SPAD and TSAD functionalities are implemented in the privileged domain Dom0. The attacking sources in the attack domain generate the different types of attack traffic on the tenant virtual machine that we have experimented with.

First we present an overview of the aspects of Xen that are relevant for our architecture and then discuss the implementation of the components of our security architecture. We then discuss how our architecture deals with the different attack scenarios described above in Section III.

B. Implementation of Security Architecture

In Xen, the privileged VM Dom0 [11] is used for hosting and managing the guest virtual machines. In our
Fig. 7. Attacks with spoofed traffic from tenant virtual machine.
implementation, we are using two guest virtual machines based on Linux and one based on Windows running on top of Xen hypervisor version 3.1.2. We have extended Dom0 to provide the functionalities of our architecture. To minimize the code in the hypervisor, the designers of Xen have opted to include the device drivers in the privileged domain. Dom0 manages the actual device and exports a generic class of device driver interface to the guest virtual machines. The front end drivers are installed in the virtual machines and the back end driver is installed in Dom0, which has privileged access to the physical resources. The back end driver is also responsible for ensuring fair access to and usage of the devices by multiple guest virtual machines. XenStore is a database for sharing configuration information and is used as a mechanism for controlling devices in guest domains. The Xend daemon is a special process which runs as root in Dom0 and is responsible for providing an administrative interface to the hypervisor, allowing the Cloud Tenant Administrator (CTA) to define security policies for virtual machines.

Our implementation uses Dom0 as the Node Controller for all administrative functions of the cloud provider. When a new instance is created, Xen assigns its preconfigured network interfaces to the instance. Every interface has a uniform name consisting of a domain ID and an interface ID. For example, vif1.2 is the network interface with interface ID 2 in Domain 1. An outgoing packet from a DomU passes through its sending network interface, enters the ebtables filter, then the virtual bridge “virbr0” and the iptables filter, and finally reaches the physical network interface connected to an external network, peth0. Our architectural components SPAD and TSAD are placed between the front end drivers of the tenant virtual machines and the back end driver of Dom0. Let us now consider the implementation of certain specific mechanisms in SPAD and TSAD to illustrate how they deal with the attack scenarios.
Let us first consider the implementation of the mechanism in SPAD that detects spoofed traffic from the tenant virtual machine. The SPAD module captures network packets from the kernel using iptables in connection with the libipq library (with the ip_queue kernel module) and validates the source address of the traffic. However, one issue was that since packets from tenant VMs enter the iptables filter after passing “virbr0”, information about the original sending interface is lost. We have used ebtables to encode this information in the packet mark. This is done by patching the Xen script that launches VMs. When a
Fig. 8. Process validation.
VM is launched, the patched Xen script adds an ebtables rule, which adds the information to every packet from the launched VM. SPAD then decodes the information and is able to determine the sending VM reliably, regardless of the packet content (which may be spoofed). We have used scapy for generating the attack traffic with spoofed source addresses. Fig. 7 shows the alerts when attack traffic with a spoofed source address is detected by the SPAD component.

Let us now consider the implementation of the mechanisms in the Pro_Val module in TSAD that detect rootkit attacks via process validation. Process validation is designed as a client-server application to validate the runtime state of DomU from Dom0. The server daemon runs in DomU and presents a list of processes in DomU on request. The list of processes is obtained by executing the Linux utility ps. The client program runs in Dom0. It can query the server daemon for the processes in DomU and also extract the list of processes that are actually running in the memory allocated to DomU. The trusted view of DomU processes is obtained by traversing the kernel memory of DomU with XenAccess [22]. The query to the server daemon is realized via XenStore shared memory between Dom0 and DomU. XenAccess is a virtual machine introspection library for the Xen hypervisor. It allows a user in Dom0 to view the runtime state of DomU. The memory access allows arbitrary memory pages of DomU to be mapped into the user space of Dom0. The current implementation of XenAccess supports memory access and disk monitoring. We have extended this implementation for network based monitoring.

Fig. 8 shows different cases of process validation by Pro_Val. The first run shows the result for a legitimate scenario, where no hidden processes are detected in the tenant virtual machine with Linux OS. The second run shows the result where
Fig. 9. Process validation time.
Fig. 10. File transfer. (Series: Without, SPAD, SPAD+TSAD, SPAD+TSAD+PV.)
Fig. 11. TSAD policy updates.
the tenant virtual machine was infected with the URK rootkit and two hidden processes, with process IDs 2224 and 2226, are detected by the Pro_Val component. In this case, the traffic generated by the tenant VM is dropped and the tenant virtual machine is isolated for further analysis. In the third case, we considered the situation where the tenant has registered for additional security services from the cloud service provider and has also requested pre-monitoring of the incoming traffic to the tenant virtual machine, which is hosting a web server. In this case, the tenant has also provided attack signatures for its virtual machine that is running the web server. We have used signatures from Snort IDS in this scenario. The tenant’s customer machine (see Fig. 6) is used for legitimate access to the web server on the tenant virtual machine. The attacking sources then generate different types of attacks on the tenant virtual machine. Since the tenant has registered for additional security services, the incoming traffic to the tenant web server is pre-monitored for attacks by the TSAD. All traffic that matches the attack signatures is dropped by TSAD. Note that there are no false alarms when SPAD decides to drop spoofed traffic. However, false negatives and false positives are possible with the security policies in TSAD, which are negotiated with the tenants.
C. Performance Evaluation
In building and evaluating this system, we have carried out many sets of experiments with different types of
malware attacks. In this paper, we present performance results from some of these experiments as well as from experiments using the SPECjvm benchmarks [23]. Fig. 9 shows the average response time over 10 runs of the Pro_Val module for a varying number of tenant virtual machines. It shows the time taken to validate the processes in the tenant virtual machine. Pro_Val ensures that the processes related to the host based security tool are running in the tenant virtual machine and also detects hidden processes. The response time increases with the number of tenant virtual machines hosted on the VMM. The response is faster when all tenant virtual machines have the same operating system (and version) than when they have different operating systems. Furthermore, process validation is performed only once for each flow for the tenants who have requested additional security services.
Fig. 10 shows the average transfer time over 10 runs for different file sizes from the tenant virtual machine to tenant customers, without our architecture and with different components of our architecture. There is negligible delay with SPAD and a minor overhead with the inclusion of the TSAD and process validation (PV) components. We have used a modified version of a Sophos based tool for TSAD, with a detection engine that can detect up to 3483207 types of attacks. The attacks detected include denial of service attacks (e.g. TFN, Trin00 and LOIC), privilege escalation attacks, browser buffer overflow attacks, and worms (e.g. ADMw0rm). The detection component also checks for signature updates every 10 minutes. A total of 104 file types for different applications, such as rar, zip, doc, docx, and html, were monitored for intrusions. We have also monitored for anomalous behaviour, such as ICMP packet rates exceeding a limit and TCP SYN floods. In this case, process validation is performed once for each flow. Comparing Figs. 9 and 10, one can see that the overhead due to SPAD, TSAD and process validation appears negligible compared to the time taken by the file transfer itself. The reason is that the network delay dominates the processing delay introduced by the security components. Furthermore, process validation is performed only once for each file transfer.
Fig. 11 shows the average response time over 10 runs for updating the policies in the TSAD from the CCSG for a varying number of tenant virtual machines hosted on the VMM. The policy consists of the default security policy, 50 attack signatures, 3 statistical parameters for anomaly detection and the time interval
Fig. 12. SPEC benchmark results.
for process validation of the TVM. The response is slower as the number of tenant virtual machines increases. The SPAD component remains the same for all the tenants, whereas the TSAD component varies for each tenant.
We have used the SPECjvm 2008 benchmarks to validate the performance of our architecture. The SPECjvm 2008 benchmarks consist of 38 applications grouped into 11 categories. We have also experimented with specific applications in the different categories. The benchmarks use different types of workloads with well known applications such as compression, compiler, scientific, serial and XML for validating the performance. If a category has multiple benchmarks, then the result for that category is calculated as the geometric mean of its benchmarks. The composite score is the geometric mean of all the benchmarks. Higher scores represent better performance. In the setup of Fig. 6, the SPECjvm benchmarks were installed on Dom0 of Xen.
Fig. 12 shows the benchmark results obtained on a system with an Intel i7 2.2 GHz processor, 6M cache and 8 GB RAM, with the Xen 3.1.2 VMM and CentOS 5.1 as the host operating system. The virtual machines were running Windows XP SP2 and Linux operating systems with 512 MB RAM. In Fig. 12, the first series (“Without” in blue) shows the results for the machine with the Xen hypervisor, Dom0 and three tenant virtual machines. Different types of applications such as Yahoo messenger, Quick Time Real Player, Apache Services, Skype Application, and auto refresh websites such as news and cricinfo.com were running on the virtual machines. The applications generate different types of traffic at regular intervals, for example, to maintain the connection status and to update the news. The composite result in this case is 45.13 operations per minute (ops/m) without invoking any security components of our architecture and without any performance tuning. The next series (S) shows the results with a similar setup to the base run but with the SPAD component.
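The geometric-mean scoring described above can be illustrated with a minimal Python sketch. The category names and scores are hypothetical placeholders, not measured SPECjvm 2008 values, and the sketch assumes that the composite is taken over the per-category results.

```python
import math


def geometric_mean(values):
    """Return the geometric mean of a non-empty list of positive numbers."""
    if not values:
        raise ValueError("at least one score is required")
    if any(v <= 0 for v in values):
        raise ValueError("scores must be positive")
    # Averaging logarithms avoids overflow when many scores are multiplied.
    return math.exp(sum(math.log(v) for v in values) / len(values))


def composite_score(categories):
    """Combine per-benchmark scores into category results and a composite.

    `categories` maps a category name to the list of scores (ops/m) of its
    benchmarks. A category with several benchmarks is reduced to their
    geometric mean; the composite is the geometric mean of those results.
    """
    per_category = {
        name: geometric_mean(scores) for name, scores in categories.items()
    }
    return geometric_mean(list(per_category.values())), per_category


if __name__ == "__main__":
    # Hypothetical scores, for illustration only.
    example = {
        "crypto": [40.0, 50.0, 45.0],
        "compress": [60.0],
        "xml": [30.0, 35.0],
    }
    composite, per_category = composite_score(example)
    for name, score in per_category.items():
        print(f"{name}: {score:.2f} ops/m")
    print(f"composite: {composite:.2f} ops/m")
```

Because the geometric mean multiplies ratios rather than adding differences, a slowdown in one benchmark affects the composite in proportion to its relative size, independent of that benchmark's absolute score.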
In the SPAD series (S), tenant virtual machine traffic is monitored for spoofed source addresses and logged before being forwarded to the destination. The composite result in this case is 44.96 ops/m without any performance tuning. The overhead in this case is negligible. Hence, SPAD can be included as a standard package even if the customers do not opt for any additional security services. The third series (S+T) shows the results with the SPAD and TSAD components with signature and anomaly detection. As
mentioned earlier, we have used a modified version of a Sophos based tool with a detection engine that can detect up to 3483207 types of attacks and can also monitor anomalous behaviour in ICMP and TCP SYN traffic. The composite result in this case is 44.21 ops/m. Compared to the Series 1 result of 45.13 ops/m, the overhead in this case is of the order of 2%. The final series (S+T+PV) shows the results of operating our architecture with SPAD, TSAD and process validation performed every 10 seconds. The validation covers rootkits such as URK and AFT. The composite result in this case is 43.81 ops/m. Hence, compared to Series 1, the overhead in this case is of the order of 3%. However, the overhead can vary for different types of attacks. Furthermore, the overhead can be changed by modifying the time interval for validation of the processes in the tenant virtual machines. The performance figures will also vary with the speed of the hardware.
We have also carried out experiments with specific applications such as startup.helloworld, startup.crypto.aes, crypto.aes, compress.zip, mpegaudio, startup.mpegaudio and xml. In all our experiments with different types of attacks, we found the overhead to be always less than 6%. We also observed that the variation in the composite result due to the daily addition of attack signatures is negligible. Hence the cloud service provider can charge the tenants on a threshold basis (instead of per signature) as well as according to the interval for process validation. For example, the premium can be increased for every, say, 1,000 attack signatures and as the interval between process validations decreases. Furthermore, the tenants can use TSAD as an additional layer of security for their services. Hence our architecture provides benefits to both the cloud service provider and the tenants.
V. RELATED WORK
During the course of the description of the security architecture, we have already referenced many related works. In this section we consider additional related works and compare them with our architecture.
CloudVisor [24] uses nested virtualisation to deal with the compromise of the hypervisor. In this technique a secure hypervisor is introduced below the traditional hypervisor, and the interactions between the traditional VMM and the virtual machines are monitored by the secure hypervisor. However, since resource management is still performed by the traditional VMM, a compromise of the VMM can impact the operation of the virtual machines. In contrast to CloudVisor, the main focus of our work is on securing the network interactions of tenant virtual machines. The technique proposed in [25] allocates a separate privileged domain for each tenant. The tenants can use this domain for the enforcement of VMM based security on their virtual machines. However, the model can become complex as different tenant virtual machines can be hosted on the same physical server. Furthermore, such models cannot deal with the case of malicious tenants that misuse the cloud resources to generate attacks on other hosts. Our architecture considers the case of malicious cloud administrators and malicious tenants.
There have also been some prior works addressing privacy related issues in the cloud. Butt et al. [26] proposed self service
cloud, which splits the privileged domain into a system wide domain (Sdom0) and privileged client domains. Each tenant has its own privileged domain for the enforcement of security policies on its virtual machines. However, since several tenant virtual machines can be hosted on the same physical server, a separate client administrative domain has to be created for each tenant. This makes the model considerably complex. Furthermore, an attacker who has control of Sdom0 can cause resource starvation for tenant virtual machines. In our architecture, we enforce different levels of access for the cloud administrators using role based access control.
The techniques proposed in [27] consider making the cloud services scale with dynamic changes in the runtime environment. In the proposed architecture, the cloud service provider monitors the load (active connections) on the tenant web server and dynamically varies the number of virtual machines allocated to the tenant. Attacks can lead to an increase of load on the tenant virtual machines. Our architecture is able to identify an increase in load caused by attack traffic.
There have also been some prior works which make use of the cloud for securing traditional systems and networks. For instance, Beaty et al. [28] proposed network based access control for different cloud deployments. In this technique, the administrators from different cloud deployments report their requirements to a Cloud Access Manager. Our model, however, also detects system level attacks such as rootkits.
There has been considerable research interest [29]–[32] in developing security techniques to deal with spoofing attacks. The ingress filtering technique [29] has been proposed to validate the source address of IP packets. However, such techniques must be universally deployed to be effective. Some authors have suggested traceback through packet marking [30], logging [31], overlay networks [31] and the pushback technique [32].
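To illustrate the idea behind ingress filtering [29], the following minimal Python sketch checks whether a packet's source address lies within the prefix assigned to the interface on which it arrived. The interface names and prefixes are hypothetical, and the sketch is not the filtering logic of any particular router or of the SPAD component.

```python
import ipaddress

# Hypothetical assignment of customer prefixes to edge-router interfaces.
ASSIGNED_PREFIXES = {
    "if-customer-a": ipaddress.ip_network("192.0.2.0/24"),
    "if-customer-b": ipaddress.ip_network("198.51.100.0/24"),
}


def ingress_permits(interface, source_ip):
    """Return True if source_ip belongs to the prefix assigned to interface.

    Packets arriving on an unknown interface are rejected.
    """
    prefix = ASSIGNED_PREFIXES.get(interface)
    if prefix is None:
        return False
    return ipaddress.ip_address(source_ip) in prefix


if __name__ == "__main__":
    # Legitimate source address inside the assigned prefix.
    print(ingress_permits("if-customer-a", "192.0.2.10"))
    # Source address spoofed from outside the assigned prefix.
    print(ingress_permits("if-customer-a", "203.0.113.5"))
    # Source address forged within the assigned prefix.
    print(ingress_permits("if-customer-a", "192.0.2.99"))
```

The check validates only the prefix, so a source address forged within the assigned prefix still passes.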
Although several filtering and traceback techniques have been proposed, they can only detect the approximate domain from which the spoofed traffic originates. Furthermore, ingress filtering is not effective if the compromised host spoofs its source address with a valid address within the domain. In contrast, our architecture can efficiently identify the actual attacking source (or tenant virtual machine) that is generating the attack traffic with a spoofed source address.
Currently there is significant interest in developing security tools based on virtualization technology [6], [33], [34]. Dunlap et al. [33] proposed the ReVirt architecture for secure logging by placing the logging tool inside the VMM. ReVirt logs detailed information such as the real time clock, keyboard and mouse events, user inputs and system calls, which enables the administrator to replay the execution of the virtual machine. Garfinkel [6] proposed the Livewire intrusion detection system, which makes use of the virtual machine monitor to analyse the state of virtual machines and detect attacks. Lycosid [34] detects hidden processes in the virtual machines by comparing the implicit guest view with the VMM image. However, attacks can also be generated by non-hidden processes. Moreover, as already discussed, virtualization techniques cannot be directly applied to the cloud environment due to the semantic gap problem. As the semantic gap increases, the number of false alarms increases. This is a major issue as the cloud service provider is not aware of the applications running in
the tenant virtual machines, and privacy requirements prevent the cloud service provider from using introspection techniques without the consent of the tenants. In our architecture, the cloud provider does not need information about the operating system or applications in the tenant virtual machine in order to enforce the basic security policies using SPAD. Also, there are no false alarms with the security policies in SPAD. The security policies in TSAD are enforced only with the consent of the tenants, and hence the cloud service providers are not solely responsible for false alarms due to the security policies in the TSAD. Also, in the case of the cloud, the virtual machines belong to the tenants and the VMM belongs to the cloud service provider. Hence there is a need to justify the use of VMM based security techniques in the cloud. We have provided a strong justification for using our architecture in practice and shown how our security as a service model offers advantages to the cloud provider, tenants and tenant customers.
VI. CONCLUSION
In this paper we have proposed a security architecture that provides a security as a service model that a cloud provider can offer to its multiple tenants and to the customers of its tenants. Our security as a service model, while offering baseline security to the provider to protect its own cloud infrastructure, also provides flexibility to tenants to have additional security functionalities that suit their security requirements. The paper described the design of the security architecture and discussed how different types of attacks are counteracted by the proposed architecture. We have described the implementation of the security architecture and given a detailed analysis of the security mechanisms and performance evaluation results.
APPENDIX
Policy Specification Examples: Several policy languages have been proposed over the years.
We have chosen XACML [36] as it has the necessary constructs to specify these types of policies, even though it is generally regarded as an access control policy language. It is also an international standard, and XML based schemas allow verification of the structure of the policy file; we have also developed an evaluation engine. Furthermore, the Intrusion Detection Message Exchange Format (IDMEF) [37], which defines data formats and exchange procedures for sharing information of interest to intrusion detection and response systems, also makes use of XML. Hence we make use of XACML, since the tenants and cloud service providers may be using different security tools for the enforcement of the security policies. Below we give examples showing the use of such a language; Fig. 13 shows an example of a security policy specification, whereas Fig. 14 shows how information such as alerts can be exchanged between different security agents implemented in different physical servers. Fig. 13 shows a sample specification of tenant policies in XACML [36]. The Tenant-ID is used for identifying the tenants and retrieving the security registration details of the tenant. SPAD security policies are enforced by default for all the tenants. TSAD is optional, and the tenants can choose one or more security enforcement options: signature based detection, anomaly based detection and process validation. Fig. 13 shows
Fig. 13. Tenant policy specification in XACML.
the case of signature based detection of the Slammer attack and an anomaly based threshold of 3 messages per second for ICMP traffic. Fig. 14 shows the specification of SPAD policies, which raise alarms to the cloud system administrator. The alert message ID refers to the attack. The analyser identifies the cluster ID and the VMM within the cluster. IDMEF [37] defines 12 categories (such as DNS for Domain Name System and NT for Windows domain). In this case, the cloud provider does not have to be aware of the services in the tenant virtual machine. Hence we use category 0, which refers to “Domain unknown or not relevant.” The name identifies the specific sensor, “SPAD,” that detected the attack.
REFERENCES
[1] L. Youseff, M. Butrico, and D. Da Silva, “Towards a unified ontology of cloud computing,” in Proc. 2008 Grid Computing Environments Workshop.
[2] Amazon Inc., “Amazon elastic compute cloud (Amazon EC2),” 2011. Available: http://aws.amazon.com/ec2/
[3] “Windows Azure.” Available: http://www.windowsazure.com/en-us/
[4] J. E. Smith and R. Nair, “The architecture of virtual machines,” IEEE Internet Comput., May 2005.
[5] “AWS security center.” Available: http://aws.amazon.com/security/
Fig. 14. Example of SPAD policy specifications.
[6] T. Garfinkel and M. Rosenblum, “A virtual machine introspection based architecture for intrusion detection,” in Proc. 2003 Netw. Distrib. Syst. Security Symp.
[7] “VM escape.” Available: http://www.zdnet.com/blog/security/us-cert-warns-of-guest-to-host-vm-escape-vulnerability/12471
[8] “Xen security advisory 19 (CVE-2012-4411)–guest administrator can access QEMU monitor console.” Available: http://lists.xen.org/archives/html/xen-announce/2012-09/msg00008.html
[9] V. Varadarajan, et al., “Resource-freeing attacks: improve your cloud performance (at your neighbor’s expense),” in Proc. 2012 ACM Comput. Commun. Security Conf.
[10] J. Somorovsky, et al., “All your clouds belong to us: security analysis of cloud management interfaces,” in Proc. 2011 ACM Comput. Commun. Security Conf.
[11] P. Barham, et al., “Xen and the art of virtualization,” in Proc. 2003 ACM Symp. Operating Syst. Principles.
[12] Y. Zhang, et al., “Cross-VM side channels and their use to extract private keys,” in Proc. 2012 ACM Comput. Commun. Security Conf.
[13] J. Idziorek, M. F. Tannian, and D. Jacobson, “The insecurity of cloud utility models,” IEEE Cloud Comput., pp. 14–18, May–June 2013.
[14] R. Beverly, R. Koga, and K. C. Claffy, “Initial longitudinal analysis of IP source spoofing capability on the Internet,” July 2013. Available: http://www.internetsociety.org/doc/initial-longitudinal-analysis-ipsource-spoofing-capability-internet
[15] B. Balacheff, et al., Trusted Computing Platforms: TCPA Technology in Context. Hewlett-Packard Books, 2003.
[16] H. Takabi, J. B. D. Joshi, and G. J. Ahn, “Security and privacy challenges in cloud computing environments,” IEEE Security Privacy, vol. 8, no. 6, Nov.–Dec. 2010.
[17] S. M. Habib, V. Varadharajan, and M. Muhlhauser, “A framework for evaluating trust of service providers in cloud marketplaces,” in Proc. 2013 ACM Symp. Applied Comput.
[18] C. Gentry, “Fully homomorphic encryption using ideal lattices,” in Proc. 2009 ACM Symp. Theory Comput.
[19] S. Bahram, et al., “DKSM: subverting virtual machine introspection for fun and profit,” in Proc. 2010 IEEE Symp. Reliable Distrib. Syst.
[20] J. Pfoh, C. Schneider, and C. Eckert, “Exploiting the x86 architecture to derive virtual machine state information,” in Proc. 2010 Int. Conf. Emerging Security Inf., Syst. Technol.
[21] J. Ryan and M. J. Lin, “Intrusion detection with neural networks,” in Proc. 1998 Advances Neural Inf. Process. Syst.
[22] “XenAccess library.” Available: http://code.google.com/p/xenaccess/
[23] “Standard performance evaluation corporation.” Available: http://www.spec.org/download.html
[24] F. Zhang, et al., “CloudVisor: retrofitting protection of virtual machines in multi-tenant cloud with nested virtualization,” in Proc. 2011 Symp. Operating Syst. Principles.
[25] C. Yu, et al., “Protecting the security and privacy of the virtual machine through privilege separation,” in Proc. 2013 Int. Conf. Comput. Sci. Electron. Eng.
[26] S. Butt, et al., “Self-service cloud computing,” in Proc. 2012 ACM Comput. Commun. Security Conf.
[27] T. C. Chieu, et al., “Dynamic scaling of web applications in a virtualized cloud computing environment,” in Proc. 2009 IEEE Int. Conf. e-Business Eng.
[28] K. Beaty, et al., “Network-level access control management for the cloud,” in Proc. 2013 IEEE Int. Conf. Cloud Eng.
[29] P. Ferguson and D. Senie, Network Ingress Filtering: Defeating Denial of Service Attacks Which Employ IP Source Address Spoofing, RFC 2267, Jan. 1998.
[30] S. Savage, D. Wetherall, A. Karlin, and T. Anderson, “Network support for IP traceback,” ACM/IEEE Trans. Netw., vol. 9, no. 3, pp. 226–237, June 2001.
[31] R. Stone, “CenterTrack: an IP overlay network for tracking DoS floods,” in Proc. 2000 Usenix Security Symp.
[32] R. Mahajan, et al., “Controlling high bandwidth aggregates in the network,” ACM Comput. Commun. Rev., vol. 32, no. 3, pp. 62–73, July 2002.
[33] G. W. Dunlap, et al., “ReVirt: enabling intrusion analysis through virtual-machine logging and replay,” in Proc. 2002 Operating Syst. Des. Implementation.
[34] S. T. Jones, et al., “VMM-based hidden process detection and identification using lycosid,” in Proc. 2008 ACM Virtual Execution Environments.
[35] L. Zhou, V. Varadharajan, and M. Hitchens, “Enforcing role-based access control for secure data storage in the cloud,” Comput. J., vol. 54, no. 10, pp. 1675–1687, 2011.
[36] eXtensible Access Control Markup Language (XACML), Version 3.0, OASIS Standard, Jan. 22, 2013.
[37] H. Debar, D. Curry, and B. Feinstein, The Intrusion Detection Message Exchange Format, RFC 4765, Mar. 2007.
Vijay Varadharajan is the Microsoft Chair Professor at Macquarie University. He is also the Director of the Advanced Cyber Security Research Centre. Vijay has published more than 300 papers in international journals and conferences. Vijay has been/is on the Editorial Board of several journals including ACM TISSEC, IEEE TDSC, IEEE TIFS, and IEEE TCC.
Udaya Tupakula is a Research Fellow at the Advanced Cyber Security Research Centre at Macquarie University. In 2006, he completed his Ph.D. under the supervision of Prof. Varadharajan of Macquarie University. Uday has 50 publications in different research areas such as network security, denial of service attacks, MANET security, and secure virtual systems.