Implementation of a Cloud Computing Framework for Cloud Forensics

Alecsandru Pătrașcu 1,2 and Victor Valeriu Patriciu 3
1,3 Military Technical Academy, Computer Science Department, Bucharest, Romania
2 Advanced Technologies Institute, Bucharest, Romania
Email: 1 [email protected], 2 [email protected], 3 [email protected]
Abstract—Cloud computing technologies occupy an important place in today's digital environment, as they offer users attractive benefits such as file storage and the renting of virtual machines. In this context, at the datacenter level, we need a unified framework that permits reliable virtual and physical resource management while at the same time giving digital forensic investigators the possibility to access the data. Even though ready-to-use solutions already exist for the cloud resource allocation part, we lack the software infrastructure needed for conducting a digital forensic investigation. In this paper we present the foundation of our proposed cloud forensics framework, which aims to resolve and unify these two issues into a single entity. We present in detail the architecture and the modifications that need to be made in order to create a digital-forensics-compliant framework.
Keywords—cloud computing; data forensics; basic cloud framework.
I. INTRODUCTION
There are currently many debates regarding the concepts of "Cloud Computing" and "Cloud Computing Forensics", and most of them address data ownership from the end user's perspective. But what does a user want to receive from a Cloud Service Provider (CSP)? What information can be kept for forensic purposes? To answer these questions we can follow two main directions. On one side we find the users of Cloud services offered under a pay-per-use model. They do not have to buy expensive hardware or software licenses; they simply rent the software and pay only for the amount of processing used by the datacenter to solve their problem. On the other side we find the CSPs, which must upgrade their datacenter infrastructure in order to benefit from all the available computing power. For example, they can make efficient use of processor power by spreading computation across nodes over certain periods of the day. This is one of the main features that Cloud Computing promotes - efficient usage of resources for both users and CSPs. In classic computer forensics, the purpose is to search, preserve and analyze information on computer systems in order to find potential evidence for a trial. In cloud environments the entire paradigm changes, because we do not have access to a physical computer, and even if we do, there is a great chance that the data stored on it is encrypted or split across multiple other computer systems.
As a result of the issues presented, we have developed a solution for this problem. Our system can help datacenter owners, administrators, cloud computing providers and cloud forensic investigators to manage more easily the kinds of workloads involved in this new way of computing, while also maintaining a high level of security. In this way, the end user benefits from a robust solution that automatically scales vertically and horizontally according to its computational and analysis needs.
The paper is structured as follows. In section 2 we present some of the related work in this field that is linked to our topic, and in sections 3 and 4 we present the problems that our framework must solve in order to be efficient and secure for our forensic needs. In section 5 we present our architecture in detail, based on our previous work, and in section 6 we propose the modifications that must be made to a datacenter in order to support it from a hardware point of view. Section 7 contains some implementation details and some metrics regarding the response time of our system. Finally, in section 8 we conclude the paper.
II. RELATED WORK
Cloud computing is an active research area in which we can find a lot of work regarding the problem of efficiently organizing hardware in order to make it possible to run multiple virtual machines. In [2] Tayal discusses Cloud Computing scheduling techniques. He considers that task-based scheduling can be used successfully in such distributed environments because it improves the flexibility and reliability of the entire structure. More precisely, tasks are split evenly across processing nodes, taking into consideration the hardware platform specific to each of them. In [3] Buyya et al. discuss the mixture of High Performance Computing (HPC) applications and Cloud Computing applications in modern software. This can help the development of new technologies and services offered to end users by enabling geographically aware deployments. Nevertheless, as the authors state, we must pay great attention to the processing power needed for such infrastructures and we must use a proper scheduler in order to balance computing across the nodes of the datacenter.
Also, Brandic et al. [4] discuss what Cloud Computing means for end users and present a model for a Cloud-enabled architecture that uses the power of Virtual Machines (VMs). They also present different scheduling strategies and argue for the need for a Service Level Agreement (SLA) for the offered services.
Although there are currently some drawbacks, cloud computing is interesting in the area of conducting a forensic investigation over a cloud environment. Papers such as [11], [12] and [13] give a brief introduction to the topic, but without any practical implementations. In the future, companies could offer this as a Forensic-Investigation-Environment-as-a-Service, with on-demand resource allocation for such problems.

III. RESOURCE MANAGEMENT

While Cloud Computing provides many new and modern features, it still has some deficiencies, such as the relatively high operating cost of private and public Clouds. The emergent field of "Green Computing" is becoming more and more important nowadays, when we have limited energy resources and an increasing demand for computational power. In this context, the key is the productive use of Cloud Computing technologies. This can be achieved through proper management of the resources found in the datacenter; furthermore, we must use techniques similar to those currently used in regular computer networks in order to achieve large scale adoption. Today we can find many dispersed Cloud infrastructures, but in the future, as the technologies develop, this segregation will disappear and we will have a seamless merge and integration between them. In order to cope with these needs, our framework has been designed and implemented from the ground up with the integration of different Cloud Computing providers in mind. The first steps towards a large scale distributed platform were made with the implementation of a distributed spam scanning infrastructure that had the capability of auto-scaling the number of workers according to the system load [5].

IV. SECURITY

Cloud Computing security represents a new and evolving sub-domain of information security. As we presented in our previous work [6], it must contain proper rules for data and application protection by making use of the available Cloud infrastructure. In order to ensure data security, meaning that data cannot be accessed by unauthorized users or simply lost, Cloud providers must address the following areas: data protection and reliability, identity management, physical and personnel security, availability, application security, privacy and legal issues.
The framework is also designed with this security need in mind. During development we tried to consider the current evolution of Cloud Computing and Cloud providers towards new directions that have emerged over time, namely information security, trust management for remote servers and information privacy. Each of them is thoroughly presented in [6].
Figure 1: System Architecture
Figure 2: Presentation tier
They are currently in active development and will be thoroughly investigated in our future development plan, because they require the involvement of a team with skills in multiple disciplines, such as advanced mathematics and number theory.
V. ARCHITECTURE
The framework consists of a set of parts designed to work together. The application is composed of seven modules, as can be seen in Figure 1. The "GUI" module is placed outside the framework because the goal of this paper is the detailed presentation of the framework itself. We will present each module individually, detailing a series of essential aspects: the main data structures used, the design patterns and the module's role. These modules are: Frontend, User Manager, Lease Manager, Scheduler, Hypervisor Manager, Monitor and Database Layer. In our current implementation all the modules are written in server-side JavaScript (JS).
The software was designed with a combined architectural pattern in mind, bringing benefits from both the "Client-Server" model and the "Distributed Computing" model. The Client-Server model acts as a distributed application which splits the computational tasks between resource or service providers, called servers, and service requesters, called clients, that communicate over a computer network, each on separate or shared hardware. A distributed system consists of multiple autonomous computers that also communicate through a computer network. In our framework, the Distributed Computing model has been used because the system is composed of a series of software components that run on different machines and communicate through the network in order to supply the answer to the user in a short time.
The actual implementation uses a multi-tier architecture, more exactly a three-tier architecture containing the presentation tier (Figure 2), the application tier (Figure 3) and the data tier (Figure 4). In our case the presentation, the application processing and the data management are logically separate processes. This kind of architecture makes it possible to use a clear model in order to create a dynamic and reusable application. By breaking up a system into tiers, we only have to modify or add a specific layer, rather than rewrite the entire application from scratch.
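As an illustration of this tier separation, the following minimal Node.js sketch (not the framework's actual code) shows a presentation-tier process forwarding a request over the network to an application-tier service; the host name and ports are assumptions made for the example.

```javascript
// Minimal sketch of the tier separation described above (illustrative only).
// The application-tier host and the ports are assumptions, not the real deployment.
const http = require('http');

const APP_TIER_HOST = 'app-tier.local'; // hypothetical application-tier host
const APP_TIER_PORT = 9000;             // hypothetical application-tier port

// Presentation tier: accepts client requests and forwards them to the application tier.
http.createServer((clientReq, clientRes) => {
  const proxyReq = http.request(
    { host: APP_TIER_HOST, port: APP_TIER_PORT, method: clientReq.method, path: clientReq.url },
    (appRes) => appRes.pipe(clientRes)   // stream the application-tier answer back to the client
  );
  proxyReq.on('error', () => {
    clientRes.statusCode = 502;
    clientRes.end('application tier unreachable');
  });
  clientReq.pipe(proxyReq);              // forward the request body unchanged
}).listen(8080);                         // presentation-tier port
```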
Figure 3: Application tier
Figure 4: Data tier
In the following sections we present each of the framework's modules and how they map onto the top-level architecture. We start with the external module called "GUI" and continue, in order, with the modules that compose each tier.

A. GUI module
The "GUI" module is responsible for user interaction and is presented as a web interface. From it, a user can choose a number of virtual machines, between a minimum and a maximum instance count. He can also select each virtual machine's template properties, the software that is going to be installed, and when to start and stop the lease. At this step he can also choose whether he wants a preemptible lease or a regular lease. Besides this functionality, a user can register, see the status of each lease he submitted, or set specific alerts for each lease. This module was written using the open source Grails framework.

B. Frontend module
This module maps over the "Intrusion detection filter" and "API" layers and provides the functionality corresponding to those layers. It receives authentication requests from the "GUI" module and, if the requests are legitimate, passes them to the "User Manager" module.
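To make this interaction concrete, the sketch below shows one possible shape for a lease request sent from the GUI to the Frontend API. The endpoint and field names are assumptions, not the framework's actual wire format; the resource attributes mirror the lease information listed in the Lease Manager module description below.

```javascript
// Illustrative only: a possible lease request submitted from the GUI to the Frontend.
// Field names are assumptions; the attributes follow the lease fields of Section V-D.
const leaseRequest = {
  user: 'alice',                    // authenticated GUI user
  preemptible: false,               // regular (non-preemptible) lease
  minInstances: 1,
  maxInstances: 4,
  vmTemplate: {
    cpuArchitecture: 'x86_64',
    cpuVendor: 'Intel',
    cpuSpeedMHz: 2400,
    cpuCores: 2,
    memoryMB: 2048,
    storageGB: 20,
    networkBandwidthMbps: 100,
    networkProtocol: 'tcp',
    software: ['debian-5', 'apache2']
  },
  startTime: '2014-03-01T08:00:00Z',
  endTime: '2014-03-01T20:00:00Z'
};

// A hypothetical submission call; the real Frontend exposes its own API layer:
// POST /api/leases with JSON.stringify(leaseRequest) as the request body.
```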
C. User Manager module
This module is responsible for storing the details of the system users, such as username and password. It has authentication and lease validation purposes and it maps over the "Validation engine" layer.

D. Lease Manager module
This module implements the "Job and Lease Manager" layer. It is responsible for getting the leases from the user, transforming them into proper jobs and saving them both in local physical memory and in a remote database. The main form of abstracting the hardware resources, and also the main method of providing the scheduler with information, is the lease. The concept of leases is also used in other available systems. Basically, a lease is like a renting contract between the user that requests certain resources and the system that offers them. This contract specifies the duration of the renting and is based on the fact that the user is responsible for the resources allocated. The information retained in these leases varies from system to system, but mostly it contains details regarding the processor and the memory that are going to be allocated. More exactly, the scheduler will accept leases that contain the following information, joined together under a lease id: processor architecture, processor vendor, processor speed, processor number of cores, memory size, storage capacity, network bandwidth, network protocol, lease start time, lease end time and lease duration.
We have chosen this approach across all the modules in order to fulfil the availability requirements that our system must provide. More precisely, we created a general framework based on a cluster of processes, which we adapt to each module's needs. We explain this decision and how it works below. Today's modern processors are based on multi-core technology, which means we have to find new and better ways to deal with such environments. One approach is to use threads, but it is hard to use them correctly and efficiently. The alternative is process parallelism, which uses processes instead of threads. In this case the user launches a cluster of processes to handle an incoming load. In our implementation we currently have a local master-slave architecture, meaning that there is at least one master and at least one slave (worker) that does the job. Our framework recognizes when a process is first run and automatically promotes it to master, and every child forked from it, one for each CPU existing on the system or virtual machine, is promoted to worker. A great advantage of our framework is that the master can share a single network port, making load balancing implicit. Furthermore, if a worker crashes, the master instantly detects it and forks a fresh process to take its place. To avoid the single-point-of-failure problem, our framework automatically starts another shadow process that backs up the master in case of failure. We represented this basic sub-framework functionality in Figure 5.

Figure 5: Process cluster framework
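The process-cluster pattern described above maps closely onto the built-in Node.js cluster module. The sketch below is illustrative rather than the framework's actual code: it shows master promotion, one worker per CPU, a shared listening port and the respawning of crashed workers; the shadow process that backs up the master is omitted.

```javascript
// Minimal sketch of the process-cluster pattern, using Node.js's built-in 'cluster'
// module. Illustrative only; the shadow/backup master process is not shown.
const cluster = require('cluster');
const http = require('http');
const os = require('os');

if (cluster.isMaster) {
  // The first process is promoted to master: fork one worker per CPU.
  os.cpus().forEach(() => cluster.fork());

  // If a worker crashes, immediately fork a fresh replacement.
  cluster.on('exit', (worker, code, signal) => {
    console.log(`worker ${worker.process.pid} died (${signal || code}), respawning`);
    cluster.fork();
  });
} else {
  // Workers share the same listening port; incoming connections are distributed
  // among them, which gives the implicit load balancing mentioned above.
  http.createServer((req, res) => {
    res.end(`handled by worker ${process.pid}\n`);
  }).listen(8000);
}
```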
For this specific module we added, on top of this framework, the possibility to save a lease both in local memory and in a remote database. Even if the database connectivity is broken, the module keeps running and stores the leases only in local memory. We also added the possibility for the module, at start-up, to load the existing leases from the database into memory. In this way we keep the entire system consistent in case of a crash. This module is implemented using a plug-and-play architecture, meaning that we can add different storage capabilities. For example, instead of a single database, the system administrator can use a cluster of databases for redundancy. Moreover, driver registration is performed automatically and does not create downtime. The lease manager was split in two parts: an API wrapper, accessible to the external world, and the actual lease containers. Both of them are written in a clustered way - both the master process and the child processes are backed up by replicas.

E. Scheduler module
This module is responsible for choosing on which physical node a lease will run. It uses an online form of scheduling based on our custom-tailored algorithm, presented in detail in our previous work [9]. The scheduler is also implemented in JS, and we designed its running behaviour to provide more throughput. First of all, when the scheduler module starts, we check the database for existing leases. If any are found, we load them into the scheduler's running-leases cache. After that, at certain periods of time we take a lease from the lease manager module and check it: if it is an "URGENT" one, we start it on a certain hypervisor and save it in the running-leases cache. If it is not, we assume it is a delayable one and compute a delay after which to start it. We have chosen a lazy paradigm for allocating the virtual machines from a lease.
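A minimal sketch of this scheduling loop is given below. All helper functions (loadLeasesFromDatabase, nextLease, startOnHypervisor, computeDelay) and the polling period are hypothetical placeholders, not the framework's real API; the sketch only illustrates the URGENT versus delayable decision described above.

```javascript
// Illustrative sketch of the scheduling loop described above; helper names are
// hypothetical placeholders, not the framework's actual API.
const runningLeases = new Map();   // the scheduler's running-leases cache

async function startScheduler(db, leaseManager, hypervisorManager) {
  // On start-up, reload leases that were already persisted in the database.
  for (const lease of await db.loadLeasesFromDatabase()) {
    runningLeases.set(lease.id, lease);
  }

  // Periodically pull a lease from the Lease Manager and decide when to start it.
  setInterval(async () => {
    const lease = await leaseManager.nextLease();
    if (!lease) return;

    if (lease.priority === 'URGENT') {
      // Urgent leases are started immediately on a chosen hypervisor.
      await hypervisorManager.startOnHypervisor(lease);
      runningLeases.set(lease.id, lease);
    } else {
      // Delayable leases are started lazily, after a computed delay.
      const delayMs = computeDelay(lease);
      setTimeout(() => hypervisorManager.startOnHypervisor(lease), delayMs);
    }
  }, 5000); // polling period is an assumption
}

// Placeholder delay policy: wait until the lease's requested start time.
function computeDelay(lease) {
  return Math.max(0, new Date(lease.startTime) - Date.now());
}
```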
F. Hypervisor Manager module
This module is responsible for the hypervisors existing in the system and it maps over the "Virtual Machine Preparation" and "Cloud Specific Interface" layers, having a double functionality. It is also implemented as plug-and-play, and the system administrator can add hypervisors by writing specific drivers. It can manage both individual hypervisors, such as VMware ESX, and existing Cloud Computing environments, such as OpenNebula or Eucalyptus. It also monitors the state of each attached hypervisor and the load of each virtual machine running on each hypervisor. The hypervisor manager currently implements ways to communicate with VMware ESXi version 5 and VirtualBox version 4: creating, starting, stopping, pausing, resuming and cloning a new or existing virtual machine. Furthermore, it receives requests from the "Scheduler" to start a new lease or to check the physical resources needed to run a new lease. In order to manage the virtual machines from a lease, it uses a standard interface that permits it, for example, to start, stop or restart a particular virtual machine.

G. Monitor module
This module is responsible for monitoring the entire system activity. It knows which leases are marked as preemptible and decides when to preempt them. It also decides whether a running lease needs its virtual machine instances increased up to the maximum count specified by the user or decreased to the minimum count. In order to do this properly, it knows which preemptible leases are running at a particular moment in time and also knows the load of the virtual machines running inside each lease. If the load is larger than an upper threshold, it commands the start of a fresh virtual machine, up to the permitted maximum instance count. If the load is smaller than a lower threshold, and the system is running more virtual machines than the minimum instance count, it commands the stopping of virtual machines. To monitor the virtual machine load it must communicate with the "Hypervisor Manager".

H. Database Layer module
This module is responsible for the underlying database(s) (DB). Since it is implemented separately from the rest of the system, it can be easily adapted to work with different DB software. It is also implemented using the same plug-and-play architecture, and the only thing the system administrator needs to do is write the proper driver to interface with the particular DB vendor.

VI. DATACENTER ENABLED ARCHITECTURE

In order to have a properly working forensic and logging system we must pay a great deal of attention to its performance. Since all the activity can be intercepted, there is a risk of severe penalties in time and processing speed. In order to solve this problem, we offer the investigator the possibility to choose the logging level for a certain virtual machine. This is helpful when, for example, an investigator only wants to analyse the contents of the virtual memory and is not interested in virtual disk images or virtual network activity.
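As a hedged illustration of such a per-VM logging level, the sketch below shows one possible configuration shape; the field names, levels and interval are assumptions made for this example, not the framework's actual configuration format.

```javascript
// Illustrative only: a possible per-VM forensic logging configuration.
// Field names and values are assumptions, not the framework's real format.
const forensicLoggingConfig = {
  vmId: 'lease-42-vm-01',        // hypothetical virtual machine identifier
  collectVirtualMemory: true,    // capture memory snapshots for analysis
  collectVirtualDisk: false,     // skip disk images to reduce overhead
  collectNetworkTraffic: false,  // skip network capture as well
  snapshotIntervalSeconds: 300   // how often the forensic system takes snapshots
};

module.exports = forensicLoggingConfig;
```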
The situation is different when implementing our solution in a real datacenter, where large-scale parallel computers are the base of high performance computing and rely on an Ethernet network interconnection that is fast and efficient. To better understand the issues addressed by our solution, we first present a computer architecture found in today's datacenters. In Figure 6 you can see a Fat Tree topology, as presented in [10], composed of multiple building blocks. The basic building block is called a "cluster". A cluster is composed of multiple racks, each rack having multiple servers and one switch called the top-of-the-rack switch (ToR). This allows communication between adjacent servers to be very fast. Using the same principle of data locality, each ToR is linked to a level 2 switch (L2S) and each L2S is linked to an aggregation switch. Each cluster is linked to a cluster router (CR) and, finally, each CR to a border router.

Figure 6: Datacenter topology

To better handle the data that is going to be processed and transferred, we propose a slight modification of this topology. In the first place, the modifications start at the physical servers stored in each rack. For this, we need a dedicated forensics network port, just like a management port, as can be seen in Figure 7. This port must also have a correspondent in the ToR switch. It will be used by our Cloud infrastructure for collecting and processing data, and also by the authenticated forensic investigators. Using this approach we make sure that the network connections between the users and their virtual machines remain untouched and the time penalties are reduced to a minimum. In this case, the only lag introduced at the hypervisor level is the moment in which the forensic system takes a snapshot of a virtual machine.

Figure 7: Dedicated forensics port on a server

Cascading this modification, the new datacenter topology becomes the one presented in Figure 8. We marked with a red line the alteration that must be made, in the form of an extra Ethernet port in use. This topology acts just like a secondary backbone, used only by the forensic system and the investigators.

Figure 8: Datacenter topology for Cloud forensics

VII. RESULTS

In this section we present some of the results obtained from testing our implementation so far. The modules have been implemented and split across multiple workstations, as can be seen in Figure 9. They are represented as a cluster of servers, each having the functionality presented in detail in the architecture section.
Figure 9: Mapping modules to workstations.
As can be seen, all the modules found in the dotted perimeter, called "Management modules", can also be run on a single workstation. Elements such as network switches are not represented, in order not to burden the figure, but the IP addresses of the hosts are kept. For the "Hypervisor / Cloud Interface", three distinct hypervisor servers have been used, each having its own security and load policies.
In order to test our implementation, besides the scheduler part, which is thoroughly tested in our previous work published in [9], we used the Node.JS module called "node-inspector", which allowed us to get all the parameters from the V8 virtual machine. We ran an analysis on both the "Lease Manager" and "Hypervisor Manager" modules with 5 different leases. The leases are generic and each consists of a simple virtual machine running Debian 5 Linux. Other lease parameters do not affect the monitored timings. The results are presented in the following tables. We tested our implementation before and after activating the forensic modules. As we can see from the results, even though the forensic modules constantly record information about the running hosts, such as virtual memory and virtual disk, the maximum overhead is kept under 8% of the global load of the datacenter.
In Table I, the second column shows the time needed for a lease to be created on the system, while the third and fourth columns show the amount of time needed to check the lease for consistency and to store it. In Table II, the second column shows the time needed to retrieve a lease from storage, and the last column shows the amount of time needed by the Hypervisor Manager to check a lease's resource availability on the system.

Table I: Lease Manager result table

      Lease creation time (ms)   Lease check-up (ms)   Lease store (ms)
  #     Before      After          Before     After      Before    After
  1       204        220              10        35           1        2
  2       205        221              11        36           2        2
  3       289        314              11        36           2        2
  4       208        232              10        36           1        2
  5       262        301              10        36           1        2

Table II: Hypervisor Manager result table

      Lease retrieve (ms)   Lease resource check (ms)
  #     Before     After       Before      After
  1        3         15           25         45
  2        3         16           28         50
  3        3         15           51         69
  4        3         15           26         44
  5        3         15           39         36

VIII. CONCLUSION

In this paper we presented a novel solution for providing reliability and security to Cloud users. Our approach takes the form of a complete framework that can be used standalone or on top of an existing Cloud infrastructure, and we have described each of its layers and characteristics. Our work is focused on increasing the reliability, safety, security and availability of Distributed Systems. The characteristics of such systems raise problems for secure resource management, due to their heterogeneity and geographical distribution. We presented the design of a hierarchical architectural model that allows users to seamlessly scale workloads both vertically and horizontally, while preserving the scalability of large scale distributed systems.

ACKNOWLEDGEMENTS

This paper has been financially supported within the project entitled "Horizon 2020 - Doctoral and Postdoctoral Studies: Promoting the National Interest through Excellence, Competitiveness and Responsibility in the Field of Romanian Fundamental and Applied Scientific Research", contract number POSDRU/159/1.5/S/140106. This project is co-financed by the European Social Fund through the Sectoral Operational Programme for Human Resources Development 2007-2013. Investing in people!

REFERENCES

[1] A. Pătrașcu and V. Patriciu, "Beyond Digital Forensics. A Cloud Computing Perspective Over Incident Response and Reporting", IEEE 8th International Symposium on Applied Computational Intelligence and Informatics (SACI), 2013.
[2] S. Tayal, "Tasks Scheduling Optimization for the Cloud Computing System", International Journal of Advanced Engineering Sciences and Technologies (IJAEST), vol. 5, 2011.
[3] S. K. Garg, C. S. Yeo, A. Anandasivam and R. Buyya, "Energy Efficient Scheduling of HPC Applications in Cloud Computing Environments", 2009.
[4] R. Buyya, C. S. Yeo, S. Venugopal, J. Broberg and I. Brandic, "Cloud Computing and Emerging IT Platforms: Vision, Hype, and Reality for Delivering Computing as the 5th Utility", 2008.
[5] A. Pătrașcu, C. Leordeanu and V. Cristea, "Scalable Service Based Antispam Filters", Proceedings of the First International Workshop on the Service for Large Scale Distributed Systems (Sedis 2011), in conjunction with the EIDWT 2011 conference, Tirana, 2011, ISBN 978-0-7695-4456-4.
[6] A. Pătrașcu, D. Maimuț and E. Simion, "New Directions in Cloud Computing. A Security Perspective", COMM International Conference, Bucharest, 2012.
[7] C. Gentry, "Fully Homomorphic Encryption", 2008.
[8] B. Chor, E. Kushilevitz, O. Goldreich and M. Sudan, "Private Information Retrieval", Journal of the ACM, vol. 45, no. 6, 1998, pp. 965-981.
[9] A. Pătrașcu, C. Leordeanu, C. Dobre and V. Cristea, "ReC2S: Reliable Cloud Computing System", European Concurrent Engineering Conference, Bucharest, 2012.
[10] M. Al-Fares, A. Loukissas and A. Vahdat, "A Scalable, Commodity Data Center Network Architecture", Proceedings of the ACM SIGCOMM 2008 Conference on Data Communication, 2008.
[11] D. Birk, "Technical Issues for Forensic Investigations in Cloud Computing Environments", IEEE Sixth International Workshop on Systematic Approaches to Digital Forensic Engineering (SADFE), pp. 1-10, 2011.
[12] B. Martini and K. R. Choo, "An Integrated Conceptual Digital Forensic Framework for Cloud Computing", Digital Investigation, vol. 9, pp. 71-80, November 2012.
[13] S. Thorpe, I. Ray, T. Grandison and A. Barbir, "Cloud Digital Investigations Based on a Virtual Machine Computer History Model", Future Information Technology, Application, and Service, 2012.
[14] G. Chen, "Suggestions to Digital Forensics in Cloud Computing Era", Third IEEE International Conference on Network Infrastructure and Digital Content (IC-NIDC), 2012.