2012 IEEE International Conference on Systems, Man, and Cybernetics October 14-17, 2012, COEX, Seoul, Korea
An Availability Model for Eucalyptus Platform: An Analysis of Warm-Standby Replication Mechanism

Jamilson Dantas, Rubens Matos, Jean Araujo and Paulo Maciel
Informatics Center, Federal University of Pernambuco
Recife, Brazil
Email: {jrd, rsmj, jcta, prmm}@cin.ufpe.br

Abstract—High availability in cloud computing services is essential for maintaining customer confidence and avoiding revenue losses due to SLA violation penalties. Since the software and hardware components of cloud infrastructures may have limited reliability, fault tolerance mechanisms are a means of achieving the necessary dependability requirements. This paper investigates the benefits of a warm-standby replication mechanism in a Eucalyptus cloud computing environment. A hierarchical heterogeneous modeling approach is used to represent a redundant architecture and compare its availability to that of a non-redundant architecture. Both hardware and software failures are considered in the proposed analytical models. The results show enhanced dependability for the proposed redundant system, as well as a decrease in the annual downtime. The results also demonstrate that simply replacing the hardware with more reliable machines would not improve system availability to the same extent as the fault-tolerant approach.

Index Terms—Cloud computing, availability, reliability, analytical models
I. INTRODUCTION

Cloud computing has become increasingly important, as it is now used or sought after by a large proportion of IT companies worldwide. Software platforms such as Eucalyptus [1] provide a means to construct private and hybrid clouds in the Infrastructure as a Service (IaaS) style, on top of common on-premise hardware equipment. The guarantee of high availability of cloud services is essential for maintaining users' confidence and avoiding revenue losses but, as with any system, availability in the cloud may be affected by events such as hardware failures, planned maintenance, exchange of equipment, software bugs or updates. In order to cope with the limited reliability of software and hardware, the employment of fault tolerance mechanisms is an option to be considered.

This paper investigates the benefits of a warm-standby replication mechanism [2], [3] in a Eucalyptus cloud computing environment. A hierarchical heterogeneous modeling approach is used to represent a redundant architecture and compare its availability to that of a non-redundant architecture. Both hardware and software failures are considered in our analytical models, which are also used to obtain closed-form equations for computing the availability of the cloud infrastructure.

The rest of the paper is organized as follows. Section II briefly discusses analytical modeling for dependability evaluation. Section III presents some mechanisms used for enhancing system availability. Section IV introduces the main cloud computing concepts and explains the Eucalyptus platform. Section V describes our experimental study, including
the analytical models and the corresponding results. Section VI draws some conclusions and points to future work.
II. ANALYTICAL MODELS FOR DEPENDABILITY EVALUATION
System dependability can be understood as the ability to deliver a specified functionality that can be justifiably trusted [4]. An alternative definition of dependability is "the ability of a system to avoid failures that are more frequent or more severe, and outage durations that are longer than is acceptable to the user" [4]. Dependability encompasses measures such as reliability, availability, and safety. Due to the ubiquitous provision of services on the Internet, dependability has become an attribute of prime concern in hardware/software development, deployment, and operation [5]. Dependability is a very important property for a cloud system, since it should provide services with high availability, stability, fault tolerance and dynamic extensibility. Because cloud computing is a large-scale distributed computing paradigm and its applications are accessible anywhere and at any time, dependability in cloud systems is becoming more important and more difficult to achieve [6].

There are various model types which may be used for the analytical evaluation of dependability. Reliability block diagrams, fault trees, stochastic Petri nets and Markov chains have been used to model fault-tolerant systems and to evaluate various dependability measures. These model types differ from one another not only in the ease of use for a particular application but also in terms of modeling power [7]. They may be broadly classified into combinatorial and state-based models [5]. State-based models may also be referred to as non-combinatorial models, and combinatorial models as non-state-based models.

Combinatorial models (e.g., reliability block diagrams, fault trees) capture the conditions that cause a system to fail (or to be working) in terms of structural relationships between the system components. These relationships express which sets of components (and sub-systems) should be properly working for the system as a whole to be working properly, or conversely which sets of components should be faulty for the system as a whole to fail. Combinatorial models enable, in general, a more concise representation of the system when compared to non-combinatorial models. State-based models (e.g., Markov chains [8], stochastic Petri nets [9]) represent the system behavior (failure and repair activities) by its states, and
event occurrence is expressed by labeled state transitions [5]. Labels can be probabilities, rates or distribution functions. These models allow the representation of more complex relationships between system components, such as dependencies involving sub-systems and resource constraints [5]. Some state-based models may also be evaluated by discrete-event simulation when needed, for example in the case of intractably large state spaces. In some special cases state-based analytic models can be solved to derive closed-form answers, but generally a numerical solution of the underlying system of equations is necessary.

Combinatorial and state-space models may be hierarchically combined in order to get the best of both worlds, e.g., a reliability block diagram is used to represent the dependability relationships between independent subsystems, while detailed or more complex failure and repair mechanisms are modeled with stochastic Petri nets. This approach enables the representation of many kinds of dependencies between components, and avoids the well-known issue of state-space explosion when dealing with large systems.
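As a minimal illustration of this hierarchical composition (our sketch, not from the paper, with arbitrary numbers), a two-state availability submodel can be solved analytically and its result plugged into the combinatorial series/parallel formulas:

def two_state_availability(mttf_h: float, mttr_h: float) -> float:
    """Steady-state availability of a two-state (up/down) CTMC."""
    failure_rate, repair_rate = 1.0 / mttf_h, 1.0 / mttr_h
    return repair_rate / (failure_rate + repair_rate)  # = MTTF / (MTTF + MTTR)

def series(avails):
    """Combinatorial series structure: every block must be up."""
    a = 1.0
    for ai in avails:
        a *= ai
    return a

def parallel(avails):
    """Combinatorial parallel structure: at least one block must be up."""
    u = 1.0
    for ai in avails:
        u *= 1.0 - ai
    return 1.0 - u

# Illustrative numbers only: a state-based submodel feeds a combinatorial one.
a_sub = two_state_availability(mttf_h=1000.0, mttr_h=2.0)
print(series([a_sub, a_sub]), parallel([a_sub, a_sub]))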
III. REDUNDANCY IN HIGH AVAILABILITY CLUSTERS

In systems which require high availability and reliability, it is necessary to create methods for detecting and correcting faults, and avoiding system failure. A failure in a large-scale system can mean catastrophic losses. Many techniques have been proposed and adopted to build failover clusters [10], as well as to leverage virtualization and cloud systems for addressing service dependability issues [11], [12]. Many of these techniques are based on redundancy, i.e., the replication of components so that they work for a common purpose and ensure data security and availability even in the event of some component failure.

In order to build failover clusters, composed of two or more nodes, there are some essential activities. One activity is to replicate the data among the available machines to maintain data consistency. Another is monitoring the active servers, so that failures can be detected and handled by replacing failed nodes with spare ones. In Linux-based systems, these two activities are usually performed by the Heartbeat [13] and DRBD (Distributed Replicated Block Device) [14] tools.

Heartbeat is cluster management software which enables a cluster infrastructure to identify its hosts, active and inactive, by means of periodic message exchanges. Figure 1 shows a cluster with two nodes, responsible for maintaining a given uninterruptible service. Heartbeat sends messages between the nodes, detecting when host1 is offline so that the target service can be activated on host2, with no perceptible interruption or delay to the end user.
Fig. 1: Heartbeat and DRBD in a two-node cluster

DRBD is a module for the Linux kernel which provides a block device driver [15]. In each cluster node, the DRBD driver
is in control of a "real" block device, which holds a replica of the system's data. Read operations are carried out locally, while writes are transmitted to the other nodes in the HA cluster. Each DRBD device may be in primary or secondary state. Applications are granted write access only if the device is in primary state, and only one device of a connected device pair may be in primary state. The assignment of the roles (primary or secondary) to the devices is usually done by cluster management software such as Heartbeat. DRBD works as a network RAID-1, so it is responsible for synchronizing the data between hosts.

The data mirroring in DRBD uses one of three protocols [15]. Protocol A is fully asynchronous: applications are notified of write completions when the writes have completed locally, which is usually before they have propagated to the other hosts. Protocol B couples the nodes more closely than the previous one: completion of a write operation is signaled to the upper layers of the operating system as soon as the local I/O is complete and an acknowledgement packet has arrived from the secondary node; this acknowledgement is sent by the secondary node as soon as it receives the write operation. Protocol C is fully synchronous: applications are notified of write completions only after the writes have been carried out on all hosts.
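The trade-off between the three protocols can be sketched with a simplified timing model (ours, not DRBD code; all latencies below are made-up illustrative values in milliseconds):

# When the application sees "write complete" under DRBD's three protocols,
# following the acknowledgement semantics described above.
LOCAL_WRITE = 0.5   # assumed local disk I/O latency
NET_ONE_WAY = 0.2   # assumed one-way network delay to the peer
REMOTE_WRITE = 0.5  # assumed disk I/O latency on the secondary

def ack_latency(protocol: str) -> float:
    if protocol == "A":  # fully asynchronous: ack after the local write only
        return LOCAL_WRITE
    if protocol == "B":  # ack once the peer has *received* the write
        return max(LOCAL_WRITE, 2 * NET_ONE_WAY)
    if protocol == "C":  # fully synchronous: ack once the peer has *written* it
        return max(LOCAL_WRITE, 2 * NET_ONE_WAY + REMOTE_WRITE)
    raise ValueError(protocol)

for p in "ABC":
    print(f"protocol {p}: application notified after {ack_latency(p):.1f} ms")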
IV. CLOUD COMPUTING AND EUCALYPTUS PLATFORM: AN OVERVIEW

Cloud computing provides access to computers and their functionality via the Internet or a local area network. It is called "cloud computing" because the user cannot actually see or specify the physical location and organization of the equipment hosting the resources they are ultimately allowed to use [1]. Numerous advances in application architecture have helped to promote the adoption of cloud computing. These advances help to support the goal of efficient application development while helping applications to be elastic and scale gracefully and automatically [16].

EUCALYPTUS - Elastic Utility Computing Architecture Linking Your Programs To Useful Systems - is software that implements scalable IaaS-style private and hybrid clouds [17]. It was created for cloud computing research and is interface-compatible with the commercial service Amazon EC2 [1]. This API compatibility makes it possible to run an application on Amazon and on Eucalyptus without modification. In general, the Eucalyptus platform uses the virtualization capabilities (hypervisor) of the underlying computer system to enable flexible allocation of computing resources decoupled from specific hardware [17].

There are five high-level components in the Eucalyptus architecture, each with its own web service interface: Cloud Controller, Cluster Controller, Node Controller, Storage Controller, and Walrus [17]. Figure 2 shows an example of a Eucalyptus-based cloud computing environment, considering two clusters (A and B). Each cluster has one Cluster Controller, one Storage Controller, and several Node Controllers. The components in each cluster communicate with the Cloud Controller and Walrus in order to service user requests. A user is able to employ
EC2 tools as an interface to the Cloud Controller, or S3 (Amazon’s Simple Storage Service) tools to access Walrus. A brief description of each component follows:
(a) No redundancy
(b) Redundancy in the GC
Fig. 3: Private cloud architectures
Fig. 2: Eucalyptus high-level components [18]

The Cloud Controller (CLC) is the front-end to the entire cloud infrastructure. The CLC is responsible for exposing and managing the underlying virtualized resources (servers, network, and storage) via the Amazon EC2 API [16]. This component uses web service interfaces to receive the requests of client tools on one side and to interact with the rest of the Eucalyptus components on the other side.

The Cluster Controller (CC) usually executes on a cluster front-end machine [17], [19], or on any machine that has network connectivity to both the nodes running NCs and the machine running the CLC. The CC gathers information about a set of VMs and schedules VM execution on specific NCs. It has three primary functions: scheduling incoming instance run requests to specific NCs, controlling the instance virtual network overlay, and gathering/reporting information about a set of NCs [19].

The Node Controller (NC) runs on each node and controls the life cycle of instances running on that node. The NC interacts with the operating system and with the hypervisor running on the node. It controls the execution, inspection, and termination of VM instances on the host where it runs, and it queries and controls the system software on its node in response to queries and control requests from the CC [17]. An NC makes queries to discover the node's physical resources - number of CPU cores, size of memory, available disk space - as well as to learn about the state of VM instances on that node [19], [20].

The Storage Controller (SC) provides persistent block storage for use by the virtual machine instances. It implements block-accessed network storage, similar to that provided by Amazon Elastic Block Storage - EBS [19], and it is capable of interfacing with various storage systems (NFS, iSCSI, etc.). An elastic block storage is a Linux block device that can be attached to a virtual machine but sends disk traffic across the locally attached network to a remote storage location. An EBS volume cannot be shared across instances [20], [21].

Walrus is a file-based data storage service that is interface-compatible with Amazon's Simple Storage Service (S3) [19]. Walrus implements a REST interface (through HTTP), sometimes termed the "Query" interface, as well as SOAP interfaces
that are compatible with S3 [19], [20]. Users that have access to Eucalyptus can use Walrus to stream data into/out of the cloud as well as from instances that they have started on nodes. In addition, Walrus acts as a storage service for VM images.
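As an illustration of the EC2 API compatibility described above, an EC2 client such as the classic boto 2 library can be pointed at a Eucalyptus front-end instead of Amazon. This is only a hedged sketch: the endpoint, credentials, and image identifier below are placeholders, not values from the paper.

import boto
from boto.ec2.regioninfo import RegionInfo

# Placeholder endpoint of the Cloud Controller (CLC).
region = RegionInfo(name="eucalyptus", endpoint="clc.example.org")
conn = boto.connect_ec2(
    aws_access_key_id="YOUR-ACCESS-KEY",
    aws_secret_access_key="YOUR-SECRET-KEY",
    is_secure=False,
    region=region,
    port=8773,                    # default Eucalyptus web-service port
    path="/services/Eucalyptus",  # CLC's EC2-compatible endpoint
)

# The same calls an EC2 client would make against Amazon:
for image in conn.get_all_images():
    print(image.id, image.location)
reservation = conn.run_instances("emi-12345678", instance_type="m1.small")
print(reservation.instances[0].id)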
V. CASE STUDY

Based on the study of the characteristics and components described in Section IV, some analytical models were created to describe the behavior of Eucalyptus cloud systems. In order to verify the effects of a replication mechanism on one of the main components of the system, this case study analyzed the availability of two private cloud architectures. Both hardware and software components were considered in our models.

Figure 3a depicts the first architecture, where all the main software components of Eucalyptus - Cloud Controller, Cluster Controller, Storage Controller and Walrus - run on the same machine, called the General Controller. There are N nodes available for deploying the virtual machines, by means of the Node Controller running in each of them. This architecture has a single point of failure in the General Controller, so any hardware or software failure may bring the whole system down. The second architecture, depicted in Figure 3b, implements a redundant General Controller (GC2) following a warm-standby strategy. All software components of the primary General Controller are also installed in the spare machine, which is activated only when a failure in GC1 is detected. Continuous monitoring and data synchronization mechanisms enable the fast switchover of the service to the secondary host, characterizing the warm-standby redundancy approach.

Due to their simplicity and efficiency of computation, Reliability Block Diagrams (RBDs) are used to analyze the dependability of the first architecture. A hierarchical heterogeneous model, composed of an RBD and a Markov Reward Model (MRM), is used for the second architecture. The RBD describes the high-level components, whereas the MRM represents the components involved in the redundancy mechanism. The MRM makes it possible to obtain a closed-form equation for the availability of the redundant General Controller subsystem.

A. Model for non-redundant architecture

From a dependability point of view, the private cloud infrastructures depicted in Figure 3 may be divided into two parts:
General Controller subsystem and Nodes subsystem. In the non-redundant architecture, the General Controller subsystem is represented by a pure series RBD, as shown in Figure 4. This subsystem consists of hardware, operating system, and the following Eucalyptus components: CLC (Cloud Controller), CC (Cluster Controller), SC (Storage Controller) and Walrus.
Fig. 4: RBD model of the non-redundant General Controller subsystem
Table I presents the values of mean time to failure (MTTF) and mean time to repair (MTTR) used in the GC model. Those values were obtained from [22], [23], and were used to compute the dependability metrics for the non-redundant General Controller, and subsequently for the whole system. This subsystem has an MTTF of 180.72 h and an MTTR of 0.96 h.
TABLE I: Input parameters for the General Controller

  Component   MTTF      MTTR
  HW          8760 h    100 min
  SO          2893 h    15 min
  CLC         788.4 h   1 h
  CC          788.4 h   1 h
  SC          788.4 h   1 h
  Walrus      788.4 h   1 h
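The series composition behind Figure 4 can be checked with a short script (ours, using the Table I values); it reproduces the 180.72 h MTTF reported above, with the equivalent MTTR differing only in rounding.

def series_availability(parts):
    """Steady-state availability of a series RBD: availabilities multiply."""
    a = 1.0
    for mttf, mttr in parts:
        a *= mttf / (mttf + mttr)
    return a

# (MTTF h, MTTR h) for HW, SO, CLC, CC, SC, Walrus; 100 min and 15 min converted.
gc_parts = [(8760.0, 100/60), (2893.0, 15/60),
            (788.4, 1.0), (788.4, 1.0), (788.4, 1.0), (788.4, 1.0)]

gc_mttf = 1.0 / sum(1.0 / mttf for mttf, _ in gc_parts)  # series: failure rates add
gc_avail = series_availability(gc_parts)
gc_mttr = gc_mttf * (1.0 - gc_avail) / gc_avail          # equivalent repair time

print(f"GC MTTF = {gc_mttf:.2f} h")         # ~180.72 h, as reported in the text
print(f"GC MTTR = {gc_mttr:.2f} h")         # ~0.97 h (the text reports 0.96 h)
print(f"GC availability = {gc_avail:.5f}")  # ~0.99467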
Figure 5 shows the RBD model that represents one node in the Nodes subsystem. Besides the hardware and operating system, which are also present in the General Controller, each node needs a hypervisor and a Eucalyptus node controller, in order to be available in the cloud.
Fig. 5: RBD model of one node in the Nodes subsystem

The Nodes subsystem model assumes that the hardware and operating system of the nodes have the same dependability characteristics as those of the General Controller, i.e., the same MTTF and MTTR. Therefore, Table II presents only the parameter values for the KVM and NC blocks [22], [23]. The analysis of this model provides an MTTF of 481.82 hours and an MTTR of 0.91 hours for each node in this cloud environment.
TABLE II: Input parameters for the nodes

  Component   MTTF      MTTR
  KVM         2990 h    1 h
  NC          788.4 h   1 h
This case study considers the example of an architecture with five nodes, where at least one node must be available for the cloud to work properly. Therefore the Nodes subsystem is represented by a set of five node blocks in parallel. Figure 6 shows the complete RBD model with one (non-redundant) General Controller and its five nodes.

Fig. 6: RBD model for the non-redundant cloud system

The availability of this non-redundant architecture was computed using this RBD and is shown in Table IV, together with related measures: the number of 9's [24], which constitutes a logarithmic view of the availability, and the downtime, which better conveys the impact of service unavailability from the user's standpoint. Since the annual downtime reaches 46.66 hours, it is noteworthy that the system availability is not as high as would normally be expected of a cloud infrastructure.
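Continuing the previous sketch for the complete model of Figure 6, the whole-system availability can be computed as the GC in series with five parallel nodes. Small differences from the reported values (e.g., a node MTTF of ~484.8 h versus the reported 481.82 h) are presumably due to rounding in the source parameters.

def series_availability(parts):
    a = 1.0
    for mttf, mttr in parts:
        a *= mttf / (mttf + mttr)
    return a

node_parts = [(8760.0, 100/60), (2893.0, 15/60),  # HW and SO, as in the GC
              (2990.0, 1.0), (788.4, 1.0)]        # KVM and NC, from Table II
node_avail = series_availability(node_parts)
node_mttf = 1.0 / sum(1.0 / mttf for mttf, _ in node_parts)
print(f"node MTTF = {node_mttf:.1f} h")    # ~484.8 h (the text reports 481.82 h)

gc_avail = 0.994667                            # GC availability, from the previous sketch
nodes_avail = 1.0 - (1.0 - node_avail) ** 5    # 1-out-of-5 parallel nodes
system_avail = gc_avail * nodes_avail          # series with the General Controller
print(f"system availability = {system_avail:.5f}")             # ~0.99467 (cf. Table IV)
print(f"annual downtime = {(1 - system_avail) * 8760:.1f} h")  # ~46.7 h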
B. Redundant private cloud architecture

In order to enhance the availability of the infrastructure described in the previous section, we propose the implementation of redundancy in the General Controller, which is the single entry point for customers of the cloud. Software mechanisms such as DRBD and Heartbeat, described in Section III, can be used for this purpose, since they enforce data consistency and service availability. Therefore another model is used to represent a redundant architecture, with one General Controller active and one replicated GC host in warm-standby. This replication strategy cannot be properly represented in RBD models, due to the dependency between the states of components, so the redundant General Controller subsystem is represented by a Markov Reward Model (MRM), shown in Figure 7. The formal definition of an MRM follows:

Definition 1. A Markov Reward Model (MRM) M is a triple (C, ρ, ı), where C = (S, R, Label) is the underlying labeled CTMC, ρ : S → ℝ≥0 is the state reward structure, and ı : S × S → ℝ≥0 is the impulse reward structure, such that ı(s, s′) = 0 whenever R(s, s′) = 0.

An MRM is a labeled CTMC augmented with state reward and impulse reward structures. The state reward structure is a function ρ that assigns to each state s ∈ S a reward ρ(s)
such that if t time units are spent in state s, a reward of ρ(s) · t is acquired. The rewards defined in the state reward structure can be interpreted in various ways: they can be regarded as the gain or benefit acquired by staying in some state, or as the cost incurred by staying in it. The impulse reward structure, on the other hand, is a function ı that assigns to each transition from s to s′, where s, s′ ∈ S, a reward ı(s, s′), such that if the transition from s to s′ occurs, a reward of ı(s, s′) is acquired. Similarly to the state reward structure, an impulse reward can be considered as the cost of taking a transition or as the gain acquired by taking it.

The MRM model in Figure 7 has 5 states: UW, UF, FF, FU, and FW. The state reward ρ(s) assigned to UW, UF and FU is equal to 1, since the GC subsystem is available in those states. The state reward assigned to FF and FW (shaded states) is equal to 0, since the GC subsystem is down in those states. There are no impulse rewards in this model. Therefore, the steady-state availability of the GC subsystem can be computed as the steady-state reward of the MRM, A_GC = Σ_{s∈S} π_s · ρ(s), where π_s is the steady-state probability of being in state s, and ρ(s) is the reward assigned to state s.

In the state UW, the primary GC is up and the secondary GC is in a waiting condition. When GC1 fails, the system goes to state FW, where the secondary GC has not yet detected the failure of GC1. FU represents the state where GC2 has left the waiting condition and assumed the role of active General Controller, while GC1 is failed. If GC2 fails before the repair of GC1, the system goes to state FF. In order to prioritize the repair of the main server, there is only a single repair transition from FF, which goes to UF. If GC2 fails when GC1 is up, the system goes to state UF, returning to state UW with the repair of GC2, or going to state FF in case GC1 also fails.

The failure rates of GC1 and GC2 are denoted by λ_s1 and λ_s2, respectively. The rate λi_s2 denotes the failure rate of the secondary server when it is inactive. The repair rate of GC2 is μ_s2. The transition rate sa_s2 represents the switchover rate, i.e., the inverse of the mean time to activate the secondary server after a failure of GC1. Table III presents the values of all the mentioned parameters of the MRM. The value of μ_s1 is equal to the value of μ_s2, and the rates λ_s1 and λ_s2 also have equal values. These λ and μ values were obtained from the MTTF and MTTR computed using the single General Controller RBD model. The failure rate of the secondary GC when it is inactive is assumed to be 20% smaller than the failure rate of an active GC. The value of sa_s2 comes from the default monitoring interval and activation times found in software such as Heartbeat.
TABLE III: Parameter values for the Markov chain model

  Parameter            Description                           Value (h⁻¹)
  λ_s1 = λ_s2 = 1/λ    Mean time for host failure            1/180.721
  λi_s2 = 1/λi         Mean time for inactive host failure   1/216.865
  μ_s1 = μ_s2 = 1/μ    Mean time for host repair             1/0.9667
  sa_s2 = 1/sa         Mean time to activate the system      1/0.005
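The chain can also be solved numerically. The sketch below (ours) builds the generator matrix from the transition structure as we read it from the description above, using the Table III rates, and solves πQ = 0; the resulting availability agrees with Table IV up to rounding.

import numpy as np

lam, lam_i = 1/180.721, 1/216.865   # active / inactive failure rates (Table III)
mu, sa = 1/0.9667, 1/0.005          # repair and switchover rates (Table III)

states = ["UW", "FW", "FU", "FF", "UF"]
Q = np.zeros((5, 5))
def t(a, b, rate):  # add transition a -> b
    Q[states.index(a), states.index(b)] = rate
t("UW", "FW", lam)     # primary fails
t("UW", "UF", lam_i)   # inactive secondary fails
t("FW", "FU", sa)      # failure detected, secondary activated
t("FU", "UW", mu)      # primary repaired, secondary back to standby
t("FU", "FF", lam)     # secondary fails before primary is repaired
t("FF", "UF", mu)      # repair of the primary is prioritized
t("UF", "UW", mu)      # secondary repaired
t("UF", "FF", lam)     # primary fails while secondary is down
np.fill_diagonal(Q, -Q.sum(axis=1))

# Steady state: pi Q = 0 together with sum(pi) = 1.
A = np.vstack([Q.T, np.ones(5)])
b = np.append(np.zeros(5), 1.0)
pi, *_ = np.linalg.lstsq(A, b, rcond=None)

reward = {"UW": 1, "FW": 0, "FU": 1, "FF": 0, "UF": 1}  # up states earn reward 1
availability = sum(pi[i] * reward[s] for i, s in enumerate(states))
print(f"A_GC = {availability:.6f}")   # ~0.99992, cf. Table IV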
Fig. 7: Markov chain model of the redundant system with two nodes

The model of Figure 7 makes it possible to obtain a closed-form equation for the availability of the redundant General Controller subsystem, as seen in Equation 1.

A_GC = μ(λi(μ + sa) + μ² + sa(λ + μ)) / (λi(λ + μ)(μ + sa) + μ²(λ + μ) + sa(λ² + λμ + μ²))    (1)

It is also possible to obtain a closed-form equation for computing the availability of the whole cloud infrastructure (A_cloud) from the corresponding RBD model. Equation 2 shows how to compute A_cloud, according to the rules of composition for series and parallel components [25]. A_GC comes from Equation 1, whereas the availability of each node, A_Node_i, can be computed from the RBD shown in Figure 5.
A_cloud = A_GC · (1 − ∏_{i=1}^{n} (1 − A_Node_i))    (2)
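Evaluating Equations 1 and 2 with the Table III rates gives a quick numerical check (our evaluation of the paper's formulas); the node availability is taken from the Figure 5 sketch, and small deviations from Table IV are attributable to parameter rounding.

lam, lam_i = 1/180.721, 1/216.865   # failure rates of an active / inactive GC
mu, sa = 1/0.9667, 1/0.005          # repair and switchover rates

num = mu * (lam_i * (mu + sa) + mu**2 + sa * (lam + mu))
den = (lam_i * (lam + mu) * (mu + sa)
       + mu**2 * (lam + mu)
       + sa * (lam**2 + lam * mu + mu**2))
a_gc = num / den                                  # Equation 1

node_avail = 0.998123                             # from the Figure 5 RBD sketch
a_cloud = a_gc * (1.0 - (1.0 - node_avail) ** 5)  # Equation 2 with n = 5

print(f"A_GC    = {a_gc:.6f}")                            # ~0.99992 (Table IV: 0.99991793)
print(f"A_cloud = {a_cloud:.6f}")
print(f"annual downtime = {(1 - a_cloud) * 8760:.2f} h")  # ~0.7 h (Table IV: 0.72 h)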
TABLE IV: Availability measures of the system with and without redundancy

  Measure                     GC without redundancy   GC with redundancy
  Steady-state availability   0.99467823178           0.99991793
  Number of 9's               2.273944                4.08581
  Annual downtime             46.66 h                 0.72 h
Table IV presents the availability measures of the cloud system with a redundant General Controller. These results, which compare the system with and without redundancy, reveal a substantial increase in system availability, including a decrease of 98.45% in the annual downtime. Such a large decrease in downtime is the expected result of moving from a system with a single point of failure to a redundant system.

It is important to stress that the benefits of the replication cannot be achieved by simply enhancing the reliability of the GC hardware, i.e., by acquiring a machine with a larger mean time to failure. This fact is demonstrated by the comparative analysis depicted in Figures 8a and 8b. Even if the MTTF of the GC hardware were increased from the current value (8760 hours, or 12 months) to 60 months, the availability of the non-redundant system would not reach the availability of the redundant system with less reliable hardware. Therefore, the implementation of a replicated warm-standby GC is a valuable strategy to achieve high availability in Eucalyptus
cloud systems. Another important point to stress is that the models analyzed here do not include network failures or problems during the activation of the secondary General Controller. Those additional failure events would result in a larger downtime for both the non-redundant and the redundant systems. The analysis of the Eucalyptus systems under this simplified scenario is nevertheless valuable for obtaining an overview of the benefits provided by the replication of its components, without resorting to large and complex models.
(a) Non-redundant architecture
(b) Redundant architecture
Fig. 8: System Availability vs MTTF of General Controller hardware
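The sweep behind Figure 8a can be approximated with the series RBD and a variable hardware MTTF (our sketch; the months-to-hours conversion and sweep points are our assumptions, other parameters kept at their Table I values):

def gc_availability(hw_mttf_h: float) -> float:
    """Series RBD of Fig. 4 with a variable hardware MTTF."""
    parts = [(hw_mttf_h, 100/60), (2893.0, 15/60)] + [(788.4, 1.0)] * 4
    a = 1.0
    for mttf, mttr in parts:
        a *= mttf / (mttf + mttr)
    return a

REDUNDANT_A_GC = 0.99992   # redundant subsystem availability, from the MRM above
for months in (12, 24, 36, 48, 60):
    a = gc_availability(months * 730.0)   # assuming ~730 h per month
    print(f"{months:2d} months: A_GC = {a:.5f}  (redundant: {REDUNDANT_A_GC})")
# Even at 60 months the non-redundant GC (~0.99482) stays below the
# redundant subsystem with 12-month hardware (~0.99992).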
VI. CONCLUSIONS AND FUTURE WORK

This paper investigated the benefits of a warm-standby replication mechanism in a Eucalyptus cloud computing environment. A hierarchical heterogeneous model was used to represent a redundant architecture and compare its availability to that of a non-redundant architecture. The results show an enhanced dependability of the proposed redundant system, evidenced by the growth from 2 to 4 nines of availability, as well as a decrease in the annual downtime from 46 hours to only 43 minutes. The results also demonstrate that the simple replacement of hardware by more reliable machines would not produce an improvement in system availability of the same magnitude.

As future work, we intend to analyze other scenarios where the components of the General Controller are separated on different machines. Another possible work is to verify, by means of
testbed experiments, the data consistency between the replicated servers, and in so doing, guarantee the reliability of the warm-standby redundancy approach.
REFERENCES

[1] Eucalyptus. (2012) Eucalyptus - the open source cloud platform. Eucalyptus Systems, Inc. Available at: http://open.eucalyptus.com/.
[2] A. Guimarães, P. Maciel, R. Matos Jr., and K. Camboim, "Dependability analysis in redundant communication networks using reliability importance," in Proc. of the 2011 Int. Conf. on Information and Network Technology (ICINT 2011), Chennai, 2011.
[3] R. Matos Junior, A. Guimaraes, K. Camboim, P. Maciel, and K. Trivedi, "Sensitivity analysis of availability of redundancy in computer networks," in Proc. of the Fourth Int. Conf. on Communication Theory, Reliability, and Quality of Service (CTRQ 2011), Budapest, 2011.
[4] A. Avizienis, J.-C. Laprie, B. Randell, and C. E. Landwehr, "Basic concepts and taxonomy of dependable and secure computing," IEEE Trans. Dependable Sec. Comput., vol. 1, no. 1, pp. 11-33, 2004.
[5] P. Maciel, K. S. Trivedi, R. Matias, and D. S. Kim, "Dependability modeling," in Performance and Dependability in Service Computing: Concepts, Techniques and Research Directions. Hershey: IGI Global, 2011.
[6] D. Sun, G. Chang, Q. Guo, C. Wang, and X. Wang, "A dependability model to enhance security of cloud environment using system-level virtualization techniques," in Proc. First Int. Conf. on Pervasive Computing, Signal Processing and Applications (PCSPA 2010), Harbin, 2010.
[7] M. Malhotra, "Power-hierarchy of dependability model types," IEEE Trans. on Reliability, vol. 43, no. 2, pp. 493-502, Sept. 1994.
[8] G. Bolch, S. Greiner, H. de Meer, and K. S. Trivedi, Queueing Networks and Markov Chains: Modeling and Performance Evaluation with Computer Science Applications, 2nd ed. John Wiley and Sons, 2001.
[9] M. K. Molloy, "Performance analysis using stochastic Petri nets," IEEE Trans. Comput., vol. 31, no. 9, pp. 913-917, Sept. 1982.
[10] T. Liu and H. Song, "Dependability prediction of high availability OSCAR cluster server," in Proc. of the 2003 Int. Conf. on Parallel and Distributed Processing Techniques and Applications, June 2003.
[11] W.-L. Yeow, C. Westphal, and U. Kozat, "A resilient architecture for automated fault tolerance in virtualized data centers," in IEEE Network Operations and Management Symposium (NOMS), 2010, pp. 866-869.
[12] V. Chaudhary, M. Cha, J. Walters, S. Guercio, and S. Gallo, "A comparison of virtualization technologies for HPC," in Advanced Information Networking and Applications (AINA 2008), 22nd Int. Conf. on, 2008, pp. 861-868.
[13] Heartbeat, "Linux-HA." [Online]. Available: http://www.linux-ha.org
[14] DRBD, "Distributed Replicated Block Device." [Online]. Available: http://www.drbd.org/
[15] P. Reisner, "DRBD - Distributed Replicated Block Device," in Proc. of the 9th Int. Linux System Technology Conference, Cologne, Sept. 2002.
[16] Sun Microsystems, Introduction to Cloud Computing Architecture, 1st ed. Sun Microsystems, Inc., Jun. 2009.
[17] Eucalyptus, Cloud Computing and Open Source: IT Climatology is Born, Eucalyptus Systems, Inc., Goleta, CA, 2010.
[18] ——, "Eucalyptus cloud computing platform - administrator guide," Eucalyptus Systems, Inc., Tech. Rep., 2010, version 1.6.
[19] ——, Eucalyptus Open-Source Cloud Computing Infrastructure - An Overview, Eucalyptus Systems, Inc., Goleta, CA, 2009.
[20] J. D, K. Murari, M. Raju, S. RB, and Y. Girikumar, Eucalyptus Beginner's Guide, UEC ed., 2010.
[21] Amazon. (2011) Amazon Elastic Block Store (EBS). Amazon.com, Inc. Available at: http://aws.amazon.com/ebs.
[22] D. S. Kim, F. Machida, and K. S. Trivedi, "Availability modeling and analysis of a virtualized system," in Dependable Computing, 2009. PRDC '09. 15th IEEE Pacific Rim Int. Symp. on, 2009, pp. 365-371.
[23] T. Hu, M. Guo, S. Guo, H. Ozaki, L. Zheng, K. Ota, and M. Dong, "MTTF of composite web services," in Parallel and Distributed Processing with Applications (ISPA), 2010 Int. Symp. on, Sept. 2010, pp. 130-137.
[24] M. Marwah, P. Maciel, A. Shah, R. Sharma, T. Christian, V. Almeida, C. Araújo, E. Souza, G. Callou, B. Silva, S. Galdino, and J. Pires, "Quantifying the sustainability impact of data center availability," SIGMETRICS Perform. Eval. Rev., vol. 37, pp. 64-68, Mar. 2010.
[25] W. Kuo and M. Zuo, Optimal Reliability Modeling: Principles and Applications. John Wiley & Sons, 2003.