Cluster Computing. DOI 10.1007/s10586-016-0682-6

ACCRS: autonomic based cloud computing resource scaling

Ziad A. Al-Sharif (1) · Yaser Jararweh (2) · Ahmad Al-Dahoud (2) · Luay M. Alawneh (1)

Received: 12 June 2016 / Accepted: 2 November 2016 © Springer Science+Business Media New York 2016

Abstract A cloud computing model gives cloud service providers the ability to retain multiple workloads on a single physical system. However, efficient resource provisioning and the management of possible system faults in the cloud can be challenging. Early fault detection provides room to recover from potential faults before they impact QoS. Current static techniques for fault management in computing systems are not sufficient to safeguard the QoS requested by cloud users, so new smart techniques are needed. This paper presents the ACCRS framework for cloud computing infrastructures, which aims to raise the system's utilization level, reduce cost and power consumption, and fulfil SLAs. The ACCRS framework employs the basic components of Autonomic Computing, which include state monitoring, planning, decision making, fault prediction, fault detection, and root cause analysis for recovery actions, to improve the system's reliability, availability, and utilization level by scaling resources in response to changes in the cloud system state.


Ziad A. Al-Sharif [email protected] Yaser Jararweh [email protected] Luay M. Alawneh [email protected]

1 Software Engineering Department, Jordan University of Science and Technology, Irbid 22110, Jordan

2 Computer Science Department, Jordan University of Science and Technology, Jordan

Keywords Cloud computing · Resource scaling · Autonomic computing · Quality of service · Energy efficiency

1 Introduction

A Cloud Service Provider (CSP) provides and maintains on-demand computing services for users with an acceptable Quality of Service (QoS). Cloud Users (CUs) are relieved of system maintenance, resource provisioning, and service continuity, which become the obligations of the CSP. This permits CUs to focus on advancing their business without spending time on system-related issues. Furthermore, the cloud computing model gives CSPs the ability to operate multiple workloads on a single physical system, which greatly reduces cost and power consumption and increases resource utilization. However, this model incurs a number of challenges, some of which relate to efficient resource provisioning (scheduling) and to potential system faults that may cause service interruption and consequently affect a CSP's profit, market share, and reputation. Service Level Agreements (SLAs) oblige CSPs to ensure service availability, reliability, and continuity. In a cloud system, the number of possible faults creates a critical challenge for CSPs and their SLAs. Early fault detection gives CSPs room to recover from faults before QoS is impacted.

On the other hand, Autonomic Computing in Cloud (ACC) is the cloud system's ability to manage itself given high-level objectives [1–4]. Cloud computing systems are growing large, complex, and costly to manage, while workloads and environment conditions tend to change rapidly. Thus, autonomic decisions and actions are needed. The goal is to make cloud computing systems and their applications capable of managing themselves with minimum human interference.


Thus, ACC tries to ensure the system's survivability: the system's ability to maintain near-optimal performance with minimum resources, protect itself from attacks, recover from faults, and react to changes in the environment by automatically reconfiguring its resources. Environment changes can be internal, such as excessive CPU utilization and high power consumption, and/or external, such as attacks and spikes in the incoming workloads; any of these can disturb the system's equilibrium [5]. The system must therefore be able to modify itself to counter the effects of environmental changes and maintain its equilibrium. The observed changes are analyzed to determine whether any of the essential variables exceeds its viability limits. If so, a predefined plan is triggered to determine the proper changes to inject into the current behavior of the system so that it accommodates these changes and returns the system to a stable state within the new environment. However, current static techniques for fault management in computing systems are not sufficient [6–10] to realize ACC and provide the QoS requested by CUs. Thus, new, smart, yet comprehensive procedures are needed.

This paper presents the on-demand Autonomic Cloud Computing Resource Scaling (ACCRS) framework. ACCRS performs resource provisioning for cloud environments and dynamically scales cloud resources based on the available system resources, the utilization level, and SLAs. It improves cloud system reliability and availability by applying proactive fault detection techniques to prevent fault occurrence where possible, and by employing reactive recovery techniques when faults occur. ACCRS provides a mechanism whereby changes in the cloud system's essential variables (i.e., performance, power, faults, security, etc.) trigger changes to the behavior of the computing system such that the system is brought back into equilibrium with its environment. ACCRS helps CSPs in several ways. First, it reduces costs and power consumption by increasing the utilization levels and efficiency of equipment and facilities. Second, it creates dynamic network policies that allow dealing with larger workloads and higher demands while maintaining reliability and availability. Finally, it increases the velocity at which IT can respond to business needs and satisfy the variety of those needs. The ACCRS framework combines resource scaling to optimize operational cost, root cause analysis of faults, and early fault detection and prevention with fast recovery to the system's normal state, identified as a safe zone.

The rest of this paper presents and evaluates the ACCRS framework. Section 2 highlights related research. Section 3 presents the details of the ACCRS framework. Section 4 presents the results generated using CloudExp. Finally, Section 5 concludes our findings and presents our planned future work.


2 Related work

Various researchers have tackled the utilization of a cloud system from different perspectives. Buyya et al. [11] presented a data analytics workflow engine, which is a prototype for dynamic resource provisioning; the system monitors the workflow and calls the resource manager to increase or decrease cloud resources. The Aneka system [12] provides a resource provisioning algorithm based on SLA orientation, comparing the time needed to accomplish the current jobs with the proposed SLA timeline. The Coasters system [13] offers uniform resource provisioning and access for clouds and grids; it aims to provide a uniform way to access multi-service systems, such as a Grid or a cloud, and its main goal is to meet both usability and performance goals. Chaisiri et al. [14] presented an optimal cloud resource provisioning (OCRP) algorithm that tries to minimize both under-provisioning and over-provisioning under demand and price uncertainty in cloud computing environments. Dejun et al. [15] presented a resource provisioning approach for web applications in the cloud; their approach relies on performance profiling, where each tier consists of multiple hosts running the same application. Scarce [16] deploys an agent at the server side that is responsible for managing resources and checking the system's health. A federated cloud environment [17] is a collection of IaaS providers that interact with each other in order to minimize the cost of resource provisioning. The authors in [18] presented a resource provisioning approach that gives users control over the resource manager. Limited look-ahead control (LLC) [19] uses model predictive control techniques to predict and solve the resource provisioning problem. In [20], the authors presented guided redundant submission (GRS) for high performance distributed computing (HPDC), which applies resource provisioning to slot allocation. Panda and Jana [21] proposed several task scheduling algorithms for heterogeneous multi-cloud systems. The authors in [22] presented resource provisioning using virtual machine multiplexing, which defines performance measurement and resource provisioning through SLAs. Finally, the authors in [23] presented dynamic resource provisioning using a multi-agent based technique.

An autonomic system can be viewed as a collection of autonomic components, each of which manages its internal behavior and its relationships with others in accordance with a set of predefined policies. An autonomic computing system has properties [24] such as self-optimizing, self-protecting, self-configuring, and self-healing. It should be noted that an autonomic computing system addresses these concerns in an integrated manner rather than treating them in isolation. Consequently, the system design paradigm takes a holistic approach that can integrate all these attributes seamlessly and efficiently.


In our work, we mainly focus on the self-configuring property of a cloud system. The ACCRS framework aims to provide the optimal resource allocation to satisfy the users' SLAs, reduce power consumption, and increase the resource efficiency level (i.e., a higher utilization level).

Algorithm 1 System State Analyses and Decision Making
1: procedure SSA-DMA
2:   WorkloadType ← SSA-WCA
3:   NeededHosts ← SSA-Predict
4:
5:   ▷ Check system for potential faults (i.e., hardware failure)
6:   while WorkloadType ∈ SafeZone ∧ FAULT == FALSE do
7:     DoNothing
8:   while WorkloadType ∉ SafeZone ∧ FAULT == FALSE do
9:     if WorkloadType == LIGHT then
10:      CloudState ← UU
11:      Decrease resources based on the number of NeededHosts
12:    if WorkloadType == HEAVY then
13:      CloudState ← OU
14:      if Resources == Available then
15:        Increase resources based on the number of NeededHosts
16:      else
17:        Add-to-WaitingQueue
18:  if FAULT == TRUE then
19:    ▷ Apply RCA techniques and identify faulty resources (i.e., hardware ID)
20:    if WorkloadType == LIGHT then
21:      CloudState ← UUF
22:      Decrease/increase resources based on the number of NeededHosts
23:      Migrate obstructed VMs
24:    else if WorkloadType == HEAVY then
25:      CloudState ← OUF
26:      if Resources == Available then
27:        Increase resources based on the number of NeededHosts
28:        Replace infected hosts
29:        Migrate obstructed VMs
30:      else
31:        Add-to-WaitingQueue
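The pseudocode above can be read as a single decision function. The following Python sketch is an illustrative, non-authoritative transcription of Algorithm 1; the function name, the action strings, and the simplification of RCA, VM migration, and queuing into placeholder messages are our own assumptions.

```python
def ssa_dma(workload_type, fault, needed_hosts, resources_available):
    """Sketch of Algorithm 1 (SSA-DMA): pick a cloud state and a scaling action.

    workload_type is 'LIGHT', 'HEAVY', or 'SAFE_ZONE' (from Algorithm 2);
    needed_hosts comes from the prediction step (Algorithm 3).
    """
    if workload_type == "SAFE_ZONE" and not fault:
        return "SAFE_ZONE", "do nothing"

    if not fault:
        if workload_type == "LIGHT":
            return "UU", f"decrease resources to {needed_hosts} hosts"
        if resources_available:
            return "OU", f"increase resources to {needed_hosts} hosts"
        return "OU", "add workload to waiting queue"

    # Fault branch: RCA locates the faulty host before rescaling (placeholder).
    if workload_type == "LIGHT":
        return "UUF", f"rescale to {needed_hosts} hosts and migrate obstructed VMs"
    if resources_available:
        return "OUF", f"add hosts ({needed_hosts}), replace faulty host, migrate VMs"
    return "OUF", "add workload to waiting queue"

print(ssa_dma("HEAVY", fault=False, needed_hosts=7, resources_available=True))
```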

Algorithm 2 System State Analyses and Workload Classification
1: procedure SSA-WCA
2:   cpu ← current level of CPU
3:   ram ← current level of RAM
4:   bw ← current level of bandwidth
5:
6:   ▷ CPU, RAM, and bandwidth upper thresholds are 80%, 86%, and 63%, respectively
7:   if cpu > 80% ∨ ram > 86% ∨ bw > 63% then return HEAVY
8:   else if cpu < 70% ∨ ram < 70% ∨ bw < 50% then return LIGHT
9:   else return SafeZone
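A direct Python transcription of this classifier might look as follows; this is a sketch that assumes the monitoring component supplies utilization values as fractions in [0, 1].

```python
def classify_workload(cpu, ram, bw):
    """Sketch of Algorithm 2 (SSA-WCA): classify the current workload.

    Above 80% CPU, 86% RAM, or 63% bandwidth the workload is HEAVY; below
    70% CPU, 70% RAM, or 50% bandwidth it is LIGHT; otherwise the system is
    inside its safe zone.
    """
    if cpu > 0.80 or ram > 0.86 or bw > 0.63:
        return "HEAVY"
    if cpu < 0.70 or ram < 0.70 or bw < 0.50:
        return "LIGHT"
    return "SAFE_ZONE"

# Example: 75% CPU, 72% RAM, and 55% bandwidth utilization sits in the safe zone.
print(classify_workload(0.75, 0.72, 0.55))  # -> SAFE_ZONE
```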

Fig. 1 The ACCRS framework

3 ACCRS framework

Figure 1 depicts the ACCRS framework and its major components, which are explained below.

3.1 System state monitoring (SSM)

This component is responsible for recording CPU, RAM, and bandwidth utilization levels, system throughput, and power consumption data.

3.2 System state analyses and decision making algorithm (SSA-DMA)

This component processes the data collected by the SSM component. It uses Algorithm 1 to make the appropriate decision. The data is first checked for possible hardware failure by measuring the system throughput; we assume that the system input must match the system output, given the cloud system flow and the number of active VMs (e.g., 50 VMs). In case of hardware failure, the Root Cause Analysis (RCA) algorithm [25,26] is used to find the fault origin (i.e., the host ID); RCA identifies the failed hosts so that they can be replaced. In an error-free state, the system utilization level is monitored to identify the workload intensity (Fig. 2). Algorithm 2 presents the Workload Classification Algorithm (WCA), which classifies the system's workload as heavy or light. WCA detects a heavy workload by measuring the utilization levels of the system's RAM, CPU, and bandwidth and checking whether any of them exceeds 86%, 80%, and 63%, respectively; these thresholds were identified experimentally. A system in a faulty state or under a heavy workload needs resource scaling (i.e., an increased number of hosts) in order to return to the safe zone, which lies between 70% and 80% utilization. IBM reports that the ideal system utilization level for near-optimal power consumption is 75% [27]. A light workload state is assumed when system utilization is below 70%. A system with a lightweight workload consumes unnecessary power, as in the heavy workload case, but with a low utilization level and low throughput. By decreasing the number of hosts to an optimal number, we can achieve a better system utilization level within the system's safe zone.


Fig. 2 System’s safe zone [5]

This also optimizes the power consumption and system performance without impacting the system's reliability, availability, or end-users' SLAs. ACCRS identifies a cloud system as being in one of five states:

1. Safe zone: the system performs properly and its utilization level is within 70–80%, which means the system has an optimal utilization level, power consumption, and QoS (the SLA is maintained).
2. Under-utilization (UU): the system encounters a light workload; power consumption remains high while throughput is low.
3. Under-utilization with fault (UUF): the system encounters a light workload together with a faulty resource (e.g., a hardware failure).
4. Over-utilization (OU): the system encounters a heavy workload that pushes the system utilization and throughput to a high level; a further increase in utilization will cause the system to drop incoming workloads or delay them in a waiting queue.
5. Over-utilization with fault (OUF): the system encounters a heavy workload together with faulty resources (e.g., a hardware failure).

3.3 Host-level resource scaling (H-LRS)

This component has a Global Cloud Manager (GCM) that performs resource scaling based on the best cluster configuration determined by the previous component (the SSA-DMA). Experimentally, and based on a typical cloud system configuration, we found that one host under a heavy workload can handle 5–6 VMs. These results are used to predict the amount of resources needed when increasing or decreasing the number of hosts.


The resource scaling process decides whether to scale resources up or down in order to reach near-optimal utilization and power consumption while maintaining a high level of QoS. Continuous system monitoring is required to collect the system's utilization information. The main objective is to keep the system within its safe zone as much as possible; if the system leaves its safe zone, ACCRS tries to return it by scaling system resources up or down. Furthermore, a two-phase prediction algorithm is used to predict the optimal number of hosts needed (see Algorithm 3). Through our experimentation, we found that for both light and heavy workloads we can predict the number of hosts to remove or add in order to save power, increase utilization, and prevent the system from dropping or delaying user requests. The system specification contains the current number of hosts (H_N), the host RAM (H_R), the host CPU (H_C), the safe zone threshold, the number of VMs (V_N), the maximum memory (V_R) and CPU (V_C) specifications a VM can have, and the safe zone boundaries.

– Phase 1: Lines 13–16 of Algorithm 3 produce a prediction of how many resources are needed for the current workload. These equations were formulated experimentally, and their values feed the second phase, which produces the near-optimal number of hosts needed to serve the current workload at an acceptably high utilization level.
– Phase 2: ACCRS predicts the number of hosts as the maximum (line 25 of Algorithm 3) of the two values produced by lines 19 and 22 of Algorithm 3.

Algorithm 3 Host-Level Resource Prediction (H-LRP)
1: procedure SSA-Predict
2:
3:   ▷ Host specifications
4:   H_N ← total number of hosts
5:   H_R ← total RAM per host
6:   H_C ← total CPU per host
7:
8:   ▷ VM specifications
9:   VM_N ← total number of VMs
10:  VM_R ← total RAM per VM
11:  VM_C ← total CPU per VM
12:
13:  RAMNeededHosts ← H_N × H_R × SafeZone_R
14:  CPUNeededHosts ← H_N × H_C × SafeZone_C
15:  RAMNeededVMs ← VM_N × VM_R × SafeZone_R
16:  CPUNeededVMs ← VM_N × VM_C × SafeZone_C
17:
18:  ▷ Needed RAM
19:  RAMUsagePrediction ← RAMNeededVMs ÷ RAMNeededHosts
20:
21:  ▷ Needed CPU
22:  CPUUsagePrediction ← CPUNeededVMs ÷ CPUNeededHosts
23:
24:  ▷ Predicted number of hosts
25:  #NeededHosts ← max(RAMUsagePrediction × H_N, CPUUsagePrediction × H_N)
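The Python sketch below transcribes Algorithm 3; the default safe-zone factors are taken as the 86% RAM and 80% CPU upper borders reported in this section, and rounding the result up to a whole host is our assumption, since the algorithm leaves rounding unspecified.

```python
import math

def predict_needed_hosts(host_count, host_ram, host_cpu,
                         vm_count, vm_ram, vm_cpu,
                         safe_zone_ram=0.86, safe_zone_cpu=0.80):
    """Sketch of Algorithm 3 (H-LRP): two-phase host-count prediction."""
    # Phase 1: capacity the current hosts offer at the safe-zone level and the
    # demand the current VMs place on RAM and CPU (lines 13-16 of Algorithm 3).
    ram_needed_hosts = host_count * host_ram * safe_zone_ram
    cpu_needed_hosts = host_count * host_cpu * safe_zone_cpu
    ram_needed_vms = vm_count * vm_ram * safe_zone_ram
    cpu_needed_vms = vm_count * vm_cpu * safe_zone_cpu

    # Phase 2: usage predictions (lines 19 and 22) and the larger of the two
    # host counts (line 25); rounding up to a whole host is an assumption.
    ram_usage_prediction = ram_needed_vms / ram_needed_hosts
    cpu_usage_prediction = cpu_needed_vms / cpu_needed_hosts
    return math.ceil(max(ram_usage_prediction * host_count,
                         cpu_usage_prediction * host_count))
```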

Fig. 3 ACCRS: multi-level resource scaling

Our experimental results show that the system's operational safe zone corresponds to a utilization of approximately 70–80%. Moreover, the upper border of the safe zone is about 86%, 80%, and 63% utilization for RAM, CPU, and bandwidth, respectively.

3.4 VM-level resource scaling (VM-LRS)

Unlike the H-LRS and its GCM, this component has a Local Cloud Manager (LCM) that performs resource scaling based on the utilization level of the VM itself. Its main objective is to reconfigure (i.e., scale) the VMs' specifications in order to cope with dynamic changes in the workload. For example, a VM might carry a light workload even though it is configured to handle a heavy one. By dynamically changing a VM's resource configuration, ACCRS can add more VMs to the same active host without employing new hosts. As shown in Fig. 3, the GCM administers the total system flow, whereas the LCM is responsible for monitoring the VMs and the cloud user's demand (the utilization level on the VM itself). Algorithm 4 presents the VM-LRS resource scaling, which decides which VMs are scaled down in order to free more resources. This approach increases the system throughput without requiring extra new resources, at the cost of a slight increase in power consumption.

Algorithm 4 VM-Level Resource Scaling
1: procedure VM-LRS
2:   for each VM_i ∈ VMs do
3:     if actual utilization of VM_i > 65% then
4:       if utilization ∈ 80–86% of reserved resources then
5:         Do not scale this VM
6:       else
7:         Decrease VM_i's reserved resources
8:     else
9:       Decrease VM_i's reserved resources
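An illustrative Python sketch of Algorithm 4 is given below; the per-VM dictionaries, the 10% shrink step, and the guard that keeps a reservation above actual usage are assumptions, since the algorithm only states when a reservation should be decreased, not by how much.

```python
def vm_level_rescale(vms, step=0.10):
    """Sketch of Algorithm 4 (VM-LRS): shrink over-provisioned VM reservations.

    Each VM is a dict with 'reserved' and 'used' resource amounts. A VM whose
    usage falls in the 80-86% band of its reservation is left untouched; any
    other VM has its reservation reduced so the freed capacity can host new VMs.
    """
    for vm in vms:
        usage_ratio = vm["used"] / vm["reserved"]
        if 0.80 <= usage_ratio <= 0.86:
            continue  # reservation already matches demand: do not scale this VM
        # Shrink by `step`, but never below what the VM is actually using.
        vm["reserved"] = max(vm["used"], vm["reserved"] * (1.0 - step))
    return vms

# Example: a VM reserving 1024 MB but using only 700 MB gets its reservation reduced.
print(vm_level_rescale([{"reserved": 1024, "used": 700}]))
```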

4 Experimental and simulated results

4.1 Experimental setup

CloudExp [28] is built on top of CloudSim, a cloud computing modeling and simulation tool [29]. CloudExp is a rich, comprehensive, yet simple, easy-to-use, and efficient cloud computing modeling and simulation toolkit. It adds new features such as support for different types of workloads. Users can build a cloud infrastructure and customize all of its aspects, from the host processing nodes to the network topology. It also allows users to integrate SLAs and other business aspects, and it includes an extensive workload generator capable of representing real-world cloud applications accurately. It allows users to understand the different cloud system components and their roles in the whole system. Users can modify various components and their parameters, run simulations, and analyze the results. CloudExp's modular design allows users to integrate new components and extend existing ones easily and effectively.

We based our workload on the Rain workload generator [30]. Each user (or task) is assigned to a certain generator and a thread that is executed in the assigned VM. When the thread finishes executing, it generates a summary (e.g., status, execution results, etc.). The experiments run for 48 h with a space-shared VM allocation policy. Table 1 shows the system setup specifications used in the experiments. Our experimental setup consists of 10 hosts with fixed resources. The ACCRS prediction algorithm can predict the near-optimal number of hosts needed to run the current workload. Algorithms 1, 2, and 3 allow us to predict the number of hosts needed to minimize power consumption and reach the optimal utilization level by returning the system to its safe zone.
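As a rough, purely illustrative back-of-the-envelope check (not a result reported in the paper), the predict_needed_hosts sketch given after Algorithm 3 can be fed with the Table 1 specifications; treating the dual-core host as 2 × 5000 MIPS and using the midpoint of the VM RAM range are our assumptions.

```python
# Hypothetical illustration using the Table 1 specifications (RAM in MB, CPU in MIPS).
hosts_needed = predict_needed_hosts(
    host_count=10, host_ram=4608, host_cpu=2 * 5000,  # 4.5 GB RAM, dual-core host
    vm_count=35, vm_ram=768, vm_cpu=2000)             # 768 MB is the 512-1024 MB midpoint
print(hosts_needed)  # about 7 hosts, i.e. roughly 5 VMs per host under this sketch
```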


Table 1 System's specification

Component | RAM                  | CPU (MIPS)       | Bandwidth
Host      | 4.5 GB               | 5000 (Dual Core) | 1,000,000
VM        | Random (512–1024) MB | 2000 (1 Core)    | Random (100,000–150,000)

Fig. 4 System state monitoring with increasing workloads

Fig. 5 System state monitoring with random workloads

Fig. 6 Increasing demand on cloud (system utilization with resource scaling)

Fig. 7 Increasing demand on cloud (power consumption with resource scaling)

4.2 Results

The efficacy of the ACCRS framework has been evaluated through simulation in CloudExp [28]. The experiments consist of creating multiple cloud environments that simulate heavy and light workloads. The setup contains several types of workload patterns tested on stable and unstable (i.e., faulty) environments. A stochastic model is applied to simulate different scenarios of cloud environments, users, and user requests (workloads), and different types of cloud environments are used. Figure 4 presents the normal system flow under an increasing workload: the RAM, CPU, and bandwidth utilization and the power consumption increase as the workload grows over time. Figure 5 presents the cloud system behavior with workloads that change over time; this experiment shows how the system utilization and power consumption increase or decrease with the workload intensity.

4.3 Host-level resource scaling (H-LRS)


Figure 6 presents the system with an increasing workload and a fixed number of hosts. The system starts dropping VMs when it reaches 86% RAM, 80% CPU, and 63% bandwidth utilization, which may indicate a possible fault or an over-utilization state. Consequently, the system state diverges from the safe zone and ACCRS needs to scale up resources. By increasing the number of hosts and redistributing the workloads, the system returns to its normal state (safe zone). On the other hand, Fig. 7 shows the corresponding increase in power consumption, which is a trade-off we are willing to pay in order to prevent any possible SLA violation.

Fig. 8 RAM utilization level for under-utilization state (with resource scaling)

Fig. 9 CPU utilization level for under-utilization state (with resource scaling)

Fig. 10 BW utilization level for under-utilization state (with resource scaling)

Fig. 11 Fault injection in stable environment

Figures 8 and 9 present the RAM and CPU utilization levels in the under-utilization state with a light workload. When the hosts' allocated resources exceed what the current workload needs, the system works properly but with a low utilization level and high power consumption. In this case, ACCRS scales down the system resources (i.e., the number of hosts) in order to increase the utilization level and reduce power consumption. Figure 10 presents the bandwidth utilization level under a light workload. ACCRS reduces power consumption by removing hosts (i.e., scaling down) and redistributing the workload across the remaining hosts. We can save approximately 8% in power consumption by running 9 out of 10 hosts, and up to 22% by running 8 out of 10 hosts for the same workload. These figures also show that no further scaling down can be made once 8 hosts are reached, since the system utilization is then near its maximum.

Any further resource scaling would push the system out of its safe zone unless the workload changes.

In Fig. 11, we injected faults (i.e., infected hosts) at time frames 9–11; the throughput then starts to drop and the utilization level starts to rise. Faults are injected by reducing the system's RAM by 75, 50, and 25% in the respective time frames. After the infected host is located, it is replaced by a new one in order to bring the system back to its normal state. As shown in Fig. 11, the system returns to its normal state in time frames 13–18.

In our experimental setup, the system can handle up to 50 VMs. In another experiment, we used 60 VMs with the same system specifications; the system then needed to queue the extra 10 VMs until resources became available. Table 2 presents the time and power consumption after scaling up resources versus queuing the extra VMs. The ACCRS framework provides a 20.5% power saving compared with the queuing technique and also reduces the execution time by 5.2%.


Table 2 Resource scaling versus queuing

           | ACCRS | Queuing | Gain (%)
Power (W)  | 42100 | 52300   | 20.5
Time (min) | 260   | 285     | 5.2

4.4 VM-level resource scaling (VM-LRS)

Figure 12 shows the user's allocated RAM and CPU in one host and the actual usage for them in a single VM created on the same host. The VM is configured with about 30% more resources than it actually needs. By reconfiguring the VM's allocated resources, we gain extra resources to create new VMs on the same host. Figure 13 presents the new utilization levels after applying ACCRS's LCM, with the VM's utilization levels raised close to the reserved utilization level.

Fig. 12 Power consumption before and after the VM-LRS is applied

Fig. 13 System throughput before and after the VM-LRS is applied

5 Conclusion and future work

This paper presented the on-demand Autonomic Cloud Computing Resource Scaling (ACCRS) framework for cloud computing infrastructures. By applying global and local ACCRS to a cloud system, we can increase system availability and reliability, reduce power consumption, and keep the utilization rate within a safe zone. ACCRS scales the number of physical hosts and VMs up or down and provides early detection of hardware failures, keeping the system running in a near-optimal power consumption and performance safe zone. One limitation of this work is the use of simulation models for the experimental results and the evaluation. As future work, we aim to apply the ACCRS approach to a real cloud infrastructure, which will enable us to compare our simulated results with ones generated from a real environment.


References


1. Parashar, M., Hariri, S.: Autonomic Computing: Concepts, Infrastructure, and Applications. CRC Press (2006)
2. Al-Dahoud, A., Al-Sharif, Z., Alawneh, L., Jararweh, Y.: Autonomic cloud computing resource scaling. In: 4th International IBM Cloud Academy Conference (ICACON 2016), University of Alberta, Edmonton, Canada, IBM (2016)
3. Jararweh, Y., Al-Ayyoub, M., Darabseh, A., Benkhelifa, E., Vouk, M., Rindos, A.: Software defined cloud: survey, system and evaluation. Future Gener. Comput. Syst. 58, 56–74 (2016)
4. Darabseh, A., Al-Ayyoub, M., Jararweh, Y., Benkhelifa, E., Vouk, M., Rindos, A.: SDDC: a software defined datacenter experimental framework. In: Future Internet of Things and Cloud (FiCloud), 2015 3rd International Conference on, pp. 189–194 (2015)
5. Jararweh, Y.: Autonomic Programming Paradigm for High Performance Computing. PhD thesis, University of Arizona, Tucson (2010). AAI3423763
6. Dai, Y., Xiang, Y., Zhang, G.: Self-healing and hybrid diagnosis in cloud computing. In: Cloud Computing, pp. 45–56. Springer (2009)
7. Bhaduri, K., Das, K., Matthews, B.L.: Detecting abnormal machine characteristics in cloud infrastructures. In: Data Mining Workshops (ICDMW), 2011 IEEE 11th International Conference on, pp. 137–144, IEEE (2011)
8. Alhosban, A., Hashmi, K., Malik, Z., Medjahed, B.: Self-healing framework for cloud-based services. In: Computer Systems and Applications (AICCSA), 2013 ACS International Conference on, pp. 1–7, IEEE (2013)
9. Buyya, R., Ramamohanarao, K., Leckie, C., Calheiros, R.N., Dastjerdi, A.V., Versteeg, S.: Big data analytics-enhanced cloud computing: challenges, architectural elements, and future directions. arXiv preprint, arXiv:1510.06486 (2015)
10. Islam, S., Keung, J., Lee, K., Liu, A.: An empirical study into adaptive resource provisioning in the cloud. In: IEEE International Conference on Utility and Cloud Computing (UCC 2010), p. 8 (2010)
11. Buyya, R., Garg, S.K., Calheiros, R.N.: SLA-oriented resource provisioning for cloud computing: challenges, architecture, and solutions. In: Cloud and Service Computing (CSC), 2011 International Conference on, pp. 1–10, IEEE (2011)
12. Vecchiola, C., Chu, X., Buyya, R.: Aneka: a software platform for .NET-based cloud computing. High Speed Larg. Scale Sci. Comput. 18, 267–295 (2009)
13. Hategan, M., Wozniak, J., Maheshwari, K.: Coasters: uniform resource provisioning and access for clouds and grids. In: Utility and Cloud Computing (UCC), 2011 Fourth IEEE International Conference on, pp. 114–121, IEEE (2011)
14. Chaisiri, S., Lee, B.-S., Niyato, D.: Optimization of resource provisioning cost in cloud computing. IEEE Trans. Serv. Comput. 5(2), 164–177 (2012)
15. Dejun, J., Pierre, G., Chi, C.-H.: Resource provisioning of web applications in heterogeneous clouds. In: Proceedings of the 2nd USENIX Conference on Web Application Development, pp. 5–5, USENIX Association (2011)
16. Bonvin, N., Papaioannou, T.G., Aberer, K.: Autonomic SLA-driven provisioning for cloud applications. In: Proceedings of the 2011 11th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, pp. 434–443, IEEE Computer Society (2011)
17. Toosi, A.N., Calheiros, R.N., Thulasiram, R.K., Buyya, R.: Resource provisioning policies to increase IaaS provider's profit in a federated cloud environment. In: High Performance Computing and Communications (HPCC), 2011 IEEE 13th International Conference on, pp. 279–287, IEEE (2011)
18. Juve, G., Deelman, E.: Resource provisioning options for large-scale scientific workflows. In: eScience, 2008. eScience'08. IEEE Fourth International Conference on, pp. 608–613, IEEE (2008)
19. Kusic, D., Kephart, J.O., Hanson, J.E., Kandasamy, N., Jiang, G.: Power and performance management of virtualized computing environments via lookahead control. Clust. Comput. 12(1), 1–15 (2009)
20. Kee, Y.-S., Kesselman, C.: Grid resource abstraction, virtualization, and provisioning for time-targeted applications. In: Cluster Computing and the Grid, 2008. CCGRID'08. 8th IEEE International Symposium on, pp. 324–331, IEEE (2008)
21. Panda, S.K., Jana, P.K.: Efficient task scheduling algorithms for heterogeneous multi-cloud environment. J. Supercomput. 71(4), 1505–1533 (2015)
22. Meng, X., Isci, C., Kephart, J., Zhang, L., Bouillet, E., Pendarakis, D.: Efficient resource provisioning in compute clouds via VM multiplexing. In: Proceedings of the 7th International Conference on Autonomic Computing, pp. 11–20, ACM (2010)
23. Al-Ayyoub, M., Jararweh, Y., Daraghmeh, M., Althebyan, Q.: Multi-agent based dynamic resource provisioning and monitoring for cloud computing systems infrastructure. Clust. Comput. 18(2), 919–932 (2015)
24. Nurmi, D., Wolski, R., Grzegorczyk, C., Obertelli, G., Soman, S., Youseff, L., Zagorodnov, D.: The Eucalyptus open-source cloud-computing system. In: Cluster Computing and the Grid, 2009. CCGRID'09. 9th IEEE/ACM International Symposium on, pp. 124–131, IEEE (2009)
25. Bhaumik, S.K.: Root cause analysis in engineering failures. Trans. Indian Inst. Metals 63(2), 297–299 (2010)
26. Zhu, Q., Tung, T., Xie, Q.: Automatic fault diagnosis in cloud infrastructure. In: Cloud Computing Technology and Science (CloudCom), 2013 IEEE 5th International Conference on, vol. 1, pp. 467–474, IEEE (2013)
27. IBM Global Business Services: Business strategy for cloud providers. http://www.itworldcanada.com/archive/Documents/whitepaper/ITW157B_BusinessStretegyForCloudProviders.pdf (2009). Accessed 1 June 2016
28. Jararweh, Y., Jarrah, M., Alshara, Z., Alsaleh, M., Al-Ayyoub, M.: CloudExp: a comprehensive cloud computing experimental framework. Simul. Model. Pract. Theory 49, 180–192 (2014)
29. Calheiros, R.N., Ranjan, R., Beloglazov, A., De Rose, C.A., Buyya, R.: CloudSim: a toolkit for modeling and simulation of cloud computing environments and evaluation of resource provisioning algorithms. Software: Practice and Experience 41(1), 23–50 (2011)
30. Beitch, A., Liu, B., Yung, T., Griffith, R., Fox, A., Patterson, D.A.: Rain: A Workload Generation Toolkit for Cloud Computing Applications. University of California, Tech. Rep. UCB/EECS-2010-14 (2010)

Ziad A. Al-Sharif is currently an assistant professor at Jordan University of Science and Technology, Irbid, Jordan. He joined the Department of Software Engineering in February 2010. Dr. Al-Sharif received his Ph.D. degree in Computer Science in December 2009 from the University of Idaho, USA, and his M.S. degree in Computer Science in August 2005 from New Mexico State University, USA. His research interests are in digital forensics, cloud computing, software engineering, and collaborative virtual environments.

Yaser Jararweh received his Ph.D. in Computer Engineering from the University of Arizona in 2010. He is currently an associate professor of Computer Science at Jordan University of Science and Technology, Jordan. He has co-authored about seventy technical papers in established journals and conferences in fields related to cloud computing, HPC, SDN, and Big Data. He was a TPC Co-Chair of the IEEE Globecom 2013 International Workshop on Cloud Computing Systems, Networks, and Applications (CCSNA), and he is a steering committee member for CCSNA 2014 and CCSNA 2015 with ICC. He is the General Co-Chair of the IEEE International Workshop on Software Defined Systems SDS-2014 and SDS-2015, and he chairs many IEEE events such as ICICS, SNAMS, BDSN, and IoTSMS. Dr. Jararweh has served as a guest editor for many special issues in established journals and is the steering committee chair of the IBM Cloud Academy Conference.

Ahmad Al-Dahoud is a Ph.D. student at the University of Bradford, UK. He received his M.Sc. degree in computer science from Jordan University of Science and Technology (JUST). His research interests include cloud and autonomic computing.

Luay M. Alawneh is an assistant professor in the Department of Software Engineering at Jordan University of Science and Technology, Irbid, Jordan. His research interests are software engineering, software maintenance and evolution, and high performance computing systems. Alawneh received a Ph.D. in electrical and computer engineering from Concordia University.