Computers and Electrical Engineering xxx (2015) xxx–xxx


Resource management in cloud computing: Taxonomy, prospects, and challenges

Saad Mustafa (a), Babar Nazir (a,*), Amir Hayat (b), Atta ur Rehman Khan (a), Sajjad A. Madani (b)

(a) Department of Computer Science, COMSATS Institute of Information Technology, Abbottabad, Pakistan
(b) Department of Computer Science, COMSATS Institute of Information Technology, Islamabad, Pakistan

Article info

Article history: Received 28 October 2014; Received in revised form 29 July 2015; Accepted 29 July 2015; Available online xxxx

Keywords: Cloud computing; Resource management; Resource allocation; Virtual machine migration; Hybrid cloud; Mobile cloud computing

Abstract

Cloud computing has emerged as a popular computing paradigm for hosting large computing systems and services. Recently, significant research has been carried out on Resource Management (RM) techniques that focus on the efficient sharing of cloud resources among multiple users. RM techniques in the cloud are designed for computing- and workload-intensive applications that have different optimization parameters. This study presents a comprehensive review of RM techniques and elaborates an extensive taxonomy based on their distinct features. It highlights the evaluation parameters and platforms that are used to evaluate RM techniques. Moreover, it presents design goals and research challenges that should be considered while proposing novel RM techniques.

© 2015 Elsevier Ltd. All rights reserved.

1. Introduction

During the past few years, the computational world has evolved tremendously due to the constant increase in demand for high-end computational devices [1]. This evolution resulted in the emergence of new computational paradigms, such as cluster computing, grid computing, and cloud computing. Among these paradigms, cloud computing has gained significant popularity [2]. Cloud computing is offered in three forms, namely: (a) public cloud, (b) private cloud, and (c) hybrid cloud [1,3,4]. Moreover, different sets of services are provided that can be broadly placed into three categories, namely: (a) Software as a Service (SaaS), (b) Infrastructure as a Service (IaaS), and (c) Platform as a Service (PaaS) [1]. Cloud computing is based on Service-Oriented Architecture (SOA) that uses the concepts of virtualization and distributed computing [1]. In cloud computing SOA, access to a shared pool of resources is provided via the network, and the resources on hand can be configured according to users' demands [3]. One of the key aspects of cloud computing and virtualization is Resource Management (RM). RM is a process that deals with the procurement and release of resources [2]. Virtualization techniques are used for flexible and on-demand resource provisioning [3]. To do so, for each received task, either a new VM is created or the task is placed on an existing VM of the same user [2]. Once the task is completed, all the acquired resources are released and become part of the free resource pool. Resource assignment is performed on the basis of a Service Level Agreement (SLA) that is agreed between the service provider

Reviews processed and recommended for publication to the Editor-in-Chief by Associate Editor Dr. D. Gomes.

* Corresponding author.

E-mail address: [email protected] (B. Nazir). http://dx.doi.org/10.1016/j.compeleceng.2015.07.021 0045-7906/© 2015 Elsevier Ltd. All rights reserved.

Please cite this article in press as: Mustafa S et al. Resource management in cloud computing: Taxonomy, prospects, and challenges. Comput Electr Eng (2015), http://dx.doi.org/10.1016/j.compeleceng.2015.07.021


and the customer. The SLA contains details of the service level that is required by a tenant. Moreover, it contains information about the payment process and the SLA violation penalty [5]. As the cloud computing paradigm is market-oriented, traditional system-centric RM architectures are unsuitable for such systems [5]. System-centric architectures do not provide any incentives to the service providers and treat all requests with equal importance. Consequently, explicit market-oriented techniques are devised for cloud environments that are capable of catering for on-demand resource provisioning and resource sharing among users. Such RM techniques provide economic incentives to both the service providers and the customers. Service providers share their resources among multiple tenants on a pay-as-you-use basis. In this work, we intend to provide an insight into resource management techniques and highlight the issues that need to be improved. To the best of our knowledge, there currently exists only one survey paper [6] on this topic, and it lacks a taxonomy of RM techniques, the detailed working of each technique, critical discussions, an overview of evaluation parameters, design goals, and the research challenges that should be considered while designing RM techniques. Although a few related surveys [7,8] highlight research issues and design goals with reference to a single aspect of RM techniques, there is a need for a comprehensive study. This study presents a taxonomy of RM techniques, their detailed working, critical discussions, evaluation parameters, evaluation platforms, design goals, and research challenges. For this study, we examined over 250 research papers and selected over 100. The selected papers provide a representative sample of the most significant work. A mixed-method systematic review strategy was used for the selection of papers, and at least five papers were selected for each category.
Moreover, only those papers were considered that are published in reputable journals and conferences. The keywords and strings used for searching for the papers are: "resource management + cloud computing", "resource allocation + cloud computing", "energy-aware + resource management + clouds", "SLA-aware + resource management + clouds", "network load + resource management + clouds", "mobile cloud + resource management", and "hybrid cloud + resource management". The major contributions of this study are as follows:

1. It presents a taxonomy of RM techniques that is based on the major RM metrics. The metrics used to provide RM solutions include energy efficiency, SLA-awareness, network load minimization, load balancing, profit maximization, hybrid cloud computing, and mobile cloud computing.
2. It presents the detailed working of the existing techniques and highlights the research challenges that are addressed by each technique. Moreover, it highlights the issues that are left unconsidered in these techniques.
3. It discusses various performance evaluation parameters and provides the definitions and equations of the commonly used ones. In addition, it reviews the parameters that researchers use to evaluate their RM techniques.
4. It highlights the various platforms that are used for the evaluation of RM techniques.
5. It provides detailed discussion and recommendations on the design goals that must be considered while designing a new RM technique, highlighting the importance of each design goal for the performance of an RM technique.
6. Lastly, it highlights open research issues that are of high importance and require attention.

The rest of the paper is organized as follows. Section 2 presents a brief overview of the existing surveys. Section 3 illustrates the taxonomy of RM techniques.
Section 4 presents the major performance evaluation parameters. Section 5 provides an overview of the evaluation parameters and platforms. Section 6 provides discussions on design goals. Section 7 highlights various research challenges, and Section 8 presents conclusions.

2. Existing surveys

In the past few years, RM techniques for cloud environments have received great attention from researchers. Various RM techniques have been devised that consider the research challenges that can hinder the smooth provisioning and maintenance of cloud resources. However, only a limited number of survey/review articles are available to help new researchers understand the key concepts and the working of the existing techniques. In this section, we discuss various studies that provide some insight into RM in cloud environments. Jennings and Stadler [6] provide a survey on RM in cloud computing that discusses the scope of RM, various types of resources, enabling technologies, functions of RM, and various RM techniques. The authors also discuss workload management and some research challenges. In [7], Lin discusses the working of various resource scheduling algorithms and classifies them based on multiple factors, such as time, cost, and energy. The main idea is to assist users in selecting a suitable scheduling algorithm based on the type of service they want to use. Rygielski and Kounev [9] discuss various network virtualization techniques and their impact on QoS-aware RM. This study presents the impact of virtualization techniques on the performance of data centre networks. Various research issues are discussed that are faced during the performance modeling and design of RM techniques. In [10], the authors present an architecture, principles, and an algorithm for energy-efficient cloud environments. In addition, they provide a survey of energy-efficient resource allocation techniques and their research challenges. Marojevic et al. [8] discuss numerous RM techniques that manage resources in Software Defined Radio (SDR)

Table 1
Summary of RM techniques. The table maps each surveyed technique — MBFD [10], PCA-BFD [14], Addis et al. [15], Ardagna et al. [16], Ardagna et al. [17], Wei et al. [18], Ergu et al. [19], García et al. [20], CA-PROVISION [21], CMRM [22], Ali et al. [23], FRA [24], EALARM [25], Al Sallami and Al Daoud [26], LBMM [27], Ye and Chen [28], Jung and Sim [29], LPBP [14], CFMV [30], Malik et al. [31], Grewal and Pateriya [32], Choudhury et al. [33], Altmann and Kashef [34], PANDA [35], SMDP [36], Ge et al. [37], Ikram et al. [38], and O'Sullivan and Grigoras [39] — to the RM metrics it addresses: energy efficiency, SLA-awareness, load balancing, network load minimization, price/revenue handling, hybrid clouds, and mobile cloud computing.

clouds. The SDR cloud is a large-scale wireless cloud that provides services to users. In this study, the authors discuss SDR cloud RM techniques. Kansal and Chana [11] and Shahapure and Jayarekha [12] present surveys on load balancing RM techniques. They discuss various load balancing strategies and evaluation parameters. Most of the existing studies focus on a single metric, such as QoS, energy consumption, software defined radio based clouds, or load balancing. None of the studies provides a comprehensive taxonomy, the detailed working, and a critical discussion of the existing techniques. Moreover, the aforementioned studies do not discuss evaluation parameters, evaluation platforms, design goals, and research challenges. Therefore, in this study, we focus on the highlighted aspects.

3. Resource management techniques

RM is considered one of the important aspects of cloud computing for providing performance isolation and efficient use of the underlying hardware [6,13]. In cloud environments, almost all resources are virtualized and shared among multiple users [2]. Virtualization brings some challenging tasks related to resource management. The main research challenges/metrics of RM are energy efficiency, SLA violations, load balancing, network load, profit maximization, hybrid clouds, and mobile cloud computing (MCC). As shown in Table 1, significant research has been conducted to address these metrics and offer various solutions. The proposed RM metrics are accompanied by multiple new challenges; for instance, energy efficiency and network load reduction introduce SLA violations. Similarly, the reduction of SLA violations increases energy consumption and hinders profit growth. Moreover, RM techniques designed for public or private clouds cannot be directly applied to hybrid and mobile clouds, due to the difference in service architectures. It is evident from Table 1 that some researchers have provided multi-criteria optimized solutions.
Multi-criteria optimization tends to provide the best possible solution by optimizing multiple RM metrics simultaneously. Moreover, multi-criteria optimization can add new dimensions and challenges to RM if the metrics under consideration are conflicting. This is because conflicting metrics are inter-dependent, and optimizing one metric degrades the performance of another. A few examples of conflicting metrics include energy efficiency and SLA violations, energy efficiency and network load, and SLA violations and profit maximization.
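To make the trade-off concrete, a multi-criteria objective is often expressed as a weighted sum of the conflicting metrics. The following sketch is purely illustrative: the metric values, weights, and function names are invented for this example and are not taken from any of the surveyed techniques.

```python
def placement_cost(energy_kwh, sla_violations, w_energy=0.5, w_sla=0.5):
    """Toy weighted cost for a candidate VM placement.

    Conflicting metrics: consolidating VMs lowers energy consumption
    but tends to raise SLA violations, so the weights encode how the
    provider values one metric against the other.
    """
    return w_energy * energy_kwh + w_sla * sla_violations

# Two hypothetical placements: consolidated vs. spread out.
consolidated = placement_cost(energy_kwh=10.0, sla_violations=4)  # 7.0
spread_out   = placement_cost(energy_kwh=16.0, sla_violations=1)  # 8.5

# With equal weights the consolidated placement wins; shifting
# weight toward SLA compliance reverses the choice.
best = min(consolidated, spread_out)
```

Changing the weights moves the preferred operating point along the trade-off curve, which is exactly why over-emphasizing a single metric in a multi-criteria RM technique can produce unexpected outcomes.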


Fig. 1. Taxonomy of RM Techniques.

Fig. 1 shows the taxonomy of RM techniques based on the above discussion and Table 1. The techniques are classified into multiple categories based on the research problem and the metric used. In the case of multi-criteria optimized solutions, the metric category is decided based on the most prominent metric. In the following sub-sections, we present the detailed working of the selected techniques and highlight their pros and cons. Moreover, we highlight the major contributions of all techniques and provide recommendations for further improvement.

3.1. Energy-aware RM techniques

Energy efficiency is one of the core issues that needs to be addressed in cloud systems [40]. This is because energy consumption not only increases the power expenses of service providers, but also plays a role in increasing CO2 emissions [41]; the machines and fossil fuels used to generate power are main contributors to CO2 emissions. One potential way to address this issue is workload consolidation, where energy consumption is minimized by consolidating more workload on fewer servers. VMs hosted on lightly loaded servers are migrated to comparatively higher-workload servers, and the idle servers are switched off. Some of the existing energy-aware RM techniques are discussed below. Modified Best Fit Decreasing (MBFD) [10] is based on the Best Fit Decreasing (BFD) algorithm that is used for bin packing. The MBFD algorithm sorts the VMs in descending order based on their CPU requirements. After sorting, the VMs are assigned to hosts based on a power model. The power model checks the change in energy consumption (Pu) of the servers, and places a VM on the server that shows the minimal change in energy consumption. Moreover, MBFD uses dynamic threshold values to keep the usage of each server within range and to avoid SLA violations. The lower threshold value indicates that a server is underutilized, whereas the upper threshold value warns the service provider that the SLA may be violated.
If any threshold is breached, then a VM or a set of VMs is migrated to other server(s). Traditional BFD assigns the VMs either to servers having minimum computational capacity, or to servers having maximum unutilized capacity. The issue with the BFD algorithm is that it does not consider the energy consumption of servers during VM placement. Hence, a VM may be placed on a server that has low computational capacity but consumes high power. Consequently, the BFD algorithm is not energy efficient and requires optimization. To address this issue, the authors in [14] present a BFD-based energy-efficient solution. Power and Computing Capacity-Aware BFD (PCA-BFD) addresses the highlighted issue by assigning VMs to the server that has the highest computational capacity. By doing so, PCA-BFD minimizes power consumption and the number of utilized servers. Addis et al. [15] focus on the PaaS architecture of cloud systems. Their main aims are to minimize energy consumption and response time, and to maximize availability. The proposed RM technique is based on the distributed hierarchical framework discussed in [42], in which non-linear optimization of resources is performed over various timescales. As shown in Fig. 2, the central manager (CM) resides at the root of the hierarchy and performs its basic tasks every 24 h. Moreover, it performs class partitioning and server partitioning. Server partitioning divides the available resources of a server into multiple VMs, whereas each received task is placed on a specific set of servers according to its class. On the other hand, the jobs performed by an application manager (AM) are divided into two categories, represented as T2 and T3. T2 jobs are performed every hour, whereas T3 jobs are performed every 15 min. The hourly jobs performed by the AM are server switching and application placement, and the jobs performed every 15 min are load balancing, capacity allocation, and frequency scaling.
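The power-aware placement rule behind MBFD can be sketched as follows. This is a minimal sketch, not the authors' implementation: the linear power model, the host representation, and all numeric values are illustrative assumptions, and the dynamic utilization thresholds are omitted for brevity.

```python
def power(util):
    """Illustrative linear power model: idle draw plus a
    utilization-proportional part (values are assumptions)."""
    P_IDLE, P_MAX = 100.0, 250.0
    return P_IDLE + (P_MAX - P_IDLE) * util

def mbfd_place(vms, hosts):
    """Sketch of MBFD: sort VMs by CPU demand (descending), then place
    each VM on the host whose power draw increases the least.

    vms   -- list of CPU demands
    hosts -- dict: host name -> (used capacity, total capacity)
    Returns dict: VM index -> chosen host name.
    """
    placement = {}
    for i, demand in sorted(enumerate(vms), key=lambda x: -x[1]):
        best_host, best_delta = None, float("inf")
        for name, (used, cap) in hosts.items():
            if used + demand > cap:
                continue  # this host cannot fit the VM
            delta = power((used + demand) / cap) - power(used / cap)
            if delta < best_delta:
                best_host, best_delta = name, delta
        if best_host is None:
            raise RuntimeError(f"no host can fit VM {i}")
        used, cap = hosts[best_host]
        hosts[best_host] = (used + demand, cap)  # commit the placement
        placement[i] = best_host
    return placement
```

With a linear power model the marginal power increase is proportional to demand/capacity, so this rule favors hosts where the added utilization is relatively small; the actual technique additionally migrates VMs away when the dynamic thresholds are breached.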
In [16], the authors predict the workload and allocate resources dynamically to the running applications. A central controller is used to allocate and manage all the resources. RM decisions are made on an hourly basis to minimize the overhead they incur. RM decisions include server power-up, shutdown, and VM migration from one server to another, and these operations consume a considerable amount of energy and network resources. Hence, they cannot be performed very often. All of the techniques discussed above are devised to minimize energy consumption on the service provider's end. PCA-BFD [14] focuses only on energy consumption and does not consider the remaining RM issues, such as SLA violations, network load, profit maximization, and load balancing. However, MBFD [10] also minimizes the SLA violations that are incurred due to workload consolidation. Moreover, Addis et al. [15] and Ardagna et al. [16] focus on revenue maximization along with energy consumption. Although the aforementioned techniques handle multiple RM issues, there is still scope for improvement. It is even more challenging to consider conflicting issues while providing solutions. Conflicting issues include the joint optimization of energy consumption and SLA violations, energy consumption and network load, and energy consumption and


Fig. 2. Distributed hierarchical framework.

load balancing. However, multi-criteria optimization needs to be employed with care, as over-emphasizing any specific metric may result in unexpected outcomes. For example, if we try to minimize energy consumption by using workload consolidation, the network load will increase because migrations have to be performed.

3.2. SLA-aware RM techniques

Quality of service (QoS) is one of the important issues that should be considered while managing resources [9]. As service providers intend to provide the best suitable performance to their clients, a service level agreement (SLA) is signed between the two parties. The SLA contains information regarding the required level of service and the price. It also contains a penalty clause that is enforced in case of agreement violation. Service providers should avoid violations and keep a check on them while providing services to customers. To address this issue, various researchers provide different solutions, a few of which are discussed as follows. Ardagna et al. [17] propose capacity allocation algorithms to ensure the SLA and handle fluctuating workloads. The proposed algorithms interact with geographically dispersed resource controllers, and can also redirect the load whenever congestion is encountered in the network. Moreover, if required, an application is run on multiple VMs and the workload is evenly distributed among them. As the workloads fluctuate, a workload predictor is used to forecast future workload requirements, and the capacity is changed on the basis of the resulting predictions. Furthermore, SLA violations are avoided by keeping the response time low during inter-VM communications. In case of a higher response time, the VM is migrated to another physical machine. The main goal of the proposed algorithms is to cater to fluctuating workloads while keeping the SLA violations low. In [18], the main focus of the research is to minimize the chances of resource under/over-provisioning caused by reactive RM.
The proposed technique is based on the framework discussed in [43], which allocates resources to different tasks. The algorithm uses different agents for the provisioning and termination of resources. Moreover, it has a prediction module that predicts the future needs of services. It also has the ability to manage resources among different services of workflows to avoid unnecessary allocation and de-allocation of resources. In [19], a task-oriented resource allocation technique is proposed that uses a pairwise comparison matrix method and the analytical hierarchy method to rank the resource allocation process. Based on these methods, the available resources and user


preferences are identified, and resources are allocated to the tasks based on the assigned ranks. Moreover, a method based on an induced bias matrix is proposed to find the inconsistent elements and improve the consistency ratio. In [20], the authors introduce a platform named Cloudcompaas for PaaS-based clouds. Cloudcompaas is an SLA-aware platform that manages resources throughout the resource lifecycle. Moreover, an extension for SLA has been added to this platform, keeping in mind the web-based architecture of cloud computing. Furthermore, Cloudcompaas provides an SLA model that deals with higher-level metrics and can easily handle the flexible needs of multiple users. Additionally, a framework has been integrated that dynamically minimizes SLA violations by elastically configuring the resources. The main focus of the aforementioned techniques is to avoid SLA violations. Ardagna et al. [17], Wei et al. [18], Ergu et al. [19], and García et al. [20] provide exclusive solutions for SLA-related issues. Therefore, multi-objective solutions that not only focus on SLA violations but also consider other research challenges should be provided. Multi-objective optimizations, such as SLA violations and energy minimization, SLA violations and profit maximization, and SLA violations and network load, can be really interesting to handle. Moreover, the performance of the above-mentioned techniques needs to be further improved, as the results show that SLA violations can still be recorded.

3.3. Market-oriented RM techniques

Cloud computing is a market-oriented paradigm, and the main objective of service providers in this paradigm is to increase their profit. For this purpose, service providers use multiple types of solutions, such as workload consolidation, SLA violation avoidance, network virtualization, and minimization of network load. All these solutions have some related issues.
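The interplay between SLA penalties and provider profit can be made concrete with a toy per-request revenue function. All names, limits, and numbers below are invented for illustration and do not come from any of the surveyed techniques.

```python
def request_profit(response_time, sla_limit=0.5, reward=1.0, penalty=2.0):
    """Toy per-request profit: the provider earns `reward` when the
    response time meets the SLA limit and pays `penalty` otherwise."""
    return reward if response_time <= sla_limit else reward - penalty

def total_profit(response_times, energy_cost=0.0):
    """Aggregate profit over a batch of requests minus energy cost.

    Illustrates why aggressive consolidation can backfire: the energy
    cost it saves may be outweighed by SLA penalties."""
    return sum(request_profit(t) for t in response_times) - energy_cost

# Hypothetical scenario: consolidation saves 1.0 in energy cost but
# pushes two requests past the SLA limit.
spread = total_profit([0.2, 0.3, 0.4, 0.4], energy_cost=2.0)  # 4 - 2 = 2.0
packed = total_profit([0.2, 0.3, 0.6, 0.7], energy_cost=1.0)  # 0 - 1 = -1.0
```

Under these (assumed) numbers, the cheaper-to-run consolidated configuration is the less profitable one, which is exactly the tension between energy minimization, SLA compliance, and profit that the surveyed market-oriented techniques try to balance.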
Therefore, there is a need to develop solutions that are market-oriented and beneficial to service providers. In this section, we discuss some of the latest market-oriented solutions that intend to enhance service providers' revenue and profit. An auction-based resource allocation technique is proposed in [21]. The argument is that auction-based resource allocation can be more beneficial for service providers than fixed-price resource provisioning. In the proposed technique, the user has to bid for the advertised resources, which are only assigned if she wins the auction. Moreover, user demand is also considered during resource provisioning; prior to this work, auction-based techniques had not considered user demand. The authors in [22] propose a multi-layer RM technique for cloud computing. In this technique, a SaaS provider provides services to users while itself using the services of IaaS providers. Users pay the SaaS provider for the services provided to them. For calculating the amount that a user has to pay, a unique optimal function is used that is based on the level of satisfaction the user shows. The SaaS provider, on the other hand, acts as both a service provider and a customer at the same time: it provides services to users and uses the resources of the IaaS provider, hosting its VMs on the IaaS provider's resources. The IaaS provider uses an optimal resource assignment technique to increase its own revenue. Moreover, it also updates the prices regularly based on the resource demand and sends these prices to the SaaS provider. The main purpose of the authors in [23] is to maximize the revenue of service providers by meeting the SLA and minimizing energy consumption. For this purpose, a framework is proposed in which each server has a dynamic voltage/frequency scaling (DVFS) module. It is assumed that a server cannot be switched on or off and that there is a common cost of VM migration.
Furthermore, a hybrid optimization technique is proposed that intends to solve issues related to load balancing, DVFS, resource allocation, and VM-based service placement. In [24], two SLA models are proposed for cloud computing systems. The Gold SLA model is based on the average response time, the maximum arrival rate for client requests, a reward value for each serviced request, and a penalty if the average request response time is missed. The second model, Bronze SLA, specifies the maximum arrival rate and a utility function that calculates the profit for each request depending on its response time. Afterwards, a technique named force-directed resource assignment (FRA) is proposed for the optimization problem. In this technique, an initial solution is generated by processing clients in greedy order and assigning them resources one by one. Afterwards, the rate is fixed and resource sharing is improved by optimization steps; finally, a resource consolidation technique inspired by force-directed search is applied. The techniques presented in [21,22] only optimize the revenue and cost of service. On the other hand, Ali et al. [23] maximize the profit of the service provider and minimize the user's cost of service, along with minimizing energy consumption. Minimization of energy consumption can have a direct impact on the profit of the service provider, as it reduces electricity bills. However, it may have a negative impact if energy minimization leads to SLA violations and the service provider has to pay a penalty. Therefore, while optimizing profit and energy consumption, one must also consider SLA violations. Moreover, the authors in [24] consider SLA violations along with profit maximization, but they do not consider other research challenges. Therefore, the aforementioned techniques can be further improved by introducing metrics like SLA violations, network load, and energy consumption. 3.4.
Load-balanced RM techniques

Load balancing is a term used to describe the concept of sharing workload among multiple resources [11,12], and it is another important feature of computing systems [5]. As shown in Fig. 3, tasks are migrated between physical machines after the application of a load balancing algorithm. There is a need to devise RM techniques that allow resources to share their workloads. Researchers have been working to address this issue


Fig. 3. Load balanced VM placement.

and they have come up with different ideas. Some well-known and recent techniques are discussed in this section. A novel model is proposed in [25] that dynamically balances load among cloud resources. The resource utilization of each server is checked. If the resources of a server are overloaded, then the workload of that server is shifted to a server that is underutilized. Whereas, when two machines are underutilized, the workload of one machine is transferred to the other machine, which is then turned off. In case of workload shifting failure, the system can be scaled up or down accordingly. If the shifting failure is due to the non-availability of free resources, then the system is scaled up by activating more machines. Conversely, if the migration fails, the newly switched-on machine is switched off again. For better management of servers and VMs, a physical layer has also been developed in the tree structure of a distributed hash table (DHT). In [26], an artificial neural network (ANN) based load balancing technique is proposed. To distribute the load equally among all the servers, this technique uses the back propagation algorithm. The demand of each user is predicted and resources are allocated according to the predicted demand, but the number of active servers at any given time depends on the demand of users at that specific time. As a result, the number of active servers is minimized, which leads to low energy consumption. Furthermore, the relation between energy consumption and carbon emission is highlighted in this paper. Min–Min and Max–Min algorithms are proposed for RM in [27]. The Min–Min algorithm starts with the tasks that have not yet been assigned to servers. First, it calculates the completion time of all the tasks on the different servers. The task with the minimum completion time is selected and is assigned to the corresponding server.
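The Min–Min selection rule just described can be sketched as follows. The completion-time matrix and the task/server names are hypothetical, and real schedulers would also account for factors such as data transfer time; the sketch folds the per-server load update into a ready-time map.

```python
def min_min(completion, tasks, servers):
    """Sketch of Min–Min scheduling.

    completion[t][s] -- hypothetical execution time of task t on server s
    On each round, pick the (task, server) pair with the smallest
    completion time (server ready time + execution time), assign it,
    and update that server's ready time.
    """
    assignment = {}
    unassigned = set(tasks)
    ready = {s: 0.0 for s in servers}  # time at which each server frees up
    while unassigned:
        t, s = min(((t, s) for t in unassigned for s in servers),
                   key=lambda p: ready[p[1]] + completion[p[0]][p[1]])
        assignment[t] = s
        ready[s] += completion[t][s]   # the chosen server is now busier
        unassigned.remove(t)
    return assignment
```

Max–Min follows from the same loop by instead selecting the task whose best achievable completion time is largest, which prevents long tasks from being deferred to the end of the schedule.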
After this assignment, the task is removed from the unassigned tasks list, and the calculated completion times of the remaining tasks are updated on the server that is hosting the task. This process keeps repeating until the list is empty. The Max–Min algorithm works in the same way, with only one difference: it selects the task with the maximum completion time. The authors in [28] provide a model that considers an n-dimensional set of resources for each VM, and agents are used to monitor these resources. Along with the model, two game-theory-based techniques are devised that balance load and allocate resources, respectively. The load balancing technique minimizes the load of a VM if it increases in any dimension, whereas the resource allocation technique uses bin packing for VM placement. Moreover, pure Nash equilibrium [44] is investigated for the aforementioned problems, and its inefficiency is calculated in terms of the price of anarchy and the price of stability. The price of stability and the least price of anarchy for both games are 1 and n, respectively, whereas the maximum price of anarchy for the server load balancing game is n + 1 − n/m and for the VM placement game it is n + 16/5, where m is the number of servers. The techniques discussed in this section balance load among various physical machines. Load balancing combined with energy consumption or network load can provide an interesting solution. In energy-efficient systems, the number of active physical machines is minimized with the help of workload consolidation; therefore, applying load balancing in such a scenario can be quite difficult due to the highly populated physical machines. On the other hand, load balancing requires the migration of tasks between hosts, and this can increase the network load. Thus, while proposing a load-balanced solution, one must consider the implementation scenario and carefully select additional RM metrics. 3.5.
Network load aware RM Network load is the amount of traffic that is passing through a network at some specific moment in time. In cloud computing, network load can incur due to information sharing between resource managers, VM placement, inter-VM communication, and VM migrations. High network loads result in performance degradation of the system as the service provider will have to wait for VM placements and critical inter-VM communication will be delayed. Therefore, there is a need Please cite this article in press as: Mustafa S et al. Resource management in cloud computing: Taxonomy, prospects, and challenges. Comput Electr Eng (2015), http://dx.doi.org/10.1016/j.compeleceng.2015.07.021


to minimize the amount of traffic that passes through a network. The above mentioned activities that increase network load cannot be stopped; however, the traffic they generate can be minimized. In this section, we discuss some state-of-the-art techniques that try to minimize network load.

In [29], a technique is provided for service providers to select a data centre that can best serve the user's interest. A user sends a request to the service provider, and resources are assigned on the basis of an adaptive resource allocation algorithm. The algorithm selects a data centre on the basis of two factors: the first is the distance between the user and the data centre, and the second is the workload on that data centre. Once the data centre is decided, the task is placed on one of its servers with the help of a VM.

In [14], the authors propose a lazy approach that performs only the most beneficial migrations to keep the VM placement optimal. The algorithm keeps the number of migrations low by performing a slow transition from the existing VM placement to the new one. In the first step, the Low Perturbation Bin Packing (LPBP) algorithm sorts the list of servers in descending order of energy consumption. After sorting, the server at the top of the list is offloaded by migrating a set, or even all, of its VMs to the server(s) with the lowest energy consumption. However, migrations are only performed when all the VMs can be offloaded from the server. If a server cannot be fully offloaded, no migration is performed, the server is removed from the list, and the algorithm selects the next server in the list.

In [30], a scenario is considered in which clusters of VMs are placed on a cluster of servers. Each virtual cluster (VC) provides only a specific type of service to the users, and it is the responsibility of the whole VC to provide the agreed QoS. However, a VM's resource requirements may change, which can help in consolidating VMs on fewer servers.
Moreover, a genetic algorithm is provided for VM consolidation that creates an optimized system state, which represents the VM-server mapping and the resources assigned to each VM. When a VM's resource allocation changes, a new system state is created, and the system must shift from the older state to the newer one. However, this transition results in system overhead. To tackle this problem, an algorithm is provided that calculates the transition time and minimizes the reconfiguration cost.

In case of high latency, inter-process communication takes more time and ultimately increases the execution time of a distributed application. Therefore, a model is proposed in [31] to minimize the inter-process communication latency of distributed applications. The model uses a scheduler that assigns machines to applications and creates groups on the basis of network latency. Moreover, VMs with high inter-process communication are placed within the same group to minimize the overall latency and execution time.

Some of the techniques discussed in this section handle more than one RM metric. Jung and Sim [29] and Malik et al. [31] only minimize network load, and can be further extended to introduce multi-metric optimization. LPBP [14] and CFMV [30], on the other hand, minimize network load along with energy consumption. In such techniques, an optimal solution can be hard to find due to the conflicting nature of RM metrics; therefore, a set of candidate solutions is identified, and the most suitable one is selected based on defined criteria.
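The all-or-nothing offload rule described above for LPBP [14] can be sketched as follows (a simplified, illustrative reading of the technique; the server/VM records and the first-fit placement are our own assumptions):

```python
# Lazy migration sketch: try to empty the most energy-hungry server, but only
# if ALL of its VMs fit on lower-energy servers; otherwise migrate nothing
# from it and move on to the next server in the list.

def try_offload(servers):
    """servers: dicts {'energy': float, 'free': float, 'vms': [sizes]}.
    Empties the first (most energy-hungry) server whose VMs all fit elsewhere."""
    order = sorted(servers, key=lambda s: s['energy'], reverse=True)
    for src in order:
        if not src['vms']:
            continue                                # nothing to offload
        targets = sorted((s for s in order if s is not src),
                         key=lambda s: s['energy'])  # lowest energy first
        free = {id(t): t['free'] for t in targets}   # tentative free capacity
        plan, feasible = [], True
        for vm in sorted(src['vms'], reverse=True):  # place large VMs first
            for t in targets:
                if free[id(t)] >= vm:
                    free[id(t)] -= vm
                    plan.append((vm, t))
                    break
            else:
                feasible = False                     # one VM does not fit:
                break                                # perform no migration at all
        if feasible:
            for vm, t in plan:                       # commit all migrations
                t['free'] -= vm
                t['vms'].append(vm)
            src['vms'].clear()
            return True
    return False                                     # no server can be fully emptied
```

The tentative `free` map is what makes the rule lazy: migrations are only committed when the whole server can be emptied in one pass.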

3.6. RM techniques for hybrid/federated cloud

RM in hybrid clouds is another topic under consideration by researchers. As shown in Fig. 4, a hybrid cloud is the combination of private and public clouds. In such environments, an organization owns a private cloud and offers its resources to in-house users. When internal resources are not sufficient to accommodate the workloads, the organization uses the services of public clouds. The major decision to be made in these environments is when to use public cloud resources. To address this issue, researchers have proposed the solutions discussed in this section.

The authors in [32] propose a rule-based resource manager for hybrid clouds. Users' requests are categorized into two types: critical data/tasks and secondary data/tasks. Critical data/tasks are given higher priority, whereas secondary data/tasks have low priority. Moreover, to provide security, critical data/tasks are hosted on private clouds. Tasks with low priority or secondary data can use both public and private cloud resources, but public cloud resources are only used when all the resources of the private cloud are fully exhausted.

In [33], the authors propose an architecture that uses both private and public clouds. In the given architecture, requests for VM provisioning are divided into three priority classes, named low, medium, and high. In the low priority class, a user can only request one VM, whereas in the medium priority class, two VMs can be requested; in both cases, the VMs are placed on the private cloud. In the high priority class, three VMs are created: two are placed on the private cloud, whereas the third is placed on the public cloud. Using this technique, the resources of the private cloud are managed and the burden is shared between the public and private clouds.
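The rule-based decision in the spirit of [32] can be sketched as a simple placement function (illustrative only; the task fields and capacity model are hypothetical):

```python
# Rule-based hybrid placement sketch: critical tasks stay on the private cloud;
# secondary tasks overflow to the public cloud only when private capacity
# is exhausted.

def place_task(task, private_free, public_free):
    """task: {'critical': bool, 'size': int}. Returns 'private', 'public', or None."""
    if task['critical']:
        # Critical data/tasks must be hosted on the private cloud only.
        return 'private' if private_free >= task['size'] else None
    if private_free >= task['size']:       # secondary tasks prefer the private cloud
        return 'private'
    if public_free >= task['size']:        # overflow to the public cloud
        return 'public'
    return None

print(place_task({'critical': True, 'size': 4}, 2, 10))   # None: no private room
print(place_task({'critical': False, 'size': 4}, 2, 10))  # public
```

Returning `None` for an unplaceable critical task reflects the security rule: a critical workload is rejected rather than sent to the public cloud.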
In [34], the authors present a cost model for federated clouds that considers all the cost factors, such as electricity, hardware, software, labor, business premises, and service. The factors also include traffic- and deployment-related costs, such as the service placement vector, data traffic matrix, traffic cost rate matrix, and number of deployments. Using these factors, the cost model tends to minimize the total spending on computational services taken from public clouds. Furthermore, the model facilitates flexible service placement in federated clouds.

In [35], the authors present a framework named Pareto Near-optimal Deterministic Approximation (PANDA) for the scheduling of Bag-of-Tasks (BoT) applications. BoT applications are based on extremely parallel independent tasks [45]. PANDA schedules an incoming BoT application on the resources of both private and public clouds with the help of a Fully Polynomial-Time Approximation Scheme (FPTAS) [46]. FPTAS generates a Pareto-optimal schedule that provides the best


Fig. 4. Hybrid cloud.

trade-off point between cost and performance. Moreover, ISOMAP (Isometric Feature Mapping) [47] is used to check the practicality of the presented framework.

The techniques discussed in this section provide solutions for hybrid cloud environments. As hybrid clouds combine private and public clouds, RM techniques for such clouds have to decide when public cloud resources should be used. Grewal and Pateriya [32] and Choudhury et al. [33] address this question and make the decision between the use of private and public cloud resources; however, they do not consider the metrics discussed in the preceding sections. On the other hand, Altmann and Kashef [34] and PANDA [35] handle RM metrics such as SLA violations, pricing, and revenue. None of the aforementioned techniques provides multi-criteria optimization.

3.7. RM techniques for mobile clouds

In recent times, mobile cloud computing (MCC) has evolved by bringing mobile devices into the domain of cloud computing [48]. Traditionally, MCC is based on a client-agent architecture in which mobile devices can only use the resources available in the cloud [49]. However, recent research studies work on sharing the resources of the mobile devices themselves, as modern devices have ample resources. Such resource-sharing architectures are known as cooperation-based architectures. Some of the proposed RM techniques for MCC are discussed in this section.

According to [36], an MCC is a combination of multiple domains, and each domain comprises cloud resources. The major issue that needs to be addressed is the handling of the cloud resources present in these domains, so that continuous service can be provided to users. To handle this issue, a technique is proposed to share load among multiple domains, as shown in Fig. 5.
Moreover, the service request decision is formulated on the basis of a Semi-Markov Decision Process (SMDP) to minimize the number of service rejections and increase customer satisfaction. Furthermore, the decision of service transfer is taken on the basis of both income and expenses. It is concluded that the technique significantly increases customer satisfaction and minimizes the number of service disruptions.

Authors in [37] suggest that in a heterogeneous cloud infrastructure, the calculation of energy dissipation highly depends on the mapping between cloud servers and mobile devices. Therefore, a game-theoretic technique is proposed to minimize the overall power consumption. In the proposed technique, each mobile device acts as a player whose main task is to migrate workload to one of the available servers so as to minimize the overall energy consumption. Moreover, an algorithm is devised to achieve the Nash equilibrium in polynomial time. It is concluded that the given technique is able to reduce the energy consumption of mobile devices and servers in mobile clouds.


Fig. 5. Mobile cloud service domains & inter-domain transfer.

In [38], the authors argue for adopting a computing model inspired by natural chemical phenomena. The proposed nature-inspired model can be used to manage resources in mobile cloud environments, and it can be further evolved for service modeling and social computing purposes. To address these challenges, the computational model Chemistry for Context Awareness (C2A) [50] is improved with the help of the Higher Order Chemical Language (HOCL) [51] and High Level Petri-net Graphs (HLPNG) [52]. Moreover, two types of application dynamics, namely service composition and social community identification, are considered for the evaluation of C2A.

In [39], the authors design a middleware to provision the energy, bandwidth, and cloud resources required for mobile clouds. Users can assign their tasks to the middleware with the help of a thin-client application and, in response, receive results after the completion of the tasks. Moreover, the middleware assigns a Cloud Personal Assistant (CPA) to each user that handles all the tasks on behalf of the user. The CPA interacts with existing clouds by using web services, assigns tasks to the cloud, and delivers the corresponding results to the user. Although the CPA does not instantiate cloud resources, it can access cloud resources such as storage.

In this section, the majority of the work provides solutions to provision resources for mobile cloud users. O'Sullivan and Grigoras [39] provide an energy-efficient solution, as energy is one of the leading issues in mobile systems. Moreover, mobile clouds introduce a number of research challenges, such as energy efficiency, connectivity, network load, latency, security, and scalability. As mobile devices have limited energy, load balancing in cooperation-based architectures is hard to achieve due to the high energy consumption involved in the migration process. Furthermore, SLA violations can be encountered due to high latency, limited bandwidth, and connectivity issues.

4. Performance evaluation parameters

Performance evaluation is a mandatory process that must be performed to check whether a designed technique provides the desired results. The performance evaluation process compares the actual performance with the expected one, and its results determine the success of the given technique. A new technique is evaluated on the basis of various performance parameters under different testing conditions. In this section, we present some important parameters that are currently used by researchers to evaluate their RM techniques. Moreover, we provide their definitions and the formulas that can be used to calculate them.

4.1. Throughput

In cloud computing, tasks are executed on remote servers with the help of VMs, and the results are sent to the user. To check the performance of a cloud service, the throughput parameter is used. In cloud computing, throughput means the number of tasks completed in a certain period of time [26]. A high system throughput means that the system takes less time to generate


results of the provided task. In case of low throughput, users may be attracted to service providers offering higher throughput. Throughput can be calculated using the formula

Throughput = J_{total} - J_{remaining}    (1)

where J_{total} is the total number of tasks (jobs) received and J_{remaining} is the number of tasks (jobs) still in progress.

4.2. Network overhead

In a cloud environment, there are chances that interdependent tasks are assigned to different clusters or different servers. In such scenarios, whenever interdependent tasks communicate, network traffic is generated that can lead to network overhead. Another aspect that can increase network overhead is VM migration: with frequent VM migrations, the network will be overloaded most of the time. Therefore, the network overhead parameter is used to check the performance of RM techniques. Network load can be determined using the following equation [14]:

Load(F, t_1, t_2) = \sum_{m=1}^{v} \sum_{k=1}^{v} \sum_{x=1}^{s} \sum_{y=1}^{s} C_{mk}^{t_1,t_2} f_{mx} f_{ky} d_{xy}    (2)

where C^{t_1,t_2} is a V x V matrix whose entry C_{mk}^{t_1,t_2} represents the amount of data exchanged between V_m and V_k within the time interval t_1 - t_2; f_{mx} indicates whether VM V_m is hosted on server S_x: if V_m is hosted on S_x then the value of f_{mx} is one, otherwise it is zero. Finally, d_{xy} is the cost of exchanging an abstract data unit between server S_x and server S_y.
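Because f_{mx} and f_{ky} are 0/1 indicators, the quadruple sum in Eq. (2) collapses to a sum over VM pairs using each VM's host; a direct computation might look like this (the traffic matrix and cost matrix below are hypothetical):

```python
# Direct computation of Eq. (2): total communication cost over [t1, t2].
# C[m][k]: data exchanged between VM m and VM k in the interval;
# host[m]: index of the server hosting VM m (so f_mx = 1 iff host[m] == x);
# d[x][y]: cost of exchanging one abstract data unit between servers x and y.

def network_load(C, host, d):
    total = 0.0
    for m in range(len(C)):
        for k in range(len(C)):
            total += C[m][k] * d[host[m]][host[k]]
    return total

# Two VMs on different servers exchanging 5 units each way, inter-server cost 2:
C = [[0, 5], [5, 0]]
d = [[0, 2], [2, 0]]
print(network_load(C, host=[0, 1], d=d))  # 20.0 (both directions counted)
```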

4.3. VM migration time

VM migration is one of the important aspects of cloud environments. VMs are migrated for several purposes, such as load balancing, energy saving, SLA violation avoidance, and network load minimization. The migration process takes some time to move a VM from one server to another, and that time is known as the VM migration time. In order to increase system performance, the VM migration time should be minimized. In [53], the following formula is used to calculate the migration time:

T_{m_j} = M_j / B_j    (3)

where T_{m_j} is the migration time, M_j is the amount of memory used by VM_j, and B_j is the available bandwidth.

4.4. Number of VM migrations

Another important parameter that should be considered while evaluating an RM technique is the number of VM migrations. As discussed previously, a higher number of VM migrations increases the network load and results in performance degradation. The following equation can be used to calculate the number of migrations during a given time interval:

Migrations(F, t_1, t_2) = \sum_{x=1}^{s} \int_{t_1}^{t_2} Mig_x(F) \, dt    (4)

where F represents the current placement of VMs, and Mig_x(F) is the number of migrations of server S_x in the time interval t_1 - t_2 for the placement F.

4.5. Resource utilization

In cloud computing, optimal use of resources has a huge impact on the overall profit of the system. High resource utilization can increase profit and reduce energy consumption by minimizing the number of resources in use. Therefore, an RM technique should be evaluated on the basis of overall resource utilization with the help of the following equation [14]:

U_x(F, t) = \sum_{k=1}^{V} f_{xk} \cdot \frac{Req\_CPU_k(t)}{CPU_x}    (5)

Eq. (5) presents the utilization U_x(F, t) of a server S_x at a specific time t. F represents the placement of VMs, whereas f_{xk} indicates whether a VM V_k is hosted on server S_x or not: if V_k is hosted on S_x then the value of f_{xk} is one, otherwise it is zero. CPU_x represents the total computation capacity of S_x, and Req_CPU_k(t) is the amount of CPU capacity required by V_k at the specified time.
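Eq. (5) can be computed directly from a VM-to-server mapping (a minimal sketch; the capacities and demands below are hypothetical):

```python
# Eq. (5) as code: the utilization of server x is the sum of the CPU demands
# of the VMs it hosts, normalised by the server's total CPU capacity.

def utilization(host, req_cpu, cpu_x, x):
    """host[k]: server hosting VM k; req_cpu[k]: CPU demand of VM k at time t;
    cpu_x: total CPU capacity of server x."""
    return sum(req_cpu[k] for k in range(len(host)) if host[k] == x) / cpu_x

# Server 0 (capacity 2000 MIPS) hosts VMs 0 and 2 needing 500 and 700 MIPS:
print(utilization(host=[0, 1, 0], req_cpu=[500, 300, 700], cpu_x=2000, x=0))  # 0.6
```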


4.6. Energy consumption

As discussed in the previous section, energy efficiency is one of the major RM metrics, and a lot of research has been carried out to achieve it. As energy-efficient RM techniques tend to minimize energy consumption, the impact of a proposed technique on the energy consumption of the system should be checked. To calculate the power consumption of a given server at time t with placement F, we can use the following equation [10,14]:

P_x(F, t) = 0.7 P_{max}^{x} + 0.3 P_{max}^{x} \cdot U_x(F, t)    (6)

where P_{max}^{x} represents the power consumed by the server when it is fully utilized, and U_x(F, t) is the utilization of the server at time t. The total energy consumption of all the servers between times t_1 and t_2 can be calculated using the following equation [10,14]:

Energy(F, t_1, t_2) = \sum_{x=1}^{s} \int_{t_1}^{t_2} P_x(F, t) \, dt    (7)
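Eqs. (6) and (7) can be approximated over a discrete utilization trace (an illustrative sketch; the trace, time step, and Wh units are our own assumptions):

```python
# Linear power model of Eq. (6): 70% of peak power is consumed when idle and
# the remaining 30% grows proportionally with utilisation. Eq. (7) is
# discretised as a sum over fixed time steps of length dt (in hours).

def power(p_max, u):
    return 0.7 * p_max + 0.3 * p_max * u          # Eq. (6)

def energy(p_max_per_server, util_traces, dt):
    """util_traces[x][i]: utilisation of server x at step i. Returns Wh."""
    return sum(
        power(p_max_per_server[x], u) * dt        # discretised Eq. (7)
        for x in range(len(util_traces))
        for u in util_traces[x]
    )

# One 250 W server, fully utilised for two 1-hour steps: 2 * 250 Wh
print(energy([250], [[1.0, 1.0]], dt=1.0))  # 500.0
```

Note how the model charges 70% of peak power even at zero utilization, which is exactly why consolidation onto fewer servers saves energy.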

4.7. Revenue and profit

The basic purpose of service providers is to maximize their revenue and profit. For this purpose, various techniques are used that minimize energy consumption, increase user satisfaction, avoid SLA violations, and provide on-demand services. The effect of a proposed technique on revenue and profit can be checked using the following equations:

P = R - E    (8)

where P represents the profit, R is the total revenue, and E is the total expenditure, with

R = \sum_{i=1}^{n} x_i \cdot p_i    (9)

and

E = c \sum_{i=1}^{n} x_i \cdot s_i    (10)

where x_i is the allocation vector given by user u_i, p_i is the price paid by u_i for the given allocation vector, c is the cost incurred to run the given allocation vectors, and s_i is the amount of computing resources requested for each VM by u_i [21].

4.8. SLA violation

In a cloud environment, a service level agreement (SLA) is agreed between the service provider and the user to ensure the required level of service. The SLA contains various details of the service level that will be provided to the user, such as the minimum capacities of CPU, RAM, storage, and bandwidth. In case of an SLA violation, the party responsible for the breach has to pay a fine to the other party. Therefore, a proposed technique must be evaluated and, in case of a large number of violations, necessary actions should be taken. We can use the following equation to calculate SLA violations [53]:
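Eqs. (8)-(10) combine into a one-line profit computation (a sketch; allocation vectors are simplified to scalars and the numbers are hypothetical):

```python
# Eqs. (8)-(10) as code: profit = revenue - expenditure, with revenue summed
# over users' payments and expenditure proportional to requested resources.

def profit(alloc, price, requested, cost_rate):
    revenue = sum(x * p for x, p in zip(alloc, price))                  # Eq. (9)
    expense = cost_rate * sum(x * s for x, s in zip(alloc, requested))  # Eq. (10)
    return revenue - expense                                            # Eq. (8)

# Two users: allocations 1 and 2, prices 10 and 8, resources 4 and 3, cost 0.5:
print(profit([1, 2], [10, 8], [4, 3], 0.5))  # 26 - 5.0 = 21.0
```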

SLAV = SLATAH \cdot PDM    (11)

where SLAV denotes the SLA violation metric, SLATAH represents the SLA violation Time per Active Host, and PDM stands for the Performance Degradation due to Migrations. The following equations can be used to calculate SLATAH and PDM:

SLATAH = \frac{1}{N} \sum_{i=1}^{N} \frac{T_{s_i}}{T_{a_i}}    (12)

and

PDM = \frac{1}{M} \sum_{j=1}^{M} \frac{C_{d_j}}{C_{r_j}}    (13)

where N is the number of hosts, T_{s_i} is the time during which the resources of host i were 100% utilized, and T_{a_i} is the active time of host i. In Eq. (13), M is the number of VMs, C_{d_j} is the estimated performance degradation of VM j due to migration, and C_{r_j} is the total capacity requested by VM j.
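Eqs. (11)-(13) can be combined in a few lines (a sketch; the per-host saturation times and per-VM degradation figures below are hypothetical):

```python
# Eqs. (11)-(13) as code: the combined SLA-violation metric SLAV = SLATAH * PDM.

def slatah(t_sat, t_active):
    """Mean fraction of its active time each host spent at 100% utilisation."""
    return sum(s / a for s, a in zip(t_sat, t_active)) / len(t_sat)      # Eq. (12)

def pdm(c_degraded, c_requested):
    """Mean performance degradation due to migration, per VM."""
    return sum(d / r for d, r in zip(c_degraded, c_requested)) / len(c_degraded)  # Eq. (13)

def slav(t_sat, t_active, c_degraded, c_requested):
    return slatah(t_sat, t_active) * pdm(c_degraded, c_requested)        # Eq. (11)

# Two hosts saturated for 25% and 50% of their active time; two VMs degraded
# by 25% and 75% of their requested capacity:
print(slav([2, 4], [8, 8], [25, 75], [100, 100]))  # 0.375 * 0.5 = 0.1875
```

Multiplying the two factors means SLAV is low only when both host saturation and migration-induced degradation are low, which is the intended trade-off check for energy-aware consolidation.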


Table 2
Parameters used by the discussed RM techniques.

Technique | Evaluation parameters | Evaluation platform
MBFD [10] | SLA violations, VM migrations, Energy consumption | CloudSim
PCABFD [14] | VM migrations, Energy consumption, Network load | —
Addis et al. [15] | Incoming workload (req/s), No. of online servers | —
Ardagna et al. [16] | Revenue, Profit, Cost, Energy consumption, No. of concurrent users, Request throughput, Response time, No. of online servers | Sun Java 1.6, IBM Data Centre
Ardagna et al. [17] | Profit, Request arrival rate, Response time, Requests served | SPECweb2005, RUBBoS
Wei et al. [18] | Cost, Required capacity, Service load level, Resource misprovision, Created and terminated VMs, No. of steps service load remains within range | SNOPT 7.2.4
Ergu et al. [19] | Consistency ratio, Task priority, Response time, Task expenses | —
García et al. [20] | No. of requests, No. of replicas, No. of failures | OpenNebula
CA-PROVISION [21] | Revenue, Resource utilization, Users served, No. of VM instances | CSIM
CMRM [22] | Revenue, Resource utilization, Execution success ratio, User satisfaction ratio | —
Ali et al. [23] | Revenue, Cost, Arrival rate, Response time | —
FRA [24] | Run time, Ratio of EPT inter-arrival rate and maximum arrival rate, Resource utilization | Xen 3.4.0
EALARM [25] | No. of requests | —
Sallami and Al Daoud [26] | Mean squared error | —
LBMM [27] | Cost, Makespan, Resource utilization | C++
Ye and Chen [28] | Price of anarchy, Price of stability, Average ratio of the cost of Nash equilibria and social optimum | Google trace and TraceVersion2
Jung and Sim [29] | Profit, Geographical distance, Average allocation time, No. of visiting locations, No. of denials, Success rate | —
LPBP [14] | VM migrations, Energy consumption, Network load | —
CFMV [30] | Total servers, Time to find optimized server, Servers saved, No. of servers used, Transition time, Servers with active VMs, Time used for resource consolidation | —
Malik et al. [31] | Execution time | Grid5000
Grewal and Pateriya [32] | Resource utilization | CloudSim
Choudhury et al. [33] | Debt | CloudSim
Altmann et al. [34] | Cost | Amazon AWS
PANDA [35] | Cost, Makespan (hour), Speedup, Farness, Estimated errors | Amazon EC2
SMDP [36] | Probabilities of actions of inter-domain transfer service, Dropping probability of inter-domain transfer service, Dropping probability of new service, Reward of inter-domain transfer service, Reward of new service | Matlab
Ge et al. [37] | Energy consumption | Matlab
Ikram et al. [38] | Bonds frequency, Distribution of context bonds, Service composition time consumption, Dominating set identification time consumption, Node-degree distribution, Service composition success ratio, Number of compositions for growing request size | Ikram et al. [50]
O'Sullivan and Grigoras [39] | Energy consumption, Offload time, Mobile data usage, State change | Vodafone Ireland, Amazon EC2

5. Used evaluation parameters and platforms

In this section, we present a review of the parameters that researchers use to evaluate their techniques. Table 2 contains a list of techniques and their corresponding evaluation parameters, along with information regarding the evaluation platforms used by the authors. Based on this information, researchers will be able to select appropriate parameters and platforms for the evaluation of their proposed techniques.

6. Design goals

The discussion so far has focused on the existing RM techniques. In the following section, we discuss the design goals that a researcher should take into account while devising a novel RM technique.

6.1. Decentralized decision making

An RM technique can either be centralized or distributed. In centralized RM, all resource allocation/de-allocation decisions are made by a central resource manager; therefore, the failure of the central resource manager leads to the failure of the whole system [13]. Moreover, since all decisions are made by the central resource manager, it causes significant delays in the resource allocation/de-allocation process. Distributed techniques can avoid these issues by using global and local resource managers. Global resource managers keep a record of the free resources in the whole system, whereas local resource managers keep a record of the resources under their supervision. Furthermore, local resource managers can take RM decisions within their own pool of servers.


6.2. Proactive RM techniques

An RM technique should be proactive and have knowledge of the workloads assigned to all the servers. An energy-aware resource allocation strategy has an energy model, an SLA-aware technique provides a way to identify a server on which SLA violations are less likely to occur, and load-aware techniques have their own models. Each technique intends to provide a solution for an issue or a group of issues. Whenever a new task arrives, a server should be assigned immediately, and this can only be achieved if a suitable server is pre-identified. Proactive techniques can significantly save time and provide a high quality of service.

6.3. Fault tolerance

An RM technique should be fault-tolerant [13]. There should be a mechanism to keep track of the ongoing operations on all the servers. In case of a server failure, workloads should be immediately transferred to another server to ensure the required level of service. Live VM migration can help in transferring workloads from one server to another. Faults or system failures can significantly affect the quality of service and may lead to a major business loss.

6.4. Scalability

Scalability plays an important role in the growth of a system. An RM strategy should continue to work when new servers or services are introduced into the system. The basic task of an RM technique is to allocate and de-allocate resources for the workloads assigned to servers. If an RM technique can only handle a limited number of resources, it is not a feasible solution. An RM technique should be able to handle variable sizes of workloads and data centers so that users' tasks can be served accordingly; moreover, it should act accordingly when the system is scaled up or down.

6.5. Reduced complexity

An RM technique is the combination of resource allocation and resource migration techniques. Therefore, an RM technique should be simple, as simple techniques require less execution time and computational power.
However, a multi-metric technique introduces a significant amount of complexity, as the RM technique selects a server based on multiple factors. For example, if an RM technique intends to minimize both energy consumption and SLA violations, it must consider all the aspects of energy-aware RM that can lead to SLA violations. This adds complexity, as the technique must take several measures so that energy minimization does not lead to SLA violations.

6.6. Security and trustworthiness

Security is an important aspect of any system. In a private cloud, security is not a major concern because the users of the system are employees of the organization that owns the cloud: unauthorized access is already prohibited by the organization, or the system is used from within the organization only. In public and federated clouds, security is a major research issue. Whenever a user uses remote resources, there are chances that sensitive data may fall into the wrong hands. In a federated cloud, users of the public cloud may be able to access the private cloud, and security may be breached.

6.7. Energy efficiency

Energy efficiency is also one of the primary design goals of a system. According to [41], in 2010 the energy consumption of the communication systems of all data centers was approximately 15.6 billion kWh. Significant research has already been carried out, and various energy-efficient RM techniques have been proposed, as discussed in Section 3.1. Energy-efficient techniques can benefit both users and service providers in terms of price and revenue. Moreover, energy efficiency plays a vital role in controlling CO2 emissions and global warming. Therefore, energy-efficient RM can not only save money but also help in preserving our natural environment.

6.8. SLA violations

Quality of Service (QoS) is considered to be one of the basic issues that remain to be solved [54]. In a cloud environment, QoS is defined by the fulfillment of the SLA.
An SLA defines the required level of resources that must be provided to a user. In case of non-availability of resources, the service provider has to pay the amount agreed in the SLA. An SLA can be violated while performing workload consolidation to achieve energy efficiency, since workload consolidation reduces the active resources by consolidating the workload on the minimum possible resources. Therefore, while designing an energy-aware RM technique, the SLA should be taken into account.


6.9. Load balancing

Load balancing is another RM metric that needs to be addressed while designing an RM technique [13]. Load balancing limits the amount of workload placed on any single server. Unbalanced and overloaded resources may result in system failure or SLA violation, so load balancing also helps maintain the SLA and minimizes the chances of SLA violations. However, load balancing may lead to higher energy consumption, as the two metrics conflict: load balancing promotes the use of more servers, whereas an energy-aware RM strategy tries to use as few servers as possible. Still, load can be balanced among the servers that remain active after consolidation.

6.10. Network load minimization

In a cloud environment, network load can be classified into two types: (i) the communication incurred between VMs to carry out a task, and (ii) the communication incurred by VM migrations. In the first case, network load can be minimized by placing the VMs of the same task on a single server or in a single bin (rack); if resources are insufficient, VMs can be placed on servers with minimum network distance. In the second case, the number of VM migrations can be reduced to minimize network load. It is noteworthy that VM migrations should be performed only when a server is either over-utilized or underutilized. Moreover, only those VMs should be migrated that do not increase the first type of network load or require re-migration after a short period of time.

7. Open research challenges and issues

RM in cloud computing has been a hot research topic in recent times. Researchers have proposed many RM techniques that handle specific kinds of problems, but there is still a need for RM techniques that address the broader set of RM challenges. Based on our analysis, we have identified the following challenges that need further investigation.

7.1. Customer-driven service management

Customer satisfaction is described as a key factor for the success of a service in [55]. The authors propose user-centric objectives that service providers should follow to increase the level of user satisfaction. Other features, such as the customer's profile and the requirements of the requested service, can also be used to enhance satisfaction. Furthermore, customer care services can be used to communicate with customers, take their feedback, and adapt the service accordingly. Security also plays a major role in customer satisfaction, improving the reliability of a provider by protecting customer data. Hence, all possible customer requirements should be studied so that the customer's quality of experience can be improved [56].

7.2. Computational risk management

Cloud computing is regarded as the first established solution that provides computing as a service. However, numerous risks, such as inadequate resources, system failures, network load, and load on the resource manager, may lead to SLA violations while serving a customer [55]. Therefore, proper risk analysis should be performed to address the issues that can arise from these risks. Risk analysis and management processes must be studied in detail, both because of their complexity and to gain maximum benefit from them [57].

7.3. Autonomic RM

As resource demands change over time, there is a need to devise RM techniques that can modify the original resource request at run time. Elastic cloud data centers can play an important role by scaling resources up and down [58]. However, understanding the elastic requirements of workloads and provisioning appropriate resources is quite hard. Moreover, self-management is also required in data centers so that they can monitor the current status of resources, amend the resource request at run time, and adjust the cost of service accordingly [59].
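The run-time amendment of resource allocations described here is often realized as a threshold-based control loop. The following is a minimal illustrative sketch; the utilization thresholds and the one-server step size are assumptions, not prescriptions from the cited works:

```python
# Sketch: scale the number of active servers up when average utilization
# is high and down when it is low. Thresholds and step size are
# illustrative, not values from any particular autonomic RM technique.

def scale(active_servers, avg_utilization,
          upper=0.75, lower=0.30, min_servers=1):
    """Return the new server count for one control-loop iteration."""
    if avg_utilization > upper:
        return active_servers + 1            # scale out
    if avg_utilization < lower and active_servers > min_servers:
        return active_servers - 1            # scale in (consolidate)
    return active_servers                    # within the dead band

print(scale(4, 0.90))  # 5: overloaded, add a server
print(scale(4, 0.20))  # 3: underloaded, release a server
print(scale(4, 0.50))  # 4: no change
```

The dead band between the two thresholds prevents the oscillation that a single threshold would cause under fluctuating demand.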
Furthermore, RM self-configuration should be introduced to handle ever-changing service requirements [60]. Therefore, autonomic, elastic, and intelligent RM techniques must be devised so that data centers can manage a limited pool of resources under continuously changing service demands. Likewise, a system can be introduced on the customer side that is responsible for identifying and selecting an appropriate service provider.

7.4. Service benchmarking and measurement

Service providers offer multiple types of services, which require evaluation. It is essential to grade the service level so that an appropriate service provider can be selected. Recently, the Cloud Service Measurement Index Consortium (CSMIC) has proposed the Service Measurement Index (SMI) to grade cloud services [61]. Cloud services can be better evaluated in


the presence of real traces from cloud environments and probability distributions that model changing service needs. Unfortunately, there is no standard service benchmark for RM. Moreover, different types of services may have different sets of requirements: for instance, some may require security while others do not, and a service may be data-intensive or computation-intensive. Considering this heterogeneity, realistic benchmarks need to be developed to evaluate different types of RM techniques accurately.

7.5. System modeling and repeatable evaluation

To evaluate the efficiency of a proposed RM technique, it must be tested against different parameters, such as the number and type of resources, different workloads, changing service needs, and different VM sizes. However, performance evaluation of RM techniques is challenging, and there is no guarantee that a technique will perform similarly in a real environment. The main reason is that user resource requirements vary; moreover, resources in a real environment are distributed, and a service request can be generated by a user at any time. Therefore, simulation environments should be devised that are more realistic and capable of accommodating varying resource demands [59].

7.6. Information management

A cloud contains a great number of resources, and collecting information about them is a major task. Collecting resource information from different sets of servers causes network overhead and may affect quality of service. Moreover, analyzing the information and acting on it can increase the load on the central manager and the network. Therefore, distributed cluster managers should be used that can collect, analyze, and react to the information of their respective clusters without disturbing the operations of other clusters.
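Such a hierarchy might be sketched as follows, with each cluster manager condensing its servers' telemetry into a single summary record consumed upstream; the class and field names are hypothetical:

```python
# Sketch: cluster managers aggregate local server utilization so that
# upstream decisions see one summary per cluster instead of per-server
# telemetry. Names and load figures are illustrative.

class ClusterManager:
    def __init__(self, name, server_loads):
        self.name = name
        self.server_loads = server_loads     # fraction of capacity in use

    def summary(self):
        """One compact record sent upstream per reporting interval."""
        return {"cluster": self.name,
                "avg_load": sum(self.server_loads) / len(self.server_loads)}

def pick_host_cluster(summaries):
    """Choose the least-loaded cluster for a new task from the summaries."""
    return min(summaries, key=lambda s: s["avg_load"])["cluster"]

clusters = [ClusterManager("A", [0.9, 0.8]), ClusterManager("B", [0.4, 0.5])]
print(pick_host_cluster([c.summary() for c in clusters]))  # B
```

Per-server detail stays inside each cluster; only the compact summaries cross the network.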
However, a central manager is still needed to decide the host cluster based on the information provided by the cluster managers.

7.7. Resource heterogeneity

It is hard for service providers to keep only similar types of resources in their resource pools. Therefore, a cloud environment contains resources with different specifications and from different manufacturers. Resource heterogeneity may lead to various issues in load balancing and VM migration. While performing load balancing, an algorithm must also consider the architecture of the computing server: different servers may have the same clock speed yet differ in architecture, caches, and registers. If VMs are transferred to a server with a smaller cache, performance may degrade instead of improving. Moreover, if VM migration is performed without information about the destination, heterogeneity can affect the performance of cloud services.

7.8. Managing large-scale clouds

The cloud's capacity to handle large numbers of users and tasks is increasing day by day. To meet the growing needs of users, more resources are added to the cloud resource pool. However, large and dispersed operations increase the amount of communication and may lead to high network load. Therefore, techniques are needed that minimize not only network load but also latency. Resource allocation should also be smart enough to place interdependent sub-tasks on servers within the same cluster.

7.9. Security of data/algorithms

In a cloud environment, server resources may be shared among users who belong to competing organizations. Therefore, cases may occur in which one user can access the data or algorithms of another [4]. Since the possibility of a data breach exists, robust resource sharing techniques need to be devised to overcome this issue.
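A simple way to realize such sharing restrictions is to check a user's conflict-of-interest group before co-locating workloads on a server; the group assignments below are hypothetical examples:

```python
# Sketch: allow a user onto a server only if every resident user belongs
# to the same conflict-of-interest group, so competitors never share
# hardware. Group assignments are hypothetical.

COI_GROUP = {"bank_a": "banking", "bank_b": "banking", "shop_a": "retail"}

def may_share_server(user, resident_users):
    """True if `user`'s group matches everyone already on the server."""
    return all(COI_GROUP[user] == COI_GROUP[r] for r in resident_users)

print(may_share_server("shop_a", ["bank_a"]))   # False: different groups
print(may_share_server("bank_a", ["bank_b"]))   # True under this grouping
```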
The Brewer and Nash model is one potential solution, in which users are divided into groups based on their conflicts of interest; only users of the same conflict-of-interest group are allowed to share a server, to ensure data security.

7.10. Failure avoidance

The threat of server and network failures increases in large clouds. When a server fails, the executing task and its results are lost, which leads to service failure. Network (link/node) failures, on the other hand, may break communication between VMs and servers; they may also prevent VM migration and result in performance degradation. Therefore, techniques must be devised to deal with both server and network failures.

7.11. Network resource sharing

In a cloud environment, sharing (virtualizing) network resources is more problematic than sharing computational and memory resources. For computation and storage sharing, mature solutions such as VMware [62] are available, but no comparable solution exists for bandwidth management and network virtualization. Therefore, solutions should be provided


that not only share network resources but also isolate the workloads of different users. Workload isolation is a must, because load variation for one customer can affect the services of other customers.

8. Conclusions

In this study, we present a taxonomy and comprehensive overview of existing RM techniques devised for cloud environments, along with a detailed discussion of the open research challenges and the design goals that should be considered when designing an RM technique. We conclude that most existing techniques optimize either a single RM metric or at most two. Therefore, researchers must provide multi-metric solutions that can also handle conflicting metrics, i.e., pairs of metrics in which optimizing one degrades the other, such as energy efficiency and SLA violations, SLA violations and profit maximization, or energy efficiency and load balancing. We also conclude that RM techniques designed for traditional public and private clouds cannot be directly applied in mobile and hybrid cloud environments, owing to the architectural and infrastructural differences between these environments. MCC introduces new research challenges to cloud computing by adding traditional wireless issues, whereas hybrid clouds increase the complexity of RM by combining public and private clouds. Therefore, researchers should aim for generic RM solutions that perform well in all types of cloud environments, and novel MCC RM techniques ought to be devised that cater to wireless issues alongside RM metrics.

References

[1] Rimal BP, Choi E, Lumb I. A taxonomy and survey of cloud computing systems. In: Fifth international joint conference on INC, IMS and IDC; 2009. p. 44–51.
[2] Arianyan E, Taheri H, Sharifian S. Novel energy and SLA efficient resource management heuristics for consolidation of virtual machines in cloud data centers. Comput Electr Eng 2015. http://dx.doi.org/10.1016/j.compeleceng.2015.05.006.
[3] Mell P, Grance T. The NIST definition of cloud computing (draft), vol. 800. NIST special publication; 2011. p. 145.
[4] Wickboldt JA, Esteves RP, de Carvalho MB, Granville LZ. Resource management in IaaS cloud platforms made flexible through programmability. Comput Netw 2014;68:54–70.
[5] Buyya R, Yeo C, Venugopal S, Broberg J, Brandic I. Cloud computing and emerging IT platforms: vision, hype, and reality for delivering computing as the 5th utility. Future Gener Comput Syst 2009;25:599–616.
[6] Jennings B, Stadler R. Resource management in clouds: survey and research challenges. J Network Syst Manage 2015;23(3):567–619.
[7] Lin CT. Comparative based analysis of scheduling algorithms for RM in cloud computing environment. Int J Comput Sci Eng 2013;1(1):17–23.
[8] Marojevic V, Gomez I, Gilabert PL, Montoro G, Gelonch A. RM implications and strategies for SDR clouds. Analog Integr Circ Sig Process 2012;73(2):473–82.
[9] Rygielski P, Kounev S. Network virtualization for QoS-aware resource management in cloud data centers: a survey. PIK – Praxis der Informationsverarbeitung und Kommunikation 2013;36(1).
[10] Beloglazov A, Abawajy JH, Buyya R. Energy-aware resource allocation heuristics for efficient management of data centers for cloud computing. Future Gener Comput Syst 2012;28(5):755–68.
[11] Kansal NJ, Chana I. Cloud load balancing techniques: a step towards green computing. IJCSI Int J Comput Sci Issues 2012;9(1):238–46.
[12] Shahapure NH, Jayarekha P. Load balancing in cloud computing: a survey. Int J Adv Eng Technol 2014;6(6):2657–64.
[13] Gulati A, Shanmuganathan G, Holler A, Irfan A. Cloud scale resource management: challenges and techniques. In: Proceedings of 3rd USENIX workshop on hot topics in cloud computing (HotCloud 2011); 2011.
[14] Tziritas N, Xu C-Z, Loukopoulos T, Khan SU, Yu Z. Application-aware workload consolidation to minimize both energy consumption and network load in cloud environments. In: 42nd IEEE international conference on parallel processing (ICPP); 2013.
[15] Addis B, Ardagna D, Panicucci B, Squillante MS, Zhang L. A hierarchical approach for the RM of very large cloud platforms. IEEE Trans Dependable Secure Comput 2013;10(5):253–72.
[16] Ardagna D, Panicucci B, Trubian M, Zhang L. Energy-aware autonomic resource allocation in multitier virtualized environments. IEEE Trans Serv Comput 2012;5(1):2–19.
[17] Ardagna D, Casolari S, Colajanni M, Panicucci B. Dual time-scale distributed capacity allocation and load redirect algorithms for cloud systems. J Parallel Distrib Comput 2012;72(6):796–808.
[18] Wei Y, Blake MB, Saleh I. Adaptive RM for service workflows in cloud environments. In: 2nd international workshop on workflow models, systems, services and applications in the cloud; 2013.
[19] Ergu D, Kou G, Peng Y, Shi Yong, Shi Yu. The analytic hierarchy process: task scheduling and resource allocation in cloud computing environment. J Supercomput 2013;64(3):835–48.
[20] García AG, Espert IB, García VH. SLA-driven dynamic cloud resource management. Future Gener Comput Syst 2014;31:1–11.
[21] Zaman S, Grosu D. A combinatorial auction-based mechanism for dynamic VM provisioning and allocation in clouds. IEEE Trans Cloud Comput 2013;1(2):129–41.
[22] Chunlin L, Layuan L. Multi-layer RM in cloud computing. J Network Syst Manage 2013. http://dx.doi.org/10.1007/s10922-012-9261-1.
[23] Ali S, Jing Si-Yuan, Kun S. Profit-aware DVFS enabled RM of IaaS cloud. Int J Comput Sci Issues (IJCSI) 2013;10(2):237–47.
[24] Goudarzi H, Pedram M. Profit-maximizing resource allocation for multi-tier cloud computing systems under service level agreements. In: Large scale network-centric computing systems. Wiley series on parallel and distributed computing; 2013.
[25] Ban Y, Chen H, Wang Z. EALARM: an enhanced autonomic load-aware RM for P2P key-value stores in cloud. In: 7th IEEE international symposium on service-oriented system engineering; 2013.
[26] Al Sallami NM, Al Daoud A. Load balancing with neural network. Int J Adv Comput Sci Appl (IJACSA) 2013;4(10):138–45.
[27] Kokilavani T, George Amalarethinam DI. Load balanced min–min algorithm for static meta-task scheduling in grid computing. Int J Comput Appl 2011;20(2).
[28] Ye D, Chen J. Non-cooperative games on multidimensional resource allocation. Future Gener Comput Syst 2013;29:1345–52.
[29] Jung G, Sim KM. Agent-based adaptive resource allocation on the cloud computing environment. In: 40th international conference on parallel processing workshops (ICPPW); 2011. p. 345–51.
[30] He L, Zou D, Zhang Z, Chen C, Jin H, Jarvis SA. Developing resource consolidation frameworks for moldable virtual machines in clouds. Future Gener Comput Syst 2014;32:68–81.
[31] Malik S, Huet F, Caromel D. Latency based group discovery algorithm for network aware cloud scheduling. Future Gener Comput Syst 2014;31:28–39.
[32] Grewal RK, Pateriya PK. A rule-based approach for effective resource provisioning in hybrid cloud environment. Adv Intell Syst Comput 2013;203:41–57.


[33] Choudhury K, Dutta D, Sasmal K. RM in a hybrid cloud infrastructure. Int J Comput Appl 2013;79(12):41–5.
[34] Altmann J, Kashef MM. Cost model based service placement in federated hybrid clouds. Future Gener Comput Syst 2014;41:79–90.
[35] Farahabady MRH, Lee YC, Zomaya AY. Pareto-optimal cloud bursting. IEEE Trans Parallel Distrib Syst 2014;25(10).
[36] Liang H, Cai LX, Huang D, Shen X, Peng D. An SMDP-based service model for inter-domain resource allocation in mobile cloud networks. IEEE Trans Veh Technol 2012;61(5):2222–32.
[37] Ge Y, Zhang Y, Qiu Q, Lu Y. A game theoretic resource allocation for overall energy minimization in mobile cloud computing system. In: Proceedings of the 2012 ACM/IEEE international symposium on low power electronics and design, ISLPED '12; 2012. p. 279–84.
[38] Ikram A, Anjum A, Bessis N. A cloud resource management model for the creation and orchestration of social communities. Simul Model Pract Theory 2014. http://dx.doi.org/10.1016/j.simpat.2014.05.003.
[39] O'Sullivan MJ, Grigoras D. Integrating mobile and cloud resources management using the cloud personal assistant. Simul Model Pract Theory 2014. http://dx.doi.org/10.1016/j.simpat.2014.06.017.
[40] Gao Y, Guan H, Qi Z, Song T, Huan F, Liu L. Service level agreement based energy-efficient resource management in cloud data centers. Comput Electr Eng 2014;40(5):1621–33.
[41] Bilal K, Khan SU, Zomaya AY. Green data center networks: challenges and opportunities. In: 11th international conference on Frontiers of Information Technology (FIT); 2013. p. 229–34.
[42] Nowicki T, Squillante MS, Wu CW. Fundamentals of dynamic decentralized optimization in autonomic computing systems. In: Self-star properties in complex information systems. Springer-Verlag; 2005. p. 204–18.
[43] Wei Y, Blake MB. Adaptive service workflow configuration and agent-based virtual RM in the cloud. In: IEEE international conference on cloud engineering (IC2E); 2013.
[44] Fotakis D, Kontogiannis S, Koutsoupias E, Mavronicolas M, Spirakis P. The structure and complexity of nash equilibria for a selfish routing game. In: Proceedings of the 29th international colloquium on automata, languages and programming, ICALP; 2002.
[45] Bertin R, Hunold S, Legrand A, Touati C. Fair scheduling of bag-of-tasks applications using distributed Lagrangian optimization. J Parallel Distrib Comput 2014. http://dx.doi.org/10.1016/j.jpdc.2013.08.011.
[46] Mu SC, Lyu YH, Morihata A. Approximate by thinning: deriving fully polynomial-time approximation schemes. Sci Comput Program 2014. http://dx.doi.org/10.1016/j.scico.2014.07.001.
[47] Tenenbaum JB, de Silva V, Langford JC. A global geometric framework for nonlinear dimensionality reduction. Science 2000;290(5500):2319–23.
[48] Ghasemi-Falavarjani S, Nematbakhsh M, Ghahfarokhi BS. Context-aware multi-objective resource allocation in mobile cloud. Comput Electr Eng 2015;44:218–40.
[49] Khan AU, Othman M, Madani SA, Khan SU. A survey of mobile cloud computing application models. IEEE Commun Surv Tutorials 2014;16(1):393–413.
[50] Ikram A, Anjum A, Hill R, Antonopoulos N, Liu L, Sotiriadis S. Approaching the internet of things (IoT): a modelling, analysis and abstraction framework. Concurr Comput: Pract Exp 2015;27(8):1966–84.
[51] Banàtre JP, Fradet P, Radenac Y. A generalized higher-order chemical computation model. Electr Notes Theoret Comput Sci 2006;135:3–13.
[52] International Standard ISO/IEC 15909. High-level Petri nets – concepts, definitions and graphical notation; 2000.
[53] Beloglazov A, Buyya R. Optimal online deterministic algorithms and adaptive heuristics for energy and performance efficient dynamic consolidation of virtual machines in cloud data centres. Concurr Comput: Pract Exp 2012;24:1397–420.
[54] Singh S, Chana I. Q-aware: quality of service based cloud resource provisioning. Comput Electr Eng 2015. http://dx.doi.org/10.1016/j.compeleceng.2015.02.003.
[55] Yeo CS, Buyya R. Integrated risk analysis for a commercial computing service. In: Proceedings of the 21st IEEE international parallel and distributed processing symposium (IPDPS 2007); 2007.
[56] Liu Y, Li C, Yang Z. Tradeoff between energy and user experience for multimedia cloud computing. Comput Electr Eng 2015. http://dx.doi.org/10.1016/j.compeleceng.2015.04.016.
[57] Moeller RR. COSO enterprise risk management: understanding the new integrated ERM framework. USA: John Wiley and Sons; 2007.
[58] Brebner P. Is your cloud elastic enough? Performance modelling the elasticity of infrastructure as a service (IaaS) cloud applications. In: Third joint WOSP/SIPEW international conference on performance engineering, ICPE '12. ACM; 2012. p. 263–6.
[59] Buyya R, Garg SK, Calheiros RN. SLA-oriented resource provisioning for cloud computing: challenges, architecture, and solutions. In: 2nd international conference on cloud and service computing; 2011. p. 1–10.
[60] Kephart JO, Chess DM. The vision of autonomic computing. Computer 2003;36(1):41–50.
[61] Garg SK, Versteeg S, Buyya R. SMICloud: a framework for comparing and ranking cloud services. In: Proceedings of the 4th IEEE/ACM international conference on utility and cloud computing (UCC 2011); 2011.
[62] Sugerman J, Venkitachalam G, Lim BH. Virtualizing I/O devices on VMware workstation's hosted virtual machine monitor. In: Proceedings of the USENIX annual technical conference, general track. USENIX Association; 2001. p. 1–14.

Saad Mustafa is a faculty member at COMSATS Institute of Information Technology (CIIT), Pakistan. Currently, he is pursuing his PhD at CIIT under the In-House Scholarship program for faculty. He received his BS (Hons) and MS degrees in computer science from CIIT in 2007 and 2010, respectively. His areas of interest include cloud computing, sensor networks, VANETs, and mesh networks.

Babar Nazir is Assistant Professor in the Computer Science Department at the COMSATS Institute of Information Technology, Abbottabad, Pakistan. He obtained his PhD in Computer Science from the Universiti Teknologi Petronas, Malaysia, in 2011, and his MS in Computer Science from the COMSATS Institute of Information Technology, Abbottabad, Pakistan, in 2007. His research interests include communication protocols for wireless sensor networks, resource management and job scheduling in cloud, grid, and cluster computing, and communication protocols for mobile ad hoc networks (Bluetooth, VANETs). He has published more than 40 papers in international conferences and journals.

Amir Hayat is working as Assistant Professor in the Computer Science Department at COMSATS Institute of Information Technology. He obtained his PhD from Graz University of Technology, Austria, in the area of information security. Prior to that, he completed his master's in information systems technology at George Washington University, USA. Dr. Amir Hayat has more than 15 years of industry, teaching, and research and development experience, and has worked at various positions in national and multinational companies. His areas of interest are electronic government, information security, information systems, learning management systems, and information retrieval.

Atta ur Rehman Khan is the CEO of DeeByte software solutions and a faculty member at COMSATS Institute of Information Technology (CIIT), Pakistan. He completed his PhD at the University of Malaya under the BrightSparks scholarship.
Prior to that, he received his BS (Hons) and MS degrees in computer science from CIIT in 2007 and 2010, respectively. His areas of interest include mobile cloud computing, sensor networks, VANETs, and security.

Sajjad Ahmad Madani is Associate Professor at COMSATS Institute of Information Technology (CIIT), Abbottabad, Pakistan. He joined CIIT in August 2008 as Assistant Professor. Before that, he was a guest researcher at the Institute of Computer Technology (Vienna, Austria) from 2005 to 2008, where he carried out his PhD research. He received his MS degree in Computer Science from Lahore University of Management Sciences (LUMS) and his BSc degree from UET Peshawar, where he was awarded a gold medal for outstanding performance. His areas of interest include low-power wireless sensor networks and the application of industrial informatics to electrical energy networks. He has published more than 50 papers in international conferences and journals.

