TTSA: An Effective Scheduling Approach for Delay Bounded Tasks in ...

This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination. IEEE TRANSACTIONS ON CYBERNETICS


TTSA: An Effective Scheduling Approach for Delay Bounded Tasks in Hybrid Clouds Haitao Yuan, Student Member, IEEE, Jing Bi, Member, IEEE, Wei Tan, Senior Member, IEEE, MengChu Zhou, Fellow, IEEE, Bo Hu Li, and Jianqiang Li, Senior Member, IEEE

Abstract—The economy of scale provided by cloud attracts a growing number of organizations and industrial companies to deploy their applications in cloud data centers (CDCs) and to provide services to users around the world. The uncertainty of arriving tasks makes it a big challenge for private CDC to cost-effectively schedule delay bounded tasks without exceeding their delay bounds. Unlike previous studies, this paper takes into account the cost minimization problem for private CDC in hybrid clouds, where the energy price of private CDC and execution price of public clouds both show the temporal diversity. Then, this paper proposes a temporal task scheduling algorithm (TTSA) to effectively dispatch all arriving tasks to private CDC and public clouds. In each iteration of TTSA, the cost minimization problem is modeled as a mixed integer linear program and solved by a hybrid simulated-annealing particle-swarm-optimization. The experimental results demonstrate that compared with the existing methods, the optimal or suboptimal scheduling strategy produced by TTSA can efficiently increase the throughput and reduce the cost of private CDC while meeting the delay bounds of all the tasks.

Index Terms—Cloud computing, cloud data center, cost minimization, delay bounded tasks, hybrid clouds, metaheuristic, resource provisioning, task scheduling.

I. INTRODUCTION

The economy of scale provided by cloud computing has attracted many corporations to outsource their applications to cloud data center (CDC) providers [1]–[4]. In cloud computing, typical Infrastructure as a Service (IaaS) providers

Manuscript received August 17, 2015; revised December 12, 2015; accepted May 22, 2016. This work is supported in part by the Deanship of Scientific Research, King Abdulaziz University, Jeddah, Saudi Arabia, under Grant P024-135-437. This paper was recommended by Associate Editor S. Yang. (Corresponding authors: Jing Bi and MengChu Zhou.) H. Yuan is with the School of Software Engineering, Beijing Jiaotong University, Beijing 100044, China (e-mail: [email protected]). J. Bi and J. Li are with the School of Software Engineering, Beijing University of Technology, and also with the Beijing Engineering Research Center for IoT Software and Systems, Beijing 100124, China (e-mail: [email protected]; [email protected]). W. Tan is with IBM T. J. Watson Research Center, Yorktown Heights, NY 10598 USA (e-mail: [email protected]). M. C. Zhou is with the Department of Electrical and Computer Engineering, New Jersey Institute of Technology, Newark, NJ 07102 USA, and also with Renewable Energy Research Group, King Abdulaziz University, Jeddah, Saudi Arabia (e-mail: [email protected]). B. H. Li is with the School of Automation Science and Electrical Engineering, Beihang University, Beijing 100191, China (e-mail: [email protected]). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TCYB.2016.2574766

such as Rackspace [5] provide resources to support applications delivered to users. Similar to the work in [6], from the perspective of a typical IaaS provider, private CDC in this paper refers to a resource-limited IaaS provider that may schedule some tasks to external public clouds if its resources cannot guarantee the expected QoS. Besides, security and regulatory considerations mean that some applications can be provided by private CDC only.

Private CDC aims to provide services to all arriving tasks from millions of users in the most cost-effective way while ensuring user-defined delay bounds. The arrival of users’ tasks is aperiodic and uncertain, and therefore it is challenging for private CDC to accurately predict the upcoming tasks. Besides, because resources in private CDC are limited, some arriving tasks may have to be refused to guarantee the delay bounds of accepted tasks when the number of arriving tasks is unexpectedly large [7], [8]. However, this reduces the throughput of private CDC and inevitably brings a large penalty to private CDC due to the refusal of tasks.

The emergence of hybrid clouds enables private CDC to outsource part of its arriving tasks to public clouds when the number of tasks unexpectedly peaks. In hybrid clouds, the total cost of private CDC mainly consists of the energy cost caused by the accepted tasks executed in it and the execution cost of tasks dispatched to public clouds. Public clouds (e.g., Amazon EC2) deliver dynamic resources to users by creating a set of virtual machines (VMs). Delay bounded tasks usually have user-defined delay bounds to satisfy. In a real-life market, the execution price of VM instances provided by public clouds varies with the delay bounds [9]. Besides, the energy price of private CDC also shows temporal diversity [8]. Therefore, how to minimize the total cost of private CDC in hybrid clouds, where the execution and energy prices show temporal diversity, becomes a challenging problem.
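The cost structure described above can be illustrated with a small sketch. It assumes a linear cost model, a fixed energy figure per task, and made-up per-slot prices, none of which come from this paper; the names `SlotPlan`, `ENERGY_PER_TASK_KWH`, and `total_cost` are hypothetical. It only shows why temporally varying prices make the timing of execution matter.

```python
from dataclasses import dataclass

# Hypothetical energy consumed by one task in the private CDC (assumption).
ENERGY_PER_TASK_KWH = 0.5


@dataclass
class SlotPlan:
    private_tasks: int   # tasks executed in the private CDC in this slot
    public_tasks: int    # tasks dispatched to a public cloud in this slot
    energy_price: float  # assumed energy price in this slot ($ per kWh)
    exec_price: float    # assumed public execution price in this slot ($ per task)


def total_cost(plan):
    """Energy cost of private execution plus execution cost of outsourced tasks."""
    energy_cost = sum(
        s.private_tasks * ENERGY_PER_TASK_KWH * s.energy_price for s in plan
    )
    execution_cost = sum(s.public_tasks * s.exec_price for s in plan)
    return energy_cost + execution_cost


# The same 100 tasks over two slots; energy is cheaper in the second slot.
eager = [SlotPlan(60, 0, 0.5, 0.25), SlotPlan(40, 0, 0.25, 0.125)]
deferred = [SlotPlan(40, 0, 0.5, 0.25), SlotPlan(60, 0, 0.25, 0.125)]

print(total_cost(eager))     # 20.0
print(total_cost(deferred))  # 17.5
```

Under these assumed prices, deferring 20 tasks to the cheaper second slot lowers the cost from 20.0 to 17.5. Such a shift is only admissible if the deferred tasks still finish within their delay bounds.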
This work investigates the cost minimization problem for private CDC in hybrid clouds. This problem is formulated and solved by the proposed temporal task scheduling algorithm (TTSA). By considering the temporal diversity in price, TTSA can effectively reduce the cost of private CDC by intelligently allocating all arriving tasks to private CDC or public clouds within their delay bounds. Then, the public workload of a Google production cluster [10] is adopted to evaluate the proposed TTSA. Comprehensive comparisons demonstrate that it outperforms the existing task scheduling approaches in terms of throughput and cost. The major contributions of this paper are as follows. First, the proposed method can strictly

2168-2267 © 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.


guarantee the delay bound of each delay bounded task. Second, this paper formulates an architecture of hybrid clouds that can provide temporal task scheduling. This architecture enables private CDC to outsource some of its tasks to public clouds provided that the delay bound of each arriving task is strictly ensured. Third, based on this architecture, TTSA is proposed to minimize the total cost of private CDC by intelligently dispatching all arriving tasks in hybrid clouds.

The rest of this paper is structured as follows. Section II discusses the related work. Section III introduces the architecture of temporal task scheduling in hybrid clouds. Section IV presents the problem formulation. Section V shows the detail of TTSA. Simulation results are presented in Section VI. Section VII concludes this paper.

II. RELATED WORK

Resource provisioning in CDCs aims to provision limited resources while guaranteeing the performance of users’ tasks. Recently, a number of methods on resource provisioning in CDCs have been proposed [11]–[14]. In [11], a lightweight system is designed to simulate real-time resource provisioning in CDCs. In [12], a virtualized system is presented to dynamically provision resources based on users’ tasks. In [13], the effect of workload prediction on resource provisioning is investigated. Then, a decentralized algorithm that aims to dynamically provision resources is proposed. In [14], the problem of distributing users’ tasks to multiple heterogeneous servers is considered. Several greedy heuristic algorithms are proposed to realize the online allocation. Nevertheless, none of the existing studies focuses on resource provisioning for delay bounded tasks in hybrid clouds.

Task scheduling in CDCs is a challenging problem that was previously investigated [6]–[8]. In [15], an algorithm to dispatch scientific workflow tasks in multiple cloud environments is presented.
In [16], an algorithm that can smartly exploit idle time of resources and replicate tasks is proposed. Workflow applications whose deadlines are soft can fully take advantage of this algorithm and mitigate the performance degradation caused by variation of resources. In [17], three algorithms that aim to realize energy-efficient task scheduling are proposed and compared with the existing scheduling algorithms. In [6], a task scheduling method based on heuristic is proposed to maximize the profit of a private cloud while ensuring the delay bounds. Nevertheless, none of the mentioned studies considers the temporal diversity in the execution and energy prices in hybrid clouds. In [8], a two-stage system is presented to dynamically dispatch arriving tasks to execute in CDCs and to minimize the energy cost of CDCs. However, it simply trims arriving tasks to satisfy the schedulability condition. Therefore, the tasks refused under its strategy may bring a large penalty to a CDC provider and decrease the system throughput. In contrast, the proposed temporal task scheduling method aims to minimize the total cost of private CDC by intelligently dispatching all arriving tasks to private and public clouds.

Some recent studies that focus on performance modeling of CDCs are based on classical queueing theory [18]–[20]. In [18], the average response time is modeled and estimated

Fig. 1. Components of a hybrid cloud architecture.

according to the queueing theory. Then, a task scheduling algorithm is proposed to reduce the energy cost of CDCs. In [19], a multiserver system in a cloud is modeled as an M/M/m queueing model. Based on this model, the problem of multiserver configuration that aims to maximize the profit of a cloud is formulated and solved analytically. In [20], a hybrid queueing model is constructed for multi-tier applications in CDCs. Based on the model, a constrained optimization problem is formulated and solved by the proposed heuristic algorithm. These studies first estimate the average response time and then formulate and solve the profit maximization or cost minimization problem. Consequently, they can only guarantee the average response time for all tasks. However, the long-tail distribution of response time for the tasks implies that the delay of some tasks may be much longer than what users can accept [21]. In contrast, the proposed temporal task scheduling can guarantee that all tasks are completed within their delay bounds.

III. HYBRID CLOUD ARCHITECTURE

The architecture of hybrid clouds is shown in Fig. 1. The architecture consists of private CDC and public clouds. A great number of physical clusters in the former are virtualized to provide resources (e.g., CPU, memory, network, and storage) to users. The Monitor component watches physical clusters and sends resource information to Scheduler. Users’ tasks are first enqueued into a first-come-first-served (FCFS) queue that reports queue information to Scheduler. Besides, Predictor executes prediction algorithms [22] by using historical data to obtain future task information in private CDC and public clouds. Several existing studies focus on workload prediction based on historical data [23]–[27].
Therefore, this paper assumes that Predictor can accurately predict future information including the task arrival rate, expected energy price in private CDC, expected execution price of public clouds, and expected average running time of each task in each time slot. In addition, similar to the work in [28] and [29], this paper assumes that the data needed by tasks of each


application have already been distributed across all public clouds. Therefore, the essential data for tasks of each application in each public cloud are strictly consistent with each other. In this way, tasks of each application can be independently executed within any public cloud. This paper mainly considers Scheduler, which determines a scheduling strategy. Based on information reported by Monitor, Predictor, and the FCFS queue, Scheduler executes TTSA and specifies the number of tasks dispatched to private CDC and public clouds, respectively.
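The data flow among Monitor, Predictor, the FCFS queue, and Scheduler can be sketched as follows. This is a minimal illustration only: the decision rule is a naive placeholder (fill the private CDC first, then send any overflow to the cheapest predicted public cloud) and is not TTSA. The names `MonitorReport`, `Prediction`, and `schedule`, as well as all numbers, are hypothetical, and the sketch assumes that a public cloud can absorb any overflow.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class MonitorReport:
    free_capacity: int  # tasks the private CDC can still run in this slot


@dataclass
class Prediction:
    exec_prices: dict  # public cloud name -> predicted execution price per task


def schedule(queue, monitor, prediction):
    """Return how many queued tasks go to the private CDC and to public clouds.

    Placeholder policy (NOT TTSA): use private capacity first, then dispatch
    the remaining tasks to the cheapest predicted public cloud.
    """
    to_private = min(len(queue), monitor.free_capacity)
    overflow = len(queue) - to_private
    plan = {"private": to_private}
    if overflow > 0:
        cheapest = min(prediction.exec_prices, key=prediction.exec_prices.get)
        plan[cheapest] = overflow
    return plan


# Ten tasks wait in the FCFS queue; the private CDC has room for six of them.
fcfs_queue = deque(f"task-{i}" for i in range(10))
plan = schedule(
    fcfs_queue,
    MonitorReport(free_capacity=6),
    Prediction(exec_prices={"cloud_a": 0.3, "cloud_b": 0.2}),
)
print(plan)  # {'private': 6, 'cloud_b': 4}
```

The point of the sketch is the interface rather than the policy: Scheduler consumes three reports and emits task counts per destination, so a cost-minimizing strategy could replace the placeholder rule without changing the surrounding components.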

Besides, this paper assumes that the capacity of each public cloud is unlimited. Therefore, in each time slot $\tau$ and $\tau + u$, if there are arriving tasks corresponding to application $n$ that are dispatched to execute in public clouds, these tasks can only be dispatched to one public cloud, i.e., $\sum_{c=1}^{C} x_{\tau}^{nc} = 1$ and $\sum_{c=1}^{C} x_{\tau+u}^{nc} = 1$ $(1 \le u \le U_n)$; otherwise, $\sum_{c=1}^{C} x_{\tau}^{nc} = 0$ and $\sum_{c=1}^{C} x_{\tau+u}^{nc} = 0$ $(1 \le u \le U_n)$. Note that tasks of application $n$ arriving in time slot $\tau$ must have been dispatched to execute by time slot $\tau + U_n$. Therefore, $\sum_{c=1}^{C} x_{\tau+u}^{nc} = 0$ ($U_n$
