IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. ×, NO. ×, 2015. DOI 10.1109/TPDS.2015.2470676
Dynamic Request Redirection and Resource Provisioning for Cloud-based Video Services under Heterogeneous Environment

Wenhua Xiao, Weidong Bao, Xiaomin Zhu, Member, IEEE, Chen Wang, Lidong Chen, Laurence T. Yang, Senior Member, IEEE

Abstract—Cloud computing provides a new opportunity for Video Service Providers (VSPs) to run compute-intensive video applications in a cost-effective manner. Under this paradigm, a VSP may rent virtual machines (VMs) from multiple geo-distributed datacenters that are close to video requestors to run its services. As user demands are difficult to predict and VM prices vary across time and regions, optimizing the number of VMs of each type rented from datacenters located in different regions in a given time frame becomes essential for a VSP to achieve cost effectiveness. Meanwhile, it is equally important to guarantee users' Quality of Experience (QoE) with the rented VMs. In this paper, we give a systematic method called Dynamic Request Redirection and Resource Provisioning (DYRECEIVE) to address this problem. We formulate the problem as a stochastic optimization problem and design an online algorithm based on the Lyapunov optimization framework to solve it. Our method is able to minimize the long-term time-average cost of renting cloud resources while maintaining user QoE. Theoretical analysis shows that our online algorithm can produce a solution within an upper bound of the optimal solution achieved through offline computing. Extensive experiments show that our method adapts to request pattern changes over time and outperforms existing algorithms.

Index Terms—Cloud computing, cloud-based video service, request redirection, resource provisioning, Lyapunov optimization

• Wenhua Xiao, Weidong Bao, Xiaomin Zhu and Lidong Chen are with the Science and Technology on Information Systems Engineering Laboratory, National University of Defense Technology, Changsha, Hunan, P. R. China, 410073. E-mail: {wenhuaxiao, wdbao, xmzhu}@nudt.edu.cn, [email protected]
• Chen Wang is with the Autonomous Systems, CSIRO DP&S Marsfield, NSW, Australia. E-mail: [email protected]
• Laurence T. Yang is with the Department of Computer Science, St. Francis Xavier University, Antigonish, NS, B2G 2W5, Canada. E-mail: [email protected]
1 INTRODUCTION
Internet video is both bandwidth and CPU-cycle demanding. According to a 2013 report [1] of Cisco Systems Inc., global Internet video traffic will contribute 69% of Internet traffic in 2017, up from 57% in 2012, growing at an annual rate of 34%. Video data processing also demands a significant amount of CPU cycles. Video applications often involve pre-processing steps such as transcoding [2], encoding/decoding, abstraction [3], adaptation [4], and rendering [5] to satisfy different requirements. As an example, scenes in an online game are rendered dynamically to follow the actions of players, and the different screens of the various devices used by players often require different video codecs. The data processing involved in these steps is compute-intensive and normally done on the Video Service Provider (VSP) side. It poses significant challenges for VSPs to efficiently plan and manage their computing capacity in order to satisfy user requests in a timely manner, particularly when requests have bursty arrival patterns. The cloud computing paradigm offers a convenient way
for a VSP to dynamically adjust the computing resources it rents from cloud service providers (CSPs) according to demand in a Pay-As-You-Go (PAYG) manner. Compared with traditional approaches, the cloud computing paradigm eliminates a VSP's cost of purchasing and maintaining its own infrastructure. The optimization goal of a VSP is therefore to minimize the monetary cost of renting VMs while guaranteeing its users' Quality of Experience (QoE), in order to maintain its competitive advantage in the market.

However, it is challenging for VSPs to dynamically rent computing resources in the cloud in a cost-effective manner while providing users with an adequate level of QoE. First, user request arrivals are dynamic and bursty demands are difficult to predict; with different QoE requirements associated with these requests, it is difficult to find an optimal way to map them to the variety of resource types in the cloud. Second, balancing the cost of cloud resource renting against user QoE is itself a difficult decision-making problem; for example, higher QoE may cost a VSP more in the short term but reward it in the long term. Third, a single CSP may not have servers located in geographically different regions that sufficiently cover the users of a VSP, so the VSP may need to use multiple CSPs with servers in different locations to provide satisfactory QoE. The differences in CSPs' resource pricing across regions and time slots further complicate resource renting and user request scheduling for VSPs.

There are some existing works in this area. Most of them consider the resource renting and request scheduling problems separately. For example, [6], [7] deal with the resource provisioning problem by optimizing the cost of renting computing
resources from the cloud. They assume that request arrival times and service times follow certain distributions. Some work focuses on finding optimal request dispatching strategies [8]. In practice, the resource provisioning strategy affects the request scheduling policy, and the request scheduling strategy in turn affects the resource provisioning strategy. In this paper, we take both aspects into account to address the resource management problem for a VSP, without relying on assumptions such as a homogeneous workload [9], identical QoE requirements [10], or a single CSP for all users. Our goal is to give an optimal cloud resource renting and user request scheduling strategy to deal with these challenges. The strategy intends to minimize the long-term VM renting cost of resources from multiple CSPs for a VSP while maintaining a certain level of user QoE. To achieve this goal, we first formulate the problem as a joint stochastic optimization problem, and then apply the Lyapunov optimization framework to solve it. Such a stochastic approach does not require predicting future system states and makes decisions based only on the current system state [11]. Based on the drift-plus-penalty transformation, we propose an online algorithm that schedules user requests from multiple regions to distributed datacenters and dynamically computes the near-optimal number of VMs needed to serve their workloads.

The major contributions of this work are summarized as follows:
• We propose a framework that systematically handles resource renting from multiple CSPs and schedules user requests to these resources in a nearly optimal manner. In particular, the framework is capable of handling heterogeneous types of user requests, workloads and QoE requirements, as well as VMs of different types that are priced dynamically.
• We propose an algorithm based on the Lyapunov optimization framework to solve the joint stochastic problem and balance cost saving and QoE. The algorithm approximates the optimal solution within provable bounds and admits a distributed implementation.
• We evaluate the algorithm using both real and synthetic datasets. Our extensive experiments show its effectiveness. Furthermore, the experiments also reveal that the heterogeneity of QoE requirements provides an opportunity to reduce the operational cost of VSPs.

The remainder of this paper is organized as follows: Section 2 summarizes related work; Section 3 describes the system model and the problem formulation; Section 4 gives the online algorithm for solving the request dispatching and resource renting problem; Section 5 analyzes the proposed algorithm; Section 6 gives evaluation results and Section 7 concludes the paper.
2 RELATED WORK
Major platforms for video content delivery over the Internet include large content delivery networks (CDNs) such as Akamai, P2P systems such as BitTorrent [12] and PPLive [13], and cloud datacenters. The use of CDNs often requires the
negotiation of contracts and incurs a relatively high setup cost. P2P systems require minimal dedicated infrastructure for video content delivery but suffer from problems such as long video start-up delay caused by excessive video data prefetching in an unstable environment. Cloud datacenters provide a dedicated infrastructure as well as a convenient Pay-As-You-Go model for running video services, which makes them increasingly popular for video content delivery. In addition to content delivery capability, cloud datacenters also provide computing resources for video processing.

Request scheduling and resource allocation in the cloud can be classified from the perspectives of cloud providers and cloud users. There are many efforts on designing scheduling strategies for cloud providers. For single datacenters, improving resource utilization and fairness is often the focus [14], [15]. For multiple datacenters, some works propose scheduling strategies that minimize the cost of electricity by balancing load among geographically distributed datacenters [16], [17]. From the perspective of users, scheduling strategies mainly deal with reducing the cost of resource renting while satisfying performance requirements [18].

Traditionally, user requests are scheduled to servers using queue-based data structures. Requests are dispatched to queues maintained by servers randomly or according to round-robin, shortest-queue, maximum-profit [19] or other policies. These scheduling strategies often assume a fixed pool of servers with fixed service capacity. When a VSP uses the cloud for its service, the server pool and the capacity of each server become elastic. [20], [7] consider elastic server capacity supported by virtualization technologies. [20] proposed an adaptive request allocation and service capacity scaling mechanism mainly to cope with flash crowds. Different from [20], [7], our work further considers the scenario where users have different response time requirements for the various services offered by a VSP. [6] took into account the VM renting cost and storage cost when making scheduling decisions. These works often need certain mechanisms to predict future workloads.

For a VSP, multiple datacenters located in different geographical regions form a content delivery network. [21], [22], [10] consider user request scheduling in this setting. Their scheduling strategies take into account the price differences of resources in different datacenters as well as the tolerable delays of serving requests in these datacenters. CALMS [21] is capable of leasing cloud servers dynamically at a fine granularity to adapt to the needs of user requests; it relies on a prediction mechanism to estimate future demand. In [22], He et al. studied the problem of optimally procuring the number of VM instances of different types to satisfy dynamic user demands, based on Amazon EC2's pricing model. Compared to our work, these systems do not consider different types of services that offer different QoE levels.

There has also been a line of research considering service quality when studying resource management for video services. Various approaches, with the objective of optimizing the Quality of Service (QoS), were proposed to improve the performance of HTTP ABR (Adaptive Bit Rate) streaming services [23], [24], [25], [26]. However, the QoS
metric used in these works ignores the subjective experience of end users. Recently, the notion of Quality of Experience (QoE) has attracted more and more attention in the community. Different from QoS, QoE is a user-centric approach that pays more attention to the perception of end users. [27] proposed a QoE adaptation model for mobile video applications that maximizes the utilization of content provisioning and network resources while meeting the user's QoE requirement. [28] studied the packet scheduling problem for wireless links to achieve optimal QoE under bandwidth constraints. [22] studied the tradeoff between the procurement cost and the achieved QoE for end users when deploying a cloud-based video service. [29] presented a solution for achieving optimal QoE with limited cache storage in a media cloud. Different from these works, our work regards QoE as a constraint of the objective over a long-term time span.

The Lyapunov optimization technique was first proposed in [30] for network stability problems and was later introduced into the cloud computing area to deal with job admission and resource allocation problems [31], [32]. Yao et al. [9] extended it from a single time scale to two time scales to achieve electricity cost reduction in geographically distributed datacenters. Recently, Wu et al. [10] used it for resource management in multimedia services. However, these works consider the problem from the perspective of cloud providers and allocate resources at the granularity of physical servers. We apply the technique to address the request scheduling problem from the perspective of VSPs and manage resources at the granularity of VMs. Moreover, we deal with multiple services with heterogeneous QoE offerings using the Lyapunov optimization technique.

Our work differs from existing works mainly in the following aspects. Firstly, we address the problem from the perspective of VSPs and handle both the resource allocation and request scheduling problems. Secondly, with the Lyapunov framework, our method does not rely on the prediction of future user demand, which significantly differs from the assumptions made in [21], [6], [33]. Thirdly, we handle the problem under a more general model compared to many existing works in terms of multiple datacenters, multiple services and various QoE requirements.

Fig. 1. System architecture

3 MODELING AND FORMULATION

In this section, we first describe the system model, and then formulate the problem.

3.1 System Modeling
We consider the following system scenario: datacenters belonging to multiple CSPs are geographically distributed over several locations and run various types of services, and users from different regions can obtain the services from any datacenter at any time. The system architecture is illustrated in Fig. 1. In this system, users from different regions obtain various services, such as video streaming and transcoding, from VSPs, which do not own datacenters but rent the infrastructure (VMs) from CSPs. Once the VSP receives a request, the request should be dynamically redirected to an optimal datacenter according to its QoE requirements and the execution cost, considering the different prices of datacenters in different regions.

Formally, consider a set D of geo-distributed datacenters with size D = |D|, indexed by d (1 ≤ d ≤ D). Each datacenter provides C classes of services, denoted by the set C (C = |C|) and indexed by c (1 ≤ c ≤ C), and offers a set K of distinct VM types (with size K = |K|), each with a specific capacity under different configurations of CPU, memory and storage. Requests are dynamically generated by users from R = |R| different regions (indexed by r, 1 ≤ r ≤ R), denoted by the set R. Users from any region can access any datacenter. Requests take the form of jobs and arrive independently; each type of job is described as ⟨ω_c, ℓ_c⟩, where ω_c ∈ [0, W_max] is the workload of a job of type c and c ∈ [1, C] is the type of service the job belongs to. ℓ_c is the tolerable delay of a type-c job, encoding the different QoE requirements of user requests. The reason we assume that requests of the same type have the same workload and tolerable delay is that chunk-based strategies are very popular in video applications (e.g., video streaming). In this paper, we also assume that a video is divided into multiple chunks and that both the users and the VSP process requests chunk by chunk. Furthermore, as in [2], we assume the workload of a specific request (its time requirement under one CPU unit) can be estimated by statistical learning; we therefore treat it as known, since workload estimation is out of the scope of this paper.

The system operates in time slots, denoted by t = 0, 1, ..., T. As VMs have a minimal rental period (e.g., hourly rental is supported by Amazon [34]), we assume our resource procurement algorithm runs periodically every m time slots, while requests are allocated as soon as they arrive at the system. In the above system, our task is to make the following decisions: 1) request redirection, once a request arrives; and 2) resource procurement, every m time slots.
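To make the notation concrete, the sketch below represents a request ⟨ω_c, ℓ_c⟩ and the index-set sizes of the model in Python. It is an illustration only, not the authors' implementation; the sizes mirror the experimental setting described in Section 6, and the field names are hypothetical.

    from dataclasses import dataclass

    # Index-set sizes; the values mirror the experimental setting used later in the paper.
    D, C, K, R = 5, 3, 4, 20   # datacenters, service classes, VM types, user regions

    @dataclass(frozen=True)
    class Request:
        """A type-c job <w_c, l_c>: workload and tolerable delay encode its QoE requirement."""
        region: int              # r, with 0 <= r < R
        service: int             # c, with 0 <= c < C
        workload: float          # w_c, assumed known (estimated by statistical learning, cf. [2])
        tolerable_delay: float   # l_c

    # Example: one video chunk generated in region 3 for service class 0.
    chunk = Request(region=3, service=0, workload=40.0, tolerable_delay=1.5e4)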
TABLE 1
IMPORTANT NOTATIONS

D — set of datacenters distributed over multiple regions
C — set of all service classes
R — set of user regions
K — set of VM types
m — time interval at which resource provisioning is decided
ρ_d^k — availability of a type-k VM in datacenter d
ω_c — workload of a type-c request
W_max — max workload of any request type
ℓ_c — tolerable delay of the type-c service
a_r^c(t) — number of type-c requests from region r at t
λ_rd^c(t) — number of type-c requests of region r allocated to d at t
N_d^k — number of type-k VMs in datacenter d
N_max — max number of VMs of each type over all datacenters
A_rc^max — max number of type-c requests in region r
n_d^{c,k}(t) — number of type-k VMs for type-c requests in d at t
p_d^k(t) — price to provision a type-k VM in d at t
s_k — compute capacity of a type-k VM
Q_0 — minimal QoE level that should be guaranteed for users
Q_max — max QoE level users can achieve
H_d^c(t) — unprocessed workload of type-c requests in d at t
Q_d^c(t) — virtual queue used to satisfy constraint (11)
The ultimate goal is to minimize the resource procurement cost while guaranteeing user QoE in the long run.

3.2 Problem Formulation

In this subsection, we formulate the VM provisioning cost and the user QoE, and then define the objective.

1) VM cost. On the user side, requests are dynamically generated in different regions in each time slot. Let a_r^c(t) be the total number of type-c service requests generated by users from region r at time slot t, let λ_rd^c(t) denote the number of type-c service requests from region r allocated to datacenter d at time slot t, and let A_rc^max be the maximum number of type-c requests generated in region r. Then we have:

a_r^c(t) ≤ A_rc^max, ∀r, ∀c, t ∈ [1, T],    (1)

a_r^c(t) = \sum_{d ∈ D} λ_rd^c(t), ∀r, ∀c, t ∈ [1, T].    (2)

To meet the demand of request processing, the VSP should scale the number of VMs, which have heterogeneous capacities and prices, up and down in each datacenter. Let n_d^{c,k}(t) be the number of type-k VMs purchased for type-c jobs in datacenter d at time slot t, and let p_d^k(t) be the price to provision a type-k VM in datacenter d at time slot t, which varies over both space and time. The total cost of datacenter d at time slot t is then \sum_{c ∈ C} \sum_{k ∈ K} n_d^{c,k}(t) · p_d^k(t). Considering all datacenters, the cost of the VMs provisioned at time slot t is:

C(t) = \sum_{d ∈ D} \sum_{c ∈ C} \sum_{k ∈ K} n_d^{c,k}(t) · p_d^k(t).    (3)
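For clarity, the rental cost (3) is a plain triple sum over datacenters, service classes and VM types. A minimal sketch follows; the dictionary-based data layout is hypothetical and not taken from the paper.

    from typing import Dict, Tuple

    def provisioning_cost(n: Dict[Tuple[int, int, int], int],
                          p: Dict[Tuple[int, int], float]) -> float:
        """C(t) of Eq. (3): sum over (d, c, k) of n_d^{c,k}(t) * p_d^k(t)."""
        return sum(count * p[(d, k)] for (d, c, k), count in n.items())

    # Example: 2 Small VMs for service 0 and 1 Large VM for service 1 in datacenter 0.
    n_t = {(0, 0, 0): 2, (0, 1, 2): 1}
    p_t = {(0, 0): 0.06, (0, 2): 0.18}   # hypothetical prices
    print(provisioning_cost(n_t, p_t))   # 0.3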
2) QoE definition. On the user side, QoE is the major metric for evaluating the service level, so we also take QoE into account when making resource procurement decisions. In networked systems, QoE is sensitive to both queueing delay and network delay. Therefore, for a request k, we define its delay as:

d^k = d_net^k(·) + d_que^k(·),    (4)

where d_net^k(·) and d_que^k(·) denote the network delay and the queueing delay of request k, respectively. In reality, the two kinds of delay are hard to estimate because they depend on different factors (e.g., network delay depends on queueing delay at routers, transmission delay and propagation delay along the routing path). For simplicity, we assume the queueing delay is determined by the workload status and the VM resources assigned to this workload, while the network delay is mainly determined by the routing distance between clients and datacenters. In addition, the decision-making process and VM renting inevitably incur delays for the workloads; we denote by d_dec(t) and d_rent(t) the decision delay and the VM renting delay, respectively. The total delay generated by requests of type c allocated to datacenter d at time slot t can then be defined as:

γ_d^c(t) = \sum_{k ∈ U_d^c(t)} d_net^k(·) + \sum_{k ∈ U_d^c(t)} d_que^k(·) + d_dec(t) + d_rent(t)
         = \sum_{r ∈ R} λ_rd^c d_net(r, d) + \sum_{k ∈ U_d^c(t)} d_que^k + d_dec(t) + d_rent(t),    (5)

where U_d^c(t) is the set of type-c requests allocated to datacenter d at time slot t. Specifically, similar to [35], we define d_net(r, d) = u · (d_rd)^v, where d_rd is the distance between region r, where the request is generated, and datacenter d, to which it is redirected, and u, v are parameters for scaling the distance and ensuring the convexity of this function, respectively. Obviously, \sum_{k ∈ U_d^c(t)} d_net^k can be calculated once the request allocation strategy λ_rd^c(t) and the locations of users and datacenters are known. Since the time consumed by decision making is mainly determined by the complexity of the algorithm and usually remains stable across time slots, we estimate this delay from the previous few time slots. Since all the VMs to be rented can be launched simultaneously, the VM renting delay within one time slot equals that of launching a single VM; because of its stability over time, as measured in [36], we regard it as a constant in this paper, i.e., d_rent(t) = d_rent. For \sum_{k ∈ U_d^c(t)} d_que^k, we compute it based on equation (6), which is similar to that in [10]:

\sum_{k ∈ U_d^c(t)} d_que^k = max[H_d^c(t) − \sum_{k ∈ K} ρ_d^k n_d^{c,k}(t) s_k, 0],    (6)

where H_d^c(t) is the total unprocessed workload of type-c requests in datacenter d at time slot t (see (12)). The intuitive explanation is as follows. Assume one unit of workload consumes one unit of time; then \sum_{k ∈ K} ρ_d^k n_d^{c,k}(t) s_k is the total service time that the provisioned VMs in datacenter d can allocate to the type-c service, and H_d^c(t) is the total service time needed to complete the corresponding tasks at the current time slot. Therefore,
max[H_d^c(t) − \sum_{k ∈ K} ρ_d^k n_d^{c,k}(t) s_k, 0] indicates the service time deficiency at time slot t, and a newly-arrived request will wait for a period of max[H_d^c(t) − \sum_{k ∈ K} ρ_d^k n_d^{c,k}(t) s_k, 0] before executing.

Furthermore, different types of requests often have heterogeneous tolerable delays. For example, the tolerable delay of online games tends to be small because they are delay-sensitive, while for video analysis applications it may be larger. Under these circumstances, it is unfair to measure user QoE by delay alone. To address this, similar to [29], [22], we consider that the user experience depends on both the tolerable delay and the actual delay, and define the QoE function as:

qoe(γ, ℓ) = Q_max,                                 if γ ≤ ℓ;
          = (Q_max / (ℓ · (b − 1))) · [b · ℓ − γ],  if ℓ < γ ≤ b · ℓ;
          = 0,                                      if γ > b · ℓ,    (7)

where ℓ, γ and Q_max are the tolerable delay, the actual delay of the specific service within one time slot, and the maximum QoE that users can achieve, respectively. By this definition, if requests are completed within the tolerable delay, users get the highest constant QoE score Q_max, since completing tasks before the deadline provides no extra QoE benefit. Otherwise, the more the actual delay violates the tolerable delay, the worse the QoE; when the violation is large enough, the QoE score drops to zero because users will no longer wait for a response from the datacenter. The positive constant b controls the decline rate of the QoE. The QoE achieved by the users in datacenter d for type-c requests at time slot t is then q_d^c(t) = qoe(γ_d^c(t), ℓ_c).

3) A unified objective. So far, we have derived two important aspects of the system: the VSP cost metric C(t) in Eq. (3) (the VM rental cost) and the QoE metric qoe(γ, ℓ) in Eq. (7). From the perspective of the VSP, the main objective is to minimize its VM rental cost while guaranteeing users' QoE requirements. Intuitively, to achieve better QoE the VSP should rent more VMs, which in turn increases the rental cost. The fundamental challenge is therefore how to optimize the number of VMs of each type so as to minimize the operating cost while guaranteeing the QoE level in the long run. To this end, we construct the following stochastic optimization problem:

P1.  min  lim_{T→∞} (1/T) \sum_{t=1}^{T−1} C(t)    (8)
s.t.  a_r^c(t) = \sum_{d ∈ D} λ_rd^c(t), ∀r, ∀c, t ∈ [1, T],    (9)
      n_d^{c,k}(t) ≤ N_d^k, 0 ≤ n_d^{c,k}(t), ∀d, ∀k, t ∈ [1, T],    (10)
      lim_{T→∞} (1/T) \sum_{t=0}^{T−1} q_d^c(t) ≥ Q_0, ∀d, ∀c,    (11)

where constraint (9) ensures that the number of jobs redirected to the datacenters in one time slot equals the total number of jobs arriving in that time slot, constraint (10) ensures that the number of VMs required is within the capacity a datacenter can provide, and constraint (11) requires that all user requests be processed with a minimal QoE level.

From the problem formulation above, since request arrival is a random event, the problem is a constrained stochastic optimization problem, and our objective is to minimize the long-term average cost of VM provisioning while guaranteeing users' QoE level. There are, however, two challenges in solving this problem: (1) the number of requests in each region is time-varying and unpredictable, which makes it infeasible to precisely compute the optimal solution offline; (2) the large number of VMs and their hosted applications exacerbates the computational complexity of a centralized solution. To deal with these challenges, a recently developed optimization technique is adopted in this paper; the details of the solution using the Lyapunov optimization framework are presented in the next section.

4 ONLINE ALGORITHM DESIGN

In response to the challenges of problem P1, we take advantage of Lyapunov optimization techniques [11] to design an online control framework that concurrently makes request redirection and resource procurement decisions. In particular, our control algorithm does not require future information about user requests, and it can be proved to approach a time-averaged cost arbitrarily close to the optimum while maintaining system stability.

4.1 Problem Transformation Using Lyapunov Optimization

Following the standard optimization framework [11], to minimize the time-averaged objective function we transform the original stochastic optimization problem into the problem of minimizing the Lyapunov drift-plus-penalty. Let H_d^c(t) be the total unprocessed workload of type-c requests in datacenter d at time slot t. Initially, H_d^c(0) = 0, and the queue H_d^c(t) evolves as:

H_d^c(t+1) = max[H_d^c(t) − \sum_{k ∈ K} ρ_d^k n_d^{c,k}(t) s_k, 0] + \sum_{r ∈ R} λ_rd^c(t) ω_c,    (12)

where ρ_d^k is the availability of a type-k VM in datacenter d, meaning that VM fault tolerance is considered in the model. The above queue update implies that the departed workload and the newly-arrived workload are \sum_{k ∈ K} ρ_d^k n_d^{c,k}(t) s_k and \sum_{r ∈ R} λ_rd^c(t) ω_c, respectively.

To satisfy constraint (11), we introduce a virtual queue Q_d^c(t), ∀d ∈ D, c ∈ C, for each type of service in each datacenter. Since constraint (11) is equivalent to Q_0 − (1/T) \sum_{t=0}^{T−1} q_d^c(t) ≤ 0, according to the virtual queue theory in [37], setting Q_d^c(0) = 0, the virtual queue Q_d^c(t) is updated as:

Q_d^c(t + 1) = max[Q_d^c(t) + Q_0 − q_d^c(t), 0].    (13)
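The per-slot QoE score (7) and the queue updates (12)-(13) are straightforward to evaluate once the allocation and provisioning decisions are fixed. The sketch below is an illustration, not the authors' code; the default b and Q_max are the values used in the experiments of Section 6.

    def qoe(gamma: float, ell: float, b: float = 2.0, q_max: float = 5.0) -> float:
        """QoE of Eq. (7): full score within the tolerable delay, linear decay up to b*ell, then 0."""
        if gamma <= ell:
            return q_max
        if gamma > b * ell:
            return 0.0
        return q_max / (ell * (b - 1.0)) * (b * ell - gamma)

    def update_queues(H: float, Q: float, served: float, arrived: float,
                      q: float, q0: float):
        """One-slot update of the workload queue (12) and the virtual QoE queue (13)."""
        H_next = max(H - served, 0.0) + arrived   # served = sum_k rho_d^k n_d^{c,k}(t) s_k
        Q_next = max(Q + q0 - q, 0.0)             # q = qoe(gamma_d^c(t), l_c)
        return H_next, Q_next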
Lemma 1: If the virtual queue Q_d^c(t) is stable, i.e., lim_{t→∞} E{Q_d^c(t)}/t = 0, then the QoE constraint (11) is satisfied.

Proof: From (13), it is clear that Q_d^c(t + 1) ≥ Q_d^c(t) + Q_0 − q_d^c(t). Summing this inequality over time slots t ∈ {0, 1, ..., T − 1} and dividing by T gives (Q_d^c(T) − Q_d^c(0))/T ≥ (1/T) \sum_{t=0}^{T−1} [Q_0 − q_d^c(t)]. Noting that Q_d^c(0) = 0 and Q_d^c(t) is stable, and taking the limit T → ∞, the left-hand side goes to 0. Thus lim_{T→∞} (1/T) \sum_{t=0}^{T−1} [Q_0 − q_d^c(t)] ≤ 0, i.e., lim_{T→∞} (1/T) \sum_{t=0}^{T−1} q_d^c(t) ≥ Q_0.

Let Q(t) = (Q_d^c(t)) and H(t) = (H_d^c(t)), ∀d ∈ D, c ∈ C, denote the matrices of virtual queues and actual queues, respectively, and let Θ(t) = [Q(t), H(t)] denote their combination. Following the Lyapunov framework [11], we define the Lyapunov function as:

L(Θ(t)) = (1/2) \sum_{d ∈ D} \sum_{c ∈ C} { Q_d^c(t)^2 + H_d^c(t)^2 },    (14)

where L(Θ(t)) measures the queue backlogs in the system. Since the system rents VMs every m time slots, we define the m-slot Lyapunov drift, i.e., the expected change of the Lyapunov function over m slots, as:

Δ_m(Θ(t)) = L(Θ(t + m)) − L(Θ(t)).    (15)

In the sense of the Lyapunov optimization framework, the drift-plus-penalty is obtained by adding the VM rental cost over m time slots to the above Lyapunov drift, namely:

Δ_m(Θ(t)) + V · E{ \sum_{τ=t}^{t+m−1} \sum_{d ∈ D} \sum_{c ∈ C} \sum_{k ∈ K} n_d^{c,k}(τ) p_d^k(τ) | Θ(t) },    (16)

where V is a non-negative parameter that controls the tradeoff between system stability and cost: the larger V is, the smaller the cost, and vice versa. Hence, the original problem P1 can be transformed into the following problem P2:

P2.  min (16)    s.t. (9), (10), (11).    (17)

To solve problem P2, rather than directly minimizing the drift-plus-penalty expression (16), we seek to minimize an upper bound of it, which does not undermine the optimality or performance of the algorithm according to [11]. Therefore, the key is to find an upper bound for problem P2. The following lemma provides it.

Lemma 2. Suppose the arriving requests a_r^c(t) are i.i.d. over slots. It can be proved that, under any control algorithm, the drift-plus-penalty expression has the following upper bound:

Δ_m(Θ(t)) + V E{ \sum_{τ=t}^{t+m−1} \sum_{d ∈ D} \sum_{c ∈ C} \sum_{k ∈ K} n_d^{c,k}(τ) p_d^k(τ) | Θ(t) }
  ≤ mB_1 + E{ \sum_{τ=t}^{t+m−1} \sum_{d ∈ D} \sum_{c ∈ C} Q_d^c(τ)(Q_0 − q_d^c(τ)) | Θ(t) }
  + E{ \sum_{τ=t}^{t+m−1} \sum_{d ∈ D} \sum_{c ∈ C} H_d^c(τ) ( \sum_{r ∈ R} λ_rd^c(τ) ω_c ) | Θ(t) }
  − E{ \sum_{τ=t}^{t+m−1} \sum_{d ∈ D} \sum_{c ∈ C} H_d^c(τ) ( \sum_{k ∈ K} ρ_d^k n_d^{c,k}(τ) s_k ) | Θ(t) }
  + V E{ \sum_{τ=t}^{t+m−1} \sum_{d ∈ D} \sum_{c ∈ C} \sum_{k ∈ K} n_d^{c,k}(τ) p_d^k(τ) | Θ(t) },    (18)

where B_1 = (1/2) DC { Q_max^2 + Q_0^2 + C_1 ( \sum_{c ∈ C} \sum_{r ∈ R} A_rc^max )^2 W_max^2 + N_max^2 ( \sum_{k ∈ K} s_k )^2 } is a constant and N_max is the maximum number of VMs of each type that a datacenter can provide.

Proof: Please see Appendix A of [38] for the details.

As mentioned above, our algorithm minimizes the right-hand side (R.H.S.) of inequality (18) to solve problem P2. However, for any slot t, this requires prior knowledge of the future queue backlogs Θ(τ) = [Q(τ); H(τ)] over the time slots τ ∈ [t, t + m − 1]. In this paper, we address this problem by using the current backlog Θ(t) to approximate the future information, i.e., H_d^c(τ) = H_d^c(t) and Q_d^c(τ) = Q_d^c(t) for all t < τ ≤ t + m − 1. This approximation results in a "relaxed" upper bound of the drift-plus-penalty, as proved in Lemma 3.

Lemma 3. Suppose the arriving requests a_r^c(t) are i.i.d. over slots. It can be proved that, under any control algorithm, the drift-plus-penalty expression is upper bounded by the following expression:

Δ_m(Θ(t)) + V E{ \sum_{τ=t}^{t+m−1} \sum_{d ∈ D} \sum_{c ∈ C} \sum_{k ∈ K} n_d^{c,k}(τ) p_d^k(τ) | Θ(t) }
  ≤ m(B_1 + B_2) + V E{ \sum_{τ=t}^{t+m−1} \sum_{d ∈ D} \sum_{c ∈ C} \sum_{k ∈ K} n_d^{c,k}(t) p_d^k(t) | Θ(t) }
  + E{ \sum_{τ=t}^{t+m−1} \sum_{d ∈ D} \sum_{c ∈ C} Q_d^c(t) Γ_d^c(t) | Θ(t) }
  − E{ \sum_{τ=t}^{t+m−1} \sum_{d ∈ D} \sum_{c ∈ C} H_d^c(t) ( \sum_{k ∈ K} ρ_d^k n_d^{c,k}(t) s_k ) | Θ(t) }
  + E{ \sum_{τ=t}^{t+m−1} \sum_{d ∈ D} \sum_{c ∈ C} \sum_{r ∈ R} λ_rd^c(τ) H_d^c(t) ω_c | Θ(t) }
  + E{ \sum_{τ=t}^{t+m−1} \sum_{d ∈ D} \sum_{c ∈ C} \sum_{r ∈ R} λ_rd^c(τ) Q_d^c(t) A_c d_rd | Θ(t) },    (19)

where Γ_d^c(t) = A_c max[H_d^c(t) + (m − 1)( \sum_{r ∈ R} A_rc^max W_max ) − \sum_{k ∈ K} ρ_d^k n_d^{c,k}(t) s_k, 0] − A_c (B_c + d_dec(t) + d_rent(t)), with A_c = Q_max / (ℓ_c · (b − 1)), B_c = b · ℓ_c, and B_2 = (m − 1) DC ( Q_0^2 + Q_max^2 + C_1 ( \sum_{c ∈ C} \sum_{r ∈ R} A_rc^max )^2 W_max^2 + N_max^2 ( \sum_{k ∈ K} s_k )^2 ).

Proof: Please see Appendix B of [38] for the details.

So far, by minimizing the R.H.S. of the "relaxed" upper
bound (19), we can solve problem P2, because the queue backlogs in the future time slots τ ∈ [t, t + m − 1] have been estimated.

4.2 Design of the Online Control Algorithm
Fortunately, a careful investigation of the R.H.S. of inequality (19) reveals that the optimization problem can be equivalently decoupled into two subproblems: 1) request redirection and 2) resource procurement. The two subproblems are solved as follows.

1) Request redirection. To minimize the R.H.S. of (19), by observing the relationships among the variables, the part related to request redirection can be extracted from the R.H.S. of (19) as:

E{ \sum_{τ=t}^{t+m−1} \sum_{d ∈ D} \sum_{c ∈ C} \sum_{r ∈ R} λ_rd^c(τ) ( H_d^c(t) ω_c + Q_d^c(t) A_c d_rd ) | Θ(t) }.    (20)

Furthermore, it should be noted that requests of each type generated from each region are independent, so the centralized minimization can be implemented independently and distributedly. Considering the redirection of type-c requests in region r at time τ, we should solve the following problem:

min \sum_{d ∈ D} λ_rd^c(τ) [ H_d^c(t) ω_c + Q_d^c(t) A_c d_rd ]    (21)
s.t. (9).

In fact, the above problem is a generalized min-weight problem in which the amount of requests of each type from region r redirected to datacenter d, λ_rd^c(t), is weighted by the queue backlogs H_d^c(t) and Q_d^c(t). By linear programming theory, we obtain the following solution:

λ_rd^c(τ) = a_r^c(τ) if d = d*, and 0 otherwise,    (22)

where d* = argmin_d [ H_d^c(t) ω_c + A_c Q_d^c(t) d_rd ]. The solution shows that type-c requests generated from region r incline to be redirected to the datacenter with the shortest weighted workload queue and virtual queue at the current time slot, which is consistent with recent work on scheduling cloud computing tasks for load balancing [39]. However, compared with a pure load-balancing strategy, this paper also considers the QoE factor, which makes the model more suitable for the real world.

2) VM procurement. The part of the R.H.S. of (19) related to the variable n_d^{c,k}(t) can be regarded as the resource procurement problem once the constant terms are removed. Therefore, we can obtain the optimal VM procurement strategy by solving the following problem:

min  V E{ \sum_{τ=t}^{t+m−1} \sum_{c ∈ C} \sum_{d ∈ D} \sum_{k ∈ K} n_d^{c,k}(t) p_d^k(t) | Θ(t) }
   + E{ \sum_{τ=t}^{t+m−1} \sum_{d ∈ D} \sum_{c ∈ C} Q_d^c(t) Γ_d^c(t) | Θ(t) }
   − E{ \sum_{τ=t}^{t+m−1} \sum_{d ∈ D} \sum_{c ∈ C} H_d^c(t) ( \sum_{k ∈ K} ρ_d^k n_d^{c,k}(t) s_k ) | Θ(t) }    (23)
s.t. (10).

Since the resource procurement problems in the datacenters are independent, (23) can be solved distributedly within each datacenter. For a single datacenter d, the resource procurement problem can be further rewritten as:

min  V E{ \sum_{c ∈ C} \sum_{k ∈ K} n_d^{c,k}(t) p_d^k(t) | Θ(t) }
   + E{ \sum_{c ∈ C} Q_d^c(t) Γ_d^c(t) | Θ(t) }
   − E{ \sum_{c ∈ C} H_d^c(t) ( \sum_{k ∈ K} ρ_d^k n_d^{c,k}(t) s_k ) | Θ(t) }    (24)
s.t. (10).

It can easily be verified that subproblem (24) is a convex function of n_d^{c,k}(t). Thus we can exploit many methods, such as the interior point method or the gradient projection method, to deal with it. For convenience, in this paper we solve the minimization of (24) using standard convex optimization tools (e.g., CVX). Since CVX also adopts an interior point algorithm, which finds an ϵ-approximate solution within O(n^2 ln(1/ϵ)) iterations, where n is the size of the problem (i.e., C · K), the algorithm is able to run online. A detailed analysis of the computational complexity of the algorithm can be found in Appendix D of the supplementary material of [38].

Finally, with the queues Q_d^c(t) and H_d^c(t) updated at every time slot, the system makes online decisions about request redirection (λ_rd^c(t)) once a request arrives at the system, and about VM provisioning (n_d^{c,k}(t)) every m time slots. Since the resource procurement in each datacenter is independent, we designed a more general algorithm that can cope with different datacenters having different rental periods (i.e., m_d is the VM provisioning period of datacenter d). The details of the online decision algorithm are presented in Algorithm 1.
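As an illustration of the request-redirection step, the closed-form rule (22) is just an argmin over datacenters of the two weighted backlogs. The following sketch assumes a hypothetical list-of-lists data layout and is not the authors' implementation:

    def redirect_request(r: int, c: int, H, Q, w, A, dist) -> int:
        """Return d* = argmin_d [ H_d^c * w_c + A_c * Q_d^c * d_rd ], cf. Eq. (22).

        H, Q: per-datacenter workload and virtual-queue backlogs, indexed as H[d][c], Q[d][c];
        w[c]: workload of a type-c chunk; A[c] = Q_max / (l_c * (b - 1)); dist[r][d]: region-to-datacenter distance.
        """
        return min(range(len(H)), key=lambda d: H[d][c] * w[c] + A[c] * Q[d][c] * dist[r][d])

All a_r^c(τ) type-c requests of region r are then sent to the returned datacenter, as stated in (22).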
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TPDS.2015.2470676, IEEE Transactions on Parallel and Distributed Systems IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. ×, NO. ×, 2015
Algorithm 1: Algorithm Procedures of DYRECEIVE
Input: s_k, ω_c, W_max, ℓ_c, a_r^c, A_rc^max, p_d^k(τ), ρ_d^k, d_rent, V, a, b, u, v (∀c ∈ C, ∀d ∈ D, ∀r ∈ R, ∀k ∈ K)
Output: n_d^{c,k}(τ), λ_rd^c(τ) (∀c ∈ C, ∀d ∈ D, ∀r ∈ R, ∀k ∈ K)
Initialization: let τ = 0, st = cputime; set Q_d^c(0) = 0, H_d^c(0) = 0 (∀c ∈ C, ∀d ∈ D), and d_dec(0) = 0
while the service of the VSP is running do
    calculate the time slot τ = (curtime − st)/60s
    estimate the decision overhead d_dec(τ) based on d_dec(t), t ∈ [τ − 5, τ − 1]
    // Resource provisioning
    foreach datacenter d ∈ D do
        if (τ mod m_d) == 0 then
            observe the queue backlogs Q_d^c(τ), H_d^c(τ) and the VM prices p_d^k(τ) at the current time
            obtain the VM provisioning strategy n_d^{c,k}(τ) by solving problem (24) with the CVX tool
    // Request redirection
    if a request arrives at the system then
        foreach r ∈ R, c ∈ C do
            observe the queue backlogs Q_d^c(τ), H_d^c(τ) and the network delay d_rd, and estimate the computation delay d_comp(τ) at the current time
            obtain the request redirection strategy λ_rd^c(τ) by solving problem (21) using (22)
    update the queues Q_d^c(τ), H_d^c(τ) according to the queue dynamics (12) and (13), respectively
    record the decision-making time consumed at the current time slot, d_dec(τ)
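The provisioning step of Algorithm 1 solves the convex subproblem (24); the paper reports using CVX, and the sketch below reproduces the same model with the Python package cvxpy under illustrative, hypothetical inputs (the Γ_d^c(t) term is modeled with the positive-part operator, as in Lemma 3). It treats n as continuous, i.e., a relaxation whose result can be rounded to an integral provisioning decision.

    import cvxpy as cp
    import numpy as np

    C, K = 3, 4                                   # service classes, VM types
    rng = np.random.default_rng(0)
    p    = rng.uniform(0.05, 0.5, (C, K))         # p_d^k(t): per-type prices in this datacenter
    rho  = np.full(K, 0.9)                        # rho_d^k: VM availability
    s    = np.array([1.0, 2.0, 4.0, 8.0])         # s_k: compute capacity per VM type
    H    = rng.uniform(0.0, 100.0, C)             # H_d^c(t): workload backlogs
    Q    = rng.uniform(0.0, 10.0, C)              # Q_d^c(t): virtual QoE queues
    A    = np.ones(C)                             # A_c = Q_max / (l_c (b - 1))
    cons = np.zeros(C)                            # A_c (B_c + d_dec + d_rent), constant in n
    W    = 50.0                                   # (m - 1) * sum_r A_rc^max * W_max (illustrative)
    N_d, V = 1000, 2e4

    n = cp.Variable((C, K), nonneg=True)          # n_d^{c,k}(t)
    served = n @ (rho * s)                        # sum_k rho_d^k s_k n_d^{c,k}(t), one entry per class
    gamma = cp.multiply(A, cp.pos(H + W - served)) - cons     # Gamma_d^c(t)
    cost = (V * cp.sum(cp.multiply(p, n))
            + cp.sum(cp.multiply(Q, gamma))
            - cp.sum(cp.multiply(H, served)))
    prob = cp.Problem(cp.Minimize(cost), [n <= N_d])
    prob.solve()
    print(np.round(n.value, 1))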
5 PERFORMANCE ANALYSIS

Assume that (λ_rd^{c,π}, n_d^{c,k,π}) denotes a feasible policy for problem P2 and O_avg^π denotes its corresponding time-average cost. We now analyze the performance of Algorithm 1 in terms of the stability and the cost of the system.

Lemma 4. Assume that the arrivals of all requests are strictly within the capacity region, denoted by Λ. Then, under our algorithm, for any control parameter V > 0, we have:

1) The gap between the achieved time-average cost and the optimal cost satisfies

lim_{T→∞} (1/T) \sum_{t=0}^{T−1} E{ \sum_{d ∈ D} \sum_{c ∈ C} \sum_{k ∈ K} n_d^{c,k}(t) p_d^k(t) } ≤ B/V + O_avg^*,    (25)

where O_avg^* is the infimum of the average VM provisioning cost under the optimal control actions (λ_rd^{c,*}, n_d^{c,k,*}) and B = B_1 + B_2 + F/m; B_1, B_2 are the constants defined above and F is an additive constant within which O_avg^* is attained when minimizing (19) over all other policies.

2) Meanwhile, suppose there exists an ε > 0 such that a + ε1 ∈ Λ; then, for k ∈ Z^+, the queue backlogs satisfy

lim_{K→∞} (1/K) \sum_{k=0}^{K−1} \sum_{d ∈ D} \sum_{c ∈ C} E{ Q_d^c(km) + H_d^c(km) } ≤ (B_1 + B_2 + V O_max) / ε,    (26)

where O_max is the maximum time-average cost to run the system and 1 denotes the all-ones vector.

This lemma shows that the gap between the time-average cost obtained by our algorithm and the optimal cost is O(1/V) (see (25)). By choosing the control parameter V, we can therefore achieve a time-average cost arbitrarily close to the optimal cost O_avg^*. However, note that the queue backlog is O(V) (see (26)), so the decrease in cost is achieved at the expense of the robustness of the queue state.

Proof: Please see Appendix C of [38] for the details.

Although our algorithm relies on queue backlog estimation, which may introduce estimation errors, it has been proved in [9] that the resulting decisions are robust against such errors. Moreover, the theoretical analysis suggests that, by choosing a larger value of V, the cost approaches the one obtained with accurate information.
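To make the [O(1/V), O(V)] tradeoff of Lemma 4 tangible, the toy computation below evaluates the cost-gap bound B/V for a few values of V; the constant B is purely hypothetical and not taken from the paper.

    B = 1.0e4   # hypothetical bound constant B = B1 + B2 + F/m
    for V in (1e3, 1e4, 2e4, 1e5):
        print(f"V = {V:.0e}: cost-gap bound B/V = {B / V:.2f}; backlog bound grows linearly in V")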
6 PERFORMANCE EVALUATION
In this section, we evaluate the effectiveness and performance of the proposed algorithm using a discrete-event simulator and synthetic datasets based on real traces.

6.1 Dataset

The dataset used in the experiments is synthesized from the YouTube trace dataset [40], the WoWAH dataset [41] and a random dataset,
which represent a video streaming service, a video game service and other kinds of video services, respectively. The YouTube trace was collected from a campus network over a period of 13 days, from which we extract two days of data, between Jan. 30 and Feb. 1, 2008, for the experiments. The WoWAH dataset is a trace of the online game World of Warcraft containing records over a 1,107-day period between Jan. 2006 and Jan. 2009, from which we also extract two days of records. The random dataset is generated by ourselves under a Poisson distribution. The three datasets are then aggregated to form the synthetic dataset. Fig. 2 illustrates the workload variation (per minute) of these datasets over the two days. To be more realistic, we partition the YouTube trace into different regions according to the user IP address of each record. Since the WoWAH dataset, like the random dataset, does not include user IP information, we randomly assign its records to multiple regions in proportion to the population of each region in the YouTube trace.

Fig. 2. Rate of each type of request over time slots

6.2 Experiment Setting

In our experiments, we consider a system consisting of 5 geo-distributed datacenters running 3 types of services, with users spread over 20 regions. Four types of VMs are considered; the details are presented in Table 2. Similar to [10], the basic price (BP in Table 2) of a Small VM instance takes values from a finite set within the range [0.05, 0.07] (in units of dollars per hour) and changes dynamically over regions and time slots, based on the pricing traces obtained directly from the Amazon EC2 web site [34] (for Linux/Unix VM instances). The prices of the other types are a log function of the number of compute units and BP, which implies that the more capacity a customer buys from the CSPs, the cheaper the unit price. In the experiments, the price of each datacenter is independent and identically distributed. The tolerable delays of WoWAH, YouTube and Random requests are 1.5 × 10^4, 1.6 × 10^4 and 1.7 × 10^4, respectively (in the unit of the time consumed by one workload, e.g., 10um).
TABLE 2
AMAZON EC2 VM INSTANCES

Name        | Number of compute units | Price
Small       | 1                       | BP ∈ [0.05, 0.07]
Medium      | 2                       | BP · (1 + log_2.5(2))
Large       | 4                       | BP · (1 + log_2.5(4))
Extra Large | 8                       | BP · (1 + log_2.5(8))
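The pricing rule in Table 2 can be computed directly from the basic price BP and the number of compute units; the BP value below is an illustrative placeholder within the stated range.

    import math

    def vm_price(bp: float, compute_units: int) -> float:
        """Table 2 pricing: BP for the Small type, BP * (1 + log_2.5(units)) for larger types."""
        return bp if compute_units == 1 else bp * (1 + math.log(compute_units, 2.5))

    print([round(vm_price(0.06, u), 3) for u in (1, 2, 4, 8)])   # Small .. Extra Large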
The maximum number of requests of each service in each region, A_rc^max, is set based on the actual values in the synthetic workload. The availability of the VMs is set according to the 30-minute measurement results on the UCB dataset in [42]; for simplicity, we set ρ_d^k = 0.9 for all VM types and datacenters. As reported in [43], only a few seconds are needed to start a VM, so we set d_rent = 10 s. The common parameters are: b = 2, Q_max = 5, N_max = 1000, W_max = 100.

6.3 Experiment Results and Analysis

To facilitate the comparison, two metrics are defined in this section. (1) The Cost Ratio (CR), which measures the proportion of the cost of a single case among the total cost of all cases; it is calculated as CR_cur = C_cur / \sum_{i=1}^{N} C_i,
where C_i denotes the cost incurred by the i-th case and N is the number of cases. (2) The Normalized Cost Ratio (NCR), which is defined as the cost incurred by the current case divided by the maximum cost among all cases; it is obtained as NCR_cur = C_cur / max[C_1, ..., C_N].

Although the proposed algorithm is able to deal with VM provisioning periods that vary across datacenters, to validate the original mathematical derivation the following experiments mainly consider the case in which the periods are the same for all datacenters (i.e., m_d is the same for each datacenter d). Nonetheless, a comparison experiment is conducted to validate the capability of the proposed algorithm in dealing with different rental periods for different datacenters.

(1) Effectiveness of the algorithm. We run our dynamic algorithm for T = 2,880 time slots, with parameters V = 2 × 10^4 and m = 10. Fig. 3(a) presents the cost incurred in each time slot. We observe that the monetary cost curve fluctuates synchronously with the variation of requests shown in Fig. 2, which means that our algorithm can adaptively lease and adjust VM resources to meet dynamic user demands without forecasting future workload information. In detail, the cost comparison of each VM type is illustrated in Fig. 3(b), where we use the metric CR. It can be observed that, under workload variation, the cost ratio of each VM type is relatively stable overall, especially within flash-crowd periods. This can be attributed to the fact that, within crowded periods, resources are inadequate and all types of VMs are rented to guarantee user QoE, which leads to a stable cost ratio close to the price ratio. The Extra Large type shows the highest ratio, because the larger the capacity of a VM, the lower its unit price, so the system prefers to rent VMs with more
capacity to reduce the total cost. However, within uncrowded periods (e.g., between time slots 400-800 and 2,000-2,500), the algorithm also inclines to rent more VMs with larger capacity (e.g., the Extra Large type). The rationale is that, within such periods, VM resources are in lower demand, so the system tends to rent VMs with lower prices to minimize its cost, which differs from the case when resource demand is higher. Nevertheless, within these periods not all rented VMs are of the cheapest type, because the large granularity may cause resource waste. This means that our algorithm can adjust the combination of VM types to minimize the total cost.

(2) Impact of V and m. For parameter V, as can be seen in Fig. 4(a), as V increases, the time-average cost obtained by our algorithm declines significantly and converges to the minimum level for larger values of V. However, the stability of the system simultaneously declines, since the variation of the queue backlogs (i.e., H_d^c(τ) + Q_d^c(τ)) grows with V, which is consistent with Lemma 4. Furthermore, the cost reduction is achieved at the price of degrading user QoE: as can be seen in Fig. 4(b), user QoE decreases as V increases. Additionally, the variation of the QoE becomes increasingly pronounced with larger V, meaning that increasing V degrades the stability of the user QoE level. The reason may be that the larger V is, the less stable the queue backlog is, which leads to fluctuations of resource provisioning and ultimately makes the QoE level unstable. Therefore, the parameter V controls the tradeoff between cost and user QoE in the system, which also verifies Lemma 4.

For parameter m (with V = 20,000), as illustrated in Fig. 6(a), the cumulative cost increases with m over the time slots. This reflects the fact that a larger m means renting reserved VMs for a longer time, regardless of the dynamic changes of the workload within that period; some VMs are wasted when the workload declines, resulting in a cost increase. Fig. 6(b) shows that user QoE decreases significantly as m increases, which is caused by the growing estimation error of the queue backlogs in the future time slots τ ∈ [t, t + m − 1]. However, this can be alleviated by increasing the value of V, as proved in [9].

Furthermore, we also studied the capability of the model to deal with heterogeneous resource rental periods across datacenters. As presented in Algorithm 1, if we set m_d differently for each d ∈ D, our algorithm can cope with different datacenters having different rental periods. As can be seen in Fig. 6(c) and Fig. 6(d), the case M = [8 9 10 11 12] represents five datacenters with rental periods of 8, 9, 10, 11 and 12, respectively, and the case M = [10 10 10 10 10] represents datacenters with the same rental period of 10 (i.e., the previous setting m = 10). For comparison, we sampled the original data every 10 time slots to draw the curves. Fig. 6(c) shows that the cost of the case M = [8 9 10 11 12] is similar to that of M = [10 10 10 10 10] and fluctuates with the workload variation, which means that our algorithm is able to deal with datacenters having various rental periods. The QoE levels are also similar, as can be seen from Fig. 6(d). This may be because the resource procurement of each datacenter is independent, and decision-making at different times in different datacenters hardly affects the QoE level.
Fig. 3. Cost metric over time slots: (a) cost incurred by the system over time slots; (b) cost ratio of each type of VM over time slots.

Fig. 4. Impact of V on cost and QoE: (a) impact of V on cost and workload queue; (b) impact of V on QoE.
(3) Effects of considering the heterogeneity of QoE. To evaluate the effect of considering QoE heterogeneity, we compare our algorithm with cases in which all services have the same tolerable delay (TD). The metric NCR is used for comparison in this experiment. In Fig. 5(a), our algorithm DYRECEIVE uses different TD settings for the three services (i.e., ℓ_WoWAH = 15,000, ℓ_Youtube = 16,000 and ℓ_Random = 17,000), while TD=15,000, TD=16,000 and TD=17,000 mean that the tolerable delays of all services are set to the same corresponding value. As exhibited in Fig. 5(a), the smaller the TD, the greater the NCR achieved over all time slots. This can be explained by the fact that a smaller TD implies that jobs are urgent, which leads to renting a larger number of VMs to complete the jobs within the deadline and necessarily incurs much more cost. We also find that our algorithm incurs a lower cost than the cases with TD = 15,000 and TD = 16,000 at most time slots, and than the case with TD = 17,000 at some time slots. We believe this is because, when QoE heterogeneity is considered, urgent jobs are executed first, while less urgent jobs wait for the resources released by the urgent jobs since their deadlines are not as tight. As a result, it is not necessary to launch new VMs when fresh jobs with long tolerable delays arrive, which saves cost. This is more obvious in the crowded periods (e.g.,
This effect is more pronounced in the crowded periods (e.g., time slots 1,000-1,500 and 2,500-3,000) than in the stable period (e.g., time slots 1,500-2,500), since resources are scarcer when the workload is heavy. Fig.5(b) plots the CR metric of each service type over time slots, which further confirms this result. As can be seen in the figure, the CR of the Random dataset is stable during low-workload periods (e.g., time slots 300-700 and 1,800-2,200), while its fluctuation increases significantly during the crowded periods (e.g., time slots 1,000-1,500 and 2,500-3,000). This is because, within the crowded periods, requests from WoWAH are executed first due to their urgent deadlines, whereas requests from the Random dataset may concede some resources to the WoWAH requests because of their longer tolerable delay.
(4) Comparison with other strategies. In this section, we compare the proposed algorithm with alternatives, each of which combines a request allocation strategy with a resource provisioning strategy. Four request allocation strategies are considered. 1) Our Dynamic Request Redirection (DRR), in which requests are redirected to the datacenter with the minimal weighted sum of workload queue and QoE queue; as presented in (21), this is a weighted metric balancing workload and user QoE, so our redirection strategy considers workload balance while also guaranteeing the QoE level. 2) Proximity-aware Request Redirection (PRR), in which requests are always allocated to the spatially nearest datacenter, which minimizes the response delay incurred by communication. 3) Load-balance Request Redirection (LBRR),
Fig. 5. Cost ratio comparison under different tolerable delay settings: (a) normalized cost ratio (NCR) comparison under various tolerable delays (TD=15,000, TD=16,000, TD=17,000, DYRECEIVE); (b) cost ratio of each service type (WoWAH, Youtube, Random) over time slots.
Fig. 6. Cost and QoE comparison under different m settings: (a) impact of m on the cumulative cost (m = 1, 3, ..., 19); (b) impact of m on the average QoE and its variation; (c) cost comparison between M = [10 10 10 10 10] and M = [8 9 10 11 12]; (d) QoE comparison between the two settings.
in which requests are always allocated to the datacenter with the minimal workload; this strategy keeps the workload balanced among all datacenters. 4) Minimal Price Request Redirection (MPRR), in which all requests are allocated to the datacenter with the lowest price in the current time slot; obviously, this strategy can achieve the lowest cost. For resource provisioning, we considered the following strategies. 1) Our Dynamic Resource Provisioning (DRP), in which the number of VMs is scaled up and down by solving (24). 2) Heuristic Resource Provisioning (HRP), in which the VMs for the current time slot are provisioned according to the workload of the previous time slot; to better cope with workload fluctuation, we add 50 percent more VMs than were required in the previous slot. 3) Stable Resource Provisioning (SRP), in which the number of VMs of each type in each datacenter is fixed across time slots. For comparison, we set the average number of each VM type
achieved by DRP as the stable resource provisioning strategy, so that in total the two consume an equal number of VMs of each type. First, the HRP strategy is compared with our DRP. Combining the two resource provisioning strategies with all the request redirection strategies yields eight cases for comparison, namely DYRECEIVE (DRP+DRR), DRP+PRR, DRP+LBRR, DRP+MPRR, HRP+DRR, HRP+PRR, HRP+LBRR, and HRP+MPRR. The cost comparison is presented in Fig.7(a), from which we see that the cost incurred by our algorithm is lower than those of DRP+LBRR, HRP+DRR, HRP+PRR, and HRP+LBRR. This means that: 1) by jointly considering the two aspects (resource provisioning and request allocation), the system achieves a lower cost than the cases in which the two aspects are solved individually, which verifies the earlier assumption that the two aspects affect each other.
Fig. 7. Comparison between our algorithm and the other strategies (DYRECEIVE, DRP+PRR, DRP+LBRR, DRP+MPRR, HRP+DRR, HRP+PRR, HRP+LBRR, HRP+MPRR): (a) average cost comparison; (b) workload queue state comparison; (c) QoE queue state comparison.
2) The cost achieved by HRP is higher than that of our dynamic resource provisioning strategy DRP, which can be explained by the fact that the workload fluctuates so much that a fixed heuristic rule fails to track it. Nevertheless, the cost of DYRECEIVE is higher than that of DRP+PRR, DRP+MPRR, and HRP+PRR. This is because these strategies allocate all requests to a single datacenter (the nearest or the cheapest one) with limited VM resources, so the cost is incurred by only one datacenter with inadequate VMs. However, the performance of these strategies, measured by the workload queue and QoE queue states, is far worse than DYRECEIVE's. Fig.7(b) depicts the evolution of the workload queue state (for time slot τ, Havg(τ) = Σ_{d∈D} Σ_{c∈C} H(d, c, τ)/(D · C)); it shows that our algorithm achieves a stable workload queue similar to that of DRP+LBRR, while the workload queues of the other strategies keep growing over time, meaning that they cannot keep the system stable in the long run. Since DYRECEIVE and DRP+LBRR produce similar workload queue curves, our algorithm also balances the workload across datacenters. Fig.7(c) shows the QoE queue state of the system; DYRECEIVE achieves a QoE queue that is even more stable than that of DRP+LBRR, probably because it considers workload balance and QoE balance simultaneously. A simplified view of the compared baseline rules is sketched below.
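The sketch below gives one possible simplified reading of the four redirection rules and of the HRP provisioning rule; the queue values, distances, prices, weights, and function names are all hypothetical, and the DRR score is only a stand-in for the weighted metric in (21).

```python
# Simplified stand-ins for the compared baselines (assumed inputs; the DRR
# score is only a proxy for the weighted metric of (21), and hrp() follows
# the 50%-margin heuristic described above).
datacenters = ["dc1", "dc2", "dc3", "dc4", "dc5"]
workload_q = {"dc1": 120.0, "dc2": 340.0, "dc3": 80.0, "dc4": 210.0, "dc5": 150.0}  # H backlogs (made up)
qoe_q      = {"dc1": 30.0,  "dc2": 10.0,  "dc3": 90.0, "dc4": 20.0,  "dc5": 40.0}   # Q backlogs (made up)
distance   = {"dc1": 5.0,   "dc2": 1.0,   "dc3": 9.0,  "dc4": 3.0,   "dc5": 7.0}    # made up
price      = {"dc1": 0.12,  "dc2": 0.10,  "dc3": 0.14, "dc4": 0.11,  "dc5": 0.13}   # made up

def drr(alpha=1.0, beta=1.0):
    """Dynamic Request Redirection: minimal weighted workload + QoE backlog."""
    return min(datacenters, key=lambda d: alpha * workload_q[d] + beta * qoe_q[d])

def prr():
    """Proximity-aware Request Redirection: nearest datacenter."""
    return min(datacenters, key=lambda d: distance[d])

def lbrr():
    """Load-balance Request Redirection: smallest workload queue."""
    return min(datacenters, key=lambda d: workload_q[d])

def mprr():
    """Minimal Price Request Redirection: cheapest datacenter this slot."""
    return min(datacenters, key=lambda d: price[d])

def hrp(vms_required_last_slot):
    """Heuristic Resource Provisioning: last slot's requirement plus 50%."""
    return int(round(vms_required_last_slot * 1.5))

# Single-service stand-in for the average workload backlog
# Havg(tau) = sum_{d in D} sum_{c in C} H(d, c, tau) / (D * C).
h_avg = sum(workload_q.values()) / len(workload_q)

print("DRR ->", drr(), "| PRR ->", prr(), "| LBRR ->", lbrr(), "| MPRR ->", mprr())
print("HRP VMs for next slot:", hrp(40), "| average workload backlog:", h_avg)
```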
Fig. 8. Comparison between our algorithm and SRP+DRR: (a) cumulative cost comparison between SRP+DRR and DRP+DRR (DYRECEIVE) over time slots; (b) cost gap between SRP+DRR and DRP+DRR over V.
Secondly, we also compared the stable resource provisioning (SRP) strategy with our dynamic resource provisioning (DRP). For fairness, both cases use DRR for request redirection. Fig.8(a) shows the cumulative cost of the two strategies (with V = 20,000), and Fig.8(b) depicts the time-average cost difference (i.e., Cgap = CSRP+DRR − CDRP+DRR) between the two strategies as the parameter V varies. From these graphs we make the following observations: 1) DRP is more cost-efficient than SRP even though the two rent the same amount of VM resources, which means DYRECEIVE is able to optimize resource provisioning over time slots (e.g., renting more VMs in time slots with lower prices and fewer when prices rise). 2) The cost gap grows with V, which means that choosing a larger V yields greater cost savings. However, as shown above, a larger V also degrades QoE, so it is important to choose a suitable V to save cost (e.g., our experiments found V = 20,000 suitable).
7 CONCLUSIONS
This paper proposed a novel method called DYRECEIVE for request redirection and resource procurement from the perspective of VSPs. We showed that DYRECEIVE is capable of reducing the cost of providing video services in the cloud while simultaneously achieving a satisfactory user QoE level. The method provides an efficient way to run video services in a general, heterogeneous environment with dynamic user workloads, dynamic resource prices, multiple services with heterogeneous QoE requirements, and heterogeneous datacenters. Using the Lyapunov technique, we transformed the original problem into two independent sub-problems and gave an online algorithm to solve them. Theoretical analysis showed that, by choosing the parameter V, the long-term time average cost achieved by the algorithm approximates the offline optimum within a provable upper bound. Experiments conducted on synthetic datasets validated our theoretical results. In addition, we found that QoE heterogeneity in fact offers a potential for cost reduction for VSPs. In the future, we will focus on the following directions: 1) taking into account the video consumption patterns of social network groups to share VM resources; 2) solving the problem at the level of job/VM matching instead of the job/datacenter matching considered in
this paper; and 3) taking into account more factors in the objective model (e.g., resource utilization, storage cost, and VM migration cost) and in the QoE function definition (e.g., router delay and propagation delay).
8 ACKNOWLEDGEMENTS
This research is supported by the Research Fund for the Doctoral Program of Higher Education (RFDP) of China under Grant No. 20134307110029, the Public Project of the Southwest Institute of Electronics & Telecommunication Technology under Grant 2013001, the National Natural Science Foundation of China under Grant 91024030, and the Hunan Provincial Natural Science Foundation of China under Grant 2015JJ3023. The authors would like to thank Prof. Yewang Chen at Jinan University and the anonymous reviewers for their constructive suggestions.
REFERENCES
[1] Cisco Systems Inc., "Cisco visual networking index: Forecast and methodology, 2012-2017," 2013.
[2] W. Zhang, Y. Wen, J. Cai, and D. Wu, "Toward transcoding as a service in a multimedia cloud: Energy-efficient job-dispatching algorithm," IEEE Transactions on Vehicular Technology, vol. 63, no. 5, pp. 2002-2012, Jun 2014.
[3] B. Gunsel and A. Tekalp, "Content-based video abstraction," in Proceedings of the International Conference on Image Processing, Oct 1998, pp. 128-132.
[4] S.-F. Chang and A. Vetro, "Video adaptation: Concepts, technologies, and open issues," Proceedings of the IEEE, vol. 93, no. 1, pp. 148-158, Jan 2005.
[5] D. Miao, W. Zhu, C. Luo, and C. W. Chen, "Resource allocation for cloud-based free viewpoint video rendering for mobile phones," in Proceedings of the 19th ACM International Conference on Multimedia (MM'11), 2011, pp. 1237-1240.
[6] Y. Wu, C. Wu, B. Li, X. Qiu, and F. C. M. Lau, "Cloudmedia: When cloud on demand meets video on demand," in Proceedings of the International Conference on Distributed Computing Systems (ICDCS'11), June 2011, pp. 268-277.
[7] X. Nan, Y. He, and L. Guan, "Optimal resource allocation for multimedia cloud based on queuing model," in Proceedings of the IEEE 13th International Workshop on Multimedia Signal Processing (MMSP'11), Oct 2011, pp. 1-6.
[8] H. Wen, Z. Hai-ying, L. Chuang, and Y. Yang, "Effective load balancing for cloud-based multimedia system," in Proceedings of the International Conference on Electronic and Mechanical Engineering and Information Technology (EMEIT'11), vol. 1, Aug 2011, pp. 165-168.
[9] Y. Yao, L. Huang, A. Sharma, L. Golubchik, and M. Neely, "Power cost reduction in distributed data centers: A two-time-scale approach for delay tolerant workloads," IEEE Transactions on Parallel and Distributed Systems, vol. 25, no. 1, pp. 200-211, Jan 2014.
[10] D. Wu, Z. Xue, and J. He, "icloudaccess: Cost-effective streaming of video games from the cloud with low latency," IEEE Transactions on Circuits and Systems for Video Technology, vol. 24, no. 8, pp. 1405-1416, 2014.
[11] M. Neely, Stochastic Network Optimization with Application to Communication and Queueing Systems. Morgan and Claypool, 2010.
[12] B. Cohen, "Incentives build robustness in bittorrent," in Workshop on Economics of Peer-to-Peer Systems, vol. 6, 2003, pp. 68-72.
[13] X. Hei, C. Liang, J. Liang, Y. Liu, and K. Ross, "A measurement study of a large-scale p2p iptv system," IEEE Transactions on Multimedia, vol. 9, no. 8, pp. 1672-1687, Dec 2007.
[14] A. Ghodsi, M. Zaharia, B. Hindman, A. Konwinski, S. Shenker, and I. Stoica, "Dominant resource fairness: Fair allocation of multiple resource types," in Proceedings of the 8th USENIX Conference on Networked Systems Design and Implementation (NSDI'11). USENIX Association, 2011, pp. 323-336.
[15] Y. Song, Y. Sun, and W. Shi, "A two-tiered on-demand resource allocation mechanism for vm-based data centers," IEEE Transactions on Services Computing, vol. 6, no. 1, pp. 116-129, Jan 2013.
[16] S. Ren, Y. He, and F. Xu, "Provably-efficient job scheduling for energy and fairness in geographically distributed data centers," in Proceedings of the IEEE 32nd International Conference on Distributed Computing Systems (ICDCS'12), June 2012, pp. 22-31.
[17] Z. Liu, M. Lin, A. Wierman, S. H. Low, and L. L. Andrew, "Greening geographical load balancing," in Proceedings of the ACM SIGMETRICS Joint International Conference on Measurement and Modeling of Computer Systems (SIGMETRICS'11). ACM, 2011, pp. 233-244.
[18] G. Lee, B.-G. Chun, and H. Katz, "Heterogeneity-aware resource allocation and scheduling in the cloud," in Proceedings of the 3rd USENIX Conference on Hot Topics in Cloud Computing (HotCloud'11). USENIX Association, 2011, pp. 1-5.
[19] F. I. Popovici and J. Wilkes, "Profitable services in an uncertain world," in Proceedings of the 2005 ACM/IEEE Conference on Supercomputing (SC'05). IEEE Computer Society, 2005, pp. 36-36.
[20] J. Tang, W. P. Tay, and Y. Wen, "Dynamic request redirection and elastic service scaling in cloud-centric media networks," IEEE Transactions on Multimedia, vol. 16, no. 5, pp. 1434-1445, Aug 2014.
[21] F. Wang, J. Liu, and M. Chen, "Calms: Cloud-assisted live media streaming for globalized demands with time/region diversities," in Proceedings of the IEEE INFOCOM, March 2012, pp. 199-207.
[22] J. He, Y. Wen, J. Huang, and D. Wu, "On the cost-qoe tradeoff for cloud-based video streaming under amazon ec2's pricing models," IEEE Transactions on Circuits and Systems for Video Technology, vol. 24, no. 4, pp. 669-680, April 2014.
[23] I. Hofmann, N. Farber, and H. Fuchs, "A study of network performance with application to adaptive http streaming," in Proceedings of the IEEE Int. Symp. Broadband Multimedia Systems and Broadcasting (BMSB), 2011, pp. 1-6.
[24] C. Liu, I. Bouazizi, and M. Gabbouj, "Rate adaptation for adaptive http streaming," in Proceedings of the ACM Conf. on Multimedia Systems, 2011, pp. 169-174.
[25] V. Adzic, H. Kalva, and B. Furht, "Optimized adaptive http streaming for mobile devices," in Proceedings of the SPIE, 2011, pp. 81350T-81350T-10.
[26] T. Lohmar, T. Einarsson, P. Frojdh, F. Gabin, and M. Kampmann, "Dynamic adaptive http streaming of live content," in Proceedings of the Int. Symp. World of Wireless, Mobile and Multimedia Networks (WoWMoM), 2011, pp. 1-8.
[27] A. Khan, L. Sun, E. Jammeh, and E. Ifeachor, "Quality of experience-driven adaptation scheme for video applications over wireless networks," IET Commun., Special Issue on Video Communications Over Wireless Networks, pp. 1337-1347, 2010.
[28] A. B. Reis, J. Chakareski, A. Kassler, and S. Sargento, "Distortion optimized multi-service scheduling for next-generation wireless mesh networks," in Proceedings of the IEEE Conf. Computer Communications Workshops, 2010, pp. 1-6.
[29] W. Zhang, Y. Wen, Z. Chen, and A. Khisti, "Qoe-driven cache management for http adaptive bit rate streaming over wireless networks," IET Commun., Special Issue on Video Communications Over Wireless Networks, vol. 15, no. 6, pp. 1431-1445, Oct 2013.
[30] L. Tassiulas and A. Ephremides, "Stability properties of constrained queueing systems and scheduling policies for maximum throughput in multihop radio networks," IEEE Transactions on Automatic Control, vol. 37, no. 12, pp. 1936-1949, 1992.
[31] R. Urgaonkar, K. I. U. Kozat, and M. Neely, "Resource allocation and power management in virtualized data centers," in Proceedings of the IEEE Network Operations and Management Symp. (NOMS'10), 2010.
[32] F. Liu, Z. Zhou, H. Jin, B. Li, B. Li, and H. Jiang, "On arbitrating the power-performance tradeoff in saas clouds," IEEE Transactions on Parallel and Distributed Systems, vol. 25, no. 10, pp. 2648-2658, Oct. 2014.
[33] F. Jokhio, A. Ashraf, S. Lafond, I. Porres, and J. Lilius, "Prediction-based dynamic resource allocation for video transcoding in cloud computing," in Proceedings of the Euromicro International Conference on Parallel, Distributed, and Network-Based Processing, 2013, pp. 254-261.
[34] "Amazon elastic compute cloud," [Online]. Available: http://aws.amazon.com/ec2/, 2013.
[35] J. He, D. Wu, Y. Zeng, X. Hei, and Y. Wen, "Toward optimal deployment of cloud-assisted video distribution services," IEEE Transactions on Circuits and Systems for Video Technology, vol. 23, no. 10, pp. 1717-1728, Oct 2013.
[36] M. Mao and M. Humphrey, "A performance study on the vm startup time in the cloud," in Proceedings of the IEEE CLOUD, 2012, pp. 423-430.
[37] L. Georgiadis, M. J. Neely, and L. Tassiulas, Resource Allocation and Cross-Layer Control in Wireless Networks. Now Publishers Inc, 2006.
[38] [Online]. Available: http://www.escience.cn/system/download/73104.
[39] S. Maguluri, R. Srikant, and L. Ying, "Stochastic models of load balancing and scheduling in cloud computing clusters," in Proceedings of the IEEE INFOCOM, March 2012, pp. 702-710.
[40] "UMass trace repository," [Online]. Available: http://traces.cs.umass.edu/index.php/Network/Network, 2014.
[41] Y. Lee, K. Chen, and Y. Cheng, "World of warcraft avatar history dataset," in Proceedings of the ACM Conference on Multimedia Systems, 2011, pp. 123-128.
[42] D. Kondo, G. Fedak, F. Cappello, A. A. Chien, and H. Casanova, "Characterizing resource availability in enterprise desktop grids," Future Generation Computer Systems, vol. 23, pp. 888-903, 2007.
[43] J. Zhu, Z. Jiang, and Z. Xiao, "Twinkle: A fast resource provisioning mechanism for internet services," in Proceedings of the 30th IEEE INFOCOM (INFOCOM'11), 2011.
Wenhua Xiao received his B.S. degree from the International School of Software at Wuhan University, China, in 2010, and his M.S. degree from the College of Information and System Management at National University of Defense Technology (NUDT), China, in 2012. He is currently a Ph.D. student in the College of Information System and Management at NUDT. His research interests include scheduling, resource management, and content delivery for cloud-based video services.
Chen Wang received his Ph.D. degree from Nanjing University. He is a senior research scientist at CSIRO, Australia. His research interests are primarily in distributed, parallel, and trustworthy systems. His current work focuses on accountable distributed systems, resource management in cloud computing, and demand response algorithms in the smart grid. He is also an Honorary Associate of the School of Information Technologies at the University of Sydney. Dr. Chen Wang has industrial experience: he developed a high-throughput event delivery system and a medical image archive system, which are used by many hospitals and medical centers in the USA.
Lidong Chen received his Ph.D. degree in control science and engineering from National University of Defense Technology, Changsha, China, in 2012, where he is currently a lecturer in the College of Information System and Management. Previously, he was a visiting scholar in the Department of Computing Science, University of Alberta, Edmonton, Canada, from 2011 to 2012. His research interests include optical design of imaging systems, real-time image processing, and cloud-based video services.
Weidong Bao received his Ph.D. degree in management science and engineering from the National University of Defense Technology in 1999. He is currently a Professor in the College of Information Systems and Management at National University of Defense Technology, Changsha, China. His recent research interests include cloud computing, information systems, and complex networks.
Xiaomin Zhu received his Ph.D. degree in computer science from Fudan University, Shanghai, China, in 2009. In the same year, he received the Shanghai Excellent Graduate Award. He is currently an Assistant Professor in the College of Information Systems and Management at National University of Defense Technology, Changsha, China. His research interests include scheduling and resource management in green computing, cluster computing, cloud computing, and multiple satellites. He has published more than 50 research articles in refereed journals and conference proceedings such as IEEE TC, IEEE TPDS, IEEE TCC, JPDC, ICPP. He is a member of the IEEE, the IEEE Communication Society, and the ACM.
Laurence T. Yang His research fields include networking, high performance computing, embedded systems, ubiquitous computing and intelligence. He has published around 300 papers in refereed journals, conference proceedings, and book chapters in these areas. He has been involved in more than 100 conferences and workshops as a program/general/steering conference chair and in more than 300 conferences and workshops as a program committee member. He is currently the chair of the IEEE Technical Committee on Scalable Computing (TCSC), the chair of the IEEE Task Force on Ubiquitous Computing and Intelligence, and the co-chair of the IEEE Task Force on Autonomic and Trusted Computing. He also serves on the executive committees of the IEEE Technical Committee on Self-Organization and Cybernetics for Informatics and of IFIP Working Group 10.2 on Embedded Systems.