Optimal VM Placement in Data Centers with Architectural and Resource Constraints

Deze Zeng, Song Guo, Huawei Huang
School of CSE, The University of Aizu, Japan

Shui Yu
School of IT, Deakin University, Australia

Victor C.M. Leung
Department of ECE, University of British Columbia, Canada

Abstract—Recent advances in virtualization technology enable service provisioning in a flexible way by consolidating several virtual machines (VMs) into a single physical machine (PM). Inter-VM communications are inevitable when a group of VMs in a data center provides services in a collaborative manner. With the increasing demand of such intra-data-center traffic, it becomes essential to study the VM-to-PM placement such that the aggregated communication cost within a data center is minimized. In this paper, this optimization problem is proved NP-hard and formulated as an integer program with quadratic constraints. Different from existing work, our formulation takes into consideration the data-center architecture, the inter-VM traffic pattern, and the resource capacities of PMs. Furthermore, a heuristic algorithm is proposed and its high efficiency is extensively validated.

I. INTRODUCTION

With the recent fast development of cloud computing, many companies such as Amazon, Google and Microsoft have made large investments in data center deployment to support services in a diversity of forms, such as IaaS (Infrastructure-as-a-Service), PaaS (Platform-as-a-Service) and SaaS (Software-as-a-Service). Virtualization has proven to be a key technology in cloud computing as it provides fundamental support in data centers by consolidating several virtual machines (VMs) into a single server, i.e., a physical machine (PM). By this means, physical resources can be provided to tenants in a flexible way by leasing VMs for different service deployments. Thanks to the flexibility and efficiency of cloud computing, many service providers have adopted commercial data centers as their service hosting platforms. With the increasing communication demands in a data center, it has been shown that minimizing the data center communication cost is critical to its scalability [1]. This has raised a number of concerns from the perspective of data center architecture, in particular network architecture [2]–[4]. For a given architecture, the inter-PM communication cost is determined. To improve the communication efficiency between PMs, quite a few data center architectures have been proposed in the literature, e.g., VL2 [3], Fat-tree [5], BCube [6], etc. Fig. 1 shows two representative architectures. Generally, PMs in a data center are organized in racks, as shown by the dotted rectangles in Fig. 1. Racks and the servers in each rack are interconnected by inter-rack and intra-rack switches, respectively. Note that the inter-PM traffic is actually determined by the inter-VM communications, which vary from service to service.

Fig. 1. Examples of data center architecture: (a) Tree; (b) Fat-tree.

For example, MapReduce [7] requires performing map operations across many VMs before proceeding to the reduce phase, which may be conducted on some other VMs. A mashup Web service (e.g., Wikimapia [8]) must aggregate data from different services (e.g., Google Maps [9] and Wikipedia [10]) to compose the final service. With more communication-intensive applications, e.g., streaming services like YouTube [11], the bandwidth between VMs rapidly becomes a principal bottleneck in data centers. A straightforward idea is to place all related VMs in the same PM such that the communication cost is minimized without involving any switches. Unfortunately, the limited resources (i.e., CPU, memory and uplink/downlink bandwidth) on each server make the all-in-one solution impractical. To tackle this communication cost issue, the approach of VM placement, i.e., the mapping between VMs and PMs, has been studied recently in the literature [1], [12]–[14]. However, most existing work either relies on an over-simplified resource assumption model (e.g., [1]) or ignores the actual data center architecture (e.g., [13]). This motivates us to re-investigate the communication cost minimization problem via VM placement in a data center by taking both factors into consideration. Our contributions are summarized as follows.



• We formulate the VM placement problem as an integer program with quadratic constraints. Inputs to this formulation include the resource requirement of each VM, the pairwise VM traffic, and the data center architecture description (i.e., the PM-rack relationship). The solution to our formulation provides the optimal VM-PM placement such that the communication cost is minimized. We also prove the formulated problem NP-hard.

• To address the computational complexity, we propose a low-complexity heuristic algorithm to solve the VM placement problem. Experimental results show the high efficiency of our algorithm by the fact that it performs close to the optimal solution.

The remainder of this paper is structured as follows. Section II provides a brief literature survey on VM placement. Section III introduces the data center model studied in this paper. Section IV develops our formulation of the virtual machine placement problem. Section V presents a heuristic algorithm for this problem. Section VI gives the evaluation results on both the exact and heuristic algorithms. Finally, Section VII concludes this paper.

II. RELATED WORK

Due to the massive investments in data centers for the support of cloud computing, their scalability in terms of aggregated communication cost has become a major concern. One of the approaches is the design of efficient architectures, e.g., VL2 [3], Fat-tree [5], BCube [6], PortLand [15], etc. Recently, it has also received significant attention that the communication cost can be much reduced by exploiting the desired locations of VMs in a data center. In a dynamic environment, various VM migration algorithms have been proposed, e.g., [14], [16]. On the other hand, when the services to be provided are stable, the VM placement problem is of great interest and most relevant to our work. Ajiro et al. [17] propose a VM placement scheme called FFD (First-Fit-Decreasing). The idea is to iteratively place each VM on the first server that can fully satisfy its resource requirement. Zhang et al. [13] and Meng et al. [1] also study the problem under various data-center system models. However, all these solutions ignore either resource constraints or actual data-center architectures. To the best of our knowledge, our proposal is the first work that explicitly takes both issues into account.

III. SYSTEM MODEL

We consider a data center with $M$ racks $\mathcal{K} = \{K_1, K_2, \cdots, K_M\}$ and $N$ servers $\mathcal{S} = \{S_1, S_2, \cdots, S_N\}$. The location of servers is described by an $N \times M$ 0-1 matrix $M^{SK} = \{M^{SK}_{ij}\}$, $i = 1, \cdots, N$, $j = 1, \cdots, M$, where

$$M^{SK}_{ij} = \begin{cases} 1, & \text{if server } S_i \text{ is located at rack } K_j, \\ 0, & \text{otherwise}. \end{cases} \quad (1)$$

Let $\mathcal{V} = \{V_1, V_2, \cdots, V_L\}$ be all VMs to be placed in the data center. A service or an application (e.g., a Web application, Hadoop, ERP) typically consists of multiple collaborative VMs that communicate with each other. The inter-VM traffic is represented by an $L \times L$ matrix $T = \{T_{ij}\}$, $i, j = 1, \cdots, L$, where $T_{ij}$ is the pairwise traffic rate between $V_i$ and $V_j$. The communications can be generally categorized into intra-server, inter-server-intra-rack, and inter-rack classes, depending on the locations of the corresponding VMs. Without loss of generality, their communication costs per unit are denoted as $C_\alpha$, $C_\beta$ and $C_\gamma$, respectively, which are determined by the

data center architecture (e.g., Tree, VL2 [3], Fat-tree [5]), as indicated in [1]. In general, $C_\alpha < C_\beta < C_\gamma$. Let $C_{ij}$ denote the unit communication cost between any VMs $V_i$ and $V_j$, i.e.,

$$C_{ij} = \begin{cases} C_\alpha, & V_i \text{ and } V_j \text{ on the same PM}, \\ C_\beta, & V_i \text{ and } V_j \text{ on the same rack but different PMs}, \\ C_\gamma, & V_i \text{ and } V_j \text{ on different racks}. \end{cases} \quad (2)$$

We consider a more realistic data-center resource model in which each server has only limited resources (i.e., CPU, memory, hard disk, I/O, etc.), denoted by an $H$-vector $R = \{R_1, R_2, \cdots, R_H\}$. The available resources on all servers are described by an $N \times H$ matrix $A = \{A_{ij}\}$, $i = 1, \cdots, N$, $j = 1, \cdots, H$, where $A_{ij}$ refers to the available capacity of resource $R_j$ on server $S_i$. Similarly, the resource requirements of all VMs are denoted by $Q^{VR} = \{Q_{ij}\}$, $i = 1, \cdots, L$, $j = 1, \cdots, H$, where $Q_{ij}$ is the request for resource $R_j$ from VM $V_i$. Note that $M^{SK}$, $C_\alpha$, $C_\beta$, $C_\gamma$ and $A$ are determined once a data center is deployed. Moreover, $T$ and $Q^{VR}$ are also determined once all VMs in the data center are fixed.

IV. THE VM-PLACEMENT PROBLEM

In this section, we analyze the hardness of the VM-placement problem and then develop a mathematical formulation for the optimal solution.

Theorem 1: The virtual machine placement problem under our system model is NP-hard.

Proof: We consider a special case of the problem with the configuration $M = 1$ and $C_\alpha = 0$. In other words, the intra-PM communication incurs no cost and inter-rack communication does not exist. This is exactly the problem studied in [13], where the impact of the data-center architecture is ignored and the resulting problem is proved NP-hard by a reduction from the Balanced Minimum K-cut Problem.

A. Architectural Constraints

To determine the communication cost of any pair of VMs, we define binary variables

$$f^{VS}_{ij} = \begin{cases} 1, & \text{if } V_i \text{ and } V_j \text{ are placed on the same server}, \\ 0, & \text{otherwise}, \end{cases} \quad (3)$$

and

$$f^{VK}_{ij} = \begin{cases} 1, & \text{if } V_i \text{ and } V_j \text{ are placed on the same rack}, \\ 0, & \text{otherwise}. \end{cases} \quad (4)$$

Note that if two VMs are located in the same server, they are definitely within the same rack, i.e., if $f^{VS}_{ij} = 1$, then $f^{VK}_{ij} = 1$. On the other hand, if two VMs are within the same rack, they could be in the same server or in different ones. These facts can be described by the following constraint:

$$f^{VK}_{ij} \ge f^{VS}_{ij}, \quad \forall i, j = 1, \cdots, L. \quad (5)$$

By a similar observation, we conclude that $f^{VK}_{ij} - f^{VS}_{ij}$ is equal to one if $V_i$ and $V_j$ are within the same rack but on different servers, and zero otherwise. This allows the communication cost $C_{ij}$ to be written as

$$C_{ij} = f^{VS}_{ij} C_\alpha + (f^{VK}_{ij} - f^{VS}_{ij}) C_\beta + (1 - f^{VK}_{ij}) C_\gamma. \quad (6)$$
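As a sanity check, Eqs. (3), (4) and (6) can be evaluated directly for any concrete placement. Below is a minimal Python sketch of such an evaluation; the function and argument names (unit_costs, server_of, rack_of_server) are ours, not part of the paper.

```python
import numpy as np

def unit_costs(server_of, rack_of_server, Ca=1.0, Cb=3.0, Cg=5.0):
    """Evaluate Eq. (6) for a given placement.

    server_of[v]      : server index hosting VM v
    rack_of_server[s] : rack index of server s (from M^SK)
    """
    s = np.asarray(server_of)
    r = np.asarray(rack_of_server)[s]                  # rack of each VM
    fVS = (s[:, None] == s[None, :]).astype(float)     # Eq. (3): same server
    fVK = (r[:, None] == r[None, :]).astype(float)     # Eq. (4): same rack
    # Eq. (6): per-pair unit communication cost
    return fVS * Ca + (fVK - fVS) * Cb + (1 - fVK) * Cg

# toy example: 4 VMs; servers 0 and 1 sit in rack 0, server 2 in rack 1
print(unit_costs([0, 0, 1, 2], [0, 0, 1]))
```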

B. Placement Constraints

To describe the VM-PM placement, we define an $L \times N$ 0-1 matrix $M^{VS} = \{M^{VS}_{ij}\}$, $i = 1, \cdots, L$, $j = 1, \cdots, N$, where

$$M^{VS}_{ij} = \begin{cases} 1, & \text{if } V_i \text{ is placed on server } S_j, \\ 0, & \text{otherwise}. \end{cases} \quad (7)$$

First of all, each VM must be placed on exactly one PM, i.e.,

$$\sum_{j=1}^{N} M^{VS}_{ij} = 1, \quad \forall i = 1, \cdots, L. \quad (8)$$

In addition, recalling that the PM-rack placement is given by $M^{SK}$, we can derive the VM-rack placement, denoted by an $L \times M$ matrix $M^{VK}$, as $M^{VK} = M^{VS} \times M^{SK}$, or equivalently

$$M^{VK}_{ij} = \sum_{k=1}^{N} M^{VS}_{ik} \cdot M^{SK}_{kj}, \quad \forall i = 1, \cdots, L, \ \forall j = 1, \cdots, M. \quad (9)$$

Finally, the variables $f^{VS}_{ij}$ and $f^{VK}_{ij}$ can be rewritten as

$$f^{VS}_{ij} = \sum_{k=1}^{N} M^{VS}_{ik} \cdot M^{VS}_{jk}, \quad \forall i, j = 1, \cdots, L, \quad (10)$$

and

$$f^{VK}_{ij} = \sum_{k=1}^{M} M^{VK}_{ik} \cdot M^{VK}_{jk}, \quad \forall i, j = 1, \cdots, L, \quad (11)$$

respectively.

C. Resource Constraints

The final set of constraints is for our capacity-limited resource model. A number of VMs can be consolidated onto a PM only if their resource requirements are all satisfied by the PM, i.e.,

$$\sum_{k=1}^{L} M^{VS}_{ki} Q^{VR}_{kj} \le A_{ij}, \quad \forall i = 1, \cdots, N, \ \forall j = 1, \cdots, H. \quad (12)$$

Our objective is to find an optimal VM placement $M^{VS}$ that minimizes the overall inter-VM communication cost in a data center. By summarizing all the constraints derived above, we formulate the placement problem as an integer program with quadratic constraints:

$$\min_{M^{VS}} : U = \sum_{i=1}^{L} \sum_{j=1}^{L} T_{ij} C_{ij} \quad (13)$$
$$\text{s.t.} : (5), (6), (8), (9), (10), (11), \text{ and } (12).$$
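For readers who wish to reproduce the exact solutions, formulation (13) can be transcribed almost verbatim for an off-the-shelf solver. The following is a minimal sketch in Python with gurobipy (the solver used in Section VI); the variable names and the toy data are ours, and on some Gurobi versions the bilinear equalities (10)-(11) require the NonConvex parameter shown below.

```python
import gurobipy as gp
from gurobipy import GRB

# toy instance: L VMs, N servers, M racks, H resource types (data is illustrative)
L, N, M, H = 4, 2, 2, 2
T   = [[0, 5, 0, 1], [5, 0, 2, 0], [0, 2, 0, 4], [1, 0, 4, 0]]   # traffic rates
MSK = [[1, 0], [0, 1]]                                           # server-rack map, Eq. (1)
A   = [[10, 10], [10, 10]]                                       # server capacities
Q   = [[3, 2], [4, 3], [2, 5], [3, 3]]                           # VM requests Q^VR
Ca, Cb, Cg = 1, 3, 5

m  = gp.Model("vm_placement")
x  = m.addVars(L, N, vtype=GRB.BINARY, name="Mvs")   # M^VS, Eq. (7)
y  = m.addVars(L, M, vtype=GRB.BINARY, name="Mvk")   # M^VK
fs = m.addVars(L, L, vtype=GRB.BINARY, name="fVS")   # Eq. (3)
fk = m.addVars(L, L, vtype=GRB.BINARY, name="fVK")   # Eq. (4)

m.addConstrs((x.sum(i, "*") == 1 for i in range(L)), "eq8")
m.addConstrs((y[i, j] == gp.quicksum(x[i, k] * MSK[k][j] for k in range(N))
              for i in range(L) for j in range(M)), "eq9")
m.addConstrs((fs[i, j] == gp.quicksum(x[i, k] * x[j, k] for k in range(N))
              for i in range(L) for j in range(L)), "eq10")
m.addConstrs((fk[i, j] == gp.quicksum(y[i, k] * y[j, k] for k in range(M))
              for i in range(L) for j in range(L)), "eq11")
m.addConstrs((fk[i, j] >= fs[i, j]
              for i in range(L) for j in range(L)), "eq5")
m.addConstrs((gp.quicksum(x[k, i] * Q[k][j] for k in range(L)) <= A[i][j]
              for i in range(N) for j in range(H)), "eq12")

# objective (13) with C_ij expanded via Eq. (6)
m.setObjective(gp.quicksum(T[i][j] * (fs[i, j] * Ca
                                      + (fk[i, j] - fs[i, j]) * Cb
                                      + (1 - fk[i, j]) * Cg)
                           for i in range(L) for j in range(L)), GRB.MINIMIZE)
m.Params.NonConvex = 2   # allow the bilinear equality constraints
m.optimize()
```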

V. THE TRAFFIC-AMOUNT-FIRST ALGORITHM

Due to the NP-hardness of the VM placement problem as formulated in (13), we propose a low-complexity heuristic VM placement algorithm in this section. Our idea comes from the intuition that the more communication traffic is completed within a server, the less the aggregated communication cost would be. Therefore, the VMs with higher traffic rates should be considered with higher priority. Following this principle, we design our Traffic-Amount-First (TAF) algorithm, which always tries to place the VM pair with the heaviest traffic onto the same server without violating the resource constraints (12). The details of TAF are summarized in Algorithm 1 and Algorithm 2. To find the VM pairs with high traffic rates, we first sort the elements of matrix $T$ into a vector $\mathbf{T}$ in decreasing order. Then we check the pairs from $\mathbf{T}$ one by one until all VMs are placed (lines 2-16 in Algorithm 1).

Algorithm 1 The TAF Algorithm
Require: pairwise traffic rate matrix $T$
 1: Sort all elements of matrix $T$ into a vector $\mathbf{T}$ in decreasing order
 2: while at least one VM has not been placed do
 3:   Let $V_i$ and $V_j$ be the VM pair with the maximum rate $T_{ij}$ at the head of $\mathbf{T}$
 4:   if neither $V_i$ nor $V_j$ has been placed then
 5:     Place $V_i$ and $V_j$ on the target server $S^d$ found by Algorithm 2($\{V_i, V_j\}$)
 6:     if $S^d$ == null then
 7:       Place $V_i$ on the $S^d$ found by Algorithm 2($\{V_i\}$)
 8:       Place $V_j$ on the $S^d$ found by Algorithm 2($\{V_j\}$)
 9:     end if
10:   else if only $V_i$ has already been placed then
11:     Place $V_j$ on the $S^d$ found by Algorithm 2($\{V_j\}$)
12:   else if only $V_j$ has already been placed then
13:     Place $V_i$ on the $S^d$ found by Algorithm 2($\{V_i\}$)
14:   end if
15:   Remove $T_{ij}$ from $\mathbf{T}$
16: end while

Algorithm 2 Find the server with minimal incremental communication cost
Require: a set $\mathcal{V}$ of VMs to be placed
Ensure: target server $S^d$
 1: $\mathcal{S}_c \leftarrow \emptyset$
 2: for all servers $S$ with enough residual resources to satisfy all VMs in $\mathcal{V}$ do
 3:   Calculate the incremental communication cost $\Delta U$ after placing them on $S$
 4:   Add $S$ to the candidate server set $\mathcal{S}_c$
 5: end for
 6: if $\mathcal{S}_c \neq \emptyset$ then
 7:   $S^d \leftarrow \arg\min_{S \in \mathcal{S}_c} \Delta U$
 8: else
 9:   $S^d \leftarrow$ null
10: end if
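For concreteness, the listing below transcribes Algorithms 1 and 2 into plain Python. This is a sketch under our own naming (taf_place, find_server, etc.), not the authors' code, and it assumes numpy-array inputs and a feasible instance in which every VM fits on some server.

```python
import numpy as np

def taf_place(T, A, Q, MSK, Ca=1.0, Cb=3.0, Cg=5.0):
    """Traffic-Amount-First placement (Algorithms 1 and 2), as a sketch."""
    L, N = T.shape[0], A.shape[0]
    rack = MSK.argmax(axis=1)           # rack index of each server, from M^SK
    resid = A.astype(float).copy()      # residual capacity per server
    host = np.full(L, -1)               # host[v] = server of VM v, -1 = unplaced

    def unit_cost(s1, s2):              # Eq. (2)
        if s1 == s2:
            return Ca
        return Cb if rack[s1] == rack[s2] else Cg

    def find_server(vms):               # Algorithm 2
        need = Q[list(vms)].sum(axis=0)
        best, best_du = None, None
        for s in range(N):
            if not np.all(resid[s] >= need):
                continue                # insufficient residual resources
            # incremental cost against already-placed VMs; the traffic inside
            # `vms` itself costs Ca on every candidate, so it cannot change argmin
            du = sum((T[v, u] + T[u, v]) * unit_cost(s, host[u])
                     for v in vms for u in range(L) if host[u] >= 0)
            if best is None or du < best_du:
                best, best_du = s, du
        return best

    def place(vms, s):
        for v in vms:
            host[v] = s
        resid[s] -= Q[list(vms)].sum(axis=0)

    # visit VM pairs in decreasing order of traffic rate (Algorithm 1, line 1)
    pairs = np.dstack(np.unravel_index(np.argsort(-T, axis=None), T.shape))[0]
    for i, j in pairs:
        if i == j:
            continue
        todo = [v for v in (i, j) if host[v] < 0]
        if len(todo) == 2:
            s = find_server(todo)       # try to co-locate the pair (line 5)
            if s is not None:
                place(todo, s)
                continue
        for v in todo:                  # otherwise place VM(s) one by one
            s = find_server([v])
            if s is not None:
                place([v], s)
    return host
```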

Fig. 2. Total communication costs vs. number of VMs: (a) small-scale data centers; (b) large-scale data centers.

Fig. 3. Total communication costs vs. number of PMs: (a) small-scale data centers; (b) large-scale data centers.

During each placement iteration, we first pick the VM pair $V_i$ and $V_j$ with the maximum traffic rate (line 3 in Algorithm 1). If neither $V_i$ nor $V_j$ has been placed, we take them as a whole and place them on the same server, provided that it has enough residual resources to satisfy both of their requirements. We check the whole server set $\mathcal{S}$ to find the candidate target server set $\mathcal{S}_c$ with sufficient resources, as shown in Algorithm 2, where $\Delta U$ denotes the incremental communication cost incurred by placing the new VMs on a candidate server. Finally, the server in $\mathcal{S}_c$ with the minimal incremental communication cost is chosen (line 7 in Algorithm 2). Otherwise, if no server has enough residual resources to satisfy both $V_i$ and $V_j$ simultaneously, we locate the server with the minimal incremental communication cost for each of them separately (lines 7 and 8 in Algorithm 1). In some cases, the host of one VM ($V_i$ or $V_j$) may have already been decided in a previous iteration (lines 10 and 12 in Algorithm 1). Following the same principle, we find the best server for the other one ($V_j$ or $V_i$), as shown in lines 11 and 13 of Algorithm 1.

VI. PERFORMANCE EVALUATION

To evaluate the performance of the exact and heuristic algorithms, we conduct simulation studies for both small-scale (i.e., with small values of $L$, $M$ and $N$) and large-scale data centers based on the model described in Section III. For small-scale data centers, the optimal solutions are obtained by solving (13) using the commercial solver Gurobi Optimizer [18]. For large-scale networks, we compare our TAF algorithm against the FFD algorithm [17].
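For reference, the FFD baseline of [17], as we understand it from the description in Section II, can be sketched as follows; the function name and the sorting key are our assumptions.

```python
import numpy as np

def ffd_place(Q, A):
    """First-Fit-Decreasing baseline [17] (a sketch): sort VMs by total
    resource demand and place each on the first server that can satisfy it."""
    resid = A.astype(float).copy()
    host = np.full(Q.shape[0], -1)
    for v in np.argsort(-Q.sum(axis=1)):      # largest requirement first
        for s in range(resid.shape[0]):
            if np.all(resid[s] >= Q[v]):      # first fit
                host[v] = s
                resid[s] -= Q[v]
                break
    return host
```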

In our simulations, without loss of generality, the resources on each PM are CPU, memory and I/O with capacities of 1500, 1500 and 300 units, respectively, and the corresponding requests by each VM are randomly generated in (0, 300), (0, 500) and (0, 50), respectively. The traffic rate in matrix $T$ for each pair of VMs is also randomly generated between 0 and 10. The communication costs $C_\alpha$, $C_\beta$ and $C_\gamma$ are set to 1, 3 and 5, respectively. In the following, our experiments show how the exact and heuristic algorithms perform, in terms of aggregated communication cost, as a function of $L$, $N$, $M$ and $A$. For each setting, we present the average cost obtained from 100 simulation instances.

A. On the Effect of the Number of VMs

We first study how the number of VMs (i.e., $L$) affects the communication cost by varying it from 5 to 12 and from 20 to 80 in small-scale and large-scale data centers, respectively. From the results shown in Fig. 2(a), we notice that the performance of our heuristic algorithm approaches the optimal one. In particular, the communication cost is a non-linear increasing function of $L$. This phenomenon is attributed to the following two factors. 1) When the number of VMs is small, most communication can be conducted in an intra-server or intra-rack way, such that the aggregated communication cost is low. 2) The aggregated communication traffic itself increases with the number of VMs in a scale of $O(L^2)$, as the number of possible connections among $L$ VMs is $L(L-1)/2$. We can also see from Fig. 2(b) that TAF outperforms FFD under any number of VMs.
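The setup above is easy to reproduce. Below is a sketch of a random instance generator under the stated parameter ranges; the helper name and the round-robin server-to-rack assignment are our own assumptions, since the paper does not specify how servers are spread over racks.

```python
import numpy as np

def random_instance(L, N, M, seed=0,
                    caps=(1500, 1500, 300), reqs=(300, 500, 50), tmax=10):
    """One simulation instance: CPU/memory/I/O capacities and requests,
    pairwise traffic rates, and a server-to-rack map (assignment is assumed)."""
    rng = np.random.default_rng(seed)
    A = np.tile(caps, (N, 1)).astype(float)        # server capacities
    Q = rng.uniform(0, reqs, size=(L, len(reqs)))  # VM requests in (0, r)
    T = rng.uniform(0, tmax, size=(L, L))
    np.fill_diagonal(T, 0)                         # no self-traffic
    MSK = np.zeros((N, M), dtype=int)
    MSK[np.arange(N), np.arange(N) % M] = 1        # round-robin racks (assumed)
    return T, A, Q, MSK

T, A, Q, MSK = random_instance(L=10, N=6, M=3)     # one small-scale setting
```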

Fig. 4. Total communication costs vs. number of racks: (a) small-scale data centers; (b) large-scale data centers.

Fig. 5. Total communication costs vs. capacity of CPU: (a) small-scale data centers; (b) large-scale data centers.

Fig. 6. Total communication costs vs. capacity of memory: (a) small-scale data centers; (b) large-scale data centers.

The high efficiency of our heuristic algorithm is thus validated.

B. On the Effect of the Number of PMs

We also conduct two groups of experiments. In the small-scale data centers, we fix $M = 2$, $L = 10$ and vary the value of $N$ from 2 to 6. In the large-scale data centers, we fix $M = 4$, $L = 60$ and vary the value of $N$ from 8 to 16. The effect of the number of PMs on the aggregated data center communication cost is shown in Fig. 3. Once again, we notice that TAF outperforms FFD and approaches the optimal solution under different numbers of PMs. Furthermore, the communication cost is a decreasing function of the number of PMs. This is because, under a fixed number of racks, increasing the number of PMs in each rack also increases the probability of intra-rack communication without sacrificing intra-server communication. The communication cost is thus reduced.

C. On the Effect of the Number of Racks

Next, we would like to see how the number of racks (i.e., $M$) affects the communication cost. In this group of evaluations, we fix $N = 6$, $L = 10$ and vary $M$ from 2 to 6 and from 12 to 60 in the small-scale and large-scale data centers, respectively. The closeness of TAF to the optimal solution and its advantage over FFD can be observed in Fig. 4. We also notice that the communication cost is an increasing function of $M$. This is because, under the same number of PMs, increasing the number of racks also increases the probability of inter-rack communication, leading to a comparatively higher cost.

D. On the Effect of Resource Capacities

Finally, let us investigate the effect of resources on the communication cost by varying the capacities of CPU, memory and I/O in different sets of experiments.

Fig. 7. Total communication costs vs. capacity of I/O: (a) small-scale data centers; (b) large-scale data centers.

The values of ($L$, $M$, $N$) are fixed at (10, 3, 6) and (60, 4, 40) in the small-scale and large-scale data centers, respectively. Meanwhile, we vary the capacities of CPU, memory and I/O from 500 to 1500, from 600 to 2000, and from 60 to 340, and show the evaluation results in Fig. 5, Fig. 6 and Fig. 7, respectively. From Fig. 5, we see that the total communication cost is a decreasing function of the CPU capacity. A similar phenomenon can be observed in Fig. 6 and Fig. 7 as well. This is because a server with richer available resources can host more VMs, such that the chances of intra-server and intra-rack communications are improved; in other words, the aggregated communication cost is decreased. On the other hand, the performance eventually converges once the capacity exceeds a certain value, e.g., 1000 CPU units in Fig. 5. This is because, under such a condition, the dominant factor in the communication cost becomes the intrinsic pairwise traffic rates rather than the resource capacities. Furthermore, we can see that our heuristic algorithm always provides much better solutions than FFD.

VII. CONCLUSION

In this paper, we address the problem of VM placement to minimize the aggregated communication cost within a data center under both architectural and resource constraints. We prove this optimization problem NP-hard and formally formulate it as an integer program with quadratic constraints. To tackle the high computational complexity of the exact algorithm, we further propose a low-complexity heuristic algorithm, TAF. Through extensive simulation-based studies, the high efficiency of TAF is validated by the fact that it approaches the optimal performance in small-scale instances and outperforms a representative existing algorithm in large-scale instances under various settings. As part of our future work, the VM migration approach will be taken into consideration for a similar optimization problem in a data center with dynamic service provisioning.

REFERENCES

[1] X. Meng, V. Pappas, and L. Zhang, "Improving the Scalability of Data Center Networks with Traffic-aware Virtual Machine Placement," in Proceedings of the 29th Annual IEEE International Conference on Computer Communications (INFOCOM '10), March 2010, pp. 1–9.

[2] M. Al-Fares, A. Loukissas, and A. Vahdat, "A Scalable, Commodity Data Center Network Architecture," in Proceedings of the ACM Conference on Data Communication (SIGCOMM '08). New York, NY, USA: ACM, 2008, pp. 63–74.
[3] A. Greenberg, J. R. Hamilton, N. Jain, S. Kandula, C. Kim, P. Lahiri, D. A. Maltz, P. Patel, and S. Sengupta, "VL2: A Scalable and Flexible Data Center Network," in Proceedings of the ACM Conference on Data Communication (SIGCOMM '09). New York, NY, USA: ACM, 2009, pp. 51–62.
[4] A. Greenberg, J. Hamilton, D. A. Maltz, and P. Patel, "The Cost of a Cloud: Research Problems in Data Center Networks," SIGCOMM Comput. Commun. Rev., vol. 39, no. 1, pp. 68–73, Dec. 2008.
[5] C. Leiserson, "Fat-trees: Universal Networks for Hardware-efficient Supercomputing," IEEE Transactions on Computers, vol. C-34, no. 10, pp. 892–901, Oct. 1985.
[6] C. Guo, G. Lu, D. Li, H. Wu, X. Zhang, Y. Shi, C. Tian, Y. Zhang, and S. Lu, "BCube: A High Performance, Server-centric Network Architecture for Modular Data Centers," in Proceedings of the ACM Conference on Data Communication (SIGCOMM '09). New York, NY, USA: ACM, 2009, pp. 63–74.
[7] J. Dean and S. Ghemawat, "MapReduce: Simplified Data Processing on Large Clusters," Communications of the ACM, vol. 51, no. 1, pp. 107–113, 2008.
[8] "Wikimapia," http://wikimapia.org.
[9] "Google Maps," https://maps.google.com.
[10] "Wikipedia," http://www.wikipedia.org.
[11] "YouTube," http://www.youtube.com.
[12] J. Jiang, T. Lan, S. Ha, M. Chen, and M. Chiang, "Joint VM Placement and Routing for Data Center Traffic Engineering," in Proceedings of the 31st Annual IEEE International Conference on Computer Communications (INFOCOM '12), March 2012, pp. 2876–2880.
[13] B. Zhang, Z. Qian, W. Huang, X. Li, and S. Lu, "Minimizing Communication Traffic in Data Centers with Power-Aware VM Placement," in Proceedings of the 6th International Conference on Innovative Mobile and Internet Services in Ubiquitous Computing (IMIS '12), 2012, pp. 280–285.
[14] X. Zhang, Z.-Y. Shae, S. Zheng, and H. Jamjoom, "Virtual Machine Migration in an Over-committed Cloud," in Proceedings of the IEEE Network Operations and Management Symposium (NOMS '12), April 2012, pp. 196–203.
[15] R. Niranjan Mysore, A. Pamboris, N. Farrington, N. Huang, P. Miri, S. Radhakrishnan, V. Subramanya, and A. Vahdat, "PortLand: A Scalable Fault-tolerant Layer 2 Data Center Network Fabric," in Proceedings of the ACM Conference on Data Communication (SIGCOMM '09). New York, NY, USA: ACM, 2009, pp. 39–50.
[16] V. Shrivastava, P. Zerfos, K.-w. Lee, H. Jamjoom, Y.-H. Liu, and S. Banerjee, "Application-aware Virtual Machine Migration in Data Centers," in Proceedings of the 30th Annual IEEE International Conference on Computer Communications (INFOCOM '11). IEEE, 2011, pp. 66–70.
[17] Y. Ajiro and A. Tanaka, "Improving Packing Algorithms for Server Consolidation," in Proceedings of the CMG Conference, vol. 2. Computer Measurement Group, 2007, p. 399.
[18] Gurobi Optimization, "Gurobi Optimizer Reference Manual," 2013. [Online]. Available: http://www.gurobi.com