Engaging Hardware-Virtualized Network Devices in Cloud Data Centers

Pawit Pornkitprasan, Vasaka Visoottiviseth
Faculty of Information and Communication Technology, Mahidol University, Nakhon Pathom, Thailand
[email protected], [email protected]

Ryousei Takano
Information Technology Research Institute, National Institute of Advanced Industrial Science and Technology (AIST), Tsukuba, Japan
[email protected]
Abstract—Recently, virtualization has become an increasingly popular trend in cloud computing due to its ease of administration and efficient resource allocation. Hardware virtualization reduces the overhead of virtualization and significantly improves performance, which compares favorably to that of physical machines. However, hardware virtualization is complicated to configure and thus is not widely used. This paper proposes the integration of hardware-virtualized network devices into Apache CloudStack to bring their benefits to a larger user base. The proposed mechanism allows virtual machines attached to hardware-virtualized network devices to be provisioned and migrated automatically, without any user interaction. Experimental results show that after the initial setup is completed, users can provision new virtual machines without any complicated setup or configuration. The results of a performance evaluation reveal that hardware-virtualized network devices provide higher throughput, lower latency, and lower CPU usage. In addition, we have confirmed that the proposed mechanism has negligible impact on the performance of live migration. These results lead to the positive prospect that hardware virtualization is promising for future cloud data centers.

Keywords—Cloud computing; CloudStack; High performance computing; Live migration; SR-IOV
I. INTRODUCTION
Together with the increasing popularity of cloud computing, the use of virtual machines has also become popular due to the ease of provisioning and administration and the efficiency and flexibility of resource allocation that they offer. A Cloud OS, such as CloudStack or OpenStack, orchestrates virtual machines, networks, and storage to provide users with an Infrastructure as a Service (IaaS) platform. Virtualization is the key technology in cloud computing. However, virtualization introduces overhead, resulting in a performance penalty. This overhead becomes significant for both high performance computing and high throughput computing workloads. Many advances in software and hardware virtualization techniques have been introduced in order to reduce this overhead. This paper focuses on the hardware virtualization of network devices. Hardware virtualization is also known as Virtual Machine Monitor (VMM)-bypass I/O technologies,
which allow a guest OS to directly access physical devices without VMM intervention. Currently, software virtualization, including I/O emulation and para-virtualized I/O, is the most popular method of providing network connectivity to virtual machines. Hardware manufacturers have introduced hardware virtualization support to provide better performance, and many papers have demonstrated the performance benefits [1]. However, no Cloud OS is currently capable of deploying and migrating virtual machines with hardware-virtualized devices, due to the complicated configuration requirements. The objective of this paper is to integrate hardware virtualization of network devices, using PCI Passthrough and SR-IOV, into CloudStack in order to simplify the configuration of hardware-virtualized network devices.

The rest of this paper is structured as follows. In Section II, we introduce the background concepts and technologies used in this paper and briefly present related work. In Section III, we propose our design for the integration of hardware-virtualized network devices into CloudStack. In Section IV, we analyze the performance advantages and overhead associated with using hardware-virtualized network devices. In Section V, we discuss the roadblocks we have faced. Finally, in Section VI, we present our conclusions and plans for future work.
II. BACKGROUND
A. CloudStack
Apache CloudStack [2] is an open source Cloud OS designed to be a turnkey solution for setting up an IaaS cloud platform. It supports many hypervisors, including KVM, VMware ESXi, and XenServer. Using CloudStack, users can provision their own virtual machines through its web interface or the REST API.
Figure 1. The architecture of CloudStack.
As shown in Figure 1, CloudStack is divided into two major parts: the management server and the agent. The management server provides the user interface and manages and allocates all available resources. The computing nodes that run the virtual machines are known as agents; they delegate requests from the management server to the KVM hypervisor.

B. Hardware Virtualization Support for Network Devices
Several techniques are combined to make hardware virtualization of network devices possible: PCI Passthrough, SR-IOV [1], and a network bonding device [3], each described in this section. PCI Passthrough allows a PCI device to be attached directly to a virtual machine without intervention from the host OS. However, PCI Passthrough does not allow a single PCI device to be shared among many virtual machines. Single Root I/O Virtualization (SR-IOV) extends PCI Passthrough by allowing one physical PCI device to present itself as multiple virtual PCI devices, so that a single device is no longer limited to one virtual machine. The final hurdle is that virtual machines cannot be migrated while PCI Passthrough devices remain attached. A bonding device solves this problem by allowing traffic to be routed through an alternative network device during migration.

C. Related Work
The performance of network devices using SR-IOV and PCI Passthrough has been evaluated previously in [1] and [4]. The experiments were performed on the Xen hypervisor in [1], while [4] used the KVM hypervisor. Both studies found that SR-IOV offers superior performance compared to software-based network device virtualization. Additionally, [1] found that SR-IOV can be used with a large number of virtual machines while incurring very little overhead. The technique of using network device bonding to allow live migration to be performed when using PCI Passthrough has also been described in [1] and [5]. An alternative approach has been proposed in [6], where the hardware state is transferred and restored during migration; however, it requires modifications to the driver and hypervisor, and thus is not yet suitable for mass deployment. None of the above papers has attempted to address the allocation of SR-IOV devices to virtual machines within a cluster of compute nodes, which is the main focus of this paper.
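As a concrete illustration of the SR-IOV mechanism described in Section II-B, the sketch below creates and enumerates the virtual functions of a physical NIC through the sysfs interface available in recent Linux kernels. It is a generic example, not part of our implementation: the PCI address of the physical function is an assumption, and some drivers take the virtual-function count as a module parameter instead of through sysfs.

    # Generic host-side sketch: create and list SR-IOV virtual functions via sysfs.
    # The physical function address 0000:28:00.0 is assumed for illustration only.
    import os

    PF = "/sys/bus/pci/devices/0000:28:00.0"

    # Ask the device how many virtual functions it supports ...
    with open(os.path.join(PF, "sriov_totalvfs")) as f:
        total_vfs = int(f.read())

    # ... and enable them (the value must be 0 before it can be changed again).
    with open(os.path.join(PF, "sriov_numvfs"), "w") as f:
        f.write(str(total_vfs))

    # Each virtual function appears as a virtfnN symlink pointing at its own PCI
    # address; these addresses are what get passed through to virtual machines.
    for entry in sorted(e for e in os.listdir(PF) if e.startswith("virtfn")):
        vf_addr = os.path.basename(os.readlink(os.path.join(PF, entry)))
        print(entry, vf_addr)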
III. DESIGN AND IMPLEMENTATION
In this section, we first identify the issues in supporting hardware virtualization and then propose a mechanism to solve them. The problems that must be addressed in order to support hardware virtualization of network devices are as follows:
• We must know which virtual machines require the presence of a PCI device.
• We must be able to allocate available PCI devices to virtual machines. Each PCI device has a unique PCI ID on each host, and one PCI device must only be assigned to one virtual machine at a time.
• Live migration must be supported. As a technical limitation, virtual machines cannot be migrated while a PCI device is directly attached using PCI Passthrough.
Next, we discuss the design choices taken to solve the above problems.

A. Tagging Virtual Machines Requiring PCI Passthrough
For the first problem of marking the virtual machines that require PCI devices, we choose to store a comma-separated list of names of the requested PCI devices in a service offering. In CloudStack, a service offering is a list of parameters for a virtual machine, such as available CPU and RAM. This approach gives us a very flexible method of specifying which PCI devices are requested: any virtual machine may request any number of PCI devices, as required.

B. PCI Device Allocation
Before we can allocate PCI devices, we must first know which PCI devices are available in the system. Each agent is configured with a list of available PCI devices, with each device being associated with a name. Devices with the same name are considered equivalent and interchangeable. On startup, each agent sends its list of available PCI devices to the management server, which stores the list in a database table. Figure 2 describes the procedure used when starting a virtual machine with PCI Passthrough. When the virtual machine is started, the management server looks through its database to check whether any of the requested PCI devices are available. Only agents with the requested devices available are considered for deployment. When a suitable host is found, the device is marked as in use and the PCI ID of the device is sent to the agent running on that host, together with the other parameters. The agent then asks the underlying hypervisor to attach the requested PCI device.
Figure 2. Assignment of a PCI device to a virtual machine at boot up.
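The allocation step in Figure 2 can be summarized by the following simplified sketch. It is not taken from the CloudStack code base; the class and field names are hypothetical and only model the bookkeeping described above: agents report named PCI devices at startup, deployment candidates are restricted to hosts with a free device of the requested name, and the chosen device is marked as in use.

    # Simplified model (not CloudStack code) of the PCI device bookkeeping on the
    # management server. Names are hypothetical; the real state lives in a database table.

    class PciDeviceRegistry:
        def __init__(self):
            # host -> list of {"name": ..., "pci_id": ..., "vm": None or VM id}
            self.devices = {}

        def register_host(self, host, device_list):
            # Called when an agent starts up and reports its configured devices.
            self.devices[host] = [dict(d, vm=None) for d in device_list]

        def hosts_with_free_device(self, name):
            # Only hosts with a free device of the requested name are candidates.
            return [host for host, devs in self.devices.items()
                    if any(d["name"] == name and d["vm"] is None for d in devs)]

        def allocate(self, host, name, vm_id):
            # Mark one matching device as in use and return its PCI ID, which is
            # then sent to the agent together with the other parameters.
            for d in self.devices[host]:
                if d["name"] == name and d["vm"] is None:
                    d["vm"] = vm_id
                    return d["pci_id"]
            raise RuntimeError("no free '%s' device on %s" % (name, host))

    registry = PciDeviceRegistry()
    registry.register_host("agent0", [{"name": "10GE", "pci_id": "28:00.3"}])
    host = registry.hosts_with_free_device("10GE")[0]       # "agent0"
    pci_id = registry.allocate(host, "10GE", vm_id="vm-1")  # "28:00.3"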
C. Live Migration of Virtual Machines
The design described above already lays much of the groundwork for supporting live migration of virtual machines. Before migrating a virtual machine, the agent asks the underlying hypervisor to detach the PCI device from the virtual machine, after which the migration can proceed normally. After the migration is completed, the management server uses the same algorithm to allocate a PCI device from the target host and asks the agent to attach it. In order for the virtual machine not to lose connectivity while the PCI device is detached, a bonding device is used. A bonding device is a Linux feature that can combine multiple network devices in various ways. In our case, we use the "active-backup" mode in the configuration described in Figure 3. The PCI Passthrough device is the active device and the VirtIO device is the backup device, which automatically becomes active when the PCI Passthrough device is detached.
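The guest-side bond can be set up through the bonding driver's sysfs interface documented in [3]. The following sketch shows one way to do so; the interface names eth0 (the PCI Passthrough device) and eth1 (the VirtIO device) are assumptions, and a real guest would normally apply an equivalent configuration from its network scripts at boot.

    # Guest-side sketch of an active-backup bond over an assumed eth0 (PCI
    # Passthrough) and eth1 (VirtIO). Requires the bonding module to be loaded;
    # the bond must be down while the mode is set, and slaves must be down when added.

    def sysfs_write(path, value):
        with open(path, "w") as f:
            f.write(value)

    sysfs_write("/sys/class/net/bonding_masters", "+bond0")
    sysfs_write("/sys/class/net/bond0/bonding/mode", "active-backup")
    sysfs_write("/sys/class/net/bond0/bonding/slaves", "+eth0")
    sysfs_write("/sys/class/net/bond0/bonding/slaves", "+eth1")
    # Prefer the passthrough device whenever it is present; when it is detached
    # during migration, the bond automatically fails over to the VirtIO slave.
    sysfs_write("/sys/class/net/bond0/bonding/primary", "eth0")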
Figure 3. The network configuration of a virtual machine. A VirtIO device is always available whenever the hardware-virtualized device using PCI Passthrough is not available during live migration.
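On the KVM agent, the detach / migrate / re-attach sequence of Section III-C maps onto standard libvirt operations. The sketch below uses the libvirt Python bindings; the domain name, target host, and hostdev XML (including the PCI address, which on the target would come from that host's own allocation) are illustrative and not taken from our CloudStack patch.

    # Sketch of live migration with a passthrough NIC, using libvirt's Python API.
    # Names and addresses are placeholders for illustration.
    import libvirt

    HOSTDEV_XML = """
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='0x0000' bus='0x28' slot='0x00' function='0x3'/>
      </source>
    </hostdev>
    """

    src = libvirt.open("qemu:///system")
    dom = src.lookupByName("i-2-10-VM")

    # 1. Detach the passthrough device; inside the guest, the active-backup bond
    #    fails over to its VirtIO slave so connectivity is preserved.
    dom.detachDeviceFlags(HOSTDEV_XML, libvirt.VIR_DOMAIN_AFFECT_LIVE)

    # 2. Live-migrate the now purely software-virtualized guest.
    dom.migrateToURI("qemu+tcp://agent1/system",
                     libvirt.VIR_MIGRATE_LIVE | libvirt.VIR_MIGRATE_PEER2PEER,
                     None, 0)

    # 3. On the target host, attach a device allocated from that host's pool
    #    (the management server supplies the new PCI ID).
    dst = libvirt.open("qemu+tcp://agent1/system")
    dst.lookupByName("i-2-10-VM").attachDeviceFlags(
        HOSTDEV_XML, libvirt.VIR_DOMAIN_AFFECT_LIVE)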
IV. EVALUATION
In order to evaluate the efficiency of our proposed mechanism, we performed two experiments. The first experiment compares the network performance of hardware-virtualized (PCI Passthrough), software-virtualized (VirtIO), and native network devices in terms of throughput, latency, and CPU utilization. The second experiment evaluates the migration performance of PCI Passthrough and VirtIO in terms of migration time, the number of packets lost, and the number of errors at the application layer.

A. Experimental Setup
The cluster used for the experiments consists of six HP Z800 Workstation servers, each comprising a quad-core Intel Xeon W5590 3.33 GHz, 48 GB of memory, an on-board Gigabit Ethernet interface, and a Mellanox ConnectX-2 SR-IOV-capable 10 Gigabit Ethernet interface. The servers run Ubuntu 13.04 Server 64-bit, CloudStack version 4.2-SNAPSHOT, kernel 3.8.0, QEMU 1.4.0, and libvirt 1.0.2. The KVM hypervisor was used, and virtual machine image files are stored on an NFS server accessed via Gigabit Ethernet. We ensured that CloudStack system virtual machines were not running on the hosts being tested, to avoid performance interference. Note that each value shown in the tables below was measured three times, except the latency in TABLE I, which was measured ten times.

B. Network Performance
For network performance, various factors were tested. The TCP throughput is measured using the iperf [7] utility, the round-trip latency is measured using the ping utility, and the CPU usage is measured using the top utility on the host. The CPU usage for transmission is measured because the transmission throughputs are approximately equivalent and thus a fair comparison can be made. For both throughput and latency, a test is performed between a virtual machine and a different physical host. For the throughput measurement, we transmit data via single/multiple TCP connections, depending on the number of threads, for 60 seconds.

TABLE I compares the performance characteristics of hardware-virtualized (PCI Passthrough), software-virtualized (VirtIO), and native network devices. The results show that all devices offer similar transmission throughputs (Tx), especially when simultaneous threads are used. For the reception throughput (Rx), PCI Passthrough offers a clear advantage over VirtIO, but its performance is still lower than that of native I/O. For latency, PCI Passthrough also offers much lower latency than VirtIO, and even than native I/O; this may be because the firewall stack on the guest OS is simpler than that on the host OS.
TABLE I. NETWORK PERFORMANCE

Criteria            PCI Passthrough     VirtIO              Native
Tx (1 thread)       9.17 Gbits/sec      9.38 Gbits/sec      9.17 Gbits/sec
Tx (5 threads)      9.39 Gbits/sec      9.38 Gbits/sec      9.40 Gbits/sec
Rx (1 thread)       9.29 Gbits/sec      8.93 Gbits/sec      9.16 Gbits/sec
Rx (5 threads)      8.79 Gbits/sec      6.89 Gbits/sec      9.22 Gbits/sec
Latency (ping)      0.152 ms            0.402 ms            0.174 ms
CPU Usage (Tx)      11.7%               16.8%               7.8%
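For reference, a minimal driver for the measurements in TABLE I could look as follows. It assumes iperf and ping are installed and that an iperf server (iperf -s) is already running on the peer; the peer address is a placeholder and parsing of the tool output is omitted.

    # Minimal sketch of the TABLE I measurements; the peer address is a placeholder.
    import subprocess

    PEER = "192.0.2.10"

    def tcp_throughput(threads):
        # 60-second TCP test with the given number of parallel streams (Tx side).
        return subprocess.check_output(
            ["iperf", "-c", PEER, "-t", "60", "-P", str(threads), "-f", "g"],
            text=True)

    def latency():
        # Ten ICMP echo round trips, as used for the latency row.
        return subprocess.check_output(["ping", "-c", "10", PEER], text=True)

    print(tcp_throughput(1))
    print(tcp_throughput(5))
    print(latency())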
TABLE II shows the network performance when multiple virtual machines share the same network device using SR-IOV. The maximum number of virtual machines, 50, is the largest number supported by CloudStack on KVM. The results reveal that SR-IOV handles a large number of virtual machines well, with little to no performance degradation. The small variations in throughput can be attributed to rounding in iperf's output.
TABLE II. SCALABILITY OF SR-IOV

                        Transmission Throughput (Gbits/sec)
Virtual Machines        1 simultaneous thread     5 simultaneous threads
1                       9.08                      9.39
2                       9.40                      9.40
4                       9.39                      9.40
8                       9.36                      9.36
16                      9.39                      9.39
32                      9.38                      9.35
50                      9.40                      9.38
C. Migration Performance
Using PCI Passthrough naturally increases the overhead of migration, because the device has to be detached and re-attached. The detachment can also introduce packet loss, as the interface is removed and the guest needs to switch to the software-virtualized interface without any prior hint. Therefore, we also evaluate the migration performance of our approach. The criteria used to measure the effect on migration are 1) migration time, 2) the number of packets lost, and 3) the number of errors reported by httperf [8]. Httperf is a utility for measuring web server performance, which we use to generate a realistic workload. For the httperf test, 2000 HTTP requests were generated at a rate of 100 requests per second. During that time, a single VM, using either our PCI Passthrough mechanism or VirtIO, was migrated from one physical host to another.

TABLE III shows that the overhead of PCI Passthrough results in an increase in migration time of approximately one second. The number of packets lost increases as the migration time increases. However, there is no increase in HTTP request errors if the client has a reasonable timeout.
TABLE III. EFFECTS ON MIGRATION PERFORMANCE

Criteria                              PCI Passthrough       VirtIO
Migration time                        5.59 seconds          4.59 seconds
Number of packets lost                135 packets           111 packets
httperf errors (2 seconds timeout)    40 requests (2.0%)    34 requests (1.7%)
httperf errors (5 seconds timeout)    0 requests (0.0%)     0 requests (0.0%)
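The httperf workload described above (2000 requests at 100 requests per second) can be reproduced with an invocation along the following lines; the server address and URI are placeholders, and the timeout shown corresponds to the 5-second case in TABLE III.

    # Sketch of the migration-time workload: 2000 HTTP requests at 100 req/s.
    # Server address and URI are placeholders.
    import subprocess

    cmd = ["httperf", "--server", "192.0.2.20", "--port", "80", "--uri", "/",
           "--num-conns", "2000", "--rate", "100", "--timeout", "5"]
    print(subprocess.check_output(cmd, text=True))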
V. DISCUSSION
Our experimental results lead to a positive conclusion, but we also faced some practical problems during configuration, as follows. First, Linux kernel and driver support is still not stable, due to a lack of widespread use. We faced issues such as requiring specific kernel command lines, or random kernel panics on the host when a guest is shut down [9]. Second, network bridging, which is required for software-virtualized network devices, does not work with SR-IOV enabled. Work is underway in QEMU and libvirt to fix this issue [10]. In our configuration, we used another network device for the software-virtualized interface in order to work around this issue.

In order to make hardware-virtualized network devices viable for a larger user base, more work must be done on control and accounting. Currently, our implementation bypasses the firewall and accounting mechanisms of CloudStack, allowing for potential abuse. Thus, work must be done on integrating hardware-based firewall and QoS solutions with CloudStack.
VI. CONCLUSION AND FUTURE WORK
We have proposed an integration of hardware virtualization into CloudStack, a popular Cloud OS. To the best of our knowledge, this is the first attempt to provide users with the benefits of hardware virtualization without complicated configuration. The source code of our implementation is available from [11]. Experiments have shown that hardware-virtualized network devices can bring improved network performance and lower CPU usage to virtual machines, with minimal overhead in live migration. By adding PCI Passthrough support to CloudStack and allowing it to automatically track and allocate devices, the administrator is relieved of the burden of manually tracking PCI devices, and efficient automatic allocation and migration are made possible. Although this paper focuses on network devices, the proposed design can easily be extended to other I/O devices, such as storage and GPUs.

In order to make hardware-virtualized network devices viable for a larger user base, several tasks remain, including addressing the accounting and firewall issues mentioned in Section V. Regarding multiple hypervisor support, the current work focuses on the KVM hypervisor, but our future tasks include extending support to other hypervisors. Another future task is the deployment of the proposed mechanism to the AIST private HPC cloud system, where support for InfiniBand devices is mandatory. As far as live migration is concerned, the exact same approach cannot be employed, since there is no InfiniBand bonding driver. Therefore, we are considering the use of a Symbiotic Virtualization mechanism [12] to migrate a virtual machine with an InfiniBand device.

ACKNOWLEDGMENT
This work was done during an internship at AIST, Japan. This work was partly supported by JSPS KAKENHI Grant Number 24700040. We would like to thank the CloudStack developers for discussions on the cloudstack-dev mailing list, and especially Edison Su, whose suggestions were quite helpful in deciding on our design.
REFERENCES
[1] Y. Dong et al., "High performance network virtualization with SR-IOV," Journal of Parallel and Distributed Computing, vol. 72, no. 11, pp. 1471-1480, Nov. 2012.
[2] Apache CloudStack [Online]. Available: http://cloudstack.apache.org/
[3] T. Davis, Linux Ethernet Bonding Driver HOWTO [Online]. Available: https://www.kernel.org/doc/Documentation/networking/bonding.txt
[4] J. Liu, "Evaluating standard-based self-virtualizing devices: A performance study on 10 GbE NICs with SR-IOV support," IEEE International Symposium on Parallel & Distributed Processing, pp. 1-12, 2010.
[5] E. Zhai, G. D. Cummings, and Y. Dong, "Live Migration with Pass-through Device for Linux VM," 2008 Linux Symposium, pp. 261-267, 2008.
[6] Z. Pan, Y. Dong, Y. Chen, L. Zhang, and Z. Zhang, "CompSC: live migration with pass-through devices," ACM SIGPLAN Notices, vol. 47, no. 7, pp. 109-120, 2012.
[7] Iperf [Online]. Available: http://iperf.sourceforge.net/
[8] D. Mosberger and T. Jin, "httperf—a tool for measuring web server performance," ACM SIGMETRICS Performance Evaluation Review, vol. 26, no. 3, pp. 31-37, 1998.
[9] P. Pornkitprasan, PROBLEM: Kernel Panic with Mellanox ConnectX-2 (mlx4_en) card in SR-IOV mode with KVM [Online]. Available: http://www.spinics.net/lists/netdev/msg241117.html
[10] M. S. Tsirkin, Re: PROBLEM: Bridging does not work with Mellanox ConnectX-2 (mlx4_en) card in SR-IOV mode [Online]. Available: http://www.spinics.net/lists/netdev/msg241323.html
[11] P. Pornkitprasan, Add PCI Passthrough Support to CloudStack on KVM hypervisor [Online]. Available: https://reviews.apache.org/r/12098/
[12] R. Takano et al., "Cooperative VM migration: a symbiotic virtualization mechanism by leveraging the guest OS knowledge," IEICE Transactions on Information and Systems, vol. E96-D, no. 12, pp. 2675-2683, 2013.