PC-based Router Virtualization with Hardware Support

PC-based Router Virtualization with Hardware Support

M. Siraj Rathore, Markus Hidell, Peter Sjödin
Telecommunication Systems (TS) Lab
School of ICT, KTH Royal Institute of Technology
Stockholm, Sweden
{siraj, mahidell, psj}@kth.se

Abstract—In this paper we focus on how to use open source Linux software in combination with PC hardware to build high speed virtual routers. Router virtualization means that multiple virtual router instances run in parallel on the same hardware. To enable this, virtual components are combined in the router's data plane, which can result in performance penalties. Furthermore, an overloaded virtual router can affect the performance of other virtual routers running in parallel. Achieving high performance and strong performance isolation in a virtualized environment is challenging. We investigate how hardware can help to achieve these goals in the Linux Namespaces environment. We propose a forwarding architecture for virtual routers based on multi-core hardware, where virtual routers can run in parallel on different CPU cores. This reduces resource contention among virtual routers and results in improved performance and isolation. To enable this architecture, we find that hardware-based I/O virtualization support is essential. We demonstrate this by making a comparison with a software-based I/O virtualization approach. We also show that hardware-assisted virtual routers can achieve better aggregate throughput than a non-virtualized router on a multi-core platform.

Keywords—virtual router, I/O virtualization, SR-IOV, SoftIRQ, RSS

I. INTRODUCTION

With the continuous growth of the Internet to new areas, services and applications, the demands increase on the ways in which we organize and manage networks. Future networks need to be flexible, support a diversity of services and applications, and should be easy to manage and maintain. One way of addressing these requirements is to virtualize routers. Router virtualization involves running several router instances on the same physical hardware, in a way that allows each instance to appear as a separate, independent router. This makes it possible to support a multitude of services, management disciplines, and protocols in parallel when running multiple different virtual routers on top of the same hardware platform. A general approach to router virtualization is to use computing virtualization techniques to run multiple operating systems as guests in parallel on the same hardware, and let each guest run one instance of the router software. In open source virtual routers, these operating

systems are based on open source software that can be combined with commodity PC hardware.

Router virtualization is associated with performance penalties [1], [2], due to the fact that virtualization introduces a certain degree of overhead in terms of how packets are processed in the router. Furthermore, performance isolation is yet another challenge on a shared platform. For instance, an overloaded virtual router might affect the performance of other virtual routers running in parallel. In this regard, different virtualization approaches may have different properties. In earlier work, we have investigated the impact of various open source virtual components on the performance of virtual routers [3], [4]. We proposed an efficient way to design a virtual router using Linux Namespaces [5], an emerging virtualization technique currently being developed for the Linux kernel. In our current work we focus on Linux Namespaces on state-of-the-art PC hardware. A traditional limitation of PC-based router architectures has been their forwarding performance. However, we believe that recent advances in multi-core hardware can address this issue. To do so, it is important to effectively exploit the processing potential of multiple cores by running them in parallel. We believe that router virtualization can benefit from this type of platform in terms of performance and isolation. This is due to the fact that resources are dedicated to each virtual router (e.g. virtual network interfaces, routing table, etc.) in a virtual environment. This should reduce resource contention among the cores and allow them to work efficiently in parallel. Therefore, we propose a packet forwarding architecture for virtual routers on multi-core platforms. We examine how different virtual routers can be served by different cores in parallel. Our hypothesis is that this will result in high performance and strong performance isolation among virtual routers. A major component of router virtualization is I/O virtualization. It defines a mechanism to transfer a packet from a physical network interface to a virtual router and vice versa. Earlier studies show that this indirection may degrade virtual router performance significantly [2], [3], [4]. Recently, support for I/O virtualization has become available in hardware on network interface cards (NICs) [6]. This is

known as SR-IOV (Single Root I/O Virtualization) [7]. SR-IOV offloads packet handling from the system CPU and makes packets directly available inside a virtual router. This is attractive from a performance point of view since it opens up possibilities for a parallelized forwarding architecture. In our study we consider both hardware-assisted and software-based I/O virtualization. For software-based I/O virtualization, we use macvlan devices, which have shown favorable results in earlier work [3], [4]. We analyze both approaches for their suitability towards a parallel forwarding architecture. We identify limitations of the software-based approach and explore how SR-IOV can be used instead. We demonstrate that SR-IOV is not only better from a performance perspective, but also scales well on a multi-core platform. Furthermore, it isolates parallel forwarding paths in a better way, which results in strong performance isolation among virtual routers. The rest of this paper is organized as follows: Section 2 presents a general design approach for PC-based virtual routers and related work. Thereafter, Section 3 describes the packet forwarding architecture both for macvlan and SR-IOV based virtual routers. Section 4 presents and analyzes performance measurements of virtual router configurations using macvlan and SR-IOV. Finally, Section 5 concludes the paper.

II. GENERAL DESIGN AND RELATED WORK

In this section we present a general design approach for enabling virtual routers on a shared infrastructure. We also discuss available virtualization technologies for virtual routers and relevant literature. Broadly, we divide virtualization into two categories: system virtualization and I/O virtualization.

A. System Virtualization
A virtual router sends and receives packets on virtual interfaces (VIFs). Besides this, a virtual router is just like a physical router: it has a routing table, routing protocols, packet filtering rules, a management interface, and so on. There can be multiple virtual routers running on the same hardware platform, sharing the available resources, as illustrated in Figure 1. This is achieved using a virtualization technology that divides the system into multiple virtual environments, i.e., system virtualization. A host environment is responsible for allocating and managing resources for the virtual routers. The virtualization technology ensures resource isolation among virtual routers, so that one virtual router cannot see or access the resources (e.g. the routing table) of other virtual routers.

Virtualization technologies can be based on hypervisors or containers [8]. A hypervisor runs on top of the physical hardware and virtualizes hardware resources to be shared among multiple guest operating systems. Common examples are Xen [9] and KVM [10]. In a container-based approach, operating system resources (e.g. files, system libraries, routing tables) are virtualized to create multiple isolated execution environments on top of the same operating system. Each execution environment (often denoted a container) may contain its own set of processes, file system, (virtual) network interfaces, routing tables, etc. Common examples are OpenVZ [11] and Linux Namespaces [5].
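For Linux Namespaces, such execution environments can be created with standard iproute2 tooling. The following is a minimal sketch (the namespace names are illustrative; it requires root and a reasonably recent iproute2):

```python
import subprocess

def sh(cmd):
    # Run a shell command and fail loudly on error (requires root).
    subprocess.run(cmd, shell=True, check=True)

# Create one network namespace per virtual router. Each namespace
# gets its own (virtual) interfaces, routing table, and neighbor cache.
for vr in ["vr1", "vr2"]:
    sh(f"ip netns add {vr}")
    # A new namespace starts with only a loopback device; bring it up.
    sh(f"ip netns exec {vr} ip link set lo up")

# Commands can be run inside a namespace, e.g. to inspect its
# private routing table:
sh("ip netns exec vr1 ip route show")
```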

Figure 1. General design of Linux-based virtual routers

Hypervisor-based and container-based virtualization technologies have different characteristics from a system point of view. Hypervisors incur performance penalties but offer more flexibility and better isolation among virtual instances [8]. In contrast, container-based virtualization is considered more efficient in terms of performance but less flexible, since all containers must use the same kernel. Furthermore, failures in one container may result in overall system failures affecting all other containers.

In related work, different virtualization technologies have been evaluated as virtual router platforms. Xen virtualization has been investigated in detail [12], and it was found that Xen can achieve a considerable forwarding rate in the privileged domain, but that guest domain forwarding results in poor performance. It has also been demonstrated how to make efficient use of multi-core commodity hardware for virtual routers using Xen and OpenVZ [13]. VINI [1] is another example, using User-Mode Linux (UML) for router virtualization.

B. I/O Virtualization
Virtual network interfaces are an important part of a virtualization technology. A virtual interface is similar to a physical interface, but completely implemented in software. Examples include the virtual Ethernet interface (veth) and the macvlan interface (both are part of the Linux 2.6.x kernel). These interfaces operate at layer 2 with a unique MAC address for each interface. It is possible to create multiple virtual interfaces on top of a single physical interface and bind them to different virtual routers, as shown in Figure 1. Even though a virtual router communicates over virtual interfaces, its purpose is often to process packets that appear on the physical interfaces in the host environment. This means that a mechanism is required to redirect packets between physical and virtual interfaces. This redirection introduces a layer of indirection that is not present in a physical router. This is known as I/O virtualization (as shown in Figure 1). The functionality can be implemented in several ways; broadly, it can be divided into two types:
• Software based I/O virtualization
• Hardware assisted I/O virtualization

1) Software based I/O Virtualization: One common configuration is to use the regular software bridge module in the Linux kernel to interconnect the interfaces [4], [12]. The software bridge provides a general-purpose switching function that allows packets to be switched between interfaces (virtual or physical) based on MAC addresses and MAC address learning. This solution is general in the sense that it allows packets to be switched between any pair of interfaces, and it is attractive since it is based on a well-known software component already available in the kernel. However, for the purpose of redirecting packets between virtual and physical interfaces, it introduces a considerable amount of overhead. This, in turn, incurs performance penalties, something that was investigated in previous work [4]. Similar solutions that have been used are the virtual switch [14] and the short bridge [2]. It is also possible to use, for example, IP routing and Network Address Translation (NAT) for traffic to and from virtual machines, but those are not suitable for virtual routers.

A promising solution from a performance point of view is to replace the software bridge with a multiplexing/demultiplexing module, as available with macvlan devices in private mode. It maintains a MAC address table for physical-to-virtual address mapping. This is a more restricted solution since it does not support switching between virtual routers, but it is potentially more efficient, since it allows packets to be moved between physical and virtual interfaces with less overhead [4].

2) Hardware assisted I/O Virtualization: Hardware-assisted I/O virtualization offloads packet handling onto the NIC. The NIC supports virtual interfaces (through the Virtual Function interface as described below) and is able to deliver a packet directly to a virtual interface without intervention from the system CPU. This should result in better performance. Single Root I/O Virtualization (SR-IOV) is an industry standard for hardware-assisted I/O virtualization. PCI-SIG has released the SR-IOV specifications [7], providing guidelines for hardware vendors to share a network device among many virtual machines. SR-IOV allows a single physical PCIe device to expose multiple PCIe instances, known as Virtual Functions (VFs). A VF provides an Ethernet-like interface. Only a limited number of VFs are supported on top of a physical interface; for instance, the Intel 1 Gbps NIC [15] supports 8 VFs per port whereas the 10 Gbps NIC [6] supports 64 VFs per port.

There are some examples where SR-IOV is proposed together with hypervisor-based technologies. For instance, an SR-IOV based architecture has been proposed for Xen [16]. Another study evaluates SR-IOV and proposes performance optimizations in a hypervisor-based environment [17]. It is shown in [18] that SR-IOV can achieve better throughput than the native KVM I/O virtualization driver (virtio). Yet another work presents an architecture for sharing SR-IOV devices among multiple computers [19]. In our work we focus on using SR-IOV with Linux Namespaces. To our knowledge there is no previous work on using SR-IOV with container-based virtualization. Earlier work [4] indicates that Namespaces has low virtualization overhead and can give good performance. Our approach is to combine it with SR-IOV to further minimize virtualization overhead. Apart from this, SR-IOV also provides packet classification features in hardware (details in the next section), which can help improve isolation among virtual routers.
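To make the VF concept concrete, the following sketch shows how VFs can be instantiated on Linux. The interface name, VF count, and MAC addresses are assumptions; note also that on the 2.6.37-era kernel used later in this paper, VFs were created with a driver module parameter (e.g. modprobe ixgbe max_vfs=4) rather than through the sysfs interface shown here:

```python
import subprocess

def sh(cmd):
    subprocess.run(cmd, shell=True, check=True)

PF = "eth0"   # physical function (assumed name of the 82599 port)
NUM_VFS = 4   # e.g. one VF per virtual router

# On recent kernels, VFs are created through sysfs.
sh(f"echo {NUM_VFS} > /sys/class/net/{PF}/device/sriov_numvfs")

# Give each VF a distinct MAC address; the NIC's layer-2 switch
# classifies incoming packets to VF RX queues by destination MAC.
for vf in range(NUM_VFS):
    sh(f"ip link set {PF} vf {vf} mac 02:00:00:00:00:0{vf}")
```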

III. VIRTUAL ROUTER FORWARDING ARCHITECTURE

In this section we propose a packet forwarding architecture for virtual routers. We present the architecture both for software-based and hardware-assisted I/O virtualization and make a comparison. For the software-based approach we consider macvlan devices (a modified version [3]), and for the hardware-based approach we use SR-IOV.

Our forwarding architecture is based on multi-core hardware equipped with multi-queue NICs (receive queues as well as transmit queues). Multi-core hardware is attractive for performing packet forwarding tasks in parallel for multiple virtual routers. However, in order to exploit the packet processing benefits of a multi-core system, a multi-queue NIC is essential. Such a NIC is specifically designed to make efficient use of the processing power of multi-core systems: it is able to distribute incoming network traffic over several RX queues, and each queue can be served by a different CPU core. In this way, multiple CPU cores can process incoming packets in parallel. Similarly, multiple TX queues can be used for multiple packet transmissions in parallel. Such parallelization is more suitable for virtual routers than for a regular non-virtualized IP router, since resources are partitioned in a virtual environment and each virtual entity has dedicated resources, which gives the cores more independence to work in parallel. For instance, in a regular router all packets traverse the same data path and share data structures such as the forwarding table. In contrast, each virtual router has its own routing table, and different cores can access multiple routing tables at the same time.

Multi-queue NICs provide different methods to select the RX queue for an incoming packet [6]. The first method, which is the default, is hash computation (known as Receive Side Scaling, or RSS), where a hash is computed over the source and destination IP addresses and used to select the RX queue. Thus, all packets belonging to the same IP flow will use the same RX queue. The second option is to use SR-IOV, which selects the RX queue based on the destination MAC address. Finally, a third option is to use Flow Director, a hardware feature that allows queue selection based on different packet header fields, including the VLAN header, source/destination IP address, and port number.

In the subsequent sections, we first analyze the packet forwarding path for a macvlan setup. We investigate how multi-core systems can support multiple virtual routers, highlight potential drawbacks, and explore how SR-IOV can be used to address these issues.

A. Macvlan based Virtual Routers
When a packet is received on a physical interface with the default configuration using RSS, the packet is placed in an RX queue after the IP header hash computation, as shown in Figure 2. The packet is then transferred to main memory using the RX ring. The RX ring is a buffer of packet descriptors for handling incoming packets, where each descriptor points to a unique memory location. Once a packet is available in memory, the RX queue generates an interrupt to notify the CPU of a packet receive event. The CPU schedules a SoftIRQ (software interrupt/kernel thread) to perform the packet processing task later and terminates the hardware interrupt. When the SoftIRQ runs, the host environment (Figure 1) inspects the packet to identify the virtual router to which it should be redirected, and makes the packet available to the virtual router on one of its virtual interfaces (ingress macvlan). This is referred to as device mapping (Figure 2).
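The ingress/egress macvlan devices described above can be created and handed to a virtual router's namespace as sketched below; the device and namespace names are assumptions:

```python
import subprocess

def sh(cmd):
    subprocess.run(cmd, shell=True, check=True)

# One ingress and one egress macvlan per virtual router, created on
# top of the two physical ports and moved into the router's namespace.
VR, INGRESS_PF, EGRESS_PF = "vr1", "eth0", "eth1"   # assumed names

for name, pf in [("mv-in", INGRESS_PF), ("mv-out", EGRESS_PF)]:
    # "mode private" gives the plain mux/demux behaviour described
    # above: no switching between macvlans on the same physical port.
    sh(f"ip link add link {pf} name {name} type macvlan mode private")
    sh(f"ip link set {name} netns {VR}")
    sh(f"ip netns exec {VR} ip link set {name} up")
```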

Figure 2. Forwarding architecture for macvlan-based virtual routers
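The binding of RX queues to CPU cores assumed in Figure 2 is typically configured by steering each queue's interrupt to a specific core. A minimal sketch follows; the "ethX-TxRx-N" interrupt naming is the ixgbe convention and may differ for other drivers (requires root):

```python
import re

def pin_queue_irqs(dev="eth0", num_cores=4):
    # Find the IRQ number of each queue of the device and bind it
    # to one CPU core, queue i -> core (i mod num_cores).
    with open("/proc/interrupts") as f:
        for line in f:
            m = re.match(r"\s*(\d+):.*\b" + dev + r"-TxRx-(\d+)", line)
            if not m:
                continue
            irq, queue = int(m.group(1)), int(m.group(2))
            core = queue % num_cores
            # smp_affinity takes a hex CPU bitmask; core n -> 1 << n.
            with open(f"/proc/irq/{irq}/smp_affinity", "w") as aff:
                aff.write(f"{1 << core:x}")

pin_queue_irqs()
```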

Inside the virtual router the forwarding decision is taken and the next hop is determined. The packet is placed on an outgoing virtual interface (egress macvlan). After this, the outgoing physical interface is identified and the packet is placed in the corresponding transmission queue (known as the Qdisc). The NIC's DMA engine then fetches the packet from host memory using the TX ring of the egress interface.

In the following, we analyze the scaling of such forwarding paths on a multi-core platform. For this purpose we consider a CPU with four cores and two physical network interfaces. Figure 2 describes the forwarding architecture for such a setup. Incoming network traffic is distributed among four RX queues, and each RX queue is served by a different CPU core. A similar configuration is made for the TX queues and outgoing traffic. We create four macvlan-based virtual routers on top of this physical setup, as shown in Figure 2. In this way, the setup provides four parallel forwarding paths. Our idea is that the forwarding paths should be able to scale in terms of performance by adding more CPU cores and interface queues.

It can also be observed in Figure 2 that incoming traffic for a virtual router may be received on any of the available RX queues. In other words, one RX queue can receive packets belonging to different virtual routers, since incoming packets are distributed among RX queues based on the IP header hash value (i.e. per IP flow). Since RX queues are bound to CPU cores, each CPU core may end up serving many virtual routers. We are thus unable to dedicate a particular CPU core to a particular virtual router, which is a concern when considering CPU core isolation among virtual routers. To address the isolation issue, Flow Director might be one possible solution: it can classify network traffic based on the VLAN header, where different VLAN IDs correspond to different virtual routers. However, from a performance perspective this approach does not provide any hardware offloading of packet handling.

B. SR-IOV based Virtual Routers
When using SR-IOV, a packet received on a physical interface is passed to a layer 2 switch, as shown in Figure 3. Based on the destination MAC address the packet is placed in a specific RX queue. The RX queue is reserved for a particular VF, which is associated with a virtual router. The NIC initiates a DMA action and moves the packet directly to that VF's memory area. The VF generates an interrupt to notify the CPU of a packet receive event, and the CPU schedules a SoftIRQ to dispatch packet processing. When the SoftIRQ runs, the forwarding decision is taken and the next-hop VF is determined. The packet is placed on the Qdisc of that VF. The NIC is then able to directly fetch the packet from the VF memory area and place it in a TX queue reserved for that VF. Finally, the packet is transmitted onto the wire.

This forwarding path differs from the earlier setup in several ways. The first difference is the packet classification scheme, which here is based on the destination MAC address. Secondly, each VF has reserved queues (RX and TX). These two features are attractive for providing traffic isolation between virtual routers, since each VF has a unique MAC address and belongs to a specific virtual router. Thirdly, the CPU only needs to process a packet once it is available on a VF, which means there is no need to perform any physical/virtual device mapping. Our hypothesis is that this should make the forwarding process faster and result in improved performance.

For a multi-core platform, we consider the same hardware setup as described earlier: a CPU with four cores and two network interfaces. We configure a CPU core for each VF so that all VFs belonging to one virtual router are served by the same CPU core. Similarly, the VFs of other virtual routers are served by other cores, as shown in Figure 3. In this way, the setup allows us to completely slice the forwarding path of a virtual router and dedicate a CPU core to that particular virtual router. The advantage is that we expect to isolate virtual routers from each other, thereby reducing contention for resources and improving caching performance.
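A sketch of this per-router slicing follows: each VF netdev is moved into the router's namespace and its interrupts are steered to the router's dedicated core. The VF device names and IRQ numbers below are placeholders; in practice they would be looked up in sysfs and /proc/interrupts:

```python
import subprocess

def sh(cmd):
    subprocess.run(cmd, shell=True, check=True)

# Hypothetical mapping: the VF netdevs enumerated by the ixgbevf
# driver (names assumed) for virtual router vr1, and the core that
# should serve all of vr1's VFs.
VR, VF_DEVS, CORE = "vr1", ["eth2", "eth3"], 0

for dev in VF_DEVS:
    # The VF interface itself is handed to the virtual router...
    sh(f"ip link set {dev} netns {VR}")
    sh(f"ip netns exec {VR} ip link set {dev} up")

# ...while its interrupts are steered to the router's dedicated core
# (placeholder IRQ numbers; look them up in /proc/interrupts).
for irq in [40, 41]:
    sh(f"echo {1 << CORE:x} > /proc/irq/{irq}/smp_affinity")
```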

Figure 3. Forwarding architecture for SR-IOV-based virtual routers

IV. EXPERIMENTAL EVALUATION

This section presents an evaluation of macvlan and SR-IOV based virtual router platforms from different perspectives, including throughput, scalability and isolation.

Figure 4. IP forwarder configuration

We relate the performance of a virtual router to regular IP forwarding in a non-virtualized Linux-based router. Throughout this section, the latter is denoted "IP Forwarder", and we use it as a reference to study the effects of applying virtualization. In our test setup, the IP Forwarder simply forwards packets from one physical interface to the other after IP protocol handling. The setup is shown in Figure 4.

A. Experimental Setup
We adopt a standard method to examine the performance of a router in conformance with RFC 2544 [20]. A source machine generates network traffic that passes through a device under test (DUT) and is forwarded towards a destination machine (known as the sink), as shown in Figure 5. The hardware used for the traffic generator and the sink is based on the Intel Xeon Quad Core 2.26 GHz processor with 3 GB of RAM. The virtual router hardware (DUT) is an AMD Phenom II Quad Core 3.2 GHz processor with 4 GB of RAM. Each machine is also equipped with one 10 Gbps dual-port NIC based on the Intel 82599 10GbE controller. The DUT runs the Linux net-next 2.6.37-rc8 kernel with namespace options enabled.

Figure 5. Experimental Setup
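Before any forwarding test, IPv4 forwarding must be enabled on the DUT, both in the host (for the IP Forwarder reference case) and inside each namespace (for the virtual routers). A minimal sketch, assuming namespaces vr1 and vr2:

```python
import subprocess

def sh(cmd):
    subprocess.run(cmd, shell=True, check=True)

# The non-virtualized IP Forwarder is just the host with IPv4
# forwarding switched on; each virtual router needs the same sysctl
# inside its own namespace.
sh("sysctl -w net.ipv4.ip_forward=1")
for vr in ["vr1", "vr2"]:
    sh(f"ip netns exec {vr} sysctl -w net.ipv4.ip_forward=1")
```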

As traffic generator, we use pktgen [21], an open source tool operating as a kernel module to achieve high packet rates. On the receiver side, we use pktgen with a patch for receiver-side traffic analysis [22] as the traffic sink. In all tests we generate 64-byte packets to expose the DUT to maximum load. The throughput is measured in kilopackets per second (kpps). The maximum offered load is 5500 kpps, which is the highest load that our sink can receive without dropping packets. We generate traffic with 100 different UDP flows and distribute these uniformly over all virtual routers running on the DUT (a sketch of the pktgen configuration is shown at the end of this subsection).

B. Throughput Results
We start with a simple scenario: a DUT with two physical interfaces forwards packets from one interface to another. A virtual router is configured with two virtual interfaces; one virtual interface is connected to the physical ingress interface while the other is connected to the physical egress interface. For the macvlan setup, the experimental configuration is according to Figure 2, whereas for SR-IOV it is based on Figure 3.

As a first step, a single CPU core is used. The maximum achieved throughput can be seen in Figure 6. The non-virtualized IP Forwarder reaches 1400 kpps, the macvlan setup achieves 1100 kpps, whereas SR-IOV obtains 1217 kpps. This shows that there is a certain degree of virtualization overhead for both virtual router setups compared to the non-virtualized IP Forwarder. However, the overhead is lower for SR-IOV.

We continue by adding another CPU core and a virtual router to the DUT. Figure 6 now shows the aggregate throughput of both virtual routers. Throughput increases for all setups, but the increase is most significant for SR-IOV, and the difference between the IP Forwarder and SR-IOV becomes marginal. When we introduce a third CPU core and a virtual router, SR-IOV exceeds both the IP Forwarder and macvlan setups. Finally, with four CPU cores, SR-IOV finishes with a significant throughput increase (32.33%) compared to macvlan and a smaller increase (7%) compared to the IP Forwarder.

We expect better throughput for SR-IOV than for macvlan, since SR-IOV offloads some (virtualization related) packet processing tasks to hardware, as described in Section II. However, it is interesting to see that SR-IOV also obtains better throughput than the non-virtualized IP Forwarder. Generally, some virtualization overhead is expected for a virtual router, resulting in lower performance than for a non-virtualized IP forwarder [3], [4]. Figure 6 confirms this when we use one or two CPU cores, but with more CPU cores SR-IOV achieves higher throughput. This indicates that SR-IOV based virtualization lends itself better to parallelization than the IP Forwarder. In a non-virtualized setup it is more likely that many cores share the same resources, and resource contention among the cores may degrade performance. Virtualization, on the other hand, allows dedicating virtual resources to different entities, e.g. each namespace has a dedicated routing table. This reduces resource contention and provides more parallelism on a multi-core platform.
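The pktgen load described above is configured through pktgen's /proc interface. The following sketch illustrates the kind of session used; the device name, addresses, and flow parameters are assumptions for illustration:

```python
# Minimal sketch of driving pktgen to generate the 64-byte UDP load.
def pg(path, cmd):
    # pktgen takes one command per write to its /proc files.
    with open(path, "w") as f:
        f.write(cmd + "\n")

DEV = "eth0"                                     # assumed generator port
pg("/proc/net/pktgen/kpktgend_0", f"add_device {DEV}")
pg(f"/proc/net/pktgen/{DEV}", "count 0")         # run until stopped
pg(f"/proc/net/pktgen/{DEV}", "pkt_size 60")     # 64 bytes on the wire (CRC excluded)
pg(f"/proc/net/pktgen/{DEV}", "dst 10.0.0.1")    # assumed destination
pg(f"/proc/net/pktgen/{DEV}", "dst_mac 02:00:00:00:00:01")
pg(f"/proc/net/pktgen/{DEV}", "flows 100")       # 100 concurrent flows
pg(f"/proc/net/pktgen/{DEV}", "flowlen 8")       # packets sent per flow in turn
pg("/proc/net/pktgen/pgctrl", "start")           # blocks while sending
```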

TABLE 1. CPU USAGE (NUMBER OF CORES = 4)

Application/Symbol Name            CPU Usage (%)
                                   IP Forwarder    SR-IOV
ixgbe (physical device driver)         16.68          -
ixgbevf (VF device driver)               -          11.28
__alloc_skb                             9.74         6.74
__slab_free                             5.35         3.69
__kmalloc_node_track_caller             3.43         2.32
__netif_receive_skb                     3.36         2.72
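The numbers in Table 1 come from oprofile [23]. A sketch of such a profiling session follows, assuming the legacy opcontrol front-end of oprofile 0.9.x (flags may differ in newer releases, and the vmlinux path is an assumption):

```python
import subprocess

# Sketch of a profiling session; kernel symbols such as __alloc_skb
# require an uncompressed kernel image (path assumed here).
for cmd in [
    "opcontrol --init",                    # load the oprofile kernel module
    "opcontrol --vmlinux=/boot/vmlinux",   # assumed path for kernel symbols
    "opcontrol --reset",                   # clear old samples
    "opcontrol --start",                   # ... forward traffic under load ...
]:
    subprocess.run(cmd, shell=True, check=True)

# After the measurement interval:
subprocess.run("opcontrol --stop", shell=True, check=True)
subprocess.run("opreport --symbols", shell=True, check=True)  # per-symbol CPU usage
```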

In order to investigate why SR-IOV gives better performance, we have done CPU profiling using oprofile [23], comparing the IP Forwarder and SR-IOV. The software components with the highest CPU usage are reported in Table 1. We find major differences in the network device drivers. The IP Forwarder uses the ixgbe driver for ordinary Ethernet devices, whereas SR-IOV uses the ixgbevf driver (the virtualization extension of ixgbe) for VFs. Table 1 shows that ixgbevf uses 5.4 percentage points less CPU than ixgbe when four cores are used. However, we also observed that ixgbevf consumes 1.5% more cycles than ixgbe with one core (results not shown here). This indicates that the virtual driver allows better parallel processing on a multi-core platform than the non-virtualized driver. Similarly, we also notice (Table 1) that SR-IOV consumes fewer CPU cycles in kernel symbols/functions, an expected outcome of better parallelism.

Figure 6. Throughput vs. CPU Cores

C. Scalability Results
In the throughput tests we ran only one virtual router per CPU core. However, in practice there are many scenarios where it is desirable to run more virtual routers than there are CPU cores available. In that case, one CPU core will be shared by several virtual routers. In the scalability tests, we study the impact of this sharing in terms of aggregate throughput for the virtual routers.

Figure 7. Throughput test for multiple virtual routers

We start with a single CPU core and then gradually increase the number of virtual routers. The 100 different UDP flows are now distributed uniformly over all virtual routers. We observe that an increasing number of virtual routers results in a certain degree of throughput drop for both setups. However, the impact is marginal for SR-IOV compared to the macvlan setup, as shown in Figure 7. With sixteen virtual routers running in parallel, the SR-IOV based setup shows an 11.3% throughput drop compared to a single virtual router; for the macvlan setup, the drop is 30%. We repeat the test with all four CPU cores. As can be seen in Figure 7, we can run eight virtual routers in both setups without any throughput drop. For more than eight virtual routers, some throughput drop can be observed: with sixteen virtual routers, the drop is 2.3% for SR-IOV and 6.4% for macvlan, compared to one virtual router per CPU core (i.e. 4 VRs).

We draw two conclusions from these results. Firstly, we observe that scalability in terms of aggregate throughput for an increasing number of virtual routers improves with the number of CPU cores. This is observed for both virtual setups. The result can be explained by the architectural properties presented in Figure 2 and Figure 3: when a CPU core is added, one more forwarding path becomes available, and more paths reduce resource contention among virtual routers and improve performance. Secondly, SR-IOV exhibits better scalability than macvlan. This is due to the fact that the SR-IOV based architecture offloads some packet handling to hardware, so fewer CPU cycles are spent on packet forwarding in SR-IOV compared to the macvlan setup. The spare processing resources can be used to serve more virtual routers. Furthermore, we observed in the throughput tests that the SR-IOV VF driver (ixgbevf) lends itself better to parallel processing than the physical driver (ixgbe) for Ethernet devices. The macvlan devices are created on top of an Ethernet device, so ixgbe is also part of the macvlan setup. Accordingly, the limitations of the physical setup are automatically inherited by the virtual setup.

D. Isolation Results
We are sharing physical resources among virtual routers. In such an environment, it is important to understand the implications of resource contention among virtual routers. For instance, we would like to know how an overloaded virtual router might affect the performance of other virtual routers running in parallel. We refer to this as isolation properties.

TABLE 2. ISOLATION TEST WITH TRAFFIC OVERLOAD

Setup      Offered load (kpps)          Throughput (kpps)
           VR1    VR2    Total          VR1    VR2    Total
Macvlan     800    800    1600           800    800    1600
           1600    800    2400           990    660    1650
           2400    800    3200          1054    676    1730
           3200    800    4000          1140    690    1830
SR-IOV      800    800    1600           800    800    1600
           1600    800    2400          1152    800    1952
           2400    800    3200          1157    800    1957
           3200    800    4000          1162    800    1962

For our isolation experiments we use the same physical setup as described earlier in the throughput tests. We have two network interfaces at the DUT, where one is the ingress and the other the egress interface. We use two CPU cores and create two virtual routers. We offer a load of 800 kpps towards each virtual router in parallel (on the ingress physical interface) and observe an aggregate throughput of 1600 kpps for both the SR-IOV and the macvlan configurations (Table 2). This is the expected behavior for both setups. At this point, we gradually overload VR1 and study the impact on the performance of VR2. The offered load for VR1 is increased up to 3200 kpps while it remains at 800 kpps for VR2. The results are shown in Table 2. We first notice that the overall throughput is better for SR-IOV.

Secondly, SR-IOV demonstrates better isolation than macvlan. For SR-IOV, VR2 is not affected at all. In contrast, performance degradation is observed for macvlan. The explanation for this difference is related to how the RX queues are used. For macvlan, the RX queues are shared between the two virtual routers. When one virtual router is exposed to very high traffic rates, the RX queues overflow and packet drops are observed for both virtual routers on the ingress physical interface. In contrast, for SR-IOV each virtual router has a dedicated RX queue. An overloaded virtual router can only affect its own RX queue, and no impact is observed on the RX queue of the other virtual router.

Figure 8. Isolation test with CPU Bombs

In the above experiment a virtual router is overloaded with high packet rates. We also consider another overload scenario: we overload one of the CPU cores used by VR1. We do this by running a CPU intensive application (called CPU Bomb [24], performing continuous integer arithmetic operations) on VR1. We make sure that this bomb only uses one CPU core and does not consume any resources from the other core. We offer a load of 800 kpps to each virtual router in parallel. The results are shown in Figure 8. For a single instance of the CPU bomb, we see throughput drops for both virtual routers in the macvlan setup, whereas there is no throughput degradation for the SR-IOV setup. We gradually increase the number of CPU bomb instances on VR1 to increase the stress. With ten instances, VR1 throughput drops to 750 kpps for SR-IOV, while VR2 still achieves 800 kpps. In comparison, the results are completely different for the macvlan setup: throughput degradation is observed for both virtual routers, and the situation becomes worse when adding more CPU bombs. We could only achieve 550 kpps for each virtual router with ten CPU bombs running.

The superior isolation properties of SR-IOV can be explained by its architecture, which allows binding a virtual router to a specific CPU core. In this way, the impact of overloading that core is limited to that virtual router. In contrast, the macvlan architecture shares all available cores among all virtual routers, so overloading a single core can affect the performance of all of them.
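The CPU Bomb behaves essentially like the following stand-in (a sketch, not the actual benchmark code from [24]): a busy loop of integer arithmetic pinned to a single core.

```python
import os
import signal
from multiprocessing import Process

def bomb():
    # Continuous integer arithmetic, pinned to one core (core 0 here,
    # assumed to be the core that serves VR1) so the stress never
    # spills onto the core serving the other virtual router.
    os.sched_setaffinity(0, {0})
    x = 1
    while True:
        x = (x * 31 + 7) % 1000003

if __name__ == "__main__":
    # Launch up to ten instances, mirroring the test in Figure 8.
    for _ in range(10):
        Process(target=bomb, daemon=True).start()
    signal.pause()   # keep the bombs running until interrupted
```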

V. CONCLUSIONS AND FUTURE WORK

In this paper, we have evaluated the performance of virtual router platforms based on Linux Namespaces. Our focus is on I/O virtualization, which is a major component of router virtualization. We consider both a software (macvlan) and a hardware (SR-IOV) solution on a multi-core platform.

We analyze the virtual router's data plane with macvlan interfaces and examine how data planes can be combined with a multi-core platform in order to provide parallel forwarding paths. We observe that this setup prevents dedicating a CPU core to a virtual router, something which might be required to provide isolation among virtual routers.

We propose an architecture based on SR-IOV. There are two main benefits with this architecture. Firstly, it uses MAC address based packet classification, which is suitable for isolating traffic for different virtual routers. Secondly, it offloads some packet processing tasks to hardware, and packets are directly available inside a virtual router. We compare the performance of macvlan and SR-IOV based virtual routers, and find that higher forwarding performance is achieved using SR-IOV. We also observe that SR-IOV scales better with the number of CPU cores used: with four CPU cores, SR-IOV achieves 32.33% better throughput than macvlan. Furthermore, we find that SR-IOV also results in better throughput than a non-virtualized IP forwarder. Our profiling results indicate that SR-IOV based virtualization makes better use of multiple cores. Furthermore, we run up to sixteen virtual routers in parallel both for SR-IOV and macvlan; with four CPU cores, we observe only 2.33% throughput degradation for SR-IOV and 6.43% for macvlan. In addition, the SR-IOV based architecture provides performance isolation between virtual routers, which cannot be achieved with the macvlan setup.

In future work, we plan to evaluate SR-IOV in a full virtualization environment (e.g. KVM) and make a comparison with Linux Namespaces.

ACKNOWLEDGMENT

We would like to thank Robert Olsson from TSLab KTH for his valuable input during the work.

REFERENCES
[1] A. Bavier, N. Feamster, M. Huang, L. Peterson, and J. Rexford, "In VINI Veritas: Realistic and Controlled Network Experimentation," in SIGCOMM'06: Proceedings of the ACM SIGCOMM 2006 Conference, Pisa, Italy, September 2006.
[2] S. Bhatia, M. Motiwala, W. Muhlbauer, V. Valancius, A. Bavier, N. Feamster, L. Peterson, and J. Rexford, "Trellis: A Platform for Building Flexible, Fast Virtual Networks on Commodity Hardware," ACM ROADS'08, Madrid, Spain, December 2008.
[3] S. Rathore, M. Hidell, and P. Sjödin, "Data Plane Optimization in Open Virtual Routers," IFIP Networking 2011, Valencia, Spain, May 2011.
[4] S. Rathore, M. Hidell, and P. Sjödin, "Performance Evaluation of Open Virtual Routers," IEEE GLOBECOM Workshop on the Future Internet, Miami, USA, December 2010.
[5] Linux Namespaces, http://lxc.sourceforge.net/index.php/about/kernel-namespaces/
[6] Intel 82599 10GbE Controller Datasheet, rev. 2.3, Intel Corporation, 2010.
[7] PCI-SIG, Single Root I/O Virtualization Specifications, http://www.pcisig.com/specifications/iov/single_root/, last accessed August 2011.
[8] S. Soltesz, H. Pötzl, M. Fiuczynski, A. Bavier, and L. Peterson, "Container-based Operating System Virtualization: A Scalable, High-performance Alternative to Hypervisors," in EuroSys'07: Proceedings of the 2nd ACM EuroSys Conference, March 2007.
[9] Xen project, http://xen.org/
[10] KVM project, http://www.linux-kvm.org/page/Main_Page
[11] OpenVZ project, http://wiki.openvz.org/Main_Page
[12] N. Egi, A. Greenhalgh, M. Handley, M. Hoerdt, L. Mathy, and T. Schooley, "Evaluating Xen for Virtual Routers," in PMECT'07, August 2007.
[13] N. Egi, A. Greenhalgh, M. Handley, M. Hoerdt, and L. Mathy, "Towards High Performance Virtual Routers on Commodity Hardware," ACM CoNEXT, December 2008.
[14] B. Pfaff, J. Pettit, T. Koponen, K. Amidon, M. Casado, and S. Shenker, "Extending Networking into the Virtualization Layer," ACM SIGCOMM HotNets, September 2009.
[15] Intel 82576 1GbE Controller Product Specifications, http://download.intel.com/design/network/ProdBrf/320025.pdf
[16] Y. Dong et al., "SR-IOV Networking in Xen: Architecture, Design and Implementation," 1st Workshop on I/O Virtualization, San Diego, USA, 2008.
[17] Y. Dong, X. Yang, X. Li, J. Li, K. Tian, and H. Guan, "High Performance Network Virtualization with SR-IOV," IEEE International Symposium on High Performance Computer Architecture, Bangalore, India, 2010.
[18] J. Liu, "Evaluating Standard-Based Self-Virtualizing Devices: A Performance Study on 10 GbE NICs with SR-IOV Support," IEEE International Symposium on Parallel and Distributed Processing, Atlanta, USA, 2010.
[19] J. Suzuki, Y. Hidaka, and J. Higuchi, "Multi-Root Share of Single-Root I/O Virtualization (SR-IOV) Compliant PCI Express Device," IEEE Symposium on High Performance Interconnects, California, USA, 2010.
[20] RFC 2544, "Benchmarking Methodology for Network Interconnect Devices," http://tools.ietf.org/html/rfc2544, last accessed April 2010.
[21] R. Olsson, "pktgen the Linux Packet Generator," Proceedings of the Linux Symposium, vol. 2, pp. 11-24, Ottawa, Canada, July 2005.
[22] D. Turull, "Open Source Traffic Analyzer," Master's thesis, KTH Information and Communication Technology, 2010. Available at http://tslab.ssvl.kth.se/pktgen/docs/DanielTurull-thesis.pdf
[23] OProfile, open source profiling tool, http://oprofile.sourceforge.net/news/, last accessed September 2011.
[24] Isolation Benchmark Suite, http://web2.clarkson.edu/class/cs644/isolation/design.html, last accessed August 2011.
