
2015 11th International Conference on the Design of Reliable Communication Networks (DRCN)

Design of a Software-Defined Resilient Virtualized Networking Environment

Xuan Liu†, Sarah Edwards‡, Niky Riga‡ and Deep Medhi†
†University of Missouri–Kansas City, USA; ‡Raytheon BBN Technologies, Cambridge, MA, USA

Abstract—Network virtualization enables programmability for the substrate network provider, who provisions and manages virtual networks (VNs) for service providers. A mix of software-defined and autonomic technology improves the flexibility of network management, including dynamic reconfiguration in the virtualized networking environment (VNE). Virtual routers (VRs) run at a logical level, where software failures may be more frequent; thus, a VR failure is more likely than a physical router failure on the substrate network. In this paper, we present a software-defined resilient virtualized networking environment in which a VN topology can be restored after a VR failure by using a reserved standby virtual router (S-VR). We illustrate a preliminary autonomic setup of a VNE on the GENI testbed.

I. INTRODUCTION

In a virtualized networking environment (VNE), the physical network that provisions computing and networking resources for virtual networks (VNs) is the substrate network. A substrate network provider can allocate substrate resources and create isolated VNs for service providers, where each VN is allocated network resources (e.g., nodes, link capacity, etc.). The network components of a VN operate at the logical level. For example, virtual routers (VRs) are essentially virtual machines (VMs) running routing software with a particular routing protocol configured. Unlike for typical physical routers, configuration of a VR (e.g., adding/removing links, routing protocol configuration, etc.) can be achieved in a programmable manner [1], which requires little manpower. Hence, the VNE can be designed to be programmable to allow flexible network management, and such flexibility can be improved by a mix of software-defined and autonomic technologies. Software-defined networking (SDN) has been a hot research topic in recent years. In SDN, network management (control plane) and packet forwarding (data plane) are separated from each other, and the control functions are deployed and implemented at a centralized server or controller to support flexible network management. In this work, we discuss a software-defined resilient virtualized networking environment, where a centralized system provides software-based virtual network management, including VN reconfiguration to restore connections in the VNE.

This work has been partially supported by the US National Science Foundation under grant CNS-1217736 and cooperative agreement CNS-0737890. Any opinions, findings, conclusions or recommendations expressed in this material are the authors' and do not necessarily reflect the views of the NSF.

978-1-4799-7795-6/15/$31.00 ©2015 IEEE

Autonomic computing has been a practical solution for managing complex distributed computer systems, and it also enables self-management in computer networks, in particular autonomic fault-tolerance management in resilient networking design [2]. Since network virtualization supports isolated service delivery over reserved networking resources, service customers expect services with a good quality of experience (QoE), and service providers expect a good quality of service (QoS) from the VNs that the substrate network provider leases to them. Hence, an efficient virtual network management framework is desired from the substrate network provider's perspective. In the traditional substrate network infrastructure, node failures are not as common as link failures, and restoration requires manual reconfiguration (e.g., connecting cables, rebooting, or configuration), so most prior work has focused on link failure recovery in physical computer networks. However, a software failure may cause a virtual node failure in a VN, and a substrate node failure may affect multiple VNs concurrently. That is, node failures are likely to be more frequent in the virtualized networking environment than in a physical computer network. On the other hand, since VNs are programmable and operate at the logical level, reconfiguration may be managed automatically. An experimental study [3] dynamically recovered from a single-VR failure in a VN by providing standby virtual routers (S-VRs). Inspired by IBM's white paper on the autonomic computing architecture [4], we apply the autonomic computing concept to the VN management life cycle, which involves three procedures: (i) network resource provisioning, (ii) network monitoring, and (iii) dynamic network reconfiguration to recover VNs from failures. In particular, the reconfiguration process may restore VNs from node failures, link failures, or congestion.
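The three-procedure life cycle above maps naturally onto a simple autonomic control loop: provision, observe, and react. The following sketch is only illustrative of this idea; the class and method names are ours, not part of the paper's design.

```python
# Minimal autonomic VN management loop: (i) provision, (ii) monitor,
# (iii) reconfigure on a detected VR failure. Names are illustrative.

class AutonomicVNManager:
    def __init__(self):
        self.vns = {}        # vn_id -> set of active VR ids
        self.events = []     # queue of detected (vn_id, failed_vr) events

    def provision(self, vn_id, vrs):
        """(i) Network resource provisioning."""
        self.vns[vn_id] = set(vrs)

    def monitor(self, vn_id, alive_vrs):
        """(ii) Network monitoring: diff expected vs. observed VRs."""
        failed = self.vns[vn_id] - set(alive_vrs)
        for vr in failed:
            self.events.append((vn_id, vr))
        return failed

    def reconfigure(self, standby_pool):
        """(iii) Dynamic reconfiguration: swap in a standby per failure."""
        while self.events and standby_pool:
            vn_id, failed_vr = self.events.pop(0)
            s_vr = standby_pool.pop(0)
            self.vns[vn_id].discard(failed_vr)
            self.vns[vn_id].add(s_vr)

mgr = AutonomicVNManager()
mgr.provision("VN1", ["r1", "r2", "r3"])
mgr.monitor("VN1", ["r1", "r3"])        # r2 has failed
mgr.reconfigure(standby_pool=["s1"])
print(sorted(mgr.vns["VN1"]))           # → ['r1', 'r3', 's1']
```

In the paper's framework, the monitoring and reconfiguration steps are far richer (the S-VR choice uses the multi-criteria algorithm of [11]); this loop only shows how the three procedures interlock.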
In this paper, we propose a resilient VN reconfiguration framework that recovers VNs from VR failures with S-VRs in a VNE, using software-defined autonomic management. The rest of the paper is organized as follows: Sec. II presents related work on autonomic network management and resilient SDN against node failures. In Sec. III, we present a resilient VNE design with a software-defined autonomic approach. Sec. IV presents a preliminary proof of concept. Sec. V summarizes the paper and presents future work.




II. RELATED WORK


In virtual network management research, autonomic computing has been addressed in VN mapping and resource allocation (e.g., [5] [6] [7]), but autonomic fault tolerance has not been discussed much in the VNE. For resilient SDN design, the primary-backup idea has been applied to the controller placement problem in the recent literature. In [8], the core component CPRecovery is designed to support communication between the primary and backup controllers. Bari et al. [9] proposed a dynamic controller provisioning mechanism. Hock et al. [10] discussed resilient controller placement in the core SDN network; the proposed optimal controller placement considered various failure events, including node failures such as a controller failure. A dynamic recovery mechanism from a virtual router (VR) failure in a VN was presented in [3], which identified the closest standby virtual router (S-VR) on a virtual network testbed by taking geographic location into account. A multi-criteria selection model that chooses optimal S-VRs to recover from VR failures for multiple VNs was formulated and studied in [11]. Since software-defined networking and autonomic network management share some common features [12], we present in this work the design of a software-defined resilient VNE to achieve autonomic fault tolerance, where a centralized virtual network manager (VNM) runs the multi-criteria selection algorithm presented in [11] when VR failures occur.

III. FRAMEWORK DESIGN

In this section, we show the holistic design of a software-defined resilient VNE with autonomic node failure recovery.

A. Resilient Virtualized Networking Environment

First, we present the architecture of a resilient VNE. Fig. 1 presents a VNE with three VNs created over the core substrate network, and each one is considered the core VN between its end customers. Note that the core substrate network does not contain the edge routers connecting to the end hosts' domains.
Within each VN, in addition to the active VRs and virtual links that are in service, a set of S-VRs and logical links are also reserved for fault-tolerance purposes. These S-VRs are identical to the active VRs in terms of system configuration and software. For each VN that is ready to deliver services, the S-VRs should be capable of connecting to every active VR easily. For instance, as shown with dotted lines in Fig. 1, logical links can be reserved between an S-VR and other VRs, so that if a VR fails, the selected S-VR can easily be added to the existing topology. In addition, logical links may be created between any pair of S-VRs, in case a secondary S-VR needs to be brought in and connected to an S-VR that is already connected. We design a middleware called the Virtual Network Manager (VNM), operated by the substrate network provider, which logically resides between the substrate network and the VN plane. The VNM has both northbound and southbound interfaces. The northbound interfaces communicate with the VN plane to manage VNs, including creation,

Fig. 1. Dynamic Reconfiguration Scheme for Virtual Networks. [Figure: three VNs (VN1, VN2, VN3) with VRs r1–r6, enabled and disabled links, and S-VRs overlaid on a single-domain core substrate network (R1–R6) serving users and servers; the Virtual Network Manager sits between the VN plane and the substrate nodes, with northbound interfaces (VN creation/configuration, VN reconfiguration, VN monitoring) and southbound interfaces (get static information: configuration, node location; get dynamic information: resource utilization, i.e., CPU, BW, etc.; provisioning resources).]

configuration, reconfiguration, and monitoring. The southbound interfaces communicate with the substrate network to monitor and collect both static information and dynamic, real-time statistics about the substrate resources.

B. Virtual Network Manager

The VNM is a centralized control system running a software-defined mechanism, and it is designed with functional modules for autonomic management. In this subsection, we introduce the modules communicating with the VN plane and the substrate network through the VNM's northbound and southbound interfaces, respectively.

1) Southbound interfaces: Three modules communicate with the substrate network.

• Get static information: The VNM obtains static knowledge (e.g., topology, node geographic location, link capacity, etc.) about the substrate network from a static profile.

• Get dynamic information: The VNM periodically collects dynamic statistics on the substrate network resources (e.g., CPU utilization and interface statistics of the substrate nodes).

• Network resource provisioning: Once it has learned about the substrate network resources, the VNM runs the VN embedding algorithm to map a new VN onto the substrate network. In this framework, the VNM not only provisions regular VRs and bandwidth based on the service provider's request, but also provides a set of S-VRs with corresponding virtual links.

2) Northbound interfaces: Toward the northbound interfaces, three modules run in an autonomic manner.

• VN Creation/Configuration: When resource provisioning is done, the VNM creates profiles for the requested VN and configures the new VNs according to the mapping results.

• VN Reconfiguration: This module supports self-healing for the VNs. When a VR failure is detected by the VN Monitor, the optimal S-VR selection algorithm proposed in [11] is triggered to select an optimal S-VR to replace the failed VR in the affected VN.





• VN Monitor: This module monitors the VN status, such as network congestion, virtual link failures, or virtual node failures.

IV. EARLY PROOF OF CONCEPT

As an early proof of concept for the resilient VNE presented in Sec. III, we chose to use the GENI testbed [13].

A. Testbed Background

GENI is a distributed experimental testbed for networking innovation research, and it provides computing and networking resources for experimenters to build isolated experiments. For example, InstaGENI [14] is a flavor of a standard GENI rack (i.e., aggregate) containing a consistent set of software and hardware, including reservable computers, an OpenFlow data plane switch, and virtual machines (VMs). GENI resources are specified in a standard XML document called a Resource Specification (RSpec). There are three types of RSpec files: advertisement RSpec, request RSpec, and manifest RSpec [13]. Specifically, an advertisement RSpec contains static information about GENI aggregates, such as geographic information. A request RSpec contains the resource specification desired by an experimenter for a specific slice (i.e., resource container) and the VN configuration information (e.g., VM interface configuration, etc.). Once a request RSpec is submitted, GENI aggregates allocate the requested resources to the experimenter's corresponding slice, and the experimenter has full control of the reserved VMs to build experiments in his/her own slice. Given the flexibility and isolation provided by GENI, we are able to build a resilient VNE. Table I displays the relation between the proposed VNE and the GENI context. In particular, when creating a VN, the VNM assigns some VMs to be S-VRs, and the remaining VMs are labeled as active VRs. From the service provider's perspective, the requested VN consists of only the active VRs. However, from the VNM's perspective, the actual VN also includes the disabled logical links connecting S-VRs to active VRs and those between the S-VRs.
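The request RSpec mentioned above is an XML document. As a rough illustration of its structure, a fragment requesting one Xen VM and one link might look like the following; the aggregate URN, image name, and addresses here are placeholders, and the exact schema is defined by the GENI RSpec v3 specification, not by this sketch.

```xml
<!-- Illustrative request RSpec fragment (placeholder URNs/addresses). -->
<rspec type="request" xmlns="http://www.geni.net/resources/rspec/3">
  <node client_id="vr1"
        component_manager_id="urn:publicid:IDN+example-ig+authority+cm">
    <sliver_type name="emulab-xen"/>
    <interface client_id="vr1:if0">
      <ip address="10.10.1.1" netmask="255.255.255.0" type="ipv4"/>
    </interface>
  </node>
  <link client_id="link0">
    <interface_ref client_id="vr1:if0"/>
    <interface_ref client_id="vr2:if0"/>
  </link>
</rspec>
```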
Since the virtual links from an S-VR are already created, it is easy to add them to the existing active virtual topology by enabling the corresponding virtual interfaces.

TABLE I
VNE UNDER THE GENI CONTEXT

Proposed VNE        | GENI Context
--------------------|-----------------------------------------------
Substrate network   | GENI testbed
Virtual networks    | Slices
VRs or S-VRs        | VMs installed with routing software
Virtual links       | GRE/EGRE tunnels, stitched links, or LANs [13]
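The recovery step described above — enabling an S-VR's pre-created but disabled links toward the failed VR's neighbors — can be sketched directly. This is a simplified model under our own naming, not the paper's implementation:

```python
# Model a VN whose S-VRs keep pre-created but disabled links to all
# active VRs; recovery enables only the links to the failed VR's neighbors.

class VirtualNetwork:
    def __init__(self, active_links, s_vrs):
        # active_links: iterable of (u, v) pairs currently in service
        self.active = {frozenset(l) for l in active_links}
        self.vrs = {v for l in self.active for v in l}
        # Reserve a disabled link from every S-VR to every active VR,
        # and between every pair of S-VRs (for secondary standbys).
        self.disabled = {frozenset((s, v)) for s in s_vrs for v in self.vrs}
        self.disabled |= {frozenset((a, b)) for a in s_vrs for b in s_vrs if a != b}

    def neighbors(self, vr):
        return {v for l in self.active for v in l if vr in l and v != vr}

    def recover(self, failed_vr, s_vr):
        """Replace failed_vr by s_vr: retire the failed VR's links and
        enable s_vr's reserved links to the failed VR's neighbors."""
        nbrs = self.neighbors(failed_vr)
        self.active = {l for l in self.active if failed_vr not in l}
        for n in nbrs:
            link = frozenset((s_vr, n))
            self.disabled.discard(link)
            self.active.add(link)

vn = VirtualNetwork(active_links=[("r1", "r2"), ("r2", "r3")], s_vrs=["s1"])
vn.recover(failed_vr="r2", s_vr="s1")
print(sorted(tuple(sorted(l)) for l in vn.active))
# → [('r1', 's1'), ('r3', 's1')]
```

Note that only the links toward r2's neighbors are enabled; s1's reserved link to r2 itself stays disabled, mirroring how the enabled/disabled interface state changes during reconfiguration.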

B. Autonomic Virtual Network Creation

As RSpecs are important for VN creation and configuration, generating RSpecs in an automated fashion is desirable. Since GENI aggregates automatically allocate resources to the requested VN based on the request RSpec, the first challenge in achieving autonomic VN creation is to generate the

Fig. 2. A Segment of the Request RSpec Example

request RSpec for complex virtual topologies. Fortunately, GENI provides geni-lib [15], a Python library for building scripted tools. We developed a scripted tool relying on geni-lib to create and configure arbitrary virtual topologies and to generate the request RSpec automatically for experiments, in order to request resources from InstaGENI aggregates. This scripted tool is now distributed with geni-lib as the scaleup tool. With scaleup, we were able to: (1) create arbitrary topologies of various sizes consisting of Xen VMs on InstaGENI aggregates; (2) identify different node categories by specifying the OS image, post-boot scripts to install relevant software, and post-boot executable commands; (3) automatically assign IP addresses to the interfaces on each Xen VM; (4) automatically determine the link types (e.g., EGRE tunnels between Xen VMs on different InstaGENI aggregates); and (5) generate the request RSpec based on the VN configuration through the above features. Fig. 2 shows a sample request RSpec generated by scaleup. Second, the virtual router configuration process takes two steps: routing software installation and routing protocol configuration. In this preliminary setup, we created a customized Ubuntu 12.04 image with XORP v1.8 (routing software that supports typical routing protocols) installed, and specified this customized image as the boot image when generating the request RSpec with the scaleup tool. We configured every VR as an OSPF VR, and the OSPF configuration file is automatically generated from the virtual interface information right after the VR is up, including the interfaces configured for inactive virtual links. Since all S-VRs are provisioned with inactive virtual links reserved to the active VRs, the OSPF protocol is configured automatically for each S-VR at the post-boot stage as well.

C. Collect Substrate Network Information

In [11], a heuristic multi-criteria selection algorithm was proposed to select an S-VR to replace a failed VR in a VN. In the GENI context, we are able to obtain two pieces of information: the geographic location of InstaGENI racks and the VM load on each server of each InstaGENI rack. The geographic information of every InstaGENI aggregate can be obtained from the advertisement RSpec. Currently there is no public data available about the CPU utilization and interface statistics of the servers provisioning VMs. So for the real-time



TABLE II
SELECTION CRITERIA FOR S-VR SELECTION

S-VR | InstaGENI    | Latitude | Longitude | Dist (mi.) | VM Load
-----|--------------|----------|-----------|------------|--------
  8  | wisconsin-ig |  43.075  |  -89.410  |    0.000   |   6%
  9  | stanford-ig  |  37.430  | -122.170  | 1760.078   |  14%
 10  | UtahDDC-ig   |  40.751  | -111.890  | 1164.032   |   4%

Fig. 3. Preliminary Setup. [Figure: a five-VR topology (nodes 1–5) between two end hosts (6 and 7) across Maryland (max-ig), Illinois (illinois-ig), New York (nysernet-ig), Missouri (missouri-ig), and Wisconsin (wisconsin-ig), with S-VRs 8 (wisconsin-ig), 9 (stanford-ig), and 10 (UtahDDC-ig).]

monitoring data about InstaGENI aggregates, we instead used the VM load information to represent the resource utilization of a substrate node. geni-lib provides a sample script to retrieve VM load information, and we customized it to periodically collect this information from every VR's host on GENI. Thus, if we consider two criteria, the distance from an S-VR to the failed VR and the VM load of the S-VR's host, a simplified version of the objective function proposed in [11] can be defined as

    min { λ · dist_to_fail + π · VM_load }        (1)
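Using the Table II data, the selection under (1) can be computed directly; the two extreme weightings below match the cases discussed in the text (the code itself is only an illustrative evaluation of (1), not the full algorithm of [11]):

```python
# Evaluate objective (1) for each S-VR candidate using the Table II data.

candidates = {
    8:  {"dist_to_fail": 0.000,    "vm_load": 0.06},   # wisconsin-ig
    9:  {"dist_to_fail": 1760.078, "vm_load": 0.14},   # stanford-ig
    10: {"dist_to_fail": 1164.032, "vm_load": 0.04},   # UtahDDC-ig
}

def select_svr(lam, pi):
    """Return the S-VR id minimizing lam*dist_to_fail + pi*VM_load, as in (1)."""
    return min(candidates,
               key=lambda s: lam * candidates[s]["dist_to_fail"]
                           + pi * candidates[s]["vm_load"])

print(select_svr(lam=1, pi=0))   # → 8  (nearest site: wisconsin-ig)
print(select_svr(lam=0, pi=1))   # → 10 (lightest VM load: UtahDDC-ig)
```

Since distance (miles) and load (fraction) are on very different scales, intermediate weightings would in practice call for normalizing the two criteria first.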

where λ and π are weight parameters that determine the emphasis on each selection criterion.

D. Preliminary Results

For a preliminary implementation using the GENI testbed, we applied the autonomic VN creation module at the VNM to create a five-VR virtual topology between two end hosts (labeled 6 and 7) crossing five different geographic locations, as presented in Fig. 3. Apart from the VRs actively in service, we also assigned three VRs from three different locations to be S-VRs (labeled 8, 9, and 10 in Fig. 3). If VR-5 fails, the VNM needs to select one of the S-VRs and activate the virtual links between this S-VR and VR-5's neighbors, which are VR-1, VR-3, and VR-4. Table II displays an example of the geographic location information (i.e., latitude and longitude) and the VM load information for each S-VR's host, collected in Oct. 2014. When the failure detector at the VNM detects the failure of VR-5, the S-VR selection algorithm based on (1) is triggered at the reconfiguration module. Consider the two extreme cases: if the VNM prefers the S-VR from the site nearest to the failed VR's site, it may assign λ = 1, π = 0, so the S-VR from Wisconsin (ID = 8) will be selected based on Table II. However, if the VNM aims to balance the VM load on the substrate network, it may assign λ = 0, π = 1, so the S-VR from UtahDDC will be selected, as only 4% of its VMs have been reserved. In other words, the substrate node at UtahDDC has more available resources than the other two substrate nodes.

V. SUMMARY AND FUTURE WORK

In this paper, we propose the design of a software-defined resilient virtualized networking environment, relying on autonomic management to dynamically restore connections with S-VRs after VR failures for one or more VNs. We presented

an early proof of concept for our design on the GENI testbed, where we focused on two of the major modules at the VNM, VN creation and substrate network information collection, and presented a simplified composite objective based on [11]. We next plan to evaluate the impact of software-defined autonomic VN restoration from VR failures, implementing the heuristic multi-criteria selection algorithm, on multiple VNs, with attention to both independent and dependent VR failures.

REFERENCES

[1] R. Cherukuri, X. Liu, A. Bavier, J. P. Sterbenz, and D. Medhi, "Network Virtualization in GpENI: Framework, Implementation & Integration Experience," in Integrated Network Management (IM), IFIP/IEEE International Symposium on, 2011, pp. 1216–1223.
[2] R. Chaparadza, M. Wodczak, T. Ben Meriem, P. De Lutiis, N. Tcholtchev, and L. Ciavaglia, "Standardization of Resilience & Survivability, and Autonomic Fault-management, in Evolving and Future Networks: an Ongoing Initiative Recently Launched in ETSI," in Design of Reliable Communication Networks (DRCN), 9th International Conference on the, 2013, pp. 331–341.
[3] X. Liu, P. Juluri, and D. Medhi, "An Experimental Study on Dynamic Network Reconfiguration in a Virtualized Network Environment using Autonomic Management," in Integrated Network Management (IM), IFIP/IEEE International Symposium on, 2013, pp. 616–622.
[4] "An Architectural Blueprint for Autonomic Computing," IBM, Tech. Rep., 2005.
[5] M. S. Kim, A. Tizghadam, A. Leon-Garcia, and J.-K. Hong, "Virtual Network Based Autonomic Network Resource Control and Management System," in GLOBECOM'05, IEEE, 2005.
[6] I. Houidi, W. Louati, and D. Zeghlache, "A Distributed and Autonomic Virtual Network Mapping Framework," in Autonomic and Autonomous Systems (ICAS), 4th International Conference on, 2008, pp. 241–247.
[7] C. C. Marquezan, L. Z. Granville, G. Nunzi, and M. Brunner, "Distributed Autonomic Resource Management for Network Virtualization," in Network Operations and Management Symposium (NOMS), IEEE, 2010, pp. 463–470.
[8] P. Fonseca, R. Bennesby, E. Mota, and A. Passito, "A Replication Component for Resilient OpenFlow-based Networking," in Network Operations and Management Symposium (NOMS), 2012, pp. 933–939.
[9] M. F. Bari, A. R. Roy, S. R. Chowdhury, Q. Zhang, M. F. Zhani, R. Ahmed, and R. Boutaba, "Dynamic Controller Provisioning in Software Defined Networks," in CNSM, 2013, pp. 18–25.
[10] D. Hock, M. Hartmann, S. Gebert, M. Jarschel, T. Zinner, and P. Tran-Gia, "Pareto-optimal Resilient Controller Placement in SDN-based Core Networks," in Teletraffic Congress (ITC), 25th International, 2013.
[11] X. Liu and D. Medhi, "Optimal Standby Virtual Routers Selection for Node Failures in a Virtual Network Environment," in Network and Service Management (CNSM), 10th International Conference on, 2014, pp. 28–36.
[12] G. Poulios, K. Tsagkaris, P. Demestichas, A. Tall, Z. Altman, and C. Destre, "Autonomics and SDN for Self-organizing Networks," in Wireless Communications Systems (ISWCS), 11th International Symposium on, 2014, pp. 830–835.
[13] M. Berman, J. S. Chase, L. Landweber, A. Nakao, M. Ott, D. Raychaudhuri, R. Ricci, and I. Seskar, "GENI: A Federated Testbed for Innovative Network Experiments," Computer Networks, vol. 61, pp. 5–23, 2014.
[14] N. Bastin, A. Bavier, J. Blaine, J. Chen, N. Krishnan, J. Mambretti, R. McGeer, R. Ricci, and N. Watts, "The InstaGENI Initiative: An Architecture for Distributed Systems and Advanced Programmable Networks," Computer Networks, vol. 61, pp. 24–38, 2014.
[15] "geni-lib." [Online]. Available: https://bitbucket.org/barnstorm/geni-lib

