Running Lifecycle Experiments over SDN-enabled OF@TEIN Testbed

Aris Cahyadi Risdianto, Taeheum Na, and JongWon Kim
School of Information and Communications, Gwangju Institute of Science and Technology, Gwangju 500-712, Republic of Korea
{aris, thna, jongwon}@nm.gist.ac.kr

Abstract: Managing the lifecycle of service-realization experiments in Future Internet testbed environments is challenging because the scope of user-driven experiments is unbounded. The experiment lifecycle should be carefully designed and managed between users and operators so that tasks and responsibilities are well defined at every experiment stage. The states of the involved resources should also be managed appropriately to streamline the lifecycle. In this paper, considering these issues, we discuss the realization of lifecycle experiments over the SDN (Software-Defined Networking)-enabled OF@TEIN (OpenFlow @ Trans-Eurasia Information Network) testbed. To partially automate lifecycle experiments, we are working on automated resource provisioning and experiment execution. To illustrate current progress, we demonstrate prototype lifecycle experiments that link bandwidth measurements with the automated re-installation and reconfiguration of broken resource node(s).

Keywords: Lifecycle Experiment, Future Internet, Software-Defined Networking, OpenFlow-enabled testbed, automated installation and configuration.

I. INTRODUCTION

Unlike traditional network-focused testbeds, Future Internet testbeds should provide an experimental networking facility without limitations on the deployed network topology, the number of supported services, or the types of applications. Moreover, they should accommodate multiple simultaneous users by provisioning different types and amounts of resources. Thanks to its open and programmable nature, the emerging SDN (Software-Defined Networking) paradigm is establishing a new foundation for Future Internet technology. For flexible Future Internet experiments, constructing and operating SDN-enabled testbeds over international R&E (research and education) networks is highly important. For example, GENI [1] has a specific project that deploys GENI racks to promote the easy construction and wide expansion of SDN-enabled testbeds.

Aligned with Future Internet testbed projects like GENI, we launched the OF@TEIN project [2] in 2012 to build an OpenFlow-based testbed infrastructure over TEIN (Trans-Eurasia Information Network). TEIN now provides a high-speed R&E network that connects 20 countries in Asia and 34 countries in Europe. By leveraging the TEIN4 international network connection, as shown in Fig. 1, OF@TEIN now connects 12 sites spread over 7 countries (Korea, Indonesia, Malaysia, Thailand, Vietnam, Philippines, and Pakistan). To provide computing and networking resources, several types of SmartX Racks are deployed for OF@TEIN. Among them, SmartX Rack Type B consists of four devices: a Management & Worker node, a Capsulator node, an OpenFlow switch, and a remote power device (see footnote 1). Network connections among SmartX Racks are divided into two separate planes: one for management and control, and the other for the datapath (forwarding). The former uses common L3 IP connections, while the latter employs special L2-GRE tunnels created by the Capsulator nodes [2].

With the diverse resources of OF@TEIN, it is important to let users conveniently go through the stages of the experiment lifecycle: designing, provisioning, executing, monitoring, and finishing experiments. Motivated by the resource-aware service composition of the previous FIRST@PC effort [3] and the recent GENI Experiment Lifecycle [4], we define the OF@TEIN experiment lifecycle to describe how researchers carry out experiments over the shared infrastructure. This lifecycle interpretation helps the OF@TEIN operators provide the resources needed for experiments, and it helps users understand the different experiment stages and the matching support tools [5]. Note that the complete picture of the experiment lifecycle can differ depending on the functions required for the targeted service realization.

Fig. 1. OF@TEIN SDN-enabled testbed infrastructure.

Footnote 1: In early 2014, these racks were upgraded to SmartX Rack Type B+, which merges multiple devices into one box with integrated remote management.

For OF@TEIN lifecycle experiments, the OF@TEIN operators must be able to monitor, check, and recover the available resources before or during user experiments. For this, operators can use several SDN-tied tools to manage computing/networking/FlowSpace resources for users, while users can work with other tools and their own scripts. There are web-based interactive GUI tools for resource provisioning, and CLI/script-based tools for automatic resource provisioning and experiment execution [2].

This paper extends our previous work [3], in which workflow-based experiment control was conceived to design, implement, and execute the resource-aware control of media-centric service composition. We are now implementing automated resource provisioning (i.e., installation and configuration of resources) and automating experiment execution via easy-to-use scripting, with the goal of providing agile and efficient realization of diverse lifecycle experiments over OF@TEIN.

The remainder of this paper is organized as follows. Section II describes the design for OF@TEIN lifecycle experiments, including the motivation, the basic design, and the privilege distinction between operators and users. Section III discusses the preliminary prototype implementation of lifecycle experiments with use-case descriptions. Section IV illustrates how the current prototype implementation is verified for specific experiment scenarios. Finally, Section V concludes the paper.

II. DESIGN FOR LIFECYCLE EXPERIMENTS

A. Basic Design

The proposed design of OF@TEIN lifecycle experiments is shown in Fig. 2. It contains several stages, each divided into inter-related tasks that follow a specified workflow. First, a design stage describes the desired experiment and identifies the specific workloads and required resources. Next, in a provisioning stage, operators prepare the required computing/networking/FlowSpace resources before allocating them to users. In the execution stage, users have full control of running the experiments by using easy and simple tools/scripts. The final finishing stage releases the allocated resources after the house-keeping jobs for experiment archival and repeatability.
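The stage ordering described above can be sketched as a small enumeration. This is only an illustrative sketch: the names `Stage`, `next_stage`, and `walk_lifecycle` are hypothetical and are not part of the OF@TEIN tooling.

```python
from enum import Enum
from typing import List, Optional


class Stage(Enum):
    """Lifecycle stages in the order given in the text (hypothetical names)."""
    DESIGN = "describe the experiment; identify workloads and required resources"
    PROVISIONING = "operators prepare computing/networking/FlowSpace resources"
    EXECUTION = "users run the experiment with tools/scripts"
    FINISHING = "archive the experiment, then release allocated resources"


def next_stage(current: Stage) -> Optional[Stage]:
    """Return the stage that follows `current`, or None after FINISHING."""
    order = list(Stage)  # Enum members iterate in definition order
    idx = order.index(current)
    return order[idx + 1] if idx + 1 < len(order) else None


def walk_lifecycle() -> List[str]:
    """Return the stage names in lifecycle order, starting from DESIGN."""
    names = []
    stage: Optional[Stage] = Stage.DESIGN
    while stage is not None:
        names.append(stage.name)
        stage = next_stage(stage)
    return names
```

Keeping the order in one place makes it easy to attach per-stage tasks later without scattering the workflow across scripts.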

Fig. 2. Preliminary design for OF@TEIN lifecycle experiments.

B. Operator and User Perspectives

In testbed environments, users ideally want to obtain resources and run experiments independently, without being restricted by operators. However, most experiments still require operators to step in and fix difficulties that are beyond user privileges. Thus, to enable smooth and agile realization of lifecycle experiments, privileges (and responsibilities) should be clearly defined between operators and users during resource provisioning and experiment execution. In other words, users and operators should build so-called “DevOps” relationships.

Fig. 3 shows an example topology view from the different perspectives of operators and users. The operators are responsible for connecting multiple SmartX Rack sites via GRE-tunnel overlay networking. Users have full configuration access to most of the OpenFlow switches through their own controllers, under the operators' FlowSpace supervision. For computing resources, the users create and use VMs, while the operators control the hypervisors and fix software failures with the automatic installation and configuration capability.

C. Scope and Focus of Paper

Many types of experiment support tools, both GUI-based and non-GUI, are deployed in Future Internet testbed projects. Examples include Flack, OMF, and OMNI in GENI [1], OCF in Ofelia [6], and others. GUI or web-based tools are easy to use at first, at the cost of longer configuration time. CLI or script-based tools are harder to learn but bring the benefit of automation. Thus, in this paper, we aim to support agile proof-of-concept experiments over OF@TEIN by using script-based automated configuration and execution. More specifically, a Chef-based automation framework [7] is used for deploying and managing SmartX Rack resources through “cookbooks” and “recipes” written in Ruby [8]. Note also that, as of now, this paper covers only part of the issues in the provisioning and execution stages of OF@TEIN lifecycle experiments.

Fig. 3. OF@TEIN topology view with operator and users perspectives.

III. PROTOTYPE IMPLEMENTATIONS FOR LIFECYCLE EXPERIMENTS

A. Automation Tools

Fig. 4 shows how several scripts are combined with Chef-based tools to give OF@TEIN lifecycle experiments efficiency and agility. As mentioned before, Chef-based DevOps automation is adopted to transform resource configurations into common Ruby-based code. Chef can maintain multiple versions of a configuration and automate all procedures. Operator scripts are designed to check whether each computing/networking/FlowSpace resource is working or faulty. All faulty resources are marked as “failed” and, at the end, summarized in a status report. Then, depending on the type of failure, recovery tasks are carried out by either script-based or Chef-based automation. Note that, to avoid configuration errors, the recovery procedure currently requires partial interaction with the operators, who confirm it before it runs.

Chef also manages SmartX Rack resources in the same way it manages common code. Computing resources are described as a set of boxes that are built and managed by selected code from cookbooks and recipes. The Box-Install cookbook covers the installation of resources such as hypervisors and virtual switches with specific recipes (e.g., xen-install and ovs-install). The Box-Config cookbook does similar jobs for the configuration of boxes (e.g., grub, XEN, OVS, and interfaces).
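The operator-side checking described in this subsection (probe each resource, mark faulty ones as “failed”, summarize in a status report, then pick a recovery method that the operator confirms) might be sketched as follows. This is a minimal sketch, not the authors' scripts: all names (`CheckResult`, `run_checks`, `status_report`, `plan_recovery`) are hypothetical, the probes are stand-in callables, and the rule that computing faults go to Chef while other faults go to scripts is an assumption made only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class CheckResult:
    resource: str  # hypothetical resource label, e.g. "box-1/xen"
    kind: str      # "computing", "networking", or "flowspace"
    status: str    # "ok" or "failed"


Probe = Callable[[], bool]


def run_checks(checks: List[Tuple[str, str, Probe]]) -> List[CheckResult]:
    """Run every probe; a probe that returns False or raises marks 'failed'."""
    results = []
    for resource, kind, probe in checks:
        try:
            healthy = probe()
        except Exception:
            healthy = False  # a crashing probe counts as a faulty resource
        results.append(CheckResult(resource, kind, "ok" if healthy else "failed"))
    return results


def status_report(results: List[CheckResult]) -> str:
    """Summarize all results, ending with a count of failed resources."""
    failed = [r for r in results if r.status == "failed"]
    lines = [f"{r.resource} [{r.kind}]: {r.status}" for r in results]
    lines.append(f"summary: {len(failed)} failed of {len(results)}")
    return "\n".join(lines)


def plan_recovery(
    results: List[CheckResult],
    confirm: Callable[[CheckResult, str], bool],
) -> List[Tuple[str, str]]:
    """Choose a recovery method per failed resource, subject to confirmation.

    Assumption (not from the paper): computing faults use Chef-based
    recovery, all other faults use script-based recovery.
    """
    plan = []
    for r in results:
        if r.status != "failed":
            continue
        method = "chef" if r.kind == "computing" else "script"
        if confirm(r, method):  # the operator confirms before anything runs
            plan.append((r.resource, method))
    return plan
```

Passing the confirmation step in as a callable keeps the safety check explicit, so it can be an interactive prompt during operation and a stub during testing.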

B. Resource Provisioning States

During resource provisioning and recovery, the operators keep track of resource status according to the following four states, as depicted in Fig. 5.

1) Box Installed and Configured State (Critical): In this state, the OS (operating system) and additional packages are installed on the box, and the network is configured with remote access, which is verified by connectivity testing. Additionally, the hypervisor and virtual switch daemons are installed on the box. The hypervisor installation is verified by checking the hypervisor daemon (domain 0) and, if needed, by a VM creation test. Finally, the virtual switch daemon is verified by checking the daemon status and the kernel module loading. If all verification tests pass, we proceed to the next state. If not, we go back to the previous Initial State.

2) Topology Configured State: In this state, FlowVisor [9] is running, slices can be created, and datapaths are registered. Also, flow entries are inserted based on the daemon status and the topology database of FlowVisor. GRE tunnels are created and verified by checking the tunnel status maintained by the operator controller. The collected check results determine the overall topology status, which is summarized in the checking report.

3) Function Installed and Configured State: Each box can host various VMs (virtual machines) with different roles based on the functional requirements. For example, a small-flavor VM can be instantiated by creating a specific image and configuration template. The instantiated VM can be further configured with dedicated scripts that install the required application packages.

4) Resources (SmartX) Checking State (Critical): The status of all experiment resources is repeatedly checked. For example, any interruption indicated in user reports or in monitoring/visibility tools can start the checking procedure. According to the checkup results, the operator begins automatic recovery by forcing a change to another selected state: either the Box Installed and Configured State or the Initial State.
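The state transitions described above can be sketched as a small state machine. The sketch is hedged: the names `State`, `advance`, and `recover` are hypothetical, and the behavior on a failed verification in the Topology and Function states (stay in place) is an assumption, since the text only specifies the fallback to the Initial State for box-level verification.

```python
from enum import Enum, auto


class State(Enum):
    INITIAL = auto()
    BOX_INSTALLED_CONFIGURED = auto()
    TOPOLOGY_CONFIGURED = auto()
    FUNCTION_INSTALLED_CONFIGURED = auto()
    RESOURCES_CHECKING = auto()


# Forward path through the provisioning states, as listed in the text.
FORWARD = {
    State.INITIAL: State.BOX_INSTALLED_CONFIGURED,
    State.BOX_INSTALLED_CONFIGURED: State.TOPOLOGY_CONFIGURED,
    State.TOPOLOGY_CONFIGURED: State.FUNCTION_INSTALLED_CONFIGURED,
    State.FUNCTION_INSTALLED_CONFIGURED: State.RESOURCES_CHECKING,
}

# States the operator may select when recovering from the checking state.
RECOVERY_TARGETS = {State.BOX_INSTALLED_CONFIGURED, State.INITIAL}


def advance(state: State, verified: bool) -> State:
    """Return the next state given the outcome of the verification tests."""
    if state is State.RESOURCES_CHECKING:
        return state  # checking repeats; leaving it goes through recover()
    if verified:
        return FORWARD[state]
    if state is State.BOX_INSTALLED_CONFIGURED:
        return State.INITIAL  # failed box verification falls back (per text)
    return state  # assumption: other states are retried in place


def recover(state: State, target: State) -> State:
    """Force the state change chosen by the operator during recovery."""
    if state is not State.RESOURCES_CHECKING:
        raise ValueError("recovery is only triggered from the checking state")
    if target not in RECOVERY_TARGETS:
        raise ValueError(f"{target.name} is not a valid recovery target")
    return target
```

For example, a box whose hypervisor packages were removed would be sent back with `recover(State.RESOURCES_CHECKING, State.INITIAL)` and then walked forward again with `advance`.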

Fig. 4. Tools for automated provisioning and experiment execution.

C. Automatic Control of Experiment Execution

The experiment execution is divided into the following smaller tasks to ease automated control.

1) Check Resources: Before running the experiment, all the required computing/networking/FlowSpace resources should be obtained and ready. Note, however, that users can only view and check their own allocated resources.

2) Execute Experiment: The main task is running the prepared experiment scripts while checking whether the experiment is running successfully and producing output data. To facilitate this, the users may deploy the experimenter UI [10], which visualizes both experiment execution and resource status.

3) Verify Experiment: This task checks for any failure during experiment execution. If the experiment display output and log file indicate successful resource checking and experiment execution, this task is done and the experiment continues. If any failure is reported, the users examine the experiment and react by either repeating it or reporting to the operators.

4) Stop Experiment: A running experiment stops after successful execution or on a manual user interrupt. Users may resume experiments, or force a clean-up to restart their experiments from the beginning, i.e., by going back to the Check Resources task.

Fig. 5. Provisioning for OF@TEIN experiment lifecycle.

IV. PRELIMINARY VERIFICATION OF LIFECYCLE EXPERIMENTS

A. Automated Bandwidth Measurement Experiment

To illustrate our efforts in prototyping lifecycle experiments, we first explain the “Automated bandwidth measurement experiment”, which uses the iperf tool to measure inter-site bandwidth. Note that FlowSpace resources (IP subnet), networking resources (OpenFlow switches and tunnels), and computing resources (VMs) are all involved. The whole experiment is controlled by a single script, which starts by checking the FlowSpace allocations for the user slice and ends by executing multiple VM-to-VM iperf tests. The experiment is executed only if the VMs are reachable and running the iperf server/client tool for TCP/UDP bandwidth measurements, as shown in Fig. 6.

The automatic resource checking and experiment execution can be verified from the script outputs, which contain the detailed resource status, and from the visual output shown in the SDN experimenter UI (Fig. 7). This Java UI, developed in [10] and based on the OpenFlow GUI extension project [11], shows the details of allocated resources, traffic flows, and experiment results. This demonstration also verifies stopping experiments, which releases resources automatically by using a script that destroys the VMs, removes the VM configuration data, and removes the FlowSpace allocation. Note that the OpenFlow switch configuration and tunnels are left untouched, as they are part of the operators' responsibilities.
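The control flow of such a single-script experiment might look like the sketch below. It is an illustration under stated assumptions, not the authors' script: `flowspace_ok`, `reachable`, and `run` are hypothetical callables supplied by the caller (for example, wrappers around a FlowVisor query, a ping, and a remote shell), and an iperf server is assumed to be already running on every VM. Only the standard iperf client options `-c`, `-t`, and `-u` are used.

```python
from itertools import permutations
from typing import Callable, Dict, List, Sequence, Tuple


def iperf_client_cmd(server_ip: str, udp: bool = False, seconds: int = 10) -> List[str]:
    """Build an iperf client command line (TCP by default, UDP with -u)."""
    cmd = ["iperf", "-c", server_ip, "-t", str(seconds)]
    if udp:
        cmd.append("-u")
    return cmd


def run_bandwidth_experiment(
    slice_name: str,
    vms: Sequence[str],
    flowspace_ok: Callable[[str], bool],
    reachable: Callable[[str], bool],
    run: Callable[[str, List[str]], str],
    udp: bool = False,
) -> Dict[Tuple[str, str], str]:
    """Check resources first, then run iperf between every ordered VM pair.

    `run(src, cmd)` is a hypothetical hook that executes `cmd` on VM `src`
    and returns its text output.
    """
    # Step 1: the experiment starts by checking the slice's FlowSpace.
    if not flowspace_ok(slice_name):
        raise RuntimeError(f"FlowSpace allocation missing for slice {slice_name}")

    # Step 2: the tests run only if every VM is reachable.
    unreachable = [vm for vm in vms if not reachable(vm)]
    if unreachable:
        raise RuntimeError(f"unreachable VMs: {unreachable}")

    # Step 3: VM-to-VM measurements in both directions.
    results = {}
    for src, dst in permutations(vms, 2):
        results[(src, dst)] = run(src, iperf_client_cmd(dst, udp=udp))
    return results
```

Injecting the three hooks keeps the resource checks and the measurement loop in one place while leaving the site-specific details (how a VM is reached, how FlowSpace is queried) outside the sketch.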

Fig. 7. SDN Experimenter UI.

B. Resource Recovery Experiment

The second demonstration shows how the operators can automatically recover faulty resources after being notified by user reports (or generated alarms). In this scenario, a fault in the computing resources of one SmartX box is simulated by uninstalling the xencommons service of the XEN hypervisor. When a user attempts to create new VMs, the attempt fails with the message ‘The access to XEN hypervisor (Domain-0) through XEN management console is failed’, and this failure is reported to the operator. The operator then executes the admin-check script to check all resources. During the checkup, the operator script discovers that one service of the XEN hypervisor on the faulty SmartX box is not running properly. It then changes the status of the faulty box to the Initial State, because some packages of the XEN hypervisor may have been completely removed. The state change then initiates the installation and configuration of the faulty box by using the Chef box-install and box-config cookbooks [7]. As a safety measure, the operator is asked to confirm the execution of the recovery procedure. Finally, we repeat the checking of all resources to verify that the problem is solved.

V. CONCLUSION

The proposed automation for OF@TEIN is verified to facilitate easy and fast experimentation by leveraging automated resource provisioning and experiment execution. By fully leveraging the rapid development of DevOps tools, we plan to continue and extend the support for lifecycle experiments by further refining the tasks of the experiment stages.

Fig. 6. Illustration of automatic bandwidth measurement experiment.

ACKNOWLEDGMENT

This work was partially supported by one of KOREN projects of National Information Society Agency (13-951-00001). Also, this work was supported in part by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIP) (No. 2012R1A2A2A01014687).

REFERENCES

[1] M. Berman et al., “GENI: A federated testbed for innovative network experiments,” Comput. Netw. (2014), http://dx.doi.org/10.1016/j.bjp.2013.12.037
[2] J. Kim et al., “OF@TEIN: An OpenFlow-enabled SDN testbed over International SmartX Rack Sites,” in Proc. APAN - NRW, Daejeon, Korea, August, 2013.
[3] S.W. Han et al., “An experimental service composition tool for media-centric networked applications,” Comput. Netw. (2014), http://dx.doi.org/10.1016/j.bjp.2013.12.025
[4] J. Ohren, “Experiment Lifecycle Tools,” in GENI Engineering Conference 15, Texas, October, 2012. http://groups.geni.net/geni/attachment/wiki/GEC15Agenda/ExperimentLifecycleTools/ELT-GEC15.pdf
[5] V. Thomas, “Lifecycle of a GENI Experiment,” April, 2009. http://groups.geni.net/geni/attachment/wiki/ExperimentLifecycleDocument/ExperimentLifeCycle-v01.2.pdf
[6] OFELIA, “OCF - OFELIA Control Framework,” February, 2014. http://www.fp7-ofelia.eu/ocf-ofelia-control-framework/
[7] Opscode, Chef Documentation. http://docs.opscode.com/
[8] Ruby Programming Language. https://www.ruby-lang.org/en/
[9] R. Sherwood et al., “Flowvisor: A network virtualization layer,” Technical Report Openflow-tr-2009-1, Stanford University, July 2009.
[10] N. Kim and J. Kim, “Building netopen networking services over openflow-based programmable networks,” in Proc. of ICOIN'11, Jan. 2011.
[11] OpenFlow Consortium, “OpenFlow GUI Extension - Display,” February, 2014. http://archive.openflow.org/wk/index.php/OpenFlow_GUI_Extension_-_Display
