Cloud Network Infrastructure as a Service: An Exercise in Multi-Domain Orchestration

Jeff Chase and Aydan Yumerefendi (Duke University)
Ilia Baldine, Yufeng Xin, Anirban Mandal, and Chris Heerman (Renaissance Computing Institute, RENCI)
David Irwin (University of Massachusetts Amherst)

Abstract

Cloud computing is now a successful and well-understood example of the Infrastructure as a Service (IaaS) model. This paper explores how to extend IaaS clouds to other kinds of substrate resources beyond servers and storage, and to link these elements together in a coordinated, multi-provider "web" of cloud infrastructure services. The vision is to enable cloud applications to request virtual servers at multiple points in the network, together with bandwidth-provisioned network pipes and other network resources to interconnect them. We outline new software to orchestrate end-to-end connections over multi-layer networks, coordinated with Eucalyptus clouds and other resources at the edge. We present results from a demonstration experiment with the prototype, and discuss various architectural challenges arising in multi-domain cloud computing with dynamic circuit networks.
1 Introduction
EC2 and other server clouds follow an “Infrastructure as a Service” (IaaS) model, in which the cloud customer rents virtual servers and selects or controls the software for each virtual server instance. Cloud computing is now a successful and well-understood example of IaaS. For example, clouds are gaining acceptance as a simple and powerful vehicle to scale up computing power for science [9, 19]. This paper explores how to extend the IaaS vision to enable coordinated access to diverse resources from multiple autonomous resource providers. For example, cloud users may wish to spread their usage across multiple cloud providers to improve scaling or control dependency risk, or for geographic dispersion. We also advocate extending the cloud abstraction to other kinds of substrate resources beyond servers and storage, including cloud networks.
Several efforts are building support for dynamic circuits on national-footprint multi-layer networks (National Lambda Rail, Internet2, ESNet), including inter-domain circuits that span more than one of these networks. Technologies to virtualize networks continue to advance beyond the VPLS/VPN tunneling available to the early cloud network efforts [11, 18, 23, 25, 21, 1, 12]. Advanced multi-layer networks offer direct control of the network substrate to instantiate isolated virtual pipes, which may appear as VLANs, MPLS tunnels, or VPLS services at the network edge.

As a first step toward linking these resources to clouds, we have developed a software prototype that instantiates dynamic circuits in tandem with virtual machine instances to interconnect cloud applications across multiple cloud sites and domains. The prototype includes extensions to Eucalyptus, a commercially supported open-source cloud infrastructure service. It uses the ORCA (Open Resource Control Architecture) orchestration and network control software, which derives from almost a decade of research in networked clouds [7, 13, 17, 22, 6, 5, 4, 14]. It adds plug-in control modules for ORCA to interface to various substrate providers, including Eucalyptus cloud sites, NLR's Sherpa FrameNet service, and the Breakable Experimental Network (BEN), a metro-scale optical network testbed operated by RENCI in North Carolina.

Our vision is to enable cloud applications to request virtual servers at multiple points in the network, together with bandwidth-provisioned network pipes and other network resources to interconnect them. The GENI initiative (Global Environment for Network Innovation, funded by the US National Science Foundation) is pursuing a similar vision for a specific use case: linking testbeds for research in network science and engineering. The principal goal of GENI is to enable researchers to experiment with radically different forms of networking within private, isolated "slices" of shared testbed resources offered by a federation of providers.
A GENI slice gives its owner control over a set of virtualized substrate resources allocated from the providers, which may include programmable network elements, mobile/wireless platforms, and other infrastructure components as well as virtual servers, storage, and so on. The slices are built-to-order for the needs of each experiment through a GENI control framework. ORCA is a candidate control framework for GENI. The GENI vision and development effort is an important direction for the future of cloud computing. Our project is one of several GENI Spiral 2 projects linking server clouds into GENI. More importantly, GENI addresses key challenges for federating clouds, deploying cloud applications across multiple infrastructure providers, and extending the infrastructure-as-a-service vision to orchestrated control of pervasive virtualization: flexible, automated configuration and control of distributed cyberinfrastructure resources. This paper outlines some design challenges and choices of the software, and reports on a demonstration experiment.

Following GENI, we refer to the virtual resources assigned to a distributed cloud application as its slice of the shared substrate [13, 8]. The resources in the slice may be obtained from multiple providers, including but not limited to cloud providers and network transit providers. The control and orchestration framework must link or "stitch" these elements into a slice with suitable end-to-end connectivity, and coordinate allocation of resources from the shared substrate across competing workloads (e.g., co-scheduling). In the experiment, ORCA services coordinate co-scheduling, dynamic instantiation, and interconnection of virtual server instances and dynamic network circuits to create a seamless end-to-end slice with a VLAN spanning multiple clouds.

Our approach uses a semantic web ontology to describe the cloud substrates and the elements of each slice. These declarative representations expose sufficient information to enable a general-purpose, substrate-independent core to automate slice stitching across multiple autonomous providers. An orchestration server (an ORCA slice manager) consumes these representations and drives stitching by sequencing the flow of secure tokens among ORCA servers operated by the providers. When the slice is ready, the orchestration server launches the application into it. The orchestration server requires no special privilege: it may be controlled by the customer or operated as a service by a third party.
2 Overview and Design
Infrastructure-as-a-service is based on virtualization technologies that offer common advantages across different kinds of substrate. They give providers rich mechanisms to control who has access to their substrate resources, and when, and on what terms, often including assurance of the predictable service quality levels needed for certain mission-critical customers. The customer has full control over how it uses those virtual resources once the provider assigns them to the slice. Virtualization offers some degree of containment and isolation (safety and privacy) for slices hosted on a shared substrate.

For example, VM instances can be sized for an application and loaded with an operating system and application stack selected by the user. There is a rich set of tools to construct these packaged software stacks as virtual appliance images, and a wide range of prepackaged images are available from third parties. Once an image exists, users can deploy it at wide scale on cloud resources from multiple providers, without requiring those providers to support the specific software stack. Users may change their software stacks without involving the cloud providers or affecting other users.

Similarly, built-to-order virtual networks are suitable for provisioning flexible packet-layer overlays using IP or other protocols selected by the owner. IP overlays may be configured for secure isolation, or linked with routed connections to the public Internet through gateways and flow switches. VM instances in the cloud can plug into multi-layer networks at any layer: different layers offer different services, quality-of-service profiles, and isolation properties. Amazon's recently introduced Virtual Private Cloud service is an example of the power of linking configurable networks to the cloud.

These common benefits of virtualization, and the opportunities to link virtual resources together, motivate us to take a comprehensive view of IaaS spanning multiple cloud providers and substrate types. This view raises many interesting new architectural issues. How should future multi-domain cloud applications be described, packaged, and deployed? User-driven tools may select the target for each component at deployment time; this late binding to the target cloud may require the tool to modify and/or register images before launch. Orchestration software is needed to federate clouds across domains and to coordinate image registration, resource allocation, stitching, launch, monitoring, and adaptation for multi-domain cloud applications. Doing this right requires solutions for identity, authorization, monitoring, and resource policy specification and enforcement. ORCA uses open interfaces and an extensible plug-in architecture to enable these solutions to evolve over time and to leverage and interoperate with software outside of ORCA.
2.1 Linking Clouds through Multi-Layer Networks
A principal goal is to link dynamic circuit networks to cloud sites.
Figure 1: Elements of the slice created in the demonstration experiment, and their linkages.

Amazon's Elastic Compute Cloud (EC2) is a popular commercial offering for a "public" IaaS cloud. The EC2 API allows customers to request, control, and release virtual server instances on demand from Amazon-operated servers. Pay-as-you-go pricing helps EC2 users adapt nimbly to changing demands with minimal capital cost [3]. Private clouds offer the same opportunity for flexible, controlled sharing and agile management of resources. Eucalyptus is a leading open-source software technology for private clouds: it offers an EC2-compatible API to provision and program groups of VMs across a range of underlying hypervisor systems.

Duke and RENCI maintain private Eucalyptus clouds linked through the BEN network. The BEN PoPs use Cisco and Juniper routers above WDM/TDM bandwidth virtualization technology from Infinera; BEN offers 10 Gbps dynamic circuits as well as dedicated fiber access through dynamically configurable optical switches (Polatis). BEN has 10 Gbps links to NLR FrameNet and is adding connectivity to other national-footprint networks through a connection to the StarLight network hub. BEN is operated by RENCI and is externally controllable through ORCA.

Each Eucalyptus site owner runs a local ORCA server endowed with rights to invoke Eucalyptus APIs on behalf of remote users whose identities might be unknown to the provider. The ORCA service incorporates policies to authorize these users and arbitrate among contending requests. The cloud site owner may select the policies that govern access to each site. These policies may be quite different from the pricing policies used in a public cloud such as EC2. The ORCA per-site server is mostly generic, but it runs plugin scripts to invoke the Eucalyptus APIs.
Co-scheduling is performed by ORCA brokering intermediaries, which have limited allocation power delegated to them by the sites using mechanisms described in previous work [17, 13].

Eucalyptus supports an EC2 abstraction called a security group that offers containment and connectivity filtering for a user's VM instances. The group has an isolated VLAN with a per-site IP/NAT gateway with custom filtering rules for external connectivity. We have prototyped Eucalyptus 1.5.2 extensions that enable the transit provider to link these group VLANs to external network circuits provisioned from BEN. This linkage can occur at multiple layers. Our prototype makes the linkages at layer 2: the BEN PoP switch (Cisco 6509) maps the group's VLAN tag onto the pipe, effectively joining the VM instances in the group to a private VLAN that may span multiple sites. A private cross-site VLAN has various benefits: for example, it enables migration of VM instances across sites, and allows the user to modify the network protocol stack.

Various policies and conventions are necessary to coordinate naming at each layer. For example, for layer 2 connectivity, Eucalyptus already assures uniqueness of MAC addresses assigned to VM instances, so ARP and spanning tree functionality work properly. However, ORCA must coordinate IP addresses on the slice VLAN to avoid collisions. The VM instances are created with an identity and keys held by the local ORCA server so that it may connect to the instances and configure them. When it is done, ORCA installs the user's public key in the instance and generates a notification to transfer control to the user, or to an orchestration server running on the user's behalf.
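To make the plugin idea concrete, the sketch below shows how a per-site handler might use an EC2-compatible client to create a security group and launch a VM group for one lease, in the spirit of the Eucalyptus integration described above. This is a hypothetical illustration rather than the actual ORCA plugin: the endpoint, credentials, image ID, and the lease fields are assumptions, and it uses the classic boto EC2 bindings, which work against EC2-compatible front ends such as Eucalyptus.

```python
# Hypothetical sketch of a per-site "join" handler in the spirit of the ORCA
# Eucalyptus plugin described above; endpoint, lease fields, and image ID are
# assumptions for illustration only.
from boto.ec2.connection import EC2Connection
from boto.ec2.regioninfo import RegionInfo

def join_handler(lease):
    """Instantiate the VM group for one lease at a Eucalyptus site."""
    # Connect to the site's EC2-compatible endpoint (Eucalyptus front end).
    region = RegionInfo(name="eucalyptus", endpoint="euca.example.org")
    conn = EC2Connection(aws_access_key_id=lease.access_key,
                         aws_secret_access_key=lease.secret_key,
                         is_secure=False, port=8773,
                         path="/services/Eucalyptus", region=region)

    # One security group per slice: Eucalyptus gives the group an isolated VLAN.
    group_name = "slice-%s" % lease.slice_id
    conn.create_security_group(group_name, "isolated VLAN for one slice")

    # Launch the requested instances with a keypair held by the site's ORCA
    # server, so it can log in, assign non-colliding IP addresses on the slice
    # VLAN, and later install the user's public key before handing off control.
    reservation = conn.run_instances(image_id=lease.image_id,
                                     min_count=lease.count,
                                     max_count=lease.count,
                                     key_name="orca-site-key",
                                     security_groups=[group_name],
                                     instance_type="m1.small")

    return group_name, [i.id for i in reservation.instances]
```

A real plugin would also record the security group's VLAN tag, since that is what the transit-side logic maps onto the BEN circuit at the site's PoP switch.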
Figure 2: Instantiation schedule and completion times for elements of the demo slice. This figure is generated from timestamped lease event traces collected from the ORCA server logs at each of the providers.
2.2 Network Control
Multiple high-speed national fabrics (NLR, Internet2, ESNet, and others) offer resource reservation mechanisms with different levels of abstraction. Some now offer automated control planes (NLR Sherpa [24], Internet2 ION, ESNet OSCARS [15]) and inter-domain provisioning mechanisms (Internet2 DCN, GLIF Fenius [10]) with external APIs. These mechanisms make it possible to provide varying levels of quality of service to meet a range of needs. The most common abstraction offered by the national fabrics today is a VLAN: a tagged layer 2 circuit, with optional bandwidth guarantees, that can be carried from one interface of the fabric to another.

Network embedding is a multi-dimensional optimization problem whose inputs are the current state of the substrate, the availability and compatibility of different interconnect technologies at the participating sites, and the requested topology of the slice and its quality-of-service profile. ORCA provides a pluggable interface for such policies. Our prototype policy grants any request for which it can identify a feasible embedding, as described below.

A local ORCA domain server runs for each network transit provider. As with the ORCA server at each Eucalyptus site, these servers are generic except for plugin scripts matched to the specific domain. For example, an ORCA server directly controls the BEN network, and runs plugins that emit configuration command sets to software drivers we developed for the native TL-1 interfaces of the fiber switches (Polatis) and WDM DTNs (Infinera), and for the CLI interface of the Ethernet switches (Cisco). Another ORCA server runs different plugins that invoke the Sherpa API [24] to configure layer-2 FrameNet paths through the National Lambda Rail (NLR).

In the future, the degree of network automation will increase. Networks will expose their capabilities and will allow attachment of cloud edge resources at different layers, including transport technologies such as OTN or SONET. These layers will carry the encapsulated traffic of cluster interconnects between different sites; today, common cluster interconnects are IP over 10G Ethernet or InfiniBand. GLIF (the Global Lambda Integrated Facility) is automating GOLE (GLIF Open LightPath Exchange) operations to allow provisioning of global WDM lightpaths that are largely agnostic to the payloads they carry.
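As an illustration of the plugin/driver split described above, the following sketch compiles one leased VLAN segment into an ordered command set and hands it to a driver object. Everything here is a placeholder: the command vocabulary, the VlanSegment fields, and the CliDriver class are invented for the example and do not reflect the actual TL-1/CLI drivers or the Sherpa API.

```python
# Illustrative sketch of a transit-domain plugin that turns one leased VLAN
# segment into an ordered set of device commands; the command strings and the
# driver interface are placeholders, not BEN's actual drivers.
from dataclasses import dataclass
from typing import List

@dataclass
class VlanSegment:
    tag: int                 # VLAN tag chosen (or translated) for this domain
    ports: List[str]         # switch ports the segment must traverse
    bandwidth_mbps: int      # requested bandwidth, if the device can police it

def build_command_set(segment: VlanSegment) -> List[str]:
    """Compile a VLAN segment into commands for an Ethernet switch driver."""
    cmds = [f"create-vlan {segment.tag}"]
    for port in segment.ports:
        cmds.append(f"add-trunk-port {port} vlan {segment.tag}")
    if segment.bandwidth_mbps:
        cmds.append(f"police vlan {segment.tag} rate {segment.bandwidth_mbps}mbps")
    return cmds

class CliDriver:
    """Stand-in for the CLI/TL-1 drivers the text describes."""
    def apply(self, cmds: List[str]) -> None:
        for c in cmds:
            print("SEND:", c)  # a real driver would push each command to the device

if __name__ == "__main__":
    seg = VlanSegment(tag=1201, ports=["Te3/1", "Te3/2"], bandwidth_mbps=1000)
    CliDriver().apply(build_command_set(seg))
```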
2.3 NDL-OWL
One focus of the project is to advance standards and representations for describing network cloud substrates declaratively.
There is a need for a common declarative language that can represent the multi-level physical network substrate, complex requests for network slices, and the virtualized network resources (e.g., linked circuits and VLANs) assigned to a slice. Our approach builds on the Network Description Language (NDL [16]). NDL has been shown to be useful for describing heterogeneous optical network substrates and identifying candidate cross-layer paths through those networks. We extended NDL to use a more powerful ontology defined using OWL (Web Ontology Language). OWL is a core technology for the Semantic Web and a widely used W3C standard [2]. Semantic Web ontologies are especially suitable for modeling graph structures such as complex network clouds. The result is a compatible extension of NDL, which we refer to as NDL-OWL.

NDL-OWL represents various substrate-specific constraints for allocation, sharing, and stitching. These constraints are crucial for the resource control plug-in modules in ORCA, which are responsible for allocating and configuring substrate resources for each slice. The ultimate goal of this process is to create a representation language that is sufficiently powerful to enable generic resource control modules to reason about substrate resources and the ways that the system might share them, partition them, and combine them. Ideally, we could specify all substrate-specific details declaratively, so that we can incorporate many diverse substrates into a network cloud based on a general-purpose control framework and resource leasing core.

For example, our prototype identifies feasible network embeddings in the BEN network by graph queries on the NDL-OWL representation of the current state of the substrate. These queries use a standard semantic web query language (SPARQL) and query engine (Jena). The result graph is processed to generate a schedule of configuration actions that instantiate the embedding on the BEN network.
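The sketch below illustrates the query style: a small substrate graph is loaded and a SPARQL query enumerates candidate adjacencies that an embedding policy could build paths from. The tiny vocabulary (ex:Device, ex:hasInterface, ex:connectedTo, ex:availableVlans) is a simplified stand-in for NDL-OWL terms, and it uses Python's rdflib for brevity, whereas the prototype itself queries with Jena.

```python
# Minimal illustration of querying a graph model of a substrate with SPARQL,
# in the style of the NDL-OWL embedding queries described above. The vocabulary
# and node names are simplified stand-ins, not actual NDL-OWL terms.
from rdflib import Graph

TOPOLOGY = """
@prefix ex: <http://example.org/ndl#> .
ex:ben-duke   a ex:Device ; ex:hasInterface ex:duke-te3-1 .
ex:ben-renci  a ex:Device ; ex:hasInterface ex:renci-te1-1 .
ex:duke-te3-1 ex:connectedTo ex:renci-te1-1 ;
              ex:availableVlans "1200-1299" .
"""

QUERY = """
PREFIX ex: <http://example.org/ndl#>
SELECT ?src ?dst ?vlans WHERE {
  ?a a ex:Device ; ex:hasInterface ?src .
  ?b a ex:Device ; ex:hasInterface ?dst .
  ?src ex:connectedTo ?dst .
  OPTIONAL { ?src ex:availableVlans ?vlans }
}
"""

g = Graph()
g.parse(data=TOPOLOGY, format="turtle")
for src, dst, vlans in g.query(QUERY):
    # Each row is a candidate adjacency the embedding policy can build on.
    print(f"link {src} -> {dst}, vlan range {vlans}")
```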
2.4 Stitching
A major challenge is to stitch the different elements of a slice together to establish end-to-end connectivity within the slice. In networks, various labels are used to isolate and identify logical network channels (e.g., VLAN tags) or physical channels (e.g., frequencies). Other substrate elements use similar labels, such as logical unit numbers (LUNs) in storage systems. Stitching involves exchanging these labels across logically neighboring elements of the slice. We generalize stitching as a label production and consumption problem based on the relationships among neighboring elements. These relationships create a dependency DAG that defines a partial order for the operations that instantiate the slice elements and stitch them together.
For example, VLAN tags generally must be mapped or translated at stitching points to establish an end-to-end VLAN across multiple network domains. Various standards have been developed to facilitate Ethernet label switching, but there are many technical obstacles to adoption [20] and they are not yet widely deployed. Some major Ethernet service providers assign an arbitrary VLAN ID from a range (e.g., NLR Sherpa). Our prototype maps VLAN tags at the BEN edge, and in an ORCA-controlled Ethernet switch at the Starlight network hub in Chicago, which links to BEN through NLR VLANs provisioned with Sherpa.

In our approach to stitching, a broker intermediary returns NDL-OWL descriptions of the elements reserved to the slice, including the label produce/consume behavior of each provider domain. Each domain describes the following attributes: (1) its label type; (2) whether it is a label producer; and (3) whether it has label translation capability. An ORCA slice manager collects this information and generates a DAG encoding the flow of labels and the resulting instantiation order and stitching dependencies. The slice manager uses this DAG to sequence its interactions with the ORCA servers representing each of the provider domains. Each domain signs any labels it produces, so downstream providers can verify their authenticity using the common broker as a trust anchor. The ORCA slice manager runs on behalf of the user and is not trusted by either the broker or the providers.
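A minimal sketch of the label-flow idea follows: each domain declares whether it produces a label (such as a VLAN tag) or must consume one from a neighbor, and a topological sort of the resulting dependency DAG yields an instantiation and stitching order. The domain names and descriptor fields are illustrative, loosely modeled on the demo topology, and are not ORCA's actual NDL-OWL encoding.

```python
# Hedged sketch of the label-flow DAG: each domain declares whether it
# produces a label (e.g., a VLAN tag) or consumes one from a neighbor, and the
# slice manager orders instantiation accordingly. Requires Python 3.9+ for
# graphlib; field names are illustrative only.
from graphlib import TopologicalSorter

# Per-domain descriptors for a demo-like topology (assumed structure).
domains = {
    "nlr-sherpa":  {"produces": True,  "consumes_from": []},
    "vise-static": {"produces": True,  "consumes_from": []},
    "duke-euca":   {"produces": True,  "consumes_from": []},
    "starlight":   {"produces": False, "consumes_from": ["nlr-sherpa", "vise-static"]},
    "ben":         {"produces": False, "consumes_from": ["nlr-sherpa"]},
}

# Build the dependency DAG: a consumer can only be stitched after every
# domain it takes a label from has produced one.
deps = {name: set(info["consumes_from"]) for name, info in domains.items()}

order = list(TopologicalSorter(deps).static_order())
print("instantiation/stitching order:", order)
# Producers come first, then the domains that translate or consume their labels.
```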
3 Experiment
We used the prototype described here to demonstrate an end-to-end slice linking a Eucalyptus cloud site at Duke with the ViSE testbed at UMass Amherst through an end-to-end VLAN spanning BEN and a dynamic NLR/Sherpa circuit through the Starlight network hub in Chicago. The ViSE testbed is linked to Starlight through a static VLAN. This VLAN is stitched to the Sherpa path through an ORCA-controlled switch maintained at Starlight by the iGENI project. Figure 1 depicts the elements of the slice and their relationships.

The experiment instantiates a VM instance at the Eucalyptus cloud site and a Xen VM from the ViSE testbed, and links them through the dynamic stitched VLAN. It then launches an Apache Web server on its Eucalyptus node, which provides an interface to process and visualize radar data fed through the circuit from the ViSE node. We ran this demo live at the GEC7 conference in March 2010.

Figure 2 depicts the instantiation of the slice from a test run of the demo. The orchestration server initiates the "center" (NLR Sherpa) and "edges" (Eucalyptus and ViSE VMs) of the demo slice immediately.
When the NLR path is ready and its VLAN tag is known, it commands the Starlight switch to stitch the path through to ViSE, and initiates a BEN circuit to connect the other end through to the Eucalyptus VM instance group VLAN at the Duke site. The end-to-end slice is ready in four minutes.
4 Conclusion
The demo serves as a proof of concept to show how cloud applications can interconnect and link to other resources through dynamic circuits provisioned along with the VMs by a cloud orchestration framework. Certain security mechanisms were disabled for the practical demands of a live demo, important and difficult policy questions are left unaddressed, the network embedding instance is relatively trivial, and the Eucalyptus integration in the prototype is at best a proof of concept. However, the important orchestration code is automated as outlined in this paper, and is not bound to this specific scenario.

Acknowledgement. This work was supported by the National Science Foundation GENI Initiative, NSF awards CNS-0720829 and CNS-0910653, and an IBM Faculty Award.
References

[1] S. Adabala, V. Chadha, P. Chawla, R. Figueiredo, J. Fortes, I. Krsul, A. Matsunaga, M. Tsugawa, J. Zhang, M. Zhao, L. Zhu, and X. Zhu. From virtualized resources to virtual computing grids: the In-VIGO system. Future Generation Computer Systems, 21(6):896–909, 2005.
[2] G. Antoniou and F. van Harmelen. A Semantic Web Primer. MIT Press, 2008.
[3] M. Armbrust, A. Fox, R. Griffith, A. D. Joseph, R. H. Katz, A. Konwinski, G. Lee, D. A. Patterson, A. Rabkin, I. Stoica, and M. Zaharia. Above the Clouds: A Berkeley View of Cloud Computing. Technical Report UCB/EECS-2009-28, EECS Department, University of California, Berkeley, Feb 2009.
[4] I. Baldine, Y. Xin, D. Evans, C. Heermann, J. Chase, V. Marupadi, and A. Yumerefendi. The Missing Link: Putting the Network in Networked Cloud Computing. In ICVCI: International Conference on the Virtual Computing Initiative (an IBM-sponsored workshop), 2009.
[5] J. Chase, I. Constandache, A. Demberel, L. Grit, V. Marupadi, M. Sayler, and A. Yumerefendi. Controlling Dynamic Guests in a Virtual Computing Utility. In International Conference on the Virtual Computing Initiative (an IBM-sponsored workshop), May 2008.
[6] J. Chase, L. Grit, D. Irwin, V. Marupadi, P. Shivam, and A. Yumerefendi. Beyond Virtual Data Centers: Toward an Open Resource Control Architecture. In Selected Papers from the International Conference on the Virtual Computing Initiative (ACM Digital Library), May 2007.
[7] J. S. Chase, D. E. Irwin, L. E. Grit, J. D. Moore, and S. E. Sprenkle. Dynamic Virtual Clusters in a Grid Site Manager. In Proceedings of the Twelfth International Symposium on High Performance Distributed Computing (HPDC), June 2003.
[8] B. Chun, D. Culler, T. Roscoe, A. Bavier, L. Peterson, M. Wawrzoniak, and M. Bowman. PlanetLab: An overlay testbed for broad-coverage services. SIGCOMM Computer Communication Review, 33(3):3–12, 2003.
[9] E. Deelman, G. Singh, M. Livny, B. Berriman, and J. Good. The Cost of Doing Science on the Cloud: The Montage Example. In Proceedings of SC'08, Austin, TX, 2008. IEEE.
[10] GLIF Fenius. http://code.google.com/p/fenius/.
[11] R. J. Figueiredo, P. A. Dinda, and J. A. B. Fortes. A case for grid computing on virtual machines. In ICDCS '03: Proceedings of the 23rd International Conference on Distributed Computing Systems, page 550, Washington, DC, USA, 2003. IEEE Computer Society.
[12] I. Foster, T. Freeman, K. Keahey, D. Scheftner, B. Sotomayor, and X. Zhang. Virtual clusters for grid communities. In CCGRID '06: Proceedings of the Sixth IEEE International Symposium on Cluster Computing and the Grid, pages 513–520, Washington, DC, USA, 2006. IEEE Computer Society.
[13] Y. Fu, J. Chase, B. Chun, S. Schwab, and A. Vahdat. SHARP: An Architecture for Secure Resource Peering. In Proceedings of the 19th ACM Symposium on Operating Systems Principles, October 2003.
[14] L. Grit, D. Irwin, A. Yumerefendi, and J. Chase. Virtual Machine Hosting for Networked Clusters: Building the Foundations for "Autonomic" Orchestration. In Proceedings of the First International Workshop on Virtualization Technology in Distributed Computing (VTDC), November 2006.
[15] C. Guok, D. Robertson, M. Thompson, J. Lee, B. Tierney, and W. Johnston. Intra and Interdomain Circuit Provisioning Using the OSCARS Reservation System. In Proc. GridNets, 2006.
[16] J. van der Ham, F. Dijkstra, P. Grosso, R. van der Pol, A. Toonk, and C. de Laat. A distributed topology information system for optical networks based on the semantic web. Journal of Optical Switching and Networking, 5(2-3), June 2008.
[17] D. Irwin, J. S. Chase, L. Grit, A. Yumerefendi, D. Becker, and K. G. Yocum. Sharing Networked Resources with Brokered Leases. In Proceedings of the USENIX Technical Conference, June 2006.
[18] X. Jiang and D. Xu. VIOLIN: Virtual Internetworking on Overlay Infrastructure. In Proceedings of the Third International Symposium on Parallel and Distributed Processing and Applications (ISPA), July 2003.
[19] K. Keahey and T. Freeman. Science Clouds: Early Experiences in Cloud Computing for Scientific Applications. In Cloud Computing and its Applications (CCA), 2008.
[20] K. Kikuta, M. Nishida, D. Ishii, S. Okamoto, and N. Yamanaka. Establishment of VLAN tag swapped path on GMPLS controlling wide area layer-2 network. In Proc. of IEEE OFC, 2009.
[21] I. Krsul, A. Ganguly, J. Zhang, J. A. B. Fortes, and R. J. Figueiredo. VMPlants: Providing and managing virtual machine execution environments for grid computing. In SC '04: Proceedings of the 2004 ACM/IEEE Conference on Supercomputing, page 7, Washington, DC, USA, 2004. IEEE Computer Society.
[22] L. Ramakrishnan, L. Grit, A. Iamnitchi, D. Irwin, A. Yumerefendi, and J. Chase. Toward a Doctrine of Containment: Grid Hosting with Adaptive Resource Control. In Supercomputing (SC06), November 2006.
[23] P. Ruth, X. Jiang, D. Xu, and S. Goasguen. Virtual distributed environments in a shared infrastructure. Computer, 38(5):63–69, 2005.
[24] NLR Sherpa. http://noc.nlr.net/nlr/maps_documentation/nlr-framenet-documentation.html.
[25] A. Sundararaj and P. Dinda. Towards virtual networks for virtual machine grid computing, 2004.