Integrating Software Defined Networks within a Cloud Federation

Ioan Petri (1), Mengsong Zou (2), Ali Reza Zamani (2), Javier Diaz-Montes (2), Omer Rana (3) and Manish Parashar (2)
(1) School of Engineering, Cardiff University, UK
(2) Rutgers Discovery Informatics Institute, Rutgers University, USA
(3) School of Computer Science & Informatics, Cardiff University, UK
Contact author:
[email protected]
Abstract—Cloud computing has generally involved the use of specialist data centres to support computation and data storage at a central site (or a limited number of sites). The motivation for this has come from the need to provide economies of scale (and a subsequent reduction in cost) for supporting large scale computation for multiple user applications over a (generally) shared, multi-tenancy infrastructure. The use of such infrastructures requires moving data to a central location (data may be pre-staged to such a location prior to processing using terrestrial delivery channels and does not always require a network-based transfer), undertaking processing on the data, and subsequently enabling users to download the results of analysis. We extend this model using software defined networks (SDNs), whereby capability within the network can be used to support in-transit processing while data is in movement from source to destination. Using a smart building infrastructure scenario, consisting of sensors and actuators embedded within a built environment, we describe how an SDN-based architecture can be used to support real time data processing. This significantly reduces the processing times needed to support energy optimisation of the building and reduces costs. We describe an architecture for such a distributed, multi-layered Cloud system and discuss a prototype that has been implemented using the CometCloud system, deployed across three sites in the UK and the US. We validate the prototype using data from sensors within a sports facility and making use of EnergyPlus.
Keywords—Cloud Computing, Cloud Federation, CometCloud, Software Defined Networks.
I. INTRODUCTION
Data intensive applications have promoted SDNs (Software Defined Networks) as effective tools for undertaking complex data analysis and manipulation. SDNs decouple network control from the data forwarding hardware by positioning the control logic and state as a programmable software component, the controller. By separating the data path from the control path, it becomes possible to place in-network processing capability in the path of data as it migrates across the network. Many data intensive applications inherently require tighter coupling between workflow components; for instance, many applications require sharing of large data volumes with varying hard (on-time data delivery) and soft (in-transit processing) quality of service (QoS) requirements. Whereas previous work on workflow enactment over distributed infrastructures has often emphasised the need to provide loose coupling
between services that make up the workflow – in some applications, where real time processing constraints need to be observed, it is also necessary to manage connectivity between such services. Enabling loose coupling between services, whilst retaining the ability, when necessary, to control data transfer between them, is a useful capability made possible with SDNs. In particular, for those applications where the performance of the services/processes is time dependent, the interval between the generation of data at a producer and the subsequent consumption of this data can have a significant impact on the execution of the workflow. The implications of such time-sensitive applications are complex, as the process of data delivery can have a significant impact on both the producer and the consumer. For example, when the data generation rate is high (compared to the data transfer rate), the process of data delivery may require temporary storage or pre-processing. In addition, applications may also require different data representations at the source and destination, hence the data has to be transformed in a timely manner before it can be consumed. SDNs provide a number of features that can be used not only to solve the challenges described above, but also to potentially simplify the implementation of existing solutions. Using the SDN paradigm with components that support OpenFlow, campus network and data-center operators can provide a programmatic interface to build an end-to-end network service with: (i) traffic isolation characteristics; and (ii) the ability to support large data transfers. Energy optimisation within a built environment is a type of application which conforms to these characteristics, as it involves dynamic data collection and real time computation, often with additional actuation capability to subsequently influence a real world environment. Based on real-time readings from sensors measuring a number of key parameters within a building (e.g. temperature gradient over a particular area/zone), it has become possible to provide intelligent energy consumption plans and help building facility managers and automated control systems within buildings take decisions to reduce energy consumption. As sensors provide readings within a time interval of 15-30 minutes (as described later in our scenario), a new representation of the building is constructed within this time frame.
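To make the notion of a programmable control plane concrete, the sketch below models an OpenFlow-style match/action rule and a controller that pushes such rules to switches. It is a minimal illustration only: the class and field names (Match, FlowRule, install_rule, the action strings) are our own assumptions and do not correspond to the API of any particular controller.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Match:
    """Packet header fields a rule matches on (a small subset of OpenFlow match fields)."""
    src_ip: str = "*"
    dst_ip: str = "*"
    dst_port: int = 0          # 0 acts as a wildcard here

@dataclass
class FlowRule:
    """A match/action entry: packets matching `match` are handled by `actions`."""
    match: Match
    actions: List[str]          # e.g. ["output:2"] or ["send_to_processing_node"]
    priority: int = 100

class Controller:
    """Toy SDN controller: keeps a per-switch flow table and installs rules into it."""
    def __init__(self) -> None:
        self.flow_tables: Dict[str, List[FlowRule]] = {}

    def install_rule(self, switch_id: str, rule: FlowRule) -> None:
        self.flow_tables.setdefault(switch_id, []).append(rule)
        self.flow_tables[switch_id].sort(key=lambda r: -r.priority)

# Example: steer a sensor data stream through an in-transit processing node.
controller = Controller()
controller.install_rule(
    "edge-switch-1",
    FlowRule(Match(src_ip="10.0.0.5", dst_port=5001),
             actions=["send_to_processing_node", "output:3"]),
)
print(controller.flow_tables)
```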
Energy optimisation in this context involves capturing data from a variety of sensors and returning control set points to be implemented through building management systems. With recent developments, building management systems and sensors have the potential to enable their users to become more “active” consumers of energy (through smart metering, for instance). Centralised building controls are often used to enable interactions between the different sensors, actuators, and controllers to perform appropriate control actions. Intelligent buildings have embedded monitoring and control equipment and the potential to reduce energy use along with operations and maintenance expenses, while improving comfort levels. To achieve an equilibrium between consumption and comfort, these systems typically necessitate the deployment of a wide range of sensors (e.g., temperature, CO2, zone airflow, daylight levels, occupancy levels, etc.), which are, in turn, integrated through a Building (Energy) Management System and an array of electronic actuators and terminal unit controllers to process sensor outputs and control set-points. In particular, sensor systems can enable building energy simulations – enabling users to optimise various associated aspects of building use over time. In this paper we describe how a smart building infrastructure with sensors and actuators can be mapped onto an SDN, using OpenFlow, which greatly increases the performance of the resulting optimisation process. We demonstrate how a distributed cloud system can subsequently be used to process such data, providing both resource elasticity (i.e. new resources can be added dynamically to enable processing to be carried out within a particular time interval) and network programmability (enabling reservation of network bandwidth for particular data streams and support for in-network processing using SDNs). Our distributed Cloud is deployed across three sites (Cardiff (UK), Rutgers and Indiana (USA)) – all hosting EnergyPlus [6]. The remainder of this paper is organised as follows: Sections I, II and IV outline the development and use of federated clouds, providing a key motivation for our research (and analysing several related approaches in this area). Section III describes CometCloud and the aggregated CometCloud-based federation. We describe the building energy optimisation problem in more detail in Section V, identifying in particular how a distributed cloud-based system is used in this context. The evaluation of our implemented system is presented in Section VI. We conclude and identify future work in Section VII.

II. RELATED WORK
OpenFlow is a control protocol, specified by the Open Networking Foundation (ONF), that enables network hardware to expose an API to application programs, and it defines a framework through which centralized control of flow forwarding rules can be orchestrated. OpenFlow is
added as a feature to commercial Ethernet switches, routers and wireless access points, thereby providing a standardized hook that allows researchers to run experiments without requiring vendors to expose the internal workings of their network devices. OpenFlow is currently being implemented by major vendors, with OpenFlow-enabled switches now commercially available [18]. A key benefit of this approach is the ability to make use of spare capacity directly on network elements and couple this with capability within a data centre. The coordinated use of these two types of resources provides opportunities for supporting data processing in real time, streaming applications. An SDN controller provides an application user with the ability to install forwarding rules inside a router, monitor flow status and respond to particular events of interest influenced by the data being transferred [7]. Several organizations worldwide, including Google [25], NDDI [26], and GENI [27], run and test OpenFlow networks. In OpenFlow research, Onix [19] is a control plane platform designed to enable scalable control applications. Onix separates the task of network state distribution from applications and provides them with a logical view of the network state. As with most OpenFlow-based platforms, Onix provides a general API for control applications, while allowing them to make their own trade-offs among consistency, durability, and scalability. Tootoonchian and Ganjali [21] developed HyperFlow, an event-based engine for OpenFlow that allows control applications to make decisions locally by passively synchronizing network-wide views across individual controller instances. In the same area, work on Consistent Updates [20] tackles the problem of state management between the physical network and the network information base (NIB) to enforce consistent forwarding state at different levels (per-packet, per-flow). In-transit data analysis refers to the manipulation and transformation of data using resources in the data path between source and destination, and can be extremely advantageous for data intensive applications. Various reactive management strategies for in-transit data manipulation have been investigated [12], [15]. Studies have explored the possibility of coupling these strategies at various application levels in order to create a cooperative management framework for in-transit data manipulation in data-intensive scientific and engineering workflows [13], [11]. Several in-situ workflow systems study the problem of visualization for monitoring purposes. More recently, such systems have facilitated the coupling of simulation codes with popular visualization and analysis toolkits, such as VisIt [9] and ParaView [10], exposing a broader suite of analytics tools for undertaking simulations. Performance-oriented designs for these workflows are becoming increasingly important as they attempt to balance a number of parameters, such as latency and run-time performance, both for in-situ and in-transit
workflows [11]. SDNs provide a useful infrastructure for carrying out in-transit analysis, since the network control plane is decoupled and directly programmable. This migration of control, formerly tightly bound in individual network devices, into accessible computing devices enables the underlying infrastructure to be abstracted for applications and network services, which can treat the network as a logical or virtual entity. There are several monitoring tools for SDNs, e.g. [12], [16], which allow the development of more expressive traffic measurement applications by proposing a clean slate design of the packet processing pipeline. These tools focus on efficiently measuring the traffic matrix using existing technology and aim to determine an optimal set of switches to be monitored for each flow. Another key benefit of using SDNs is the ability to support and dynamically make available multiple network topologies between source and destination. This is particularly useful within a data centre, where multiple types of networks can co-exist.
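As a concrete illustration of the kind of in-transit operation discussed above, the following sketch shows an intermediate node reducing a stream of sensor readings (here, a windowed average) before forwarding it towards the consumer. It is a schematic example under our own assumptions; the function names and the choice of a windowed mean are illustrative and not taken from the systems cited above.

```python
from statistics import mean
from typing import Iterable, Iterator, List

def window(readings: Iterable[float], size: int) -> Iterator[List[float]]:
    """Group a stream of readings into fixed-size windows."""
    buf: List[float] = []
    for r in readings:
        buf.append(r)
        if len(buf) == size:
            yield buf
            buf = []
    if buf:
        yield buf

def in_transit_reduce(readings: Iterable[float], size: int = 10) -> Iterator[float]:
    """Run on a node in the data path: forward one averaged value per window,
    cutting the volume sent downstream by a factor of `size`."""
    for w in window(readings, size):
        yield mean(w)

# Example: sensor samples reduced at an intermediate node before transfer to the processing site.
samples = [21.2, 21.4, 21.3, 21.9, 22.0, 22.1, 21.8, 21.7, 21.6, 21.5, 21.4, 21.2]
print(list(in_transit_reduce(samples, size=4)))
```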
III. COMETCLOUD
Our Cloud federation makes use of CometCloud [1], [2], an autonomic computing engine based on the Comet [3] decentralized coordination substrate, which supports highly heterogeneous and dynamic cloud/grid/HPC infrastructures, enabling the integration of public/private clouds and autonomic cloudbursts, i.e., dynamic scale-out to clouds to address extreme requirements such as heterogeneous and dynamic workloads and spikes in demand [4]. Conceptually, CometCloud is composed of a programming layer, a service layer and an infrastructure layer. The infrastructure layer uses a dynamic self-organizing overlay to interconnect distributed resources of various kinds and offer them as a single pool of resources. It enables resources to be added or removed from the infrastructure layer at any time, as well as providing the capability to deal with disconnects and failures (primarily through re-try of failed jobs). An information discovery and content-based routing substrate is built on top of the overlay. This routing engine supports flexible content-based routing and complex querying using partial keywords, wildcards, or ranges. It also guarantees that all peer nodes with data elements that match a query/message will be located. The service layer provides a range of services to support autonomics at the programming and application level. This layer supports a Linda-like [28] tuple space coordination model and provides a virtual shared-space abstraction as well as associative access primitives. Dynamically constructed transient spaces are also supported to allow applications to explicitly exploit context locality to improve system performance. Asynchronous (publish/subscribe) messaging and event services are also provided by this layer. The programming layer provides the basis for application development and management. It supports a range of paradigms including master/worker/Bag-of-Tasks. Masters generate tasks and workers consume them. Masters and workers can communicate via the virtual shared space or using a direct connection. Scheduling and monitoring of tasks are supported by the application framework. The task consistency service handles lost/failed tasks. Previous work has demonstrated how CometCloud can be integrated with Amazon EC2/S3 [14] and current work focuses on integration with IBM's SoftLayer Cloud (http://www.ibm.com/cloud-computing/us/en/softlayer.html).

IV. APPROACH
We describe our approach for implementing a distributed, cloud-based system with SDN capability. In our system, the execution of each application is coordinated by a CometCloud agent (or “master” node) – responsible for requesting resources to execute computational jobs, whilst at the same time also requesting capacity on a network element to enable data transfer and support for carrying out in-transit processing.

A. CometCloud Federation
The CometCloud-based federation is designed to be dynamically updated, as it is created in a collaborative way where each site communicates with the others to identify itself, negotiate the terms of interaction, discover available resources, and advertise its own resources and capabilities. In this way, a federated management space is created at runtime and sites can join and leave at any point. This federation model does not have any centralized component and users can access the federation from any site, which increases the fault tolerance of the overall federation (see Figure 1). Another key benefit of this model is that, since each site can differentiate itself based on the availability of specialist capability, it is possible to schedule jobs to take advantage of these capabilities. The federation model is based on the Comet [3] coordination “spaces” (an abstraction, based on the availability of a distributed shared memory that all users and providers can access and observe, enabling information sharing by publishing requests/offers for information to this shared memory). In particular, we have decided to use two kinds of spaces in the federation. First, we have a single federated management space used to orchestrate the different available resources. This space is used to exchange operational messages for discovering resources, announcing changes at a site, routing users' requests to the appropriate site(s), or initiating negotiations to create ad-hoc execution spaces. On the other hand, we can have multiple shared execution spaces that are created on-demand to satisfy the computing needs of users. Execution spaces can be created in the context of a single site to provision local resources or to support a cloudburst to public clouds or external high performance computing systems. Moreover, they can be used to create
a private sub-federation across several sites. This can be useful when several sites have a common interest and decide to jointly target certain types of tasks as a specialized community.

Figure 1: Integrating Software Defined Networks capability with CometCloud

As shown in Figure 1, each shared execution space is controlled by an agent that initially creates the space and subsequently coordinates access to resources for the execution of a particular set of jobs. Agents can act as a master node within the space to manage job/task execution, or delegate this role to a dedicated master (M) when some specific functionality is required. Moreover, agents deploy workers to actually compute the tasks. These workers (W) can be in a trusted network and be part of the shared execution space, or they can be part of external resources, such as a public cloud, and therefore in a non-trusted network. Moreover, each site may have SDN-enabled switches (R) to allow network customization across sites. Each agent/master node can only request access to channel reservation or a user slice on a network device via its local, first hop switch/gateway. The master node at a particular site does not, in this instance, have direct control over network devices that are outside its own network.

V. APPLICATION SCENARIO
Various types of sensors are used to monitor energy efficiency levels within a building, such as: (i) solid-state meters for accurate usage levels; (ii) environmental sensors for measuring temperature, relative humidity (RH), carbon monoxide (CO), and carbon dioxide (CO2); and (iii) temperature sensors using both mechanical (e.g., thermally expanding metallic coils) and electrical means (e.g., thermistors, metallic resistance temperature detectors (RTD), thermocouples, digital P-N junctions, infrared thermocouples), which provide sufficient accuracy. When dealing with large buildings such as sports facilities, the accuracy of these
sensors is often questioned, largely because of the significant drift that occurs after initial calibration. In some buildings there are specific requirements for sensors when monitoring CO2 concentration, air flow, humidity, etc., and these sensors are more expensive to use and deploy [17]. We use sensor data from the SportE2 project pilot called FIDIA (http://www.asfidia.it), a public sports building facility located in Rome, Italy. SportE2 is a research project co-financed by the European Commission FP7 programme under the domain of Information Communication Technologies and Energy Efficient Buildings. The project focuses on developing energy efficient products and services dedicated to the needs and unique characteristics of sporting facilities. The building we have used in the pilot study has 9 cm wooden external walls and a 9 cm wooden external roof. The floor is made of concrete. The windows are single glazed, with a thermal transmittance of 5.7 W/m²K and a solar gain of 0.7. The geometry of the building comprises a gable roof with Hmin = 3 m and Hmax = 6 m, with window surfaces of about 70 m². The sports facility is equipped with sensors and actuators for monitoring, control and optimisation of the facility. The building has metering capability to determine consumption of electricity, gas, biomass, water and thermal energy. This data can be accessed through a specialist interface and recorded for analysis. Sub-metering of thermal and electrical consumption within grouped zones (gym/fitness and swimming pool) is also provided, along with “comfort” monitoring by functional area (gym, fitness room and swimming pool). In these areas the Predicted Mean Vote (PMV) index is used, which measures the average response of a group of people on a thermal sensation scale (such as hot, warm, cool and cold); it is one of the most widely recognised thermal comfort models and is measured as a function of the activity performed within a particular part of the building. Occupancy is also monitored in the gym, the fitness room and around the swimming pool area. The structure of the facility does not allow direct measurement of the total occupancy of the pilot, so the occupancy of the whole facility is provided as the sum of the number of people who have entered/exited the building over a particular time interval. Additional details, along with sensor types, can be found in [8]. In Table I we identify how various building parameters are measured by the available sensors.
Table I: Sensor endpoints – from [8]

| Objective | Variables | Sensors/Meters | Type | Units | Protocol |
| Input for Optimisation | Occupancy | Occupancy Sensor | TPS210: People counter | – | Modbus IP |
| | Indoor Temperature | Temperature sensor (battery powered) | iPoint-T: Air T&RH sensor | deg. C | Modbus IP |
| | Water Temperature | Temperature sensor | STP100-100: water T sensor | deg. C | I/O to AS |
| | Indoor Humidity | Humidity sensor (battery powered) | SHO100-T: Air RH sensor | deg. C | I/O to AS |
| | Air Temperature Inlet | Temperature sensor (battery powered) | iPoint-T: Air T&RH sensor | deg. C | Modbus IP |
| | Supplied Air Flow Rate | Velocity sensor | TI-SAD-65: Air velocity sensor | kg/s | I/O to AS |
| Output of Optimisation | PMV (comfort level) | – | – | – | – |
| | Electrical Energy | Electricity Meter (220-240 HVAC) | iMeter: Electric meter | kWh | Modbus RS485 |
| | Thermal Energy Supplied | Heat Meter (battery powered) | HYDRO-CAL G21: Heat Meter (DN80) | kWh | Modbus IP |
| Additional Parameters | Carbon Concentration | CO2-CO/C: CO2 sensor (air quality) | CO2 duct sensor | ppm | I/O to AS |
| | Chlorine in Air | Cl sensor (230 VAC) | Murco MGS: Air Cl2 sensor | ppm | Modbus RS485 |
Each sensor can communicate via a gateway or can be directly linked (using wired infrastructure) with the pilot application server (identified as I/O to an Automation Server (AS) in the table). In our scenario we consider that a user job is defined as job: [input, obj, deadline], where the input data is represented as [IDF, W, [param]]: IDF represents the building model to be simulated, W represents the weather file required for the simulation, and [param] defines the parameter ranges associated with the IDF file that need to be optimised, [param] = [r_i → (x_m, x_n)]. The job's obj field encodes the optimisation objective, objective: [outVarName, min/max], defining the name of the output variable to be optimised (outVarName) and the target of the optimisation process (minimisation or maximisation). deadline is a parameter defining the required time to completion of the submitted job. A job contains a set of tasks N = {t_1, t_2, t_3, ..., t_n} mapped into tuples within the CometCloud tuple space. Each task t_i is characterised by two parameters, t_i → [ID, data], where the first parameter is a task identifier and data represents one set of results (given a particular parameter range). The application scenario used in this paper is based on EnergyPlus [5], a simulation engine that enables energy simulation of a built environment based on various inputs from sensors. The simulation output represents an optimum control setpoint to be implemented within the building using suitable actuation mechanisms.
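The sketch below captures the job and task structures described above as Python dataclasses, together with a toy master/worker loop in which a plain queue stands in for the CometCloud tuple space. It is a simplified illustration under our own assumptions: the class names, the parameter-splitting strategy and the use of a queue are ours and do not reflect CometCloud's actual API.

```python
from dataclasses import dataclass
from queue import Queue
from typing import Dict, List, Tuple

@dataclass
class Job:
    """job: [input, obj, deadline] with input = [IDF, W, [param]]."""
    idf: str                                  # building model to simulate
    weather: str                              # weather file
    params: Dict[str, Tuple[float, float]]    # parameter ranges r_i -> (x_m, x_n)
    objective: Tuple[str, str]                # (outVarName, "min" or "max")
    deadline_s: float                         # required time to completion

@dataclass
class Task:
    """t_i -> [ID, data]: one parameter sub-range to be simulated."""
    task_id: int
    data: Dict[str, Tuple[float, float]]

def split_job(job: Job, pieces: int) -> List[Task]:
    """Master side: split each parameter range into equal sub-ranges (one task each)."""
    tasks = []
    for i in range(pieces):
        sub = {}
        for name, (lo, hi) in job.params.items():
            step = (hi - lo) / pieces
            sub[name] = (lo + i * step, lo + (i + 1) * step)
        tasks.append(Task(task_id=i, data=sub))
    return tasks

# The master inserts tasks into the shared space; workers pull and "simulate" them.
space: Queue = Queue()
job = Job("fidia.idf", "rome.epw", {"setpoint": (18.0, 24.0)}, ("ThermalEnergy", "min"), 3600)
for t in split_job(job, pieces=4):
    space.put(t)

while not space.empty():
    task = space.get()            # a worker consumes a tuple
    print(f"worker runs EnergyPlus for task {task.task_id}: {task.data}")
```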
Figure 2: Application Scenario

A. Sensors
Each sensor in our pilot can communicate via a gateway or can be directly linked (using wired infrastructure) with the pilot Automation Server (AS). Sensors are usually battery powered meters and can be used to measure particular variable(s) of interest, such as: (i) indoor temperature and air temperature inlet – usually battery powered, with a Modbus IP protocol connection to the AS gateway; (ii) water temperature, using a regular I/O operation to the AS gateway; (iii)
indoor humidity – battery powered, and communicating with the AS gateway; and (iv) supplied air flow rate, measured with a velocity sensor and using I/O operations to the AS gateway. The AS can also be used to control (carry out actuation on) the building, such as modifying controls on a boiler. The BACnet protocol can also be used as an alternative.

B. Gateway Level – BMS and AS
There are two distinct gateways: (i) the Building Management System (BMS) and (ii) the Automation Server (AS). The BMS gateway is a server machine that controls the activities and spaces within the building. BMSs are most commonly implemented in large projects with extensive mechanical, electrical, and plumbing systems and are a critical component in managing energy demand. In addition to controlling the building's internal environment, BMS systems are sometimes linked to access control (turnstiles and access doors controlling who is allowed access to the building) or other security systems such as closed-circuit television (CCTV) and motion detectors. The AS gateway is a hardware-based server that is factory programmed with StruxureWare Building Operation software (from Schneider Electric). In a small installation, the embedded AS acts as a stand-alone server, mounted with its I/O modules within a small footprint. In medium and large installations, functionality is distributed over multiple Automation Servers (ASs) that communicate over TCP/IP. The AS can run multiple control programs, manage local I/O, alarms, and users, handle scheduling and logging, and communicate using a variety of protocols. Consequently, most parts of the system function autonomously and will continue to run as a whole even if communication fails or individual servers or devices go offline. In terms of computing capability, the BMS and the AS can carry out various operations on the raw data collected at the sensor level. The AS is generally considered to be closer to the phenomenon being measured and can be used to collect data directly from the sensors and carry out actuation. The AS therefore acts as a means to sample and archive sensor data (over a time or sample window). The AS can perform simple operations, such as computing a running average, identifying the minimum or maximum values of data collected over a time/sample window, or simple filtering of the data. Control actions can also be carried out through the AS over the building infrastructure using a limited set
of control operations/commands. The AS is primarily a dedicated hardware component with storage and network interaction capabilities. The BMS, on the other hand, is more complex and, in the scenario considered in this work, is hosted on a dedicated machine. It acts as an interface for data collection and analysis between the cloud infrastructure and the data collection capability made available through the AS. Operations carried out at the BMS may either be triggered by a request from the Cloud system or, alternatively, a more proactive, partial analysis of the data could be carried out.

C. CometCloud Sites
At this level, we have a CometCloud-based federation of resources [22], [23], where each site has access to a set of heterogeneous and dynamic resources, such as public/private clouds, supercomputers, etc. These resources are uniformly exposed using cloud-like abstractions and mechanisms that facilitate the execution of applications across the resources. Each site decides on the type of computation it runs, as well as its prices, based on various decision functions that include factors such as the availability of resources, computational cost, etc. The federation is dynamically created at runtime and sites can join or leave at any given time. Notably, this requires only minimal configuration at each site, which amounts to specifying the available resources, a queuing system or a type of cloud, and credentials. We consider three sites in this scenario – one based at Cardiff, one at Rutgers, and one at Indiana. A federation site therefore refers to a deployment which is connected over a network and not co-located with a master node. Our sites are:
Cardiff site: a virtualized cluster-based infrastructure with 12 dedicated physical machines. Each machine has 12 CPU cores at 3.2 GHz. Each VM uses one core with 1 GB of memory. The networking infrastructure is 1 Gbps Ethernet with a measured latency of 0.706 ms on average.
Rutgers site: a cluster-based infrastructure with 32 nodes. Each node has 8 CPU cores at 2.6 GHz, 24 GB memory, and a 1 Gbps Ethernet connection. The measured latency on the network is 0.227 ms on average.
FutureGrid site: makes use of an OpenStack cloud deployment at Indiana University. We have used instances of type medium, where each instance has 2 cores and 4 GB of memory. The measured latency of the cloud virtual network is 0.706 ms on average.
We consider that we have SDN capabilities across all CometCloud sites. These capabilities can be provided by an SDN controller (e.g., using OpenFlow) and the network switches under its control. One advantage of SDNs is that the network infrastructure can scale as required by simply adding switches to the controller, which will dynamically reorganize the virtual networks. Hence, if a site in our federation adds SDN capabilities to its infrastructure, it only needs to notify the SDN controller and the other sites in the federation. In this way, we can shape our network infrastructure in the same way we shape the computational
resources of our federation. The use of SDNs at this level enables control over data flows and resource provisioning. In this way, it is possible to meet varying QoS requirements from multiple workflows and users sharing the same service medium. Specific capabilities include controlling bandwidth to match data transfer with computational resource provisioning; in-transit processing when appropriate resources are found along the path; security, by isolating traffic; and route optimization depending on usage. Specifically, in this work we focus on bandwidth reservation and basic in-transit operations performed on the SDN nodes.

VI. EXPERIMENTS
We consider a marketplace scenario where a building manager needs to decide whether to compute their workload using local resources or to outsource it to a remote site. When the workload is outsourced, it is necessary to consider both the computation and data transfer costs. We use three federated sites in our experiments: Rutgers, Cardiff and FutureGrid. Based on the FIDIA pilot (described in Section V), we consider that building data is available at each site, with data being generated at different rates. We use a Poisson distribution to generate heterogeneous jobs from the three sites. The amount of input data to be transferred can be 10 MB, 20 MB, or 30 MB. Computational resources: the Rutgers site has five machines that can consume and compute tasks simultaneously, while the FutureGrid and Cardiff sites have only two machines each. We assume that the computational capacities of all workers are similar. Network resources: we assume SDN capabilities are available across all sites. SDNs allow us to guarantee a certain quality of service (QoS) in the network – this is emulated using Mininet (http://mininet.org/) and the traffic control toolkit (tc). In our experiments, we allocate five SDN channels between each pair of sites with a guaranteed bandwidth of 1 Mbps. Moreover, we also have a network channel without QoS guarantees, called the shared channel, that has a bandwidth of up to 0.2 Mbps. The SDN channels can be reserved for a price and their QoS is guaranteed, while the shared (non-SDN) channel is free. Note that if site A is using reserved channel #1 to transfer data to B, B cannot use channel #1 at the same time to transfer data to A. However, a shared channel can be used at any time by any site. Finally, SDNs also allow us to use capacity directly available on a network device along the data propagation path in the network. We consider that we can filter our parameter space to reduce the overall computation time by half [24]. Each of the three sites can accept computational requests, and make the decision about whether to compute such requests on local resources or to outsource to other
sites, based on predefined policies. The decision policy used in our current experiments is to select the site that can complete the workload with the minimum Time to Completion (TTC), subject to Cost < Budget. If a job cannot be completed at any site within the given deadline and budget constraints, the job is declined. The TTC of a job is DataTransferTime + ComputationTime. The Cost is DataTransferCost + ComputationCost. The shared network is free, while the cost of the SDN network varies based on utilization. The default cost of each SDN network channel is $0.05/second. This cost is increased when utilization exceeds 50%. The cost of the network is calculated as follows:
Cost = BaseCost × (1 + (ChannelsInUse / TotalChannels − 0.5) × 2)    (1)
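The following sketch implements Eq. (1) together with the site-selection policy described above (minimum TTC subject to Cost < Budget and the deadline). It is a simplified model for illustration: the function names, the flat base rate below 50% utilisation (our reading of the text), and the per-site figures in the example are our own assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional

BASE_COST_PER_S = 0.05   # default SDN channel cost ($/second)

def channel_cost_per_s(channels_in_use: int, total_channels: int) -> float:
    """Eq. (1): the base rate is surcharged once utilisation exceeds 50%."""
    utilisation = channels_in_use / total_channels
    if utilisation <= 0.5:
        return BASE_COST_PER_S
    return BASE_COST_PER_S * (1 + (utilisation - 0.5) * 2)

@dataclass
class SiteOffer:
    """One candidate site's estimate for a given job (illustrative fields)."""
    name: str
    transfer_time_s: float     # 0 when computing locally
    compute_time_s: float
    cost: float                # data transfer cost + computation cost

    @property
    def ttc(self) -> float:
        return self.transfer_time_s + self.compute_time_s

def select_site(offers: List[SiteOffer], budget: float, deadline_s: float) -> Optional[SiteOffer]:
    """Pick the offer with minimum TTC subject to cost and deadline; None means the job is declined."""
    feasible = [o for o in offers if o.cost < budget and o.ttc <= deadline_s]
    return min(feasible, key=lambda o: o.ttc) if feasible else None

# Example: local execution vs outsourcing over a reserved channel (numbers are illustrative).
offers = [
    SiteOffer("local",   transfer_time_s=0,  compute_time_s=220, cost=1.0),
    SiteOffer("rutgers", transfer_time_s=80, compute_time_s=60,  cost=80 * channel_cost_per_s(3, 5)),
]
print(select_site(offers, budget=6.0, deadline_s=300))
```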
First, we evaluate how the price of the SDN affects the decision of where to execute the workload. Figure 3 shows the SDN price and the number of jobs outsourced using SDN over time for each pair of sites. Figures 3a, 3b, and 3c show how the price of reserving an SDN channel varies over time as the channel usage ratio changes between each pair of sites. Figures 3d, 3e, and 3f show the number of outsourced jobs that use reserved channels over time, calculated as the sum of all jobs outsourced over a particular network link (e.g., Cardiff-FutureGrid also includes jobs outsourced from FutureGrid to Cardiff). It can be observed from these figures that as the reserved network price increases, the number of outsourced jobs decreases. This is related to the fact that each job has a fixed budget and, as the network price increases, the user cannot afford the total cost of outsourcing job requests. Therefore, more jobs are computed locally until the network price later decreases to an acceptable level. We can observe this relationship between Figures 3a, 3b, and 3c and Figures 3d, 3e, and 3f. We can also observe that when a large number of jobs are outsourced to a site, the price of the SDN channel increases. This motivates users to choose other sites for computation, as they cannot afford the high prices. Essentially, we observe that the marketplace regulates itself based on supply and demand. Table II shows that the SDN channels guarantee the promised QoS, with only insignificant differences between estimated and real data transfer times. This approach can therefore provide a much more stable transfer speed compared to that of a shared network, whose real transfer time may be significantly higher than estimated (in the worst case). This can also be observed in Figure 4, where we show the execution time and the time required to transfer the data for the different cases. When jobs are computed locally, no data needs to be transferred. However, when we outsource jobs to remote sites, we observe how the
SDN allows us not only to reduce the transfer time but also the computation time by doing in-transit processing. This demonstrates how an SDN can help to complete jobs within their required deadlines, ensuring constraints identified in user-based service level agreements (SLA) can be met. At the same time, we can see that shared networks could potentially cause delays and violate user SLAs.

Table II: Total amount of time spent transferring data using the shared network and the SDN network.

| Transfer Time | Shared Network: Estimated | Shared Network: Real | SDN: Estimated | SDN: Real |
| Rutgers | 560 | 4652 | 2405 | 2457 |
| Cardiff | 180 | 1588 | 400 | 402 |
| Futuregrid | 264 | 1489 | 680 | 686 |
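As an illustration of how such guaranteed and best-effort channels can be emulated, the sketch below builds a two-site Mininet topology with a bandwidth-limited inter-site link, using Mininet's TCLink (which applies Linux tc rate limits). This is a minimal sketch under our own assumptions: the topology, host/switch names and bandwidth values mirror the experimental description above but are not the authors' actual emulation scripts.

```python
#!/usr/bin/env python
"""Emulate an inter-site channel with a fixed bandwidth (requires Mininet, typically run as root)."""
from mininet.net import Mininet
from mininet.topo import Topo
from mininet.link import TCLink
from mininet.log import setLogLevel

INTER_SITE_BW_MBPS = 1      # reserved channel; set to 0.2 to emulate the best-effort shared channel

class TwoSiteTopo(Topo):
    def build(self):
        site_a = self.addHost('siteA')              # e.g. Cardiff gateway
        site_b = self.addHost('siteB')              # e.g. Rutgers gateway
        s1 = self.addSwitch('s1')
        s2 = self.addSwitch('s2')
        self.addLink(site_a, s1)
        self.addLink(site_b, s2)
        # Inter-site link, rate limited via tc when TCLink is used.
        self.addLink(s1, s2, bw=INTER_SITE_BW_MBPS)

if __name__ == '__main__':
    setLogLevel('info')
    net = Mininet(topo=TwoSiteTopo(), link=TCLink)
    net.start()
    # Measure achievable throughput between the two "sites".
    net.iperf((net.get('siteA'), net.get('siteB')))
    net.stop()
```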
Finally, we study the factors that influence whether jobs should be computed locally or remotely. Table III shows the number of jobs that have been computed locally and the number of jobs that have been outsourced to other sites. It can be observed that sites with fewer resources (Cardiff and FutureGrid) outsource most of their jobs. Although outsourcing involves transferring data, the overall TTC of the job may be shorter than computing locally. If SDN is chosen, some data processing takes place during the transfer, which can reduce the total computation needed at the destination. Moreover, other sites may have available resources or a lower workload, which leads to a shorter estimated TTC.

Table III: Number of jobs outsourced versus number of jobs computed locally.

| Job Source | Outsourced | Local |
| Rutgers | 49 | 74 |
| Cardiff | 96 | 40 |
| Futuregrid | 100 | 30 |
Table IV shows how many jobs were outsourced using the shared network versus SDN. We can observe that most of the outsourced jobs choose to reserve an SDN channel for data transfer. There are several reasons for this. First, our decision policy targets the fastest solution if it is within the budget. Second, if we do not choose the SDN option, we may not be able to complete the job within the required deadline.
Figure 3: Summary of experimental results. Top row (a)–(c): the price ($) of reserving an SDN channel over time (min) for the Cardiff-Futuregrid, Rutgers-Futuregrid and Rutgers-Cardiff links. Bottom row (d)–(f): the number of jobs outsourced using SDN over time (min) for the FutureGrid-Cardiff, Rutgers-FutureGrid and Rutgers-Cardiff links.
Figure 4: Average Execution and Network transfer time (s) for the local site and for outsourcing to a remote site with and without SDN (bars: Local, SDN, Non-SDN; network time vs execution time).

Table IV: Number of jobs outsourced using the shared network versus SDN.

| Job Destination | Shared Channel | Reserved Channel |
| Rutgers | 11 | 132 |
| Cardiff | 3 | 36 |
| Futuregrid | 10 | 53 |
VII. CONCLUSIONS
We have demonstrated how SDNs can be used to support energy optimisation in built environments (with deadline constraints on completion times). Specifically, we have presented how the OpenFlow protocol can provide significant advantages for such data intensive applications by reducing the time to completion of jobs generated from the EnergyPlus application and optimising the costs of executing these jobs. We have mapped the smart building infrastructure onto an SDN and tested several scenarios to show how capabilities made available for supporting channel reservation and in-transit processing can be employed in a realistic application context.
Data-intensive applications executing jobs over a distributed infrastructure can lead to significant benefits – reducing the need to capture all the data at a single site (e.g., at a data centre). Energy calculation in buildings is one example of such an application, where the flow of data to be processed presents challenges due to the computational requirements and timing constraints. We describe an implementation of a distributed Cloud using CometCloud. We show how our distributed cloud model enables EnergyPlus simulations to be deployed with data recorded from building sensors, and how various analyses can be applied at intermediate architectural layers to ease the energy optimisation of buildings. We have presented the design and implementation of the proposed approach and experimentally evaluated a number of scenarios based on the execution of EnergyPlus tasks using data from an instrumented sports facility (building). The experimental results have shown a number of benefits that our system provides with regard to task completion and costs. In particular, our results demonstrate how direct control of the network can influence overall task execution performance, especially where results need to be generated within a particular time interval. Integrating elastic resource provisioning in federated clouds with support for SDN provides a computational architecture for supporting real time data processing, where the volume of data is hard
to predict beforehand.

ACKNOWLEDGEMENTS
The research presented in this work is supported in part by the US National Science Foundation (NSF) via grant numbers OCI 1339036, OCI 1310283, DMS 1228203, IIP 0758566 and by an IBM Faculty Award. The scenario outlined in this work comes from the European “Energy Efficiency for European Sports Facilities” (SportE2) project (http://www.sporte2.eu/). We are grateful to Prof. Rezgui (Cardiff University) for access to this scenario.
REFERENCES
[1] CometCloud Project. http://www.cometcloud.org/. Last accessed: November 2014.
[2] J. Diaz-Montes, M. AbdelBaky, M. Zou, and M. Parashar. "CometCloud: Enabling Software-Defined Federations for End-to-End Application Workflows." IEEE Internet Computing 19, no. 1 (2015): 69-73.
[3] Z. Li and M. Parashar. A computational infrastructure for grid-based asynchronous parallel applications. In HPDC, pages 229-230, 2007.
[4] M. Parashar, M. AbdelBaky, I. Rodero, and A. Devarakonda, "Cloud Paradigms and Practices for Computational and Data-Enabled Science and Engineering", IEEE Computing in Science and Engineering (CiSE) Magazine, 2014.
[5] N. Fumo, P. Mago, and R. Luck, "Methodology to Estimate Building Energy Consumption Using EnergyPlus Benchmark Models." Energy and Buildings, 42(12), pp. 2331-2337, 2010.
[6] US Department of Energy, "EnergyPlus – Energy Simulation Software". Available at: http://apps1.eere.energy.gov/buildings/energyplus/. Last accessed: November 2014.
[7] M. Appelman and M. de Boer, "Performance Analysis of OpenFlow Hardware", University of Amsterdam, 2011-12. Available at: http://www.delaat.net/rp/2011-2012/p18/presentation.pdf. Last accessed: November 2014.
[8] I. Petri, O. Rana, Y. Rezgui, H. Li, T. Beach, M. Zou, J. Diaz-Montes, and M. Parashar. Cloud Supported Building Data Analytics. CCGRID 2014: 641-650.
[9] B. Whitlock, J.-M. Favre, and J. S. Meredith. Parallel In Situ Coupling of Simulation with a Fully Featured Visualization System. In Proc. of 11th Eurographics Symposium on Parallel Graphics and Visualization (EGPGV11), April 2011.
[10] N. Fabian, K. Moreland, D. Thompson, A. Bauer, P. Marion, B. Gevecik, M. Rasquin, and K. Jansen. The ParaView coprocessing library: A scalable, general purpose in situ visualization library. In Proc. of IEEE Symposium on Large Data Analysis and Visualization (LDAV), pages 89-96, October 2011.
[11] K. Moreland, R. Oldfield, P. Marion, S. Jourdain, N. Podhorszki, V. Vishwanath, N. Fabian, C. Docan, M. Parashar, M. Hereld, M. E. Papka, and S. Klasky, Examples of in Transit Visualization, Proc. International Workshop on Petascale Data Analytics: Challenges and Opportunities (PDAC 11), Nov. 2011.
[12] S. Klasky, B. Ludaescher, and M. Parashar, "The Center for Plasma Edge Simulation Workflow Requirements," in 22nd Int. Conf. on Data Engineering Workshops (ICDEW'06). Atlanta, GA, USA: IEEE Computer Society, 2006, p. 73.
[13] S. Klasky, H. Abbasi, J. Logan, M. Parashar, K. Schwan, A. Shoshani, M. Wolf, et al. "In situ data processing for extreme-scale computing." Scientific Discovery through Advanced Computing Program (SciDAC'11), 2011.
[14] Y. el-Khamra, H. Kim, S. Jha, and M. Parashar, "Exploring the Performance Fluctuations of HPC Workloads on Clouds", 2nd IEEE International Conference on Cloud Computing Technology and Science (CloudCom), Indianapolis, Nov. 30-Dec. 3, 2010.
[15] J. C. Bennett et al. "Combining in-situ and in-transit processing to enable extreme-scale scientific analysis". In Proc. of the Int. Conf. on High Perf. Computing, Networking, Storage and Analysis (SC '12). IEEE Computer Society Press, Los Alamitos, CA, USA.
[16] H. Kim and N. Feamster, "Improving network management with software defined networking," IEEE Communications Magazine, vol. 51, no. 2, pp. 114-119, 2013.
[17] B. Von Neida, D. Maniccia, and A. Tweed, An Analysis of the Energy and Cost Savings Potential of Occupancy Sensors for Commercial Lighting Systems, Proc. Illuminating Engineering Society of North America Annual Conference, pp. 433-459, 2010.
[18] I. Monga, E. Pouyoul, and C. Guok, "Software defined networking for big-data science," SuperComputing 2012, 2012.
[19] T. Koponen, M. Casado, N. Gude, J. Stribling, L. Poutievski, M. Zhu, R. Ramanathan, Y. Iwata, H. Inoue, T. Hama, and S. Shenker. Onix: A distributed control platform for large-scale production networks. In USENIX OSDI, 2010.
[20] M. Reitblatt, N. Foster, J. Rexford, and D. Walker. Consistent updates for software-defined networks: change you can believe in! In ACM HotNets Workshop, 2011.
[21] A. Tootoonchian and Y. Ganjali. HyperFlow: a distributed control plane for OpenFlow. In USENIX INM/WREN, 2010.
[22] I. Petri, T. Beach, M. Zou, et al. Exploring models and mechanisms for exchanging resources in a federated cloud. In Intl. Conf. on Cloud Engineering (IC2E 2014), pp. 215-224, Boston, 2014.
[23] J. Diaz-Montes, Y. Xie, I. Rodero, J. Zola, B. Ganapathysubramanian, and M. Parashar. Exploring the use of elastic resource federations for enabling large-scale scientific workflows. In Proc. of Workshop on Many-Task Computing on Clouds, Grids, and Supercomputers (MTAGS), pages 1-10, 2013.
[24] I. Petri, O. Rana, J. Diaz-Montes, M. Zou, M. Parashar, T. Beach, Y. Rezgui, and H. Li, "In-transit Data Analysis and Distribution in a Multi-Cloud Environment using CometCloud," The International Workshop on Energy Management for Sustainable Internet-of-Things and Cloud Computing, co-located with the International Conference on Future Internet of Things and Cloud (FiCloud 2014), Barcelona, Spain, August 2014.
[25] U. Hoelzle, "OpenFlow at Google", Open Networking Summit. Available at: http://www.opennetsummit.org/archives/apr12/hoelzle-tue-openflow.pdf. Last accessed: November 2014.
[26] "Network Development and Deployment Initiative". Available at: https://code.google.com/p/nddi/. Last accessed: November 2014.
[27] McGeer, "GENI Cloud". Available at: http://groups.geni.net/geni/wiki/GENICloud. Last accessed: November 2014.
[28] N. Carriero and D. Gelernter, Linda in context, Commun. ACM, vol. 32, no. 4, 1989.