Maximum Flow Minimum Energy Routing for Exascale Cloud Computing Systems
T.H. Szymanski
McMaster University, Hamilton, Canada
[email protected]
Abstract—This paper explores the feasibility of Exascale Cloud Computing over the Nobel-EU (European Union) IP backbone network. The EU represents an attractive environment for Exascale Cloud Computing, due to its well-established IP infrastructure with relatively short edge distances. An Exascale Cloud Computing system linking hundreds of data-centers can interconnect potentially millions of servers, considerably more than the world's most powerful supercomputer. However, the US Dept. of Energy has recently concluded that cloud computing is too expensive, and is suitable only for scientific applications with minimal communications requirements. This paper explores a Future-Internet network to address these concerns. The network combines several technologies: (i) a QoS-aware router scheduling algorithm, (ii) a flow resource-reservation signalling algorithm, and (iii) flow aggregation, to establish highly-efficient guaranteed-rate connections between data-centers. A Maximum-Flow Minimum-Energy routing algorithm is used to route high-capacity "trunks" between data-centers with minimum energy requirements. The traffic flows between data-centers are aggregated onto the trunks, to achieve exceptionally low latencies with improved link utilizations and energy-efficiencies. The high packet-loss rates and large queueing delays of the traditional Best-Effort Internet are eliminated, and the latency over the cloud is reduced to the fiber latency. The average fiber latencies over the Nobel-EU network are a few milliseconds, and they can be hidden with multithreading. The bisection bandwidth of the cloud can be improved to match demand by activating dark fiber. By lowering latencies, the utilization of Internet data-centers will also increase, further improving overall energy efficiency. The proposed Future-Internet network can improve the utilization and energy efficiency of both the communications and the computations in Cloud Computing systems.
Index Terms—energy efficiency, quality of service, QoS, exascale, cloud computing, maximum flow, minimum energy, minimum latency, low latency, routing, Internet, data-centers
I. INTRODUCTION
This paper explores technologies to achieve an Exascale Cloud Computing system over a European IP backbone network. Current Internet data-centers typically consist of 1,000 to 50,000 servers, and are geographically distributed over the globe. The Nobel-EU IP backbone network is shown in Fig. 1, with typical data-center locations shown by bold (red) dots. The EU represents one particularly attractive region for Exascale Cloud Computing, due to its well-established IP infrastructure with relatively short edge distances.
One of the primary advantages of Cloud Computing is reduced cost. A cloud computing system can utilize existing Internet data-centers and infrastructure, created to serve a large Internet user population. According to Google, Internet
data-centers are typically under-utilized (typically 30–50% utilization). A cloud computing system can utilize the idle processing capacity of existing data-centers (typically 50–70% of their capacity). A Cloud Computing system linking hundreds of remote data-centers over the Internet can interconnect potentially millions of processors, considerably more than the Cray Titan supercomputer. The existing infrastructure of global data-centers has a potential computing capacity of several exaflops. By enhancing data-center processors with FPGAs or GPUs, the potential computing capacity can be even higher. A cloud computing system is also available on a pay-for-use basis, and it does not require the extensive front-end capital costs of a traditional HPC machine.
Cloud computing has achieved significant milestones recently. Cycle Computing has implemented a 100,000-core cloud computing system for the pharmaceutical industry, using 7 regions from Amazon Web Services spanning the USA, Europe and Asia. The system completed 100,000 hours of computation on a serial machine within 3 hours, at a cost under $5,000 US per hour. However, the application is highly parallel and places minimal demands on the Internet. According to a recent report by the US Dept. of Energy [1], Cloud Computing is not cost-effective, and is suitable only for scientific applications with minimal communications and IO requirements.
Traditional HPC machines use tightly-coupled low-latency communication networks such as 10 Gigabit Ethernet or InfiniBand, which do not drop packets, with average delays in the microsecond range. In contrast, today's Best-Effort (BE) Internet is a loosely-coupled network which intentionally drops packets when congestion is detected [2,3], and it offers poor bisection bandwidth, latency, and energy-efficiency. Packet delays are typically 100–200 milliseconds [2,3]. To achieve these average delays, the Internet links are significantly over-provisioned and operate at light loads (typically 33% utilization) [2,3].
The BE-Internet represents a significant challenge for Exascale Cloud Computing due to its poor throughput, its large latency, its affinity to drop packets, and its poor energy-efficiency. The poor performance of the BE-Internet also results in low utilizations of Internet data-centers, which further lowers energy efficiency. As a result, governments world-wide are exploring 'Future Internet' architectures.
Fig. 1. The Nobel-EU IP network with 28 nodes and 82 directed edges, with data-centers shown by bold dots.
Recently, the GreenTouch Consortium was formed by the telecommunications industry, with the goal of achieving a 1000-times reduction in energy per bit over the Internet (www.greentouch.org). Hence, there is significant interest in technologies to enable energy-efficiency in the cloud.
In this paper we explore the use of a recently-proposed Future-Internet network which can establish exceptionally low-latency guaranteed-rate connections between data-centers, with improved energy-efficiency. The theory of the Future Internet is described in [12,13] and is briefly summarized here. A Virtual Private Network (VPN) topology consisting of high-capacity connections called "trunks" is used to interconnect remote data-centers. In this paper, a Maximum-Flow Minimum-Energy routing algorithm is used to route the trunks with minimum energy requirements when the network is provisioned. The communications between remote data-centers are then multiplexed onto these high-capacity trunks. Prior research has shown that the communication pattern between data-centers is typically a highly-bursty ON/OFF process [7]. It is shown that many highly-bursty ON/OFF processes can be multiplexed onto a single high-capacity trunk, achieving significantly improved resource utilization, latency and energy-efficiency. Using these techniques, end-to-end latencies of less than a few milliseconds can be achieved over the Nobel-EU backbone network, 1–2 orders of magnitude lower than possible with the existing Best-Effort Internet.
Referring to Fig. 2, the average edge length in the Nobel-EU network is approx. 400 km, and the average edge latency is 2 millisec. The end-to-end delay between data-centers also includes a small (typically 2 millisec) queueing delay at the source and destination data-centers. These delays are lower than the seek times of current hard disk drives, and existing multithreading techniques can be used to hide these latencies. As a result of the significantly lower network latencies over the proposed Future Internet, the data-centers can achieve improved utilizations, performance and energy-efficiency, with less time spent stalled on network IO.
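As a sanity check of the 2 millisec figure, the time-of-flight over a 400 km edge follows from the speed of light in silica fiber, approximately 2 × 10^5 km/s. The refractive index of ≈ 1.5 used below is our assumption (a standard value for silica fiber), not a figure stated in the text:

```python
# Back-of-the-envelope check of the ~2 ms average edge latency.
# Assumes standard silica fiber with refractive index n ~= 1.5.
C_VACUUM_KM_S = 3.0e5          # speed of light in vacuum, km/s
N_FIBER = 1.5                  # assumed refractive index of silica fiber

def fiber_latency_ms(distance_km: float) -> float:
    """One-way time-of-flight over a fiber span, in milliseconds."""
    v_fiber = C_VACUUM_KM_S / N_FIBER   # ~2e5 km/s in fiber
    return distance_km / v_fiber * 1e3

print(fiber_latency_ms(400))   # ~2.0 ms for the average 400 km Nobel-EU edge
```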
Fig. 2. Edge lengths in the Nobel-EU network.
The changes required to existing Internet routers to achieve these results are minimal, i.e., the addition of a few FPGAs per router to support: (i) the QoS-aware router scheduling algorithm, and (ii) the resource-reservation algorithm. Together these technologies can result in exceptionally low latencies and significant improvements in utilization, performance and energy-efficiency for Cloud Computing systems.
Section 2 reviews the challenges of Exascale Computing. Section 3 reviews the Future-Internet network. Section 4 describes the aggregation of traffic sources. Section 5 describes the Max-Flow Min-Energy routing algorithm. Section 6 describes the energy savings. Section 7 concludes the paper.
II. THE CHALLENGES OF EXASCALE COMPUTING
As of 2012, the largest 'High Performance Computing' (HPC) machine is the Cray Titan supercomputer, located in the USA. The Cray Titan has nearly 20K AMD Opteron processors and the same number of Nvidia Tesla Graphics Processing Units (GPUs) used as co-processors, with a peak performance near 20 Petaflops/sec, and a cost of ≈ $100 Million US. Governments in the USA, EU, Asia and India are now exploring exaflop-scale computing systems, due to their strategic importance in the computer-aided design of complex systems. Using today's technology, an Exascale HPC system would require ≈ 50 Cray Titan machines, at a cost approaching $5 Billion US. The system would consume 20K square meters of floor-space, and would require a power supply with ≈ 0.4 Gigawatts of power.
Cloud Computing offers some potentially significant advantages. By utilizing existing data-centers, the up-front capital costs of $5 Billion US can be reduced considerably. By using distributed Internet data-centers, there is no requirement for 20K contiguous square meters of floor space. There is no requirement for a single stable power supply with 0.4 Gigawatts of power. The savings in up-front capital costs can be invested into increasing the bisection bandwidth of the Internet.
By Moore's Law, the performance of computing systems is expected to double or quadruple over the next several years, and hence the up-front capital costs and area requirements of an Exascale HPC system will decrease accordingly.
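The exascale sizing just quoted follows from simple scaling of the Titan figures; a quick arithmetic check is sketched below. The ≈ 8 MW per-machine power is inferred here from the 0.4 GW aggregate figure, and is not stated directly in the text:

```python
# Scaling the Cray Titan figures quoted above to one exaflop/s.
TITAN_PFLOPS = 20            # peak performance, Petaflops/sec
TITAN_COST_MUSD = 100        # approximate cost, $ Million US
TITAN_POWER_MW = 8           # assumed per-machine power, inferred from 0.4 GW total

machines = 1000 / TITAN_PFLOPS                # 1 Exaflop = 1000 Petaflops -> 50
cost_busd = machines * TITAN_COST_MUSD / 1e3  # ~$5 Billion US
power_gw = machines * TITAN_POWER_MW / 1e3    # ~0.4 GW
print(machines, cost_busd, power_gw)          # 50.0, 5.0, 0.4
```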
However, power requirements are expected to decrease more slowly than cost and area, and energy efficiency will remain a key design goal.
III. THE FUTURE INTERNET NETWORK MODEL
To date, there is no clear consensus on what the Future Internet should look like. The theory for a proposed Future-Internet network which can support 2 (or more) service classes, a new Smooth class and the usual Best-Effort (BE) class, has been presented in [12,13]. Legacy Best-Effort Internet applications developed over the last 40 years typically use the TCP flow-control protocol, which results in bursty traffic flows. These legacy applications will continue to run over the proposed Future Internet, using the same existing BE-Internet routing algorithms, such as the Open Shortest Path First (OSPF) algorithm. Equally importantly, new Cloud services can be developed to exploit the new Smooth traffic class. According to theory and extensive simulations [12,13], the proposed Future-Internet network will provide the new Smooth class with deterministic and essentially-perfect bandwidth, delay, jitter and QoS guarantees, for all admissible traffic demands within the Capacity Region of the network. Links can operate at 100% loads, and still provide essentially-perfect QoS and extremely low latencies.
In the Future Internet, each router has 2 classes of Virtual Output Queues (VOQs): the Smooth VOQs and the BE VOQs. Each router filters the incoming packets and forwards them to the appropriate VOQs: BE packets are forwarded to the BE-VOQs, while Smooth packets are forwarded to the Smooth-VOQs. The routing and scheduling of BE packets through the router is accomplished with the existing BE scheduling and routing algorithms (i.e., OSPF). The scheduling of Smooth packets through the router can be accomplished using deterministic schedules, which can be precomputed by each router using a fast polynomial-time low-jitter QoS-aware router scheduling algorithm [10].
The IETF has recently proposed a flow-signalling technology in RFC 5977 [14], whereby applications can request the creation of end-to-end connections (flows), and signal the desired QoS parameters (bandwidth, latency, burstiness, etc.) to the network. This RFC does not address the issue of establishing smooth guaranteed-rate connections with exceptionally low jitter and delay, as it is known that finding smooth low-jitter schedules for routers with 100% efficiency is an NP-Hard integer-programming problem [11]. We assume that a similar resource-reservation signalling technology is used to signal and establish guaranteed-rate connections in the proposed Future Internet. Each router in the proposed Future Internet can use the QoS-aware router scheduling algorithm in [10] to realize the smooth guaranteed-rate connections, while achieving 100% throughput efficiency.
In this paper, let each Internet data-center have a token-bucket-based Traffic Shaper Queue (TSQ) to aggregate several bursty ON/OFF inter-processor communication streams and to shape the traffic into a single low-jitter (i.e., smooth) stream before transmission on the high-capacity "trunk".
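The paper does not give an implementation of the TSQ; the following is a minimal slot-based sketch of one way a token-bucket shaper could be organized. The class name, the time-slot granularity, and the one-token-per-packet accounting are all illustrative choices, not the paper's design:

```python
from collections import deque

class TrafficShaperQueue:
    """Minimal token-bucket shaper sketch: buffers bursty arrivals and
    releases packets at a smooth guaranteed rate, with a bounded burst
    allowance. Rates are in packets per time-slot for simplicity."""
    def __init__(self, rate: float, bucket_depth: float):
        self.rate = rate                  # token refill rate (guaranteed rate)
        self.bucket_depth = bucket_depth  # max accumulated tokens (burst bound)
        self.tokens = 0.0
        self.queue = deque()

    def enqueue(self, packet):
        self.queue.append(packet)         # bursty ON/OFF arrivals buffer here

    def tick(self):
        """Advance one time-slot; return the packets released this slot."""
        self.tokens = min(self.tokens + self.rate, self.bucket_depth)
        released = []
        while self.queue and self.tokens >= 1.0:
            released.append(self.queue.popleft())
            self.tokens -= 1.0            # one token consumed per packet
        return released
```

A larger bucket depth trades jitter for delay: bursts drain faster, but the output stream is less smooth.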
Each destination data-center has a Traffic Playback Queue (TPQ) to regenerate the original bursty ON/OFF stream(s) at the destination, with improved latency and energy-efficiency.
In this Future-Internet network, each "trunk" between data-centers can specify a guaranteed data-rate. Each Internet router buffers ≈ 2 IP packets per trunk [12,13], several orders of magnitude less buffering than required in traditional Best-Effort Internet routers. By Little's Law, the maximum queueing delay per router is ≤ the time needed to transmit 2 packets at the trunk's guaranteed rate, which can be several orders of magnitude less delay than in BE-Internet routers [12,13].
IV. AGGREGATING ON/OFF HPC TRAFFIC SOURCES
In this section, a technique to aggregate the bursty ON/OFF inter-processor traffic between data-centers is described. By aggregating this traffic and by exploiting the Smooth traffic class, the latency for communications between data-centers over the Internet can be reduced by several orders of magnitude, and the resource-utilization and energy-efficiency of communications over the Internet can increase considerably.
According to prior research, HPC traffic typically follows a bursty ON/OFF model [7]. In this paper, assume each Virtual Machine Image (VMI) generates inter-processor traffic according to a Markov-modulated ON/OFF Poisson process, in which the durations of the ON and OFF states are Poisson distributed [7]. The Amazon EC2 "Cluster Compute" (CC) instance offers a 10 Gbps Ethernet connection for 88 EC2 processors, equivalent to a peak IO rate of 113 Mbps per EC2 processor. However, users report lower bandwidth when multiple CC instances are used. In this paper, we assume each VMI generates traffic at a peak rate of 50 Mbps. Most of the traffic will be local, i.e., travel within the data-center, and a smaller fraction γ will exit the data-center and traverse the Internet.
To generate an aggregated traffic stream consisting of "X" ON/OFF streams to be delivered between 2 data-centers over a high-capacity trunk, a Matlab simulator was used to generate the "X" sample ON/OFF paths. The sample paths are circularly rotated by a random amount and added together to yield the aggregated stream. (No processing of the sample paths to minimize the jitter of the aggregated stream is assumed.)
Assume that a cloud data-center will use a resource-reservation signalling protocol, such as the IETF protocol in [14], to establish trunks between data-centers with a guaranteed rate, and to transport highly-aggregated traffic over the trunk(s). The number of aggregated ON/OFF streams on each trunk is determined at provisioning time, and the minimum bandwidth required by the trunk to support the aggregated stream is also determined. As described in [12,13], the cloud data-center can reserve a trunk with an additional excess-bandwidth component, typically between 1% and 10% of the minimum-bandwidth requirement, to control the TSQ and TPQ queueing delays. (This excess bandwidth can be viewed as a tightly-controlled over-provisioning.)
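The original experiments used a Matlab simulator; the following is a rough Python re-sketch of the generation procedure just described (ON/OFF dwell times drawn as Poisson slot counts, random circular rotation, summation). The mean dwell times and slot count below are arbitrary illustrative values, not the paper's simulation parameters:

```python
import numpy as np

rng = np.random.default_rng(0)

def on_off_path(slots: int, mean_on: float, mean_off: float) -> np.ndarray:
    """One bursty ON/OFF sample path (1 = transmitting at peak rate).
    ON/OFF dwell times are drawn as Poisson slot counts, as in the text."""
    path = np.zeros(slots, dtype=np.int64)
    t, on = 0, rng.random() < 0.5
    while t < slots:
        dwell = max(1, rng.poisson(mean_on if on else mean_off))
        if on:
            path[t:t + dwell] = 1
        t += dwell
        on = not on
    return path

def aggregate(num_streams: int, slots: int = 10_000) -> np.ndarray:
    """Sum of randomly phase-rotated ON/OFF paths, as described above."""
    total = np.zeros(slots, dtype=np.int64)
    for _ in range(num_streams):
        p = on_off_path(slots, mean_on=50, mean_off=50)
        total += np.roll(p, rng.integers(slots))  # random circular rotation
    return total

agg = aggregate(10_000)
print(agg.mean(), agg.std())  # relative burstiness falls as streams are added
```

For independent streams the mean of the aggregate grows as X while its standard deviation grows only as √X, which is why high aggregation levels yield a relatively smooth flow.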
Fig. 3. (a) TSQ delay vs. excess bandwidth. (b) TPQ delay vs. excess bandwidth.
To find the total delay to deliver an aggregated ON/OFF stream between data-centers, the TSQ and the TPQ at the source and destination data-centers were simulated using the methodology described in [12,13]. Fig. 3 illustrates the queueing delays in the TSQ and the TPQ for the aggregation of between X = 1 and X = 1 Million individual bursty ON/OFF streams. For each point in Fig. 3, the TSQ and TPQ were simulated with 100 randomly-generated aggregated streams, each 1/2 hour in length. The x-axis shows the excess bandwidth provisioned in the trunk(s); the y-axis shows the mean queueing delays. The 95% confidence intervals are very small. The queueing delays in the TSQ and TPQ drop rapidly as the excess bandwidth increases. (The TSQ and TPQ can easily be incorporated into a software overlay layer.)
Table 1 illustrates the end-to-end router queueing delays (RQ) for aggregated ON/OFF streams traversing 5 routers across the Nobel-EU backbone network. Each row represents a level of aggregation and the excess bandwidth in the provisioned trunk. Letting γ = 10%, the aggregation of 10,000 ON/OFF streams, each requiring 50 Mbps × γ, will require a minimum aggregate bandwidth of 50 Gbps. To achieve a small queueing delay in the TSQ and TPQ, let an excess bandwidth of 5% be used, so the provisioned rate is 1.05 × 50 = 52.5 Gbps. According to Table 1, for an aggregation of 10,000 ON/OFF streams with an excess bandwidth of 5%, the mean end-to-end router queueing delay is ≤ 2 μsec, which is very low compared to the existing BE-Internet. The mean queueing delays in the TSQ and TPQ are also very small, ≤ 1.1 millisec each. The bandwidth-efficiency and resource-efficiency of this trunk is 95%, considerably higher than possible in the existing BE-Internet. Each router buffers on average ≤ 2 packets per trunk, several orders of magnitude less buffering than required for BE traffic flows [12,13]. It is estimated that router buffers represent a significant fraction of the cost, size and power dissipation of existing Best-Effort Internet routers [16].
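The ≤ 2 μsec router-queueing figure can be roughly cross-checked from the ≈ 2-packets-per-trunk buffering bound and Little's Law. The 1500-byte packet size below is our assumption; the paper does not state a packet size:

```python
# Rough check of the per-router queueing bound for the 52.5 Gbps trunk above.
PACKET_BITS = 1500 * 8        # assumed MTU-sized packet
TRUNK_RATE_BPS = 52.5e9       # provisioned trunk rate from the example
BUFFERED_PACKETS = 2          # ~2 packets buffered per trunk per router [12,13]
HOPS = 5

per_router_s = BUFFERED_PACKETS * PACKET_BITS / TRUNK_RATE_BPS
print(per_router_s * 1e6)          # ~0.46 usec per router
print(HOPS * per_router_s * 1e6)   # ~2.3 usec over 5 hops, the order of
                                   # magnitude reported in Table 1
```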
TABLE I
END-TO-END DELAY BOUNDS FOR AGGREGATED ON/OFF TRAFFIC STREAMS

Channels   EXBW    Hops   RQ Delay    TSQ Delay   TPQ Delay
1          50%     5      1.6 ms      2.3 sec     1.53 sec
10         50%     5      0.16 ms     0.26 sec    0.17 sec
100        50%     5      16 μsec     10 ms       10 ms
1,000      10%     5      ≤ 2 μsec    10 ms       10 ms
10,000     5%      5      ≤ 2 μsec    ≤ 1 ms      ≤ 1 ms
10^5       2%      5      ≤ 2 μsec    ≤ 1 ms      ≤ 1 ms
10^6       0.3%    5      ≤ 2 μsec    ≤ 1 ms      ≤ 1 ms

(EXBW = excess bandwidth; RQ = end-to-end router queueing delay.)
The same methodology has been tested on many other self-similar streams, and the TSQ and TPQ queueing delays are consistent with Fig. 3. In general, a high level of aggregation (≥ 1,000 streams) enables multiple self-similar streams to be shaped into a sufficiently smooth traffic flow at the source node. The smoothed traffic flows can be routed by a Max-Flow Min-Energy routing algorithm and transmitted using the Smooth traffic class in the Future-Internet network, to achieve improved throughput, energy-efficiency and QoS for cloud computing systems.
V. THE MAX-FLOW MIN-ENERGY ROUTING ALGORITHM
This section applies the Constrained Max-Flow Min-Cost routing algorithms presented in [13] to route high-capacity trunks between remote data-centers in the Nobel-EU backbone network. Reference [13] presents Linear Programming (LP) formulations of the Constrained Max-Flow Min-Cost routing algorithms. Each trunk is represented as a unicast commodity between 2 data-centers, which is constrained to flow over a set of feasible edges. An efficient algorithm to determine a feasible edge set for each commodity is presented in [13]. The feasible edge set for each commodity can "prune out" undesirable high-cost edges, resulting in smaller LPs to solve.
Given a network with N nodes, a requested traffic rate matrix R ∈ ℝ^(N×N) specifies ≤ N × (N − 1) unicast commodities to be routed.
A. The Maximum-Flow LP (LP #1)
The Constrained-Maximum-Flow (CMF) LP for the commodities c ∈ C is given by Eq. 1. Each commodity c has a source-destination pair (s_c, d_c), a requested traffic demand W^c, and a lower bound L^c. The LP maximizes the aggregate flow over all commodities, while attempting to provide each commodity with its requested rate (but not more). Due to capacity constraints, some commodities may receive only a fraction of their requested rates.

Maximize:
   r* = Σ_{c∈C} r_in^c(d_c)                                      (1)
Subject to:
   0 ≤ r^c(e)                   ∀c ∈ C, ∀e ∈ E^c                  (1.1)
   r^c(e) ≤ Z(e)                ∀c ∈ C, ∀e ∈ E^c                  (1.2)
   Σ_{c∈C} r^c(e) ≤ Z(e)        ∀e ∈ E                            (1.3)
   r_in^c(v) = r_out^c(v)       ∀c ∈ C, ∀v ∈ V − (s_c, d_c)       (1.4)
   r_in^c(s_c) = 0              ∀c ∈ C                            (1.5)
   r_out^c(d_c) = 0             ∀c ∈ C                            (1.6)
   r_out^c(s_c) ≥ L^c           ∀c ∈ C                            (1.7)
   r_out^c(s_c) ≤ W^c           ∀c ∈ C                            (1.8)

Let r^c(e) denote the flow rate of commodity c on edge e. Let r_in^c(v) and r_out^c(v) denote the total flow rate into and out of node v due to commodity c, respectively. Constraint 1.3 requires that the sum of all commodity flow-rates over an edge e is ≤ the edge capacity Z(e). Constraints 1.5 and 1.6 for all commodities can also be merged into one constraint, to reduce the problem size. Constraints 1.7 and 1.8 ensure that each commodity receives a rate of at least L^c and at most W^c. The LP is solved and the maximum-flows are determined.

B. The Minimum-Energy LP (LP #2)
To obtain the minimum-achievable latency or energy-cost, a second cost-minimization LP is formulated. Let Λ^c be the maximum-flow rate of commodity c ∈ C between (s_c, d_c), determined by the Constrained-Maximum-Flow LP. Let the cost associated with every edge e ∈ E be given by Θ(e). To minimize latency, let the edge cost be its distance; the latency of an edge is the time-of-flight of an optical signal over the edge. To minimize energy, let the edge cost be the energy required to transmit each Gbps (in Joules). The following Constrained-Minimum-Cost (CMC) LP, given in Eq. 2, minimizes the cost (the latency or energy) of the Maximum-Flow determined from the CMF LP:

Minimize:
   y* = Σ_{c∈C} Σ_{e∈E^c} r^c(e) × Θ(e)                           (2)
Subject to:
   r_out^c(s_c) ≥ Λ^c − δ          ∀c ∈ C                         (2.1)
   Σ_{c∈C} r^c(e) ≤ Z(e)           ∀e ∈ E                         (2.2)
   r_in^c(v) = r_out^c(v)          ∀c ∈ C, ∀v ∈ V'                (2.3)
   r_in^c(s_c) + r_out^c(d_c) = 0  ∀c ∈ C                         (2.4)

where V' = V − (s_c, d_c). Constraint 2.1 requires that the maximum-flow rate for commodity c equals the value Λ^c determined by the CMF LP, to within a small threshold δ (typically 10^−4 × 100 Gbps). The remaining constraints are similar to those in Eq. 1.
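To make the formulation concrete, the following is a small self-contained sketch of LP #1 on a toy 4-node instance, using scipy's linprog. The topology, capacities and demands are invented for illustration; constraint (1.2) is implied here by (1.3) together with non-negativity, and the lower bounds L^c are taken as 0. LP #2 would be solved the same way, re-using the resulting Λ^c values as lower bounds and minimizing Σ r^c(e) Θ(e) instead:

```python
import numpy as np
from scipy.optimize import linprog

# Toy instance (assumed): 4 nodes, directed edges with capacities Z,
# and two commodities with requested demands W (lower bounds L = 0).
edges = [(0, 1), (1, 3), (0, 2), (2, 3), (1, 2)]
Z = np.array([10.0, 10.0, 10.0, 10.0, 5.0])
commodities = [(0, 3, 12.0), (1, 3, 6.0)]       # (source, dest, W)
E, C, N = len(edges), len(commodities), 4
idx = lambda c, e: c * E + e                    # variable x[c,e] = r^c(e)

c_obj = np.zeros(C * E)
A_ub, b_ub, A_eq, b_eq = [], [], [], []
bounds = [(0.0, None)] * (C * E)                # (1.1)

for ci, (s, d, W) in enumerate(commodities):
    for ei, (u, v) in enumerate(edges):
        if v == d:
            c_obj[idx(ci, ei)] = -1.0           # maximize inflow at d_c (1)
        if v == s or u == d:
            bounds[idx(ci, ei)] = (0.0, 0.0)    # (1.5), (1.6)
    for node in range(N):                       # flow conservation (1.4)
        if node in (s, d):
            continue
        row = np.zeros(C * E)
        for ei, (u, v) in enumerate(edges):
            row[idx(ci, ei)] = (v == node) - (u == node)
        A_eq.append(row); b_eq.append(0.0)
    row = np.zeros(C * E)                       # out(s_c) <= W^c (1.8)
    for ei, (u, v) in enumerate(edges):
        if u == s:
            row[idx(ci, ei)] = 1.0
    A_ub.append(row); b_ub.append(W)

for ei in range(E):                             # shared edge capacity (1.3)
    row = np.zeros(C * E)
    for ci in range(C):
        row[idx(ci, ei)] = 1.0
    A_ub.append(row); b_ub.append(Z[ei])

res = linprog(c_obj, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
              A_eq=np.array(A_eq), b_eq=np.array(b_eq), bounds=bounds)
print(-res.fun)   # aggregate max-flow r* (18.0: both demands fully served)
```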
C. Routing Results
This section summarizes the routing results. The Nobel-EU backbone topology with 28 nodes and 82 directed edges is shown in Fig. 1. A cloud computing network with a data-center at each red node in Fig. 1 is assumed. The data-centers are located at {Amsterdam, Brussels, Paris, Hamburg, Frankfurt, Prague, Berlin, Munich}. Let the capacity of most edges in the Nobel-EU network be 100 Gbps, except for several edges between data-centers, which have been upgraded to 200 Gbps. (A single state-of-the-art fiber can support rates near 1 Tbps using dense WDM.) The communication requirements between data-centers define a VPN graph, which is to be routed.
It was shown in [13] that every network has a finite Bandwidth-Cost product that cannot be exceeded. The Bandwidth-Distance (BD) product of the Nobel-EU network is 3.8064 × 10^7 Gigabit-kilometers per second (Gbkps) = 3.81 Petabit-kilometers per second (Pbkps). The Bandwidth-Energy (BE) cost product of the EU network is 40 KJoules (per second). From Theorem 1 in [13], necessary conditions for a requested traffic matrix to be achievable are: (i) the requested BD-cost is ≤ 3.81 Pbkps, and (ii) the requested BE-cost is ≤ 40 KJ (a sketch of this feasibility check is given below).
We assume 8 computing data-centers are interconnected over the network as shown in Fig. 1. The 8×8 traffic rate matrix specifies the traffic between these data-centers, which is determined when computational tasks are assigned to data-centers. 100 random traffic rate matrices were generated, in which each HPC data-center communicates with the other HPC data-centers at a peak rate of 96 Gbps. The computing traffic is super-imposed upon a background traffic rate matrix, in which each of the 28 nodes generates background traffic to randomly-selected nodes, at a combined rate of approx. 56 Gbps. These matrices represent a very-heavy-load scenario, in which much of the bisection bandwidth of the Nobel-EU network is consumed. Note that the Best-Effort Internet cannot operate at such heavy loads, due to its reliance upon significant over-provisioning [2,3]. The 100 traffic demand matrices were routed on a laptop processor with 2 processing cores at 2.8 GHz and 8 GBytes of main memory.
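As an illustration of necessary condition (i), the requested BD-cost of a traffic matrix can be lower-bounded by weighting each demand with its shortest-path distance, since no routing can carry a demand over less fiber than that. The tiny graph and distances below are invented for illustration, not the actual Nobel-EU topology:

```python
import networkx as nx

# BD product of the Nobel-EU network, Gb-km/s (from the text).
BD_LIMIT_GB_KM_S = 3.8064e7

def requested_bd_cost(G: nx.DiGraph, demands: dict) -> float:
    """demands maps (src, dst) -> requested rate in Gbps;
    edge attribute 'km' holds the fiber distance."""
    dist = dict(nx.all_pairs_dijkstra_path_length(G, weight="km"))
    return sum(rate * dist[s][d] for (s, d), rate in demands.items())

# Illustrative graph with assumed distances:
G = nx.DiGraph()
G.add_edge("Paris", "Frankfurt", km=480)
G.add_edge("Frankfurt", "Berlin", km=550)
demands = {("Paris", "Berlin"): 96.0}
cost = requested_bd_cost(G, demands)
print(cost, cost <= BD_LIMIT_GB_KM_S)   # achievable only if within the BD product
```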
TABLE II
MINIMUM-ENERGY ROUTING, HEAVY-LOAD TRAFFIC MODEL

Algorithm             BE(D) (KJ)   BE(U) (KJ)   Flow (%)   Edge load (%)
OSPF (DEL)            24.0         26.4         93.9       82.6
OSPF (DIST)           24.0         19.0         80.4       59.6
OSPF (HOP)            24.0         17.4         76.6       55.3
OSPF (E)              24.0         13.0         66.7       71.1
Max-Flow Min-Energy   24.0         23.0         100        76.6
Several different edge energy-cost models can be created. According to [5], there are ≈ 10^6 Internet routers, each consuming ≈ 4 KW. Let a router of size 8×8 with 100 Gbps links consume 4 KW = 4 KJ/sec on average. The energy-cost of a line card (i.e., a fully-utilized IO port) in the router is therefore 500 W = 0.5 KJ/sec (comparable with the power of a Cisco CRS-1 linecard). We assume 80% of this edge power is proportional to load, and 20% is fixed. The fixed power is ignored in the optimization, as it cannot be changed. Therefore, let the energy-cost of an edge equal 400 J for a fully-loaded 100 Gbps edge, or 4 J/Gbps. An energy-efficient routing algorithm can use least-energy paths to route flows. Three energy-cost tiers were assigned to the edges in E: 1/3 of the edges have an energy-cost 66.6% lower than the baseline (i.e., 1.33 J/Gbps), 1/3 of the edges have the baseline energy-cost of 4 J/Gbps, and 1/3 of the edges have an energy-cost 66.6% higher (i.e., 6.664 J/Gbps). (A sketch of this tier assignment appears at the end of this section.)
Table 2 shows the routing results. Four variants of the existing Open Shortest Path First (OSPF) routing algorithm were considered for comparison. OSPF(E) routes traffic flows along lowest-energy paths first. OSPF(DEL), OSPF(DIST) and OSPF(HOP) route traffic flows along shortest-delay, shortest-distance and minimum-hop paths, respectively. The delay of an edge is modelled as its M/M/1 queueing delay. The distance of an edge reflects its physical distance in kilometers. In the OSPF(HOP) model, each edge incurs a cost of 1 hop. The Max-Flow Min-Energy LP finds a Maximum-Flow with minimum energy-cost. In Table 2, all flows are expressed relative to the Max-Flow determined by LP #1. According to Table 2, the Max-Flow Min-Energy LP achieves the largest aggregate flow (100%), with the lowest possible energy requirements. In contrast, OSPF(E) achieves a much lower aggregate flow (66.7%), with proportionally lower bandwidth-energy costs and edge loads.
The proposed algorithms have been tested on numerous other network topologies and traffic patterns. The proposed LPs can result in considerably better energy-efficiencies, resource utilizations and edge loads than possible with other routing algorithms, especially at higher loads. We also note that the Best-Effort Internet cannot possibly operate at such high link loads.
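A minimal sketch of the three-tier energy-cost assignment just described follows. The random mapping of tiers to edges is our assumption; the paper does not specify which edges fall into which tier:

```python
import random

BASE = 4.0                       # J/Gbps for a fully-loaded 100 Gbps edge
TIERS = [BASE * (1 - 0.666),     # 1.336 J/Gbps (66.6% lower)
         BASE,                   # 4.0 J/Gbps (baseline)
         BASE * (1 + 0.666)]     # 6.664 J/Gbps (66.6% higher)

def assign_energy_costs(edges, seed=0):
    """Return {edge: Theta(e)} with each tier covering ~1/3 of the edges."""
    rng = random.Random(seed)
    shuffled = list(edges)
    rng.shuffle(shuffled)
    return {e: TIERS[i * 3 // len(shuffled)] for i, e in enumerate(shuffled)}
```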
Fig. 4. Edge loads in the Nobel-EU network (mean load for OSPF(E) = 71.1%).
VI. ENERGY AND COST SAVINGS
We use published data to estimate the cost and energy savings of the proposed technologies. According to Koomey [6], world-wide Internet data-centers consumed about 157.5 billion KW-hrs of energy in the year 2005. According to [5], world-wide Internet routers and LAN devices currently consume about 60 billion KW-hrs of energy. Using published industrial electricity rates of 7 cents per KW-hr, the annual operating expense for energy is about $15.3 Billion US. Ref. [6] has estimated that energy costs can be reduced by up to 70% through improved designs and technologies. Ref. [6] indicates utilizations of 30–70% for data-centers. Using 50% utilization, the cost of data-center inefficiencies is estimated at $5.5 Billion US per year. Cloud Computing can reduce this cost, by increasing the utilization of data-center servers. Using an optimistic 50% utilization for the BE-Internet, the cost of BE-Internet inefficiencies is estimated at $2.2 Billion US per year. Note that Internet delays can contribute significantly to the poor utilization of data-centers, thereby contributing directly to the high costs of data-center inefficiencies.
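These dollar figures follow directly from the quoted consumption numbers; a small arithmetic check is sketched below (the 7 cents/KW-hr rate and 50% utilization are taken from the text):

```python
# Reproducing the operating-cost estimates quoted above (all approximate).
DC_KWH = 157.5e9        # data-center energy, KW-hrs/year [6]
NET_KWH = 60e9          # routers and LAN devices, KW-hrs/year [5]
RATE = 0.07             # $/KW-hr

total_opex = (DC_KWH + NET_KWH) * RATE   # ~$15.2 Billion US/year (quoted: $15.3B)
dc_waste = DC_KWH * RATE * 0.5           # ~$5.5 Billion US/year at 50% utilization
net_waste = NET_KWH * RATE * 0.5         # ~$2.1 Billion US/year (quoted: $2.2B)
print(total_opex / 1e9, dc_waste / 1e9, net_waste / 1e9)
```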
There are several sources of improved performance and energy-efficiency in the proposed Future Internet. (i) Without aggregation, several thousand guaranteed-rate connections could be reserved for several thousand individual bursty ON/OFF streams between data-centers. Referring to Table 1, the delay for 1 bursty traffic flow is ≈ 2 seconds at 50% over-provisioning; to achieve millisecond delays, the over-provisioning would have to be several hundred percent. By aggregating several thousand ON/OFF traffic flows into one guaranteed-rate trunk, and using only 5% excess bandwidth to accommodate bursts, the queueing delay is reduced to ≈ 1 millisec (at the TSQ and TPQ); the queueing delays are reduced by 1–2 orders of magnitude. (ii) Another component of energy savings results from removing the need to significantly over-provision the Internet links. Using the Smooth traffic class, trunks can carry traffic at high loads (95% in this example) and achieve deterministic and essentially-perfect latency and QoS guarantees. In contrast, the BE-Internet operates links carrying time-sensitive traffic at light loads, typically ≤ 33% [2,3], and achieves weak latency, energy-efficiency and QoS guarantees. The aggregate capacity and energy-efficiency of the BE-Internet can increase significantly by using the Smooth traffic class. (iii) The largest component of energy savings is the improved utilization and efficiency of the HPC machines in each Internet data-center. The communication delays between remote data-centers over the Nobel-EU backbone network can be reduced to ≤ 5 millisec, and multithreading can be used to hide these latencies. Significantly less time is spent stalled on network IO delays, resulting in improved utilization and energy-efficiency in the data-centers. Poor utilization of Internet data-centers is a large source of energy inefficiency. Cloud Computing can efficiently exploit the idle periods in data-centers and improve utilization, but only if the networking latency is low.
Finally, we note that these annual operating costs do not include the capital costs associated with an inefficient BE-Internet. Each year, billions of dollars are spent deploying more Best-Effort Internet capacity, which is inefficiently utilized. According to its 2012 annual report, Cisco's sales of switching and routing equipment were $22 Billion US. Assuming 50% utilization, the annual capital cost of networking inefficiency is $11 Billion US, larger than the annual operating costs of networking inefficiency ($7.7 Billion US). Together, these inefficiencies are estimated to cost $18.7 Billion US per year. Cloud computing using the proposed Future Internet can reduce these inefficiencies and costs, while simultaneously achieving significantly improved QoS guarantees for cloud services.
A. Bisection Bandwidth
Current optical fiber cables used in continental networks can contain 100s of fibers, since the cost of installing the cables is very high. Only a small fraction of the existing fiber is activated; the unused fiber is called "dark fiber". The bisection bandwidth of a network can be increased dramatically by activating dark fiber as needed. Referring to Fig. 1, if each edge contains as few as 100 fibers, each capable of achieving 1 Tbps, then the bisection bandwidth can approach several hundred Tbps. Recall that an Exascale HPC machine using today's technology would require about 50 Cray Titan machines, with a capital cost estimated at $5 Billion US. By exploiting Cloud Computing and using existing Internet data-centers, the upfront capital costs can be lowered significantly, and the cost savings can be invested into increasing the bisection bandwidth of the network.
In the search to achieve exascale computing, a dedicated HPC machine will have front-end capital costs approaching several billion dollars. Alternatively, a cloud computing solution can utilize existing Internet data-centers, and activate available dark fiber to provide the bisection bandwidth. The cloud computing service would be available as a shared resource on a pay-for-use model, without the extensive front-end capital costs of dedicated HPC machines, and its cost would be distributed over numerous industries, including the pharmaceutical industry, industrial HPC, computational biology/genomics, gaming, and entertainment.
VII. CONCLUSIONS
This paper explores the feasibility of Exascale Cloud Computing systems over the Nobel-EU backbone network. A Future-Internet network which combines several technologies is used to establish highly-efficient guaranteed-rate connections between data-centers. A Maximum-Flow Minimum-Energy routing algorithm is used to route the connections between data-centers with minimum energy requirements. The bursty ON/OFF communications between data-centers are aggregated, smoothed, and transmitted onto the trunks to achieve exceptionally low latencies, with significantly improved bandwidth-efficiency and energy-efficiency. Using the proposed Future Internet, the average edge latencies over the Nobel-EU network are ≈ 2 millisec, lower than hard disk drive latencies, and multithreading can be used to hide these latencies. Existing dark fiber over the Nobel-EU network can also be activated to provide bisection bandwidth. By exploiting exceptionally low-latency connections, high bisection bandwidth and multithreading, the utilization of cloud data-centers will improve, resulting in improved overall performance and energy-efficiency. We conclude that the proposed Future-Internet network can improve the utilization and energy efficiency of both the communications and the computations in Cloud Computing systems.

REFERENCES
[1] J. McKendrick, "Cloud Computing Not Quite Ready for the Labs: US Government Report", Forbes, Jan. 19, 2012.
[2] P. Gevros, J. Crowcroft, P. Kerstein and S. Bhatti, "Congestion Control Mechanisms and the Best-Effort Service Model", IEEE Network Mag., May/June 2001.
[3] V. Joseph and B. Chapman, "Deploying QoS for Cisco IP and Next-Generation Networks: The Definitive Guide", Elsevier, 2009.
[4] R. Bolla et al., "Energy Efficiency in the Future Internet: A Survey of Existing Approaches and Trends in Energy-Aware Fixed Network Infrastructures", IEEE Comm. Surveys and Tutorials, 2Q 2011.
[5] B. Raghavan and J. Ma, "The Energy and Emergy of the Internet", HotNets 2011.
[6] J.G. Koomey, "Worldwide Electricity Used in Data-Centers", Environ. Res. Letters, IOP Science, 2008.
[7] T. Benson, A. Akella and D.A. Maltz, "Network Traffic Characteristics of Data Centers in the Wild", ACM IMC'10, Melbourne, Australia, Nov. 2010.
[8] T. Leighton and S. Rao, "Multicommodity Max-Flow Min-Cut Theorems and their use in Designing Approximation Algorithms", JACM, Vol. 46, No. 6, Nov. 1999.
[9] A. Leon-Garcia and I. Widjaja, "Communication Networks: Fundamental Concepts and Key Architectures", McGraw-Hill, 2nd Ed., 2004.
[10] T.H. Szymanski, "A Low-Jitter Guaranteed-Rate Scheduling Algorithm for Packet-Switched IP Routers", IEEE Trans. Comm., Vol. 57, No. 11, Nov. 2009.
[11] I. Keslassy, M. Kodialam, T.V. Lakshman and D. Stilliadis, "On Guaranteed Smooth Scheduling for Input-Queued Switches", IEEE/ACM Trans. Networking, Vol. 13, No. 6, Dec. 2005.
[12] T.H. Szymanski and D. Gilbert, "Provisioning Mission-Critical Telerobotic Control Systems over Internet Backbone Networks with Essentially-Perfect QoS", IEEE JSAC, Vol. 28, No. 5, June 2010.
[13] T.H. Szymanski, "Max-Flow Min-Cost Routing in a Future Internet with Improved QoS Guarantees", IEEE Trans. Comm., Vol. 61, No. 4, April 2013.
[14] IETF RFC 5977, "RMD-QOSM: The NSIS Quality-of-Service Model for Resource Management in Diffserv", www.ietf.org, Oct. 2010.
[15] T.H. Szymanski, "Low Latency Energy Efficient Communications for Global-Scale Cloud Computing Systems", ACM HPDC Workshop on Energy Efficient HPDC, New York, June 2013.
[16] S. Iyer, R.R. Kompella and N. McKeown, "Designing Packet Buffers for Router Linecards", IEEE/ACM Trans. Networking, Vol. 16, No. 3, June 2008.