Power Saving Mechanism based on Simple Moving Average for 802.3ad Link Aggregation

Hideaki Imaizumi¹, Tomohiro Nagata², Goro Kunito², Kenichi Yamazaki², Hiroyuki Morikawa¹
¹Research Center for Advanced Science and Technology, The University of Tokyo, Tokyo, Japan
²Research Laboratories, NTT DoCoMo, Inc., Japan
E-mail:
[email protected]
Abstract—In this paper, we propose a power saving mechanism based on a simple moving average of the traffic rate in 802.3ad link aggregation. The proposed mechanism comprises two key components: (1) a negotiation protocol for the two nodes connected by an 802.3ad aggregated link and (2) an algorithm that estimates an appropriate number of active links within the aggregated link in accordance with the current rate of outbound traffic on the link. We evaluate the characteristics of the algorithm and its performance through simulation with real traffic trace data. Our simulations demonstrate that under certain conditions the proposed algorithm can reduce the average number of active links by as much as 25.4% (6.7 W out of 26.4 W).

Index Terms—Green Internet, Simple Moving Average, 802.3ad Link Aggregation
I. INTRODUCTION

The threat of man-made climate change is currently forcing many industries to reduce both energy consumption and carbon emissions; computer networking is no exception. According to the Japanese Ministry of Economy, Trade and Industry (METI) [1], the average Internet traffic volume observed in Japan for the calendar year 2006 was 637 Gbps, and projections indicate that by 2025 it will exceed 121 Tbps; projections also indicate that network-related power consumption will increase by approximately 12 times over the same time span. This will cause an increase in energy usage and carbon emissions. While the traditional method of reducing power consumption has been to manufacture chips using increasingly small semiconductor fabrication process technologies, such as the recent move to 45 nm, it is well known that increasing current leakage prevents this technique from being carried much further [2]. As a result, much research is now being conducted toward novel means of reducing power consumption in Internet routers and switches, which rely on such process technologies.

A variety of methods have been proposed based on the observation that average utilization in Internet backbones and LANs is approximately 15% and 1%, respectively [3]. Gupta's seminal work [4] in this area proposed a traffic-adaptive approach for links, switches, and routers, and demonstrated that inter-packet gaps offer potential for reducing power consumption. Later approaches include dynamic Ethernet link or switch shutdown in the absence of incoming traffic [5], [6], [7], dynamic adaptation of the link rate to incoming traffic [9], [10], an approach combining these
two approaches [11], dynamic adaptation of internal clocks in switches [2], and shrinkable/expandable virtual networks with live router migration [12]. Moreover, several approaches that drastically reduce power consumption through the use of optics and superconductivity have been proposed for a future Internet [13], [14], [15].

In this paper, we propose a traffic-adaptive approach for IEEE 802.3ad link aggregation [16]. IEEE 802.3ad is a widely used technique for providing flexible link bandwidth by aggregating multiple Ethernet links, each providing a limited bandwidth capacity such as 10, 100, or 1000 Mbps, or 10 Gbps. Our approach changes the number of active links based on a simple moving average, in accordance with the current rate of traffic outbound onto the aggregated link. In the next section, we design a protocol by which two devices negotiate the number of active links and discuss in detail an algorithm for estimating an appropriate number of active links; in Section III, we evaluate its characteristics and performance through simulation using real traffic trace data; in Section IV, we summarize the paper and discuss future work.

II. POWER SAVING MECHANISM BASED ON SIMPLE MOVING AVERAGE

In our proposed mechanism, each node connected to a target aggregated link measures the traffic rate bound for the other node via the target link, as Figure 1 illustrates. In accordance with the traffic rate, an aggregator bound to an aggregated link in each node periodically estimates an appropriate number of active links based on a simple moving average method and negotiates with the other node in order to determine an appropriate number of active links for both nodes. In this section, we discuss the design of the negotiation protocol and describe the algorithm for determining the appropriate number of active links.

A. Traffic-Adaptive Negotiation Protocol

Because traffic is bi-directional, shutdown of a link by one side may affect the other; accordingly, it is inappropriate for one of the two nodes sharing an aggregated link to arbitrarily shut down a link without the agreement of the other. Although it would be possible to support a "uni-directional" link state, in which one of the two nodes shuts a link down and data is transmitted in only one direction, in this paper, in order to
simplify the implementation of our method, we assume that no link of the aggregated set of links may be used "uni-directionally". We note, however, that this does not affect the functionality of the links; they may certainly still be used uni-directionally. It only affects how their activation and shutdown are handled.

The Link Aggregation Control Protocol (LACP) [16] is already defined for the management of aggregated links. LACP mainly maintains a relationship between an aggregator and physical links and works on each physical link independently. To maximize power efficiency, both network interfaces connected via a physical link should be deactivated. In order to wake up the network interface of the other node, some notification through another link in the active state is required. The negotiation protocol introduced herein provides the out-of-band communication for such notification. It should be noted that our proposed mechanism could be applied to LACP via the addition of some manner of wakeup extension. However, because LACP operates independently on each link, each network interface would have to remain ready to be activated even when the given link is in sleep mode; consideration must therefore be made of the power wasted by receivers waiting in standby mode for wakeup messages.

[Fig. 1. Power Saving Mechanism for Link Aggregation: Node A and Node B each bind interfaces I/F 1-4 to an aggregator forming an aggregated logical link; links carrying traffic are active while idle links sleep.]

[Fig. 2. Procedure for Changing the Number of Active Links: increase procedure (left) and decrease procedure (right), showing Offer/Accept/Complete exchanges, link-up and link-down signals and delays, and buffer emptying.]

[Fig. 3. State Transition Diagram for the Protocol: states include IDLE, Ofr Sent, Ofr Recv, Local Ready, Peer Ready, Wait Complete Rcvd, and Buffer Empty.]
Our protocol consists of four messages: Offer, Accept, Deny, and Complete. These messages are sent through a default link—one of the aggregated links that is always active. When a link failure occurs in the default link, the next link must take over from the failed one. Here, we explain the procedure for changing the number of active links, with reference to Figure 2. The left-hand side of Figure 2 depicts the procedure for increasing the number of active links to N. Each node monitors
its outgoing traffic rate to the other node, as well as its occupied buffer size, in accordance with a pre-determined sampling rate, and calculates an appropriate number of active links. When Node A detects an increase in traffic towards Node B, Node A sends B an Offer message to change the number of active links to N. In order to avoid packet loss due to a rapid increase in traffic, Node B cannot refuse the offer. Accordingly, immediately after sending the message, Node A begins the process of activating the corresponding links, which are currently inactive. After receiving the message, Node B executes the algorithm to calculate an appropriate number of active links based on the rate of traffic bound for Node A, in order to determine whether more than N active links will be necessary. The result is sent back to Node A as an Accept message, and Node B then begins the process of activating the corresponding inactive links. Each node confirms the activation and readiness of the corresponding links, and the procedure then terminates.

The right-hand side of Figure 2 depicts the procedure for reducing the number of active links to N. Similar to the procedure just described, Node A first sends Node B an Offer message. Node B then calculates the number of active links necessary for traffic bound to Node A. If that number is greater than N, Node B sends a Deny message back to A and the transaction terminates. Otherwise, Node B sends an Accept message back to A, changes its packet scheduling policy to stop using the corresponding links, and waits until the buffer of each such link is empty. After the buffers are empty, Node B sends A a Complete message. Each link in an aggregated link can be shut down only when the link buffers for that link at both ends are empty. Therefore, both nodes must ensure two conditions before deactivation: (1) the link buffer in the node is empty and (2) a Complete message for the link has been received from the other side.

The state transition diagram for the protocol is shown in Figure 3. A situation of note is when both nodes send Offer messages simultaneously. In this case, both nodes may wait indefinitely for a response from the other, creating a deadlock state. In order to prevent deadlock, when a node has received an Offer message from a peer while still awaiting a response to its own Offer message from that peer, it must immediately transition to the OfrRecv state if the Offer it has received specifies a greater number of active links than the one it has sent. If the numbers are equal, the MAC addresses of the aggregated link are taken as numeric values and used similarly to break the tie.

Frame distribution, which determines the physical link that carries a given packet, is one of the key functions for avoiding packet reordering during link state transitions. Packet reordering can be avoided by scheduling with careful consideration given to link buffer size and link delay.
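To make the procedure concrete, the following is a minimal sketch of the negotiation logic in Python. The message encoding, class structure, and method names are our own assumptions; the paper specifies only the four message types, the increase and decrease procedures, and the tie-break rule. Handling of incoming Accept/Deny/Complete responses is elided.

```python
from dataclasses import dataclass

@dataclass
class Msg:
    kind: str   # "Offer" | "Accept" | "Deny" | "Complete"
    n: int      # proposed or agreed number of active links

class Negotiator:
    """One instance per node; send() delivers a message to the peer over
    the always-active default link."""

    def __init__(self, mac, n_active, send):
        self.mac = mac            # used only to break Offer/Offer ties
        self.n_active = n_active  # current number of active links
        self.pending = None       # N of our outstanding Offer, if any
        self.send = send

    def propose(self, n):
        """Offer to change the number of active links to n."""
        self.pending = n
        self.send(Msg("Offer", n))
        if n > self.n_active:
            # An Offer to increase cannot be refused, so link activation
            # starts immediately after sending the Offer.
            self.activate_links(n)

    def on_offer(self, msg, n_needed, peer_mac):
        """Handle a received Offer; n_needed is our own estimate of the
        active links required for traffic toward the peer."""
        if self.pending is not None:
            # Simultaneous Offers: the larger N wins; equal N is broken by
            # comparing MAC addresses numerically (direction assumed here).
            if (msg.n, peer_mac) < (self.pending, self.mac):
                return               # keep waiting for the peer's response
            self.pending = None      # yield and handle the peer's Offer
        if msg.n > self.n_active:
            # Increase: must accept, answering with our own estimate if it
            # is larger than the offered N.
            n = max(msg.n, n_needed)
            self.send(Msg("Accept", n))
            self.activate_links(n)
        elif n_needed > msg.n:
            self.send(Msg("Deny", self.n_active))   # cannot afford decrease
        else:
            # Decrease: accept, stop scheduling onto the surplus links,
            # drain their buffers, then confirm with Complete.
            self.send(Msg("Accept", msg.n))
            self.drain_buffers(msg.n)
            self.send(Msg("Complete", msg.n))

    # A link is shut down only when (1) its local buffer is empty and
    # (2) a Complete message for it has been received from the peer.
    def activate_links(self, n): ...   # stub
    def drain_buffers(self, n): ...    # stub
```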
[Fig. 4. Traffic Trace (Sampling Interval = 100 ms): (a) ISP A (00:00-02:00); (b) ISP A (15:00-17:00); (c) ISP B (00:00-02:00); (d) ISP B (15:00-17:00).]
B. Link Number Estimation Based on Simple Moving Average

Here, we detail the algorithm for estimating an appropriate number of active links based on a simple moving average. In order to suppress excessively frequent changes in the number of active links, the algorithm changes the number smoothly in accordance with an average traffic rate calculated over multiple samples of the traffic. However, in order to avoid frame loss and unnecessary buffering delay, the number of active links is increased as soon as possible if the size of the buffer currently in use exceeds a given threshold.

The performance of the algorithm depends on two parameters: $\alpha$ and $\beta$. Parameter $\alpha$ is a coefficient multiplied by the average traffic rate described above when calculating the number of active links; setting $\alpha > 1.0$ provides redundancy in the number of links. Parameter $\beta$ (sec) represents the amount of allowable buffering delay. The threshold on the used buffer size is determined by $\beta$, $N_{cur}$ $(1, 2, \ldots, n)$, and $R_{link}$ (bps), where $N_{cur}$ and $R_{link}$ are the current number of active links and the link bandwidth, respectively. The threshold $M_{thres}$ (bytes) is calculated using the following equation:

$$M_{thres} = \frac{\beta N_{cur} R_{link}}{8} \qquad (1)$$

Let $S$, $T$, and $R_{avg}(x)$ $(0 \le x \le S-1)$ denote the number of samples, the sampling interval, and the average traffic rate over the latest $x+1$ samples, respectively. A candidate number of active links $N_{cndt}$ is derived from the full-window average rate:

$$N_{cndt} = \left\lceil \frac{\alpha R_{avg}(S-1)}{R_{link}} \right\rceil \qquad (2)$$

Given the currently occupied buffer size $M$, the next number of active links $N_{next}$ is selected as

$$N_{next} = \begin{cases} N_{incl} & (M > M_{thres}) \\ N_{cur} & (N_{cur} = N_{cndt}) \\ N_{incl} & (N_{cur} < N_{cndt}) \\ N_{decl} & (N_{cur} > N_{cndt}) \end{cases} \qquad (3)$$

where the numbers of links for the increase and decrease cases are given by

$$N_{incl} = \left\lceil \frac{\alpha \max\bigl(R_{avg}(S-1),\, R_{avg}(\lfloor S/2 \rfloor - 1)\bigr)}{R_{link}} \right\rceil \qquad (4)$$

$$N_{decl} = \begin{cases} \lceil (N_{cur} + N_{cndt})/2 \rceil & \bigl(R_{avg}(S-1) < R_{avg}(\lfloor S/2 \rfloor - 1)\bigr) \\ \lfloor (N_{cur} + N_{cndt})/2 \rfloor & (\text{otherwise}) \end{cases} \qquad (5)$$
In essence, when the traffic rate is increasing, the next number of active links is calculated from the higher of the two average rates, as in Equation (4). In contrast, when the traffic rate is decreasing, the next number is given by subtracting half of the difference between $N_{cur}$ and $N_{cndt}$ from $N_{cur}$, as in Equation (5).
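For concreteness, the estimation rule of Equations (1)-(5) can be sketched in Python as follows. This is a minimal, best-effort rendering: the function name, the clamping of the result to [1, n_max], and the handling of the sample window are our assumptions rather than details specified in the text.

```python
import math

def next_active_links(samples, n_cur, m_used, alpha, beta,
                      r_link=1e9, n_max=8):
    """Estimate the next number of active links (sketch of Eqs. (1)-(5)).

    samples : the S most recent traffic-rate samples in bps, newest last
    n_cur   : current number of active links
    m_used  : currently occupied buffer size in bytes
    """
    S = len(samples)
    # R_avg(x): average traffic rate over the latest x+1 samples
    r_avg = lambda x: sum(samples[S - (x + 1):]) / (x + 1)
    r_long = r_avg(S - 1)          # full-window moving average
    r_half = r_avg(S // 2 - 1)     # half-window moving average
    # Eq. (1): buffer threshold in bytes for allowable delay beta (sec)
    m_thres = beta * n_cur * r_link / 8
    # Eq. (2): candidate number of links from the full-window average
    n_cndt = math.ceil(alpha * r_long / r_link)
    # Eq. (4): when increasing, use the higher of the two averages
    n_incl = math.ceil(alpha * max(r_long, r_half) / r_link)
    # Eq. (5): when decreasing, move halfway toward the candidate,
    # rounding up if the short-term average exceeds the long-term one
    half = (n_cur + n_cndt) / 2
    n_decl = math.ceil(half) if r_long < r_half else math.floor(half)
    # Eq. (3): select the next number of active links
    if m_used > m_thres or n_cur < n_cndt:
        n_next = n_incl
    elif n_cur > n_cndt:
        n_next = n_decl
    else:
        n_next = n_cur
    return max(1, min(n_max, n_next))
```

As a worked example of Equation (1): with β = 5 ms, N_cur = 8, and R_link = 1 Gbps, M_thres = 0.005 × 8 × 10⁹ / 8 = 5 Mbytes, comfortably within the 32-Mbyte buffer memory listed in Table I.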
III. EVALUATION

Here, we clarify the characteristics of the proposed algorithm and present an evaluation of its performance via simulation with real traffic trace data. We use traffic traces toward two different ISPs from the WIDE project [18] and shrink the inter-frame intervals in order to fit the traffic to about 3 Gbps. The properties of the modified traffic trace data are shown in Figure 4. In order to evaluate the performance of the algorithm itself, we use this traffic trace data as unidirectional traffic in this experiment; therefore, one of the two nodes works in passive mode.

The parameters of the simulation are shown in Table I. Link-up Delay and Link-down Delay refer to the amount of time required for a link to change its state between ready mode and sleep mode. Although clock data recovery (CDR) would usually require a relatively long time in the link-up process in order to synchronize with the incoming signal, we assume that most of the time required can be eliminated by employing the burst-mode receivers exploited in IEEE 802.3av [19], [20]. Moreover, auto-negotiation should be disabled to avoid additional delay.

In order to understand the characteristics of the algorithm, we conducted various simulations while changing four parameters: Sampling Interval (T), α, β, and Link-up Delay (LUD). There were three main evaluation targets: (1) the average number of active links, (2) the number of link state transitions, and (3) the packet drop rate and average buffering delay.

TABLE I
SIMULATION PARAMETERS

  Parameter              Value
  Sampling Interval (T)  5-500 ms
  Max # of Links         8
  Link Bandwidth         1 Gbps
  Link Delay             1 ms
  Buffer Memory          32 Mbytes
  Link TX Buffer         4 Kbytes
  Link-up Delay          1-50 ms
  Link-down Delay        500 us
  S                      8
  α                      0.95-1.5
  β                      2-50 ms
Target (1) directly represents power saving. Target (2) indicates potential damage to circuits due to the inrush current caused by link activation. Target (3) represents the quality of the aggregated link as a communication line. Reasonable assumptions were made regarding buffer size, link availability, and link activation time, and our algorithm suffered no packet drops in any of our simulations.

In order to evaluate Target (1) more precisely, we introduce several parameters related to the power consumption of Ethernet links. According to Gunaratne's work [17], the power consumption of a 1 Gbps Ethernet link in PCs and switches is approximately 3 W and 1.8 W, respectively, and increases linearly with the number of 1 Gbps Ethernet links. We assume that the aggregated link in this simulation connects a PC and a switch; therefore, the total power consumption of the aggregated link is approximately 8 × (3 + 1.8) = 38.4 W.

A. Influence of Sampling Interval (T)

First, let us clarify the influence of modifying the sampling interval T. The results of modifying T from 5 ms to 500 ms, while α, β, and LUD are fixed to 1.0, 5 ms, and 1 ms, respectively, are shown in Figure 5. Figure 5 (a) illustrates the average number of active links at each sampling interval value for each traffic trace; Figure 5 (b) depicts the ratio between the average aggregated link bandwidth and the average offered traffic. These graphs indicate that more frequent sampling decreases the number of active links required by the algorithm and that no significant changes in the number of active links occur at intervals longer than 50 ms.

Figure 5 (c) illustrates the total number of link activation events at each sampling interval value; Figure 5 (d) depicts the percentage of link-up events caused by the parameter β. These graphs indicate that frequent sampling—particularly at intervals shorter than 50 ms—triggers increasingly more events related to adaptation to the traffic volume. At sampling intervals longer than about 50 ms, most of the events are caused by parameter β.

The total average delay incurred within the buffer memory of the aggregated link and the standard deviation of this delay are shown in Figures 5 (e) and (f). These results indicate that a longer sampling interval generally leads to a larger buffering delay and standard deviation.
[Fig. 5. Influence of Sampling Interval T (α = 1.0, β = 5 ms, LUD = 1 ms): (a) avg. # of active links; (b) ratio of (a) to traffic rate; (c) # of link activation events; (d) ratio of β events to (c); (e) avg. total delay; (f) stddev of (e).]
To summarize the results of this evaluation, more frequent sampling provides better performance in terms of the number of active links and buffering delay, although it tends to induce a greater number of link activation events. Highly frequent link state transitions may be limited by several implementation issues, such as the Link-up Delay (LUD) and the potential for inrush current. Sampling intervals between 50 ms and 100 ms appear reasonable for avoiding such issues without sacrificing performance. Hereinafter, in order to clarify the influence of the other parameters, we choose 100 ms as the sampling interval for our simulations because of the reasonably small number of link activation events it causes and its low buffering delay.

B. Impact of Parameters α and β

Here, we evaluate the impact of the two main parameters of our proposed algorithm: α and β. The parameters T and LUD are fixed to 100 ms and 1 ms, respectively, in these experiments. First, we conducted a simulation wherein α was changed from 0.95 to 1.5 while β was fixed to 5 ms. The results are shown in Figure 6.
[Fig. 6. Impacts of Parameter α (β = 5 ms): (a) avg. # of active links; (b) ratio of (a) to traffic rate; (c) # of link activation events; (d) ratio of β events to (c); (e) avg. total delay; (f) stddev of (e).]
The average number of active links increases so linearly with α that the ratios fit the equation f(α) = aα + b shown in Figure 6 (b), obtained by the least-squares method. There appears to be no significant difference in the number of link activation events for α > 1.1. However, the ratio of β-triggered events to all events decreases drastically up to α = 1.3, indicating that increasing parameter α helps prevent the buffering delay or packet drops caused by sudden traffic bursts. The buffering delay and its standard deviation converge at around α > 1.2.

As a result, the algorithm reduced the average number of active links by 46% with small buffering delay under the conditions α = 1.2 and β = 5 ms. This means that the algorithm reduced power consumption by 17.7 W when the total power consumption was 38.4 W. With the same parameters, it reduced the average number of active links by approximately 25.4% even when the available number of links was the minimum number needed to accommodate the maximum instantaneous traffic rate; this indicates that the algorithm reduced power consumption by 6.7 W when the average total power consumption was 26.4 W. The algorithm could perform even more effectively in environments where traffic rates vary widely depending upon the time of day.
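These savings follow directly from the linear per-link power model introduced above; a quick arithmetic check is given below (the 5.5-link average implied by 26.4 W is our inference from the quoted figures):

```python
# Per-link cost from [17]: ~3 W for a PC port plus ~1.8 W for a switch port.
P_PER_LINK = 3.0 + 1.8                 # 4.8 W per active link

print(8 * P_PER_LINK)                  # 38.4 W with all 8 links available
print(0.46 * 8 * P_PER_LINK)           # ~17.7 W saved at a 46% reduction

# 26.4 W corresponds to an average of 26.4 / 4.8 = 5.5 active links when only
# the minimum number of links for the peak traffic rate is available.
print(0.254 * 26.4)                    # ~6.7 W saved at a 25.4% reduction
```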
We now move on to the evaluation of parameter β; here, parameter α is fixed to 1.0. The results are depicted in Figure 7.

[Fig. 7. Impacts of Parameter β (α = 1.0): (a) avg. # of active links; (b) ratio of (a) to traffic rate; (c) # of link activation events; (d) ratio of β events to (c); (e) avg. total delay; (f) stddev of (e).]
When compared with α, an increase in β does not have a significant impact on the number of active links. However, it induces a smaller number of link activation events and reduces the ratio of events triggered by parameter β. This indicates that the threshold on accepted delay can suppress frequent changes in the number of active links. At the same time, it increases the buffering delay as well as its standard deviation. In terms of buffering delay and its variance, parameter β should be set to less than 15 ms. Overall, parameter α is a greater factor in determining performance than parameter β; however, parameter β is still useful in suppressing frequent link state transitions.

C. Influence of Link-up Delay (LUD)

One major implementation concern is how quickly links in the sleep state can be activated, because the link-up delay (LUD) directly affects performance in terms of buffering delay and its variance. Although we have assumed that LUD = 1 ms in our simulations thus far, we would also like to estimate its effect on the performance of the algorithm. Accordingly, we conducted a simulation with various values of LUD under several scenarios and evaluated the buffering delay and its standard deviation for each scenario. The results are displayed in Figure 8.
[Fig. 8. Influence of Linkup Delay: avg. delay and stddev vs. LUD for (α = 1.0, β = 5 ms), (α = 1.3, β = 5 ms), and (α = 1.5, β = 5 ms).]
The average buffering delay and its standard deviation for α = 1.0 and β = 5 ms increase almost linearly as LUD increases. However, under the other conditions, with greater values of α, the influence diminishes significantly. This indicates that a longer link-up delay can be accommodated without significant side effects, such as increases in buffering delay and its variance, by setting parameter α to values greater than 1.3.

IV. SUMMARY

In this paper, we proposed a power saving mechanism based on a simple moving average of the traffic rate for 802.3ad link aggregation. As the key components of the mechanism, we designed a negotiation protocol that allows both nodes to agree on an appropriate number of active links within the aggregated link, and an algorithm to determine that number in accordance with the incoming traffic rate. Further, we described its characteristics and discussed simulations of its performance conducted with actual traffic data, which demonstrate that the mechanism, with parameters α > 1.2 and β = 5 ms, is capable of reducing the number of active links by approximately 25.4% (6.7 W out of 26.4 W) with reasonable buffering delay and buffering delay variance. We look forward to evaluating the mechanism with bi-directional traffic as well as investigating the influence of link delay on performance.

REFERENCES
[1] T. Hoshino, Plenary Talk, Green IT Symposium, Aug. 2007.
[2] M. Yamada, T. Yazaki, N. Ikeda, "Technologies to save power for carrier class routers and switches," SAINT 2008, Jul. 2008.
[3] A. Odlyzko, "Data Networks are Lightly Utilized, and will Stay that Way," Review of Network Economics, vol. 2, no. 3, pp. 210-237, Sep. 2003.
[4] M. Gupta, S. Singh, "Greening of the Internet," ACM SIGCOMM 2003, Aug. 2003.
[5] M. Gupta, S. Singh, "Using Low-power Modes for Energy Conservation in Ethernet LANs," IEEE INFOCOM 2007, May 2007.
[6] M. Gupta, S. Singh, "Dynamic Ethernet Link Shutdown for Energy Conservation on Ethernet Links," IEEE ICC 2007, Jun. 2007.
[7] M. Gupta, S. Grover, S. Singh, "A Feasibility Study for Power Management in LAN Switches," IEEE ICNP 2004, Oct. 2004.
[8] H. Tamura, Y. Yahiro, Y. Fukuda, K. Kawahara, Y. Oie, "Performance Analysis of Energy Saving Scheme with Extra Active Period for LAN Switches," IEEE GLOBECOM 2007, Nov. 2007.
[9] C. Gunaratne, K. Christensen, B. Nordman, S. Suen, "Reducing the Energy Consumption of Ethernet with Adaptive Link Rate (ALR)," IEEE Trans. Computers, vol. 57, no. 4, Apr. 2008.
[10] IEEE P802.3az Task Force: Energy Efficient Ethernet, http://ieee802.org/3/az/public/index.html
[11] S. Nedevschi, L. Popa, G. Iannaccone, S. Ratnasamy, D. Wetherall, "Reducing Network Energy Consumption via Sleeping and Rate Adaptation," Proc. 5th USENIX Symposium on Networked Systems Design and Implementation, Apr. 2008.
[12] Y. Wang, E. Keller, B. Biskeborn, J. van der Merwe, J. Rexford, "Virtual Routers on the Move: Live Router Migration as a Network-Management Primitive," ACM SIGCOMM 2008, Aug. 2008.
[13] I. Keslassy, S. Chuang, K. Yu, "Scaling Internet Routers Using Optics," ACM SIGCOMM 2003, Aug. 2003.
[14] S. Namiki, T. Hasama, M. Mori, M. Watanabe, H. Ishikawa, "Dynamic Optical Path Switching for Ultra-Low Energy Consumption and Its Enabling Device Technologies," SAINT 2008, Jul. 2008.
[15] Y. Kameda, Y. Hashimoto, S. Yorozu, "Design and Demonstration of a 4x4 SFQ Network Switch Prototype System and 10-Gbps Bit-Error-Rate Measurement," IEICE Trans. Electron., vol. E91-C, pp. 325-332, 2008.
[16] IEEE Standard 802.3, LAN/MAN CSMA/CD Access Method, The Institute of Electrical and Electronics Engineers, Inc.
[17] C. Gunaratne, K. Christensen, B. Nordman, "Managing energy consumption costs in desktop PCs and LAN switches with proxying, split TCP connections, and scaling of link speed," International Journal of Network Management, vol. 15, no. 5, pp. 297-310, Sep. 2005.
[18] WIDE Traffic Archives, http://tracer.csl.sony.co.jp/mawi/, Jan. 2009.
[19] K. Ikezawa, H. Sugawara, T. Izawa, T. Suzuki, Y. Akasaka, A. Toyama, S. Uneme, S. Oka, T. Yakihara, A. Miura, "10 Gbps burst-mode clock recovery with synchronization time of 50 ps," 34th European Conference on Optical Communications (ECOC 2008), Sep. 2008.
[20] IEEE P802.3av Task Force, http://www.ieee802.org/3/av/