New Approach for Flow Control using PAUSE Frame Management
Bahareh Pahlevanzadeh, Seyed Amin Hosseini Seno, Tat-Chee Wan, Rahmat Budiarto
NAv6 Center, School of Computer Sciences, Universiti Sains Malaysia, Malaysia
Abstract— Internet communication and services continue to grow in variety, volume, and demand. Traffic management and quality of service (QoS) have therefore become challenging research areas, and flow control and congestion control are fundamental to traffic control, especially on high-speed networks such as the Gigabit Ethernet family. The existing IEEE802.3x standard, which applies to all versions of full-duplex Ethernet, uses PAUSE frames as MAC control frames to enable or disable frame transmission. Regulating data flow at this low level of the OSI model helps to minimize frame loss and to avoid the latency caused by error recovery in higher-layer protocols. However, the standard has several problems. It does not distinguish well-behaved stations from misbehaving ones, and it causes head-of-line blocking. Moreover, it has no priority awareness, so all traffic is punished equally. The added overhead and the other problems associated with IEEE802.3x make it inefficient. This motivated us to propose a new PAUSE frame format, generated by an intelligent switch, that includes new programmable fields such as a predetermined pause-time for particular traffic. The proposed intelligent switch can distinguish between different types of traffic, and together with the new PAUSE frame it provides an efficient flow control approach for high-speed networks.
Keywords— Congestion control, IEEE802.3x flow control, Intelligent switch, Priority.
I. INTRODUCTION
In every switched local area network (LAN), a network switch forwards data frames between network stations. The switch passes each data frame received from a source station to a destination station according to the header information of the frame. Network congestion may occur if the transmission rate of the transmitting device is greater than the rate the receiving device can accept, or if the receiving device is receiving from other transmitting devices at the same time and cannot accept all of the data frames. Flow control was proposed to reduce network congestion by having a source station temporarily suspend the transmission of data packets. The first flow control scheme for full-duplex Ethernet, IEEE802.3x, slows down the transmitter by generating a flow control message called a PAUSE frame. However, a network switch still needs a mechanism that selectively throttles the bandwidth of individual sources in order to prevent buffer overflow and dropped packets. IEEE802.3x was intended to avoid overflow, but because it inserts dead time on the links, it increases bandwidth loss and jitter and consequently breaks QoS mechanisms. During the PAUSE state no data traffic flows at all, regardless of priority, so all priorities of traffic are punished equally, which makes it difficult to provide differentiated services to various flows. In addition, because network traffic is dynamic, it is difficult to decide which sources to hold off when the buffer overflows. Distinguishing delay-sensitive from non-delay-sensitive traffic is a further difficulty, because flow control in IEEE802.3x is link-based: every station that contributes traffic to the link is affected, whether or not that station is a source of the congestion, and the priority of traffic on the link is ignored. The existing standard is therefore not efficient, and alongside its advantages it brings some drawbacks to the network. It is nevertheless possible to upgrade the existing model and propose a new PAUSE method that provides efficient flow control for full-duplex Ethernet. Our proposed method extends link-level flow control to station level in order to provide better flow control based on traffic priority.
Thus, the proposed method enhances the existing IEEE802.3x PAUSE method to avoid punishing well-behaved traffic sources while still providing high throughput to the network. It also requires a new intelligent switch structure for generating the new PAUSE frame, which is discussed in the following sections.
The remainder of this article is organized as follows. Section II gives background on flow control and the IEEE802.3x MAC control frame, states the problem, and reviews solutions proposed by other researchers. Section III describes our proposed PAUSE frame, which is generated by the new structure of our proposed intelligent switch. Section IV concludes the paper.
II. THEORETICAL BACKGROUND AND RELATED WORKS
A. Flow Control
Some authors reserve the term “flow control” for the transport level and refer to the other levels of control as congestion control. This terminology emphasizes the physical distinction between the first three levels and the fourth level [1]. In this paper, however, we also use the term flow control for the second layer, which is the focus of our research. Flow control is one of the most significant responsibilities of the data link layer. It refers to a set of procedures used to limit the quantity of data that a sender can send before waiting for an acknowledgement [2]. A variety of flow control protocols exist in the transport layer as well as the data link layer, as shown in Fig. 1. Any receiving computer has a limited speed at which it can process incoming data and a limited amount of memory in which to store incoming data. Before these limits are reached, the receiving device must be able to inform the source device and request that it halt transmission until the receiver is once again able to receive. The purpose of the flow control mechanism is therefore to keep data transfer at an acceptable speed, matching a fast sender to a slow receiver, or more generally to manage the rate of data transmission between two network devices. The main functions and goals of flow control in a packet network are [3]:
• Minimizing frame loss and avoiding latency (resulting from error recovery at the higher-layer protocols),
• Deadlock avoidance,
• Prevention of throughput degradation and loss of efficiency due to overload,
• Fair allocation of resources among competing users, and
• Speed matching between the network and its attached users.
Fig. 1 Flow control protocols (figure labels: Flow Control Protocols; For Noiseless Channel; For Noisy Channel; Simplest; Stop & Wait; Sliding Window; Piggybacking; Stop-and-wait ARQ; Sliding Window ARQ; Go-back-N ARQ; Selective-reject ARQ)
B. IEEE802.3x and MAC Control Frame
Since the 1970s, Ethernet has become the world's most widely used networking technology. The Gigabit Ethernet standards are compatible with other Ethernet installations and support full-duplex as well as half-duplex modes of operation. When a hub is used (half-duplex mode), back pressure can slow the transmitting stations, either by generating collisions using jamming signals or by sending the preamble signal on the link without sending the start frame delimiter (SFD) [4]. Explicit flow control is needed when an Ethernet switch is used (full-duplex mode), because each host now has its own collision domain and CSMA/CD no longer applies to full-duplex Gigabit Ethernet. Congestion control and retransmission are not performed automatically, so frames may be lost due to buffer overflow inside the switch. To provide explicit flow control in full-duplex Ethernet, a new optional sub-layer called MAC control is added between the LLC sub-layer and the MAC sub-layer. The MAC control sub-layer is responsible for generating, sending, receiving, and performing the PAUSE operation, which holds off the transmitting device when congestion is detected. For instance, if the receive buffers on a switch port are approaching saturation, the switch can issue a PAUSE frame to the transmitter so that the receive buffers have time to empty [5]. The PAUSE function implements a simple stop-start flow control scheme. IEEE802.3x is mandatory for Gigabit Ethernet and optional for Ethernet and Fast Ethernet. The PAUSE frame protocol is bi-directional, and PAUSE frames are not supported in half-duplex environments. As Fig. 2 shows, PAUSE frames have priority over normal data frames. A predefined destination address and operation code (opcode) identify a frame as a PAUSE frame, which differs slightly from a conventional data frame. The following list and Fig. 3(a) give additional details of the PAUSE frame [5]:
• The destination address (DA) field in a PAUSE frame contains either the unique DA of the station to be paused or the globally assigned multicast address 01-80-C2-00-00-01 (hex). This multicast address has been reserved by the IEEE 802.3 standard for use in MAC Control PAUSE frames.
• The source address is the same as the normal source address of a conventional data frame.
• MAC control frames are identified by the unique Type field value 0x8808, followed by two octets of MAC Control operation code (opcode).
• The MAC Control opcode field is set to 0x0001 (IEEE802.3x Annex 31A) to indicate that the MAC Control frame is a PAUSE frame. The PAUSE frame is the only type of MAC Control frame currently defined.
• The MAC Control Parameters field follows the opcode and contains a 16-bit value (0x0000 to 0xFFFF) that defines the length of time, in units of slot times, for which the transmitting device wants its partner to pause. The time can be extended or aborted by sending another PAUSE frame. If an additional PAUSE frame arrives before the current pause-time has expired, its parameter replaces the current pause-time. A device that wants to terminate the timer of a previous PAUSE frame at its partner can therefore send another PAUSE frame with a parameter of zero.
• The PAD field at the end of the PAUSE frame is a 42-byte reserved field that pads the PAUSE frame to the minimum Ethernet frame size (64 bytes).
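The field layout listed above can be sketched in a few lines of Python. This is a minimal illustration, not an implementation taken from the standard or from this paper: the function names `build_pause_frame` and `apply_pause` are hypothetical, the frame is built without preamble, SFD, or frame check sequence (which hardware normally adds), and the receiver logic only models the "replace or cancel the running timer" rule described in the list.

```python
import struct

PAUSE_MULTICAST_DA = bytes.fromhex("0180C2000001")  # reserved multicast address
MAC_CONTROL_TYPE = 0x8808                           # Type field for MAC control frames
PAUSE_OPCODE = 0x0001                               # opcode of the IEEE802.3x PAUSE frame
FRAME_LEN_NO_FCS = 60                               # 64-byte minimum minus the 4-byte FCS


def build_pause_frame(src_mac: bytes, pause_time: int,
                      dst_mac: bytes = PAUSE_MULTICAST_DA) -> bytes:
    """Return DA + SA + Type + opcode + parameter + zero padding (hypothetical helper)."""
    if len(src_mac) != 6 or len(dst_mac) != 6:
        raise ValueError("MAC addresses must be 6 bytes long")
    if not 0 <= pause_time <= 0xFFFF:
        raise ValueError("pause_time must fit in 16 bits")
    head = dst_mac + src_mac + struct.pack(
        "!HHH", MAC_CONTROL_TYPE, PAUSE_OPCODE, pause_time)
    # The reserved field is filled with zeros up to the minimum frame size.
    return head + bytes(FRAME_LEN_NO_FCS - len(head))


def apply_pause(remaining: int, frame: bytes) -> int:
    """Receiver side: a new PAUSE replaces the running timer; zero cancels it."""
    frame_type, opcode, pause_time = struct.unpack("!HHH", frame[12:18])
    if frame_type != MAC_CONTROL_TYPE or opcode != PAUSE_OPCODE:
        return remaining  # not a PAUSE frame, so the timer is left unchanged
    return pause_time
```

With an 18-byte header, the padding computed here is 42 bytes, which matches the reserved field size given in the list.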
Fig. 2 Priority of PAUSE frame insertion (figure labels: MAC Transmitter; Data Frame 1 to Data Frame 4; PAUSE; Need to send PAUSE here)
Fig. 3 Comparison of the IEEE802.3x PAUSE frame with new proposed PAUSE frame
(a) Structure of IEEE802.3x PAUSE frame
Preamble (7 bytes)
Start Frame Delimiter (1 byte)
Destination MAC Address (6 bytes): 01-80-C2-00-00-01 or unique DA
Source MAC Address (6 bytes)
Length/Type (2 bytes): 802.3 MAC Control (88-08)
MAC Control Opcode (2 bytes): PAUSE (00-01)
MAC Control Parameters (2 bytes): 00-00 to FF-FF
Reserved (42 bytes): all zeros
Frame Check Sequence (4 bytes)
(b) Structure of the new proposed PAUSE frame
Preamble (7 bytes)
Start Frame Delimiter (1 byte)
Destination MAC Address (6 bytes): unique DA only
Source MAC Address (6 bytes)
Length/Type (2 bytes): 802.3 MAC Control (88-08)
New MAC Control Opcode (2 bytes): new PAUSE, a value in the range 0x0002 to 0xFFFF
MAC Control Parameters (2 bytes): flexible length of time depending on the priority
Reserved (42 bytes): all zeros
Frame Check Sequence (4 bytes)
C. Vendors and IEEE802.3x Problems
The addition of full-duplex mode to the Ethernet standard thus included an optional flow control operation based on the MAC control (PAUSE) frame. Unfortunately, Ethernet flow control is commonly misunderstood. It is not intended to address a lack of network capacity or end-to-end network issues. Properly used, Ethernet flow control is a useful tool for addressing short-term overloads on a single link, and it helps to minimize frame loss and to avoid the latency that results from error recovery in higher-layer protocols [6], [7].
Ethernet flow control does not provide end-to-end flow control, which is usually handled by the transport layer. The IEEE 802.3x standard mostly requires a device to respond to PAUSE frames and does not require it to initiate them. This distinction is the basis of the terms symmetric and asymmetric flow control. Because not all devices are capable of receiving, sending, and responding to PAUSE messages, vendors have raised many issues about the standard. Most vendors use different thresholds for initiating PAUSE frames. Some vendors state that quality of service (QoS) and class of service (CoS) features are a good way to manage congestion, while most of them claim that QoS cannot operate properly if a switch sends PAUSE frames, because this slows all traffic on the port, including any high-priority traffic. They point out that sending PAUSE messages can cause head-of-line blocking. For this reason, it can be argued that PAUSE frames should not be used in a network core, where they could delay traffic unrelated to the oversubscription of a link. The one area where flow control might be used properly is at the edge of a network, where Gigabit Ethernet attached servers operate at less than wire speed and the link only needs to be paused for a short time, typically measured in nanoseconds. The use of PAUSE frames may be appropriate under such conditions. Because traffic management for optimizing the Internet has become a challenging research area, there have been many efforts to enhance the IEEE802.3x standard and overcome some of its drawbacks.
D. Related Works
Hsiaw [8] presented the criteria that must be taken into account when developing a flow control scheme, such as end-to-end and buffer-to-buffer flow control. He also stated that if IEEE802.3x PAUSE frame flow control is used, priority traffic will be blocked or delayed. He proposed a MAC control type whose generation depends on silicon speed and does not scale with network speed, so that its buffer requirement is independent of network speed. After comparing the IEEE802.3x flow control scheme with Fibre Channel credit-based flow control, he concluded that 802.3x is less effective and may cost more than the Fibre Channel method, and that the Fibre Channel method should be considered as the Gigabit Ethernet flow control method. Asymmetric flow control was proposed as a requirement for end stations [9]. The aim of this proposal was to allow the MAC client (or the MAC) to throttle traffic independently. A similar idea for asymmetric flow control in Gigabit Ethernet appears in [10]: the only solution to the backup problem is to stop traffic at its source, and since it is undesirable for an end station to slow down the whole network, asymmetric flow control should be allowed. Asymmetric flow control is not suitable between switches, however, because it places an ill-defined burden on the switch designer to add more buffering.
An authoritative architecture for congestion management combines mechanisms that 1) provide congestion detection and information feedback from the Ethernet interconnect that can be used to enable higher-layer congestion management, 2) support rate control at Ethernet ingresses to avoid oversubscription of the lower-layer interconnect resources, and 3) provide flow optimization and rate control on the links local to the congestion to deal with transient congestion [11]. These mechanisms must also operate in harmony with upper-layer congestion management. The authoritative architecture is a good method for congestion management, but it does not provide good flow control for Ethernet traffic. Adaptive rate control (ARC) is another proposal for enhancing IEEE802.3x [12], in which a congested receiver uses XUP/XDOWN feedback (punish/reward messages) to control the transmission rate of the contending transmitter. Simulation results show that ARC provides better latency under congestion (especially for high-priority traffic) than 802.3x, and that it also achieves better throughput and fewer packet drops.
III. STRUCTURE OF NEW PROPOSED INTELLIGENT SWITCH
An Ethernet switch usually connects different LAN segments of a large Ethernet network, so each port is connected to a variety of traffic sources in the network. The purpose of the proposed switch is to regulate and forward packet traffic intelligently. A QoS packet passes through several steps as it traverses an intelligent switch from ingress port to egress port. Besides some changes to the switch functionality (explained later), we make several assumptions, listed below, about capabilities that switches can be expected to have. An intelligent switch can distinguish between different types of traffic and thereby provide more efficient flow control for high-speed networks.
• Switches have buffers for temporarily storing data received from source stations before it is sent to destination stations.
• Switches can detect that their buffers are heavily loaded. A switch is considered congested when its buffer memory has filled beyond a predetermined threshold.
• Switches can identify the misbehaving source stations that may be contributing to the buffer load.
• Switches can produce PAUSE frames, which may be directed back to the identified misbehaving source stations.
Most switches already have almost all of these properties, so to regulate data flow and improve congestion management with a new PAUSE model we need to modify or add components in the switch. Fig. 4 shows the block diagram of the proposed intelligent Ethernet switch architecture. Packets that enter the switch go to the inquiring unit of the port, which extracts the header information of each packet (e.g., Ethernet MAC addresses, VLAN tag (IEEE 802.1Q), the frame PDU, and Internet Protocol (IPv4 and IPv6) headers).
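As a rough illustration of the header extraction step performed by the inquiring unit, the sketch below pulls the MAC addresses and, when an IEEE 802.1Q tag is present, the 3-bit priority and VLAN identifier out of a raw frame. The function name `extract_header_fields` and the returned dictionary layout are assumptions made for this example. The paper does not specify how the unit is implemented, and a real switch would do this in hardware.

```python
import struct

VLAN_TPID = 0x8100  # Type value that marks an IEEE 802.1Q tag


def extract_header_fields(frame: bytes) -> dict:
    """Return the fields a classifier could use (hypothetical layout)."""
    if len(frame) < 14:
        raise ValueError("frame is shorter than an Ethernet header")
    dst_mac, src_mac = frame[0:6], frame[6:12]
    (frame_type,) = struct.unpack("!H", frame[12:14])
    priority = None
    vlan_id = None
    if frame_type == VLAN_TPID:
        if len(frame) < 18:
            raise ValueError("truncated VLAN tag")
        # The tag control information holds priority (3 bits) and VLAN ID (12 bits).
        tci, frame_type = struct.unpack("!HH", frame[14:18])
        priority = tci >> 13
        vlan_id = tci & 0x0FFF
    return {
        "dst_mac": dst_mac.hex("-"),
        "src_mac": src_mac.hex("-"),
        "type": frame_type,
        "priority": priority,
        "vlan_id": vlan_id,
    }
```

Untagged frames yield `priority = None`, so a classifier built on this sketch would need a default class for them.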
The extracted header information is sent to a programmable monitoring table (CAM) within the switch engine unit as well as to the traffic policer unit. Packet classification sorts packets into different prioritized flows according to policy. The policy is specified using fields in the packet header, so flows are defined by rules (field specifications) applied to incoming packets [13]. Once a packet has been classified, it moves to shared memory. The transfer into shared memory is based on priority: higher-priority packets move into shared memory first, followed by lower-priority packets. Once in shared memory, the packet is scheduled to leave the switch by the egress scheduler. The order in which packets leave is determined by the classification assigned to each packet. Higher-priority packets leave the switch first and lower-priority packets follow, as in the input buffer. This scheme guarantees that high-priority packets take precedence over low-priority packets when traversing the switch.
A. Proposed Enhancement to PAUSE Frame
One of our main goals was to generate a new PAUSE frame format that addresses the drawbacks of the IEEE802.3x PAUSE frame and provides finer granularity. We therefore added or modified components of the switch to make it as intelligent as possible, so that it can issue a PAUSE with an efficient pause-time value for each prioritized traffic flow. The novelty of the proposed method is that it sends a specific PAUSE with a predetermined pause-time, based on the search results of the switch engine and traffic policer unit, for a specific prioritized traffic flow of a misbehaving source. This method can increase the overall throughput of the network without affecting traffic from other sources.
Sending several consecutive PAUSE frames to achieve an effective pause-time consumes bandwidth and reduces throughput. Sending only one PAUSE frame that carries a predetermined pause-time, chosen to overcome the overflow of the switch buffer, offers better throughput. Once the switch buffer starts to become congested, the PAUSE generator analyses the latest information in the CAM monitoring table, assigns a priority to each traffic source based on its threshold value (which depends on its CoS value), and creates PAUSE frames that it sends to the misbehaving sources. Because some fields of the existing PAUSE frame must be modified, as shown in Fig. 3(b), the new components together give the switch the ability to generate PAUSE frames intelligently based on traffic priority. As mentioned in Section II-B, the only opcode currently defined by IEEE802.3x is the PAUSE opcode (0x0001) with its associated 2-byte parameter field, called the pause-time parameter. Pause-time values are set as defaults in the hardware chip, but they can be overridden by a user.
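The decision process described above can be sketched as follows. The paper does not give the threshold rule or the pause-time formula, so everything specific here is an assumption chosen for illustration: the table row `SourceEntry`, the CoS weights 1 to 8, the 80 % congestion threshold, and the rule that a source is paused for as many 512-bit quanta as it would take to transmit its excess buffered bytes.

```python
from dataclasses import dataclass
from typing import Dict, List

QUANTUM_BITS = 512               # one pause quantum is 512 bit times
MAX_PAUSE_TIME = 0xFFFF          # the parameter field is 16 bits wide
TOTAL_SHARES = sum(range(1, 9))  # assumed weights 1..8 for CoS 0..7


@dataclass
class SourceEntry:
    """One hypothetical row of the monitoring table."""
    source_mac: str
    cos: int             # class of service, 0 (lowest) to 7 (highest)
    buffered_bytes: int  # bytes from this source held in the shared buffer


def plan_pauses(table: List[SourceEntry], buffer_capacity: int,
                congestion_fraction: float = 0.8) -> Dict[str, int]:
    """Map each misbehaving source to a single pause-time value in quanta."""
    occupied = sum(entry.buffered_bytes for entry in table)
    if occupied < congestion_fraction * buffer_capacity:
        return {}  # buffer is below the congestion threshold, nobody is paused
    plan = {}
    for entry in table:
        # Assumed rule: higher CoS is allowed a larger share of the buffer.
        allowed = buffer_capacity * (entry.cos + 1) // TOTAL_SHARES
        excess_bytes = entry.buffered_bytes - allowed
        if excess_bytes > 0:
            quanta = -(-excess_bytes * 8 // QUANTUM_BITS)  # ceiling division
            plan[entry.source_mac] = min(MAX_PAUSE_TIME, quanta)
    return plan
```

Sources that stay within their share are absent from the returned plan, which reflects the stated goal of not punishing well-behaved stations. Each paused source receives one value, so one PAUSE frame per source is enough under these assumptions.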
Fig. 4 Block diagram of the proposed intelligent Ethernet switch architecture (block labels: Ethernet Transceivers; MAC; S/P Conversion; Packet Decoder; FIFO; Queue Handler; Hashing Unit; Classifier Unit; Switch Engine; MAC Table; VLAN Table; Programmable Monitoring Table (CAM); Traffic Policer; Traffic Profiles; Traffic Conditioner Unit; PAUSE Generator Unit; Management Unit; Memory Manager; Buffer Arbiter; Switch-Shared Memory Buffer; Scheduler; Other Physical Components (CPU, PCI/DMA))
The first aspect of our proposed PAUSE therefore uses one of the reserved opcode values (0x0002 to 0xFFFF) for the new PAUSE function. In keeping with the protocol, only a single opcode may be associated with a particular MAC control frame, and unsupported MAC control frames are discarded at the MAC control sub-layer. For the parameter field, the original pause-time quantum is equal to 512 bit times, where a bit time is 1 ns for Gigabit Ethernet, 0.1 ns for 10 Gigabit Ethernet, and 0.01 ns for the new 100 Gigabit Ethernet, so the quantum is a fixed value determined by the speed of the NIC. In our proposed model, the NIC speed is not the only parameter that determines the pause-time parameter field. The PAUSE generator unit within the traffic policer decides the amount of pause-time based on flexible packet information that carries different weights and has been captured in the CAM monitoring table (refer to Fig. 4).
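The relationship between the parameter value, the 512-bit quantum, and the link speed can be checked with a short calculation. The helper names `pause_time_ns` and `quanta_for` are hypothetical, and the sketch covers only the fixed, speed-dependent part of the pause-time. The priority weighting of the proposed model is not defined numerically in the paper and is not modelled here.

```python
PAUSE_QUANTUM_BITS = 512   # one pause quantum is 512 bit times
MAX_PAUSE_TIME = 0xFFFF    # largest value of the 16-bit parameter field


def pause_time_ns(quanta: int, link_bps: int) -> float:
    """Duration in nanoseconds of a pause of `quanta` quanta on a given link."""
    return quanta * PAUSE_QUANTUM_BITS * 1e9 / link_bps


def quanta_for(duration_ns: int, link_bps: int) -> int:
    """Smallest parameter value that pauses for at least `duration_ns`."""
    bits_scaled = duration_ns * link_bps  # bit times multiplied by 1e9
    quanta = -(-bits_scaled // (PAUSE_QUANTUM_BITS * 10**9))  # ceiling division
    return min(MAX_PAUSE_TIME, quanta)
```

On Gigabit Ethernet one quantum lasts 512 ns, so the largest parameter value (0xFFFF) corresponds to 65535 × 512 ns, about 33.6 ms. On 10 Gigabit Ethernet the same value pauses for one tenth of that time, which is why a fixed parameter alone cannot express a speed-independent pause.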
IV. DISCUSSION AND CONCLUSIONS
In this paper we have proposed an enhancement to the existing IEEE802.3x standard that provides better flow control based on traffic priority in full-duplex Gigabit Ethernet. We proposed a new intelligent Ethernet switch architecture with additional or modified units compared to a normal switch, which allow it to make decisions based on the information collected in the monitoring table and to generate a new PAUSE frame with a predetermined pause-time. In the proposed model, a PAUSE frame is generated automatically and transmitted when a threshold condition, such as a full or nearly full switch buffer, is detected. One problem faced by previous methods was defining a suitable pause-time. A pause-time that is too long may reduce throughput because sources wait longer than necessary. If it is too short, the switch must send another PAUSE frame, and possibly several, which wastes bandwidth and reduces the throughput of the network. The proposed intelligent switch transmits only one PAUSE frame with a predetermined, accurately chosen pause-time. Moreover, there is no need to send pause-time = 0 to end the pause state in the transmitting stations: a transmitter automatically returns to normal data frame transmission when its pause-time expires. Because the cost of CPUs and chips falls over time, incorporating our method into an intelligent switch is no longer expensive. With our method, data flow can be regulated and congestion on the switch managed according to traffic priorities better than with the previous methods, which meets our goal of enhancing the IEEE802.3x standard. Our future work will focus on designing a more robust model that can be applied to core switches and on further testing of its efficiency.
ACKNOWLEDGMENT
Our thanks to the Universiti Sains Malaysia and NAv6 Center for the fellowship and grant.
REFERENCES
[1] L. Pouzin, “Flow control in Data networks-Methods and tools”, in Proc. International Conference Computer Communication, Toronto, Canada, Aug. 1976.
[2] B. A. Forouzan, Data Communications and Networking, 4th ed., McGraw-Hill Higher Education, ISBN-13: 978-0-07-296775-3, Chapter 11, P: 31, 2006.
[3] M. Gerla and L. Kleinrock, “Flow control: A Comparative Survey”, IEEE Transactions on Communications, Vol. COM-28, No. 4, P: 553-574, April 1980.
[4] I. P. Kaminow and T. Li, Optical Fiber Telecommunications IV-A: Components, Elsevier Inc. Academic Press, ISBN: 978-0-12395172-4, P: 532-533, 2002.
[5] G. Held, Ethernet Networks: Design, Implementation, Operation, Management, 4th ed., John Wiley and Sons Inc., ISBN: 978-0470-84476-2, Chapter 6, P: 339-342, 2003.
[6] T. Clark, IP SANs: A Guide to iSCSI, iFCP, and FCIP Protocols for Storage Area Networks, Addison-Wesley, ISBN: 0201752778, Chapter 3, P: 61-62, 2001.
[7] D. Stephen, F. West, T. Eric and D. W. Alderrou, “Automatic MAC control frame generating apparatus for LAN flow control”, United States Patent 6098103, 1997.
[8] H. Hsiaw and C. Nelson, Flow Control for Gigabit Ethernet, MMC Networks, Inc. Available: http://grouper.ieee.org/groups/802/3/z/public/presentations/nov1996/HF agenda.pdf
[9] R. Seifert, The Switch Book: The Complete Guide to LAN Switching Technology, John Wiley and Sons Inc., Chapter 8, ISBN: 978-0-47134586-2, July 2000.
[10] M. Baldi and P. Nicoletti, Switched LAN, McGraw-Hill Companies, Chapter 2, ISBN: 88-386-3426-2, Italy, 2002.
[11] M. Wadekar, “Enhanced Ethernet for Data Center; Reliable, Channelized and Robust”, in Proc. 15th IEEE Workshop on Local and Metropolitan Area Networks, 2007.
[12] G. McAlphine, M. Wadekar, G. Tanmay, A. Crouch, D. Newell, “An Architecture for Congestion Management in Ethernet Clusters”, in Proc. 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS’05).
[13] X. Sun, K. S. Sartaj, and Q. Z. Yiqiang, “Packet Classification Consuming Small Amount of Memory”, IEEE/ACM Transactions on Networking, Vol. 13, No. 5, October 2005.